NeatSeq-Flow Cheat-Sheet
Author: Menachem Sklarz
Input Files
Sample file
Passed to NeatSeq-Flow with the -s argument.
Includes four sections:
Title
A title for the project:
Title Project_title
Project file information
Two tab-separated columns:
File type
File path
#Type Path
Nucleotide /path/to/genome.fasta
Samples file information
Three tab-separated columns:
Sample ID
File type
File path
Additional columns will be ignored:
#SampleID Type Path lane
Sample1 Forward /path/to/Sample1_F1.fastq.gz 1
Sample1 Forward /path/to/Sample1_F2.fastq.gz 2
Sample1 Reverse /path/to/Sample1_R1.fastq.gz 1
Sample1 Reverse /path/to/Sample1_R2.fastq.gz 2
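The three-column layout above can be read with a few lines of Python. This is only an illustrative sketch, not NeatSeq-Flow's own parser: the function name `parse_sample_rows` is hypothetical, and it assumes the columns are tab-separated and that header lines start with `#`.

```python
from collections import defaultdict


def parse_sample_rows(lines):
    """Group file paths by sample ID and file type."""
    files = defaultdict(lambda: defaultdict(list))
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and '#'-prefixed header lines
        if not line.strip() or line.startswith("#"):
            continue
        # Only the first three columns are used; extras (e.g. lane) are ignored
        sample_id, file_type, path = line.split("\t")[:3]
        files[sample_id][file_type].append(path)
    # Return plain dicts for easier inspection
    return {sample: dict(types) for sample, types in files.items()}


rows = [
    "#SampleID\tType\tPath\tlane",
    "Sample1\tForward\t/path/to/Sample1_F1.fastq.gz\t1",
    "Sample1\tForward\t/path/to/Sample1_F2.fastq.gz\t2",
    "Sample1\tReverse\t/path/to/Sample1_R1.fastq.gz\t1",
    "Sample1\tReverse\t/path/to/Sample1_R2.fastq.gz\t2",
]
samples = parse_sample_rows(rows)
print(samples["Sample1"]["Forward"])
```

A sample can list several files of the same type, so each type maps to a list of paths.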
ChIP-seq
Define ChIP and Control (‘input’) pairs:
Sample_Control anti_sample1:input_sample1
Sample_Control anti_sample2:input_sample2
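As a rough sketch of how the ChIP:control pairs above might be read, the following Python splits each `Sample_Control` line on the colon. The function name `parse_sample_control` is hypothetical, and whitespace-separated fields are an assumption.

```python
def parse_sample_control(lines):
    """Map each ChIP sample to its control ('input') sample."""
    pairs = {}
    for line in lines:
        fields = line.split()
        # Only lines of the form 'Sample_Control chip:control' are considered
        if len(fields) != 2 or fields[0] != "Sample_Control":
            continue
        chip, control = fields[1].split(":", 1)
        pairs[chip] = control
    return pairs


chip_lines = [
    "Sample_Control anti_sample1:input_sample1",
    "Sample_Control anti_sample2:input_sample2",
]
pairs = parse_sample_control(chip_lines)
print(pairs)
```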
Parameter file
Passed to NeatSeq-Flow with the -p argument.
YAML-formatted file with the following three sections.
Tip
The Vars section is recommended but not compulsory.
Global parameters section
| Parameter | Description |
|---|---|
|  | SGE, Local or SLURM. (Default: SGE) |
| Qsub_q | The cluster queue (or partition) to use. Default value for qsub |
| Qsub_nodes | Default nodes on which to execute jobs (Default: All nodes in queue) |
|  | Other parameters to pass to qsub |
|  | The full path to qsub. Obtain by running |
|  | Default: 10. Leave as is |
| module_path | List of paths to repositories of additional modules. (Must be a python directory, containing |
|  | Setting in global parameters is equivalent to setting |
Attention
The default executor is SGE. For SLURM, sbatch is used instead of qsub, e.g. Qsub_nodes defines the nodes to be used by sbatch.
Attention
If NeatSeq-Flow is executed from within a conda environment with both NeatSeq-Flow and its modules installed, module_path will automatically include the modules repo. If not, you will have to give the path to the location where the modules were installed.
Vars section
Replacements to be made in the parameter file, in YAML format. Referred to in other sections using dot notation.
Example:
Vars:
    paths:
        bwa: /path/to/bwa
        samtools: /path/to/samtools
    genome: /path/to/genomeDir
In parameter section:
| This… | Becomes this… |
|---|---|
| {Vars.paths.bwa} | /path/to/bwa |
| {Vars.paths.samtools} | /path/to/samtools |
| {Vars.genome} | /path/to/genomeDir |
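The substitution can be pictured as a lookup that walks the Vars tree one key at a time. The sketch below is an assumption about the behaviour, not NeatSeq-Flow's implementation; `resolve_vars` and the `variables` dictionary are hypothetical names.

```python
import re


def resolve_vars(text, variables):
    """Replace each {Vars.a.b} reference with the value at variables['a']['b']."""

    def lookup(match):
        node = variables
        # Walk the nested dictionary one dot-separated key at a time
        for key in match.group(1).split("."):
            node = node[key]
        return str(node)

    return re.sub(r"\{Vars\.([\w.]+)\}", lookup, text)


# Hypothetical contents of a Vars section
variables = {"paths": {"bwa": "/path/to/bwa", "samtools": "/path/to/samtools"}}

resolved = resolve_vars("script_path: {Vars.paths.bwa} mem", variables)
print(resolved)
```

A reference to a key that is not defined raises a `KeyError` in this sketch, which makes typos in variable names easy to spot.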
Step-wise parameters
A series of YAML blocks, one per workflow step to perform. Each block takes the following form:
fqc_trimgal:
    module: fastqc_html
    base: trim_gal
    script_path: {Vars.paths.fastqc}
Types of step parameters:
Required parameters
| Parameter | Description |
|---|---|
| module | The name of the module of which this step is an instance. |
| base | The name of the step(s) on which the current step is based (not required for the |
| script_path | The full path to the script executed by this step. |
Cluster parameters
Passed in a qsub_params block.
| Parameter | Description |
|---|---|
|  | A node or YAML list of nodes on which to run the step scripts (overrides global parameter Qsub_nodes) |
|  | Will limit the execution of the step’s scripts to this queue (overrides global parameter Qsub_q) |
| -pe | Will set the -pe parameter for all scripts for this module (see SGE qsub manual). |
| -XXX: YYY | Set the value of qsub parameter -XXX to YYY. This is a way to define other SGE parameters for all step scripts. |
Additional parameters
| Parameter | Description |
|---|---|
| tag | All instances downstream of the tagged instance will have the same tag. All steps with the same tag can be executed with one master script |
|  | Will add a line to scripts/95.remove_intermediates.sh for deleting the results of this step |
| setenv | Set various environment variables for the duration of script execution. A string with format |
| precode | Additional code to be added before the actual script. Rarely used |
| scope | Use sample- or project-wise files. Check per-module documentation for whether and how this parameter is defined |
| sample_list | Limit this step to a subset of the samples. See section Sample list. |
| conda | Defines step-specific conda parameters. The syntax is the same as for the global conda definition. |
|  | Set the delimiter between program argument and value, e.g. ‘=’ (Default: ‘ ‘) |
|  | Use a local directory for intermediate files before copying results to final destination in data dir. |
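The tag behaviour described in the table can be sketched as inheritance along the `base` links. This is a simplified illustration under stated assumptions: steps are listed with bases before their dependants, an explicit tag overrides an inherited one, and the first tagged base wins. The function name `propagate_tags` and the tag value `qc` are hypothetical.

```python
def propagate_tags(steps):
    """Return the effective tag of every step, inheriting from base steps."""
    tags = {}
    for name, params in steps.items():
        tag = params.get("tag")
        if tag is None:
            # No explicit tag: inherit from the first base step that has one
            for base in params.get("base", []):
                if tags.get(base) is not None:
                    tag = tags[base]
                    break
        tags[name] = tag
    return tags


steps = {
    "Merge_files": {"base": []},
    "trim_gal": {"base": ["Merge_files"], "tag": "qc"},
    "fqc_trimgal": {"base": ["trim_gal"]},
}
tags = propagate_tags(steps)
print(tags)
```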
Redirected parameters
Parameters to be redirected to the actual program executed by the step.
Redirected parameters are specified within a redirects: block. The parameter name must include the - or -- required by the program defined in script_path.
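To illustrate why the dashes must be part of the parameter name, the sketch below joins a script path and a set of redirected parameters into a command line, passing each name through unchanged. It is not NeatSeq-Flow's code: `build_command` and its `arg_separator` argument are hypothetical names, and treating a `None` value as a bare flag is an assumption.

```python
import shlex


def build_command(script_path, redirects, arg_separator=" "):
    """Join a script path and redirected parameters into one command string."""
    parts = [script_path]
    for name, value in redirects.items():
        if value is None:
            # Flag without a value, passed as-is
            parts.append(name)
        elif arg_separator == " ":
            parts.extend([name, shlex.quote(str(value))])
        else:
            # e.g. '=' gives --out=results.txt
            parts.append(f"{name}{arg_separator}{shlex.quote(str(value))}")
    return " ".join(parts)


cmd_space = build_command("/path/to/bwa mem", {"-t": 4, "-M": None})
cmd_equals = build_command("prog", {"--out": "results.txt"}, arg_separator="=")
print(cmd_space)
print(cmd_equals)
```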
Sample list
The sample list enables limiting the instance scripts to a subset of the samples. It can be expressed in two ways:
A YAML list or a comma-separated list of sample names:
sample_list: [sample1, sample2]
By levels of a category (see section Mapping file):
sample_list:
    category: Category1
    levels: [level1, level2]
For using all but a subset of samples, use exclude_sample_list instead of sample_list.
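The selection forms described in this section can be sketched in Python as follows. The function `select_samples` is a hypothetical helper, and the `mapping` dictionary stands in for a parsed mapping file; neither is part of NeatSeq-Flow.

```python
def select_samples(all_samples, mapping=None, sample_list=None, exclude_sample_list=None):
    """Return the samples a step would run on, in their original order."""

    def expand(spec):
        if isinstance(spec, str):
            # Comma-separated list of sample names
            return [name.strip() for name in spec.split(",")]
        if isinstance(spec, dict):
            # Selection by levels of a mapping-file category
            levels = set(spec["levels"])
            return [s for s in all_samples if mapping[s][spec["category"]] in levels]
        return list(spec)

    if sample_list is not None:
        wanted = set(expand(sample_list))
        return [s for s in all_samples if s in wanted]
    if exclude_sample_list is not None:
        unwanted = set(expand(exclude_sample_list))
        return [s for s in all_samples if s not in unwanted]
    return list(all_samples)


all_samples = ["Sample1", "Sample2", "Sample3", "Sample4"]
mapping = {
    "Sample1": {"Category1": "A"},
    "Sample2": {"Category1": "A"},
    "Sample3": {"Category1": "B"},
    "Sample4": {"Category1": "B"},
}
by_level = select_samples(all_samples, mapping, sample_list={"category": "Category1", "levels": ["A"]})
by_name = select_samples(all_samples, sample_list="Sample1, Sample3")
all_but_one = select_samples(all_samples, exclude_sample_list=["Sample4"])
print(by_level, by_name, all_but_one)
```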
Mapping file
Passed to NeatSeq-Flow with --mapping.
A tab-separated table with at least two columns:
Sample ID
First category name
Additional categories…
Example:
#SampleID Category1 Category2
Sample1 A C
Sample2 A D
Sample3 B C
Sample4 B D
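A mapping table like the one above can be loaded into a per-sample dictionary with the standard `csv` module. This is a minimal sketch that assumes a tab-separated file whose first line is the header; `read_mapping` is a hypothetical name.

```python
import csv
import io


def read_mapping(handle):
    """Read a tab-separated mapping table into {sample: {category: level}}."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    # The first column holds the sample ID; the rest are category names
    categories = header[1:]
    mapping = {}
    for row in reader:
        if not row:
            continue
        mapping[row[0]] = dict(zip(categories, row[1:]))
    return mapping


text = (
    "#SampleID\tCategory1\tCategory2\n"
    "Sample1\tA\tC\n"
    "Sample2\tA\tD\n"
    "Sample3\tB\tC\n"
    "Sample4\tB\tD\n"
)
mapping_table = read_mapping(io.StringIO(text))
print(mapping_table["Sample3"])
```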
Flow control
Import
Basic mode
NeatSeq-Flow will attempt to guess all the parameters it requires.
Example:
Merge_files:
    module: Import
    script_path:
Advanced mode
Define source and target slots and how to concatenate the files. Attempts to guess information left out by the user.
| Parameter | Description |
|---|---|
| src | Source slot. |
| trg | Target slot. |
| ext | Concatenated file extension. |
| scope | The scope of the file. |
| script_path | The code to use for merging, or one of the values listed in the next table. |
|  | A command through which to pipe the file before storing. |

| Value | Description |
|---|---|
|  | Guess (script_path, trg and ext) |
| ..import.. | Do not copy the file, just import it into its slot (only if one file defined for src). |
|  | Do not import the file type. |
Example:
merge_data:
    module: Import
    src: [Forward, Reverse, Nucl]
    trg: [fastq.F, fastq.R, fasta.nucl]
    script_path: [..import.., cat, 'curl -L']
    ext: [null, null, txt]
    scope: [sample, sample, project]
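The lists in the example are parallel: the first entry of each list describes the first file type, and so on. The Python sketch below makes that pairing explicit. It only illustrates how the lists line up and is not how NeatSeq-Flow processes them; `expand_import` is a hypothetical name.

```python
def expand_import(params):
    """Turn parallel parameter lists into one record per source file type."""
    keys = ["src", "trg", "script_path", "ext", "scope"]
    n = len(params["src"])
    for key in keys:
        if len(params[key]) != n:
            raise ValueError(f"'{key}' must have {n} entries, one per src")
    # zip pairs up the i-th entry of every list
    return [dict(zip(keys, values)) for values in zip(*(params[k] for k in keys))]


params = {
    "src": ["Forward", "Reverse", "Nucl"],
    "trg": ["fastq.F", "fastq.R", "fasta.nucl"],
    "script_path": ["..import..", "cat", "curl -L"],
    "ext": [None, None, "txt"],  # YAML null becomes None
    "scope": ["sample", "sample", "project"],
}
records = expand_import(params)
for record in records:
    print(record)
```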
manage_types
Import raw data files into the data/ directory.
| Parameter | Possible values | Description |
|---|---|---|
| operation | add, del, mv, cp | The operation to perform on the file type. |
| scope | project, sample | The scope on which to perform the operation. (For ‘mv’ and ‘cp’ this is the source scope) |
| type |  | The file type on which to perform the operation. (For ‘mv’ and ‘cp’ this is the source type) |
| scope_trg | project, sample | The destination scope for ‘mv’ and ‘cp’ operations. |
| type_trg |  | The destination type for ‘mv’ and ‘cp’ operations. |
| path |  | For |
Example:
manage_types1:
    module: manage_types
    base: trinity1
    script_path:
    scope: [project, sample, sample, project]
    operation: [mv, del, cp, add]
    type: [fasta.nucl, fasta.nucl, fastq.F, bam]
    type_trg: [transcripts.nucl, None, fastq.main, None]
    scope_trg: sample
    path: [None, None, None, /path/to/mapping.bam]
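The effect of the four operations in the example can be simulated on a toy file-type index. The sketch makes strong simplifying assumptions: the index is a flat dictionary keyed by (scope, type) rather than a per-sample structure, a scalar parameter such as scope_trg applies to every operation, and the starting paths are invented. `apply_operations` is a hypothetical name, not part of the manage_types module.

```python
def as_list(value, n):
    """Repeat a scalar value n times; leave lists unchanged."""
    return list(value) if isinstance(value, (list, tuple)) else [value] * n


def apply_operations(index, params):
    """Apply add/del/mv/cp operations to a {(scope, type): path} index."""
    index = dict(index)  # work on a copy
    n = len(params["operation"])
    names = ("operation", "scope", "type", "scope_trg", "type_trg", "path")
    cols = {name: as_list(params.get(name), n) for name in names}
    for i in range(n):
        op = cols["operation"][i]
        src = (cols["scope"][i], cols["type"][i])
        trg = (cols["scope_trg"][i], cols["type_trg"][i])
        if op == "add":
            index[src] = cols["path"][i]
        elif op == "del":
            index.pop(src, None)
        elif op == "mv":
            index[trg] = index.pop(src)
        elif op == "cp":
            index[trg] = index[src]
        else:
            raise ValueError(f"unknown operation: {op}")
    return index


# Invented starting index, for illustration only
start = {
    ("project", "fasta.nucl"): "/data/assembly.fasta",
    ("sample", "fasta.nucl"): "/data/sample.fasta",
    ("sample", "fastq.F"): "/data/sample_F.fastq",
}
params = {
    "scope": ["project", "sample", "sample", "project"],
    "operation": ["mv", "del", "cp", "add"],
    "type": ["fasta.nucl", "fasta.nucl", "fastq.F", "bam"],
    "type_trg": ["transcripts.nucl", None, "fastq.main", None],
    "scope_trg": "sample",
    "path": [None, None, None, "/path/to/mapping.bam"],
}
result = apply_operations(start, params)
for key, path in result.items():
    print(key, path)
```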
merge_table
Used for concatenating tables from samples into one project table, or for concatenating tables from sample sub-samples, according to a mapping file. Any text file can be merged in this way.
| Parameter | Description |
|---|---|
| header | The number of header lines the files contain. |
| add_filename | Set to append the source filename to each line in the resulting file. |
| ext | The extension to use in the resulting file. If not specified, uses merged file exts. |
| scope | project or group. If group, you must also specify category. |
Example:
merge_blast_tables:
    module: merge_table
    base: merge1
    script_path:
    type: [blast.prot,fasta.nucl]
    header: 0
    ext: [out,fna]
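The role of header and add_filename can be shown with a small concatenation routine. This is a sketch under stated assumptions, not the module's code: header lines are kept once (from the first table), and the filename is appended as an extra tab-separated field. `merge_tables` and the file names are hypothetical.

```python
def merge_tables(tables, header=0, add_filename=False):
    """Concatenate tables given as {filename: list of lines}."""
    merged = []
    for i, (filename, lines) in enumerate(tables.items()):
        head, body = lines[:header], lines[header:]
        if i == 0:
            # Keep the header lines of the first table only
            merged.extend(head)
        for line in body:
            merged.append(f"{line}\t{filename}" if add_filename else line)
    return merged


tables = {
    "Sample1.out": ["query\thit", "q1\th1"],
    "Sample2.out": ["query\thit", "q2\th2"],
}
merged = merge_tables(tables, header=1, add_filename=True)
print("\n".join(merged))
```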
Reserved words
When writing new modules, the following words are reserved and should not be used as parameter names:
module
base
script_path
setenv
redirect
qsub_params
tag
conda
precode
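A module author could guard against clashes with a simple set intersection. The check below is a hypothetical helper that copies the word list above verbatim; it is not something NeatSeq-Flow is known to provide.

```python
# Reserved words, copied from the list above
RESERVED = {
    "module", "base", "script_path", "setenv", "redirect",
    "qsub_params", "tag", "conda", "precode",
}


def reserved_conflicts(param_names):
    """Return the parameter names that clash with reserved words, sorted."""
    return sorted(set(param_names) & RESERVED)


conflicts = reserved_conflicts(["threads", "tag", "base"])
print(conflicts)
```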