NeatSeq-Flow Cheat-Sheet¶
Author: Menachem Sklarz
Page Contents:
Input Files¶
Sample file¶
Passed to NeatSeq-Flow with the -s
argument.
Includes four sections:
Project file information¶
Two tab-separated columns:
- File type
- File path
#Type Path
Nucleotide /path/to/genome.fasta
Samples file information¶
Three tab-separated columns:
- Sample ID
- File type
- File path
Additional columns will be ignored:
#SampleID Type Path lane
Sample1 Forward /path/to/Sample1_F1.fastq.gz 1
Sample1 Forward /path/to/Sample1_F2.fastq.gz 2
Sample1 Reverse /path/to/Sample1_R1.fastq.gz 1
Sample1 Reverse /path/to/Sample1_R2.fastq.gz 2
Parameter file¶
Passed to NeatSeq-Flow with the -p
argument.
YAML-formatted file with the following three sections.
Tip
The Vars
section is recommended but not compulsory.
Global parameters section¶
Parameter | Description |
---|---|
Executor |
SGE, Local or SLURM. (Default: SGE) |
Qsub_q |
The cluster queue (or partition) to use. Default value for qsub –q parameter. Required |
Qsub_nodes |
Default nodes on which to execute jobs (Default: All nodes in queue) |
Qsub_opts |
Other parameters to pass to qsub |
Qsub_path |
The full path to qsub. Obtain by running which qsub (default: qsub is in path) |
Default_wait |
Default: 10. Leave as is |
module_path |
List of paths to repositories of additional modules. (Must be a python directory, containing __init__.py |
job_limit |
|
conda |
path and env , defining the path to the environment you want to use and its name (see here). |
setenv |
Setting in global parameters is equivalent to setting setenv in all steps (see section Additional parameters. |
Attention
The default executor is SGE. For SLURM, sbatch
is used instead of qsub
, e.g. Qsub_nodes
defines the nodes to be used by sbatch.
Attention
If NeatSeq-Flow is executed from within a conda environment with both NeatSeq-Flow and it’s modules installed, module_path
will automatically include the modules repo. If not, you will have to give the path to the location where the modules were installed.
Vars section¶
Replacements to be made in the parameter file. In YAML format. Referred to in other sections by the dot-notification.
Example:
Vars:
paths:
bwa: /path/to/bwa
samtools: /path/to/samtools
genome: /path/to/genomeDir
In parameter section:
This… | Becomes this… |
---|---|
{Vars.paths.bwa} |
/path/to/bwa |
{Vars.paths.samtools} |
/path/to/samtools |
{Vars.genome} |
/path/to/genomeDir |
Step-wise parameters¶
A series of YAML blocks, one per workflow step to perform. Each block takes the following form:
fqc_trimgal:
module: fastqc_html
base: trim_gal
script_path: {Vars.paths.fastqc}
Types of step parameters:
Required parameters¶
Parameter | Description |
---|---|
module |
The name of the module of which this step is an instance. |
base |
The name of the step(s) on which the current step is based (not required for the Import step, which is always first and single) |
script_path |
The full path to the script executed by this step. |
Cluster parameters¶
Passed in a qsub_params
block.
Parameter | Description |
---|---|
node |
A node or YAML list of nodes on which to run the step scripts (overrides global parameter Qsub_nodes) |
queue or -q |
Will limit the execution of the step’s scripts to this queue (overrides global parameter Qsub_q) |
-pe |
Will set the -pe parameter for all scripts for this module (see SGE qsub manual). |
-XXX: YYY |
Set the value of qsub parameter -XXX to YYY. This is a way to define other SGE parameters for all step scripts. |
Additional parameters¶
Parameter | Description |
---|---|
tag |
All instances downstream to the tagged instance will have the same tag. All steps with the same tag can be executed with one master script |
intermediate |
Will add a line to scripts/95.remove_intermediates.sh for deleting the results of this step |
setenv |
Set various environment variables for the duration of script execution. A string with format ENV="value for env1" ENV2="new value for env2" |
precode |
Additional code to be added before the actual script. Rarely used |
scope |
Use sample- or project-wise files. Check per-module documentation for whether and how this parameter is defined |
sample_list |
Limit this step to a subset of the samples. See section Sample list. |
conda |
Is used to define step specific conda parameters. The syntax is the same as for the global conda definition (see here). |
arg_separator |
Set the delimiter between program argument and value, e.g. ‘=’ (Default: ‘ ‘) |
local |
Use a local directory for intermediate files before copying results to final destination in data dir. |
Redirected parameters¶
Parameters to be redirected to the actual program executed by the step.
Redirected parameters are specified within a redirects:
block. The parameter name must include the -
or --
required by the program defined in script_path.
Sample list¶
The sample list enables limiting the instance scripts to a subset of the samples. It can be expressed in two ways:
A YAML list or a comma-separated list of sample names:
sample_list: [sample1, sample2]
By levels of a category (see section Mapping file):
sample_list: category: Category1 levels: [level1,level2]
For using all but a subset of samples, use exclude_sample_list
instead of sample_list
.
Mapping file¶
Passed to NeatSeq-Flow with --mapping
.
A tab-separated table with at least two columns:
- Sample ID
- First category name
- Additional categories…
Example:
#SampleID Category1 Category2
Sample1 A C
Sample2 A D
Sample3 B C
Sample4 B D
Flow control¶
Import
¶
Basic mode¶
NeatSeq-Flow will attempt to guess all the parameters it requires.
Example:
Merge_files:
module: Import
script_path:
Advanced mode¶
Define source and target slots and how to concatenate the files. Attempts to guess information left out by the user.
Parameter | Description |
---|---|
src |
source slot. |
trg |
target slot |
ext |
concatenated file extension. |
scope |
the scope of the file |
script_path |
the code to use for merging, or one of the following values: |
pipe |
a command through which to pipe the file before storing. |
Value | Description |
---|---|
..guess.. |
Guess (script_path, trg and ext) |
..import.. |
Do not copy the file, just import it into its slot (only if one file defined for src). |
..skip.. |
Do not import the file type. |
Example:
merge_data:
module: Import
src: [Forward, Reverse, Nucl]
trg: [fastq.F, fastq.R, fasta.nucl]
script_path: [..import.., cat, 'curl -L']
ext: [null, null, txt]
scope: [sample, sample, project]
manage_types
¶
Import raw data files into the data/ directory.
Value | Possible values | Description |
---|---|---|
operation | add | del | mv | cp | The operation to perform on the file type. |
scope | project|sample | The scope on which to perform the operation. (For ‘mv’ and ‘cp’ this is the source scope) |
type | The file type on which to perform the operation. (For ‘mv’ and ‘cp’ this is the source type) | |
scope_trg | project|sample | The destination scope for ‘mv’ and ‘cp’ operations |
type_trg | The destination type for ‘mv’ and ‘cp’ operations. | |
Path | For add operation, the value to insert in the file type. |
Example:
manage_types1:
module: manage_types
base: trinity1
script_path:
scope:[project, sample, sample, project]
operation: [mv,del,cp,add]
type: [fasta.nucl, fasta.nucl, fastq.F, bam]
type_trg: [transcripts.nucl, None ,fastq.main, None]
scope_trg: sample
path: [None, None, None, /path/to/mapping.bam]
merge_table
¶
Used for concatenating tables from samples into one project table, or for concatenating tables from sample sub-samples, according to a mapping file. Any text file can be merged in this way.
Parameter | Description |
---|---|
header | The number of header lines the files contain. |
add_filename | Set to append the source filename to each line in the resulting file. |
ext | The extension to use in the resulting file. If not specified, uses merged file exts. |
scope | project or group, if group, you must also specify category. |
Example:
merge_blast_tables:
module: merge_table
base: merge1
script_path:
type: [blast.prot,fasta.nucl]
header: 0
ext: [out,fna]
Reserved words¶
When writing new modules, the following words are conserved and should not be used for as parameters:
- module
- base
- script_path
- setenv
- redirect
- qsub_params
- tag
- conda
- precode