Preparation and QC
Modules included in this section
fastqc_html *
- Authors
Menachem Sklarz
- Affiliation
Bioinformatics core facility
- Organization
National Institute of Biotechnology in the Negev, Ben Gurion University.
A module for running fastqc.
Creates scripts that run fastqc on all available fastq files.
Requires
fastq files in one of the following slots:
sample_data[<sample>]["fastq.F"]
sample_data[<sample>]["fastq.R"]
sample_data[<sample>]["fastq.S"]
Output
puts fastqc output files in the following slots:
sample_data[<sample>]["fastqc_fastq.F_html"]
sample_data[<sample>]["fastqc_fastq.R_html"]
sample_data[<sample>]["fastqc_fastq.S_html"]
puts fastqc zip files in the following slots:
sample_data[<sample>]["fastqc_fastq.F_zip"]
sample_data[<sample>]["fastqc_fastq.R_zip"]
sample_data[<sample>]["fastqc_fastq.S_zip"]
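The output slot names follow a regular pattern: each fastq slot present for a sample gets a matching `fastqc_<slot>_html` and `fastqc_<slot>_zip` slot. The sketch below only illustrates that naming pattern; the helper `fastqc_output_slots` and the file paths are hypothetical and are not part of the module.

```python
# Hypothetical helper illustrating the slot naming pattern listed above.
FASTQ_SLOTS = ("fastq.F", "fastq.R", "fastq.S")


def fastqc_output_slots(sample_slots):
    """Map each fastq slot present in a sample to its (html, zip) slot names."""
    return {
        slot: (f"fastqc_{slot}_html", f"fastqc_{slot}_zip")
        for slot in FASTQ_SLOTS
        if slot in sample_slots
    }


# Example: a paired-end sample with forward and reverse reads (made-up paths).
sample = {"fastq.F": "/data/s1_R1.fq", "fastq.R": "/data/s1_R2.fq"}
for fastq_slot, (html_slot, zip_slot) in fastqc_output_slots(sample).items():
    print(fastq_slot, "->", html_slot, zip_slot)
```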
Lines for parameter file
fqc_merge1:
    module: fastqc_html
    base: merge1
    script_path: /path/to/FastQC/fastqc
    qsub_params:
        -pe: shared 15
    redirects:
        --threads: 15
References
Andrews, S., 2010. FastQC: a quality control tool for high throughput sequence data.
trimmo *
- Authors
Menachem Sklarz
- Affiliation
Bioinformatics core facility
- Organization
National Institute of Biotechnology in the Negev, Ben Gurion University.
A module for running trimmomatic on fastq files
Requires
fastq files in at least one of the following slots:
sample_data[<sample>]["fastq.F"]
sample_data[<sample>]["fastq.R"]
sample_data[<sample>]["fastq.S"]
Output
puts fastq output files in the following slots:
sample_data[<sample>]["fastq.F"|"fastq.R"|"fastq.S"]
Parameters that can be set
| Parameter | Values | Comments |
|---|---|---|
| spec_dir | path | If trimmomatic must be executed within a particular directory, specify that directory here |
| todo | LEADING:20 TRAILING:20 | The trimmomatic arguments |
Lines for parameter file
trim1:
    module: trimmo
    base: merge1
    script_path: java -jar trimmomatic-0.32.jar
    qsub_params:
        -pe: shared 20
        node: node1
    spec_dir: /path/to/Trimmomatic_dir/
    todo: LEADING:20 TRAILING:20
    redirects:
        -threads: 20
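To see how these parameters might combine, here is a minimal sketch that assembles a Trimmomatic command line from `script_path`, the `-threads` redirect and the `todo` string. It follows Trimmomatic's documented `PE`/`SE` argument order, but the function `trimmomatic_command`, the output file names and the input paths are assumptions for illustration; the command the module actually writes may differ.

```python
import shlex


def trimmomatic_command(script_path, todo, threads, forward, reverse=None,
                        out_prefix="trimmed"):
    """Build a Trimmomatic argument list (illustrative, not the module's code)."""
    mode = "PE" if reverse else "SE"
    cmd = shlex.split(script_path) + [mode, "-threads", str(threads)]
    if reverse:
        # PE mode: two inputs, then paired/unpaired outputs for each read.
        cmd += [
            forward, reverse,
            f"{out_prefix}.F.paired.fq", f"{out_prefix}.F.unpaired.fq",
            f"{out_prefix}.R.paired.fq", f"{out_prefix}.R.unpaired.fq",
        ]
    else:
        # SE mode: one input and one output.
        cmd += [forward, f"{out_prefix}.S.fq"]
    # The trimming steps from the 'todo' parameter come last.
    return cmd + shlex.split(todo)


print(" ".join(trimmomatic_command(
    "java -jar trimmomatic-0.32.jar", "LEADING:20 TRAILING:20", 20,
    "sample1_R1.fq", "sample1_R2.fq")))
```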
References
Bolger, A.M., Lohse, M. and Usadel, B., 2014. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics, 30(15), pp.2114-2120.
Multiqc *
- Authors
Menachem Sklarz
- Affiliation
Bioinformatics core facility
- Organization
National Institute of Biotechnology in the Negev, Ben Gurion University.
A module for preparing a MultiQC report for all samples.
Tip
By default, the module will search for parsable reports in the directories of all the modules in the branch leading to this instance. To search only in the directories of the explicit base steps, specify the bases_only parameter.
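The difference between the default search and `bases_only` can be sketched as a walk over the step graph. The code below is an illustration only: the function `report_search_steps` and the `bases_of` mapping are hypothetical, and the step names reuse the examples on this page with assumed base relationships.

```python
from collections import deque


def report_search_steps(step, bases_of, bases_only=False):
    """Return the steps whose directories would be searched for reports."""
    direct = list(bases_of.get(step, []))
    if bases_only:
        # Only the explicitly listed base steps.
        return direct
    # Default: every step on the branch leading to this instance.
    found = []
    queue = deque(direct)
    while queue:
        current = queue.popleft()
        if current in found:
            continue
        found.append(current)
        queue.extend(bases_of.get(current, []))
    return found


# Assumed workflow layout, for illustration only.
bases_of = {
    "firstMultQC": ["sam_bwt2_1", "fqc_trim1"],
    "sam_bwt2_1": ["trim1"],
    "fqc_trim1": ["trim1"],
    "trim1": ["merge1"],
}
print(report_search_steps("firstMultQC", bases_of))
print(report_search_steps("firstMultQC", bases_of, bases_only=True))
```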
Requires
No strict requirements. The report is informative only if at least one of the base steps produces reports that MultiQC can read, e.g. fastqc, bowtie2 or samtools.
Output
puts report dir in the following slot:
self.sample_data[<sample>]["Multiqc_report"]
Parameters that can be set
| Parameter | Values | Comments |
|---|---|---|
| bases_only | | Search directories of explicit base steps only. |
Lines for parameter file
firstMultQC:
    module: Multiqc
    base:
        - sam_bwt2_1
        - fqc_trim1
    bases_only:
    script_path: /path/to/multiqc
References
Ewels, P., Magnusson, M., Lundin, S. and Käller, M., 2016. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics, 32(19), pp.3047-3048.
Cutadapt
- Authors
Liron Levin
- Affiliation
Bioinformatics core facility
- Organization
National Institute of Biotechnology in the Negev, Ben Gurion University.
Short Description
A module for running cutadapt on fastq files
Requires
- fastq files in at least one of the following slots:
sample_data[<sample>]["fastq.F"]
sample_data[<sample>]["fastq.R"]
sample_data[<sample>]["fastq.S"]
Output
- puts fastq output files in the following slots:
sample_data[<sample>]["fastq.F"]
sample_data[<sample>]["fastq.R"]
sample_data[<sample>]["fastq.S"]
Parameters that can be set
| Parameter | Values | Comments |
|---|---|---|
Lines for parameter file
Step_Name:                      # Name of this step
    module: Cutadapt            # Name of the module used
    base:                       # Name of the step [or list of names] to run after [must be after a merge step]
    script_path:                # Command for running the Cutadapt script
    paired:                     # Analyse Forward and Reverse reads together.
    Demultiplexing:             # Use to Demultiplex the adaptors, needs to be in the format of name=adaptor_seq
    qsub_params:
        -pe:                    # Number of CPUs to reserve for this analysis
    redirects:
        --too-short-output:     # will replace @ with the location of the sample dir [e.g. @too_short.fq]
        -a:                     # Use to trim poly A in SE reads [e.g. "A{100} -A T{100}"]
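Two conventions in the comments above lend themselves to a short illustration: the `@` placeholder that is replaced by the sample directory, and the `name=adaptor_seq` format used for demultiplexing. The helpers `expand_sample_dir` and `parse_adaptor`, the directory path and the adaptor sequence below are hypothetical; this is a sketch of the conventions, not the module's own code.

```python
def expand_sample_dir(value, sample_dir):
    """Replace the '@' placeholder with the sample directory."""
    return value.replace("@", sample_dir)


def parse_adaptor(spec):
    """Split a 'name=adaptor_seq' string into (name, sequence)."""
    name, _, sequence = spec.partition("=")
    if not name or not sequence:
        raise ValueError(f"expected name=adaptor_seq, got {spec!r}")
    return name, sequence


# Made-up sample directory and adaptor, for illustration only.
print(expand_sample_dir("@too_short.fq", "/results/sample1/"))
print(parse_adaptor("barcode1=ACGTACGT"))
```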
References
Martin, M., 2011. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal, 17(1), pp.10-12.
Trim_Galore
- Authors
Liron Levin
- Affiliation
Bioinformatics core facility
- Organization
National Institute of Biotechnology in the Negev, Ben Gurion University.
Short Description
A module for running Trim Galore on fastq files
Requires
- fastq files in at least one of the following slots:
sample_data[<sample>]["fastq.F"]
sample_data[<sample>]["fastq.R"]
sample_data[<sample>]["fastq.S"]
Output
- puts fastq output files in the following slots:
sample_data[<sample>]["fastq.F"]
sample_data[<sample>]["fastq.R"]
sample_data[<sample>]["fastq.S"]
- puts unpaired fastq output files in the following slots:
sample_data[<sample>]["fastq.F.unpaired"]
sample_data[<sample>]["fastq.R.unpaired"]
Parameters that can be set
| Parameter | Values | Comments |
|---|---|---|
Comments
- This module was tested on:
Trim Galore v0.4.2
Cutadapt v1.12.1
Lines for parameter file
Step_Name:                      # Name of this step
    module: Trim_Galore         # Name of the module used
    base:                       # Name of the step [or list of names] to run after [must be after a merge step]
    script_path:                # Command for running the Trim Galore script
    qsub_params:
        -pe:                    # Number of CPUs to reserve for this analysis
    cutadapt_path:              # Location of cutadapt executable
    redirects:
        --length:               # Parameters for running Trim Galore
        -q:                     # Parameters for running Trim Galore
References
- Cutadapt:
Martin, M., 2011. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal, 17(1), pp.10-12.
- Trim Galore:
Krueger F: Trim Galore. [http://www.bioinformatics.babraham.ac.uk/projects/]
fastq_screen
- Authors
Menachem Sklarz
- Affiliation
Bioinformatics core facility
- Organization
National Institute of Biotechnology in the Negev, Ben Gurion University.
A module for executing fastq_screen on sequence files.
Input files are specified with the type parameter or taken from the fastq slots, one script per fastq file.
In regular mode, no output files are produced. However, if --tag is included, the tagged file will be stored in the equivalent fastq.X slot.
If a --filter tag is included, the filtered file will be stored in the equivalent fastq.X slot.
The parameters can be passed through a configuration file specified in the redirected parameters with the --conf parameter.
Alternatively, if you do not specify the configuration file, one will be produced for you. For this, you must include:
- A genomes section specifying the genome indices to screen against (see examples below), and
- an aligner section specifying the aligning program to use and its path.
Additionally, if a --threads parameter is included in the redirects, it will be incorporated into the configuration file.
Attention
If a --bisulfite redirected parameter is included, it should contain the path to Bismark, which will be included in the configuration file.
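As an illustration of how the `aligner` and `genomes` sections, the `--threads` redirect and the `--bisulfite` path could end up in a configuration file, the sketch below writes them using the keyword layout of FastQ Screen's example configuration file (`BOWTIE2`, `BISMARK`, `THREADS`, `DATABASE`). The function `fastq_screen_conf` and all paths are assumptions; the file actually produced by the module is not shown here and may be laid out differently.

```python
def fastq_screen_conf(aligner, genomes, threads=None, bismark=None):
    """Render a FastQ Screen style configuration as text (illustrative only)."""
    # Aligner lines, e.g. "BOWTIE2<TAB>/path/to/bowtie2".
    lines = [f"{name.upper()}\t{path}" for name, path in aligner.items()]
    if bismark:
        lines.append(f"BISMARK\t{bismark}")
    if threads is not None:
        lines.append(f"THREADS\t{threads}")
    # One DATABASE line per genome index to screen against.
    lines += [f"DATABASE\t{name}\t{index}" for name, index in genomes.items()]
    return "\n".join(lines) + "\n"


# Made-up paths, for illustration only.
print(fastq_screen_conf(
    aligner={"bowtie2": "/opt/bowtie2/bowtie2"},
    genomes={"Human": "/db/human/index", "PhiX": "/db/phix/index"},
    threads=60,
))
```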
Requires
fastq files in at least one of the following slots:
sample_data[<sample>]["fastq.F"]
sample_data[<sample>]["fastq.R"]
sample_data[<sample>]["fastq.S"]
Output
If --tag and/or --filter or --nohits are included, puts output fastq files in:
sample_data[<sample>]["fastq.F"]
sample_data[<sample>]["fastq.R"]
sample_data[<sample>]["fastq.S"]
Parameters that can be set
| Parameter | Values | Comments |
|---|---|---|
| genomes | | If |
| aligner | | If |
Lines for parameter file
No configuration file:
fastq_screen:
    module: fastq_screen
    base: merge1
    script_path: {Vars.paths.fastq_screen}
    qsub_params:
        -pe: shared 60
    aligner:
        bowtie2: {Vars.paths.bowtie2}
    genomes:
        Human: {Vars.databases.human}
        Mouse: {Vars.databases.mouse}
        PhiX: {Vars.databases.phix}
    redirects:
        --filter: 200
        --tag:
        # --nohits:
        --force:
        --threads: 60
With configuration file:
fastq_screen:
    module: fastq_screen
    base: merge1
    script_path: {Vars.paths.fastq_screen}
    qsub_params:
        -pe: shared 60
    redirects:
        --conf: {Vars.paths.fastq_screen_conf_file}
        --filter: 200
        --tag:
        # --nohits:
        --force:
References
Wingett, S.W. and Andrews, S., 2018. FastQ Screen: A tool for multi-genome mapping and quality control. F1000Research, 7.