Preparation and QC

Modules included in this section

fastqc_html ^*
trimmo ^*
Multiqc ^*
Cutadapt
Trim_Galore
fastq_screen

`fastqc_html` ^*

Authors: Menachem Sklarz
Affiliation: Bioinformatics core facility
Organization: National Institute of Biotechnology in the Negev, Ben Gurion University.

A module for running fastqc.

Creates scripts that run fastqc on all available fastq files.

Requires

fastq files in one of the following slots:
- sample_data[<sample>]["fastq.F"]
- sample_data[<sample>]["fastq.R"]
- sample_data[<sample>]["fastq.S"]

Output

puts fastqc output files in the following slots:
- sample_data[<sample>]["fastqc_fastq.F_html"]
- sample_data[<sample>]["fastqc_fastq.R_html"]
- sample_data[<sample>]["fastqc_fastq.S_html"]
puts fastqc zip files in the following slots:
- sample_data[<sample>]["fastqc_fastq.F_zip"]
- sample_data[<sample>]["fastqc_fastq.R_zip"]
- sample_data[<sample>]["fastqc_fastq.S_zip"]

Lines for parameter file

fqc_merge1:
    module: fastqc_html
    base: merge1
    script_path: /path/to/FastQC/fastqc
    qsub_params:
        -pe: shared 15
    redirects:
        --threads: 15

References

Andrews, S., 2010. FastQC: a quality control tool for high throughput sequence data.

`trimmo` ^*

Authors: Menachem Sklarz
Affiliation: Bioinformatics core facility
Organization: National Institute of Biotechnology in the Negev, Ben Gurion University.

A module for running trimmomatic on fastq files

Requires

fastq files in at least one of the following slots:

sample_data[<sample>]["fastq.F"]

sample_data[<sample>]["fastq.R"]

sample_data[<sample>]["fastq.S"]

Output

puts fastq output files in the following slots:

sample_data[<sample>]["fastq.F"|"fastq.R"|"fastq.S"]

Parameters that can be set

Parameter	Values	Comments
spec_dir	path	If trimmomatic must be executed within a particular directory, specify that directory here
todo	LEADING:20 TRAILING:20	The trimmomatic arguments

Lines for parameter file

trim1:
    module: trimmo
    base: merge1
    script_path: java -jar trimmomatic-0.32.jar
    qsub_params:
        -pe: shared 20
        node: node1
    spec_dir: /path/to/Trimmomatic_dir/
    todo: LEADING:20 TRAILING:20
    redirects:
        -threads: 20

References

Bolger, A.M., Lohse, M. and Usadel, B., 2014. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics, 30(15), pp.2114-2120.

`Multiqc` ^*

Authors: Menachem Sklarz
Affiliation: Bioinformatics core facility
Organization: National Institute of Biotechnology in the Negev, Ben Gurion University.

A module for preparing a MultiQC report for all samples.

Tip

By default, the module will search for parsable reports in the directories of all the modules in the branch leading to this instance. To search only in the directories of the explicit base steps, specify the bases_only parameter.

Requires

No real requirements. Will give a report with information if one of the base steps produces reports that MultiQC can read, e.g. fastqc, bowtie2, samtools etc.

Output

puts report dir in the following slot:
- self.sample_data[<sample>]["Multiqc_report"]

Parameters that can be set

Parameter	Values	Comments
bases_only		Search directories of explicit base steps only.

Lines for parameter file

firstMultQC:
    module: Multiqc
    base:
        - sam_bwt2_1
        - fqc_trim1
    bases_only:
    script_path: /path/to/multiqc

References

Ewels, P., Magnusson, M., Lundin, S. and Käller, M., 2016. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics, 32(19), pp.3047-3048.

`Cutadapt`

Authors: Levin Liron
Affiliation: Bioinformatics core facility
Organization: National Institute of Biotechnology in the Negev, Ben Gurion University.

Short Description

A module for running cutadapt on fastqc files

Requires

fastq files in at least one of the following slots:
sample_data[<sample>]["fastq.F"] sample_data[<sample>]["fastq.R"] sample_data[<sample>]["fastq.S"]

Output

puts fastq output files in the following slots:
sample_data[<sample>]["fastq.F"] sample_data[<sample>]["fastq.R"] sample_data[<sample>]["fastq.S"]

Parameters that can be set

Parameter	Values	Comments

Comments

This module was tested on:
Cutadapt v1.12.1

Lines for parameter file

Step_Name:                       # Name of this step
    module: Cutadapt             # Name of the module used
    base:                        # Name of the step [or list of names] to run after [must be after a merge step]
    script_path:                 # Command for running the Cutadapt script
    paired:                      # Analyse Forward and Reverse reads together.
    Demultiplexing:              # Use to Demultiplex the adaptors, needs to be in the format of name=adaptor_seq
    qsub_params:
        -pe:                     # Number of CPUs to reserve for this analysis
    redirects:
        --too-short-output:      # will replace @ with the location of the sample dir  [e.g. @too_short.fq] 
        -a:                      # Use to trim poly A in SE reads [e.g. "A{100} -A T{100}"]

References

Martin, Marcel. “Cutadapt removes adapter sequences from high-throughput sequencing reads.” EMBnet. journal 17.1 (2011): pp-10

`Trim_Galore`

Authors: Liron Levin
Affiliation: Bioinformatics core facility
Organization: National Institute of Biotechnology in the Negev, Ben Gurion University.

Short Description

A module for running Trim Galore on fastq files

Requires

fastq files in at least one of the following slots:
sample_data[<sample>]["fastq.F"] sample_data[<sample>]["fastq.R"] sample_data[<sample>]["fastq.S"]

Output

puts fastq output files in the following slots:
sample_data[<sample>]["fastq.F"] sample_data[<sample>]["fastq.R"] sample_data[<sample>]["fastq.S"]

puts unpaired fastq output files in the following slots:
sample_data[<sample>]["fastq.F.unpaired"] sample_data[<sample>]["fastq.R.unpaired"]

Parameters that can be set

Parameter	Values	Comments

Comments

This module was tested on:
Trim Galore v0.4.2 Cutadapt v1.12.1

Lines for parameter file

Step_Name:                       # Name of this step
    module: Trim_Galore          # Name of the module used
    base:                        # Name of the step [or list of names] to run after [must be after a merge step]
    script_path:                 # Command for running the Trim Galore script
    qsub_params:
        -pe:                     # Number of CPUs to reserve for this analysis
    cutadapt_path:               # Location of cutadapt executable 
    redirects:
        --length:                # Parameters for running Trim Galore
        -q:                      # Parameters for running Trim Galore

References

Cutadapt:
Martin, Marcel. “Cutadapt removes adapter sequences from high-throughput sequencing reads.” EMBnet journal 17.1 (2011):pp-10

Trim Galore:
Krueger F: Trim Galore. [http://www.bioinformatics.babraham.ac.uk/projects/]

`fastq_screen`

Authors: Menachem Sklarz
Affiliation: Bioinformatics core facility
Organization: National Institute of Biotechnology in the Negev, Ben Gurion University.

A module for executing fastq_screen on sequence files.

Input files are specified with the type parameter or taken from the fastq slots, one script per fastq file.

In regular mode, no output file are produced. However, if the --tag is included, the tagged file will be stored in the equivalent fastq.X slot. If a --filter tag is included, the filtered file will be stored in the equivalent fastq.X slot.

The parameters can be passed through a configuration file specified in the redirected parameters with the --conf parameter.

Alternatively, if you do not specify the configuration file, one will be produced for you. For this, you must include:

A genomes section specifying genome indices to screen against (see examples below) and
an aligner section specifying the aligning program to use and it’s path.

Additionally, if a --threads parameter is included in the redirects, it will be incorporated into the configuration file.

Attention

If a --bisulfite redirected parameter is included, it should contain the path to Bismark, which will be included in the configuration file.

Requires

fastq files in at least one of the following slots:
- sample_data[<sample>]["fastq.F"]
- sample_data[<sample>]["fastq.R"]
- sample_data[<sample>]["fastq.S"]

Output

If --tag and/or --filter or --nohits are included, puts output fastq files in:
- sample_data[<sample>]["fastq.F"]
- sample_data[<sample>]["fastq.R"]
- sample_data[<sample>]["fastq.S"]

Parameters that can be set

Parameter	Values	Comments
genomes	`name: index` pairs (see examples)	If `--conf` not provided, genomes to screen against.
aligner	`name: index` single pair	If `--conf` not provided, path to aligner to use.

Lines for parameter file

No configuration file:

fastq_screen:
    module:         fastq_screen
    base:           merge1
    script_path:    {Vars.paths.fastq_screen}
    qsub_params:
        -pe:        shared 60
    aligner:
        bowtie2:    {Vars.paths.bowtie2}
    genomes:
        Human:      {Vars.databases.human}
        Mouse:      {Vars.databases.moiuse}
        PhiX:       {Vars.databases.phix}
    redirects:
        --filter:   200
        --tag:
        # --nohits:
        --force: 
        --threads:  60 

With configuration file:

fastq_screen:
    module:         fastq_screen
    base:           merge1
    script_path:    {Vars.paths.fastq_screen}
    qsub_params:
        -pe:        shared 60
    redirects:
        --conf:     {Vars.paths.fastq_screen_conf_file}
        --filter:   200
        --tag:
        # --nohits:
        --force: 

References

Wingett, S.W. and Andrews, S., 2018. FastQ Screen: A tool for multi-genome mapping and quality control. F1000Research, 7.

Preparation and QC

fastqc_html *

Requires

Output

Lines for parameter file

References

trimmo *

Requires

Output

Parameters that can be set

Lines for parameter file

References

Multiqc *

Requires

Output

Parameters that can be set

Lines for parameter file

References

Cutadapt

Short Description

Requires

Output

Parameters that can be set

Comments

Lines for parameter file

References

Trim_Galore

Short Description

Requires

Output

Parameters that can be set

Comments

Lines for parameter file

References

fastq_screen

Requires

Output

Parameters that can be set

Lines for parameter file

References

`fastqc_html` ^*

`trimmo` ^*

`Multiqc` ^*

`Cutadapt`

`Trim_Galore`

`fastq_screen`