Preparation and QC

fastqc_html *

Authors

Menachem Sklarz

Affiliation

Bioinformatics core facility

Organization

National Institute of Biotechnology in the Negev, Ben Gurion University.

A module for running FastQC.

Creates scripts that run FastQC on every available fastq file.

Requires

  • fastq files in at least one of the following slots:

    • sample_data[<sample>]["fastq.F"]

    • sample_data[<sample>]["fastq.R"]

    • sample_data[<sample>]["fastq.S"]

Output

  • puts fastqc output files in the following slots:

    • sample_data[<sample>]["fastqc_fastq.F_html"]

    • sample_data[<sample>]["fastqc_fastq.R_html"]

    • sample_data[<sample>]["fastqc_fastq.S_html"]

  • puts fastqc zip files in the following slots:

    • sample_data[<sample>]["fastqc_fastq.F_zip"]

    • sample_data[<sample>]["fastqc_fastq.R_zip"]

    • sample_data[<sample>]["fastqc_fastq.S_zip"]
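
The slot naming above can be illustrated with a minimal sketch. It assumes that FastQC names its reports `<name>_fastqc.html` and `<name>_fastqc.zip` after stripping the compression and fastq suffixes, and that reports are written to one directory per sample; `add_fastqc_slots` and `out_dir` are hypothetical names used only for illustration, not the module's actual code.

```python
import os

READ_SLOTS = ("fastq.F", "fastq.R", "fastq.S")


def fastqc_report_base(fastq_path):
    """Return the report base name FastQC is assumed to derive from a fastq file."""
    name = os.path.basename(fastq_path)
    # Strip a compression suffix first, then the fastq suffix.
    for suffixes in ((".gz", ".bz2"), (".fastq", ".fq")):
        for suffix in suffixes:
            if name.endswith(suffix):
                name = name[: -len(suffix)]
                break
    return name + "_fastqc"


def add_fastqc_slots(sample_data, out_dir):
    """Fill the fastqc_<slot>_html and fastqc_<slot>_zip slots for each sample."""
    for sample, slots in sample_data.items():
        for slot in READ_SLOTS:
            if slot not in slots:
                continue  # this sample has no reads of that type
            base = os.path.join(out_dir, sample, fastqc_report_base(slots[slot]))
            slots["fastqc_%s_html" % slot] = base + ".html"
            slots["fastqc_%s_zip" % slot] = base + ".zip"
    return sample_data
```

Only the slots that hold a fastq file receive matching report slots, so a single-end sample gets the fastq.S entries alone.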

Lines for parameter file

fqc_merge1:
    module: fastqc_html
    base: merge1
    script_path: /path/to/FastQC/fastqc
    qsub_params:
        -pe: shared 15
    redirects:
        --threads: 15

References

Andrews, S., 2010. FastQC: a quality control tool for high throughput sequence data.

trimmo *

Authors

Menachem Sklarz

Affiliation

Bioinformatics core facility

Organization

National Institute of Biotechnology in the Negev, Ben Gurion University.

A module for running Trimmomatic on fastq files.

Requires

  • fastq files in at least one of the following slots:

    • sample_data[<sample>]["fastq.F"]

    • sample_data[<sample>]["fastq.R"]

    • sample_data[<sample>]["fastq.S"]

Output

  • puts fastq output files in the following slots:

    • sample_data[<sample>]["fastq.F"|"fastq.R"|"fastq.S"]

Parameters that can be set

Parameter   Values                    Comments
spec_dir    path                      If trimmomatic must be executed within a particular directory, specify that directory here.
todo        LEADING:20 TRAILING:20    The trimmomatic arguments.

Lines for parameter file

trim1:
    module: trimmo
    base: merge1
    script_path: java -jar trimmomatic-0.32.jar
    qsub_params:
        -pe: shared 20
        node: node1
    spec_dir: /path/to/Trimmomatic_dir/
    todo: LEADING:20 TRAILING:20
    redirects:
        -threads: 20
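
As a rough illustration of how these parameters might combine into a Trimmomatic call, the sketch below assembles an argument list following Trimmomatic's documented `PE` and `SE` usage. The function name, the output file names and the `out_dir` argument are assumptions made for this example; the module's real script layout may differ.

```python
import shlex


def trimmomatic_command(script_path, threads, todo, fastq, out_dir="."):
    """Assemble a Trimmomatic argument list from parameter-file values.

    `fastq` maps slot names ("fastq.F", "fastq.R", "fastq.S") to file paths.
    Output file names are hypothetical placeholders.
    """
    cmd = shlex.split(script_path)  # e.g. "java -jar trimmomatic-0.32.jar"
    if "fastq.F" in fastq and "fastq.R" in fastq:
        # Paired-end mode: two inputs, then paired/unpaired outputs per mate.
        cmd += ["PE", "-threads", str(threads), fastq["fastq.F"], fastq["fastq.R"]]
        for mate in ("F", "R"):
            cmd += [f"{out_dir}/{mate}.paired.fq", f"{out_dir}/{mate}.unpaired.fq"]
    elif "fastq.S" in fastq:
        # Single-end mode: one input, one output.
        cmd += ["SE", "-threads", str(threads), fastq["fastq.S"], f"{out_dir}/S.trimmed.fq"]
    else:
        raise ValueError("no usable fastq slots for this sample")
    # The trimming steps from 'todo' always come last.
    cmd += todo.split()
    return cmd
```

If spec_dir is set, such a command would be launched from within that directory, for example by passing `cwd=spec_dir` to `subprocess.run`.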

References

Bolger, A.M., Lohse, M. and Usadel, B., 2014. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics, 30(15), pp.2114-2120.

Multiqc *

Authors

Menachem Sklarz

Affiliation

Bioinformatics core facility

Organization

National Institute of Biotechnology in the Negev, Ben Gurion University.

A module for preparing a MultiQC report for all samples.

Tip

By default, the module will search for parsable reports in the directories of all the modules in the branch leading to this instance. To search only in the directories of the explicit base steps, specify the bases_only parameter.

Requires

  • No strict requirements. The report contains information only if at least one of the base steps produces output that MultiQC can parse, e.g. fastqc, bowtie2 or samtools.

Output

  • puts report dir in the following slot:

    • self.sample_data[<sample>]["Multiqc_report"]

Parameters that can be set

Parameter    Values    Comments
bases_only   (none)    Search directories of explicit base steps only.

Lines for parameter file

firstMultQC:
    module: Multiqc
    base:
        - sam_bwt2_1
        - fqc_trim1
    bases_only:
    script_path: /path/to/multiqc
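
The difference between the default search and bases_only can be sketched as follows. The `bases` mapping (step name to its list of base steps) and the `search_steps` function are hypothetical, chosen only to illustrate the behaviour described in the tip: by default every step on the branch leading to the instance is searched, whereas bases_only restricts the search to the steps listed under base.

```python
def search_steps(step, bases, bases_only=False):
    """Return the steps whose directories would be searched for reports.

    `bases` maps each step name to the list of steps it is based on.
    """
    direct = list(bases.get(step, []))
    if bases_only:
        return direct  # explicit base steps only
    found = []
    queue = direct
    # Walk back through the branch, visiting each ancestor once.
    while queue:
        current = queue.pop(0)
        if current not in found:
            found.append(current)
            queue.extend(bases.get(current, []))
    return found


workflow = {
    "firstMultQC": ["sam_bwt2_1", "fqc_trim1"],
    "sam_bwt2_1": ["trim1"],
    "fqc_trim1": ["trim1"],
    "trim1": ["merge1"],
    "merge1": [],
}
print(search_steps("firstMultQC", workflow))
print(search_steps("firstMultQC", workflow, bases_only=True))
```

In this made-up workflow the default search also covers trim1 and merge1, while bases_only limits it to sam_bwt2_1 and fqc_trim1.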

References

Ewels, P., Magnusson, M., Lundin, S. and Käller, M., 2016. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics, 32(19), pp.3047-3048.

Cutadapt

Authors

Liron Levin

Affiliation

Bioinformatics core facility

Organization

National Institute of Biotechnology in the Negev, Ben Gurion University.

Short Description

A module for running cutadapt on fastq files

Requires

  • fastq files in at least one of the following slots:

    • sample_data[<sample>]["fastq.F"]

    • sample_data[<sample>]["fastq.R"]

    • sample_data[<sample>]["fastq.S"]

Output

  • puts fastq output files in the following slots:

    • sample_data[<sample>]["fastq.F"]

    • sample_data[<sample>]["fastq.R"]

    • sample_data[<sample>]["fastq.S"]

Comments

  • This module was tested on:

    Cutadapt v1.12.1

Lines for parameter file

Step_Name:                       # Name of this step
    module: Cutadapt             # Name of the module used
    base:                        # Name of the step [or list of names] to run after [must be after a merge step]
    script_path:                 # Command for running the Cutadapt script
    paired:                      # Analyse Forward and Reverse reads together.
    Demultiplexing:              # Use to demultiplex by adaptor; each adaptor must be given in the format name=adaptor_seq
    qsub_params:
        -pe:                     # Number of CPUs to reserve for this analysis
    redirects:
        --too-short-output:      # @ is replaced with the location of the sample dir [e.g. @too_short.fq]
        -a:                      # Use to trim poly A in SE reads [e.g. "A{100} -A T{100}"]
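
The @ placeholder in redirected values can be illustrated with a small sketch. The `expand_sample_dir` function is hypothetical and simply substitutes the sample directory for every @ found in a string value; how the module itself performs the substitution is not shown in this documentation.

```python
def expand_sample_dir(redirects, sample_dir):
    """Replace '@' in string redirect values with the sample directory."""
    prefix = sample_dir.rstrip("/") + "/"
    expanded = {}
    for flag, value in redirects.items():
        if isinstance(value, str):
            value = value.replace("@", prefix)
        expanded[flag] = value  # non-string values pass through unchanged
    return expanded


redirects = {"--too-short-output": "@too_short.fq", "-m": 20}
print(expand_sample_dir(redirects, "/data/Sample1"))
```

With the hypothetical sample directory /data/Sample1, the value @too_short.fq becomes /data/Sample1/too_short.fq, so each sample writes its too-short reads to its own directory.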

References

Martin, M., 2011. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal, 17(1), pp.10-12.

Trim_Galore

Authors

Liron Levin

Affiliation

Bioinformatics core facility

Organization

National Institute of Biotechnology in the Negev, Ben Gurion University.

Short Description

A module for running Trim Galore on fastq files

Requires

  • fastq files in at least one of the following slots:

    • sample_data[<sample>]["fastq.F"]

    • sample_data[<sample>]["fastq.R"]

    • sample_data[<sample>]["fastq.S"]

Output

  • puts fastq output files in the following slots:

    • sample_data[<sample>]["fastq.F"]

    • sample_data[<sample>]["fastq.R"]

    • sample_data[<sample>]["fastq.S"]

  • puts unpaired fastq output files in the following slots:

    • sample_data[<sample>]["fastq.F.unpaired"]

    • sample_data[<sample>]["fastq.R.unpaired"]

Comments

  • This module was tested on:

    Trim Galore v0.4.2

    Cutadapt v1.12.1

Lines for parameter file

Step_Name:                       # Name of this step
    module: Trim_Galore          # Name of the module used
    base:                        # Name of the step [or list of names] to run after [must be after a merge step]
    script_path:                 # Command for running the Trim Galore script
    qsub_params:
        -pe:                     # Number of CPUs to reserve for this analysis
    cutadapt_path:               # Location of cutadapt executable 
    redirects:
        --length:                # Trim Galore option: discard reads shorter than this length after trimming
        -q:                      # Trim Galore option: Phred quality cutoff for trimming low-quality read ends
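
For orientation, the sketch below shows one way these parameters could map onto a Trim Galore call. It assumes that cutadapt_path is handed to Trim Galore's `--path_to_cutadapt` option and that paired samples are run with `--paired --retain_unpaired` (which would account for the unpaired output slots); both are assumptions for illustration, and `trim_galore_command` is a hypothetical helper, not the module's code.

```python
import shlex


def trim_galore_command(script_path, cutadapt_path, redirects, fastq):
    """Assemble a Trim Galore argument list from parameter-file values.

    `fastq` maps slot names ("fastq.F", "fastq.R", "fastq.S") to file paths.
    """
    cmd = shlex.split(script_path) + ["--path_to_cutadapt", cutadapt_path]
    # Redirected parameters are passed through as given.
    for flag, value in redirects.items():
        cmd.append(flag)
        if value is not None:
            cmd.append(str(value))
    if "fastq.F" in fastq and "fastq.R" in fastq:
        # Assumed paired-end handling; keeps reads whose mate was discarded.
        cmd += ["--paired", "--retain_unpaired", fastq["fastq.F"], fastq["fastq.R"]]
    elif "fastq.S" in fastq:
        cmd.append(fastq["fastq.S"])
    else:
        raise ValueError("no usable fastq slots for this sample")
    return cmd
```

A flag with no value in the parameter file (None here) is emitted on its own, which suits switch-style options.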

References

fastq_screen

Authors

Menachem Sklarz

Affiliation

Bioinformatics core facility

Organization

National Institute of Biotechnology in the Negev, Ben Gurion University.

A module for executing fastq_screen on sequence files.

Input files are specified with the type parameter or taken from the fastq slots, one script per fastq file.

In regular mode, no output files are produced. However, if --tag is included, the tagged file will be stored in the equivalent fastq.X slot. If --filter is included, the filtered file will be stored in the equivalent fastq.X slot.

The parameters can be passed through a configuration file specified in the redirected parameters with the --conf parameter.

Alternatively, if you do not specify the configuration file, one will be produced for you. For this, you must include:

  1. A genomes section specifying genome indices to screen against (see examples below) and

  2. an aligner section specifying the aligning program to use and its path.

Additionally, if a --threads parameter is included in the redirects, it will be incorporated into the configuration file.

Attention

If a --bisulfite redirected parameter is included, it should contain the path to Bismark, which will be included in the configuration file.
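
A minimal sketch of how such a configuration file might be produced from the genomes and aligner sections is given below. It relies on the keywords used in FastQ Screen configuration files (`BOWTIE2`, `BISMARK`, `THREADS`, `DATABASE`); the function name, argument names and the order of the lines are assumptions for illustration, and the file the module actually writes may differ.

```python
def build_fastq_screen_conf(aligner, genomes, threads=None, bismark=None):
    """Return the text of a FastQ Screen configuration file.

    `aligner` is a single {name: path} pair, `genomes` maps genome names
    to index paths.
    """
    lines = []
    for name, path in aligner.items():
        # The aligner keyword is the upper-cased program name, e.g. BOWTIE2.
        lines.append(f"{name.upper()}\t{path}")
    if bismark:
        lines.append(f"BISMARK\t{bismark}")  # taken from --bisulfite
    if threads is not None:
        lines.append(f"THREADS\t{threads}")  # taken from --threads
    for name, index in genomes.items():
        lines.append(f"DATABASE\t{name}\t{index}")
    return "\n".join(lines) + "\n"


conf = build_fastq_screen_conf(
    aligner={"bowtie2": "/path/to/bowtie2"},
    genomes={"Human": "/path/to/human_index", "PhiX": "/path/to/phix_index"},
    threads=60,
)
print(conf)
```

The paths shown are placeholders; in a parameter file they would normally come from variables such as {Vars.paths.bowtie2}.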

Requires

  • fastq files in at least one of the following slots:

    • sample_data[<sample>]["fastq.F"]

    • sample_data[<sample>]["fastq.R"]

    • sample_data[<sample>]["fastq.S"]

Output

  • If --tag and/or --filter or --nohits are included, puts output fastq files in:

    • sample_data[<sample>]["fastq.F"]

    • sample_data[<sample>]["fastq.R"]

    • sample_data[<sample>]["fastq.S"]

Parameters that can be set

Parameter   Values                                   Comments
genomes     name: index pairs (see examples)         If --conf not provided, genomes to screen against.
aligner     a single name: path pair (see examples)  If --conf not provided, path to aligner to use.

Lines for parameter file

No configuration file:

fastq_screen:
    module:         fastq_screen
    base:           merge1
    script_path:    {Vars.paths.fastq_screen}
    qsub_params:
        -pe:        shared 60
    aligner:
        bowtie2:    {Vars.paths.bowtie2}
    genomes:
        Human:      {Vars.databases.human}
        Mouse:      {Vars.databases.mouse}
        PhiX:       {Vars.databases.phix}
    redirects:
        --filter:   200
        --tag:
        # --nohits:
        --force: 
        --threads:  60 

With configuration file:

fastq_screen:
    module:         fastq_screen
    base:           merge1
    script_path:    {Vars.paths.fastq_screen}
    qsub_params:
        -pe:        shared 60
    redirects:
        --conf:     {Vars.paths.fastq_screen_conf_file}
        --filter:   200
        --tag:
        # --nohits:
        --force: 

References

Wingett, S.W. and Andrews, S., 2018. FastQ Screen: A tool for multi-genome mapping and quality control. F1000Research, 7.