Mapping

bowtie2_builder *

Authors

Menachem Sklarz

Affiliation

Bioinformatics core facility

Organization

National Institute of Biotechnology in the Negev, Ben Gurion University.

A module for running bowtie2 index builder:

Builds a bowtie2 index for a fasta file stored at the project or sample level.

Determine which one will be used by specifying scope as either project or sample.

Requires

  • fasta files in one of the following slots:

    • sample_data[<sample>]["fasta.nucl"]

    • sample_data["fasta.nucl"]

Output

  • Puts output index files in one of the following slots:
    • self.sample_data[<sample>]["bowtie2.index"]

    • self.sample_data["project_data"]["bowtie2.index"]

  • Puts the fasta file in the following slot:
    • self.sample_data[<sample>]["reference"]

Parameters that can be set

Parameter    Values              Comments
scope        project | sample    Indicates whether to use a project fasta or a sample fasta.

Lines for parameter file

bwt2_build:
    module: bowtie2_builder
    base: trinity1
    script_path: /path/to/bowtie2-build
    scope: project

References

Langmead, B. and Salzberg, S.L., 2012. Fast gapped-read alignment with Bowtie 2. Nature methods, 9(4), pp.357-359.

bowtie2_mapper *

Authors

Menachem Sklarz

Affiliation

Bioinformatics core facility

Organization

National Institute of Biotechnology in the Negev, Ben Gurion University.

A module for running bowtie2 mapper:

The reads stored in each sample are aligned to one of the following bowtie2 indices:

  • An external index passed with the -x parameter.

  • A bowtie2 index on a project fasta file, such as an assembly of all samples. Specify with scope: project in the bowtie2_mapper instance.

  • A sample bowtie2 index on a sample-specific fasta file, such as from a sample-wise assembly or from the sample file. Specify with scope: sample in the bowtie2_mapper instance.

The latter two options must come after a bowtie2_builder instance.

Tip

See the documentation for the bowtie2_builder module.

Note

fastq files are never defined project-wide

The scope parameter controls the origin of the index files, i.e. whether the fasta file to map to is an assembly of the sample reads (scope: sample) or an assembly of all reads in the project (scope: project). The reads to be mapped are always sample reads, as a ‘fastq’ slot is not defined at the project level.

Requires

  • fastq files in one of the following slots:

    • sample_data[<sample>]["fastq.F"]

    • sample_data[<sample>]["fastq.R"]

    • sample_data[<sample>]["fastq.S"]

Output

  • Puts output sam files in the following slots:
    • self.sample_data[<sample>]["sam"]

  • Puts the name of the mapper in:
    • self.sample_data[<sample>]["mapper"]

  • Puts the reference genome fasta (if one is given in the parameter file) in:
    • self.sample_data[<sample>]["reference"]

Parameters that can be set

Parameter     Values                  Comments
-x            path to bowtie2 index   If not given, will look for a project bowtie2 index and then for a sample bowtie2 index.
ref_genome    path to genome fasta    If -x is NOT given, will use the equivalent internal fasta. If -x is passed and ref_genome is NOT passed, will leave the reference slot empty.
get_map_log                           Store the log produced by bowtie2 (the alignment summary bowtie2 reports).
scope         project | sample        Indicates whether to use a project or sample bowtie2 index.

Lines for parameter file

For external index:

bwt2_1:
    module: bowtie2_mapper
    base: trim1
    script_path: /path/to/bowtie2
    qsub_params:
        -pe: shared 20
    get_map_log:
    ref_genome: /path/to/ref_genome.fna
    redirects:
        -p: 20
        -q: null
        -x: /path/to/bowtie2.index/ref_genome

Using a bowtie2 index constructed from a project fasta:

bwt2_1:
    module: bowtie2_mapper
    base: bwt2_bld1
    script_path: /path/to/bowtie2
    qsub_params:
        -pe: shared 20
    get_map_log:
    scope: project
    redirects:
        -p: 20
        -q: null
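The scope: sample option needs a bowtie2_builder instance with scope: sample earlier in the same branch. The following is a minimal sketch of such a pair. The step names, and the upstream step spades1 (assumed here to be a sample-wise assembly that stores its contigs in sample_data[<sample>]["fasta.nucl"]), are hypothetical.

```yaml
# Build one bowtie2 index per sample from that sample's fasta
# (spades1 is a hypothetical sample-wise assembly step)
bwt2_bld_smp:
    module: bowtie2_builder
    base: spades1
    script_path: /path/to/bowtie2-build
    scope: sample
# Map each sample's reads to the index built from its own fasta
bwt2_smp:
    module: bowtie2_mapper
    base: bwt2_bld_smp
    script_path: /path/to/bowtie2
    qsub_params:
        -pe: shared 20
    get_map_log:
    scope: sample
    redirects:
        -p: 20
        -q: null
```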

References

Langmead, B. and Salzberg, S.L., 2012. Fast gapped-read alignment with Bowtie 2. Nature methods, 9(4), pp.357-359.

bowtie1_builder *

Authors

Menachem Sklarz

Affiliation

Bioinformatics core facility

Organization

National Institute of Biotechnology in the Negev, Ben Gurion University.

A module for running bowtie1 index builder:

Requires

  • fasta files in one of the following slots:

    • sample_data["fasta.nucl"]

    • sample_data[<sample>]["fasta.nucl"]

Output

Puts output index files in one of the following slots:
  • self.sample_data[<sample>]["bowtie1.index"]

  • self.sample_data["project_data"]["bowtie1.index"]

Parameters that can be set

Parameter    Values              Comments
scope        project | sample    Indicates whether to use a project fasta or a sample fasta.

Lines for parameter file

bwt1_bld_ind:
    module: bowtie1_builder
    base: trinity1
    script_path: /path/to/bowtie-build
    scope: project

References

Langmead, B., Trapnell, C., Pop, M. and Salzberg, S.L., 2009. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome biology, 10(3), p.R25.

bowtie1_mapper *

Authors

Menachem Sklarz

Affiliation

Bioinformatics core facility

Organization

National Institute of Biotechnology in the Negev, Ben Gurion University.

A module for running bowtie1 mapper:

The reads stored in each sample are aligned to one of the following bowtie indices:

  • An external index passed with the ebwt parameter.

  • A bowtie1 index on a project fasta file, such as an assembly of all samples. Specify with scope: project in the bowtie1_mapper instance.

  • A sample bowtie1 index on a sample-specific fasta file, such as from a sample-wise assembly or from the sample file. Specify with scope: sample in the bowtie1_mapper instance.

The latter two options must come after a bowtie1_builder instance.

Requires

  • fastq files in one of the following slots:

    • sample_data[<sample>]["fastq.F"]

    • sample_data[<sample>]["fastq.R"]

    • sample_data[<sample>]["fastq.S"]

Output

  • Puts output sam files in the following slots:

    self.sample_data[<sample>]["sam"]

  • Puts the name of the mapper in:

    self.sample_data[<sample>]["mapper"]

  • Puts fasta of reference genome (if one is given in param file) in:

    self.sample_data[<sample>]["reference"]

Parameters that can be set

Parameter    Values                  Comments
ebwt         path to bowtie1 index   If not given, will look for a project bowtie1 index and then for a sample bowtie1 index.
ref_genome   path to genome fasta    If ebwt is NOT given, will use the equivalent internal fasta. If ebwt IS given and ref_genome is NOT passed, will leave the reference slot empty.
scope        project | sample        Indicates whether to use a project or sample bowtie1 index.

Lines for parameter file

For external index:

bwt1:
    module: bowtie1_mapper
    base: trim1
    script_path: /path/to/bowtie
    qsub_params:
        -pe: shared 20
    ebwt: /path/to/bowtie1.index/ref_genome
    ref_genome: /path/to/ref_genome.fna
    redirects:
        -p: 20

For project bowtie index:

bwt1_1:
    module: bowtie1_mapper
    base: bwt1_bld_ind
    script_path: /path/to/bowtie
    scope: project
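A corresponding sample-scope sketch for bowtie1, in which each sample's reads are mapped to an index built from that sample's own fasta. The step names and the upstream spades1 step are hypothetical; bowtie-build is the standard bowtie1 index builder, and -p sets the number of bowtie threads.

```yaml
# One bowtie1 index per sample (spades1 is a hypothetical sample-wise assembly)
bwt1_bld_smp:
    module: bowtie1_builder
    base: spades1
    script_path: /path/to/bowtie-build
    scope: sample
# Map each sample to its own index
bwt1_smp:
    module: bowtie1_mapper
    base: bwt1_bld_smp
    script_path: /path/to/bowtie
    scope: sample
    redirects:
        -p: 20
```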

References

Langmead, B., Trapnell, C., Pop, M. and Salzberg, S.L., 2009. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome biology, 10(3), p.R25.

bwa_builder *

Authors

Menachem Sklarz

Affiliation

Bioinformatics core facility

Organization

National Institute of Biotechnology in the Negev, Ben Gurion University.

A module for running bwa index builder:

Builds a bwa index for a fasta file stored at the project or sample level.

Determine which one will be used by specifying scope as either project or sample.

Requires

  • fasta files in one of the following slots:

    • sample_data[<sample>]["fasta.nucl"]

    • sample_data["fasta.nucl"]

Output

  • Puts output index files in one of the following slots:
    • self.sample_data[<sample>]["bwa_index"]

    • self.sample_data["project_data"]["bwa_index"]

  • Puts the fasta file in the following slot:
    • self.sample_data[<sample>]["reference"]

Parameters that can be set

Parameter    Values              Comments
scope        project | sample    Indicates whether to use a project fasta or a sample fasta.

Lines for parameter file

bwa_bld_ind:
    module: bwa_builder
    base: spades1
    script_path: /path/to/bwa index
    scope: project

References

Li, H. and Durbin, R., 2009. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics, 25(14), pp.1754-1760.

bwa_mapper *

Authors

Menachem Sklarz

Affiliation

Bioinformatics core facility

Organization

National Institute of Biotechnology in the Negev, Ben Gurion University.

A module for running bwa mapper:

The reads stored in each sample are aligned to one of the following bwa indices:

  • An external index passed with the ref_index parameter.

  • A bwa index on a project fasta file, such as an assembly of all samples. Specify with scope: project in the bwa_mapper instance.

  • A sample bwa index on a sample-specific fasta file, such as from a sample-wise assembly or from the sample fasta file. Specify with scope: sample in the bwa_mapper instance.

The latter two options must come after a bwa_builder instance.

Requires

  • fastq files in one of the following slots:

    • sample_data[<sample>]["fastq.F"]

    • sample_data[<sample>]["fastq.R"]

    • sample_data[<sample>]["fastq.S"]

  • If mod is samse or sampe, the sai files created by a preceding bwa aln step are required as well:
    • self.sample_data[<sample>]["saiF|saiR|saiS"]

Output

  • Puts output sam files in the following slots:
    • If mod is one of mem, samse, sampe, bwasw:
      • self.sample_data[<sample>]["sam"]

    • If mod is aln:
      • self.sample_data[<sample>]["saiF|saiR|saiS"]

  • Puts the name of the mapper in:
    • self.sample_data[<sample>]["mapper"]

  • Puts the reference genome fasta (if one is given in the parameter file) in:
    • self.sample_data[<sample>]["reference"]

Parameters that can be set

Parameter    Values                              Comments
ref_index    path to bwa index                   If not given, will look for a project bwa index and then for a sample bwa index.
ref_genome   path to genome fasta                If ref_index is NOT given, will use the equivalent internal fasta. If ref_index is passed and ref_genome is NOT passed, will leave the reference slot empty.
scope        project | sample                    Indicates whether to use a project or sample bwa index.
mod          mem | aln | samse | sampe | bwasw   The bwa subprogram to run. samse and sampe require a preceding aln step.

Lines for parameter file

For external index:

  1. Using mem:

bwa_mem_1:
    module: bwa_mapper
    base: trim1
    script_path: /path/to/bwa
    mod: mem
    qsub_params:
        -pe: shared 20
    ref_genome: /path/to/ref_genome.fna
    ref_index: /path/to/bwa_index/ref_genome
    redirects:
        -t: 20

  2. Using aln - samse/sampe:

bwa_aln_1:
    module: bwa_mapper
    base: trim1
    script_path: /path/to/bwa
    mod: aln
    qsub_params:
        -pe: shared 20
    ref_genome: /path/to/ref_genome.fna
    ref_index: /path/to/bwa_index/ref_genome
    redirects:
        -t: 20
bwa_samse_1:
    module: bwa_mapper
    base: bwa_aln_1
    script_path: /path/to/bwa
    mod: samse
    ref_genome: /path/to/ref_genome.fna
    ref_index: /path/to/bwa_index/ref_genome

For project bwa index:

bwa_1:
    module: bwa_mapper
    base: bwa_bld_ind
    script_path: /path/to/bwa
    mod: mem
    scope: project
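For paired-end reads, aln is typically followed by sampe rather than samse. The sketch below assumes that trim1 provides both fastq.F and fastq.R, so that the aln step produces saiF and saiR for the sampe step to combine. The step names are hypothetical.

```yaml
# Align each read end separately; stores the saiF and saiR slots
bwa_aln_pe:
    module: bwa_mapper
    base: trim1
    script_path: /path/to/bwa
    mod: aln
    ref_index: /path/to/bwa_index/ref_genome
    redirects:
        -t: 20
# Combine the paired sai files into one SAM file per sample
bwa_sampe_pe:
    module: bwa_mapper
    base: bwa_aln_pe
    script_path: /path/to/bwa
    mod: sampe
    ref_genome: /path/to/ref_genome.fna
    ref_index: /path/to/bwa_index/ref_genome
```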

References

Li, H. and Durbin, R., 2009. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics, 25(14), pp.1754-1760.

STAR_mapper

Authors

Menachem Sklarz

Affiliation

Bioinformatics core facility

Organization

National Institute of Biotechnology in the Negev, Ben Gurion University.

A module for running STAR mapper:

Requires

  • fastq files in one of the following slots:

    • sample_data[<sample>]["fastq.F"]

    • sample_data[<sample>]["fastq.R"]

    • sample_data[<sample>]["fastq.S"]

  • If scope is set (must come after STAR_builder module which populates the required slots):

    • STAR index directories in:

      • sample_data[<sample>]["STAR.index"] if scope = “sample”

      • sample_data["STAR.index"] if scope = “project”

    • Reference fasta files in:

      • sample_data[<sample>]["STAR.fasta"] if scope = “sample”

      • sample_data["STAR.fasta"] if scope = “project”

Output

  • Puts output sam files in the following slots:

    • self.sample_data[<sample>]["sam"]

  • Alternatively, if --outSAMtype is set to BAM, puts output BAM files in the following slots:

    • self.sample_data[<sample>]["bam"]

    • self.sample_data[<sample>]["bam_unsorted"]

  • High confidence collapsed splice junctions (SJ.out.tab file) will be stored in:

    • self.sample_data[<sample>]["SJ.out.tab"]

  • If --quantMode contains TranscriptomeSAM, alignments translated into transcript coordinates (BAM) will be stored in:

    • self.sample_data[<sample>]["TranscriptomeSAM"]

  • If --quantMode contains GeneCounts, the ReadsPerGene.out.tab file will be stored in:

    • self.sample_data[<sample>]["GeneCounts"]

  • If --outWigType is set, will store outputs in:

    • if --outWigType is wiggle

      • self.sample_data[<sample>]["wig2_UniqueMultiple"]

      • self.sample_data[<sample>]["wig2_Unique"]

      • self.sample_data[<sample>]["wig1_UniqueMultiple"]

      • self.sample_data[<sample>]["wig1_Unique"]

      • self.sample_data[<sample>]["wig"]

    • if --outWigType is bedGraph

      • self.sample_data[<sample>]["bdg2_UniqueMultiple"]

      • self.sample_data[<sample>]["bdg2_Unique"]

      • self.sample_data[<sample>]["bdg1_UniqueMultiple"]

      • self.sample_data[<sample>]["bdg1_Unique"]

      • self.sample_data[<sample>]["bdg"]

  • Puts the name of the mapper in:

    self.sample_data[<sample>]["mapper"]

  • Puts fasta of reference genome (if one is given in param file) in:

    self.sample_data[<sample>]["reference"]

Parameters that can be set

Parameter    Values                  Comments
ref_genome   path to genome fasta
scope        project | sample        The scope from which to take the genome directory.

Note

You can set the RG attribute of the resulting SAM/BAM files with the redirected parameter --outSAMattrRGline. This sets the equivalent STAR parameter.

By default, the parameter will be set to include ID and SM tags, both set to the sample name. You can set the SM tag, but any ID tags will be removed and replaced with the sample name.
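As an illustration, the following sketch adds platform and library tags to the read group. It assumes that tags other than ID are forwarded to STAR as written and that SM still defaults to the sample name when it is not given; the step name and tag values are hypothetical.

```yaml
STAR_map_rg:
    module:             STAR_mapper
    base:               trim1
    script_path:        /path/to/STAR
    redirects:
        --genomeDir:            /path/to/genome/STAR_index/
        # ID is replaced with the sample name by the module;
        # SM is assumed to default to the sample name as well
        --outSAMattrRGline:     PL:ILLUMINA LB:lib1
```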

Lines for parameter file

For external index:

STAR_map:
    module:             STAR_mapper
    base:               STAR_bld_ind
    script_path:        /path/to/STAR
    redirects:
        --readMapNumber:    1000
        --genomeDir:        /path/to/genome/STAR_index/

For project STAR index:

STAR_map:
    module:             STAR_mapper
    base:               STAR_bld_ind
    script_path:        /path/to/STAR
    scope:              project
    redirects:
        --readMapNumber:    1000
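The output slots listed above are controlled by standard STAR options passed in redirects. This sketch requests unsorted and coordinate-sorted BAM files, gene counts, transcriptome alignments and bedGraph signal. Which BAM ends up in bam and which in bam_unsorted is assumed from the slot names. STAR writes signal files only together with sorted BAM output, and GeneCounts and TranscriptomeSAM need annotations, e.g. an index built with --sjdbGTFfile. The step name is hypothetical.

```yaml
STAR_map_bam:
    module:             STAR_mapper
    base:               STAR_bld_ind
    script_path:        /path/to/STAR
    scope:              project
    redirects:
        --runThreadN:       20
        # Assumed: sorted BAM -> "bam", unsorted BAM -> "bam_unsorted"
        --outSAMtype:       BAM Unsorted SortedByCoordinate
        # Fills the "TranscriptomeSAM" and "GeneCounts" slots
        --quantMode:        TranscriptomeSAM GeneCounts
        # Fills the bdg* slots
        --outWigType:       bedGraph
```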

References

Dobin, A., Davis, C.A., Schlesinger, F., Drenkow, J., Zaleski, C., Jha, S., Batut, P., Chaisson, M. and Gingeras, T.R., 2013. STAR: ultrafast universal RNA-seq aligner. Bioinformatics, 29(1), pp.15-21.

STAR_builder

Authors

Menachem Sklarz

Affiliation

Bioinformatics core facility

Organization

National Institute of Biotechnology in the Negev, Ben Gurion University.

A module for running STAR genome index construction:

Requires

  • fasta files in one of the following slots:

    • sample_data["fasta.nucl"]

    • sample_data[<sample>]["fasta.nucl"]

  • If --sjdbGTFfile is set in redirects, but left empty, will expect to find a GTF file here:

    • sample_data["gtf"] if scope = “project”

    • sample_data[<sample>]["gtf"] if scope = “sample”

  • If --sjdbFileChrStartEnd is set in redirects, but left empty, will expect to find an SJ file here:

    • sample_data["SJ.out.tab"] if scope = “project”

    • sample_data[<sample>]["SJ.out.tab"] if scope = “sample”

Output

Puts output index files in one of the following slots:

  • self.sample_data[<sample>]["STAR.index"]

  • self.sample_data["project_data"]["STAR.index"]

Puts the reference fasta file in one of the following slots:

  • self.sample_data[<sample>]["STAR.fasta"]

  • self.sample_data["project_data"]["STAR.fasta"]

Parameters that can be set

Parameter    Values              Comments
scope        project | sample    Not used

Lines for parameter file

STAR_bld_ind:
    module:             STAR_builder
    base:               trinity1
    script_path:        /path/to/STAR
    scope:              project
    qsub_params:
        queue:          star.q
    redirects:
        --genomeSAindexNbases:  12
        --genomeChrBinNbits:    10
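To insert splice junctions from an annotation already stored in the pipeline, --sjdbGTFfile can be passed without a value, in which case the module looks for the gtf slot described under Requires. In this sketch, import_gtf is a hypothetical step that stores a project-level GTF file, and --sjdbOverhang 100 is a typical choice for reads of about 101 bp (read length minus one).

```yaml
STAR_bld_gtf:
    module:             STAR_builder
    base:               import_gtf      # hypothetical step providing sample_data["gtf"]
    script_path:        /path/to/STAR
    scope:              project
    redirects:
        # Left empty: the GTF file is taken from the project gtf slot
        --sjdbGTFfile:
        --sjdbOverhang:         100
        --genomeSAindexNbases:  12
```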

References

Dobin, A., Davis, C.A., Schlesinger, F., Drenkow, J., Zaleski, C., Jha, S., Batut, P., Chaisson, M. and Gingeras, T.R., 2013. STAR: ultrafast universal RNA-seq aligner. Bioinformatics, 29(1), pp.15-21.

STAR_LoadRemoveGenome

Authors

Menachem Sklarz

Affiliation

Bioinformatics core facility

Organization

National Institute of Biotechnology in the Negev, Ben Gurion University.

A module for loading a STAR genome into RAM for use by subsequent STAR mapping jobs.

Note

This module saves memory and time. Set the --genomeLoad parameter of the STAR mapping instance to LoadAndKeep. STAR will then load the genome into memory once and reuse it for all mapping jobs executed on the same node. When all mapping jobs are completed, the scripts produced by this instance remove the genome from RAM on all nodes used.

Tip

Make sure you set the node parameter in qsub_params to all the nodes in use by the base STAR_mapper instance.

Attention

Currently defined for project-scope or external genomes only. Not used for sample-scope genomes.

Note

Loading a genome is not really required. It will be loaded by the first instance of STAR.

Requires

  • A STAR genome in:

    • sample_data["STAR.index"]

Alternatively, a STAR genome index can be passed with the --genomeDir parameter.

Output

No output is created

Parameters that can be set

Parameter           Values              Comments
genome              load | remove       Load or remove the genome from RAM.
qsub_params:node                        Nodes on which to load/unload the genome.
scope               project | sample    The scope from which to take the genome directory. Currently not in use.

Lines for parameter file

For external index:

STAR_remove_genome:
    module:             STAR_LoadRemoveGenome
    base:               STAR_map
    script_path:        '{Vars.paths.STAR}STAR'
    genome:             remove
    qsub_params:
        queue:          queue.q
        node:           {Vars.nodes}
    redirects:
        --genomeDir:    /path/to/STAR/genome_directory

For project STAR index:

STAR_remove_genome:
    module:             STAR_LoadRemoveGenome
    base:               STAR_map
    script_path:        '{Vars.paths.STAR}STAR'
    genome:             remove
    qsub_params:
        queue:          queue.q
        node:           {Vars.nodes}
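The load-and-remove workflow from the note above can be sketched as follows, assuming a project-scope index. No explicit load step is included, since the first STAR job loads the genome. The remove step runs after all mapping jobs and lists the same nodes as the mapper; this assumes the mapper jobs can be pinned with the same node setting. Step names are hypothetical, and {Vars.nodes} is the variable placeholder used in the examples above.

```yaml
# Mapper: keep the genome in shared memory between jobs on a node
STAR_map_shm:
    module:             STAR_mapper
    base:               STAR_bld_ind
    script_path:        /path/to/STAR
    scope:              project
    qsub_params:
        node:           {Vars.nodes}
    redirects:
        --genomeLoad:   LoadAndKeep
# Runs after all mapping jobs; removes the genome from RAM on the listed nodes
STAR_remove_shm:
    module:             STAR_LoadRemoveGenome
    base:               STAR_map_shm
    script_path:        /path/to/STAR
    genome:             remove
    qsub_params:
        queue:          queue.q
        node:           {Vars.nodes}
```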

References

Dobin, A., Davis, C.A., Schlesinger, F., Drenkow, J., Zaleski, C., Jha, S., Batut, P., Chaisson, M. and Gingeras, T.R., 2013. STAR: ultrafast universal RNA-seq aligner. Bioinformatics, 29(1), pp.15-21.

samtools *

Authors

Menachem Sklarz

Affiliation

Bioinformatics core facility

Organization

National Institute of Biotechnology in the Negev, Ben Gurion University.

A class that defines a module for executing samtools on a SAM or BAM file.

Warning

This module is in beta stage. Please report issues and we’ll try to solve them.

Attention

The module was tested on samtools 1.9

Currently, the samtools programs included in the module are the following:

  • view

  • sort

  • index

  • flagstat

  • stats

  • idxstats

  • depth

  • fastq/a

  • merge

  • mpileup

Note

Order of samtools subprogram execution

  • The samtools programs are executed in the order given in the parameter file

  • File types are passed from one program to the next

  • In order to execute one program more than once, append digits to the program name, e.g. sort2, index3 etc.

Arguments can be passed to the tools following the program name in the parameter file, e.g.:

sort: -n -@ 10

Alternatively, they can be passed in a redirects block:

sort:
    redirects: -n -@ 10

Please do NOT pass input and output arguments - they are set by the module.

Some of the tools are defined only when the scope is sample:

  • merge merges the sample-wise BAM files into a project BAM file.

  • mpileup creates a project VCF/BCF/mpileup file from the sample BAM files.

Attention

Treatment of regions

If you want to limit a program to a specific region, pass the program a block with a ‘region’ section. To set the region and also pass arguments, add a ‘redirects’ section as well. For example:

mpileup:
    redirects:      --max-depth INT -v
    region:         chr2:212121-32323232

Attention

Treatment of BED files

In samtools view, bedcov, depth and mpileup, you can pass a BED file by adding a bed field in the tool block, with one of the following values:

  • sample - use a sample-scope BED file

  • project - use a project-scope BED file

  • A full path to a BED file.

Example:

view:
    redirects:      -uh  -q 30 -@ 20 -F 4
    bed:            /path/to/external/bed

Requires

  • A SAM file in the following location:

    • sample_data[<sample>]["sam"] (for scope=sample)

    • sample_data["project_data"]["sam"] (for scope=project)

  • Or a BAM file in:

    • sample_data[<sample>]["bam"] (for scope=sample)

    • sample_data["project_data"]["bam"] (for scope=project)

Note

If both BAM and SAM files exist, select the one to use with type2use (see section Parameters that can be set).

Output

Depending on the parameters, the module stores files of different types (e.g. bam, cram, sam, bai, crai, vcf, bcf, mpileup, fasta.{F,R,S}, fastq.{F,R,S}). Please use stop_and_show to see the types produced by your instance of samtools_new.

Note

If scope is set to project, the above mentioned output files will be created in the project scope.

Note

merge and mpileup are only defined when scope is sample. See above

By default, all files are saved. To keep only the output from specific programs, add a keep_output section containing a list of programs for which the output should be saved. All other files will be discarded.

Parameters that can be set

Parameter     Values                Comments
scope         sample | project      Scope of the SAM/BAM files to operate on. Defaults to sample.
view          e.g.: -buh -q 30      samtools view parameters.
sort          e.g.: -@ 20           samtools sort parameters.
index                               samtools index parameters.
flagstat                            Leave empty. flagstat takes no parameters.
stats                               samtools stats parameters.
idxstats                            samtools idxstats parameters.
fastq/a                             samtools fastq/fasta parameters.
merge                               samtools merge parameters.
region                              A region to limit the region-limitable programs, such as view, merge, mpileup, etc.
type2use      sam | bam             Type of file to use. Must exist in the scope.
keep_output   [sort, view, sort2]   A list of programs for which to store the output files. By default, all files are saved.

Lines for parameter file

sam_bwt1:
    module:             samtools_new
    base:               bwt1
    script_path:        {Vars.paths.samtools}
    qsub_params:
        -pe:            shared 20
    region:             chr2:212121-32323232
    scope:              sample
    # First 'view'. Use FLAG to filter alignments:
    view:               -uh  -q 30 -@ 20 -F 4 -O bam
    # First 'sort'. Sort by coordinates:
    sort:               -@ 20
    # Second 'view'. Use region to filter alignments:
    view2:
        redirects:      -buh  -q 30 -@ 20
        region:         chr2:212121-32323232
    index:
    flagstat:
    stats:              --remove-dups
    idxstats:
    # Second 'sort'. Sort by name:
    sort2:              -n -@ 20
    # Get sequences from name-sorted BAM file:
    fastq:
    # Merge the sample BAM files into a project BAM file:
    merge:
        region:         chr2:212121-32323232
    # Create a VCF from the sample BAM files:
    mpileup:
        redirects:      --max-depth INT -v
        region:         chr2:212121-32323232
    keep_output:        [sort, view, index, flagstat, stats, fastq, mpileup, merge]
    # stop_and_show:
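A shorter sketch of a typical sample-scope chain, showing execution order, a repeated program and keep_output. It assumes that the base step bwt2_1 stores SAM files, and reuses the module name samtools_new from the example above; the step name is hypothetical.

```yaml
sam_basic:
    module:             samtools_new
    base:               bwt2_1
    script_path:        /path/to/samtools
    scope:              sample
    type2use:           sam
    # Programs run in the order listed; file types pass from one to the next
    view:               -bh -q 30 -F 4
    sort:               -@ 20
    index:
    flagstat:
    # Appended digit: run sort a second time, this time by read name
    sort2:              -n -@ 20
    fastq:
    # Keep only these outputs; the view output is discarded
    keep_output:        [sort, index, flagstat, fastq]
```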

References

Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G. and Durbin, R., 2009. The sequence alignment/map format and SAMtools. Bioinformatics, 25(16), pp.2078-2079.

Multiqc *

Authors

Menachem Sklarz

Affiliation

Bioinformatics core facility

Organization

National Institute of Biotechnology in the Negev, Ben Gurion University.

A module for preparing a MultiQC report for all samples.

Tip

By default, the module will search for parsable reports in the directories of all the modules in the branch leading to this instance. To search only in the directories of the explicit base steps, specify the bases_only parameter.

Requires

  • No strict requirements. The report will contain information only if one of the base steps produces reports that MultiQC can read, e.g. fastqc, bowtie2, samtools etc.

Output

  • Puts the report directory in the following slot:

    • self.sample_data[<sample>]["Multiqc_report"]

Parameters that can be set

Parameter    Values    Comments
bases_only             Search directories of explicit base steps only.

Lines for parameter file

firstMultQC:
    module: Multiqc
    base:
        - sam_bwt2_1
        - fqc_trim1
    bases_only:
    script_path: /path/to/multiqc
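When bases_only is omitted, the whole branch leading to the instance is searched, so a single base at the end of the branch is usually enough. A minimal sketch; the step name is hypothetical.

```yaml
# Searches the directories of all steps in the branch leading to sam_bwt2_1
fullMultiQC:
    module: Multiqc
    base: sam_bwt2_1
    script_path: /path/to/multiqc
```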

References

Ewels, P., Magnusson, M., Lundin, S. and Käller, M., 2016. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics, 32(19), pp.3047-3048.

RSEM

Authors

Liron Levin

Affiliation

Bioinformatics core facility

Organization

National Institute of Biotechnology in the Negev, Ben Gurion University.

Short Description

A module for running RSEM

Requires

  • fastq files in:

    • self.sample_data[sample]["fastq.F"]

    • self.sample_data[sample]["fastq.R"]

    • self.sample_data[sample]["fastq.S"]

  • or a bam file in:

    self.sample_data[sample]["bam"]

Output

  • Puts output bam files (if the input is fastq) in:

    self.sample_data[sample]["bam"]

  • Puts the location of RSEM results in:

    • self.sample_data[sample]["RSEM"]

    • self.sample_data[sample]["genes.results"]

    • self.sample_data[sample]["isoforms.results"]

Parameters that can be set

Parameter   Values                    Comments
mode        transcriptome | genome    Is the reference a genome or a transcriptome?
gff3        None                      Use if the mode is genome and the annotation file is in gff3 format.

Comments

  • This module was tested on:

    RSEM v1.2.25, bowtie2 v2.2.6

Lines for parameter file

Step_Name:                                                   # Name of this step
    module: RSEM                                             # Name of the module used
    base:                                                    # Name of the step [or list of names] to run after [must be after a bam file generator step or merge with fastq files]
    script_path:                                             # Command for running the RSEM script 
    qsub_params:
        -pe:                                                 # Number of CPUs to reserve for this analysis
    mode:                                                    # transcriptome or genome
    export_transcriptome:                                    # In genome mode set the extracted transcriptome as the new project level fasta.nucl and extract the transcript-to-gene-map file as project level gene_trans_map
    annotation:                                              # For Genome mode: the location of GTF file [the default] , for GFF3 use the gff3 flag. For Transcriptome mode: transcript-to-gene-map file.
                                                             # If annotation is set to Trinity the transcript-to-gene-map file will be generated using the from_Trinity_to_gene_map script
                                                             # If not set will use only the reference file as unrelated transcripts
    from_Trinity_to_gene_map_script_path:                    # If the mode is transcriptome and the reference was assembled using Trinity it is possible to generate the transcript-to-gene-map file automatically using this script
                                                             # If annotation is set to Trinity and this line is empty or missing it will try using the module's associated script
    gff3:                                                    # Use if the mode is genome and the annotation file is in gff3 format
    mapper:                                                  # bowtie/bowtie2/star 
    mapper_path:                                             # Location of mapper script
    rsem_prepare_reference_script_path:                      # Location of preparing reference script
    plot_stat:                                               # Generate statistical plots
    plot_stat_script_path:                                   # Location of statistical plot generating script
    reference:                                               # The reference genome/transcriptome location [FASTA file]
    rsem_generate_data_matrix_script_path:                   # Location of the final matrix generating script
                                                             # If this line is empty or missing it will try using the module's associated script
    redirects:
        --append-names:                                      # RSEM will append gene_name/transcript_name to the result files
        --estimate-rspd:                                     # Enables RSEM to learn from the data how the reads are distributed across a transcript
        -p:                                                  # Number of CPUs to use in this analysis
        --bam:                                               # Will use bam files and not fastq
        --no-bam-output:
        --output-genome-bam:                                 # Alignments in genomic coordinates (only if mode is genome)
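A filled-in sketch of the template above for a Trinity assembly in transcriptome mode. All paths are placeholders; script_path is assumed to point to rsem-calculate-expression, and mapper_path is assumed to be the directory holding the bowtie2 executables. from_Trinity_to_gene_map_script_path and rsem_generate_data_matrix_script_path are left out so that, as the template comments state, the module's associated scripts are used. The step name is hypothetical.

```yaml
RSEM_trinity:
    module: RSEM
    base: trinity1
    script_path: /path/to/rsem-calculate-expression
    qsub_params:
        -pe: shared 20
    mode: transcriptome
    # Build the transcript-to-gene map from the Trinity assembly
    annotation: Trinity
    mapper: bowtie2
    mapper_path: /path/to/bowtie2/
    rsem_prepare_reference_script_path: /path/to/rsem-prepare-reference
    reference: /path/to/Trinity.fasta
    redirects:
        --estimate-rspd:
        -p: 20
```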

References

Li, B. and Dewey, C.N., 2011. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC bioinformatics, 12(1), p.323.

htseq_count

Authors

Menachem Sklarz

Affiliation

Bioinformatics core facility

Organization

National Institute of Biotechnology in the Negev, Ben Gurion University.

A module for running htseq-count:

See htseq-count documentation.

Requires

  • SAM or BAM files in one of the following slots:

    • sample_data[<sample>]["bam"]

    • sample_data[<sample>]["sam"]

Output

  • Puts the output file in:

    self.sample_data[<sample>]["HTSeq.counts"]

Parameters that can be set

Parameter      Values                            Comments
gff            path to GFF/GTF annotation file   Annotation file with the features to count.
-f|--format    sam | bam                         In redirects. Tells htseq-count which file to use. If not specified, will use whichever file exists.

Lines for parameter file


htseq_c1:
    module:         htseq_count
    base:           samtools_STAR1
    script_path:    /storage16/app/bioinfo/python_packages/bin/htseq-count
    gtf:            /fastspace/bioinfo_databases/STAR_GRCh38_Gencode21/gencode.v21.annotation.gtf
    redirects:
        --format:   bam
        -s:         'no'
        -m:         intersection-nonempty
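A variant for SAM input from a bowtie2 step, using the gtf key from the example above. The step names are hypothetical; -s reverse suits reverse-stranded libraries, and -m union is htseq-count's default overlap mode.

```yaml
htseq_sam:
    module:         htseq_count
    base:           bwt2_1
    script_path:    /path/to/htseq-count
    gtf:            /path/to/annotation.gtf
    redirects:
        # Use the "sam" slot even if a BAM file also exists
        --format:   sam
        -s:         reverse
        -m:         union
```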

References

Anders, S., Pyl, P.T. and Huber, W., 2015. HTSeq—a Python framework to work with high-throughput sequencing data. Bioinformatics, 31(2), pp.166-169.

RSEM_prep

Authors

Menachem Sklarz

Affiliation

Bioinformatics core facility

Organization

National Institute of Biotechnology in the Negev, Ben Gurion University.

A module for running rsem-prepare-reference:

Requires

  • fasta files in one of the following slots:

    • sample_data["fasta.nucl"] (scope = project)

    • sample_data[<sample>]["fasta.nucl"] (scope = sample)

  • If neither exists, please supply reference parameter.

Attention

If type “gene_trans_map” exists, its value will be used for “--transcript-to-gene-map”, unless “--transcript-to-gene-map” is explicitly passed in redirects!

Output

Puts output index files in one of the following slots:

  • self.sample_data[<sample>]["RSEM.index"]

  • self.sample_data["project_data"]["RSEM.index"]

Parameters that can be set

Parameter    Values               Comments
scope        project | sample     Where to take the reference from.
reference    path to reference    Use this fasta file. See the definition of reference_fasta_file(s) in the ARGUMENTS section of the rsem-prepare-reference help.

Lines for parameter file

RSEM_prep_ind:
    module:             RSEM_prep
    base:               merge1
    script_path:        /path/to/RSEM
    reference:          /path/to/fasta
    redirects:
        --gtf:          /path/to/gtf
        --transcript-to-gene-map: /path/to/map_file
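A project-scope sketch without the reference parameter, so that the fasta in sample_data["fasta.nucl"] is used. trinity1 is assumed to store its assembly there; per the note above, an existing gene_trans_map type would be passed as --transcript-to-gene-map. --bowtie2 is a standard rsem-prepare-reference option that builds Bowtie 2 indices; --bowtie2-path may also be needed if bowtie2 is not on the PATH. The step name is hypothetical.

```yaml
RSEM_prep_proj:
    module:             RSEM_prep
    base:               trinity1
    script_path:        /path/to/rsem-prepare-reference
    scope:              project
    redirects:
        # Build Bowtie 2 indices alongside the RSEM reference
        --bowtie2:
```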

References

RSEM_mapper

Authors

Menachem Sklarz

Affiliation

Bioinformatics core facility

Organization

National Institute of Biotechnology in the Negev, Ben Gurion University.

A module for running rsem-calculate-expression:

Requires

  • fasta files in one of the following slots:

    • sample_data["project_data"]["fasta.nucl"] (scope = project)

    • sample_data[<sample>]["fasta.nucl"] (scope = sample)

  • If neither exists, please supply reference parameter.

Output

Puts the output count files in the following slots:

  • self.sample_data[<sample>]["genes.counts"]

  • self.sample_data[<sample>]["isoforms.counts"]

And the following BAMs, depending on redirected params:

  • self.sample_data[<sample>]["genome.unsorted.bam"]

  • self.sample_data[<sample>]["genome.bam"]

  • self.sample_data[<sample>]["transcript.unsorted.bam"]

  • self.sample_data[<sample>]["transcript.bam"]

Parameters that can be set

Parameter     Values              Comments
scope         project | sample    The scope of the RSEM index. Must match the scope in the RSEM_prep instance.
result2use    genes | isoforms    Summarize counts at the gene or isoform level.

Lines for parameter file

Mapping fastq files:

RSEM_map:
    module:             RSEM_mapper
    base:               merge1
    script_path:        {Vars.paths.RSEM.rsem-calculate-expression}
    reference:          /path/to/fasta
    redirects:
        --gtf:          /path/to/gtf
        --transcript-to-gene-map: /path/to/map_file

Parsing an existing BAM alignment file:

RSEM_parse_bam:
    module:         RSEM_mapper
    base:           mv_transcript_bam_to_bam
    script_path:    {Vars.paths.RSEM.rsem-calculate-expression}
    scope:          project
    qsub_params:
        -pe:        shared 20
    redirects:
        --num-threads:  20
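A project-scope sketch that maps fastq files and summarizes at the gene level. RSEM_prep_bt2 is a hypothetical RSEM_prep instance with scope: project that was run with --bowtie2; rsem-calculate-expression needs the same --bowtie2 option to use those indices.

```yaml
RSEM_map_proj:
    module:             RSEM_mapper
    base:               RSEM_prep_bt2   # hypothetical project-scope RSEM_prep step
    script_path:        /path/to/rsem-calculate-expression
    scope:              project
    result2use:         genes
    qsub_params:
        -pe:            shared 20
    redirects:
        --num-threads:  20
        # Must match the aligner used when the reference was prepared
        --bowtie2:
```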

References