Mapping

Modules included in this section

bowtie2_builder ^*
bowtie2_mapper ^*
bowtie1_builder ^*
bowtie1_mapper ^*
bwa_builder ^*
bwa_mapper ^*
STAR_mapper
STAR_builder
STAR_LoadRemoveGenome
Multiqc ^*
RSEM
htseq_count
RSEM_prep
RSEM_mapper

`bowtie2_builder` ^*

Authors: Menachem Sklarz
Affiliation: Bioinformatics core facility
Organization: National Institute of Biotechnology in the Negev, Ben Gurion University.

A module for running the bowtie2 index builder:

Builds a bowtie2 index for a fasta file stored at the project or sample level.

Specify which one to use by setting scope to either project or sample.

Requires

fasta files in one of the following slots:
- sample_data[<sample>]["fasta.nucl"]
- sample_data["fasta.nucl"]

Output

Puts output index files in one of the following slots:
- self.sample_data[<sample>]["bowtie2.index"]
- self.sample_data["project_data"]["bowtie2.index"]
Puts the fasta file in the following slot:
- self.sample_data[<sample>]["reference"]

Parameters that can be set

Parameter	Values	Comments
scope	project \| sample	Indicates whether to use a project fasta or a sample fasta.
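The scope selection described above can be sketched as a small Python function. This is a hypothetical illustration, not the module's actual code: `fasta_to_index` is an invented name, and `sample_data` is modelled as a plain dict that uses the slot names listed under Requires. The same logic applies to `bowtie1_builder` and `bwa_builder`, which document the same slots.

```python
def fasta_to_index(sample_data, scope, samples):
    """Return {owner: fasta_path} for each fasta file a builder instance would index.

    "project_data" stands for the single project-level index; sample names
    stand for per-sample indices. Hypothetical helper for illustration only.
    """
    if scope == "project":
        # One index, built from the project-level fasta (e.g. an assembly of all samples).
        return {"project_data": sample_data["fasta.nucl"]}
    if scope == "sample":
        # One index per sample, each built from that sample's own fasta.
        return {sample: sample_data[sample]["fasta.nucl"] for sample in samples}
    raise ValueError(f"scope must be 'project' or 'sample', got {scope!r}")


sample_data = {
    "fasta.nucl": "/data/assembly/all_samples.fasta",
    "s1": {"fasta.nucl": "/data/assembly/s1.fasta"},
    "s2": {"fasta.nucl": "/data/assembly/s2.fasta"},
}
print(fasta_to_index(sample_data, "project", ["s1", "s2"]))
print(fasta_to_index(sample_data, "sample", ["s1", "s2"]))
```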

Lines for parameter file

bwt2_build:
    module: bowtie2_builder
    base: trinity1
    script_path: /path/to/bowtie2-build
    scope: project

References

Langmead, B. and Salzberg, S.L., 2012. Fast gapped-read alignment with Bowtie 2. Nature Methods, 9(4), pp.357-359.

`bowtie2_mapper` ^*

Authors: Menachem Sklarz
Affiliation: Bioinformatics core facility
Organization: National Institute of Biotechnology in the Negev, Ben Gurion University.

A module for running the bowtie2 mapper:

The reads stored in each sample are aligned to one of the following bowtie2 indices:

An external index passed with the -x parameter.
A bowtie2 index built on a project fasta file, such as an assembly of all samples. Specify with bowtie2_mapper:scope project
A sample bowtie2 index on a sample-specific fasta file, such as from a sample-wise assembly or from the sample file. Specify with bowtie2_mapper:scope sample

The latter two options must come after a bowtie2_builder instance.

Tip

See the documentation for the bowtie2_builder module.

Note

fastq files are never defined project-wide.

The scope parameter controls the origin of the index files, i.e. whether the fasta file to map to is an assembly of the sample reads (scope: sample) or an assembly of all reads in the project (scope: project). The reads to be mapped are always sample reads, since a ‘fastq’ slot is not defined at the project level.

Requires

fastq files in one of the following slots:
- sample_data[<sample>]["fastq.F"]
- sample_data[<sample>]["fastq.R"]
- sample_data[<sample>]["fastq.S"]

Output

Puts output sam files in the following slots:
- self.sample_data[<sample>]["sam"]
Puts the name of the mapper in:
- self.sample_data[<sample>]["mapper"]
Puts the fasta of the reference genome (if one is given in the param file) in:
- self.sample_data[<sample>]["reference"]

Parameters that can be set

Parameter	Values	Comments
-x	path to bowtie2 index	If not given, will look for a project bowtie2 index and then for a sample bowtie2 index
ref_genome	path to genome fasta	If -x is NOT given, will use the equivalent internal fasta. If -x is passed, and ref_genome is NOT passed, will leave the reference slot empty
get_map_log		Store the log produced by bowtie2 (This is bowtie2 standard output)
scope	project \| sample	Indicates whether to use a project or sample bowtie2 index.
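The index and reference selection described in the table can be summarized in a short Python sketch. It is an illustration under stated assumptions, not NeatSeq-Flow code: `resolve_index` and the `project_slots`/`sample_slots` dicts (with hypothetical `index` and `fasta` keys) are invented, and the sketch selects the index by `scope` explicitly rather than reproducing the project-then-sample fallback. The `index_param` argument lets the same function cover `ebwt` in `bowtie1_mapper` and `ref_index` in `bwa_mapper`, which are documented with the same rules.

```python
def resolve_index(params, project_slots, sample_slots, index_param="-x"):
    """Return (index_prefix, reference_fasta_or_None) for one sample.

    params: instance parameters as parsed from YAML, redirects nested under "redirects".
    project_slots / sample_slots: hypothetical dicts with "index" and "fasta" keys,
    as written by an upstream builder instance.
    """
    redirects = params.get("redirects") or {}
    # -x is passed in redirects; ebwt and ref_index are top-level parameters.
    external = redirects.get(index_param) or params.get(index_param)
    if external:
        # External index: the reference slot is filled only if ref_genome is given.
        return external, params.get("ref_genome")
    scope = params.get("scope")
    if scope == "project":
        slots = project_slots
    elif scope == "sample":
        slots = sample_slots
    else:
        raise ValueError("without an external index, scope must be 'project' or 'sample'")
    # Internal index: the reference is the fasta the index was built from.
    return slots["index"], slots["fasta"]


project = {"index": "/proj/bowtie2.index/assembly", "fasta": "/proj/assembly.fasta"}
print(resolve_index({"scope": "project"}, project, {}))
print(resolve_index({"redirects": {"-x": "/ref/bowtie2.index/ref_genome"}}, project, {}))
```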

Lines for parameter file

For external index:

bwt2_1:
    module: bowtie2_mapper
    base: trim1
    script_path: /path/to/bowtie2
    qsub_params:
        -pe: shared 20
    get_map_log:
    ref_genome: /path/to/ref_genome.fna
    redirects:
        -p: 20
        -q: null
        -x: /path/to/bowtie2.index/ref_genome

Using a bowtie2 index constructed from a project fasta:

bwt2_1:
    module: bowtie2_mapper
    base: bwt2_bld1
    script_path: /path/to/bowtie2
    qsub_params:
        -pe: shared 20
    get_map_log:
    scope: project
    redirects:
        -p: 20
        -q: null

References

Langmead, B. and Salzberg, S.L., 2012. Fast gapped-read alignment with Bowtie 2. Nature Methods, 9(4), pp.357-359.

`bowtie1_builder` ^*

Authors: Menachem Sklarz
Affiliation: Bioinformatics core facility
Organization: National Institute of Biotechnology in the Negev, Ben Gurion University.

A module for running the bowtie1 index builder:

Requires

fasta files in one of the following slots:
- sample_data["fasta.nucl"]
- sample_data[<sample>]["fasta.nucl"]

Output

Puts output index files in one of the following slots:
- self.sample_data[<sample>]["bowtie1.index"]
- self.sample_data["project_data"]["bowtie1.index"]

Parameters that can be set

Parameter	Values	Comments
scope	project \| sample	Indicates whether to use a project fasta or a sample fasta.

Lines for parameter file

bwt1_bld_ind:
    module: bowtie1_builder
    base: trinity1
    script_path: /path/to/bowtie
    scope: project

References

Langmead, B., Trapnell, C., Pop, M. and Salzberg, S.L., 2009. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology, 10(3), p.R25.

`bowtie1_mapper` ^*

Authors: Menachem Sklarz
Affiliation: Bioinformatics core facility
Organization: National Institute of Biotechnology in the Negev, Ben Gurion University.

A module for running the bowtie1 mapper:

The reads stored in each sample are aligned to one of the following bowtie indices:

An external index passed with the ebwt parameter.
A bowtie1 index built on a project fasta file, such as an assembly of all samples. Specify with bowtie1_mapper:scope project
A sample bowtie1 index on a sample-specific fasta file, such as from a sample-wise assembly or from the sample file. Specify with bowtie1_mapper:scope sample

The latter two options must come after a bowtie1_builder instance.

Requires

fastq files in one of the following slots:
- sample_data[<sample>]["fastq.F"]
- sample_data[<sample>]["fastq.R"]
- sample_data[<sample>]["fastq.S"]

Output

Puts output sam files in the following slots:
- self.sample_data[<sample>]["sam"]
Puts the name of the mapper in:
- self.sample_data[<sample>]["mapper"]
Puts the fasta of the reference genome (if one is given in the param file) in:
- self.sample_data[<sample>]["reference"]

Parameters that can be set

Parameter	Values	Comments
ebwt	path to bowtie1 index	If not given, will look for a project bowtie1 index and then for a sample bowtie1 index
ref_genome	path to genome fasta	If ebwt is NOT given, will use the equivalent internal fasta. If ebwt IS given, and ref_genome is NOT passed, will leave the reference slot empty.
scope	project \| sample	Indicates whether to use a project or sample bowtie1 index.

Lines for parameter file

For external index:

bwt1:
    module: bowtie1_mapper
    base: trim1
    script_path: /path/to/bowtie
    qsub_params:
        -pe: shared 20
    ebwt: /path/to/bowtie1.index/ref_genome
    ref_genome: /path/to/ref_genome.fna
    redirects:
        -p: 20

For project bowtie index:

bwt1_1:
    module: bowtie1_mapper
    base: bwt1_bld_ind
    script_path: /path/to/bowtie
    scope: project

References

Langmead, B., Trapnell, C., Pop, M. and Salzberg, S.L., 2009. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology, 10(3), p.R25.

`bwa_builder` ^*

Authors: Menachem Sklarz
Affiliation: Bioinformatics core facility
Organization: National Institute of Biotechnology in the Negev, Ben Gurion University.

A module for running the bwa index builder:

Builds a bwa index for a fasta file stored at the project or sample level.

Specify which one to use by setting scope to either project or sample.

Requires

fasta files in one of the following slots:
- sample_data[<sample>]["fasta.nucl"]
- sample_data["fasta.nucl"]

Output

Puts output index files in one of the following slots:
- self.sample_data[<sample>]["bwa_index"]
- self.sample_data["project_data"]["bwa_index"]
Puts the fasta file in the following slot:
- self.sample_data[<sample>]["reference"]

Parameters that can be set

Parameter	Values	Comments
scope	project \| sample	Indicates whether to use a project fasta or a sample fasta.

Lines for parameter file

bwa_bld_ind:
    module: bwa_builder
    base: spades1
    script_path: /path/to/bwa index
    scope: project

References

Li, H. and Durbin, R., 2009. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics, 25(14), pp.1754-1760.

`bwa_mapper` ^*

Authors: Menachem Sklarz
Affiliation: Bioinformatics core facility
Organization: National Institute of Biotechnology in the Negev, Ben Gurion University.

A module for running the bwa mapper:

The reads stored in each sample are aligned to one of the following bwa indices:

An external index passed with the ref_index parameter.
A bwa index built on a project fasta file, such as an assembly of all samples. Specify with bwa_mapper:scope project
A sample bwa index on a sample-specific fasta file, such as from a sample-wise assembly or from the sample fasta file. Specify with bwa_mapper:scope sample

The latter two options must come after a bwa_builder instance.

Requires

fastq files in one of the following slots:
- sample_data[<sample>]["fastq.F"]
- sample_data[<sample>]["fastq.R"]
- sample_data[<sample>]["fastq.S"]
If mod is samse or sampe, the sai files (created by a bwa aln step) are required as well:
- self.sample_data[<sample>]["saiF|saiR|saiS"]

Output

Puts output sam files in the following slots:
- If mod is one of mem, samse, sampe, bwasw:
  
  self.sample_data[<sample>]["sam"]
- If mod is aln:
  
  self.sample_data[<sample>]["saiF|saiR|saiS"]
Puts the name of the mapper in:
- self.sample_data[<sample>]["mapper"]
Puts the fasta of the reference genome (if one is given in the param file) in:
- self.sample_data[<sample>]["reference"]
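How mod determines which slots are read and written can be sketched as follows. `bwa_slots` is a hypothetical helper that only restates the Requires and Output lists above (the mapper and reference slots are left out); the saiF/saiR/saiS names follow the document.

```python
def bwa_slots(mod, directions):
    """Return (required_slots, output_slots) for a bwa_mapper instance.

    directions: read slots present for the sample, e.g. ["F", "R"] or ["S"].
    """
    fastq = [f"fastq.{d}" for d in directions]
    sai = [f"sai{d}" for d in directions]
    if mod == "aln":
        # bwa aln writes suffix-array coordinates (.sai files), not SAM alignments.
        return fastq, sai
    if mod in ("samse", "sampe"):
        # samse/sampe turn the .sai files, together with the reads, into SAM.
        return fastq + sai, ["sam"]
    if mod in ("mem", "bwasw"):
        return fastq, ["sam"]
    raise ValueError(f"unsupported mod: {mod!r}")


print(bwa_slots("aln", ["F", "R"]))
print(bwa_slots("sampe", ["F", "R"]))
```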

Parameters that can be set

Parameter	Values	Comments
ref_index	path to bwa index	If not given, will look for a project bwa index and then for a sample bwa index
ref_genome	path to genome fasta	If ref_index is NOT given, will use the equivalent internal fasta. If ref_index is passed, and ref_genome is NOT passed, will leave the reference slot empty
scope	project \| sample	Indicates whether to use a project or sample bwa index.

Lines for parameter file

For external index:

1. Using mem:

bwa_mem_1:
    module: bwa_mapper
    base: trim1
    script_path: /path/to/bwa
    mod: mem
    qsub_params:
        -pe: shared 20
    ref_genome: /path/to/ref_genome.fna
    ref_index: /path/to/bwa_index/ref_genome
    redirects:
        -t: 20

2. Using aln followed by samse/sampe:

bwa_aln_1:
    module: bwa_mapper
    base: trim1
    script_path: /path/to/bwa
    mod: aln
    qsub_params:
        -pe: shared 20
    ref_genome: /path/to/ref_genome.fna
    ref_index: /path/to/bwa_index/ref_genome
    redirects:
        -t: 20
bwa_samse_1:
    module: bwa_mapper
    base: bwa_aln_1
    script_path: /path/to/bwa
    mod: samse
    ref_genome: /path/to/ref_genome.fna
    ref_index: /path/to/bwa_index/ref_genome
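For orientation, the aln followed by samse/sampe route corresponds to two separate bwa invocations. The sketch below only assembles the command lines and does not run them; `bwa_aln_commands` is a hypothetical helper, the .sai and SAM file names are placeholders, and the exact commands the module writes into its scripts may differ.

```python
import shlex


def bwa_aln_commands(bwa, index, reads, threads=1, sam_out="out.sam"):
    """Build the two-stage 'bwa aln' -> 'bwa samse/sampe' command lines for one sample.

    reads: one fastq path (single-end) or two (paired-end).
    Returns (list_of_aln_commands, sam_command) as shell strings; nothing is executed.
    """
    if len(reads) not in (1, 2):
        raise ValueError("expected one or two fastq files")
    q = shlex.quote
    # Stage 1: one 'bwa aln' per fastq file, each writing a .sai file.
    sai = [path.rsplit(".", 1)[0] + ".sai" for path in reads]
    aln = [f"{q(bwa)} aln -t {threads} {q(index)} {q(fq)} > {q(out)}"
           for fq, out in zip(reads, sai)]
    # Stage 2: samse (single-end) or sampe (paired-end) takes the .sai files, then the fastq files.
    mode = "samse" if len(reads) == 1 else "sampe"
    sam = " ".join([q(bwa), mode, q(index), *map(q, sai), *map(q, reads)]) + f" > {q(sam_out)}"
    return aln, sam


aln, sam = bwa_aln_commands("bwa", "/path/to/bwa_index/ref_genome",
                            ["s1_R1.fq", "s1_R2.fq"], threads=20)
print("\n".join(aln + [sam]))
```

This is why the samse/sampe instance takes the aln instance as its base: it needs the sai files that the first stage produced.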

For project bwa index:

bwa_1:
    module: bwa_mapper
    base: bwa_bld_ind
    script_path: /path/to/bwa
    mod: mem
    scope: project

References

Li, H. and Durbin, R., 2009. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics, 25(14), pp.1754-1760.

`STAR_mapper`

Authors: Menachem Sklarz
Affiliation: Bioinformatics core facility
Organization: National Institute of Biotechnology in the Negev, Ben Gurion University.

A module for running the STAR mapper:

Requires

fastq files in one of the following slots:
- sample_data[<sample>]["fastq.F"]
- sample_data[<sample>]["fastq.R"]
- sample_data[<sample>]["fastq.S"]
If scope is set (the instance must then come after a STAR_builder instance, which populates the required slots):
- STAR index directories in:
  sample_data[<sample>]["STAR.index"] if scope = “sample”
  
  sample_data["STAR.index"] if scope = “project”
- Reference fasta files in:
  sample_data[<sample>]["STAR.fasta"] if scope = “sample”
  
  sample_data["STAR.fasta"] if scope = “project”

Output

Puts output sam files in the following slots:
- self.sample_data[<sample>]["sam"]
Alternatively, if --outSAMtype is set to BAM, puts output BAM files in the following slots:
- self.sample_data[<sample>]["bam"]
- self.sample_data[<sample>]["bam_unsorted"]
High confidence collapsed splice junctions (SJ.out.tab file) will be stored in:
- self.sample_data[<sample>]["SJ.out.tab"]
If --quantMode contains TranscriptomeSAM, the BAM of alignments translated into transcript coordinates will be stored in:
- self.sample_data[<sample>]["TranscriptomeSAM"]
If --quantMode contains GeneCounts, the ReadsPerGene.out.tab file will be stored in:
- self.sample_data[<sample>]["GeneCounts"]
If --outWigType is set, will store outputs in:
- if --outWigType is wiggle
  self.sample_data[<sample>]["wig2_UniqueMultiple"]
  
  self.sample_data[<sample>]["wig2_Unique"]
  
  self.sample_data[<sample>]["wig1_UniqueMultiple"]
  
  self.sample_data[<sample>]["wig1_Unique"]
  
  self.sample_data[<sample>]["wig"]
- if --outWigType is bedGraph
  self.sample_data[<sample>]["bdg2_UniqueMultiple"]
  
  self.sample_data[<sample>]["bdg2_Unique"]
  
  self.sample_data[<sample>]["bdg1_UniqueMultiple"]
  
  self.sample_data[<sample>]["bdg1_Unique"]
  
  self.sample_data[<sample>]["bdg"]
Puts the name of the mapper in:
- self.sample_data[<sample>]["mapper"]
Puts the fasta of the reference genome (if one is given in the param file) in:
- self.sample_data[<sample>]["reference"]
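Which slots get filled depends only on a few redirected STAR parameters, so the list above can be restated as a function. This is a hypothetical sketch (`star_output_slots` is not part of the module); it assumes, as the list states, that any BAM `--outSAMtype` fills both `bam` and `bam_unsorted`, and it keys `--outWigType` on its first word.

```python
def star_output_slots(redirects, ref_genome=None):
    """List the sample_data slots a STAR_mapper instance fills, following the list above."""

    def words(flag, default=""):
        # A flag given with no value in YAML parses as None; treat it like the default.
        return str(redirects.get(flag) or default).split()

    slots = ["mapper", "SJ.out.tab"]
    # --outSAMtype takes words such as "SAM" or "BAM Unsorted SortedByCoordinate"; STAR's default is SAM.
    if words("--outSAMtype", "SAM")[0] == "BAM":
        slots += ["bam", "bam_unsorted"]
    else:
        slots.append("sam")
    quant_mode = words("--quantMode")
    slots += [mode for mode in ("TranscriptomeSAM", "GeneCounts") if mode in quant_mode]
    wig_type = words("--outWigType")
    prefix = {"wiggle": "wig", "bedGraph": "bdg"}.get(wig_type[0]) if wig_type else None
    if prefix:
        # Same order as the slot list above.
        slots += [f"{prefix}{n}_{kind}" for n in (2, 1) for kind in ("UniqueMultiple", "Unique")]
        slots.append(prefix)
    if ref_genome:
        slots.append("reference")
    return slots


print(star_output_slots({"--outSAMtype": "BAM SortedByCoordinate", "--quantMode": "GeneCounts"}))
```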

Parameters that can be set

Parameter	Values	Comments
ref_genome	path to genome fasta
scope	project \| sample	The scope from which to take the genome directory

Note

You can set the RG attribute of the resulting SAM/BAM files with the redirected parameter --outSAMattrRGline. This will set the equivalent STAR parameter.

By default, the parameter will be set to include ID and SM tags, both set to the sample name. You can set the SM tag, but any ID tags will be removed and replaced with the sample name.
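A minimal sketch of the RG-line rule in this note, assuming a single read group whose TAG:value fields are separated by spaces and contain no spaces themselves. STAR also accepts quoted fields and several read groups separated by " , ", which this sketch ignores. `build_rg_line` is a hypothetical name, and keeping the user's other fields (such as PL) is an assumption about the module's behaviour.

```python
def build_rg_line(sample, user_value=None):
    """Return an --outSAMattrRGline value whose ID is always the sample name.

    Assumes a single read group with space-separated TAG:value fields.
    """
    fields = user_value.split() if user_value else []
    # Any user-supplied ID is dropped; STAR expects the line to start with ID:.
    kept = [field for field in fields if not field.startswith("ID:")]
    # SM defaults to the sample name unless the user provided one.
    if not any(field.startswith("SM:") for field in kept):
        kept.append(f"SM:{sample}")
    return " ".join([f"ID:{sample}"] + kept)


print(build_rg_line("sample1"))
print(build_rg_line("sample1", "ID:lane3 SM:patient7 PL:ILLUMINA"))
```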

Lines for parameter file

For external index:

STAR_map:
    module:             STAR_mapper
    base:               STAR_bld_ind
    script_path:        /path/to/STAR
    redirects:
        --readMapNumber:    1000
        --genomeDir:        /path/to/genome/STAR_index/

For project STAR index:

STAR_map:
    module:             STAR_mapper
    base:               STAR_bld_ind
    script_path:        /path/to/STAR
    scope:              project
    redirects:
        --readMapNumber:    1000

References

Dobin, A., Davis, C.A., Schlesinger, F., Drenkow, J., Zaleski, C., Jha, S., Batut, P., Chaisson, M. and Gingeras, T.R., 2013. STAR: ultrafast universal RNA-seq aligner. Bioinformatics, 29(1), pp.15-21.

`STAR_builder`

Authors: Menachem Sklarz
Affiliation: Bioinformatics core facility
Organization: National Institute of Biotechnology in the Negev, Ben Gurion University.

A module for running STAR genome index construction:

Requires

fasta files in one of the following slots:
- sample_data["fasta.nucl"]
- sample_data[<sample>]["fasta.nucl"]
If --sjdbGTFfile is set in redirects, but left empty, will expect to find a GTF file here:
- sample_data["gtf"] if scope = “project”
- sample_data[<sample>]["gtf"] if scope = “sample”
If --sjdbFileChrStartEnd is set in redirects, but left empty, will expect to find an SJ file here:
- sample_data["SJ.out.tab"] if scope = “project”
- sample_data[<sample>]["SJ.out.tab"] if scope = “sample”
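The "set but left empty" convention above can be sketched as a lookup that fills empty flags from the slots of the chosen scope. `fill_empty_redirects` and the flat `project_slots`/`sample_slots` dicts are hypothetical; the sketch relies on the fact that a YAML key written with no value parses as None.

```python
SLOT_FOR_EMPTY_FLAG = {"--sjdbGTFfile": "gtf", "--sjdbFileChrStartEnd": "SJ.out.tab"}


def fill_empty_redirects(redirects, scope, project_slots, sample_slots):
    """Return a copy of redirects with empty annotation flags filled from the scope's slots."""
    if scope not in ("project", "sample"):
        raise ValueError(f"scope must be 'project' or 'sample', got {scope!r}")
    source = project_slots if scope == "project" else sample_slots
    filled = dict(redirects)
    for flag, slot in SLOT_FOR_EMPTY_FLAG.items():
        # A flag written with no value in YAML parses as None.
        if flag in filled and filled[flag] in (None, ""):
            if slot not in source:
                raise KeyError(f"{flag} is empty but no {slot!r} slot exists at {scope} scope")
            filled[flag] = source[slot]
    return filled


redirects = {"--sjdbGTFfile": None, "--genomeSAindexNbases": 12}
print(fill_empty_redirects(redirects, "project", {"gtf": "/proj/annotation.gtf"}, {}))
```

Flags that already carry a path are passed through unchanged.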

Output

Puts output index files in one of the following slots:
- self.sample_data[<sample>]["STAR.index"]
- self.sample_data["project_data"]["STAR.index"]
Puts the reference fasta file in one of the following slots:
- self.sample_data[<sample>]["STAR.fasta"]
- self.sample_data["project_data"]["STAR.fasta"]

Parameters that can be set

Parameter	Values	Comments
scope	project \| sample	Not used

Lines for parameter file

STAR_bld_ind:
    module:             STAR_builder
    base:               trinity1
    script_path:        /path/to/STAR
    scope:              project
    qsub_params:
        queue:          star.q
    redirects:
        --genomeSAindexNbases:  12
        --genomeChrBinNbits:    10

References

Dobin, A., Davis, C.A., Schlesinger, F., Drenkow, J., Zaleski, C., Jha, S., Batut, P., Chaisson, M. and Gingeras, T.R., 2013. STAR: ultrafast universal RNA-seq aligner. Bioinformatics, 29(1), pp.15-21.

`STAR_LoadRemoveGenome`

Authors: Menachem Sklarz
Affiliation: Bioinformatics core facility
Organization: National Institute of Biotechnology in the Negev, Ben Gurion University.

A module for loading a STAR genome into RAM for use by subsequent STAR mapping jobs.

Note

This module saves memory and time. Set the --genomeLoad parameter of the STAR mapping instance to LoadAndKeep. This loads the genome into memory once and reuses it for all instances executed on the same node. When all mapping jobs are completed, the scripts produced by this instance remove the genome from RAM on all nodes used.

Tip

Make sure you set the node parameter in qsub_params to all the nodes in use by the base STAR_mapper instance.

Attention

Currently defined for project-scope or external genomes only. Not used for sample-scope genomes.

Note

Loading the genome in advance is not strictly required: otherwise, it will be loaded by the first STAR instance.

Requires

A STAR genome in:
- sample_data["STAR.index"]

Alternatively, a STAR genome index can be passed with the --genomeDir parameter.

Output

No output is created

Parameters that can be set

Parameter	Values	Comments
genome	load\|remove	Load or remove genome from RAM
qsub_params:node		Nodes on which to load/unload genome
scope	project \| sample	The scope from which to take the genome directory. Currently not in use
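Because STAR's shared genome memory is local to each node, the load or remove step has to run once per node listed in qsub_params:node. The sketch below only builds the per-node command strings. It assumes that genome: load and genome: remove correspond to STAR's --genomeLoad LoadAndExit and --genomeLoad Remove; those are real STAR values, but the mapping to this module's parameter is an assumption, and `genome_load_commands` is a hypothetical helper.

```python
import shlex

# Assumed mapping from this module's "genome" parameter to STAR --genomeLoad values.
GENOME_LOAD = {"load": "LoadAndExit", "remove": "Remove"}


def genome_load_commands(star, genome_dir, action, nodes):
    """Return {node: command} that loads or removes a shared STAR genome on each node."""
    if action not in GENOME_LOAD:
        raise ValueError(f"genome must be 'load' or 'remove', got {action!r}")
    cmd = " ".join([shlex.quote(star), "--genomeLoad", GENOME_LOAD[action],
                    "--genomeDir", shlex.quote(genome_dir)])
    # Shared memory is local to a node, so the command has to run once on every node.
    return {node: cmd for node in nodes}


for node, cmd in genome_load_commands("STAR", "/path/to/STAR/genome_directory",
                                      "remove", ["node01", "node02"]).items():
    print(node, cmd)
```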

Lines for parameter file

For external index:

STAR_remove_genome:
    module:             STAR_LoadRemoveGenome
    base:               STAR_map
    script_path:        '{Vars.paths.STAR}STAR'
    genome:             remove
    qsub_params:
        queue:          queue.q
        node:           {Vars.nodes}
    redirects:
        --genomeDir:    /path/to/STAR/genome_directory

For project STAR index:

STAR_remove_genome:
    module:             STAR_LoadRemoveGenome
    base:               STAR_map
    script_path:        '{Vars.paths.STAR}STAR'
    genome:             remove
    qsub_params:
        queue:          queue.q
        node:           {Vars.nodes}

References

Dobin, A., Davis, C.A., Schlesinger, F., Drenkow, J., Zaleski, C., Jha, S., Batut, P., Chaisson, M. and Gingeras, T.R., 2013. STAR: ultrafast universal RNA-seq aligner. Bioinformatics, 29(1), pp.15-21.

`Multiqc` ^*

Authors: Menachem Sklarz
Affiliation: Bioinformatics core facility
Organization: National Institute of Biotechnology in the Negev, Ben Gurion University.

A module for preparing a MultiQC report for all samples.

Tip

By default, the module will search for parsable reports in the directories of all the modules in the branch leading to this instance. To search only in the directories of the explicit base steps, specify the bases_only parameter.

Requires

No strict requirements. The report will contain information only if at least one of the base steps produces reports that MultiQC can parse, e.g. fastqc, bowtie2, samtools, etc.

Output

Puts the report directory in the following slot:
- self.sample_data[<sample>]["Multiqc_report"]

Parameters that can be set

Parameter	Values	Comments
bases_only		Search directories of explicit base steps only.
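The difference between the default search and bases_only amounts to "all upstream steps" versus "direct bases only". A minimal sketch, assuming the pipeline is represented as a hypothetical {step: [bases]} dict; `dirs_to_scan` is an invented helper, and the step names mirror the parameter-file example that follows.

```python
def dirs_to_scan(step, bases, step_dirs, bases_only=False):
    """Return the sorted output directories MultiQC would search for one instance.

    bases: {step_name: [names of its base steps]}  (hypothetical representation)
    step_dirs: {step_name: output directory of that step}
    """
    if bases_only:
        names = set(bases[step])
    else:
        # Walk every step upstream of this instance, visiting each step once.
        names, stack = set(), list(bases[step])
        while stack:
            name = stack.pop()
            if name not in names:
                names.add(name)
                stack.extend(bases.get(name, []))
    return sorted(step_dirs[name] for name in names)


bases = {
    "firstMultQC": ["sam_bwt2_1", "fqc_trim1"],
    "sam_bwt2_1": ["bwt2_1"],
    "bwt2_1": ["trim1"],
    "fqc_trim1": ["trim1"],
    "trim1": ["merge1"],
    "merge1": [],
}
step_dirs = {name: f"/project/data/{name}" for name in bases}
print(dirs_to_scan("firstMultQC", bases, step_dirs, bases_only=True))
print(dirs_to_scan("firstMultQC", bases, step_dirs))
```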

Lines for parameter file

firstMultQC:
    module: Multiqc
    base:
        - sam_bwt2_1
        - fqc_trim1
    bases_only:
    script_path: /path/to/multiqc

References

Ewels, P., Magnusson, M., Lundin, S. and Käller, M., 2016. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics, 32(19), pp.3047-3048.

`RSEM`

Authors: Liron Levin
Affiliation: Bioinformatics core facility
Organization: National Institute of Biotechnology in the Negev, Ben Gurion University.

Short Description

A module for running RSEM

Requires

fastq files in the following slots:
- self.sample_data[sample]["fastq.F"]
- self.sample_data[sample]["fastq.R"]
- self.sample_data[sample]["fastq.S"]

or a bam file in:
- self.sample_data[sample]["bam"]

Output

Puts output bam files (if the input is fastq) in:
- self.sample_data[sample]["bam"]

Puts the location of RSEM results in:
- self.sample_data[sample]["RSEM"]
- self.sample_data[sample]["genes.results"]
- self.sample_data[sample]["isoforms.results"]

Parameters that can be set

Parameter	Values	Comments
mode	transcriptome/genome	Whether the reference is a genome or a transcriptome.
gff3	None	Use if the mode is genome and the annotation file is in gff3 format

Comments

This module was tested on:
RSEM v1.2.25, bowtie2 v2.2.6

Lines for parameter file

Step_Name:                                                   # Name of this step
    module: RSEM                                             # Name of the module used
    base:                                                    # Name of the step [or list of names] to run after [must be after a bam file generator step or merge with fastq files]
    script_path:                                             # Command for running the RSEM script 
    qsub_params:
        -pe:                                                 # Number of CPUs to reserve for this analysis
    mode:                                                    # transcriptome or genome
    export_transcriptome:                                    # In genome mode, set the extracted transcriptome as the new project level fasta.nucl and extract the transcript-to-gene-map file as project level gene_trans_map
    annotation:                                              # For Genome mode: the location of the GTF file [the default], for GFF3 use the gff3 flag. For Transcriptome mode: transcript-to-gene-map file.
                                                             # If annotation is set to Trinity the transcript-to-gene-map file will be generated using the from_Trinity_to_gene_map script
                                                             # If not set will use only the reference file as unrelated transcripts
    from_Trinity_to_gene_map_script_path:                    # If the mode is transcriptome and the reference was assembled using Trinity it is possible to generate the transcript-to-gene-map file automatically using this script
                                                             # If annotation is set to Trinity and this line is empty or missing it will try using the module's associated script
    gff3:                                                    # Use if the mode is genome and the annotation file is in gff3 format
    mapper:                                                  # bowtie/bowtie2/star 
    mapper_path:                                             # Location of mapper script
    rsem_prepare_reference_script_path:                      # Location of preparing reference script
    plot_stat:                                               # Generate statistical plots
    plot_stat_script_path:                                   # Location of statistical plot generating script
    reference:                                               # The reference genome/transcriptome location [FASTA file]
    rsem_generate_data_matrix_script_path:                   # Location of the final matrix generating script
                                                             # If this line is empty or missing it will try using the module's associated script
    redirects:
        --append-names:                                      # RSEM will append gene_name/transcript_name to the result files
        --estimate-rspd:                                     # Enables RSEM to learn from the data how the reads are distributed across a transcript
        -p:                                                  # Number of CPUs to use in this analysis
        --bam:                                               # Will use bam files and not fastq
        --no-bam-output:
        --output-genome-bam:                                 # Alignments in genomic coordinates (only if mode is genome)
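The meaning of annotation depends on mode and on the gff3 flag. The sketch below maps these settings to rsem-prepare-reference options (--gtf, --gff3 and --transcript-to-gene-map are real RSEM flags). `rsem_annotation_args` is a hypothetical helper, and the special value Trinity, which triggers the from_Trinity_to_gene_map script, is not modelled.

```python
def rsem_annotation_args(mode, annotation=None, gff3=False):
    """Return the rsem-prepare-reference annotation arguments for one RSEM instance."""
    if mode not in ("genome", "transcriptome"):
        raise ValueError(f"mode must be 'genome' or 'transcriptome', got {mode!r}")
    if not annotation:
        # No annotation: every reference sequence is treated as an unrelated transcript.
        return []
    if mode == "genome":
        # Genome mode: the annotation is a GTF file unless the gff3 flag is set.
        return ["--gff3" if gff3 else "--gtf", annotation]
    # Transcriptome mode: the annotation is a transcript-to-gene map.
    return ["--transcript-to-gene-map", annotation]


print(rsem_annotation_args("genome", "/path/to/annotation.gtf"))
print(rsem_annotation_args("transcriptome", "/path/to/gene_trans_map.txt"))
```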

References

Li, B. and Dewey, C.N., 2011. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics, 12(1), p.323.

`htseq_count`

Authors: Menachem Sklarz
Affiliation: Bioinformatics core facility
Organization: National Institute of Biotechnology in the Negev, Ben Gurion University.

A module for running htseq-count:

See the htseq-count documentation.

Requires

sam or bam files in one of the following slots:
- sample_data[<sample>]["bam"]
- sample_data[<sample>]["sam"]

Output

Puts the output file in:
- self.sample_data[<sample>]["HTSeq.counts"]

Parameters that can be set

Parameter	Values	Comments
gtf	path to annotation file	The GTF annotation file passed to htseq-count.
-f\|--format	sam \| bam	In redirects. Tells htseq-count which file to use. If not specified, will use whichever file exists.
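The file-selection rule for --format can be sketched as follows. `choose_alignment` is a hypothetical helper; when no format is given and both slots exist, the documentation does not say which one wins, so the bam-first order here is an assumption.

```python
def choose_alignment(sample_slots, redirects):
    """Return (format, path) of the alignment file htseq-count should read."""
    fmt = redirects.get("-f") or redirects.get("--format")
    if fmt:
        if fmt not in ("sam", "bam"):
            raise ValueError(f"format must be 'sam' or 'bam', got {fmt!r}")
        return fmt, sample_slots[fmt]
    # No explicit format: use whichever slot exists (this sketch checks bam first).
    for fmt in ("bam", "sam"):
        if fmt in sample_slots:
            return fmt, sample_slots[fmt]
    raise KeyError("sample has neither a 'bam' nor a 'sam' slot")


print(choose_alignment({"sam": "/data/s1.sam"}, {}))
```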

Lines for parameter file

htseq_c1:
    module:         htseq_count
    base:           samtools_STAR1
    script_path:    /storage16/app/bioinfo/python_packages/bin/htseq-count
    gtf:            /fastspace/bioinfo_databases/STAR_GRCh38_Gencode21/gencode.v21.annotation.gtf
    redirects:
        --format:   bam
        -s:         'no'
        -m:         intersection-nonempty

References

Anders, S., Pyl, P.T. and Huber, W., 2015. HTSeq—a Python framework to work with high-throughput sequencing data. Bioinformatics, 31(2), pp.166-169.

`RSEM_prep`

Authors: Menachem Sklarz
Affiliation: Bioinformatics core facility
Organization: National Institute of Biotechnology in the Negev, Ben Gurion University.

A module for running rsem-prepare-reference:

Requires

fasta files in one of the following slots:
- sample_data["fasta.nucl"] (scope = project)
- sample_data[<sample>]["fasta.nucl"] (scope = sample)
If neither exists, please supply the reference parameter.

Attention

If type “gene_trans_map” exists, its value will be used for “--transcript-to-gene-map”, unless “--transcript-to-gene-map” is explicitly passed in redirects!
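The precedence rule in this warning is simple enough to state as code. `transcript_map_args` is a hypothetical helper, and the flat `slots` dict stands in for the relevant level of sample_data.

```python
FLAG = "--transcript-to-gene-map"


def transcript_map_args(redirects, slots):
    """Return the --transcript-to-gene-map arguments for rsem-prepare-reference."""
    if FLAG in redirects:
        # An explicit redirect always wins over the gene_trans_map slot.
        value = redirects[FLAG]
    else:
        value = slots.get("gene_trans_map")
    return [FLAG, value] if value else []


print(transcript_map_args({}, {"gene_trans_map": "/proj/gene_trans_map.txt"}))
```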

Output

Puts output index files in one of the following slots:
- self.sample_data[<sample>]["RSEM.index"]
- self.sample_data["project_data"]["RSEM.index"]

Parameters that can be set

Parameter	Values	Comments
scope	project \| sample	Where to take the reference from
reference	path to reference	Use this fasta file. See the definition for reference_fasta_file(s) in the ARGUMENTS section of rsem-prepare-reference help

Lines for parameter file

RSEM_prep_ind:
    module:             RSEM_prep
    base:               merge1
    script_path:        /path/to/RSEM
    reference:              /path/to/fasta
    redirects:
        --gtf:          /path/to/gtf
        --transcript-to-gene-map: /path/to/map_file

References

`RSEM_mapper`

Authors: Menachem Sklarz
Affiliation: Bioinformatics core facility
Organization: National Institute of Biotechnology in the Negev, Ben Gurion University.

A module for running rsem-calculate-expression:

Requires

fasta files in one of the following slots:
- sample_data["project_data"]["fasta.nucl"] (scope = project)
- sample_data[<sample>]["fasta.nucl"] (scope = sample)
If neither exists, please supply the reference parameter.

Output

Puts the output count files in the following slots:
- self.sample_data[<sample>]["genes.counts"]
- self.sample_data[<sample>]["isoforms.counts"]

And the following BAMs, depending on the redirected parameters:
- self.sample_data[<sample>]["genome.unsorted.bam"]
- self.sample_data[<sample>]["genome.bam"]
- self.sample_data[<sample>]["transcript.unsorted.bam"]
- self.sample_data[<sample>]["transcript.bam"]

Parameters that can be set

Parameter	Values	Comments
scope	project \| sample	The scope of the RSEM index. Must match the scope in the RSEM_prep instance.
result2use	genes \| isoforms	Summarize counts at the gene or isoform level.
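To illustrate what summarizing at the gene or isoform level means, the sketch below collects the expected_count column from per-sample RSEM results tables into a feature-by-sample matrix. It assumes the standard RSEM column names (gene_id in genes.results, transcript_id in isoforms.results, expected_count in both). `counts_matrix` is a hypothetical helper; the module itself may instead rely on rsem-generate-data-matrix.

```python
import csv
import io


def counts_matrix(results_by_sample, result2use="genes"):
    """Collect expected_count per feature and sample from RSEM *.results tables.

    results_by_sample: {sample: text of that sample's genes.results or isoforms.results}
    Returns {feature_id: {sample: expected_count}}.
    """
    if result2use not in ("genes", "isoforms"):
        raise ValueError(f"result2use must be 'genes' or 'isoforms', got {result2use!r}")
    id_column = "gene_id" if result2use == "genes" else "transcript_id"
    matrix = {}
    for sample, text in results_by_sample.items():
        # RSEM results files are tab-separated with a header line.
        for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
            matrix.setdefault(row[id_column], {})[sample] = float(row["expected_count"])
    return matrix
```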

Lines for parameter file

Mapping fastq files:

RSEM_map:
    module:             RSEM_mapper
    base:               merge1
    script_path:        {Vars.paths.RSEM.rsem-calculate-expression}
    reference:              /path/to/fasta
    redirects:
        --gtf:          /path/to/gtf
        --transcript-to-gene-map: /path/to/map_file

Parsing an existing BAM alignment file:

RSEM_parse_bam:
    module:         RSEM_mapper
    base:           mv_transcript_bam_to_bam
    script_path:    {Vars.paths.RSEM.rsem-calculate-expression}
    scope:          project
    qsub_params:
        -pe:        shared 20
    redirects:
        --num-threads:  20