Mapping
Modules included in this section
bowtie2_builder
*
- Authors
Menachem Sklarz
- Affiliation
Bioinformatics core facility
- Organization
National Institute of Biotechnology in the Negev, Ben Gurion University.
A module for running bowtie2 index builder:
Builds a bowtie2 index for a fasta file stored at the project or sample level.
Select which one is used by setting scope to either project or sample.
Requires
fasta files in one of the following slots:
sample_data[<sample>]["fasta.nucl"]
sample_data["fasta.nucl"]
Output
- Puts output index files in one of the following slots:
self.sample_data[<sample>]["bowtie2.index"]
self.sample_data["project_data"]["bowtie2.index"]
- Puts the fasta file in the following slot:
self.sample_data[<sample>]["reference"]
Parameters that can be set
Parameter | Values | Comments
---|---|---
scope | project \| sample | Indicates whether to use a project fasta or a sample fasta.
Lines for parameter file
bwt2_build:
    module: bowtie2_builder
    base: trinity1
    script_path: /path/to/bowtie2-build
    scope: project
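For comparison, a sample-scope instance might look like the sketch below. It only changes the documented scope parameter. The instance name bwt2_build_sample and the base step spades1 (assumed here to be a step that leaves a fasta.nucl file in each sample) are hypothetical.

```yaml
# Sketch only: instance and base names are placeholders.
bwt2_build_sample:
    module: bowtie2_builder
    base: spades1            # assumed to populate sample_data[<sample>]["fasta.nucl"]
    script_path: /path/to/bowtie2-build
    scope: sample            # build one index per sample
```

With scope set to sample, the index is expected in the sample-level bowtie2.index slot listed under Output.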
References
Langmead, B. and Salzberg, S.L., 2012. Fast gapped-read alignment with Bowtie 2. Nature methods, 9(4), pp.357-359.
bowtie2_mapper
*
- Authors
Menachem Sklarz
- Affiliation
Bioinformatics core facility
- Organization
National Institute of Biotechnology in the Negev, Ben Gurion University.
A module for running bowtie2 mapper:
The reads stored in each sample are aligned to one of the following bowtie2 indices:
- An external index passed with the -x parameter.
- A bowtie2 index on a project fasta file, such as an assembly from all samples. Specify with bowtie2_mapper:scope project
- A sample bowtie2 index on a sample-specific fasta file, such as from a sample-wise assembly or from the sample file. Specify with bowtie2_mapper:scope sample
The latter two options must come after a bowtie2_builder instance.
Tip
See the documentation for the bowtie2_builder module.
Note
fastq files are never defined project-wide
The scope parameter controls the origin of the index files, i.e. whether the fasta file to map to is an assembly of the sample reads (scope: sample) or an assembly of all reads in the project (scope: project). The reads to be mapped are always sample reads, as a ‘fastq’ slot is not defined at the project level.
Requires
fastq files in one of the following slots:
sample_data[<sample>]["fastq.F"]
sample_data[<sample>]["fastq.R"]
sample_data[<sample>]["fastq.S"]
Output
- Puts output sam files in the following slots:
self.sample_data[<sample>]["sam"]
- Puts the name of the mapper in:
self.sample_data[<sample>]["mapper"]
- Puts the fasta of the reference genome (if one is given in the parameter file) in:
self.sample_data[<sample>]["reference"]
Parameters that can be set
Parameter | Values | Comments
---|---|---
-x | path to bowtie2 index | If not given, will look for a project bowtie2 index and then for a sample bowtie2 index
ref_genome | path to genome fasta | If -x is NOT given, will use the equivalent internal fasta. If -x is passed, and ref_genome is NOT passed, will leave the reference slot empty
get_map_log | Store the log produced by bowtie2 (This is bowtie2 standard output) |
scope | project \| sample | Indicates whether to use a project or sample bowtie2 index.
Lines for parameter file
For external index:
bwt2_1:
    module: bowtie2_mapper
    base: trim1
    script_path: /path/to/bowtie2
    qsub_params:
        -pe: shared 20
    get_map_log:
    ref_genome: /path/to/ref_genome.fna
    redirects:
        -p: 20
        -q: null
        -x: /path/to/bowtie2.index/ref_genome
Using a bowtie2 index constructed from a project fasta:
bwt2_1:
    module: bowtie2_mapper
    base: bwt2_bld1
    script_path: /path/to/bowtie2
    qsub_params:
        -pe: shared 20
    get_map_log:
    scope: project
    redirects:
        -p: 20
        -q: null
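The third option described above, mapping against a sample-specific index, has no example here. A minimal sketch, assuming a preceding bowtie2_builder instance run with scope: sample (the name bwt2_build_sample is hypothetical), could be:

```yaml
# Sketch only: bwt2_build_sample is a placeholder for a
# bowtie2_builder instance that was run with scope: sample.
bwt2_sample:
    module: bowtie2_mapper
    base: bwt2_build_sample
    script_path: /path/to/bowtie2
    qsub_params:
        -pe: shared 20
    get_map_log:
    scope: sample            # use each sample's own bowtie2 index
    redirects:
        -p: 20
        -q: null
```

No -x is passed, so the index is taken from the slot populated by the builder step.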
References
Langmead, B. and Salzberg, S.L., 2012. Fast gapped-read alignment with Bowtie 2. Nature methods, 9(4), pp.357-359.
bowtie1_builder
*
- Authors
Menachem Sklarz
- Affiliation
Bioinformatics core facility
- Organization
National Institute of Biotechnology in the Negev, Ben Gurion University.
A module for running bowtie1 index builder:
Requires
fasta files in one of the following slots:
sample_data["fasta.nucl"]
sample_data[<sample>]["fasta.nucl"]
Output
- Puts output index files in one of the following slots:
self.sample_data[<sample>]["bowtie1.index"]
self.sample_data["project_data"]["bowtie1.index"]
Parameters that can be set
Parameter | Values | Comments
---|---|---
scope | project \| sample | Indicates whether to use a project fasta or a sample fasta.
Lines for parameter file
bwt1_bld_ind:
    module: bowtie1_builder
    base: trinity1
    script_path: /path/to/bowtie-build
    scope: project
References
Langmead, B., Trapnell, C., Pop, M. and Salzberg, S.L., 2009. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome biology, 10(3), p.R25.
bowtie1_mapper
*
- Authors
Menachem Sklarz
- Affiliation
Bioinformatics core facility
- Organization
National Institute of Biotechnology in the Negev, Ben Gurion University.
A module for running bowtie1 mapper:
The reads stored in each sample are aligned to one of the following bowtie indices:
- An external index passed with the ebwt parameter.
- A bowtie index on a project fasta file, such as an assembly from all samples. Specify with bowtie1_mapper:scope project
- A sample bowtie1 index on a sample-specific fasta file, such as from a sample-wise assembly or from the sample file. Specify with bowtie1_mapper:scope sample
The latter two options must come after a bowtie1_builder instance.
Requires
fastq files in one of the following slots:
sample_data[<sample>]["fastq.F"]
sample_data[<sample>]["fastq.R"]
sample_data[<sample>]["fastq.S"]
Output
- Puts output sam files in the following slots:
self.sample_data[<sample>]["sam"]
- Puts the name of the mapper in:
self.sample_data[<sample>]["mapper"]
- Puts fasta of reference genome (if one is given in param file) in:
self.sample_data[<sample>]["reference"]
Parameters that can be set
Parameter | Values | Comments
---|---|---
ebwt | path to bowtie1 index | If not given, will look for a project bowtie1 index and then for a sample bowtie1 index
ref_genome | path to genome fasta | If ebwt is NOT given, will use the equivalent internal fasta. If ebwt IS given, and ref_genome is NOT passed, will leave the reference slot empty.
scope | project \| sample | Indicates whether to use a project or sample bowtie1 index.
Lines for parameter file
For external index:
bwt1:
    module: bowtie1_mapper
    base: trim1
    script_path: /path/to/bowtie
    qsub_params:
        -pe: shared 20
    ebwt: /path/to/bowtie1.index/ref_genome
    ref_genome: /path/to/ref_genome.fna
    redirects:
        -p: 20
For project bowtie index:
bwt1_1:
    module: bowtie1_mapper
    base: bwt1_bld_ind
    script_path: /path/to/bowtie
    scope: project
References
Langmead, B., Trapnell, C., Pop, M. and Salzberg, S.L., 2009. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome biology, 10(3), p.R25.
bwa_builder
*
- Authors
Menachem Sklarz
- Affiliation
Bioinformatics core facility
- Organization
National Institute of Biotechnology in the Negev, Ben Gurion University.
A module for running bwa index builder:
Builds a bwa index for a fasta file stored at the project or sample level.
Select which one is used by setting scope to either project or sample.
Requires
fasta files in one of the following slots:
sample_data[<sample>]["fasta.nucl"]
sample_data["fasta.nucl"]
Output
- Puts output index files in one of the following slots:
self.sample_data[<sample>]["bwa_index"]
self.sample_data["project_data"]["bwa_index"]
- Puts the fasta file in the following slot:
self.sample_data[<sample>]["reference"]
Parameters that can be set
Parameter | Values | Comments
---|---|---
scope | project \| sample | Indicates whether to use a project fasta or a sample fasta.
Lines for parameter file
bwa_bld_ind:
    module: bwa_builder
    base: spades1
    script_path: /path/to/bwa index
    scope: project
References
Li, H. and Durbin, R., 2009. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics, 25(14), pp.1754-1760.
bwa_mapper
*
- Authors
Menachem Sklarz
- Affiliation
Bioinformatics core facility
- Organization
National Institute of Biotechnology in the Negev, Ben Gurion University.
A module for running bwa mapper:
The reads stored in each sample are aligned to one of the following bwa indices:
- An external index passed with the ref_index parameter.
- A bwa index on a project fasta file, such as an assembly from all samples. Specify with bwa_mapper:scope project
- A sample bwa index on a sample-specific fasta file, such as from a sample-wise assembly or from the sample fasta file. Specify with bwa_mapper:scope sample
The latter two options must come after a bwa_builder instance.
Requires
fastq files in one of the following slots:
sample_data[<sample>]["fastq.F"]
sample_data[<sample>]["fastq.R"]
sample_data[<sample>]["fastq.S"]
- If mod is one of samse, sampe, the sai files are required as well (created by a bwa aln step): self.sample_data[<sample>]["saiF|saiR|saiS"]
Output
- Puts output sam files in the following slots:
    - If mod is one of mem, samse, sampe, bwasw: self.sample_data[<sample>]["sam"]
    - If mod is aln: self.sample_data[<sample>]["saiF|saiR|saiS"]
- Puts the name of the mapper in:
self.sample_data[<sample>]["mapper"]
- Puts the fasta of the reference genome (if one is given in the parameter file) in:
self.sample_data[<sample>]["reference"]
Parameters that can be set
Parameter | Values | Comments
---|---|---
ref_index | path to bwa index | If not given, will look for a project bwa index and then for a sample bwa index
ref_genome | path to genome fasta | If ref_index is NOT given, will use the equivalent internal fasta. If ref_index is passed, and ref_genome is NOT passed, will leave the reference slot empty
scope | project \| sample | Indicates whether to use a project or sample bwa index.
Lines for parameter file
For external index:
1. Using mem:
bwa_mem_1:
    module: bwa_mapper
    base: trim1
    script_path: /path/to/bwa
    mod: mem
    qsub_params:
        -pe: shared 20
    ref_genome: /path/to/ref_genome.fna
    ref_index: /path/to/bwa_index/ref_genome
    redirects:
        -t: 20
2. Using aln - samse/sampe:
bwa_aln_1:
    module: bwa_mapper
    base: trim1
    script_path: /path/to/bwa
    mod: aln
    qsub_params:
        -pe: shared 20
    ref_genome: /path/to/ref_genome.fna
    ref_index: /path/to/bwa_index/ref_genome
    redirects:
        -t: 20
bwa_samse_1:
    module: bwa_mapper
    base: bwa_aln_1
    script_path: /path/to/bwa
    mod: samse
    ref_genome: /path/to/ref_genome.fna
    ref_index: /path/to/bwa_index/ref_genome
For project bwa index:
bwa_1:
    module: bwa_mapper
    base: bwa_bld_ind
    script_path: /path/to/bwa
    mod: mem
    scope: project
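The aln - samse/sampe route is shown above only for samse. For paired-end reads, a sampe chain would presumably look the same, with the sampe step based on the aln step so that the sai slots are available. This is a sketch under that assumption; the instance names bwa_aln_pe and bwa_sampe_pe are hypothetical.

```yaml
# Sketch only: a two-step aln -> sampe chain for paired-end reads.
bwa_aln_pe:
    module: bwa_mapper
    base: trim1
    script_path: /path/to/bwa
    mod: aln
    ref_index: /path/to/bwa_index/ref_genome
    redirects:
        -t: 20
bwa_sampe_pe:
    module: bwa_mapper
    base: bwa_aln_pe          # assumed: the aln step supplies the saiF/saiR slots
    script_path: /path/to/bwa
    mod: sampe
    ref_genome: /path/to/ref_genome.fna
    ref_index: /path/to/bwa_index/ref_genome
```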
References
Li, H. and Durbin, R., 2009. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics, 25(14), pp.1754-1760.
STAR_mapper
- Authors
Menachem Sklarz
- Affiliation
Bioinformatics core facility
- Organization
National Institute of Biotechnology in the Negev, Ben Gurion University.
A module for running STAR mapper:
Requires
fastq files in one of the following slots:
sample_data[<sample>]["fastq.F"]
sample_data[<sample>]["fastq.R"]
sample_data[<sample>]["fastq.S"]
If scope is set (must come after a STAR_builder module, which populates the required slots):
- STAR index directories in:
    - sample_data[<sample>]["STAR.index"] if scope = “sample”
    - sample_data["STAR.index"] if scope = “project”
- Reference fasta files in:
    - sample_data[<sample>]["STAR.fasta"] if scope = “sample”
    - sample_data["STAR.fasta"] if scope = “project”
Output
- Puts output sam files in the following slot:
    - self.sample_data[<sample>]["sam"]
- Alternatively, if --outSAMtype is set to BAM, puts output BAM files in the following slots:
    - self.sample_data[<sample>]["bam"]
    - self.sample_data[<sample>]["bam_unsorted"]
- High confidence collapsed splice junctions (SJ.out.tab file) will be stored in:
    - self.sample_data[<sample>]["SJ.out.tab"]
- If --quantMode contains TranscriptomeSAM, the BAM of alignments translated into transcript coordinates will be stored in:
    - self.sample_data[<sample>]["TranscriptomeSAM"]
- If --quantMode contains GeneCounts, the ReadsPerGene.out.tab file will be stored in:
    - self.sample_data[<sample>]["GeneCounts"]
- If --outWigType is set, will store outputs in:
    - if --outWigType is wiggle:
        - self.sample_data[<sample>]["wig2_UniqueMultiple"]
        - self.sample_data[<sample>]["wig2_Unique"]
        - self.sample_data[<sample>]["wig1_UniqueMultiple"]
        - self.sample_data[<sample>]["wig1_Unique"]
        - self.sample_data[<sample>]["wig"]
    - if --outWigType is bedGraph:
        - self.sample_data[<sample>]["bdg2_UniqueMultiple"]
        - self.sample_data[<sample>]["bdg2_Unique"]
        - self.sample_data[<sample>]["bdg1_UniqueMultiple"]
        - self.sample_data[<sample>]["bdg1_Unique"]
        - self.sample_data[<sample>]["bdg"]
- Puts the name of the mapper in:
self.sample_data[<sample>]["mapper"]
- Puts fasta of reference genome (if one is given in param file) in:
self.sample_data[<sample>]["reference"]
Parameters that can be set
Parameter | Values | Comments
---|---|---
ref_genome | path to genome fasta |
scope | project \| sample | The scope from which to take the genome directory
Note
You can set the RG attribute of the resulting SAM/BAM files with the redirected parameter --outSAMattrRGline. This will set the equivalent STAR parameter.
By default, the parameter will be set to include ID and SM tags, both set to the sample name. You can set the SM tag, but any ID tags will be removed and replaced with the sample name.
Lines for parameter file
For external index:
STAR_map:
    module: STAR_mapper
    base: STAR_bld_ind
    script_path: /path/to/STAR
    redirects:
        --readMapNumber: 1000
        --genomeDir: /path/to/genome/STAR_index/
For project STAR index:
STAR_map:
    module: STAR_mapper
    base: STAR_bld_ind
    script_path: /path/to/STAR
    scope: project
    redirects:
        --readMapNumber: 1000
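The Output section describes slots that are only filled when --outSAMtype or --quantMode are redirected. A sketch of an instance that requests BAM output and gene counts might look as follows. The instance name STAR_map_bam is hypothetical, and --quantMode GeneCounts assumes the STAR index was built with annotations.

```yaml
# Sketch only: redirects are passed through to STAR as-is.
STAR_map_bam:
    module: STAR_mapper
    base: STAR_bld_ind
    script_path: /path/to/STAR
    scope: project
    redirects:
        --outSAMtype: BAM SortedByCoordinate   # output BAM instead of SAM
        --quantMode: GeneCounts                # also write ReadsPerGene.out.tab
```

With these settings, the BAM and GeneCounts slots listed under Output are the ones expected to be populated.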
References
Dobin, A., Davis, C.A., Schlesinger, F., Drenkow, J., Zaleski, C., Jha, S., Batut, P., Chaisson, M. and Gingeras, T.R., 2013. STAR: ultrafast universal RNA-seq aligner. Bioinformatics, 29(1), pp.15-21.
STAR_builder
- Authors
Menachem Sklarz
- Affiliation
Bioinformatics core facility
- Organization
National Institute of Biotechnology in the Negev, Ben Gurion University.
A module for running STAR genome index construction:
Requires
fasta files in one of the following slots:
sample_data["fasta.nucl"]
sample_data[<sample>]["fasta.nucl"]
If --sjdbGTFfile is set in redirects, but left empty, will expect to find a GTF file here:
- sample_data["gtf"] if scope = “project”
- sample_data[<sample>]["gtf"] if scope = “sample”
If --sjdbFileChrStartEnd is set in redirects, but left empty, will expect to find an SJ file here:
- sample_data["SJ.out.tab"] if scope = “project”
- sample_data[<sample>]["SJ.out.tab"] if scope = “sample”
Output
Puts output index files in one of the following slots:
self.sample_data[<sample>]["STAR.index"]
self.sample_data["project_data"]["STAR.index"]
Puts the reference fasta file in one of the following slots:
self.sample_data[<sample>]["STAR.fasta"]
self.sample_data["project_data"]["STAR.fasta"]
Parameters that can be set
Parameter | Values | Comments
---|---|---
scope | project \| sample | Not used
Lines for parameter file
STAR_bld_ind:
    module: STAR_builder
    base: trinity1
    script_path: /path/to/STAR
    scope: project
    qsub_params:
        queue: star.q
    redirects:
        --genomeSAindexNbases: 12
        --genomeChrBinNbits: 10
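The Requires section mentions that --sjdbGTFfile can be set in redirects but left empty, in which case the GTF file is taken from the gtf slot. A sketch of such an instance follows. The instance name STAR_bld_gtf and the base step import_gtf (assumed to populate the project gtf slot) are hypothetical.

```yaml
# Sketch only: import_gtf stands for any step that fills sample_data["gtf"].
STAR_bld_gtf:
    module: STAR_builder
    base: import_gtf
    script_path: /path/to/STAR
    scope: project
    redirects:
        --sjdbGTFfile:        # left empty: the GTF is looked up in the gtf slot
```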
References
Dobin, A., Davis, C.A., Schlesinger, F., Drenkow, J., Zaleski, C., Jha, S., Batut, P., Chaisson, M. and Gingeras, T.R., 2013. STAR: ultrafast universal RNA-seq aligner. Bioinformatics, 29(1), pp.15-21.
STAR_LoadRemoveGenome
- Authors
Menachem Sklarz
- Affiliation
Bioinformatics core facility
- Organization
National Institute of Biotechnology in the Negev, Ben Gurion University.
A module for loading a STAR genome into RAM for use by subsequent STAR mapping jobs.
Note
This module saves memory and time. Set parameter --genomeLoad in the STAR mapping instance to LoadAndKeep.
This will load the genome once into memory and use it repeatedly for all instances executed on the same node.
When all mapping jobs are completed, scripts produced by this instance will remove the genome from RAM for all nodes used.
Tip
Make sure you set the node parameter in qsub_params to all the nodes in use by the base STAR_mapper instance.
Attention
Currently defined for project-scope or external genomes only. Not used for sample-scope genomes.
Note
Loading a genome is not really required. It will be loaded by the first instance of STAR.
Requires
A STAR genome in:
sample_data["STAR.index"]
Alternatively, a STAR genome index can be passed with the --genomeDir parameter.
Output
No output is created
Parameters that can be set
Parameter | Values | Comments
---|---|---
genome | load \| remove | Load or remove genome from RAM
qsub_params:node | Nodes on which to load/unload genome |
scope | project \| sample | The scope from which to take the genome directory. Currently not in use
Lines for parameter file
For external index:
STAR_remove_genome:
    module: STAR_LoadRemoveGenome
    base: STAR_map
    script_path: '{Vars.paths.STAR}STAR'
    genome: remove
    qsub_params:
        queue: queue.q
        node: {Vars.nodes}
    redirects:
        --genomeDir: /path/to/STAR/genome_directory
For project STAR index:
STAR_remove_genome:
    module: STAR_LoadRemoveGenome
    base: STAR_map
    script_path: '{Vars.paths.STAR}STAR'
    genome: remove
    qsub_params:
        queue: queue.q
        node: {Vars.nodes}
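Both examples above remove the genome. The matching load step and the --genomeLoad setting on the mapper are only described in the notes, so the following is a sketch of how they might fit together. The instance name STAR_load_genome and the wiring of the base steps are assumptions.

```yaml
# Sketch only: load the genome before mapping, then map with LoadAndKeep.
STAR_load_genome:
    module: STAR_LoadRemoveGenome
    base: STAR_bld_ind        # assumed: the step that provides the project STAR index
    script_path: '{Vars.paths.STAR}STAR'
    genome: load
    qsub_params:
        queue: queue.q
        node: {Vars.nodes}
STAR_map:
    module: STAR_mapper
    base: STAR_load_genome
    script_path: '{Vars.paths.STAR}STAR'
    scope: project
    redirects:
        --genomeLoad: LoadAndKeep   # reuse the genome already loaded on the node
```

A remove instance based on STAR_map, as shown above, would then free the memory once mapping is done.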
References
Dobin, A., Davis, C.A., Schlesinger, F., Drenkow, J., Zaleski, C., Jha, S., Batut, P., Chaisson, M. and Gingeras, T.R., 2013. STAR: ultrafast universal RNA-seq aligner. Bioinformatics, 29(1), pp.15-21.
samtools
*
- Authors
Menachem Sklarz
- Affiliation
Bioinformatics core facility
- Organization
National Institute of Biotechnology in the Negev, Ben Gurion University.
A class that defines a module for executing samtools on a SAM or BAM file.
Warning
This module is in beta stage. Please report issues and we’ll try to solve them.
Attention
The module was tested on samtools 1.9
Currently, the samtools programs included in the module are the following:
view
sort
index
flagstat
stats
idxstats
depth
fastq/a
merge
mpileup
Note
Order of samtools subprogram execution
- The samtools programs are executed in the order given in the parameter file.
- File types are passed from one program to the next.
- In order to execute one program more than once, append digits to the program name, e.g. sort2, index3 etc.
Arguments can be passed to the tools following the program name in the parameter file, e.g.:
sort: -n -@ 10
Alternatively, they can be passed in a redirects block:
sort:
    redirects: -n -@ 10
Please do NOT pass input and output arguments - they are set by the module.
Some of the tools are defined only when the scope is sample:
- merge merges the sample-wise BAM files into a project BAM file.
- mpileup creates a project VCF/BCF/mpileup file from the sample BAM files.
Attention
Treatment of regions
If you want to limit the program to a specific region, pass the program name a block with a ‘region’ section. If you want to set the region and pass some redirects, add a ‘redirects’ section as well. For example:
mpileup:
    redirects: --max-depth INT -v
    region: chr2:212121-32323232
Attention
Treatment of BED files
In samtools view, bedcov, depth and mpileup, you can pass a BED file by adding a bed field in the tool block, with one of the following values:
- sample - use a sample-scope BED file
- project - use a project-scope BED file
- A full path to a BED file.
Example:
view:
    redirects: -uh -q 30 -@ 20 -F 4
    bed: /path/to/external/bed
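The example above uses an external BED file. The sample and project values can presumably be used the same way. A sketch, assuming BED files were already made available in the relevant scope by an earlier step:

```yaml
# Sketch only: assumes BED files already exist in the sample and project scopes.
view:
    redirects: -uh -q 30
    bed: sample              # use each sample's own BED file
depth:
    bed: project             # use the single project-scope BED file
```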
Requires
A SAM file in the following location:
- sample_data[<sample>]["sam"] (for scope=sample)
- sample_data["project_data"]["sam"] (for scope=project)
Or a BAM file in:
- sample_data[<sample>]["bam"] (for scope=sample)
- sample_data["project_data"]["bam"] (for scope=project)
Note
If both BAM and SAM files exist, select the one to use with type2use (see section Parameters that can be set).
Output
Depending on the parameters, will put files in different types (e.g. bam, cram, sam, bai, crai, vcf, bcf, mpileup, fasta.{F,R,S}, fastq.{F,R,S}).
Please use stop_and_show to see the types produced by your instance of samtools_new.
Note
If scope is set to project, the above mentioned output files will be created in the project scope.
Note
merge and mpileup are only defined when scope is sample. See above.
By default, all files are saved. To keep only the output from specific programs, add a keep_output section containing a list of programs for which the output should be saved. All other files will be discarded.
Parameters that can be set
Parameter | Values | Comments
---|---|---
scope | sample \| project | Scope of SAM/BAM to operate on. Defaults to
view | e.g.: -buh -q 30 |
sort | e.g.: -@ 20 |
index | |
flagstat | Leave empty. flagstat takes no parameters |
stats | |
idxstats | |
fastq/a | |
merge | |
region | A region to limit the region-limitable programs, such as |
type2use | sam \| bam | Type of file to use. Must exist in scope
keep_output | [sort, view, sort2] | A list of programs for which to store the output files. By default, all files are saved.
Lines for parameter file
sam_bwt1:
    module: samtools_new
    base: bwt1
    script_path: {Vars.paths.samtools}
    qsub_params:
        -pe: shared 20
    region: chr2:212121-32323232
    scope: sample
    # First 'view'. Use FLAG to filter alignments:
    view: -uh -q 30 -@ 20 -F 4 -O bam
    # First 'sort'. Sort by coordinates:
    sort: -@ 20
    # Second 'view'. Use region to filter alignments:
    view2:
        redirects: -buh -q 30 -@ 20
        region: chr2:212121-32323232
    index:
    flagstat:
    stats: --remove-dups
    idxstats:
    # Second 'sort'. Sort by name:
    sort2: -n -@ 20
    # Get sequences from name-sorted BAM file:
    fastq:
    # Merge the name-sorted BAM files:
    merge:
        region: chr2:212121-32323232
    # Create VCF from the merged name-sorted BAM files:
    mpileup:
        redirects: --max-depth INT -v
        region: chr2:212121-32323232
    keep_output: [sort, view, index, flagstat, stats, fastq, mpileup, merge]
    # stop_and_show:
References
Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G. and Durbin, R., 2009. The sequence alignment/map format and SAMtools. Bioinformatics, 25(16), pp.2078-2079.
Multiqc
*
- Authors
Menachem Sklarz
- Affiliation
Bioinformatics core facility
- Organization
National Institute of Biotechnology in the Negev, Ben Gurion University.
A module for preparing a MultiQC report for all samples.
Tip
By default, the module will search for parsable reports in the directories of all the modules in the branch leading to this instance. To search only in the directories of the explicit base steps, specify the bases_only parameter.
Requires
No real requirements. Will give a report with information if one of the base steps produces reports that MultiQC can read, e.g. fastqc, bowtie2, samtools etc.
Output
Puts the report directory in the following slot:
self.sample_data[<sample>]["Multiqc_report"]
Parameters that can be set
Parameter | Values | Comments
---|---|---
bases_only | | Search directories of explicit base steps only.
Lines for parameter file
firstMultQC:
    module: Multiqc
    base:
        - sam_bwt2_1
        - fqc_trim1
    bases_only:
    script_path: /path/to/multiqc
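The example above restricts the search to the two listed base steps. To get the default behaviour described in the Tip, where all steps in the branch leading to the instance are searched, bases_only is simply omitted. In this sketch the instance name allMultQC is hypothetical.

```yaml
# Sketch only: without bases_only, the whole branch is searched for reports.
allMultQC:
    module: Multiqc
    base: sam_bwt2_1
    script_path: /path/to/multiqc
```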
References
Ewels, P., Magnusson, M., Lundin, S. and Käller, M., 2016. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics, 32(19), pp.3047-3048.
RSEM
- Authors
Liron Levin
- Affiliation
Bioinformatics core facility
- Organization
National Institute of Biotechnology in the Negev, Ben Gurion University.
Short Description
A module for running RSEM
Requires
- fastq file in
self.sample_data[sample]["fastq.F"]
self.sample_data[sample]["fastq.R"]
self.sample_data[sample]["fastq.S"]
- or bam file in
self.sample_data[sample]["bam"]
Output
- puts output bam files (if the input is fastq) in:
self.sample_data[sample]["bam"]
- puts the location of RSEM results in:
self.sample_data[sample]["RSEM"]
self.sample_data[sample]["genes.results"]
self.sample_data[sample]["isoforms.results"]
Parameters that can be set
Parameter | Values | Comments
---|---|---
mode | transcriptome/genome | Is the reference a genome or a transcriptome?
gff3 | None | Use if the mode is genome and the annotation file is in gff3 format
Lines for parameter file
Step_Name:                                    # Name of this step
    module: RSEM                              # Name of the module used
    base:                                     # Name of the step [or list of names] to run after [must be after a bam file generator step or merge with fastq files]
    script_path:                              # Command for running the RSEM script
    qsub_params:
        -pe:                                  # Number of CPUs to reserve for this analysis
    mode:                                     # transcriptome or genome
    export_transcriptome:                     # In genome mode, set the extracted transcriptome as the new project-level fasta.nucl and extract the transcript-to-gene-map file as project-level gene_trans_map
    annotation:                               # For genome mode: the location of the GTF file [the default]; for GFF3 use the gff3 flag. For transcriptome mode: transcript-to-gene-map file.
                                              # If annotation is set to Trinity, the transcript-to-gene-map file will be generated using the from_Trinity_to_gene_map script
                                              # If not set, will use only the reference file as unrelated transcripts
    from_Trinity_to_gene_map_script_path:     # If the mode is transcriptome and the reference was assembled using Trinity, it is possible to generate the transcript-to-gene-map file automatically using this script
                                              # If annotation is set to Trinity and this line is empty or missing, it will try using the module's associated script
    gff3:                                     # Use if the mode is genome and the annotation file is in gff3 format
    mapper:                                   # bowtie/bowtie2/star
    mapper_path:                              # Location of mapper script
    rsem_prepare_reference_script_path:       # Location of preparing reference script
    plot_stat:                                # Generate statistical plots
    plot_stat_script_path:                    # Location of statistical plot generating script
    reference:                                # The reference genome/transcriptome location [FASTA file]
    rsem_generate_data_matrix_script_path:    # Location of the final matrix generating script
                                              # If this line is empty or missing, it will try using the module's associated script
    redirects:
        --append-names:                       # RSEM will append gene_name/transcript_name to the result files
        --estimate-rspd:                      # Enables RSEM to learn from the data how the reads are distributed across a transcript
        -p:                                   # Number of CPUs to use in this analysis
        --bam:                                # Will use bam files and not fastq
        --no-bam-output:
        --output-genome-bam:                  # Alignments in genomic coordinates (only if mode is genome)
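The template above lists every option without values. A filled-in sketch for a Trinity-assembled transcriptome might look as follows. All paths, the instance name RSEM_trinity, the base step merge1 and the -pe value format are illustrative assumptions. Only the parameter names come from the template.

```yaml
# Sketch only: transcriptome mode with a Trinity assembly as reference.
RSEM_trinity:
    module: RSEM
    base: merge1                      # assumed: a step that provides the fastq files
    script_path: /path/to/rsem-calculate-expression
    qsub_params:
        -pe: shared 20
    mode: transcriptome
    annotation: Trinity               # build the transcript-to-gene map from Trinity names
    mapper: bowtie2
    mapper_path: /path/to/bowtie2
    rsem_prepare_reference_script_path: /path/to/rsem-prepare-reference
    reference: /path/to/Trinity.fasta
    redirects:
        --estimate-rspd:
        -p: 20
```

Because from_Trinity_to_gene_map_script_path is omitted, the module's associated script would be used, as the template comments describe.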
References
Li, Bo, and Colin N. Dewey. “RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome.” BMC bioinformatics 12.1 (2011): 323.
htseq_count
- Authors
Menachem Sklarz
- Affiliation
Bioinformatics core facility
- Organization
National Institute of Biotechnology in the Negev, Ben Gurion University.
A module for running htseq-count:
See htseq-count documentation.
Requires
SAM or BAM files in one of the following slots:
sample_data[<sample>]["bam"]
sample_data[<sample>]["sam"]
Output
- Puts the output file in:
self.sample_data[<sample>]["HTSeq.counts"]
Parameters that can be set
Parameter | Values | Comments
---|---|---
gff | path to annotation file (GFF/GTF) |
-f \| --format | sam \| bam | In redirects. Tells htseq-count which file to use. If not specified, will use whichever file exists.
Lines for parameter file
For external index:
htseq_c1:
    module: htseq_count
    base: samtools_STAR1
    script_path: /storage16/app/bioinfo/python_packages/bin/htseq-count
    gtf: /fastspace/bioinfo_databases/STAR_GRCh38_Gencode21/gencode.v21.annotation.gtf
    redirects:
        --format: bam
        -s: 'no'
        -m: intersection-nonempty
References
Anders, S., Pyl, P.T. and Huber, W., 2015. HTSeq—a Python framework to work with high-throughput sequencing data. Bioinformatics, 31(2), pp.166-169.
RSEM_prep
- Authors
Menachem Sklarz
- Affiliation
Bioinformatics core facility
- Organization
National Institute of Biotechnology in the Negev, Ben Gurion University.
A module for running rsem-prepare-reference:
Requires
fasta files in one of the following slots:
- sample_data["fasta.nucl"] (scope = project)
- sample_data[<sample>]["fasta.nucl"] (scope = sample)
If neither exists, please supply the reference parameter.
Attention
If type “gene_trans_map” exists, its value will be used for “--transcript-to-gene-map”, unless “--transcript-to-gene-map” is explicitly passed in redirects!
Output
Puts output index files in one of the following slots:
self.sample_data[<sample>]["RSEM.index"]
self.sample_data["project_data"]["RSEM.index"]
Parameters that can be set
Parameter | Values | Comments
---|---|---
scope | project \| sample | Where to take the reference from
reference | path to reference | Use this fasta file. See the definition for reference_fasta_file(s) in the ARGUMENTS section of rsem-prepare-reference help
Lines for parameter file
RSEM_prep_ind:
    module: RSEM_prep
    base: merge1
    script_path: /path/to/RSEM
    reference: /path/to/fasta
    redirects:
        --gtf: /path/to/gtf
        --transcript-to-gene-map: /path/to/map_file
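The example above passes an external reference. When the reference is already stored in the project fasta.nucl slot, the reference parameter can presumably be dropped and scope used instead. In this sketch the instance name RSEM_prep_proj and the base step trinity1 (assumed to populate the project fasta.nucl slot) are hypothetical.

```yaml
# Sketch only: take the reference from the project-level fasta.nucl slot.
RSEM_prep_proj:
    module: RSEM_prep
    base: trinity1
    script_path: /path/to/RSEM
    scope: project
```

If a gene_trans_map type exists, it would be used for --transcript-to-gene-map, as described in the Attention note.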
RSEM_mapper
- Authors
Menachem Sklarz
- Affiliation
Bioinformatics core facility
- Organization
National Institute of Biotechnology in the Negev, Ben Gurion University.
A module for running rsem-calculate-expression:
Requires
fasta files in one of the following slots:
- sample_data["project_data"]["fasta.nucl"] (scope = project)
- sample_data[<sample>]["fasta.nucl"] (scope = sample)
If neither exists, please supply the reference parameter.
Output
Puts output files in the following slots:
self.sample_data[<sample>]["genes.counts"]
self.sample_data[<sample>]["isoforms.counts"]
And the following BAMs, depending on redirected params:
self.sample_data[<sample>]["genome.unsorted.bam"]
self.sample_data[<sample>]["genome.bam"]
self.sample_data[<sample>]["transcript.unsorted.bam"]
self.sample_data[<sample>]["transcript.bam"]
Parameters that can be set
Parameter | Values | Comments
---|---|---
scope | project \| sample | The scope of the RSEM index. Must match the scope in the RSEM_prep instance.
result2use | genes \| isoforms | Summarize counts at the gene or isoform level.
Lines for parameter file
Mapping fastq files:
RSEM_map:
    module: RSEM_mapper
    base: merge1
    script_path: {Vars.paths.RSEM.rsem-calculate-expression}
    reference: /path/to/fasta
    redirects:
        --gtf: /path/to/gtf
        --transcript-to-gene-map: /path/to/map_file
Parsing an existing BAM alignment file:
RSEM_parse_bam:
    module: RSEM_mapper
    base: mv_transcript_bam_to_bam
    script_path: {Vars.paths.RSEM.rsem-calculate-expression}
    scope: project
    qsub_params:
        -pe: shared 20
    redirects:
        --num-threads: 20