Genome Assembly
Modules included in this section
- clc_assembl
- megahit_assembl
- spades_assembl
- quast

clc_assembl
Authors: Menachem Sklarz
Affiliation: Bioinformatics core facility
Organization: National Institute of Biotechnology in the Negev, Ben Gurion University.

A class that defines a module for assembling reads using the CLC assembler.
Requires
fastq files in at least one of the following slots:
- sample_data[<sample>]["fastq.F"]
- sample_data[<sample>]["fastq.R"]
- sample_data[<sample>]["fastq.S"]
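The "at least one slot" requirement can be illustrated with a short check. This is a minimal sketch, assuming `sample_data` behaves like a plain nested dict keyed by sample name, as the slot notation suggests. The helper name `available_fastq_slots` and the example file names are hypothetical and are not part of the module.

```python
# Slot names taken from the list above.
FASTQ_SLOTS = ("fastq.F", "fastq.R", "fastq.S")


def available_fastq_slots(sample_data, sample):
    """Return the fastq slots defined for one sample, in F, R, S order."""
    sample_slots = sample_data.get(sample, {})
    found = [slot for slot in FASTQ_SLOTS if slot in sample_slots]
    if not found:
        # The assembly modules need at least one fastq slot per sample.
        raise KeyError(f"No fastq slots found for sample {sample!r}")
    return found


# Hypothetical example data (file names are made up for illustration).
sample_data = {
    "sampleA": {"fastq.F": "A_R1.fq", "fastq.R": "A_R2.fq"},
    "sampleB": {"fastq.S": "B.fq"},
}

print(available_fastq_slots(sample_data, "sampleA"))
print(available_fastq_slots(sample_data, "sampleB"))
```

A sample with none of the three slots raises a `KeyError` in this sketch; how the real module reports the problem may differ.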
Output:
Puts fasta output files in the following slots:
- If scope is set to sample:
    - sample_data[<sample>]["fasta.nucl"]
    - sample_data[<sample>]["clc_assembl.contigs"]
    - Also sets sample_data[<sample>]["assembler"] = "clc_assembl"
- If scope is set to project:
    - sample_data["fasta.nucl"]
    - sample_data["clc_assembl.contigs"]
    - Also sets sample_data[<sample>]["assembler"] = "clc_assembl"
Parameters that can be set
Parameter | Values | Comments |
---|---|---|
scope | sample or project | Set to project to assemble all project reads into one assembly. |
p | e.g. ‘fb ss 180 250’ | Sets the -p parameter passed to CLC for paired-end reads. Required only if the project includes paired-end reads. |
Lines for parameter file

    clc1:
        module: clc_assembl
        base: trim1
        script_path: /path/to/clc_assembler
        qsub_params:
            -pe: shared 30
            node: sge37
        scope: sample
        p: fb ss 180 250
        redirects:
            --cpus: 30

megahit_assembl
Authors: Menachem Sklarz
Affiliation: Bioinformatics core facility
Organization: National Institute of Biotechnology in the Negev, Ben Gurion University.

A class that defines a module for assembling reads using the MEGAHIT assembler.
Requires
fastq files in at least one of the following slots:
- sample_data[<sample>]["fastq.F"]
- sample_data[<sample>]["fastq.R"]
- sample_data[<sample>]["fastq.S"]
Output:
Puts fasta output files in the following slots:
- If scope is set to sample:
    - sample_data[<sample>]["fasta.nucl"]
    - sample_data[<sample>]["megahit_assembl.contigs"]
    - Also sets sample_data[<sample>]["assembler"] = "megahit_assembl"
- If scope is set to project:
    - sample_data["fasta.nucl"]
    - sample_data["megahit_assembl.contigs"]
    - Also sets sample_data[<sample>]["assembler"] = "megahit_assembl"
Parameters that can be set
Parameter | Values | Comments |
---|---|---|
scope | sample or project | Set to project to assemble all project reads into one assembly. |
Lines for parameter file

    megahit1:
        module: megahit_assembl
        base: trim1
        script_path: /path/to/megahit
        qsub_params:
            -pe: shared 30
            node: sge37
        scope: project
        redirects:
            --continue:
            --num-cpu-threads: 30

References
Li, D., Liu, C.M., Luo, R., Sadakane, K. and Lam, T.W., 2015. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics, 31(10), pp.1674-1676.
spades_assembl*
Authors: Menachem Sklarz
Affiliation: Bioinformatics core facility
Organization: National Institute of Biotechnology in the Negev, Ben Gurion University.

A class that defines a module for assembling reads using the SPAdes assembler.
Requires
fastq files in at least one of the following slots:
- sample_data[<sample>]["fastq.F"]
- sample_data[<sample>]["fastq.R"]
- sample_data[<sample>]["fastq.S"]
Output:
Puts fasta output files in the following slots:
- For sample-wise assembly:
    - sample_data[<sample>]["fasta.nucl"]
    - sample_data[<sample>]["spades_assembl.contigs"]
    - sample_data[<sample>]["spades_assembl.scaffolds"]
- For mega assembly (not defined yet):
    - sample_data["fasta.nucl"]
    - sample_data["spades_assembl.contigs"]
    - sample_data["spades_assembl.scaffolds"]
Parameters that can be set
Parameter | Values | Comments |
---|---|---|
scope | sample or project | Set if project-wide fasta slot should be used |
truncate_names | | Truncates contig names, e.g. ‘>NODE_82_length_18610_cov_38.4999_ID_165’ will be changed to ‘>NODE_82_length_18610’ |
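The renaming performed by truncate_names can be illustrated with a regular expression. This is only a sketch of the transformation shown in the table example, not the module's actual implementation; the function name `truncate_name` and the pattern are assumptions made for illustration.

```python
import re

# Keep ">NODE_<n>_length_<len>" and drop everything from "_cov_" onwards.
_NODE_HEADER = re.compile(r"^(>NODE_\d+_length_\d+)_cov_\S*")


def truncate_name(line):
    """Shorten a SPAdes-style contig header; other lines pass through unchanged."""
    return _NODE_HEADER.sub(r"\1", line)


print(truncate_name(">NODE_82_length_18610_cov_38.4999_ID_165"))
# prints ">NODE_82_length_18610"
```

Sequence lines and headers that do not follow the `NODE_..._length_..._cov_...` pattern are returned as they are.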
Lines for parameter file

    spades1:
        module: spades_assembl
        base: trim1
        script_path: /path/to/bin/spades.py
        truncate_names:
        redirects:
            --careful:

References
Bankevich, A., Nurk, S., Antipov, D., Gurevich, A.A., Dvorkin, M., Kulikov, A.S., Lesin, V.M., Nikolenko, S.I., Pham, S., Prjibelski, A.D. and Pyshkin, A.V., 2012. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. Journal of computational biology, 19(5), pp.455-477.
quast*
Authors: Menachem Sklarz
Affiliation: Bioinformatics core facility
Organization: National Institute of Biotechnology in the Negev, Ben Gurion University.

A module for running QUAST on fasta assemblies.
QUAST is executed on the fasta file according to the following rules:
- If ‘scope’ is specified, the appropriate fasta will be used. An error will occur if the fasta does not exist.
- If ‘scope’ is not specified, if a project-wide fasta exists, it will be used. Otherwise, sample-wise fasta files will be used. If none exist, an error will occur.
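The selection rules above can be sketched as a small function. This is an illustrative sketch under stated assumptions: `sample_data` is treated as a plain nested dict using the `fasta.nucl` slot notation from this page, the function name `select_fasta` and the `"project"` result key are hypothetical, and the sketch requires every listed sample to have a fasta slot when sample-wise files are used.

```python
def select_fasta(sample_data, samples, scope=None):
    """Pick the fasta file(s) to analyse: project-wide or one per sample."""

    def project_fasta():
        # Raises KeyError if the project-wide slot does not exist.
        return {"project": sample_data["fasta.nucl"]}

    def sample_fastas():
        # Raises KeyError if any listed sample lacks the slot.
        return {s: sample_data[s]["fasta.nucl"] for s in samples}

    if scope == "project":
        return project_fasta()
    if scope == "sample":
        return sample_fastas()
    if scope is not None:
        raise ValueError(f"Unknown scope: {scope!r}")

    # scope not specified: prefer the project-wide fasta, else fall back to samples.
    if "fasta.nucl" in sample_data:
        return project_fasta()
    if not samples:
        raise KeyError("No project-wide or sample-wise fasta.nucl slot found")
    return sample_fastas()


# Hypothetical example data.
data = {"fasta.nucl": "project.fa", "s1": {"fasta.nucl": "s1.fa"}}
print(select_fasta(data, ["s1"]))
print(select_fasta(data, ["s1"], scope="sample"))
```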
Note
With compare_mode, you tell the module to run quast on multiple assemblies. This is done in one of three ways:
- If scope is sample and a single base step is defined, the module compares between the samples.
- If scope is sample and more than one base step is defined, the module compares between the assemblies found in the base steps, for each sample separately.
- If scope is project, the module compares between the assemblies found in the base steps at the project level.
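The three compare_mode cases in the note can be summarised as a small decision function. This is an illustrative sketch only: the function name `comparison_kind` and the returned labels are hypothetical and do not come from the quast module.

```python
def comparison_kind(scope, base_steps):
    """Return a label for what compare_mode compares, given scope and base steps."""
    if not base_steps:
        raise ValueError("At least one base step is required")
    if scope == "project":
        # Assemblies from all base steps, compared at the project level.
        return "between_base_steps_project"
    if scope == "sample":
        if len(base_steps) == 1:
            # One assembly per sample: compare the samples with each other.
            return "between_samples"
        # Several assemblies per sample: compare them within each sample.
        return "between_base_steps_per_sample"
    raise ValueError(f"Unknown scope: {scope!r}")


print(comparison_kind("sample", ["spades1"]))
print(comparison_kind("sample", ["spades1", "megahit1"]))
print(comparison_kind("project", ["spades1", "megahit1"]))
```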
Requires
fasta files in one of the following slots:
- sample_data["fasta.nucl"]
- sample_data[<sample>]["fasta.nucl"]
Output
- Puts output directory in one of:
    - self.sample_data["project_data"]["quast"]
    - self.sample_data[<sample>]["quast"]
Parameters that can be set
Parameter | Values | Comments |
---|---|---|
scope | project or sample | Indicates whether to use a project or sample contigs file. |
compare_mode | | If ‘scope’ is ‘sample’, specifies whether to analyse each sample separately or to create a single comparison report for all samples. |
Lines for parameter file
- A quast report for each sample separately:

    quast1:
        module: quast
        base: spades1
        script_path: /path/to/quast.py
        scope: sample
        redirects:
            --fast:

- A quast report comparing the sample assemblies:

    quast1:
        module: quast
        base: spades1
        script_path: /path/to/quast.py
        compare_mode:
        scope: sample
        redirects:
            --fast:

- A quast report comparing the project assemblies from different stages of the analysis:

    quast1:
        module: quast
        base:
            - spades1
            - megahit1
        script_path: /path/to/quast.py
        compare_mode:
        scope: project
        redirects:
            --fast:

References
Gurevich, A., Saveliev, V., Vyahhi, N. and Tesler, G., 2013. QUAST: quality assessment tool for genome assemblies. Bioinformatics, 29(8), pp.1072-1075.