Genome Assembly
Modules included in this section
clc_assembl
- Authors
Menachem Sklarz
- Affiliation
Bioinformatics core facility
- Organization
National Institute of Biotechnology in the Negev, Ben Gurion University.
A class that defines a module for assembling reads using the CLC assembler.
Requires
fastq files in at least one of the following slots:
sample_data[<sample>]["fastq.F"]
sample_data[<sample>]["fastq.R"]
sample_data[<sample>]["fastq.S"]
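The "at least one of the following slots" requirement can be illustrated with a minimal sketch. The helper name `available_fastq_slots` and the toy `sample_data` dictionary are hypothetical and are not part of the module; only the three slot names come from the list above.

```python
# Slot names taken from the "Requires" list; everything else is illustrative.
FASTQ_SLOTS = ("fastq.F", "fastq.R", "fastq.S")


def available_fastq_slots(sample_data, sample):
    """Return the fastq slots that are populated for one sample.

    Raises ValueError if none of the three slots holds a file.
    """
    slots = sample_data.get(sample, {})
    found = [slot for slot in FASTQ_SLOTS if slots.get(slot)]
    if not found:
        raise ValueError(
            f"Sample {sample!r} has no fastq files in any of {FASTQ_SLOTS}"
        )
    return found


# Toy data; the paths are placeholders.
sample_data = {
    "Sample1": {"fastq.F": "/path/to/S1_R1.fq", "fastq.R": "/path/to/S1_R2.fq"},
    "Sample2": {},
}

print(available_fastq_slots(sample_data, "Sample1"))
```

A sample with only `fastq.S` (single-end reads) passes the same check, while a sample with none of the slots is rejected.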
Output:
puts fasta output files in the following slots:
If scope is set to sample:
sample_data[<sample>]["fasta.nucl"]
sample_data[<sample>]["clc_assembl.contigs"]
Also, sets
sample_data[<sample>]["assembler"] = "clc_assembl"
If scope is set to project:
sample_data["fasta.nucl"]
sample_data["clc_assembl.contigs"]
Also, sets
sample_data[<sample>]["assembler"] = "clc_assembl"
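The difference between the two scopes is only where the fasta slots live in the `sample_data` dictionary. The following is a minimal sketch of that layout, not the module's code: the function `store_contigs` and the example paths are hypothetical, and the "assembler" key is left out for brevity.

```python
def store_contigs(sample_data, contigs_path, module_name, scope, sample=None):
    """Record an assembly in the fasta slots for the given scope.

    scope == "sample":  slots go under sample_data[<sample>]
    scope == "project": slots go directly under sample_data
    """
    if scope == "sample":
        if sample is None:
            raise ValueError("A sample name is required when scope is 'sample'")
        target = sample_data.setdefault(sample, {})
    elif scope == "project":
        target = sample_data
    else:
        raise ValueError(f"Unknown scope: {scope!r}")

    target["fasta.nucl"] = contigs_path
    target[module_name + ".contigs"] = contigs_path
    return sample_data


# Placeholder paths, for illustration only.
data = {}
store_contigs(data, "/path/to/Sample1.contigs.fna", "clc_assembl", "sample", "Sample1")
store_contigs(data, "/path/to/project.contigs.fna", "clc_assembl", "project")
print(data["Sample1"]["fasta.nucl"])
print(data["fasta.nucl"])
```

Downstream steps that read `fasta.nucl` therefore see either a per-sample or a project-wide assembly, depending on the scope chosen here.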
Parameters that can be set
Parameter | Values | Comments |
---|---|---|
scope | sample \| project | Set to |
p | e.g. ‘fb ss 180 250’ | Sets the |
Lines for parameter file
clc1:
    module: clc_assembl
    base: trim1
    script_path: /path/to/clc_assembler
    qsub_params:
        -pe: shared 30
        node: sge37
    scope: sample
    p: fb ss 180 250
    redirects:
        --cpus: 30
megahit_assembl
- Authors
Menachem Sklarz
- Affiliation
Bioinformatics core facility
- Organization
National Institute of Biotechnology in the Negev, Ben Gurion University.
A class that defines a module for assembling reads using the MEGAHIT assembler.
Requires
fastq files in at least one of the following slots:
sample_data[<sample>]["fastq.F"]
sample_data[<sample>]["fastq.R"]
sample_data[<sample>]["fastq.S"]
Output:
puts fasta output files in the following slots:
If scope is set to sample:
sample_data[<sample>]["fasta.nucl"]
sample_data[<sample>]["megahit_assembl.contigs"]
Also, sets
sample_data[<sample>]["assembler"] = "megahit_assembl"
If scope is set to project:
sample_data["fasta.nucl"]
sample_data["megahit_assembl.contigs"]
Also, sets
sample_data[<sample>]["assembler"] = "megahit_assembl"
Parameters that can be set
Parameter | Values | Comments |
---|---|---|
scope | sample \| project | Set to |
Lines for parameter file
megahit1:
    module: megahit_assembl
    base: trim1
    script_path: /path/to/megahit
    qsub_params:
        -pe: shared 30
        node: sge37
    scope: project
    redirects:
        --continue:
        --num-cpu-threads: 30
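The redirects block above mixes a flag without a value (--continue) and a flag with one (--num-cpu-threads). As a rough illustration, and assuming that redirected parameters are appended to the program's command line as written, such a block could be flattened as follows. The function `redirects_to_args` is hypothetical and does not describe the actual implementation.

```python
def redirects_to_args(redirects):
    """Flatten a redirects mapping into a list of command-line tokens.

    A value of None stands for a flag that takes no argument
    (an empty YAML value such as "--continue:").
    """
    args = []
    for flag, value in redirects.items():
        args.append(flag)
        if value is not None:
            args.append(str(value))
    return args


# The redirects section of the megahit1 example, written as a Python dict.
redirects = {"--continue": None, "--num-cpu-threads": 30}
print(" ".join(redirects_to_args(redirects)))
# → --continue --num-cpu-threads 30
```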
References
Li, D., Liu, C.M., Luo, R., Sadakane, K. and Lam, T.W., 2015. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics, 31(10), pp.1674-1676.
spades_assembl
- Authors
Menachem Sklarz
- Affiliation
Bioinformatics core facility
- Organization
National Institute of Biotechnology in the Negev, Ben Gurion University.
A class that defines a module for assembling reads using the SPAdes assembler.
Requires
fastq files in at least one of the following slots:
sample_data[<sample>]["fastq.F"]
sample_data[<sample>]["fastq.R"]
sample_data[<sample>]["fastq.S"]
Output:
puts fasta output files in the following slots:
for sample-wise assembly:
sample_data[<sample>]["fasta.nucl"]
sample_data[<sample>]["spades_assembl.contigs"]
sample_data[<sample>]["spades_assembl.scaffolds"]
for mega assembly (not defined yet):
sample_data["fasta.nucl"]
sample_data["spades_assembl.contigs"]
sample_data["spades_assembl.scaffolds"]
Parameters that can be set
Parameter | Values | Comments |
---|---|---|
scope | sample \| project | Set if project-wide fasta slot should be used |
truncate_names | | Truncates contig names, e.g. ‘>NODE_82_length_18610_cov_38.4999_ID_165’ will be changed to ‘>NODE_82_length_18610’ |
use_corrected | | Use the read files after read correction for downstream usage |
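The effect of truncate_names on a contig header can be sketched with a short regular expression. This is an illustration based only on the example in the table; the function `truncate_contig_name` is hypothetical, and the module may perform the truncation differently.

```python
import re

# Keep ">NODE_<number>_length_<number>" and drop everything after it.
_NODE_PREFIX = re.compile(r"^(>NODE_\d+_length_\d+)")


def truncate_contig_name(header):
    """Return the truncated header, or the header unchanged if it does not match."""
    match = _NODE_PREFIX.match(header)
    return match.group(1) if match else header


print(truncate_contig_name(">NODE_82_length_18610_cov_38.4999_ID_165"))
# → >NODE_82_length_18610
```

Headers that do not follow the NODE naming pattern are returned unchanged in this sketch.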
Lines for parameter file
spades1:
    module: spades_assembl
    base: trim1
    script_path: /path/to/bin/spades.py
    truncate_names:
    redirects:
        --careful:
References
Bankevich, A., Nurk, S., Antipov, D., Gurevich, A.A., Dvorkin, M., Kulikov, A.S., Lesin, V.M., Nikolenko, S.I., Pham, S., Prjibelski, A.D. and Pyshkin, A.V., 2012. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. Journal of computational biology, 19(5), pp.455-477.
quast
- Authors
Menachem Sklarz
- Affiliation
Bioinformatics core facility
- Organization
National Institute of Biotechnology in the Negev, Ben Gurion University.
A module for running quast on fasta assemblies:
QUAST is executed on the fasta file along the following lines:
If ‘scope’ is specified, the appropriate fasta will be used. An error will occur if the fasta does not exist.
If ‘scope’ is not specified, if a project-wide fasta exists, it will be used. Otherwise, sample-wise fasta files will be used. If none exist, an error will occur.
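The selection rules above can be written out as a small decision function. This is a sketch of the described behaviour under stated assumptions, not the module's code: `select_fasta` is a hypothetical helper, and `sample_data` is modelled as a plain dictionary with the slot layout listed under "Requires".

```python
def select_fasta(sample_data, samples, scope=None):
    """Pick the fasta file(s) to analyse.

    Returns {"project": path} for a project-wide fasta,
    or {sample: path, ...} for sample-wise fasta files.
    """
    project_fasta = sample_data.get("fasta.nucl")
    sample_fastas = {
        s: sample_data[s]["fasta.nucl"]
        for s in samples
        if "fasta.nucl" in sample_data.get(s, {})
    }

    if scope == "project":
        if project_fasta is None:
            raise KeyError("No project-wide fasta.nucl slot")
        return {"project": project_fasta}
    if scope == "sample":
        missing = [s for s in samples if s not in sample_fastas]
        if missing:
            raise KeyError(f"No fasta.nucl slot for samples: {missing}")
        return sample_fastas
    if scope is not None:
        raise ValueError(f"Unknown scope: {scope!r}")

    # scope not specified: prefer the project-wide fasta, then sample-wise ones.
    if project_fasta is not None:
        return {"project": project_fasta}
    if sample_fastas:
        return sample_fastas
    raise KeyError("No fasta.nucl slot found at project or sample level")


# Toy data with sample-wise assemblies only; paths are placeholders.
data = {"S1": {"fasta.nucl": "/path/S1.fna"}, "S2": {"fasta.nucl": "/path/S2.fna"}}
print(select_fasta(data, ["S1", "S2"]))
```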
Note
With compare_mode, you tell the module to run quast on multiple assemblies. This is done in one of three ways:
If scope is sample and a single base step is defined, the module compares between the samples.
If scope is sample and more than one base step is defined, the module compares between the assemblies found in the base steps, for each sample separately.
If scope is project, the module compares between the assemblies found in the base steps at the project level.
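The three compare_mode cases differ in which assemblies end up in the same quast report. A minimal sketch of that grouping follows; `comparison_groups` is a hypothetical helper, and each assembly is represented as a (base step, sample) pair, with `None` as the sample for project-level assemblies.

```python
def comparison_groups(scope, base_steps, samples):
    """Return a list of groups; each group is compared in one quast report."""
    if scope == "sample" and len(base_steps) == 1:
        # One report comparing all samples from the single base step.
        return [[(base_steps[0], s) for s in samples]]
    if scope == "sample":
        # One report per sample, comparing that sample's assemblies across base steps.
        return [[(step, s) for step in base_steps] for s in samples]
    if scope == "project":
        # One report comparing the project-level assemblies of the base steps.
        return [[(step, None) for step in base_steps]]
    raise ValueError(f"Unknown scope: {scope!r}")


for group in comparison_groups("sample", ["spades1", "megahit1"], ["S1", "S2"]):
    print(group)
```

With a single base step and sample scope, the same call yields one group containing every sample.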
Requires
fasta files in one of the following slots:
sample_data["fasta.nucl"]
sample_data[<sample>]["fasta.nucl"]
Output
- Puts output directory in one of:
self.sample_data["project_data"]["quast"]
self.sample_data[<sample>]["quast"]
Parameters that can be set
Parameter | Values | Comments |
---|---|---|
scope | project \| sample | Indicates whether to use a project or sample contigs file. |
compare_mode | | If ‘scope’ is ‘sample’, specifies whether to analyse each sample separately or to create a single comparison report for all samples. |
Lines for parameter file
A quast report for each sample separately:
quast1:
    module: quast
    base: spades1
    script_path: /path/to/quast.py
    scope: sample
    redirects:
        --fast:
A quast report comparing the sample assemblies:
quast1:
    module: quast
    base: spades1
    script_path: /path/to/quast.py
    compare_mode:
    scope: sample
    redirects:
        --fast:
A quast report comparing the project assemblies from different stages of the analysis:
quast1:
    module: quast
    base:
        - spades1
        - megahit1
    script_path: /path/to/quast.py
    compare_mode:
    scope: project
    redirects:
        --fast:
References
Gurevich, A., Saveliev, V., Vyahhi, N. and Tesler, G., 2013. QUAST: quality assessment tool for genome assemblies. Bioinformatics, 29(8), pp.1072-1075.