Genome Assembly

Modules included in this section

clc_assembl
megahit_assembl
spades_assembl ^*
quast ^*

`clc_assembl`

Authors: Menachem Sklarz
Affiliation: Bioinformatics core facility
Organization: National Institute of Biotechnology in the Negev, Ben Gurion University.

A class that defines a module for assembling reads using the CLC assembler.

Requires

fastq files in at least one of the following slots:
- sample_data[<sample>]["fastq.F"]
- sample_data[<sample>]["fastq.R"]
- sample_data[<sample>]["fastq.S"]

Output:

puts fasta output files in the following slots:
- if scope set to sample:
  sample_data[<sample>]["fasta.nucl"]
  
  sample_data[<sample>]["clc_assembl.contigs"]
  
  Also, sets sample_data[<sample>]["assembler"] = "clc_assembl"
- if scope set to project:
  sample_data["fasta.nucl"]
  
  sample_data["clc_assembl.contigs"]
  
  Also, sets sample_data[<sample>]["assembler"] = "clc_assembl"
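
The slot layout above can be pictured as a nested dictionary. The following is a minimal sketch only: `record_assembly` is a hypothetical helper, not part of the module, and it assumes that the `fasta.nucl` and `<module>.contigs` slots hold the same contigs path. The `assembler` slot is filled in for sample scope only, since how it is keyed at project scope is not modelled here.

```python
def record_assembly(sample_data, contigs_path, module="clc_assembl",
                    scope="sample", sample=None):
    """Store an assembly path in the slots listed above (illustrative only)."""
    if scope == "sample":
        if sample is None:
            raise ValueError("a sample name is needed when scope is 'sample'")
        # Sample-scope slots live under sample_data[<sample>]
        target = sample_data.setdefault(sample, {})
        target["assembler"] = module
    elif scope == "project":
        # Project-scope slots live at the top level of sample_data
        target = sample_data
    else:
        raise ValueError("scope must be 'sample' or 'project'")
    # Assumption: both slots point at the same contigs file
    target["fasta.nucl"] = contigs_path
    target[module + ".contigs"] = contigs_path
    return sample_data
```

For example, `record_assembly({}, "s1.fasta", sample="s1")` fills `sample_data["s1"]["fasta.nucl"]`, while passing `scope="project"` fills `sample_data["fasta.nucl"]` instead.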

Parameters that can be set

Parameter	Values	Comments
scope	sample\|project	Set to `project` to assemble all project reads into one assembly.
p	e.g. ‘fb ss 180 250’	Sets the `-p` parameter passed to CLC for paired-end reads. Required only if the project includes paired-end reads.

Lines for parameter file

clc1:
    module: clc_assembl
    base: trim1
    script_path: /path/to/clc_assembler
    qsub_params:
        -pe:    shared 30
        node:   sge37
    scope:      sample
    p:          fb ss 180 250 
    redirects:
        --cpus: 30

`megahit_assembl`

Authors: Menachem Sklarz
Affiliation: Bioinformatics core facility
Organization: National Institute of Biotechnology in the Negev, Ben Gurion University.

A class that defines a module for assembling reads using the MEGAHIT assembler.

Requires

fastq files in at least one of the following slots:
- sample_data[<sample>]["fastq.F"]
- sample_data[<sample>]["fastq.R"]
- sample_data[<sample>]["fastq.S"]

Output:

puts fasta output files in the following slots:
- if scope set to sample:
  sample_data[<sample>]["fasta.nucl"]
  
  sample_data[<sample>]["megahit_assembl.contigs"]
  
  Also, sets sample_data[<sample>]["assembler"] = "megahit_assembl"
- if scope set to project:
  sample_data["fasta.nucl"]
  
  sample_data["megahit_assembl.contigs"]
  
  Also, sets sample_data[<sample>]["assembler"] = "megahit_assembl"

Parameters that can be set

Parameter	Values	Comments
scope	sample\|project	Set to `project` to assemble all project reads into one assembly.

Lines for parameter file

megahit1:
    module: megahit_assembl
    base: trim1
    script_path: /path/to/megahit
    qsub_params:
        -pe: shared 30
        node: sge37
    scope: project
    redirects:
        --continue: 
        --num-cpu-threads: 30
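
To illustrate how the `redirects` block in the example above might end up on a command line, here is a rough sketch. It assumes that `redirects` entries are forwarded to the program as command-line options and that an entry with an empty value (such as `--continue:`) becomes a bare flag. `build_command` is a hypothetical helper; the command the module actually writes also carries input and output arguments that are not shown here.

```python
import shlex


def build_command(script_path, redirects):
    """Join a script path and its redirected options into one command string."""
    parts = [script_path]
    for flag, value in redirects.items():
        parts.append(flag)
        # An empty YAML value loads as None: treat it as a flag without argument
        if value is not None:
            parts.append(str(value))
    # Quote each part so paths with spaces survive the shell
    return " ".join(shlex.quote(part) for part in parts)


cmd = build_command("/path/to/megahit",
                    {"--continue": None, "--num-cpu-threads": 30})
print(cmd)  # /path/to/megahit --continue --num-cpu-threads 30
```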

References

Li, D., Liu, C.M., Luo, R., Sadakane, K. and Lam, T.W., 2015. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics, 31(10), pp.1674-1676.

`spades_assembl` ^*

Authors: Menachem Sklarz
Affiliation: Bioinformatics core facility
Organization: National Institute of Biotechnology in the Negev, Ben Gurion University.

A class that defines a module for assembling reads using the SPAdes assembler.

Requires

fastq files in at least one of the following slots:

- sample_data[<sample>]["fastq.F"]
- sample_data[<sample>]["fastq.R"]
- sample_data[<sample>]["fastq.S"]

Output:

puts fasta output files in the following slots:

- for sample-wise assembly:
  sample_data[<sample>]["fasta.nucl"]
  
  sample_data[<sample>]["spades_assembl.contigs"]
  
  sample_data[<sample>]["spades_assembl.scaffolds"]
- for mega assembly (not defined yet):
  sample_data["fasta.nucl"]
  
  sample_data["spades_assembl.contigs"]
  
  sample_data["spades_assembl.scaffolds"]

Parameters that can be set

Parameter	Values	Comments
scope	sample\|project	Set if project-wide fasta slot should be used
truncate_names		Truncates contig names, e.g. ‘>NODE_82_length_18610_cov_38.4999_ID_165’ will be changed to ‘>NODE_82_length_18610’
use_corrected		Use the read files produced by read correction for downstream usage
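
The effect of `truncate_names` on a contig header can be reproduced with a short regular expression. This is only a sketch of the renaming shown in the table, not the module's own code; `truncate_contig_name` is a hypothetical name, and the pattern assumes headers of the form `>NODE_<n>_length_<len>_cov_...`.

```python
import re

# Keep '>NODE_<n>_length_<len>' and drop everything from '_cov_' onwards
_SPADES_HEADER = re.compile(r"^(>NODE_\d+_length_\d+)_cov_.*$")


def truncate_contig_name(line):
    """Shorten a SPAdes-style header line; other lines are returned unchanged."""
    return _SPADES_HEADER.sub(r"\1", line)


print(truncate_contig_name(">NODE_82_length_18610_cov_38.4999_ID_165"))
# >NODE_82_length_18610
```

Sequence lines and headers that do not match the pattern pass through untouched, so the function can be applied to every line of a fasta file.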

Lines for parameter file

spades1:
    module: spades_assembl
    base: trim1
    script_path: /path/to/bin/spades.py
    truncate_names: 
    redirects:
        --careful: 

References

Bankevich, A., Nurk, S., Antipov, D., Gurevich, A.A., Dvorkin, M., Kulikov, A.S., Lesin, V.M., Nikolenko, S.I., Pham, S., Prjibelski, A.D. and Pyshkin, A.V., 2012. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. Journal of computational biology, 19(5), pp.455-477.

`quast` ^*

Authors: Menachem Sklarz
Affiliation: Bioinformatics core facility
Organization: National Institute of Biotechnology in the Negev, Ben Gurion University.

A module for running quast on fasta assemblies:

QUAST is executed on the fasta file along the following lines:

- If ‘scope’ is specified, the fasta file of that scope is used. An error occurs if that fasta file does not exist.
- If ‘scope’ is not specified, a project-wide fasta file is used if one exists. Otherwise, sample-wise fasta files are used. If neither exists, an error occurs.
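
The selection rules above can be sketched as a small function. This is an illustration under stated assumptions, not the module's implementation: `select_fasta` is a hypothetical name, `sample_data` is taken to be a plain dictionary with the `fasta.nucl` slots listed under Requires, and an explicit `sample` scope is assumed to need a fasta file for every sample.

```python
def select_fasta(sample_data, samples, scope=None):
    """Return (scope_used, files) following the rules described above."""
    if scope not in (None, "project", "sample"):
        raise ValueError("scope must be 'project', 'sample' or unset")

    project_fasta = sample_data.get("fasta.nucl")
    sample_fastas = {
        s: sample_data[s]["fasta.nucl"]
        for s in samples
        if "fasta.nucl" in sample_data.get(s, {})
    }

    # Explicit project scope, or no scope given and a project fasta exists
    if scope == "project" or (scope is None and project_fasta is not None):
        if project_fasta is None:
            raise KeyError("no project-wide fasta.nucl slot")
        return "project", {"project": project_fasta}

    # Explicit sample scope, or fallback when no project-wide fasta exists
    if not sample_fastas:
        raise KeyError("no fasta.nucl slot found at project or sample level")
    if scope == "sample" and len(sample_fastas) < len(samples):
        raise KeyError("some samples have no fasta.nucl slot")
    return "sample", sample_fastas
```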

Note

With compare_mode, you tell the module to run quast on multiple assemblies. This is done in one of three ways:

- If scope is sample and a single base step is defined, the module compares the samples with each other.
- If scope is sample and more than one base step is defined, the module compares the assemblies found in the base steps, for each sample separately.
- If scope is project, the module compares the assemblies found in the base steps at the project level.
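
The three cases above can be summarised as a grouping of fasta files into comparison reports. The sketch below is an assumption-laden illustration: `plan_comparisons` and the `base_data` mapping (base step name to that step's `sample_data`-like dictionary) are hypothetical, as are the report labels `project` and `all_samples`.

```python
def plan_comparisons(base_data, samples, scope):
    """Map each report label to the list of fasta files it would compare."""
    bases = list(base_data)
    if scope == "project":
        # One report: the project-level assembly of every base step
        return {"project": [base_data[b]["fasta.nucl"] for b in bases]}
    if scope == "sample":
        if len(bases) == 1:
            # One report: all samples of the single base step side by side
            only = base_data[bases[0]]
            return {"all_samples": [only[s]["fasta.nucl"] for s in samples]}
        # One report per sample: that sample's assembly in every base step
        return {
            s: [base_data[b][s]["fasta.nucl"] for b in bases]
            for s in samples
        }
    raise ValueError("scope must be 'project' or 'sample'")
```

With two base steps such as `spades1` and `megahit1` and scope `sample`, this yields one list per sample, each holding that sample's assembly from both steps.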

Requires

fasta files in one of the following slots:
- sample_data["fasta.nucl"]
- sample_data[<sample>]["fasta.nucl"]

Output

Puts output directory in one of:
- self.sample_data["project_data"]["quast"]
- self.sample_data[<sample>]["quast"]

Parameters that can be set

Parameter	Values	Comments
scope	project \| sample	Indicates whether to use a project or sample contigs file.
compare_mode		If ‘scope’ is ‘sample’, specifies whether to analyse each sample separately or to create a single comparison report for all samples.

Lines for parameter file

A quast report for each sample separately:

quast1:
    module: quast
    base: spades1
    script_path: /path/to/quast.py
    scope: sample
    redirects:
        --fast: 

A quast report comparing the sample assemblies:

quast1:
    module: quast
    base: spades1
    script_path: /path/to/quast.py
    compare_mode: 
    scope: sample
    redirects:
        --fast: 

A quast report comparing the project assemblies from different stages of the analysis:

quast1:
    module: quast
    base: 
        - spades1
        - megahit1
    script_path: /path/to/quast.py
    compare_mode: 
    scope: project
    redirects:
        --fast: 

References

Gurevich, A., Saveliev, V., Vyahhi, N. and Tesler, G., 2013. QUAST: quality assessment tool for genome assemblies. Bioinformatics, 29(8), pp.1072-1075.
