Genome Assembly

Modules included in this section

clc_assembl

Authors

Menachem Sklarz

Affiliation

Bioinformatics core facility

Organization

National Institute of Biotechnology in the Negev, Ben Gurion University.

A class that defines a module for assembling reads using the CLC assembler.

Requires

  • fastq files in at least one of the following slots:

    • sample_data[<sample>]["fastq.F"]

    • sample_data[<sample>]["fastq.R"]

    • sample_data[<sample>]["fastq.S"]

Output:

  • puts fasta output files in the following slots:

    • if scope set to sample:

      • sample_data[<sample>]["fasta.nucl"]

      • sample_data[<sample>]["clc_assembl.contigs"]

      • Also, sets sample_data[<sample>]["assembler"] = "clc_assembl"

    • if scope set to project:

      • sample_data["fasta.nucl"]

      • sample_data["clc_assembl.contigs"]

      • Also, sets sample_data[<sample>]["assembler"] = "clc_assembl"
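
As a rough illustration of the slot layout listed above, the sketch below fills a plain Python dictionary shaped like sample_data for both scopes. The helper name store_contigs, the sample name and the file paths are hypothetical, and pointing both slots at the same contigs file is an assumption; the real object is populated by the module itself. Only the two fasta slots are shown.

```python
def store_contigs(sample_data, contigs_path, scope, sample=None,
                  module="clc_assembl"):
    """Record an assembly in the documented fasta slots (illustrative only)."""
    if scope == "sample":
        if sample is None:
            raise ValueError("sample scope needs a sample name")
        # Sample scope: slots live under sample_data[<sample>]
        target = sample_data.setdefault(sample, {})
    elif scope == "project":
        # Project scope: slots live at the top level of sample_data
        target = sample_data
    else:
        raise ValueError(f"unknown scope: {scope!r}")
    # Assumption: both slots point at the same contigs file
    target["fasta.nucl"] = contigs_path
    target[module + ".contigs"] = contigs_path
    return sample_data


sample_data = {}
store_contigs(sample_data, "/path/to/Sample1/contigs.fasta", "sample",
              sample="Sample1")
store_contigs(sample_data, "/path/to/project/contigs.fasta", "project")
```

After these two calls, the per-sample entries sit under sample_data["Sample1"] and the project-wide entries sit directly under sample_data.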

Parameters that can be set

Parameter   Values                 Comments
---------   ------                 --------
scope       sample|project         Set to project to assemble all project reads into one assembly.
p           e.g. ‘fb ss 180 250’   Sets the -p parameter passed to CLC for paired-end reads. Required only if the project includes paired-end reads.
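
The example value has four fields. The sketch below assumes they follow the CLC paired-read convention of read orientation, distance mode, minimum distance and maximum distance; that reading is an assumption and should be checked against the CLC manual. The names PairedSpec and parse_p are hypothetical and are not part of the module. The function only checks that a value is well formed before it goes into the parameter file.

```python
from typing import NamedTuple


class PairedSpec(NamedTuple):
    orientation: str    # assumed: relative read orientation, e.g. "fb"
    distance_mode: str  # assumed: where the distance is measured, e.g. "ss"
    min_dist: int
    max_dist: int


def parse_p(value):
    """Split a 'p' value such as 'fb ss 180 250' into its four fields."""
    fields = value.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {value!r}")
    orientation, distance_mode, low, high = fields
    min_dist, max_dist = int(low), int(high)
    if min_dist > max_dist:
        raise ValueError("minimum distance exceeds maximum distance")
    return PairedSpec(orientation, distance_mode, min_dist, max_dist)


spec = parse_p("fb ss 180 250")
```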

Lines for parameter file

clc1:
    module: clc_assembl
    base: trim1
    script_path: /path/to/clc_assembler
    qsub_params:
        -pe:    shared 30
        node:   sge37
    scope:      sample
    p:          fb ss 180 250 
    redirects:
        --cpus: 30

megahit_assembl

Authors

Menachem Sklarz

Affiliation

Bioinformatics core facility

Organization

National Institute of Biotechnology in the Negev, Ben Gurion University.

A class that defines a module for assembling reads using the MEGAHIT assembler.

Requires

  • fastq files in at least one of the following slots:

    • sample_data[<sample>]["fastq.F"]

    • sample_data[<sample>]["fastq.R"]

    • sample_data[<sample>]["fastq.S"]

Output:

  • puts fasta output files in the following slots:

    • if scope set to sample:

      • sample_data[<sample>]["fasta.nucl"]

      • sample_data[<sample>]["megahit_assembl.contigs"]

      • Also, sets sample_data[<sample>]["assembler"] = "megahit_assembl"

    • if scope set to project:

      • sample_data["fasta.nucl"]

      • sample_data["megahit_assembl.contigs"]

      • Also, sets sample_data[<sample>]["assembler"] = "megahit_assembl"

Parameters that can be set

Parameter   Values           Comments
---------   ------           --------
scope       sample|project   Set to project to assemble all project reads into one assembly.

Lines for parameter file

megahit1:
    module: megahit_assembl
    base: trim1
    script_path: /path/to/megahit
    qsub_params:
        -pe: shared 30
        node: sge37
    scope: project
    redirects:
        --continue: 
        --num-cpu-threads: 30

References

Li, D., Liu, C.M., Luo, R., Sadakane, K. and Lam, T.W., 2015. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics, 31(10), pp.1674-1676.

spades_assembl *

Authors

Menachem Sklarz

Affiliation

Bioinformatics core facility

Organization

National Institute of Biotechnology in the Negev, Ben Gurion University.

A class that defines a module for assembling reads using the SPAdes assembler.

Requires

  • fastq files in at least one of the following slots:

    • sample_data[<sample>]["fastq.F"]

    • sample_data[<sample>]["fastq.R"]

    • sample_data[<sample>]["fastq.S"]

Output:

  • puts fasta output files in the following slots:

    • for sample-wise assembly:

      • sample_data[<sample>]["fasta.nucl"]

      • sample_data[<sample>]["spades_assembl.contigs"]

      • sample_data[<sample>]["spades_assembl.scaffolds"]

    • for mega assembly (not defined yet):

      • sample_data["fasta.nucl"]

      • sample_data["spades_assembl.contigs"]

      • sample_data["spades_assembl.scaffolds"]

Parameters that can be set

Parameter        Values           Comments
---------        ------           --------
scope            sample|project   Set to project if the project-wide fasta slot should be used.
truncate_names                    Truncates contig names, e.g. ‘>NODE_82_length_18610_cov_38.4999_ID_165’ will be changed to ‘>NODE_82_length_18610’.
use_corrected                     Use the read files produced by read correction for downstream usage.
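
The effect of truncate_names on a contig header can be sketched with a regular expression. This is an illustration of the renaming shown in the example, not the module's own code; the function name truncate_name and the exact pattern are assumptions.

```python
import re

# Keep ">NODE_<n>_length_<len>" and drop everything after it.
_NODE_HEADER = re.compile(r"^(>NODE_\d+_length_\d+)_.*$")


def truncate_name(line):
    """Shorten a SPAdes-style fasta header; leave other lines unchanged."""
    return _NODE_HEADER.sub(r"\1", line)


header = truncate_name(">NODE_82_length_18610_cov_38.4999_ID_165")
sequence = truncate_name("ACGTACGT")  # sequence lines pass through untouched
```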

Lines for parameter file

spades1:
    module: spades_assembl
    base: trim1
    script_path: /path/to/bin/spades.py
    truncate_names: 
    redirects:
        --careful: 

References

Bankevich, A., Nurk, S., Antipov, D., Gurevich, A.A., Dvorkin, M., Kulikov, A.S., Lesin, V.M., Nikolenko, S.I., Pham, S., Prjibelski, A.D. and Pyshkin, A.V., 2012. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. Journal of computational biology, 19(5), pp.455-477.

quast *

Authors

Menachem Sklarz

Affiliation

Bioinformatics core facility

Organization

National Institute of Biotechnology in the Negev, Ben Gurion University.

A module for running QUAST on fasta assemblies.

QUAST is executed on the fasta file along the following lines:

  • If ‘scope’ is specified, the appropriate fasta will be used. An error will occur if the fasta does not exist.

  • If ‘scope’ is not specified, a project-wide fasta is used if one exists. Otherwise, sample-wise fasta files are used. If neither exists, an error will occur.
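
The selection rules above can be sketched as a small function over a dictionary shaped like sample_data, using the fasta.nucl slots named under Requires. The function name select_fasta is hypothetical, and requiring a fasta for every sample when scope is explicitly sample is an assumption; the module's actual checks may differ.

```python
def select_fasta(sample_data, samples, scope=None):
    """Return ("project", path) or ("sample", {sample: path})."""
    project_fasta = sample_data.get("fasta.nucl")
    sample_fastas = {
        s: sample_data[s]["fasta.nucl"]
        for s in samples
        if "fasta.nucl" in sample_data.get(s, {})
    }
    if scope == "project":
        if project_fasta is None:
            raise LookupError("scope is project but no project-wide fasta exists")
        return "project", project_fasta
    if scope == "sample":
        # Assumption: with an explicit scope, every sample needs a fasta
        missing = [s for s in samples if s not in sample_fastas]
        if missing:
            raise LookupError(f"no fasta for samples: {missing}")
        return "sample", sample_fastas
    if scope is not None:
        raise ValueError(f"unknown scope: {scope!r}")
    # Scope not specified: prefer the project-wide fasta, then sample-wise
    if project_fasta is not None:
        return "project", project_fasta
    if sample_fastas:
        return "sample", sample_fastas
    raise LookupError("no fasta found at project or sample level")
```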

Note

With compare_mode, you tell the module to run quast on multiple assemblies. This is done in one of three ways:

  • If scope is sample and a single base step is defined, the module compares between the samples.

  • If scope is sample and more than one base step is defined, the module compares, for each sample separately, between the assemblies found in the base steps.

  • If scope is project, the module compares between the assemblies found in the base steps at the project level.
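
The three cases can be summarised as a function that returns which assemblies end up in the same comparison report. This is a sketch of the grouping logic only; the function name comparison_groups and the (base step, sample) tuples are illustrative assumptions, not the module's internal representation.

```python
def comparison_groups(scope, base_steps, samples):
    """Return a list of groups; each group is one comparison report.

    Each assembly is written as (base_step, sample); sample is None
    for project-level assemblies.
    """
    if scope == "project":
        # One report comparing the project assemblies of all base steps
        return [[(step, None) for step in base_steps]]
    if scope == "sample":
        if len(base_steps) == 1:
            # One report comparing the samples of the single base step
            return [[(base_steps[0], sample) for sample in samples]]
        # One report per sample, comparing that sample across base steps
        return [[(step, sample) for step in base_steps] for sample in samples]
    raise ValueError(f"unknown scope: {scope!r}")


groups = comparison_groups("sample", ["spades1", "megahit1"], ["A", "B"])
```

Here groups holds two reports, one per sample, each comparing that sample's spades1 and megahit1 assemblies.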

Requires

  • fasta files in one of the following slots:

    • sample_data["fasta.nucl"]

    • sample_data[<sample>]["fasta.nucl"]

Output

  • Puts output directory in one of:

    • self.sample_data["project_data"]["quast"]

    • self.sample_data[<sample>]["quast"]

Parameters that can be set

Parameter      Values             Comments
---------      ------             --------
scope          project | sample   Indicates whether to use a project or sample contigs file.
compare_mode                      If ‘scope’ is ‘sample’, specifies whether to analyse each sample separately or to create a single comparison report for all samples.

Lines for parameter file

  1. A quast report for each sample separately:

quast1:
    module: quast
    base: spades1
    script_path: /path/to/quast.py
    scope: sample
    redirects:
        --fast: 

  2. A quast report comparing the sample assemblies:

quast1:
    module: quast
    base: spades1
    script_path: /path/to/quast.py
    compare_mode: 
    scope: sample
    redirects:
        --fast: 

  3. A quast report comparing the project assemblies from different stages of the analysis:

quast1:
    module: quast
    base: 
        - spades1
        - megahit1
    script_path: /path/to/quast.py
    compare_mode: 
    scope: project
    redirects:
        --fast: 

References

Gurevich, A., Saveliev, V., Vyahhi, N. and Tesler, G., 2013. QUAST: quality assessment tool for genome assemblies. Bioinformatics, 29(8), pp.1072-1075.