Sequence Annotation

Modules included in this section

Prokka

Authors

Liron Levin

Affiliation

Bioinformatics core facility

Organization

National Institute of Biotechnology in the Negev, Ben Gurion University.

Note

This module was developed as part of a study led by Dr. Jacob Moran Gilad

Short Description

Runs Prokka on all samples

Requires

  • For each Sample, a fasta.nucl file type [e.g. an assembly result] in:

    sample_data[sample]["fasta.nucl"]

Output

  • For each Sample, puts the location of the Sample’s GFF file in:

    sample_data[sample]["GFF"]

  • For each Sample, puts the location of the Sample’s identified genes file in:

    sample_data[sample]["fasta.nucl"]

  • For each Sample, puts the location of the Sample’s identified genes [translated] file in:

    sample_data[sample]["fasta.prot"]

  • If the generate_GFF_dir option is set, puts the location of the directory holding all Samples' GFF files in:

    sample_data["GFF_dir"]
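
As a rough illustration of the slots listed above, the following sketch fills them for a single sample. The record_prokka_outputs helper, the sample name and all file names are hypothetical; only the slot names come from the list above. Note that the fasta.nucl slot is reused: it holds the assembly on input and the identified genes file on output.

```python
def record_prokka_outputs(sample_data, sample, out_dir, gff_dir=None):
    """Fill the output slots for one sample (illustrative only).

    File names are assumptions made for this sketch; the module itself
    decides the real names.
    """
    slots = sample_data[sample]
    slots["GFF"] = f"{out_dir}/{sample}.gff"
    # The input assembly path in "fasta.nucl" is replaced by the genes file
    slots["fasta.nucl"] = f"{out_dir}/{sample}.genes.fasta"
    slots["fasta.prot"] = f"{out_dir}/{sample}.proteins.fasta"
    # Only set the project-level slot when a GFF directory was requested
    if gff_dir is not None:
        sample_data["GFF_dir"] = gff_dir
    return sample_data


sample_data = {"Sample1": {"fasta.nucl": "/data/Sample1/contigs.fasta"}}
record_prokka_outputs(sample_data, "Sample1", "/out/Sample1", gff_dir="/out/GFF")
```

A downstream step that needs the original assembly would therefore have to read it before this step runs, since the slot no longer points to it afterwards.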

Parameters that can be set

Parameter           Values    Comments
generate_GFF_dir              Create GFF directory

Comments

Lines for parameter file

Step_Name:                                  # Name of this step
    module: Prokka                          # Name of the module to use
    base:                                   # Name of the step [or list of names] to run after [must be after a fasta file generator step like an assembly program or start the analysis with fasta files]
    script_path:                            # Command for running Prokka 
    env:                                    # env parameters that need to be in the PATH for running this module
    qsub_params:
        -pe:                                # Number of CPUs to reserve for this analysis
    generate_GFF_dir:                       # Create GFF directory
    redirects:
        --cpus:                             # Number of CPUs for Prokka to use
        --force:                            # Overwrite an existing output folder
        --genus:                            # Genus name
        --kingdom:                          # Annotation mode [Archaea, Bacteria, Mitochondria or Viruses]
        --proteins:                         # Path to a protein DB [FASTA] to use for extra annotation, or "VFDB" to use the module's built-in VFDB virulence/resistance DB
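
To make the role of the redirects block more concrete, here is a minimal sketch of how such a block could be turned into a Prokka command line. How the module actually assembles its command is not described here, so build_prokka_command and its arguments are assumptions made for illustration; --outdir and --prefix are standard Prokka options, and flags left without a value are passed as bare switches.

```python
import shlex


def build_prokka_command(script_path, redirects, fasta, outdir, prefix):
    """Assemble a Prokka command line from a redirects mapping (sketch)."""
    parts = [script_path, "--outdir", outdir, "--prefix", prefix]
    for flag, value in redirects.items():
        parts.append(flag)
        # Flags with no value (e.g. --force) are passed as bare switches
        if value is not None and value != "":
            parts.append(str(value))
    # Prokka takes the contigs FASTA file as its positional argument
    parts.append(fasta)
    return " ".join(shlex.quote(p) for p in parts)


redirects = {
    "--cpus": 20,
    "--force": None,
    "--genus": "Staphylococcus",
    "--kingdom": "Bacteria",
}
print(build_prokka_command("prokka", redirects, "contigs.fasta", "out", "Sample1"))
```

Quoting each part with shlex.quote keeps paths with spaces intact when the command is written into a shell script.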

References

Seemann, Torsten. “Prokka: rapid prokaryotic genome annotation.” Bioinformatics 30.14 (2014): 2068-2069.

prokka_old *

Authors

Menachem Sklarz

Affiliation

Bioinformatics core facility

Organization

National Institute of Biotechnology in the Negev, Ben Gurion University.

A module for running prokka:

Prokka is executed on the contigs stored in sample_data.

Requires

  • A nucleotide fasta file in one of the following slots:

    • sample_data[<sample>]["fasta.nucl"]

    • sample_data["fasta.nucl"]

Output

  • If scope is set to sample:

    • Puts the predicted protein sequences (faa file) in:

      sample_data[<sample>]["fasta.prot"]

    • Puts the nucleotide sequences of the predicted genes (fna file) in:

      sample_data[<sample>]["fasta.nucl"]

    • Puts the annotation file (gff) in:

      sample_data[<sample>]["gff"]

    • Stores the prokka dir in:

      sample_data[<sample>]["prokka.dir"]

  • If scope is set to project:

    • Puts the predicted protein sequences (faa file) in:

      sample_data["fasta.prot"]

    • Puts the nucleotide sequences of the predicted genes (fna file) in:

      sample_data["fasta.nucl"]

    • Puts the annotation file (gff) in:

      sample_data["gff"]

    • Stores the prokka dir in:

      sample_data["prokka.dir"]
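
The effect of scope on where results are stored can be sketched as follows. This is only an illustration: store_outputs, the directory layout and the file names are hypothetical, and only the slot names are taken from the lists above.

```python
def store_outputs(sample_data, scope, prokka_dir, prefix, sample=None):
    """Store Prokka result paths at sample or project level (sketch)."""
    if scope == "sample":
        # Sample scope: results live under the sample's own entry
        target = sample_data.setdefault(sample, {})
    elif scope == "project":
        # Project scope: results live at the top level of sample_data
        target = sample_data
    else:
        raise ValueError(f"scope must be 'sample' or 'project', got {scope!r}")

    target["fasta.prot"] = f"{prokka_dir}/{prefix}.faa"
    target["fasta.nucl"] = f"{prokka_dir}/{prefix}.fna"
    target["gff"] = f"{prokka_dir}/{prefix}.gff"
    target["prokka.dir"] = prokka_dir
    return sample_data


per_sample = store_outputs({}, "sample", "/out/Sample1", "Sample1", sample="Sample1")
per_project = store_outputs({}, "project", "/out/MyProject", "MyProject")
```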

Parameters that can be set

Parameter           Values    Comments
generate_GFF_dir    empty     Create a directory with links to the GFF files, for use by downstream steps. Only relevant when scope == 'sample'.

Comments

If you set values for --locustag, --genus, --species and --strain, these values will be used for all the samples and will be passed as-is to the scripts.

If you pass the parameters without setting their values, the values will be set to the sample names (or to the project name, when scope == 'project').
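
The rule described in these comments can be sketched as a small function. The resolve_redirects helper below is hypothetical and is not the module's actual code; it only mirrors the behaviour described above for the four parameters named there, and leaves every other parameter untouched.

```python
# Parameters that fall back to the sample (or project) name when left empty
NAME_DEFAULTED = ("--locustag", "--genus", "--species", "--strain")


def resolve_redirects(redirects, name):
    """Return a copy of redirects with empty name-defaulted flags filled in.

    name is the sample name when scope is 'sample', or the project name
    when scope is 'project'.
    """
    resolved = {}
    for flag, value in redirects.items():
        if flag in NAME_DEFAULTED and value in (None, ""):
            resolved[flag] = name   # passed without a value: use the name
        else:
            resolved[flag] = value  # explicit value (or other flag): keep as-is
    return resolved


redirects = {"--genus": "Staphylococcus", "--strain": None, "--force": None}
print(resolve_redirects(redirects, "Sample1"))
```

With these inputs, --genus keeps its explicit value for every sample, --strain is set to the sample name, and --force stays a bare switch.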

Lines for parameter file

prokka1:
    module: prokka_old
    base: spades1
    script_path: /path/to/prokka
    qsub_params:
        -pe: shared 20
    generate_GFF_dir: 
    scope: sample
    redirects:
        --cpus: 20
        --fast: 
        --force:
        --genus: Staphylococcus
        --metagenome: 
        --strain: 

References

Seemann, T., 2014. Prokka: rapid prokaryotic genome annotation. Bioinformatics, 30(14), pp.2068-2069.