Sequence Annotation

Modules included in this section

Prokka

Authors: Liron Levin
Affiliation: Bioinformatics core facility
Organization: National Institute of Biotechnology in the Negev, Ben Gurion University.

Note

This module was developed as part of a study led by Dr. Jacob Moran Gilad.

Short Description

Runs Prokka on all samples.

Requires

  • For each Sample, a nucleotide fasta file [e.g. an assembly result] in:
    sample_data[sample]["fasta.nucl"]

Output

  • For each Sample, puts the location of the Sample’s GFF file in:
    sample_data[sample]["GFF"]
  • For each Sample, puts the location of the Sample’s identified genes file in:
    sample_data[sample]["fasta.nucl"]
  • For each Sample, puts the location of the Sample’s identified genes [translated] file in:
    sample_data[sample]["fasta.prot"]
  • If the generate_GFF_dir option is set, puts the location of the directory holding all Samples' GFF files in:
    sample_data["GFF_dir"]
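
As a rough illustration of the slots listed above, the following sketch models sample_data as a plain Python dictionary. Only the slot names come from this page. The helper register_prokka_output, the sample name and all file names are hypothetical placeholders, not the module's actual code or Prokka's real output names. As documented, the identified genes file is stored in the same fasta.nucl slot that held the input.

```python
def register_prokka_output(sample_data, sample, out_dir, gff_dir=None):
    """Record (hypothetical) Prokka output locations in the documented slots."""
    slots = sample_data[sample]
    # Placeholder file names, chosen for illustration only.
    slots["GFF"] = f"{out_dir}/annotation.gff"
    # The input fasta.nucl slot is replaced by the identified genes file.
    slots["fasta.nucl"] = f"{out_dir}/genes_nucl.fasta"
    slots["fasta.prot"] = f"{out_dir}/genes_prot.fasta"
    # The project-level GFF directory is recorded only when requested.
    if gff_dir is not None:
        sample_data["GFF_dir"] = gff_dir
    return sample_data


sample_data = {"sampleA": {"fasta.nucl": "/data/sampleA/contigs.fasta"}}
register_prokka_output(sample_data, "sampleA", "/out/sampleA", gff_dir="/out/GFF_dir")
```

A downstream step would then read sample_data["sampleA"]["GFF"] or sample_data["GFF_dir"] rather than rebuilding the paths itself.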

Parameters that can be set

Parameter          Values   Comments
generate_GFF_dir            Create GFF directory

Comments

Lines for parameter file

Step_Name:                                  # Name of this step
    module: Prokka                          # Name of the module to use
    base:                                   # Name of the step [or list of names] to run after. Must follow a step that generates a fasta file, such as an assembly program, unless the analysis starts with fasta files
    script_path:                            # Command for running Prokka
    env:                                    # env parameters that need to be in the PATH for running this module
    qsub_params:
        -pe:                                # Number of CPUs to reserve for this analysis
    generate_GFF_dir:                       # Create GFF directory
    redirects:
        --cpus:                             # parameters for running Prokka
        --force:                            # parameters for running Prokka
        --genus:                            # parameters for running Prokka
        --kingdom:                          # parameters for running Prokka
        --proteins:                         # Location of a protein DB [FASTA] to use for extra annotation, or "VFDB" to use the module's built-in VFDB virulence/resistance DB

References

Seemann, T., 2014. Prokka: rapid prokaryotic genome annotation. Bioinformatics, 30(14), pp.2068-2069.

prokka_old *

Authors: Menachem Sklarz
Affiliation: Bioinformatics core facility
Organization: National Institute of Biotechnology in the Negev, Ben Gurion University.

A module for running Prokka.

Prokka is executed on the contigs stored in sample_data.

Requires

  • A nucleotide fasta file in one of the following slots:

    • sample_data[<sample>]["fasta.nucl"]
    • sample_data["fasta.nucl"]

Output

  • If scope is set to sample:

    • Puts the predicted protein sequences (faa file) in:
      sample_data[<sample>]["fasta.prot"]
    • Puts the nucleotide sequences of the predicted genes (fna file) in:
      sample_data[<sample>]["fasta.nucl"]
    • Puts the annotation file (gff) in:
      sample_data[<sample>]["gff"]
    • Stores the Prokka directory in:
      sample_data[<sample>]["prokka.dir"]
  • If scope is set to project:

    • Puts the predicted protein sequences (faa file) in:
      sample_data["fasta.prot"]
    • Puts the nucleotide sequences of the predicted genes (fna file) in:
      sample_data["fasta.nucl"]
    • Puts the annotation file (gff) in:
      sample_data["gff"]
    • Stores the Prokka directory in:
      sample_data["prokka.dir"]
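
The scope-dependent placement described above can be sketched as follows. This is a minimal illustration under assumptions, not the module's implementation: the function store_prokka_output and the example paths are hypothetical, and only the slot names and the two scope values come from this page.

```python
SLOTS = ("fasta.prot", "fasta.nucl", "gff", "prokka.dir")


def store_prokka_output(sample_data, scope, outputs, sample=None):
    """Store output locations at sample level or at project level."""
    if scope == "sample":
        target = sample_data[sample]   # slots under sample_data[<sample>]
    elif scope == "project":
        target = sample_data           # slots directly under sample_data
    else:
        raise ValueError(f"Unsupported scope: {scope!r}")
    for slot in SLOTS:
        target[slot] = outputs[slot]
    return sample_data


# Hypothetical output locations, for illustration only.
outputs = {
    "fasta.prot": "/out/sampleA/result.faa",
    "fasta.nucl": "/out/sampleA/result.fna",
    "gff": "/out/sampleA/result.gff",
    "prokka.dir": "/out/sampleA/",
}
sample_data = {"sampleA": {"fasta.nucl": "/data/sampleA/contigs.fasta"}}
store_prokka_output(sample_data, "sample", outputs, sample="sampleA")
```

With scope set to project, the same four slots would be written at the top level of sample_data instead.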

Parameters that can be set

Parameter          Values   Comments
generate_GFF_dir   empty    Create a directory with links to the GFF files, for use by downstream steps. Only relevant when scope == 'sample'.

Comments

If you set values for --locustag, --genus, --species and --strain, they apply to all samples and are passed as-is to the scripts.

If you pass the parameters without setting their values, the values will be set to the sample names (or to the project name, when scope == 'project').
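
The defaulting rule described in these comments might look roughly like the sketch below. It assumes that a parameter passed without a value is loaded from the parameter file as None. The function resolve_redirects and the example names are hypothetical and are not taken from the module's source.

```python
# Parameters that, per the comments above, default to the sample or project name.
NAME_PARAMS = ("--locustag", "--genus", "--species", "--strain")


def resolve_redirects(redirects, scope, sample_name, project_name):
    """Fill in empty name-like parameters; leave everything else untouched."""
    default = project_name if scope == "project" else sample_name
    resolved = {}
    for flag, value in redirects.items():
        if flag in NAME_PARAMS and value is None:
            resolved[flag] = default   # passed without a value
        else:
            resolved[flag] = value     # explicit values are passed as-is
    return resolved


redirects = {"--cpus": 20, "--force": None, "--genus": "Staphylococcus", "--strain": None}
per_sample = resolve_redirects(redirects, "sample", "sampleA", "my_project")
# --genus stays "Staphylococcus" for every sample, --strain becomes "sampleA"
```

Flags outside the four name-like parameters, such as --force, keep their empty value in this sketch.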

Lines for parameter file

prokka1:
    module: prokka_old
    base: spades1
    script_path: /path/to/prokka
    qsub_params:
        -pe: shared 20
    generate_GFF_dir: 
    scope: sample
    redirects:
        --cpus: 20
        --fast: 
        --force:
        --genus: Staphylococcus
        --metagenome: 
        --strain: 

References

Seemann, T., 2014. Prokka: rapid prokaryotic genome annotation. Bioinformatics, 30(14), pp.2068-2069.