Sequence Annotation

Modules included in this section

Prokka
prokka_old ^*

`Prokka`

Authors: Liron Levin
Affiliation: Bioinformatics core facility
Organization: National Institute of Biotechnology in the Negev, Ben Gurion University.

Note

This module was developed as part of a study led by Dr. Jacob Moran Gilad.

Short Description

Runs Prokka on all samples.

Requires

For each sample, a nucleotide FASTA file [e.g. an assembly result] in:
sample_data[sample]["fasta.nucl"]

Output

For each sample, puts the location of the sample’s GFF file in:
sample_data[sample]["GFF"]

For each sample, puts the location of the sample’s identified genes file (nucleotide FASTA) in:
sample_data[sample]["fasta.nucl"]

For each sample, puts the location of the sample’s translated identified genes file (protein FASTA) in:
sample_data[sample]["fasta.prot"]

If the generate_GFF_dir option is set, puts the location of the directory containing all samples’ GFF files in:
sample_data["GFF_dir"]
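The slot bookkeeping above can be pictured with a short Python sketch. It is not the module's code: it assumes sample_data is a plain nested dict, that each sample's Prokka run writes <prefix>.gff, <prefix>.ffn and <prefix>.faa (Prokka's standard extensions for the annotation, the gene nucleotide sequences and the protein sequences), and the per-sample output layout and the GFF_dir location are hypothetical.

```python
import os


def register_prokka_outputs(sample_data, samples, out_root, generate_gff_dir=False):
    """Record Prokka output locations in sample_data (illustrative sketch).

    Assumes one output directory per sample under out_root and that Prokka
    was run with --prefix <sample>. The layout is hypothetical.
    """
    for sample in samples:
        prefix = os.path.join(out_root, sample, sample)
        sample_data[sample]["GFF"] = prefix + ".gff"
        # Nucleotide sequences of the identified genes replace the input assembly
        sample_data[sample]["fasta.nucl"] = prefix + ".ffn"
        # Translated identified genes (proteins)
        sample_data[sample]["fasta.prot"] = prefix + ".faa"
    if generate_gff_dir:
        # Hypothetical location of the directory collecting all samples' GFFs
        sample_data["GFF_dir"] = os.path.join(out_root, "GFF_dir")
    return sample_data


sample_data = {"s1": {"fasta.nucl": "/data/s1/contigs.fasta"}}
register_prokka_outputs(sample_data, ["s1"], "/out/prokka", generate_gff_dir=True)
print(sample_data["s1"]["fasta.prot"])  # /out/prokka/s1/s1.faa
```

Note that the downstream fasta.nucl slot now points at the predicted genes rather than the original assembly, which is why later steps see gene sequences.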

Parameters that can be set

Parameter	Values	Comments
generate_GFF_dir		Create GFF directory

Comments

Lines for parameter file

Step_Name:                                  # Name of this step
    module: Prokka                          # Name of the module to use
    base:                                   # Name of the step [or list of names] to run after [must be after a fasta file generator step like an assembly program or start the analysis with fasta files]
    script_path:                            # Command for running Prokka
    env:                                    # Environment settings (e.g. additions to PATH) needed for running this module
    qsub_params:
        -pe:                                # Number of CPUs to reserve for this analysis
    generate_GFF_dir:                       # Create GFF directory
    redirects:
        --cpus:                             # parameters for running Prokka
        --force:                            # parameters for running Prokka
        --genus:                            # parameters for running Prokka
        --kingdom:                          # parameters for running Prokka
        --proteins:                         # Path to a protein DB [FASTA] for extra annotation, or "VFDB" to use the module's built-in VFDB virulence/resistance DB
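As a rough illustration of how the redirects block maps onto a Prokka command line, the sketch below appends each redirected flag with its value (flags left empty, such as --force, are passed alone) and then the input FASTA. The function name and argument order are assumptions, and the handling of "--proteins VFDB" is modelled by swapping in a hypothetical vfdb_path; the module's real substitution may differ.

```python
import shlex


def build_prokka_command(script_path, redirects, fasta, outdir, prefix, vfdb_path=None):
    """Turn a redirects mapping into a Prokka command string (illustrative sketch)."""
    parts = [script_path]
    for flag, value in redirects.items():
        if flag == "--proteins" and value == "VFDB":
            # Hypothetical: replace the keyword with the built-in DB location
            if vfdb_path is None:
                raise ValueError("--proteins VFDB requires vfdb_path")
            value = vfdb_path
        parts.append(flag)
        # Flags left empty in the parameter file are plain switches
        if value not in (None, ""):
            parts.append(str(value))
    # Prokka's output options and the input contigs come last
    parts += ["--outdir", outdir, "--prefix", prefix, fasta]
    return " ".join(shlex.quote(p) for p in parts)


redirects = {"--cpus": 8, "--force": None, "--genus": "Staphylococcus", "--kingdom": "Bacteria"}
cmd = build_prokka_command("prokka", redirects, "contigs.fasta", "out/s1", "s1")
print(cmd)
# prokka --cpus 8 --force --genus Staphylococcus --kingdom Bacteria --outdir out/s1 --prefix s1 contigs.fasta
```

Keeping --cpus in line with the -pe reservation in qsub_params avoids Prokka using more cores than the scheduler granted.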

References

Seemann, T., 2014. Prokka: rapid prokaryotic genome annotation. Bioinformatics, 30(14), pp.2068-2069.

`prokka_old` ^*

Authors: Menachem Sklarz
Affiliation: Bioinformatics core facility
Organization: National Institute of Biotechnology in the Negev, Ben Gurion University.

A module for running Prokka.

Prokka is executed on the contigs stored in sample_data.

Requires

A nucleotide fasta file in one of the following slots:
- sample_data[<sample>]["fasta.nucl"]
- sample_data["fasta.nucl"]
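To make the two input slots concrete, here is a small Python sketch of how the input FASTA could be located. It assumes, without confirmation from the module code, that scope sample reads the per-sample slot and scope project reads the project-wide slot; the function name is hypothetical.

```python
def find_input_fasta(sample_data, scope, sample=None):
    """Return the nucleotide FASTA prokka_old would annotate (illustrative sketch).

    Assumes scope "sample" reads sample_data[sample]["fasta.nucl"] and
    scope "project" reads sample_data["fasta.nucl"].
    """
    if scope == "sample":
        slots = sample_data.get(sample, {})
    elif scope == "project":
        slots = sample_data
    else:
        raise ValueError(f"unknown scope: {scope!r}")
    try:
        return slots["fasta.nucl"]
    except KeyError:
        where = f"sample {sample!r}" if scope == "sample" else "the project"
        raise KeyError(f'no "fasta.nucl" slot found for {where}') from None


sample_data = {"fasta.nucl": "/proj/all_contigs.fa", "s1": {"fasta.nucl": "/s1/contigs.fa"}}
print(find_input_fasta(sample_data, "sample", "s1"))  # /s1/contigs.fa
print(find_input_fasta(sample_data, "project"))       # /proj/all_contigs.fa
```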

Output

If scope is set to sample:
- Puts the predicted protein sequences (faa file) in:
  sample_data[<sample>]["fasta.prot"]
- Puts the nucleotide sequences of the predicted protein-coding genes (fna file) in:
  sample_data[<sample>]["fasta.nucl"]
- Puts the annotation file (gff) in:
  sample_data[<sample>]["gff"]
- Stores the Prokka output directory in:
  sample_data[<sample>]["prokka.dir"]
If scope is set to project:
- Puts the predicted protein sequences (faa file) in:
  sample_data["fasta.prot"]
- Puts the nucleotide sequences of the predicted protein-coding genes (fna file) in:
  sample_data["fasta.nucl"]
- Puts the annotation file (gff) in:
  sample_data["gff"]
- Stores the Prokka output directory in:
  sample_data["prokka.dir"]
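The only difference between the two scopes is which level of sample_data receives the slots. The Python sketch below shows that under the assumption that Prokka was run with --outdir and --prefix; the function name and paths are hypothetical, while the extensions follow the list above.

```python
import os


def store_prokka_old_outputs(sample_data, scope, outdir, prefix, sample=None):
    """Record prokka_old outputs in the slots listed above (illustrative sketch).

    Assumes Prokka was run with --outdir outdir and --prefix prefix.
    """
    if scope == "sample":
        target = sample_data[sample]      # per-sample slots
    elif scope == "project":
        target = sample_data              # project-wide slots
    else:
        raise ValueError(f"unknown scope: {scope!r}")
    base = os.path.join(outdir, prefix)
    target["fasta.prot"] = base + ".faa"
    target["fasta.nucl"] = base + ".fna"
    target["gff"] = base + ".gff"
    target["prokka.dir"] = outdir
    return target


data = {"s1": {"fasta.nucl": "/asm/s1.fasta"}}
store_prokka_old_outputs(data, "sample", "/out/prokka1/s1", "s1", sample="s1")
print(data["s1"]["gff"])  # /out/prokka1/s1/s1.gff
```

Note that this module writes the lowercase "gff" slot, whereas the Prokka module writes "GFF".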

Parameters that can be set

Parameter	Values	Comments
generate_GFF_dir	empty	Create a directory with links to the GFF files for use by downstream modules. Only relevant when `scope=='sample'`
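A directory of links lets a downstream tool that expects a single folder of GFF files see all samples at once. The Python sketch below builds such a directory with symbolic links; the function name, the <sample>.gff link naming and the use of absolute symlinks are assumptions, not the module's confirmed behaviour.

```python
import os
import tempfile


def make_gff_link_dir(sample_data, samples, gff_dir, slot="gff"):
    """Create gff_dir holding one symlink per sample's GFF (illustrative sketch)."""
    os.makedirs(gff_dir, exist_ok=True)
    for sample in samples:
        src = sample_data[sample][slot]
        link = os.path.join(gff_dir, sample + ".gff")
        # Replace a stale link from a previous run
        if os.path.lexists(link):
            os.remove(link)
        os.symlink(os.path.abspath(src), link)
    return gff_dir


# Usage with throwaway files
work = tempfile.mkdtemp()
gff = os.path.join(work, "s1.gff")
with open(gff, "w"):
    pass
data = {"s1": {"gff": gff}}
gff_dir = make_gff_link_dir(data, ["s1"], os.path.join(work, "GFF_dir"))
print(os.listdir(gff_dir))  # ['s1.gff']
```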

Comments

If you set values for --locustag, --genus, --species and --strain, these values apply to all samples and are passed as-is to the scripts.

If you pass these parameters without values, each is set to the sample name (or to the project name, when scope == 'project').
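The defaulting rule above can be expressed as a small Python sketch. It only restates the documented behaviour for the four named flags; the function and constant names are hypothetical, and other empty flags (such as --fast) are left untouched so they act as switches.

```python
NAME_DEFAULTED_FLAGS = ("--locustag", "--genus", "--species", "--strain")


def resolve_redirects(redirects, scope, sample=None, project=None):
    """Fill empty naming flags with the sample or project name (illustrative sketch)."""
    name = sample if scope == "sample" else project
    resolved = {}
    for flag, value in redirects.items():
        if flag in NAME_DEFAULTED_FLAGS and value in (None, ""):
            value = name          # empty naming flag: use the sample/project name
        resolved[flag] = value    # set values are passed as-is
    return resolved


redirects = {"--cpus": 20, "--fast": None, "--genus": "Staphylococcus", "--strain": None}
print(resolve_redirects(redirects, "sample", sample="s1"))
# {'--cpus': 20, '--fast': None, '--genus': 'Staphylococcus', '--strain': 's1'}
```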

Lines for parameter file

prokka1:
    module: prokka_old
    base: spades1
    script_path: /path/to/prokka
    qsub_params:
        -pe: shared 20
    generate_GFF_dir: 
    scope: sample
    redirects:
        --cpus: 20
        --fast: 
        --force:
        --genus: Staphylococcus
        --metagenome: 
        --strain: 

References

Seemann, T., 2014. Prokka: rapid prokaryotic genome annotation. Bioinformatics, 30(14), pp.2068-2069.
