Sequence Annotation
Modules included in this section
Prokka
- Authors
Liron Levin
- Affiliation
Bioinformatics core facility
- Organization
National Institute of Biotechnology in the Negev, Ben Gurion University.
Note
This module was developed as part of a study led by Dr. Jacob Moran Gilad
Short Description
Runs Prokka on all samples
Requires
- For each Sample, a fasta.nucl file type [e.g. an assembly result] in:
sample_data[sample]["fasta.nucl"]
Output
- For each Sample, puts the location of the Sample’s GFF file in:
sample_data[sample]["GFF"]
- For each Sample, puts the location of the Sample’s identified genes file in:
sample_data[sample]["fasta.nucl"]
- For each Sample, puts the location of the Sample’s identified genes [translated] file in:
sample_data[sample]["fasta.prot"]
- If the generate_GFF_dir option is set, puts the location of the directory holding all Samples' GFF files in:
sample_data["GFF_dir"]
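The slots listed above can be pictured as entries in a nested dictionary. The following is a minimal sketch, not the module's actual code: the helper `register_prokka_outputs`, the sample name `Sample1`, all paths and the file extensions are hypothetical and chosen only for illustration. As the Requires and Output lists imply, the `fasta.nucl` slot that held the input assembly is reused for the identified genes file.

```python
def register_prokka_outputs(sample_data, sample, out_dir, gff_dir=None):
    """Record hypothetical Prokka output paths in the slots listed above."""
    prefix = f"{out_dir}/{sample}"
    slots = sample_data.setdefault(sample, {})
    slots["GFF"] = prefix + ".gff"          # annotation file
    slots["fasta.nucl"] = prefix + ".ffn"   # identified genes (assumed extension); replaces the input path
    slots["fasta.prot"] = prefix + ".faa"   # translated genes (assumed extension)
    if gff_dir is not None:
        # Only filled in when generate_GFF_dir is set in the parameter file.
        sample_data["GFF_dir"] = gff_dir
    return sample_data


sample_data = {"Sample1": {"fasta.nucl": "/data/Sample1/contigs.fasta"}}
register_prokka_outputs(
    sample_data,
    "Sample1",
    "/results/prokka/Sample1",
    gff_dir="/results/prokka/GFF_dir",
)
```

A downstream step that needs the original assembly would therefore have to read it before this step, or from a step that ran earlier.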
Parameters that can be set
Parameter | Values | Comments |
---|---|---|
generate_GFF_dir | | Create GFF directory |
Lines for parameter file
Step_Name:                      # Name of this step
    module: Prokka              # Name of the module to use
    base:                       # Name of the step [or list of names] to run after [must follow a step that generates a fasta file, such as an assembly program, unless the analysis starts with fasta files]
    script_path:                # Command for running Prokka
    env:                        # env parameters that need to be in the PATH for running this module
    qsub_params:
        -pe:                    # Number of CPUs to reserve for this analysis
    generate_GFF_dir:           # Create GFF directory
    redirects:
        --cpus:                 # parameters for running Prokka
        --force:                # parameters for running Prokka
        --genus:                # parameters for running Prokka
        --kingdom:              # parameters for running Prokka
        --proteins:             # Use the location of a protein DB [FASTA] for extra annotation, or use "VFDB" to use the module's built-in VFDB virulence/resistance DB
References
Seemann, Torsten. “Prokka: rapid prokaryotic genome annotation.” Bioinformatics 30.14 (2014): 2068-2069.
prokka_old
- Authors
Menachem Sklarz
- Affiliation
Bioinformatics core facility
- Organization
National Institute of Biotechnology in the Negev, Ben Gurion University.
A module for running prokka:
Prokka is executed on the contigs stored in sample_data.
Requires
A nucleotide fasta file in one of the following slots:
sample_data[<sample>]["fasta.nucl"]
sample_data["fasta.nucl"]
Output
If scope is set to sample:
- Puts output predicted protein sequences (faa file) in:
sample_data[<sample>]["fasta.prot"]
- Puts output predicted protein genomic sequences (fna file) in:
sample_data[<sample>]["fasta.nucl"]
- Puts the annotation file (gff) in:
sample_data[<sample>]["gff"]
- Stores the prokka dir in:
sample_data[<sample>]["prokka.dir"]
If scope is set to project:
- Puts output predicted protein sequences (faa file) in:
sample_data["fasta.prot"]
- Puts output predicted protein genomic sequences (fna file) in:
sample_data["fasta.nucl"]
- Puts the annotation file (gff) in:
sample_data["gff"]
- Stores the prokka dir in:
sample_data["prokka.dir"]
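The only difference between the two scopes is where the slots live: under the sample's entry, or at the top level of `sample_data`. A minimal sketch of that choice follows. The helper `store_prokka_outputs`, the names `Sample1` and `MyProject`, and all paths are hypothetical, and the real module may organise this differently.

```python
def store_prokka_outputs(sample_data, scope, out_dir, name, sample=None):
    """Write the four output slots at sample level or at project level."""
    if scope == "sample":
        target = sample_data.setdefault(sample, {})
    elif scope == "project":
        target = sample_data
    else:
        raise ValueError(f"scope must be 'sample' or 'project', got {scope!r}")
    target["fasta.prot"] = f"{out_dir}/{name}.faa"
    target["fasta.nucl"] = f"{out_dir}/{name}.fna"
    target["gff"] = f"{out_dir}/{name}.gff"
    target["prokka.dir"] = out_dir
    return sample_data


per_sample = store_prokka_outputs(
    {"Sample1": {}}, "sample", "/results/Sample1", "Sample1", sample="Sample1"
)
per_project = store_prokka_outputs(
    {"Sample1": {}}, "project", "/results/MyProject", "MyProject"
)
```

With `scope: sample` the slots are read as `per_sample["Sample1"]["gff"]`; with `scope: project` they are read as `per_project["gff"]` and the per-sample entries are left untouched.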
Parameters that can be set
Parameter | Values | Comments |
---|---|---|
generate_GFF_dir | empty | Create a dir with links to the gff files for use downstream by others. Only relevant when |
Comments
If you set values for --locustag, --genus, --species and --strain, these values will hold for all the samples and will be passed as-is to the scripts.
If you pass these parameters without setting their values, the values will be set to the sample names (or to the project name, when scope == 'project').
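The rule described in these comments can be sketched as a small function. This is an illustration under stated assumptions rather than the module's code: the function `resolve_name_flags` and the names `Sample1` and `MyProject` are hypothetical, and an empty value in the parameter file is represented here as `None`.

```python
NAME_FLAGS = ("--locustag", "--genus", "--species", "--strain")


def resolve_name_flags(redirects, scope, sample=None, project=None):
    """Fill empty name flags with the sample name, or the project name for project scope."""
    default = project if scope == "project" else sample
    resolved = dict(redirects)  # leave the caller's dictionary unchanged
    for flag in NAME_FLAGS:
        if flag in resolved and resolved[flag] is None:
            resolved[flag] = default
    return resolved


redirects = {"--genus": "Staphylococcus", "--strain": None, "--force": None}
for_sample = resolve_name_flags(redirects, "sample", sample="Sample1")
for_project = resolve_name_flags(redirects, "project", project="MyProject")
```

Here `--genus` keeps its explicit value for every sample, `--strain` becomes `Sample1` or `MyProject` depending on the scope, and flags outside the four listed, such as `--force`, are not touched.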
Lines for parameter file
prokka1:
    module: prokka_old
    base: spades1
    script_path: /path/to/prokka
    qsub_params:
        -pe: shared 20
    generate_GFF_dir:
    scope: sample
    redirects:
        --cpus: 20
        --fast:
        --force:
        --genus: Staphylococcus
        --metagenome:
        --strain:
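To make the role of `redirects` concrete, the sketch below shows how such a block might be turned into a Prokka command line for one sample. It is an assumption about the general shape of the generated command, not a copy of what the module writes: the helper `build_prokka_command`, the sample name `Sample1` and all paths are hypothetical, and `--strain` is shown already filled with the sample name, following the rule in the Comments section. `--outdir` and `--prefix` are standard Prokka options added here for illustration.

```python
import shlex


def build_prokka_command(script_path, redirects, contigs, outdir, prefix):
    """Assemble a shell-safe Prokka command from a redirects dictionary."""
    parts = [script_path, "--outdir", outdir, "--prefix", prefix]
    for flag, value in redirects.items():
        parts.append(flag)
        if value is not None:  # flags without a value are passed as bare switches
            parts.append(str(value))
    parts.append(contigs)  # the input fasta file goes last
    return " ".join(shlex.quote(part) for part in parts)


redirects = {
    "--cpus": 20,
    "--fast": None,
    "--force": None,
    "--genus": "Staphylococcus",
    "--metagenome": None,
    "--strain": "Sample1",
}
command = build_prokka_command(
    "/path/to/prokka",
    redirects,
    contigs="/data/Sample1/contigs.fasta",
    outdir="/results/Sample1",
    prefix="Sample1",
)
print(command)
```

Quoting each part with `shlex.quote` keeps the command safe if a sample name or path contains spaces.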
References
Seemann, T., 2014. Prokka: rapid prokaryotic genome annotation. Bioinformatics, 30(14), pp.2068-2069.