Transcriptome Annotation

Modules included in this section

Trinotate

Authors

Menachem Sklarz

Affiliation

Bioinformatics core facility

Organization

National Institute of Biotechnology in the Negev, Ben Gurion University.

A class that defines a module for annotating an RNA-seq assembly using Trinotate.

Note

This module will be updated in the future to support uploading of other sources of information such as RNAMMER output. See Trinotate documentation.

Requires

  • A transcripts file in:
    • self.sample_data["project_data"]["transcripts.fasta.nucl"]

  • A gene-to-transcript mapping file (produced by the Trinity_gene_to_trans_map module) in:
    • self.sample_data["project_data"]["gene_trans_map"]

  • A protein fasta file (produced by TransDecoder) in:
    • self.sample_data["project_data"]["fasta.prot"]

  • Results of blastp of the protein file against the swissprot database in:
    • self.sample_data["project_data"]["blast.prot"]

  • Results of blastx of the transcripts file against the swissprot database in:
    • self.sample_data["project_data"]["blast.nucl"]

  • Results of hmmscan of the protein file against the pfam database in:
    • self.sample_data["project_data"]["hmmscan.prot"]

  • Results of signalp of the protein file using the signalp program [optional] in:
    • self.sample_data["project_data"]["signalp"]

  • Results of rnammer/infernal of the transcripts file [optional; use Infernal with Trinotate V4] in:
    • self.sample_data["project_data"]["rnammer"]

  • Results of tmhmm of the protein file using the TmHMM program [optional] in:
    • self.sample_data["project_data"]["tmhmm"]

  • Results of eggnog of the protein file using the EggnogMapper program [optional; Trinotate V4 only] in:
    • self.sample_data["project_data"]["eggnog"]

Attention

If scope is set to sample, all of the above files should be in the sample scope!
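The requirements above can be checked before the step runs. The following is a minimal sketch, not part of the module itself: the helper name `missing_trinotate_slots` and the constant names are hypothetical, and it assumes that `sample_data` behaves like a nested dictionary keyed by `"project_data"` or by sample name, as the slot paths above suggest.

```python
# Slot names are copied from the Requires list; everything else here is
# an illustrative assumption, not the module's actual implementation.
REQUIRED_SLOTS = [
    "transcripts.fasta.nucl",
    "gene_trans_map",
    "fasta.prot",
    "blast.prot",
    "blast.nucl",
    "hmmscan.prot",
]
OPTIONAL_SLOTS = ["signalp", "rnammer", "tmhmm", "eggnog"]


def missing_trinotate_slots(sample_data, scope="project", sample=None):
    """Return the required slot names that are absent for the given scope."""
    if scope == "project":
        slots = sample_data.get("project_data", {})
    elif scope == "sample":
        # With scope = sample, every input is expected in the sample scope.
        slots = sample_data.get(sample, {})
    else:
        raise ValueError("scope must be 'sample' or 'project'")
    return [name for name in REQUIRED_SLOTS if name not in slots]
```

Calling `missing_trinotate_slots(sample_data, "sample", "sample1")` returns the names that still need to be produced in the scope of the hypothetical sample `sample1`; an empty list means all required inputs are present.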

Output:

  • Puts the Trinotate report file in:

    • sample_data[<sample>]["trino.rep"] (scope = sample)

    • sample_data["trino.rep"] (scope = project)

Parameters that can be set

Parameter      Values                        Comments
scope          sample|project
sqlitedb       Path to Trinotate sqlitedb
cp_sqlitedb                                  Create a local copy of the sqlitedb before loading the data (recommended)
ver4                                         Indicate that you are using Trinotate V4

Lines for parameter file

trino_Trinotate:
    module:             Trinotate
    base:               
                        - trino_blastp_sprot
                        - trino_blastx_sprot
                        - trino_hmmscan1
    script_path:        {Vars.paths.Trinotate}
    scope:              project
    sqlitedb:           {Vars.databases.trinotate.sqlitedb}
    cp_sqlitedb:    
    ver4:

References

Grabherr, M.G., Haas, B.J., Yassour, M., Levin, J.Z., Thompson, D.A., Amit, I., Adiconis, X., Fan, L., Raychowdhury, R., Zeng, Q. and Chen, Z., 2011. Trinity: reconstructing a full-length transcriptome without a genome from RNA-Seq data. Nature biotechnology, 29(7), p.644.

TransDecoder

Authors

Menachem Sklarz

Affiliation

Bioinformatics core facility

Organization

National Institute of Biotechnology in the Negev, Ben Gurion University.

A module for running TransDecoder on a transcripts file.

Note

Tested on TransDecoder version 5.5.0. The main difference in this version is that an output directory can be specified on the command line.

Requires

Fasta files in at least one of the following slots:

  • sample_data[<sample>]["fasta.nucl"] (if scope = sample)

  • sample_data["fasta.nucl"] (if scope = project)

Output:

  • If scope = project:

    • Protein fasta in self.sample_data["project_data"]["fasta.prot"]

    • Gene fasta in self.sample_data["project_data"]["fasta.nucl"]

    • Original transcripts in self.sample_data["project_data"]["transcripts.fasta.nucl"]

    • GFF file in self.sample_data["project_data"]["gff3"]

  • If scope = sample:

    • Protein fasta in self.sample_data[<sample>]["fasta.prot"]

    • Gene fasta in self.sample_data[<sample>]["fasta.nucl"]

    • Original transcripts in self.sample_data[<sample>]["transcripts.fasta.nucl"]

    • GFF file in self.sample_data[<sample>]["gff3"]
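The slot updates listed above can be sketched as follows. This is a rough illustration under stated assumptions, not the module's code: the helper `register_transdecoder_outputs` is hypothetical, `sample_data` is assumed to be a nested dictionary, and the output paths are assumed to follow TransDecoder's usual naming (the input file name followed by `.transdecoder.pep`, `.transdecoder.cds` and `.transdecoder.gff3`) inside the chosen output directory.

```python
import os


def register_transdecoder_outputs(sample_data, scope, out_dir, sample=None):
    """Record TransDecoder outputs in the slots described in the Output list."""
    # Pick the dictionary that matches the scope (project or a single sample).
    target = sample_data["project_data"] if scope == "project" else sample_data[sample]

    transcripts = target["fasta.nucl"]
    # Assumed naming convention: <input file name>.transdecoder.<ext>
    prefix = os.path.join(out_dir, os.path.basename(transcripts) + ".transdecoder")

    # Keep the original transcripts before fasta.nucl is overwritten.
    target["transcripts.fasta.nucl"] = transcripts
    target["fasta.prot"] = prefix + ".pep"   # predicted proteins
    target["fasta.nucl"] = prefix + ".cds"   # predicted coding sequences
    target["gff3"] = prefix + ".gff3"        # ORF annotation
    return target
```

The point of the sketch is the ordering: the original transcripts are copied to `transcripts.fasta.nucl` before `fasta.nucl` is replaced, which is why downstream steps such as Trinotate look for the transcripts in that slot.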

Parameters that can be set

Parameter      Values            Comments
scope          sample|project    Determine whether to use the sample or project transcripts file.

Lines for parameter file

trino_Transdecode_highExpr:
    module:             TransDecoder
    base:               Split_Fasta
    script_path:        {Vars.paths.TransDecoder}
    scope:              sample

References