Transcriptome Annotation

Modules included in this section

Trinotate

Authors: Menachem Sklarz
Affiliation: Bioinformatics core facility
Organization: National Institute of Biotechnology in the Negev, Ben Gurion University.

A class that defines a module for annotating an RNA-seq assembly using Trinotate.

Note

This module will be updated in the future to support uploading additional sources of information, such as RNAMMER output. See the Trinotate documentation.

Requires

  • A transcripts file in:
    • self.sample_data["project_data"]["transcripts.fasta.nucl"]
  • A gene-to-transcript mapping file (produced by the Trinity_gene_to_trans_map module) in:
    • self.sample_data["project_data"]["gene_trans_map"]
  • A protein fasta file (produced by TransDecoder) in:
    • self.sample_data["project_data"]["fasta.prot"]
  • Results of blastp of the protein file against the Swiss-Prot database in:
    • self.sample_data["project_data"]["blast.prot"]
  • Results of blastx of the transcripts file against the Swiss-Prot database in:
    • self.sample_data["project_data"]["blast.nucl"]
  • Results of hmmscan of the protein file against the Pfam database in:
    • self.sample_data["project_data"]["hmmscan.prot"]
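
To illustrate how each required file is consumed, the sketch below builds the loading commands documented for the Trinotate 3.x command line (init, LOAD_swissprot_blastp, LOAD_swissprot_blastx, LOAD_pfam and report). It is an assumption that the module issues exactly these commands. Trinotate expects the BLAST results in tabular format (-outfmt 6) and the hmmscan results as --domtblout output. The function name build_trinotate_commands and the example paths are hypothetical.

```python
import shlex
from typing import Dict, List


def build_trinotate_commands(trinotate: str, sqlitedb: str,
                             slots: Dict[str, str], report: str) -> List[str]:
    """Return Trinotate 3.x shell commands for the required input files.

    Hypothetical helper: `slots` maps the slot names listed above
    (e.g. "fasta.prot") to file paths.
    """
    q = shlex.quote
    exe, db = q(trinotate), q(sqlitedb)
    return [
        # Initialise the database with the transcripts, gene map and proteins
        f"{exe} {db} init"
        f" --gene_trans_map {q(slots['gene_trans_map'])}"
        f" --transcript_fasta {q(slots['transcripts.fasta.nucl'])}"
        f" --transdecoder_pep {q(slots['fasta.prot'])}",
        # Load the homology and protein-domain search results
        f"{exe} {db} LOAD_swissprot_blastp {q(slots['blast.prot'])}",
        f"{exe} {db} LOAD_swissprot_blastx {q(slots['blast.nucl'])}",
        f"{exe} {db} LOAD_pfam {q(slots['hmmscan.prot'])}",
        # Write the tab-delimited annotation report
        f"{exe} {db} report > {q(report)}",
    ]
```

The init step must run first because the later LOAD steps attach their results to the transcripts and proteins registered there.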

Attention

If scope is set to sample, all of the above files should be in the sample scope!
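
The scope rule can be expressed as a small pre-flight check. This is a hypothetical helper, not part of the module. It assumes that sample_data is a plain dict holding project-scope files under "project_data" and sample-scope files under each sample name.

```python
from typing import Dict, List, Optional

# Slot names taken from the "Requires" list above
REQUIRED_SLOTS = ["transcripts.fasta.nucl", "gene_trans_map", "fasta.prot",
                  "blast.prot", "blast.nucl", "hmmscan.prot"]


def missing_slots(sample_data: Dict[str, dict], scope: str,
                  sample: Optional[str] = None) -> List[str]:
    """List the required slots absent from the scope the module reads."""
    if scope == "project":
        source = sample_data.get("project_data", {})
    elif scope == "sample":
        if sample is None:
            raise ValueError("a sample name is needed when scope is 'sample'")
        source = sample_data.get(sample, {})
    else:
        raise ValueError(f"unknown scope: {scope!r}")
    return [slot for slot in REQUIRED_SLOTS if slot not in source]
```

Files stored only at project scope do not satisfy a sample-scope run. Every slot must exist under the sample itself.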

Output:

  • Puts the Trinotate report file in:

    • sample_data[<sample>]["trino.rep"] (scope = sample)
    • sample_data["trino.rep"] (scope = project)

Parameters that can be set

Parameter      Values           Comments
scope          sample|project
sqlitedb                        Path to the Trinotate sqlitedb
cp_sqlitedb                     Create a local copy of the sqlitedb before loading the data (recommended)
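
A minimal sketch of what cp_sqlitedb presumably does follows. The Trinotate LOAD steps write into the database, so copying it into the module's working directory first leaves the shared original untouched. The function name prepare_sqlitedb and the choice of target directory are assumptions, not the module's actual code.

```python
import shutil
from pathlib import Path


def prepare_sqlitedb(sqlitedb: str, workdir: str, cp_sqlitedb: bool) -> str:
    """Return the path of the database that results will be loaded into.

    Hypothetical helper: with cp_sqlitedb set, the reference database is
    copied into `workdir` and the copy's path is returned.
    """
    if not cp_sqlitedb:
        # Load directly into the database given by the sqlitedb parameter
        return sqlitedb
    target = Path(workdir) / Path(sqlitedb).name
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(sqlitedb, target)
    return str(target)
```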

Lines for parameter file

trino_Trinotate:
    module:             Trinotate
    base:               
                        - trino_blastp_sprot
                        - trino_blastx_sprot
                        - trino_hmmscan1
    script_path:        {Vars.paths.Trinotate}
    scope:              project
    sqlitedb:           {Vars.databases.trinotate.sqlitedb}
    cp_sqlitedb:    

References

Grabherr, M.G., Haas, B.J., Yassour, M., Levin, J.Z., Thompson, D.A., Amit, I., Adiconis, X., Fan, L., Raychowdhury, R., Zeng, Q. and Chen, Z., 2011. Trinity: reconstructing a full-length transcriptome without a genome from RNA-Seq data. Nature Biotechnology, 29(7), p.644.

TransDecoder

Authors: Menachem Sklarz
Affiliation: Bioinformatics core facility
Organization: National Institute of Biotechnology in the Negev, Ben Gurion University.

A module for running TransDecoder on a transcripts file.

Note

Tested on TransDecoder version 5.5.0. The main difference from earlier versions is that this version allows an output directory to be specified on the command line.
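
TransDecoder runs in two steps, TransDecoder.LongOrfs followed by TransDecoder.Predict. Version 5.5.0 accepts -O/--output_dir for the output directory. The sketch below builds the two commands. It assumes that both steps receive the same -O value. Check --help in your installed version. The function name transdecoder_commands and its bin_dir argument are hypothetical.

```python
import shlex
from typing import List


def transdecoder_commands(transcripts: str, outdir: str,
                          bin_dir: str = "") -> List[str]:
    """Hypothetical helper: the two TransDecoder steps sharing one -O dir."""
    prefix = f"{bin_dir.rstrip('/')}/" if bin_dir else ""
    t, o = shlex.quote(transcripts), shlex.quote(outdir)
    return [
        # Step 1: extract long open reading frames from the transcripts
        f"{prefix}TransDecoder.LongOrfs -t {t} -O {o}",
        # Step 2: predict the likely coding regions among those ORFs
        f"{prefix}TransDecoder.Predict -t {t} -O {o}",
    ]
```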

Requires

Fasta files in at least one of the following slots:

  • sample_data[<sample>]["fasta.nucl"] (if scope = sample)
  • sample_data["fasta.nucl"] (if scope = project)

Output:

  • If scope = project:

    • Protein fasta in self.sample_data["project_data"]["fasta.prot"]
    • Gene fasta in self.sample_data["project_data"]["fasta.nucl"]
    • Original transcripts in self.sample_data["project_data"]["transcripts.fasta.nucl"]
    • GFF file in self.sample_data["project_data"]["gff3"]
  • If scope = sample:

    • Protein fasta in self.sample_data[<sample>]["fasta.prot"]
    • Gene fasta in self.sample_data[<sample>]["fasta.nucl"]
    • Original transcripts in self.sample_data[<sample>]["transcripts.fasta.nucl"]
    • GFF file in self.sample_data[<sample>]["gff3"]
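
TransDecoder names its final files <transcripts basename>.transdecoder.<ext> (for example .pep, .cds and .gff3). The sketch below maps these files to the output slots listed above. It assumes that the "Gene fasta" slot receives the .cds file. It also assumes that outdir is wherever the final files are written, which depending on the version may be the working directory rather than the -O directory. The helper name transdecoder_slots is hypothetical.

```python
import os
from typing import Dict

# Assumed mapping of TransDecoder file extensions to output slots
SUFFIX_TO_SLOT = {"pep": "fasta.prot", "cds": "fasta.nucl", "gff3": "gff3"}


def transdecoder_slots(transcripts: str, outdir: str) -> Dict[str, str]:
    """Hypothetical helper: slot -> path for the TransDecoder outputs."""
    base = os.path.basename(transcripts)
    slots = {slot: os.path.join(outdir, f"{base}.transdecoder.{ext}")
             for ext, slot in SUFFIX_TO_SLOT.items()}
    # "fasta.nucl" now holds the coding sequences, so the original
    # transcripts are kept under a separate slot
    slots["transcripts.fasta.nucl"] = transcripts
    return slots
```

Because "fasta.nucl" is overwritten, downstream modules such as Trinotate read the original transcripts from "transcripts.fasta.nucl".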

Parameters that can be set

Parameter      Values           Comments
scope          sample|project   Determines whether to use the sample or the project transcripts file.

Lines for parameter file

trino_Transdecode_highExpr:
    module:             TransDecoder
    base:               Split_Fasta
    script_path:        {Vars.paths.TransDecoder}
    scope:              sample

References