RNASeq
Modules included in this section
DeSeq2
- Authors
Liron Levin
- Affiliation
Bioinformatics core facility
- Organization
National Institute of Biotechnology in the Negev, Ben Gurion University.
Short Description
A module to perform:
* Gene-level differential expression using DeSeq2.
* Gene annotation.
* PCA plot.
* Clustering of significant genes.
* Heatmaps of significant genes by clusters.
* Expression patterns plot by clusters.
* Enrichment analysis (KEGG/GO).
Requires
- Searches for count data in:
self.sample_data[<sample>]["RSEM"]
self.sample_data[<sample>]["genes.counts"]
self.sample_data[<sample>]["HTSeq.counts"]
self.sample_data["project_data"]["results"]
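The lookup described above can be pictured with a short sketch. It is not the module's actual code: the helper name `find_count_sources`, the search order, and the example paths are all assumptions made for illustration. Only the slot names come from the list above.

```python
# Per-sample slots named in the list above. The search order is an assumption.
SAMPLE_COUNT_SLOTS = ("RSEM", "genes.counts", "HTSeq.counts")


def find_count_sources(sample_data):
    """Map each sample (or 'project_data') to the first count slot it holds."""
    found = {}
    for name, slots in sample_data.items():
        if not isinstance(slots, dict):
            continue  # skip non-dict entries, e.g. a plain list of sample names
        if name == "project_data":
            # Project-level counts are expected in the 'results' slot.
            if "results" in slots:
                found[name] = "results"
            continue
        for slot in SAMPLE_COUNT_SLOTS:
            if slot in slots:
                found[name] = slot
                break
    return found


# Hypothetical sample_data with made-up paths, for illustration only.
example = {
    "Sample1": {"RSEM": "/path/to/Sample1.genes.results"},
    "Sample2": {"HTSeq.counts": "/path/to/Sample2.counts"},
    "Sample3": {"fastq": "/path/to/Sample3.fq"},
    "project_data": {"results": "/path/to/merged_counts.tsv"},
}
print(find_count_sources(example))
```

A sample with none of the listed slots (Sample3 here) is simply absent from the result, so a caller can tell which samples provide no count data.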
Parameters that can be set
Parameter | Values | Comments |
---|---|---|
use_click | | Will use the CLICK clustering program (Shamir et al. 2000) |
Note
If you are using the use_click option, cite:
Expander: Ulitsky I, Maron-Katz A, Shavit S, Sagir D, Linhart C, Elkon R, Tanay A, Sharan R, Shiloh Y, Shamir R. Expander: from expression microarrays to networks and functions. Nature Protocols Vol 5, pp 303-322, 2010.
CLICK: Shamir R. and Sharan R. CLICK: A Clustering Algorithm with Applications to Gene Expression Analysis. Proceedings ISMB 2000, pp. 307-316 (2000).
Lines for parameter file
Step_Name: # Name of this step
module: DeSeq2 # Name of the used module
base: # Name of the step [or list of names] with count results that this step runs after.
script_path: # Command for running the DeSeq2 script
# If this line is empty or missing, the module's associated script is used
use_click: # Will use the CLICK clustering program (Shamir et al. 2000).
redirects:
--SAMPLE_DATA_FILE: # Path to Samples Information File
--GENE_ID_TYPE: # The Gene ID Type, e.g. 'ENSEMBL' [for Bioconductor] OR 'ensembl_gene_id'/'ensembl_transcript_id' [for ENSEMBL]
--Annotation_db: # Bioconductor Annotation Data Base Name from https://bioconductor.org/packages/release/BiocViews.html#___OrgDb
--Species: # Species Name to Retrieve Annotation Data from ENSEMBL
--KEGG_Species: # Species Name to Retrieve Annotation Data from KEGG
--KEGG_KAAS: # Gene to KO file from KEGG KAAS [first column gene id, second column KO number]
--Trinotate: # Path to a Trinotate annotation file in which the first column is the genes names
--FILTER_SAMPLES: # Filter Samples with Low Number of expressed genes OR with Small Library size using 'scater' package
--FILTER_GENES: # Filter Low-Abundance Genes using 'scater' package
--NORMALIZATION_TYPE: # The DeSeq2 Normalization Type To Use [VSD , RLOG] The Default is VSD
--BLIND_NORM: # Perform Blind Normalization
--DESIGN: # The Main DeSeq2 Design [ ~ Group ]
--removeBatchEffect # Remove Batch Effect from the Normalized counts data using Batch Effect fields [from the Sample Data] separated by ','
# [up to 2 fields when using the limma package, only one when using the sva package]
--removeBatchEffect_method # The method used to Remove Batch Effect from the Normalized counts data: limma or sva [sva is the default]
--LRT: # The LRT DeSeq2 Design
--ALPHA: # Significance Level Cutoff, The Default is 0.05
--Post_statistical_ALPHA # Post Statistical P-value Filtering
--FoldChange: # Fold change Cutoff [testing for fold changes greater in absolute value], The Default is 1
--Post_statistical_FoldChange # Post Statistical Fold change Filtering
--CONTRAST: # The DeSeq Contrast Design ["Group,Treatment,Control"] [Not For LRT].
# It is possible to define more than one contrast Design ["Group,Treatment1,Control1|Group,Treatment2,Control2|..."]
--SPLIT_BY_CONTRAST # Only use Samples found in the relevant contrast for Clustering and Enrichment Analysis
--modelMatrixType: # How the DeSeq model matrix of the GLM formula is formed [standard or expanded] ,The Default is standard
--GENES_PLOT: # Genes Id To Plot count Data [separated by ',']
--X_AXIS: # The Field In the Sample Data To Use as X Axis
--GROUP: # The Field In the Sample Data To Group By [can be two fields separated by ',']
--SPLIT_BY: # The Field In the Sample Data To Split the Analysis By.
--FUNcluster: # A clustering function including [kmeans,pam,clara,fanny,hclust,agnes,diana,click]. The default is hclust
# If the 'use_click' option is used the '--FUNcluster' option is set to 'click'
--hc_metric: # Hierarchical clustering metric to be used for calculating dissimilarities between observations. The default is pearson
--hc_method: # Hierarchical clustering agglomeration method to be used. The default is ward.D2
--k.max: # The maximum number of clusters to consider, must be at least two. The default is 20
--nboot: # Number of Monte Carlo (bootstrap) samples for determining the number of clusters [Not For Mclust]. The default is 10
--stand: # The Data will be Standardized Before Clustering.
--Mclust: # Use Mclust for determining the number of clusters.
--CLICK_HOMOGENEITY: # The HOMOGENEITY [0-1] of clusters using CLICK program (Shamir et al. 2000). The default is 0.5
--PCA_COLOR: # The Field In the Sample Data To Determine Color In The PCA Plot
--PCA_SHAPE: # The Field In the Sample Data To Determine Shape In The PCA Plot
--PCA_SIZE: # The Field In the Sample Data To Determine Size In The PCA Plot. The default is Library Size
--Enriched_terms_overlap: # Test for genes overlap in enriched terms
--USE_INPUT_GENES_AS_BACKGROUND # Use The input Genes as the Background for Enrichment Analysis
--only_clustering # Don't Perform Differential Analysis!!!
--significant_genes # Use these genes as the set of significant genes [a comma separated list]
--collapseReplicates # Will collapse technical replicates using a Sample Data field indicating which samples are technical replicates
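The `--CONTRAST` value packs one or more contrasts into a single string, with ',' inside a contrast and '|' between contrasts. The sketch below shows one way to assemble and split such a string. The helper names `build_contrast` and `parse_contrast` are hypothetical and not part of the module. The field and level names are the placeholders used in the parameter comments above.

```python
def build_contrast(contrasts):
    """Join (field, treatment, control) triples into a --CONTRAST string."""
    parts = []
    for triple in contrasts:
        if len(triple) != 3:
            raise ValueError("each contrast needs exactly 3 items: %r" % (triple,))
        for item in triple:
            # ',' and '|' are the separators, so they cannot appear in a name.
            if "," in item or "|" in item:
                raise ValueError("separator character in name: %r" % item)
        parts.append(",".join(triple))
    return "|".join(parts)


def parse_contrast(value):
    """Split a --CONTRAST string back into (field, treatment, control) tuples."""
    return [tuple(part.split(",")) for part in value.split("|") if part]


contrast = build_contrast([
    ("Group", "Treatment1", "Control1"),
    ("Group", "Treatment2", "Control2"),
])
print(contrast)
print(parse_contrast(contrast))
```

Rejecting names that contain ',' or '|' up front avoids a contrast string that cannot be split back into the contrasts it was built from.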
Comments
Note
It is Possible to use CONDA to install all dependencies:
Follow this Tutorial for More Information.