RNA-Seq using a reference genome
Bioinformatics Core Facility
National Institute of Biotechnology in the Negev, Ben Gurion University.
In order to use this Work-Flow first:
Install NeatSeq-Flow using conda
Make sure that conda is in your PATH.
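A quick way to confirm the PATH requirement is to ask the shell where it finds conda. The snippet below is a minimal sketch; `in_path` is a hypothetical helper name, not part of conda or NeatSeq-Flow.

```shell
#!/usr/bin/env bash
# in_path is a hypothetical helper: it succeeds when the named program
# can be resolved by the current shell (i.e. it is reachable via PATH).
in_path() {
  command -v "$1" >/dev/null 2>&1
}

if in_path conda; then
  echo "conda found at: $(command -v conda)"
else
  echo "conda is not in PATH; add its bin directory to PATH first"
fi
```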
Merge Decompression and concatenation (if needed) of read files into single files per direction.
FastQC_Merge Quality tests on the original reads using FastQC.
MultiQC_pre_trim Quality report on the original reads using MultiQC.
Trim_Galore Reads trimming using Trim_Galore.
FastQC_Trim_Galore Quality tests on reads after trimming using FastQC.
RSEM_Genome indexing of the reference genome, mapping of the post trimming reads and count data creation.
MultiQC_post_trim Quality report on the trimmed reads and mapping information using MultiQC.
It is possible to add a DESeq2 step for Differential Expression, Clustering and Functional Analyses. For more information see the DESeq2 Tutorial.
Paired-end or single-end reads in fastq files.
A reference genome in fasta format
An annotation file in gtf format
The programs are installed as part of the installation process using CONDA.
Create a tab-delimited sample file. It should look as follows:
#SampleID Type Path
Sample1 Forward /path/to/Sample1_F1.fastq.gz
Sample1 Forward /path/to/Sample1_F2.fastq.gz
Sample1 Reverse /path/to/Sample1_R1.fastq.gz
Sample1 Reverse /path/to/Sample1_R2.fastq.gz
Sample2 Forward /path/to/Sample2_F1.fastq.gz
Sample2 Reverse /path/to/Sample2_R1.fastq.gz
Sample2 Forward /path/to/Sample2_F2.fastq.gz
Sample2 Reverse /path/to/Sample2_R2.fastq.gz
You can edit the file in Excel, but make sure to save it in tab-delimited format. See this section of the manual for a full description of the sample file format.
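If a spreadsheet is not at hand, the same table can be written from the shell, which also guarantees real tab characters between columns. This is a minimal sketch that reproduces only the three-column table shown above; the `/path/to/...` entries are the placeholders from the example, the output name `Samples_file.nsfs` is taken from the script-generation command, and any additional lines required by the full sample file format (see the manual) are not covered here.

```shell
#!/usr/bin/env bash
# Write the tab-delimited sample table from the example.
# The /path/to/... entries are placeholders; replace them with real paths.
samples_file="Samples_file.nsfs"

{
  printf '#SampleID\tType\tPath\n'
  for sample in Sample1 Sample2; do
    for n in 1 2; do
      printf '%s\tForward\t/path/to/%s_F%s.fastq.gz\n' "$sample" "$sample" "$n"
      printf '%s\tReverse\t/path/to/%s_R%s.fastq.gz\n' "$sample" "$sample" "$n"
    done
  done
} > "$samples_file"

# Sanity check: count lines that do not have exactly three tab-separated fields.
bad_lines=$(awk -F'\t' 'NF != 3' "$samples_file" | wc -l | tr -d ' ')
echo "lines with a wrong field count: $bad_lines"
```

The row order differs slightly from the example, which itself mixes the order of the Sample2 rows.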
Install all the required programs into a conda environment:
Download the RNASeq conda environment installer file:
curl -LO https://raw.githubusercontent.com/bioinfo-core-BGU/neatseq-flow-modules/master/docs/source/_extra/RNASeq_env_install.yaml
Create the RNASeq conda environment:
conda env create -f RNASeq_env_install.yaml
Download the Work-Flow’s Parameter file:
Choose one of the following; both commands save the file as RNASeq.yaml.
For the STAR variant:
curl https://raw.githubusercontent.com/bioinfo-core-BGU/neatseq-flow-modules/master/Workflows/RNASeq_STAR.yaml > RNASeq.yaml
For the Bowtie2 variant:
curl https://raw.githubusercontent.com/bioinfo-core-BGU/neatseq-flow-modules/master/Workflows/RNASeq_Bowtie2.yaml > RNASeq.yaml
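The two parameter-file URLs differ only in the mapper name, so the choice can be made through a single variable. The sketch below assumes exactly the two variants named in the URLs above (STAR and Bowtie2); `workflow_url` is a hypothetical helper, not a NeatSeq-Flow command.

```shell
#!/usr/bin/env bash
# workflow_url is a hypothetical helper: it prints the raw GitHub URL of the
# parameter file for the given mapper and fails for any other name.
workflow_url() {
  local mapper="$1"
  case "$mapper" in
    STAR|Bowtie2)
      printf 'https://raw.githubusercontent.com/bioinfo-core-BGU/neatseq-flow-modules/master/Workflows/RNASeq_%s.yaml\n' "$mapper"
      ;;
    *)
      echo "unknown mapper: $mapper (expected STAR or Bowtie2)" >&2
      return 1
      ;;
  esac
}

# Print the URL that would be downloaded for the STAR variant.
workflow_url STAR
```

The download itself could then be run as `curl -L "$(workflow_url STAR)" -o RNASeq.yaml`.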
Activate the NeatSeq_Flow conda environment:
source activate NeatSeq_Flow
Edit the “Vars” section in the Work-Flow’s Parameter file:
Specify the location of the gtf and reference genome files
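Before writing the two locations into the parameter file, it can help to confirm that both files exist, are readable and are not empty. This is a minimal sketch; `check_input` is a hypothetical helper and the paths are placeholders to be replaced with the real locations.

```shell
#!/usr/bin/env bash
# check_input is a hypothetical helper: it reports whether a file exists,
# is readable and is non-empty, and returns non-zero otherwise.
check_input() {
  local label="$1" path="$2"
  if [ -r "$path" ] && [ -s "$path" ]; then
    echo "OK: $label at $path"
  else
    echo "MISSING: $label at $path" >&2
    return 1
  fi
}

# Placeholder paths; "|| true" keeps the script going so both files are reported.
check_input "reference genome (fasta)" /path/to/genome.fasta || true
check_input "annotation (gtf)" /path/to/annotation.gtf || true
```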
It is recommended to use the NeatSeq-Flow GUI in order to:
Edit the Work-Flow’s Parameter file
Create a Samples file
Generate and run the Work-Flow’s scripts.
Learn more about How to use NeatSeq-Flow GUI
Alternatively, it is possible to use a text editor.
Generate the scripts by typing in the command line:
neatseq_flow.py -s Samples_file.nsfs -p RNASeq.yaml
Run the Work-Flow by typing in the command line:
bash scripts/00.workflow.commands.sh 1> /dev/null &
Run the Work-Flow monitor by typing in the command line: