RNA-Seq using a reference genome
- Author
Liron Levin
- Affiliation
Bioinformatics Core Facility
- Organization
National Institute of Biotechnology in the Negev, Ben Gurion University.
Page Contents:
Note
In order to use this Work-Flow first:
Install NeatSeq-Flow using conda
Make sure that conda is in your PATH.
Steps:
Merge Decompression and Concatenation (IF NEADED) of read files into single files per direction.
FastQC_Merge Quality tests on the original reads using FastQC.
MultiQC_pre_trim Quality report on the original reads using MultiQC.
Trim_Galore Reads trimming using Trim_Galore.
FastQC_Trim_Galore Quality tests on reads after trimming using FastQC.
RSEM_Genome indexing of the reference genome, mapping of the post trimming reads and count data creation.
MultiQC_post_trim Quality report on the trimmed reads and mapping information using MultiQC.
Workflow Schema

Note
It is possible to add a DeSeq2 step for Differential Expression, Clustering and Functional Analyses. For more information see the DESeq2 Tutorial
Requires
Paired end or single-end reads fastq files. .
A reference genome in fasta format
An annotation file in gtf format
Programs required
Note
The programs are installed as part of the installation process using CONDA.
Example of Sample File
Create a tab-delimited sample file. It should look as follows:
Title RNA_seq
#SampleID Type Path
Sample1 Forward /path/to/Sample1_F1.fastq.gz
Sample1 Forward /path/to/Sample1_F2.fastq.gz
Sample1 Reverse /path/to/Sample1_R1.fastq.gz
Sample1 Reverse /path/to/Sample1_R2.fastq.gz
Sample2 Forward /path/to/Sample2_F1.fastq.gz
Sample2 Reverse /path/to/Sample2_R1.fastq.gz
Sample2 Forward /path/to/Sample2_F2.fastq.gz
Sample2 Reverse /path/to/Sample2_R2.fastq.gz
Note
You can edit the file in excel but make sure to save it in tab-delimited format. See this section of the manual for a full description of the sample file format.
Quick start with conda
Install all the required programs in to a conda environment:
Download the
RNASeq conda environment installer file
:curl -LO https://raw.githubusercontent.com/bioinfo-core-BGU/neatseq-flow-modules/master/docs/source/_extra/RNASeq_env_install.yamlCreate the RNASeq conda environment:
conda env create -f RNASeq_env_install.yaml
Download the Work-Flow’s Parameter file:
curl https://raw.githubusercontent.com/bioinfo-core-BGU/neatseq-flow-modules/master/Workflows/RNASeq_STAR.yaml > RNASeq.yamlcurl https://raw.githubusercontent.com/bioinfo-core-BGU/neatseq-flow-modules/master/Workflows/RNASeq_Bowtie2.yaml > RNASeq.yaml
Activate the NeatSeq_Flow conda environment:
bash source activate NeatSeq_Flow
- Edit the “Vars” section in the Work-Flow’s Parameter file:
Specify the location of the gtf and reference genome files
Note
It is recommended to use the NeatSeq-Flow GUI in order to:
Edit the Work-Flow’s Parameter file
Create a Samples file
Generate and run the Work-Flow’s scripts.
NeatSeq_Flow_GUI.py
Learn more about How to use NeatSeq-Flow GUI
Alternatively, It is possible to use a text editor.
Generate the scripts by typing in the command line:
neatseq_flow.py -s Samples_file.nsfs -p RNASeq.yaml
Run the Work-Flow by typing in the command line:
bash scripts/00.workflow.commands.sh 1> null &
Run the Work-Flow monitor by typing in the command line:
neatseq_flow_monitor.py