RNA-Seq using a reference genome

Author: Liron Levin
Affiliation: Bioinformatics Core Facility
Organization: National Institute of Biotechnology in the Negev, Ben Gurion University.

Page Contents:

Steps:
Workflow Schema
Requires
Programs required
Example of Sample File
Quick start with conda

Note

To use this Work-Flow, first:

Install NeatSeq-Flow using conda
Make sure that conda is in your PATH.
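
The PATH requirement above can be checked before going further. The following is a minimal sketch, not part of NeatSeq-Flow; the helper name have_cmd is hypothetical and only wraps the standard shell built-in command -v.

```shell
# Hypothetical helper (not part of NeatSeq-Flow): succeeds if the named
# command can be resolved by the current shell.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd conda; then
    echo "conda found: $(command -v conda)"
else
    echo "conda was not found in PATH" >&2
fi
```

If conda is reported as missing, add its bin directory to PATH (the exact location depends on where conda was installed) and run the check again.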

Steps:

Merge: Decompression and concatenation (if needed) of read files into a single file per direction.
FastQC_Merge: Quality tests on the original reads using FastQC.
MultiQC_pre_trim: Quality report on the original reads using MultiQC.
Trim_Galore: Read trimming using Trim_Galore.
FastQC_Trim_Galore: Quality tests on the reads after trimming using FastQC.
RSEM_Genome: Indexing of the reference genome, mapping of the trimmed reads, and creation of count data.
MultiQC_post_trim: Quality report on the trimmed reads and mapping information using MultiQC.

Workflow Schema 

Note

It is possible to add a DESeq2 step for differential expression, clustering and functional analyses. For more information, see the DESeq2 Tutorial.

Requires 

Paired-end or single-end read files in fastq format.

A reference genome in fasta format

An annotation file in gtf format
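
A quick sanity check of the reference files can catch format mix-ups before the workflow is generated. The sketch below is an illustration only: the helper names is_fasta and is_gtf are hypothetical, the checks are deliberately shallow (first record marker for fasta, nine tab-separated columns for gtf), and both assume uncompressed files.

```shell
# Hypothetical helpers (not part of NeatSeq-Flow). Both expect an
# uncompressed file path as their only argument.

# Succeeds if the first non-empty line starts with '>' (a fasta header).
is_fasta() {
    awk 'NF { found = 1; ok = ($0 ~ /^>/); exit }
         END { exit !(found && ok) }' "$1"
}

# Succeeds if every non-comment, non-empty line has exactly nine
# tab-separated fields, as the gtf format requires.
is_gtf() {
    awk -F '\t' 'BEGIN { bad = 0 }
                 !/^#/ && NF > 0 && NF != 9 { bad = 1 }
                 END { exit bad }' "$1"
}

# Example usage (placeholder paths):
#   is_fasta /path/to/genome.fasta && echo "fasta looks OK"
#   is_gtf   /path/to/annotation.gtf && echo "gtf looks OK"
```

These checks do not verify that the sequence names in the gtf file match those in the fasta file, which is a separate and common source of mapping problems.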

Programs required 

Note

The programs are installed as part of the installation process using conda.

Example of Sample File 

Create a tab-delimited sample file. It should look as follows:

Title       RNA_seq

#SampleID   Type    Path
Sample1     Forward /path/to/Sample1_F1.fastq.gz
Sample1     Forward /path/to/Sample1_F2.fastq.gz
Sample1     Reverse /path/to/Sample1_R1.fastq.gz
Sample1     Reverse /path/to/Sample1_R2.fastq.gz
Sample2     Forward /path/to/Sample2_F1.fastq.gz
Sample2     Reverse /path/to/Sample2_R1.fastq.gz
Sample2     Forward /path/to/Sample2_F2.fastq.gz
Sample2     Reverse /path/to/Sample2_R2.fastq.gz

Note

You can edit the file in Excel, but make sure to save it in tab-delimited format. See this section of the manual for a full description of the sample file format.
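
Because the file must be tab-delimited, it can be safer to generate it from the command line than to type it by hand. The following sketch writes a two-sample, paired-end file and counts rows that do not have exactly three tab-separated fields. The sample names and /path/to/ locations are placeholders, and the output name Samples_file.nsfs is taken from the script-generation command in the quick start.

```shell
# Write a small sample file with real tab characters (placeholder paths).
samples=Samples_file.nsfs
{
    printf 'Title\tRNA_seq\n\n'
    printf '#SampleID\tType\tPath\n'
    for s in Sample1 Sample2; do
        printf '%s\tForward\t/path/to/%s_F1.fastq.gz\n' "$s" "$s"
        printf '%s\tReverse\t/path/to/%s_R1.fastq.gz\n' "$s" "$s"
    done
} > "$samples"

# Count sample rows (everything except the Title line, the header and
# blank lines) that do not have exactly three tab-separated fields.
bad_rows=$(awk -F '\t' '!/^Title/ && !/^#/ && NF > 0 && NF != 3 { n++ }
                        END { print n + 0 }' "$samples")
echo "malformed rows: $bad_rows"
```

The same awk check can be run on a file exported from a spreadsheet; a non-zero count usually means spaces were saved instead of tabs.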

Quick start with conda 

Install all the required programs into a conda environment:

Download the RNASeq conda environment installer file:
curl -LO https://raw.githubusercontent.com/bioinfo-core-BGU/neatseq-flow-modules/master/docs/source/_extra/RNASeq_env_install.yaml
Create the RNASeq conda environment:
conda env create -f RNASeq_env_install.yaml

Download the Work-Flow’s Parameter file:

Using STAR as the mapper:

curl https://raw.githubusercontent.com/bioinfo-core-BGU/neatseq-flow-modules/master/Workflows/RNASeq_STAR.yaml > RNASeq.yaml

Using Bowtie2 as the mapper:

curl https://raw.githubusercontent.com/bioinfo-core-BGU/neatseq-flow-modules/master/Workflows/RNASeq_Bowtie2.yaml > RNASeq.yaml

Activate the NeatSeq_Flow conda environment:

bash
source activate NeatSeq_Flow

Edit the “Vars” section in the Work-Flow’s Parameter file:

Specify the location of the gtf and reference genome files

Note

It is recommended to use the NeatSeq-Flow GUI in order to:

Edit the Work-Flow’s Parameter file
Create a Samples file
Generate and run the Work-Flow’s scripts.

NeatSeq_Flow_GUI.py

Learn more about How to use NeatSeq-Flow GUI

Alternatively, it is possible to use a text editor.

Generate the scripts by typing in the command line:

neatseq_flow.py -s Samples_file.nsfs -p RNASeq.yaml

Run the Work-Flow by typing in the command line:

bash scripts/00.workflow.commands.sh 1> /dev/null &
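
If you would rather keep the launcher's output than discard it, one option is a small wrapper that checks the script exists and writes both output streams to a log file. This is a sketch under stated assumptions: the function name run_workflow and the log name workflow.log are hypothetical and not part of NeatSeq-Flow; only the script path comes from the command above.

```shell
# Hypothetical wrapper (not part of NeatSeq-Flow): start the workflow
# launcher in the background and keep its output in a log file.
run_workflow() {
    script=${1:-scripts/00.workflow.commands.sh}
    log=${2:-workflow.log}
    if [ ! -f "$script" ]; then
        echo "missing $script; generate the scripts first" >&2
        return 1
    fi
    # Redirect both stdout and stderr to the log, then detach with '&'.
    bash "$script" > "$log" 2>&1 &
    echo "workflow started in background, PID $!"
}

# Example usage, run from the directory where the scripts were generated:
#   run_workflow
#   tail -f workflow.log
```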

Run the Work-Flow monitor by typing in the command line:

neatseq_flow_monitor.py
