Microbiome analysis using QIIME2

Author

Menachem Sklarz

Affiliation

Bioinformatics Core Facility

Organization

National Institute of Biotechnology in the Negev, Ben Gurion University.

Attention

This workflow is in active development!

Captive and Wild Atlantic Salmon project

Background

This workflow is based on the data described in Structural and compositional mismatch between captive and wild Atlantic salmon (Salmo salar) parrs gut microbiota highlights the relevance of integrating molecular ecology for management and conservation methods 1.

The data for the workflow is available on datadryad.

The workflow demonstrates executing qiime2 on a set of illumina paired-end reads.

The data for the workflow includes the raw reads and a metadata file. Obtaining the files will be demostrated in a later section.

The workflow also downloads a classifier object. This file is not included in the sample file as it is not specific to the project at hand. NeatSeq-Flow can also be programmed to train your own classifier but this option is beyond the scope of this workflow.

Attention

Please note this workflow is a demonstration of how to use the qiime2 with NeatSeq-Flow. Please make sure you go over the steps and parameters to make sure it suits your needs!

Steps

  1. MergeReads: Get the data and files from the sample file.

  2. FastQC_Merge, TrimGalore, FastQC_TrimGal, and MultiQC_TrimGal: QC on the reads: FastQC and Trim Galore!. Depending on the quality of the reads, TrimGalore` might not be required.

  3. Get_Project_Files: Download and import the classifier file into the workflow.

  4. import: import sequence data into a QIIME2 artifact

  5. sequence_qual: Create quality report for the sequences.

  6. dada2: dada2 and visualization.

    1. dada2_vis_summary:

    2. dada2_vis_tabulate:

  7. remove_metadata and filter_feature_table: Filter low-expression features from the feature table (have to first remove metadata slot, to avoid filtering by metadata)

    1. filtered_vis_summary and filtered_vis_tabulate: Visualization of the filtered table.

    Tip

    From this step onwards, if you want to use the filtered feature table, base your steps on filter_feature_table.

  8. phylogeny: Building a phylogenetic tree

  9. diversity: Core diversity analysis

  10. alpha_rarefaction: Creating α-rarefaction curves.

  11. alpha_group_signif: alpha groups differences based on Faith’s diversity index.

  12. Classification:

    1. Using the classifier downloaded in step Get_Project_Files.

      Tip

      You can add steps in the workflow to train your own classifier, but that is beyond the scope of this workflow.

    2. classify: Classify the reads.

    3. classify_tabulate and classify_plot: Visualization of the classification.

  13. gneiss_*: Steps for executing the gneiss analysis

  14. ANCOM*: Steps for executing the ANCOM analysis

Workflow Schema

QIIME2 Salmon workflow scheme

Requires

  1. Raw reads for the analysis can be downloaded as follows (Note: The downloaded directory is 5.5 GB!):

    wget https://datadryad.org/stash/downloads/file_stream/67648
    tar zxv 16S_reads_salmo_salar_V3_V4_gut_microbiota.tar.gz
    
  2. Get the salmon sample file with:

    curl -LO https://raw.githubusercontent.com/bioinfo-core-BGU/neatseq-flow-modules/master/docs/source/_extra/QIIME2/qiime2.samples.salmon.nsfs
    
  3. Get the qiime2-formatted metadata file with:

    curl -LO https://raw.githubusercontent.com/bioinfo-core-BGU/neatseq-flow-modules/master/docs/source/_extra/QIIME2/qiime2.metadata.salmon.tsv
    
  4. Modify the paths in the sample file to the correct full paths.

Tip

If the raw read directory (16S_reads_salmo_salar_V3_V4_gut_microbiota), the metadata file and the sample file are in the same path, you can set the paths with the following sed commands:

sed -i s+/path/to/+$PWD/16S_reads_salmo_salar_V3_V4_gut_microbiota/+ qiime2.samples.salmon.nsfs
sed -i s+qiime2.metadata.salmon.tsv+$PWD/qiime2.metadata.salmon.tsv+ qiime2.samples.salmon.nsfs

Programs required

  • QIIME2, version 2019.4, installed with conda as described here.

  • fastqc, multiqc, TrimGalore! and cutadapt which are not included in the qiime2 environment. All of these can be installed in a separate conda environment with:

    curl -LO https://raw.githubusercontent.com/bioinfo-core-BGU/neatseq-flow-modules/master/docs/source/_extra/QC_conda.yaml
    conda env create -f QC_conda.yaml
    

    You can also download the file from here

Download

The workflow file is available for download with the following command:

curl -LO https://raw.githubusercontent.com/bioinfo-core-BGU/neatseq-flow-modules/master/Workflows/qiime2.analysis.salmon.yaml

Attention

The following instructions assume the NeatSeq-Flow, qiime2 and QC environments were installed with the same conda version!

After downloading the parameter file, set the conda env in the Vars section to the name of the qiime environment you installed above, typically something like qiime2-2018.11.

Execute NeatSeq-Flow

Execute NeatSeq-Flow with the sample and parameters files downloaded above:

source activate NeatSeq_Flow
export CONDA_BASE=$(conda info --root)
neatseq_flow.py -s qiime2.samples.salmon.nsfs -p qiime2.analysis.salmon.yaml
1

https://onlinelibrary.wiley.com/doi/full/10.1111/eva.12658

The Moving Pictures tutorial

A workflow for executing the Moving Windows tutorial with QIIME2.

Steps:

  1. Merge_data: Get the data and files from the sample file.

  2. Get_sequences: Download the sequences from the internet

  3. import: import sequence data into a QIIME2 artifact

  4. demux: Demultiplex.

  5. demux_summary: Show statistics of demultiplexed data

  6. dada2: dada2 and visualization.

    1. dada2_vis_summary:

    2. dada2_vis_tabulate:

  7. phylogeny: Building a phylogenetic tree

  8. diversity: Core diversity analysis

  9. diversity_evenness: Calculating Pielou’s evenness index.

  10. Comparing alpha and beta groups differences.

    1. alpha_group_signif_faith: alpha groups differences based on Faith’s diversity index.

    2. alpha_group_signif_pielou: alpha groups differences based on Pielou’s evenness index.

    3. beta_group_signif_BodySite: beta groups differences based on site in body.

    4. beta_group_signif_Subject: beta groups differences based on subject.

  11. Creating emperor visualizations.

    1. emperor_unifrac: Emperor visualization based on UniFrac index.

    2. beta_braycurtis, pcoa_braycurtis and emperor_braycurtis: Emperor visualization based on Bray-Curtis index.

  12. alpha_rarefaction: Creating α-rarefaction curves.

  13. Taxonomy:

    1. classify: taxonomic classification

    2. classify_barplot: taxonomy visualization with barplots.

Workflow Schema

QIIME2 moving pictures workflow scheme

Requires

No requirements. All files are downloaded by the workflow.

Programs required

Attention

Download the parameter file in the link below and set the conda path in line 10 to the location of your conda installation, not including bin. e.g., if using the default location of miniconda, the path should be $HOME/miniconda2.

Download

The workflow and sample files are available for download with the following commands:

curl -LO https://raw.githubusercontent.com/bioinfo-core-BGU/neatseq-flow-modules/master/Workflows/qiime2_MovingPic_fullAuto.params.yaml
curl -LO https://raw.githubusercontent.com/bioinfo-core-BGU/neatseq-flow-modules/master/Workflows/qiime2_MovingPic_fullAuto.samples.nsfs