NeatSeq_Flow Module Repository
1.5.0

Variant-related Tasks

Modules included in this section

  • freebayes

  • mpileup_varscan

  • vcftools *

  • Snippy

freebayes

Authors

Menachem Sklarz

Affiliation

Bioinformatics core facility

Organization

National Institute of Biotechnology in the Negev, Ben Gurion University.

A module for identifying variants by running freebayes:

Requires

  • BAM files in the following slot:

    • sample_data[<sample>]["bam"]

  • Genome reference fasta files in the following slot (the slot should be populated by the module that created the BAM file):

    • sample_data[<sample>]["reference"]

Note

Do not specify the reference (-f), since it is filled in automatically by NeatSeq-Flow.
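The note above can be illustrated with a minimal Python sketch. It is not the module's actual code: the function name `build_freebayes_cmd` and its arguments are hypothetical, and the sketch only assumes that freebayes accepts the reference through `-f` (or `--fasta-reference`) and the BAM file as a positional argument. The reference is taken from the workflow, and a user-supplied one is rejected.

```python
import shlex


def build_freebayes_cmd(script_path, bam, reference, redirects=None):
    """Assemble a freebayes command line (illustrative helper, hypothetical name).

    The reference is always injected here, mirroring the note that the
    user should not pass -f in the redirects.
    """
    redirects = dict(redirects or {})
    for flag in ("-f", "--fasta-reference"):
        if flag in redirects:
            raise ValueError(
                f"do not pass {flag}; the reference is filled in automatically"
            )
    parts = [script_path, "-f", reference]
    for flag, value in redirects.items():
        parts.append(flag)
        if value is not None:  # flags such as --strict-vcf carry no value
            parts.append(str(value))
    parts.append(bam)
    return " ".join(shlex.quote(p) for p in parts)


cmd = build_freebayes_cmd(
    "/path/to/freebayes", "sample1.bam", "genome.fa", {"--strict-vcf": None}
)
print(cmd)
```

The file names `sample1.bam` and `genome.fa` are placeholders for the values held in the `bam` and `reference` slots.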

Output

  • If scope is set to sample:

    • Puts output files in:

      sample_data[<sample>]["vcf"] (if output_type is set to vcf)
      sample_data[<sample>]["gvcf"] (if output_type is set to gvcf)

  • If scope is set to project:

    • Puts output files in:

      sample_data["vcf"] (if output_type is set to vcf)
      sample_data["gvcf"] (if output_type is set to gvcf)
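The slot selection described above can be sketched as follows. This is only an illustration under stated assumptions: `sample_data` is treated as a plain nested dictionary, and the helper `store_freebayes_output` is a hypothetical name, not part of NeatSeq-Flow.

```python
def store_freebayes_output(sample_data, path, scope, output_type, sample=None):
    """Record an output file in the slot implied by scope and output_type."""
    if output_type not in ("vcf", "gvcf"):
        raise ValueError("output_type must be 'vcf' or 'gvcf'")
    if scope == "sample":
        if sample is None:
            raise ValueError("a sample name is required for sample scope")
        # Sample scope: sample_data[<sample>]["vcf"] or ["gvcf"]
        sample_data.setdefault(sample, {})[output_type] = path
    elif scope == "project":
        # Project scope: sample_data["vcf"] or sample_data["gvcf"]
        sample_data[output_type] = path
    else:
        raise ValueError("scope must be 'sample' or 'project'")
    return sample_data


sample_data = {"sample1": {"bam": "sample1.bam", "reference": "genome.fa"}}
store_freebayes_output(sample_data, "sample1.vcf", "sample", "vcf", sample="sample1")
store_freebayes_output(sample_data, "project.g.vcf", "project", "gvcf")
print(sample_data)
```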

Parameters that can be set

Parameter      Values      Comments
output_type    vcf|gvcf    The type of output produced by freebayes. (Can be specified alternatively with appropriate redirects)

Comments

Lines for parameter file

freebayes1:
    module: freebayes
    base: samtools1
    script_path: /path/to/freebayes
    scope: sample
    output_type: vcf
    redirects: 
        --strict-vcf:

References

Marth GT, Korf I, Yandell MD, Yeh RT, Gu Z, Zakeri H, Stitziel NO, Hillier L, Kwok PY, Gish WR: A general approach to single-nucleotide polymorphism discovery. Nat Genet. 1999, 23: 452-456. 10.1038/70570.

mpileup_varscan

Authors

Menachem Sklarz

Affiliation

Bioinformatics core facility

Organization

National Institute of Biotechnology in the Negev, Ben Gurion University.

A module for identifying variants by running mpileup and piping its (large) output into VarScan:

Requires

  • BAM files in the following slot:

    • sample_data[<sample>]["bam"]

  • Genome reference fasta files in the following slot (the slot should be populated by the module that created the BAM file):

    • sample_data[<sample>]["reference"]

Output

  • If scope is set to sample:

    • Puts output files in:

      sample_data[<sample>]["vcf"]
      sample_data[<sample>]["variants"] (if --output-vcf is not redirected in redirects)

  • If scope is set to project:

    • Puts output files in:

      sample_data["vcf"]
      sample_data["variants"] (if --output-vcf is not redirected in redirects)

Parameters that can be set

Parameter       Values    Comments
mpileup_path    path      The full path to the mpileup program. You can append additional mpileup arguments after the path (see example lines)
script_path     path      The full path to the relevant varscan program (see example lines).

Comments

Lines for parameter file

mpileup_varscan1:
    module: mpileup_varscan
    base: samtools1
    script_path: /path/to/java -jar /path/to/VarScan.v2.3.9.jar mpileup2snp
    mpileup_path: /path/to/samtools mpileup --max-depth 6000
    scope: sample
    redirects:
        --min-coverage: 4
        --output-vcf:
        --variants: 1

References

  • Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G. and Durbin, R., 2009. The sequence alignment/map format and SAMtools. Bioinformatics, 25(16), pp.2078-2079.

  • Koboldt, D.C., Chen, K., Wylie, T., Larson, D.E., McLellan, M.D., Mardis, E.R., Weinstock, G.M., Wilson, R.K. and Ding, L., 2009. VarScan: variant detection in massively parallel sequencing of individual and pooled samples. Bioinformatics, 25(17), pp.2283-2285.

vcftools *

Authors

Menachem Sklarz

Affiliation

Bioinformatics core facility

Organization

National Institute of Biotechnology in the Negev, Ben Gurion University.

A module for running vcftools:

Can take a VCF, gzipped VCF or BCF file as input.

Produces an output file, as specified by the output options arguments.

Requires

  • Input files in one of the following slots (for project scope):

    • sample_data["VCF" | "gzVCF" | "BCF"]

  • Input files in one of the following slots (for sample scope):

    • sample_data[<sample>]["variants"]["VCF" | "gzVCF" | "BCF"]

Output

  • Puts output files in the following slots (for project scope):

    self.sample_data["project_data"][<output type>]

  • Puts output files in the following slots (for sample scope):

    self.sample_data[<sample>]["variants"][<output type>]

Note

The output type is set by redirecting the required type, i.e. by passing any number of the output options listed below in redirects.

For extracting several INFO fields, set --get-INFO to a list of INFO elements to extract (instead of passing --get-INFO several times). See examples below.

If several output types are passed, each type will be created in parallel with a different vcftools script.

See the vcftools manual for details (https://vcftools.github.io/man_latest.html).

"--freq", "--freq2", "--counts", "--counts2", "--depth", "--site-depth", "--site-mean-depth", "--geno-depth", "--hap-r2", "--geno-r2", "--geno-chisq", "--hap-r2-positions", "--geno-r2-positions", "--interchrom-hap-r2", "--interchrom-geno-r2", "--TsTv", "--TsTv-summary", "--TsTv-by-count", "--TsTv-by-qual", "--FILTER-summary", "--site-pi", "--window-pi", "--weir-fst-pop", "--het", "--hardy", "--TajimaD", "--indv-freq-burden", "--LROH", "--relatedness", "--relatedness2", "--site-quality", "--missing-indv", "--missing-site", "--SNPdensity", "--kept-sites", "--removed-sites", "--singletons", "--hist-indel-len", "--hapcount", "--mendel", "--extract-FORMAT-info", "--get-INFO", "--recode", "--recode-bcf", "--12", "--IMPUTE", "--ldhat", "--ldhat-geno", "--BEAGLE-GL", "--BEAGLE-PL", "--plink", "--plink-tped".

Warning

At the moment, you can’t pass more than one --extract-FORMAT-info option at once. To extract more than one FORMAT field, create more than one instance of vcftools.
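The behaviour described in the note (one vcftools script per output type, and a list passed to --get-INFO expanding into repeated flags) can be sketched in Python. This is an assumption-laden illustration rather than the module's code: `split_vcftools_runs` is a hypothetical helper, `OUTPUT_TYPES` holds only a small subset of the options listed above, and the output prefix naming is invented for the example.

```python
# A small subset of the output options listed above, for illustration only.
OUTPUT_TYPES = {"--recode", "--extract-FORMAT-info", "--get-INFO", "--freq"}


def split_vcftools_runs(script_path, input_flag, input_file, out_prefix, redirects):
    """Return one vcftools command per output-type option found in redirects."""
    shared = [(f, v) for f, v in redirects.items() if f not in OUTPUT_TYPES]
    commands = []
    for flag, value in redirects.items():
        if flag not in OUTPUT_TYPES:
            continue
        parts = [script_path, input_flag, input_file]
        for s_flag, s_value in shared:  # non-output options go to every run
            parts.append(s_flag)
            if s_value is not None:
                parts.append(str(s_value))
        # A list value (e.g. for --get-INFO) expands to a repeated flag.
        values = value if isinstance(value, list) else [value]
        for v in values:
            parts.append(flag)
            if v is not None:
                parts.append(str(v))
        parts += ["--out", f"{out_prefix}.{flag.lstrip('-')}"]
        commands.append(" ".join(parts))
    return commands


redirects = {
    "--recode": None,
    "--extract-FORMAT-info": "GT",
    "--get-INFO": ["NS", "DB"],
}
cmds = split_vcftools_runs("/path/to/vcftools", "--vcf", "in.vcf", "out", redirects)
for c in cmds:
    print(c)
```

With the redirects shown, the sketch yields three independent commands, the last of which passes `--get-INFO NS --get-INFO DB`.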

Parameters that can be set

Parameter    Values               Comments
scope        project | sample     Indicates whether to use project-scope or sample-scope input files.
input        vcf | bcf | gzvcf    Type of input to use. Default: vcf

Lines for parameter file

vcftools1:
    module: vcftools
    base: freebayes1
    script_path: /path/to/vcftools
    scope: project
    input: vcf
    redirects:
        --recode:
        --extract-FORMAT-info: GT
        --get-INFO:
            - NS
            - DB

References

Danecek, P., Auton, A., Abecasis, G., Albers, C.A., Banks, E., DePristo, M.A., Handsaker, R.E., Lunter, G., Marth, G.T., Sherry, S.T. and McVean, G., 2011. The variant call format and VCFtools. Bioinformatics, 27(15), pp.2156-2158.

Snippy

Authors

Liron Levin

Affiliation

Bioinformatics core facility

Organization

National Institute of Biotechnology in the Negev, Ben Gurion University.

Note

This module was developed as part of a study led by Dr. Jacob Moran Gilad

Short Description

A module for running Snippy on fastq files

Requires

  • fastq files in at least one of the following slots:

    self.sample_data[<sample>]["fastq.F"]
    self.sample_data[<sample>]["fastq.R"]
    self.sample_data[<sample>]["fastq.S"]

Output

  • puts Results directory location in:

    self.sample_data[<sample>]["Snippy"]

  • puts for each sample the vcf file location in:

    self.sample_data[<sample>]["vcf"]

if snippy_core is set to run:
  • puts the core Multi-FASTA alignment location in:

    self.sample_data["project_data"]["fasta.nucl"]

  • puts core vcf file location of all analyzed samples in the following slot:

    self.sample_data["project_data"]["vcf"]

if Gubbins is set to run:
  • puts result Tree file location of all analyzed samples in:

    self.sample_data["project_data"]["newick"]

  • updates the core Multi-FASTA alignment in:

    self.sample_data["project_data"]["fasta.nucl"]

  • updates the core vcf file in the slot:

    self.sample_data["project_data"]["vcf"]

if pars is set to run, puts phyloviz ready to use files in:
  • Alleles:

    self.sample_data["project_data"]["phyloviz_Alleles"]

  • MetaData:

    self.sample_data["project_data"]["phyloviz_MetaData"]
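As a compact summary of the output rules above, the following Python sketch lists which slots would be populated for a given combination of optional stages. It only restates the documentation: `snippy_output_slots` is a hypothetical helper and does not exist in the module.

```python
def snippy_output_slots(snippy_core=False, gubbins=False, pars=False):
    """Return the slot names populated per sample and in project_data."""
    slots = {"sample": ["Snippy", "vcf"], "project": []}
    if snippy_core:
        # Core alignment and core vcf for all analyzed samples
        slots["project"] += ["fasta.nucl", "vcf"]
    if gubbins:
        # Gubbins adds a tree; it updates fasta.nucl and vcf in place
        slots["project"].append("newick")
    if pars:
        slots["project"] += ["phyloviz_Alleles", "phyloviz_MetaData"]
    return slots


print(snippy_output_slots(snippy_core=True, gubbins=True))
```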

Parameters that can be set

Parameter

Values

Comments

Comments

  • This module was tested on:

    Snippy v3.2
    gubbins v2.2.0

  • For the pars analysis the following python packages are required:

    pandas

Lines for parameter file

Step_Name:                                  # Name of this step
    module: Snippy                          # Name of the module used
    base:                                   # Name of the step [or list of names] to run after [must be after a merge step]
    script_path:                            # Command for running the Snippy script
    env:                                    # Environment parameters that need to be in the PATH for running this module
    qsub_params:
        -pe:                                # Number of CPUs to reserve for this analysis
    gubbins:
        script_path:                        # Command for running the gubbins script; if empty or this line does not exist, gubbins will not run
        --STR:                              # More redirect arguments for running gubbins
    phyloviz:                               # Generate phyloviz ready-to-use files
        -M:                                 # Location of a MetaData file
        --Cut:                              # Use only samples found in the metadata file
        --S_MetaData:                       # The name of the sample ID column
        -C:                                 # Use only samples that have at least this fraction of identified alleles
    snippy_core:
        script_path:                        # Command for running the snippy-core script; if empty or this line does not exist, snippy-core will not run
        --noref:                            # Exclude reference
    redirects:
        --cpus:                             # Parameters for running Snippy
        --force:                            # Force overwrite of existing output folder (default OFF)
        --mapqual:                          # Minimum mapping quality to allow
        --mincov:                           # Minimum coverage of variant site
        --minfrac:                          # Minimum proportion for variant evidence
        --reference:                        # Reference genome location
        --cleanup:                          # Remove all non-SNP files: BAMs, indices etc (default OFF)

References

Snippy:

https://github.com/tseemann/snippy

gubbins:

Croucher N. J., Page A. J., Connor T. R., Delaney A. J., Keane J. A., Bentley S. D., Parkhill J., Harris S.R. “Rapid phylogenetic analysis of large samples of recombinant bacterial whole genome sequences using Gubbins”. doi:10.1093/nar/gku1196, Nucleic Acids Research, 2014


© Copyright 2017, Menachem Sklarz.
