Variant-related Tasks
Modules included in this section
freebayes
- Authors: Menachem Sklarz
- Affiliation: Bioinformatics core facility
- Organization: National Institute of Biotechnology in the Negev, Ben Gurion University.
A module for identifying variants by running freebayes:
Requires
BAM files in the following slot:
sample_data[<sample>]["bam"]
Genome reference fasta files in the following slot (the slot should be populated by the module that created the bam file):
sample_data[<sample>]["reference"]
Note
Do not specify the reference (-f), since it is filled in automatically by NeatSeq-Flow.
Output
If scope is set to sample, puts output files in:
sample_data[<sample>]["vcf"] (if output_type is set to vcf)
sample_data[<sample>]["gvcf"] (if output_type is set to gvcf)
If scope is set to project, puts output files in:
sample_data["vcf"] (if output_type is set to vcf)
sample_data["gvcf"] (if output_type is set to gvcf)
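The slot rules above can be sketched in Python. This is only an illustration of how scope and output_type select the slot, not NeatSeq-Flow's own code: the function name `store_freebayes_output`, the sample name and the file paths are all hypothetical.

```python
def store_freebayes_output(sample_data, path, scope, output_type, sample=None):
    """Record a freebayes output file in the slot named after output_type."""
    if output_type not in ("vcf", "gvcf"):
        raise ValueError("output_type must be 'vcf' or 'gvcf'")
    if scope == "sample":
        # Sample scope: the file is stored under the sample's own entry.
        sample_data.setdefault(sample, {})[output_type] = path
    elif scope == "project":
        # Project scope: the file is stored at the top level.
        sample_data[output_type] = path
    else:
        raise ValueError("scope must be 'sample' or 'project'")
    return sample_data


# Hypothetical sample name and paths, for illustration only.
sample_data = {"Sample1": {"bam": "Sample1.bam", "reference": "genome.fasta"}}
store_freebayes_output(sample_data, "Sample1.vcf", "sample", "vcf", sample="Sample1")
store_freebayes_output(sample_data, "project.gvcf", "project", "gvcf")
```

After these two calls, the sample-scope file sits in `sample_data["Sample1"]["vcf"]` and the project-scope file in `sample_data["gvcf"]`, while the existing bam and reference slots are left untouched.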
Parameters that can be set
Parameter | Values | Comments |
---|---|---|
output_type | vcf or gvcf | The type of output produced by freebayes. (Can be specified alternatively with appropriate redirects) |
Lines for parameter file
freebayes1:
    module: freebayes
    base: samtools1
    script_path: /path/to/freebayes
    scope: sample
    output_type: vcf
    redirects:
        --strict-vcf:
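As a rough sketch of how such a step could translate into a freebayes call, the following Python builds a command from script_path and redirects and adds the reference itself, which is why -f must not appear in redirects. The function `build_freebayes_command` and the file names are hypothetical; the actual command NeatSeq-Flow writes may differ. Only the freebayes options `-f` and `--vcf` and a positional BAM file are assumed here.

```python
import shlex


def build_freebayes_command(script_path, redirects, reference, bam, out_vcf):
    """Assemble a freebayes argument list from a step definition."""
    if "-f" in redirects or "--fasta-reference" in redirects:
        # The reference comes from the sample's "reference" slot.
        raise ValueError("do not set the reference in redirects")
    cmd = shlex.split(script_path)
    for flag, value in redirects.items():
        cmd.append(flag)
        if value is not None:  # flags with an empty value take no argument
            cmd.append(str(value))
    cmd += ["-f", reference, "--vcf", out_vcf, bam]
    return cmd


# Hypothetical file names, mirroring the parameter lines above.
cmd = build_freebayes_command(
    "/path/to/freebayes", {"--strict-vcf": None},
    "genome.fasta", "Sample1.bam", "Sample1.vcf",
)
print(shlex.join(cmd))
```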
References
Marth GT, Korf I, Yandell MD, Yeh RT, Gu Z, Zakeri H, Stitziel NO, Hillier L, Kwok PY, Gish WR: A general approach to single-nucleotide polymorphism discovery. Nat Genet. 1999, 23: 452-456. 10.1038/70570.
mpileup_varscan
- Authors: Menachem Sklarz
- Affiliation: Bioinformatics core facility
- Organization: National Institute of Biotechnology in the Negev, Ben Gurion University.
A module for identifying variants by running mpileup and piping its (large) output into varscan:
Requires
BAM files in the following slot:
sample_data[<sample>]["bam"]
Genome reference fasta files in the following slot (the slot should be populated by the module that created the bam file):
sample_data[<sample>]["reference"]
Output
If scope is set to sample, puts output files in:
sample_data[<sample>]["vcf"]
sample_data[<sample>]["variants"] (if --output-vcf is not redirected in redirects)
If scope is set to project, puts output files in:
sample_data["vcf"]
sample_data["variants"] (if --output-vcf is not redirected in redirects)
Parameters that can be set
Parameter | Values | Comments |
---|---|---|
mpileup_path | path | The full path to the mpileup program. You can append additional mpileup arguments after the path (see example lines) |
script_path | path | The full path to the relevant varscan program (see example lines). |
Comments
Lines for parameter file
mpileup_varscan1:
    module: mpileup_varscan
    base: samtools1
    script_path: /path/to/java -jar /path/to/VarScan.v2.3.9.jar mpileup2snp
    mpileup_path: /path/to/samtools mpileup --max-depth 6000
    scope: sample
    redirects:
        --min-coverage: 4
        --output-vcf:
        --variants: 1
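The piping described for this module can be sketched as a shell pipeline assembled in Python. This is an assumption about the general shape of the command (mpileup output piped into VarScan, VarScan output redirected to a file), not the exact script NeatSeq-Flow generates. The function name and the file names are hypothetical.

```python
import shlex


def build_mpileup_varscan_command(mpileup_path, script_path, redirects,
                                  reference, bam, out_file):
    """Return a shell pipeline: samtools mpileup piped into VarScan."""
    # mpileup_path may carry extra arguments after the program path.
    mpileup = shlex.split(mpileup_path) + ["-f", reference, bam]
    varscan = shlex.split(script_path)
    for flag, value in redirects.items():
        varscan.append(flag)
        if value is not None:  # flags with an empty value take no argument
            varscan.append(str(value))
    return "{} | {} > {}".format(
        shlex.join(mpileup), shlex.join(varscan), shlex.quote(out_file)
    )


# Values taken from the parameter lines above; file names are hypothetical.
command = build_mpileup_varscan_command(
    "/path/to/samtools mpileup --max-depth 6000",
    "/path/to/java -jar /path/to/VarScan.v2.3.9.jar mpileup2snp",
    {"--min-coverage": 4, "--output-vcf": None, "--variants": 1},
    "genome.fasta", "Sample1.bam", "Sample1.vcf",
)
print(command)
```

Piping avoids writing the large intermediate mpileup output to disk, which is the point of combining the two programs in one step.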
References
Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G. and Durbin, R., 2009. The sequence alignment/map format and SAMtools. Bioinformatics, 25(16), pp.2078-2079.
Koboldt, D.C., Chen, K., Wylie, T., Larson, D.E., McLellan, M.D., Mardis, E.R., Weinstock, G.M., Wilson, R.K. and Ding, L., 2009. VarScan: variant detection in massively parallel sequencing of individual and pooled samples. Bioinformatics, 25(17), pp.2283-2285.
vcftools
- Authors: Menachem Sklarz
- Affiliation: Bioinformatics core facility
- Organization: National Institute of Biotechnology in the Negev, Ben Gurion University.
A module for running vcftools:
Can take a VCF, gzipped VCF or BCF file as input.
Produces an output file, as specified by the output options arguments.
Requires
Input files in one of the following slots (for project scope):
sample_data["VCF" | "gzVCF" | "BCF"]
Input files in one of the following slots (for sample scope):
sample_data[<sample>]["variants"]["VCF" | "gzVCF" | "BCF"]
Output
- Puts output files in the following slots (for project scope):
self.sample_data["project_data"][<output type>]
- Puts output files in the following slots (for sample scope):
self.sample_data[<sample>]["variants"][<output type>]
Note
Output type is set by redirecting the required type. Any number of the types listed below can be passed.
For extracting several INFO fields, set --get-INFO to a list of INFO elements to extract (instead of passing --get-INFO several times). See examples below.
If several output types are passed, each type will be created in parallel with a different vcftools script.
See the vcftools manual for details (https://vcftools.github.io/man_latest.html).
"--freq", "--freq2", "--counts", "--counts2", "--depth", "--site-depth", "--site-mean-depth", "--geno-depth",
"--hap-r2", "--geno-r2", "--geno-chisq", "--hap-r2-positions", "--geno-r2-positions", "--interchrom-hap-r2", "--interchrom-geno-r2",
"--TsTv", "--TsTv-summary", "--TsTv-by-count", "--TsTv-by-qual", "--FILTER-summary", "--site-pi", "--window-pi", "--weir-fst-pop",
"--het", "--hardy", "--TajimaD", "--indv-freq-burden", "--LROH", "--relatedness", "--relatedness2", "--site-quality",
"--missing-indv", "--missing-site", "--SNPdensity", "--kept-sites", "--removed-sites", "--singletons", "--hist-indel-len",
"--hapcount", "--mendel", "--extract-FORMAT-info", "--get-INFO", "--recode", "--recode-bcf", "--12", "--IMPUTE",
"--ldhat", "--ldhat-geno", "--BEAGLE-GL", "--BEAGLE-PL", "--plink", "--plink-tped".
Warning
At the moment, you can’t pass more than one extract-FORMAT-info option at once. For more than one extract-FORMAT-info, create more than one instance of vcftools.
Parameters that can be set
Parameter | Values | Comments |
---|---|---|
scope | project or sample | Indicates whether to use project-level or sample-level input files. |
input | vcf, bcf or gzvcf | Type of input to use. Default: vcf |
Lines for parameter file
vcftools1:
    module: vcftools
    base: freebayes1
    script_path: /path/to/vcftools
    scope: project
    input: vcf
    redirects:
        --recode:
        --extract-FORMAT-info: GT
        --get-INFO:
            - NS
            - DB
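The behaviour described in the note (one vcftools script per output type, and a list under --get-INFO expanded into repeated options) can be sketched as follows. This is a simplified illustration under stated assumptions, not the module's actual code: `build_vcftools_commands`, the file names and the shortened `OUTPUT_TYPES` set are hypothetical, while `--vcf`, `--gzvcf`, `--bcf` and `--out` are standard vcftools options.

```python
# Illustrative subset of the output types listed in this section.
OUTPUT_TYPES = {"--freq", "--depth", "--recode", "--extract-FORMAT-info", "--get-INFO"}
# vcftools input flag for each value of the "input" parameter.
INPUT_FLAGS = {"vcf": "--vcf", "gzvcf": "--gzvcf", "bcf": "--bcf"}


def build_vcftools_commands(script_path, input_type, input_file, redirects, out_prefix):
    """Return one vcftools argument list per requested output type."""
    shared = [(f, v) for f, v in redirects.items() if f not in OUTPUT_TYPES]
    outputs = [(f, v) for f, v in redirects.items() if f in OUTPUT_TYPES]
    commands = []
    for flag, value in outputs:
        cmd = [script_path, INPUT_FLAGS[input_type], input_file]
        # Options that are not output types are passed to every command.
        for s_flag, s_value in shared:
            cmd.append(s_flag)
            if s_value is not None:
                cmd.append(str(s_value))
        if isinstance(value, list):
            # A list (as for --get-INFO) repeats the flag once per element.
            for item in value:
                cmd += [flag, str(item)]
        elif value is None:
            cmd.append(flag)
        else:
            cmd += [flag, str(value)]
        cmd += ["--out", out_prefix]
        commands.append(cmd)
    return commands


# Redirects from the parameter lines above; file names are hypothetical.
redirects = {"--recode": None, "--extract-FORMAT-info": "GT", "--get-INFO": ["NS", "DB"]}
commands = build_vcftools_commands("/path/to/vcftools", "vcf", "project.vcf", redirects, "project")
for cmd in commands:
    print(" ".join(cmd))
```

With the example redirects, three separate commands are produced: one for --recode, one for --extract-FORMAT-info GT, and one that passes --get-INFO twice (for NS and DB).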
References
Danecek, P., Auton, A., Abecasis, G., Albers, C.A., Banks, E., DePristo, M.A., Handsaker, R.E., Lunter, G., Marth, G.T., Sherry, S.T. and McVean, G., 2011. The variant call format and VCFtools. Bioinformatics, 27(15), pp.2156-2158.
Snippy
- Authors: Liron Levin
- Affiliation: Bioinformatics core facility
- Organization: National Institute of Biotechnology in the Negev, Ben Gurion University.
Note
This module was developed as part of a study led by Dr. Jacob Moran Gilad.
Short Description
A module for running Snippy on fastq files
Requires
- fastq files in at least one of the following slots:
self.sample_data[<sample>]["fastq.F"]
self.sample_data[<sample>]["fastq.R"]
self.sample_data[<sample>]["fastq.S"]
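The "at least one slot" requirement can be expressed as a small check. This is a minimal sketch, assuming sample_data behaves like a nested dictionary. The helper `available_read_slots`, the sample name and the file names are hypothetical and not part of the module.

```python
READ_SLOTS = ("fastq.F", "fastq.R", "fastq.S")


def available_read_slots(sample_data, sample):
    """Return the fastq slots present for a sample; fail if none exist."""
    sample_entry = sample_data.get(sample, {})
    found = {slot: sample_entry[slot] for slot in READ_SLOTS if slot in sample_entry}
    if not found:
        raise KeyError("sample {!r} has no fastq.F, fastq.R or fastq.S slot".format(sample))
    return found


# Hypothetical paired-end sample, for illustration only.
sample_data = {"Sample1": {"fastq.F": "S1_R1.fq", "fastq.R": "S1_R2.fq"}}
reads = available_read_slots(sample_data, "Sample1")
print(sorted(reads))
```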
Output
- puts Results directory location in:
    self.sample_data[<sample>]["Snippy"]
- puts for each sample the vcf file location in:
    self.sample_data[<sample>]["vcf"]
- if snippy_core is set to run:
    - puts the core Multi-FASTA alignment location in:
        self.sample_data["project_data"]["fasta.nucl"]
    - puts core vcf file location of all analyzed samples in the following slot:
        self.sample_data["project_data"]["vcf"]
- if Gubbins is set to run:
    - puts result Tree file location of all analyzed samples in:
        self.sample_data["project_data"]["newick"]
    - updates the core Multi-FASTA alignment in:
        self.sample_data["project_data"]["fasta.nucl"]
    - updates the core vcf file in the slot:
        self.sample_data["project_data"]["vcf"]
- if pars is set to run, puts phyloviz ready to use files in:
    - Alleles:
        self.sample_data["project_data"]["phyloviz_Alleles"]
    - MetaData:
        self.sample_data["project_data"]["phyloviz_MetaData"]
Parameters that can be set
Parameter | Values | Comments |
---|---|---|
Comments
- This module was tested on:
    Snippy v3.2
    gubbins v2.2.0
- For the pars analysis the following python packages are required:
    pandas
Lines for parameter file
Step_Name:  # Name of this step
    module: Snippy  # Name of the module used
    base:  # Name of the step [or list of names] to run after [must be after a merge step]
    script_path:  # Command for running the Snippy script
    env:  # env parameters that need to be in the PATH for running this module
    qsub_params:
        -pe:  # Number of CPUs to reserve for this analysis
    gubbins:
        script_path:  # Command for running the gubbins script; if empty or this line does not exist, gubbins will not run
        --STR:  # More redirects arguments for running gubbins
    phyloviz:  # Generate phyloviz ready to use files
        -M:  # Location of a MetaData file
        --Cut:  # Use only Samples found in the metadata file
        --S_MetaData:  # The name of the samples ID column
        -C:  # Use only Samples that have at least this fraction of identified alleles
    snippy_core:
        script_path:  # Command for running the snippy-core script; if empty or this line does not exist, snippy-core will not run
        --noref:  # Exclude reference
    redirects:
        --cpus:  # Parameters for running Snippy
        --force:  # Force overwrite of existing output folder (default OFF)
        --mapqual:  # Minimum mapping quality to allow
        --mincov:  # Minimum coverage of variant site
        --minfrac:  # Minimum proportion for variant evidence
        --reference:  # Reference Genome location
        --cleanup:  # Remove all non-SNP files: BAMs, indices etc (default OFF)
References
- Snippy:
- gubbins:
Croucher N. J., Page A. J., Connor T. R., Delaney A. J., Keane J. A., Bentley S. D., Parkhill J., Harris S.R. “Rapid phylogenetic analysis of large samples of recombinant bacterial whole genome sequences using Gubbins”. doi:10.1093/nar/gku1196, Nucleic Acids Research, 2014
Comments