Variant analysis using GATK
Bioinformatics Core Facility
National Institute of Biotechnology in the Negev, Ben Gurion University.
A workflow for germline short variant discovery (SNVs + Indels), based on GATK Best Practices and adapted for the analysis of rare diseases.
This workflow does not include the base quality score recalibration (BQSR) step.
In this workflow we use the “hard filtering” option. A module for variant quality score recalibration (VQSR) exists, but GATK recommends using it with at least 30 exome samples.
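As a rough illustration of what hard filtering means, the sketch below flags a SNP record when any of its annotations crosses a fixed threshold. The thresholds are the generic starting values that GATK documents for SNPs; the values this workflow actually applies are not stated here, so they, and the helper name `failed_snp_filters`, should be read as assumptions.

```python
# Generic GATK starting thresholds for SNP hard filtering (assumed here;
# the workflow's own parameter file may use different values).
SNP_FILTERS = {
    "QD": lambda v: v < 2.0,
    "FS": lambda v: v > 60.0,
    "MQ": lambda v: v < 40.0,
    "SOR": lambda v: v > 3.0,
    "MQRankSum": lambda v: v < -12.5,
    "ReadPosRankSum": lambda v: v < -8.0,
}


def failed_snp_filters(info):
    """Return the names of annotations in `info` that fail their threshold.

    Annotations missing from the record are skipped rather than failed.
    """
    return [name for name, fails in SNP_FILTERS.items()
            if name in info and fails(info[name])]


good = {"QD": 14.2, "FS": 1.3, "MQ": 60.0, "SOR": 0.7}
bad = {"QD": 1.1, "FS": 75.0, "MQ": 60.0}
print(failed_snp_filters(good))  # []
print(failed_snp_filters(bad))   # ['QD', 'FS']
```

A record with an empty result would pass; any non-empty result would mark it as filtered. Unlike VQSR, this needs no training set, which is why it suits small cohorts.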
Because some GATK modules work at the sample-chromosome level, those steps generate 24 times as many jobs as there are samples. Please make sure that the platform you work on can handle this many jobs in terms of memory management, CPU management, and the number of jobs that can wait in the queue.
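The job count for a per-sample per-chromosome step can be estimated before submitting anything. This is a minimal sketch, assuming the 24 units are human chromosomes 1-22 plus X and Y; the function name `per_chromosome_jobs` is hypothetical and not part of the workflow.

```python
# Human chromosomes 1-22 plus X and Y: assumed to be the 24 units in the text.
CHROMOSOMES = [str(n) for n in range(1, 23)] + ["X", "Y"]


def per_chromosome_jobs(samples):
    """List one (sample, chromosome) pair per job of a per-sample per-chromosome step."""
    return [(s, c) for s in samples for c in CHROMOSOMES]


samples = ["Sample1", "Sample2"]
jobs = per_chromosome_jobs(samples)
print(len(jobs))  # 48
```

Comparing this number with the queue limits of the cluster gives an early warning when a cohort is too large to submit in one go.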
It is recommended to monitor the log file carefully and confirm that all jobs have completed successfully.
- Read preparation:
Trimmomatic - cleans the reads.
FastQC - checks the quality of the reads.
- Germline short variant discovery analysis and annotation:
GATK pre-processing - from FASTQ to ready-to-use BAM (generate uBAM, MarkIlluminaAdapters, uBAM to FASTQ, BWA MEM, merge BAM and uBAM, MarkDuplicates) - per sample.
Picard_CollectAlignmentSummaryMatrics - mapping statistics generated by CollectAlignmentSummaryMetrics from Picard tools.
GATK_gvcf - runs HaplotypeCaller, from BAM to g.vcf - per sample per chromosome.
GATK_merge_gvcf - uses CombineGVCFs to combine the per-sample g.vcf files into cohorts.
GenotypeGVCFs - performs joint genotyping on the g.vcf files produced by HaplotypeCaller and generates a multi-sample VCF file - per cohort per chromosome.
GATK_hard_filters - filters the multi-sample VCF file - per cohort per chromosome.
VEP - annotates the multi-sample VCF file with the Variant Effect Predictor.
Picard_CollectVariantCalling - variant statistics generated by CollectVariantCallingMetrics from Picard tools.
GATK_SelectVariants - splits each per-chromosome multi-sample VCF into one VCF per sample per chromosome.
GATK_CatVariants - concatenates the per-chromosome files into one VCF file per sample.
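The last two steps amount to a scatter-gather: per-chromosome results are split by sample and then joined back together in chromosome order. The sketch below shows only the grouping and ordering logic and does not call GATK. The file names, the chromosome list (1-22, X, Y), and the function name `group_for_concatenation` are illustrative assumptions.

```python
from collections import defaultdict

# Assumed chromosome order; a plain string sort would put "10" before "2".
CHROMOSOMES = [str(n) for n in range(1, 23)] + ["X", "Y"]
ORDER = {c: i for i, c in enumerate(CHROMOSOMES)}


def group_for_concatenation(vcfs):
    """Group (sample, chromosome, path) records by sample, sorted in chromosome order."""
    grouped = defaultdict(list)
    for sample, chrom, path in vcfs:
        grouped[sample].append((ORDER[chrom], path))
    return {s: [p for _, p in sorted(items)] for s, items in grouped.items()}


vcfs = [
    ("Sample1", "X", "Sample1.X.vcf"),
    ("Sample1", "2", "Sample1.2.vcf"),
    ("Sample1", "10", "Sample1.10.vcf"),
    ("Sample2", "1", "Sample2.1.vcf"),
]
print(group_for_concatenation(vcfs)["Sample1"])
# ['Sample1.2.vcf', 'Sample1.10.vcf', 'Sample1.X.vcf']
```

Sorting by an explicit chromosome index rather than by file name keeps the concatenated VCF in the same contig order as the reference.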
Required input: FASTQ files, either paired-end or single-end.
Title        Metagenomics

#SampleID    Type       Path    lane
Sample1      Forward    /path/to/Sample1_F1.fastq.gz
Sample1      Reverse    /path/to/Sample1_R1.fastq.gz
Sample2      Forward    /path/to/Sample2_F1.fastq.gz
Sample2      Reverse    /path/to/Sample2_R1.fastq.gz
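A sample file laid out this way can be sanity-checked before the workflow starts. The sketch below assumes the fields are tab-separated, that the title sits on a line beginning with `Title`, and that the header line begins with `#`. The function name `read_sample_file` is hypothetical and not part of the workflow.

```python
import csv
import io
from collections import defaultdict


def read_sample_file(handle):
    """Parse a sample file into (title, {sample: {type: [paths]}}).

    Assumes tab-separated fields, a 'Title' line, and a header line
    starting with '#'.
    """
    title = None
    samples = defaultdict(lambda: defaultdict(list))
    for row in csv.reader(handle, delimiter="\t"):
        if not row or not row[0].strip():
            continue  # skip blank lines
        if row[0] == "Title":
            title = row[1]
        elif row[0].startswith("#"):
            continue  # header line
        else:
            sample, read_type, path = row[:3]
            samples[sample][read_type].append(path)
    return title, samples


text = (
    "Title\tMetagenomics\n"
    "\n"
    "#SampleID\tType\tPath\tlane\n"
    "Sample1\tForward\t/path/to/Sample1_F1.fastq.gz\n"
    "Sample1\tReverse\t/path/to/Sample1_R1.fastq.gz\n"
)
title, samples = read_sample_file(io.StringIO(text))
print(title)                          # Metagenomics
print(samples["Sample1"]["Forward"])  # ['/path/to/Sample1_F1.fastq.gz']
```

A quick check that every sample has a `Forward` entry, and a `Reverse` entry for paired-end data, catches most sample-file mistakes before any job is queued.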
The workflow file is available