GATK4

The Genome Analysis Toolkit (GATK) is a suite of tools that handles a variety of tasks within the genome analysis pipeline. The suite consists of more than 100 tools, so complete documentation is beyond the scope of this document. Each tool has its own input requirements, and users are urged to consult the full documentation.
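Because input requirements differ per tool, it can help to check the built-in help before running anything. The following is a minimal sketch, assuming the `gatk` wrapper script is on your `PATH`; the helper names `gatk_help_command` and `run_if_available` are hypothetical and exist only for this illustration. It builds the commands `gatk --list` (list available tools) and `gatk <ToolName> --help` (show one tool's arguments), using `HaplotypeCaller` as an example tool.

```python
import shutil
import subprocess


def gatk_help_command(tool=None):
    """Build the argv that lists all tools, or shows help for one tool.

    Hypothetical helper for illustration; it only assembles the command.
    """
    if tool is None:
        return ["gatk", "--list"]
    return ["gatk", tool, "--help"]


def run_if_available(argv):
    """Run the command only when the executable is found on PATH."""
    if shutil.which(argv[0]) is None:
        # Assumption: gatk may not be installed or loaded in this environment.
        print("gatk not found on PATH; would run:", " ".join(argv))
        return None
    return subprocess.run(argv, check=False)


if __name__ == "__main__":
    run_if_available(gatk_help_command())                   # list all tools
    run_if_available(gatk_help_command("HaplotypeCaller"))  # help for one tool
```

The sketch only prints the command when `gatk` is unavailable, so it is safe to try on a machine where the toolkit has not been set up yet.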

The areas of analysis covered by the toolkit are:

| Area of Analysis | Description |
| --- | --- |
| Base calling | Tools that process sequencing machine data, e.g., Illumina base calls, and detect sequencing-level attributes, e.g., adapters |
| Copy number variant discovery | Tools that analyze read coverage to detect copy number variants |
| Coverage analysis | Tools that count coverage, e.g., depth per allele |
| Diagnostics and QC | Tools that collect sequencing quality-related and comparative metrics |
| Genotyping arrays manipulation | Tools that manipulate data generated by genotyping arrays |
| Intervals manipulation | Tools that process genomic intervals in various formats |
| Metagenomics and pathogen detection | Tools that perform metagenomic analysis, e.g., microbial community composition |
| Methylation-specific tools | Tools that perform methylation calling on bisulfite-sequenced, methylation-aware aligned BAM files |
| Other | Miscellaneous tools, such as those that aid in data streaming |
| Read data manipulation | Tools that manipulate read data in SAM, BAM, or CRAM formats |
| Reference | Tools that analyze and manipulate FASTA-format references |
| Short variant discovery | Tools that perform variant calling and genotyping for short variants such as SNPs, SNVs, and indels |
| Structural variant discovery | Tools that detect structural variants |
| Variant evaluation and refinement | Tools that evaluate and refine variant calls, for example with annotations not offered by the engine |
| Variant filtering | Tools that filter variants by annotating the FILTER column |
| Variant manipulation | Tools that manipulate Variant Call Format (VCF) data |

Parallel Capabilities: Varies by tool. See documentation.