VCF Format Deep Dive - iffatAGheyas/bioinformatics-tutorial-wiki GitHub Wiki
VCF Format Deep Dive
- Mandatory columns
  Each VCF record must include these 8 tab-delimited fields, in this order:
  - CHROM – Reference sequence name (e.g. `NC_000913.3`)
  - POS – 1-based position of the variant on `CHROM`
  - ID – Variant identifier (e.g. dbSNP rsID) or `.` if none
  - REF – Reference allele (one or more bases)
  - ALT – Alternate allele(s), comma-separated if multiple
  - QUAL – Phred-scaled quality score for the assertion made in ALT (higher means more confident)
  - FILTER – `PASS` if the record passed all filters, otherwise a semicolon-separated list of the filters that failed
  - INFO – Semicolon-separated additional annotations
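The column layout above can be sketched as a small parser. This is a minimal illustration, not a full VCF reader: the `VcfRecord` container and `parse_fixed_fields` function are hypothetical names introduced here, and the sketch assumes a well-formed, tab-delimited data line with `.` used for missing values.

```python
from typing import List, NamedTuple, Optional


class VcfRecord(NamedTuple):
    """Hypothetical container for the 8 fixed VCF fields."""
    chrom: str
    pos: int                   # 1-based position
    variant_id: Optional[str]  # None when the ID column is "."
    ref: str
    alt: List[str]             # one entry per alternate allele
    qual: Optional[float]      # None when QUAL is "."
    filters: List[str]         # ["PASS"] or the names of failed filters
    info: str                  # left as the raw INFO string here


def parse_fixed_fields(line: str) -> VcfRecord:
    """Split one VCF data line into its 8 fixed fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError(f"expected at least 8 columns, got {len(fields)}")
    chrom, pos, vid, ref, alt, qual, flt, info = fields[:8]
    return VcfRecord(
        chrom=chrom,
        pos=int(pos),
        variant_id=None if vid == "." else vid,
        ref=ref,
        alt=[] if alt == "." else alt.split(","),
        qual=None if qual == "." else float(qual),
        filters=[] if flt == "." else flt.split(";"),
        info=info,
    )


# Illustrative data line using the values from this page.
line = "NC_000913.3\t1234\t.\tA\tG\t60\tPASS\tDP=42;AF=0.25"
record = parse_fixed_fields(line)
print(record.chrom, record.pos, record.ref, record.alt, record.filters)
```

For real files, an established parser is usually the safer choice; the sketch only shows how the eight columns map to typed values.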
- INFO and FORMAT subfields
  - INFO is a list of key=value (or flag) pairs describing each variant.
    - e.g. `DP=42;AF=0.25` means total depth 42 and alternate allele frequency 25%.
  - FORMAT defines the per-sample subfields in the genotype columns.
    - The FORMAT column (column 9) lists the keys (e.g. `GT:DP:GQ`); each sample column that follows gives its values in the same order, separated by colons.
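As a rough sketch of how these two layouts can be unpacked, the snippet below splits an INFO string into a dictionary and pairs FORMAT keys with one sample's values. The function names `parse_info` and `parse_sample` are hypothetical, and values are kept as strings because their types are declared in the VCF header, which this sketch does not read. The `DB` flag in the usage lines is only there to show how a key without a value is handled.

```python
from typing import Dict, Union


def parse_info(info: str) -> Dict[str, Union[str, bool]]:
    """Split an INFO string into a dict; flag entries (no '=') map to True."""
    result: Dict[str, Union[str, bool]] = {}
    if info == ".":  # "." means no INFO annotations
        return result
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        result[key] = value if sep else True
    return result


def parse_sample(format_field: str, sample_field: str) -> Dict[str, str]:
    """Pair the colon-separated FORMAT keys with one sample's values."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    # zip stops at the shorter list, so omitted trailing values are skipped.
    return dict(zip(keys, values))


info = parse_info("DP=42;AF=0.25;DB")
sample = parse_sample("GT:DP:GQ", "0/1:42:99")
print(info)
print(sample)
```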
- Genotype encoding (FORMAT fields)
  - GT – Genotype call; the numbers index alleles (0 = REF, 1 = first ALT), so `0/0` = homozygous reference, `0/1` = heterozygous, `1/1` = homozygous alternate
  - DP – Read depth at this site for that sample
  - GQ – Genotype quality (Phred-scaled confidence in the genotype call)
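To make the GT encoding concrete, here is a minimal sketch that labels a genotype string by zygosity. The function name `classify_genotype` and the returned labels are hypothetical. The sketch accepts both `/` (unphased) and `|` (phased) separators, treats any `.` allele as a missing call, and is written with diploid calls in mind (a haploid call such as `1` would be labelled `hom_alt`).

```python
import re


def classify_genotype(gt: str) -> str:
    """Label a GT string as hom_ref, het, hom_alt, or missing."""
    alleles = re.split(r"[/|]", gt)  # "/" = unphased, "|" = phased
    if "." in alleles:
        return "missing"
    indices = {int(a) for a in alleles}  # 0 = REF, 1+ = ALT alleles
    if indices == {0}:
        return "hom_ref"
    if len(indices) == 1:
        return "hom_alt"
    return "het"


for gt in ("0/0", "0/1", "1/1", "1|2", "./."):
    print(gt, classify_genotype(gt))
```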
Example VCF record:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample1
NC_000913.3 1234 . A G 60 PASS DP=42;AF=0.25 GT:DP:GQ 0/1:42:99
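The QUAL (60) and GQ (99) values in this record are Phred-scaled, meaning Q = -10 · log10(P), where P is the estimated probability that the call is wrong. A short sketch of the conversion follows; the helper name `phred_to_error_prob` is hypothetical.

```python
def phred_to_error_prob(q: float) -> float:
    """Convert a Phred-scaled score Q to an error probability, P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


qual_error = phred_to_error_prob(60)  # QUAL from the example record
gq_error = phred_to_error_prob(99)    # GQ from the example record
print(f"QUAL 60 -> {qual_error:.1e}")
print(f"GQ 99   -> {gq_error:.1e}")
```

Under this scale, QUAL 60 corresponds to roughly a one-in-a-million chance that the variant assertion is wrong, and every additional 10 points lowers that probability by a factor of ten.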