Pisces VCF Specifications - Illumina/Pisces GitHub Wiki
The following specification document is valid for both somatic VCF and gVCF formatted files.

| SDS ID | Specification |
|---|---|
| VCF-1 | The application shall write a header section at the top of the VCF file with the following header lines. These lines shall have the format “##{key}={value}”. The keys and their descriptions are given below: |

| Key | Description |
|---|---|
| fileformat | Version of the VCF format, which is “VCFv4.1”. |
| fileDate | Date in YYYYMMDD format. |
| Source | Application name and version, e.g. “CallSomaticVariants 1.0.0.0”. |
| CallSomaticVariants_cmdline | Command line call for the program, including all arguments. |
| Reference | File name for reference genome fasta file. |
| INFO | Description of INFO fields used in the file. There is one INFO header line for each field in the file. |
| FILTER | Description of FILTER fields used in the file. There is one FILTER header line for each field in the file. |
| FORMAT | Description of FORMAT fields used in the file. There is one FORMAT header line for each field in the file. |
| contig | List of processed chromosomes and their lengths. There is a contig header line for each chromosome. Format is “##contig=<ID={chrName},length={length}>”. |
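
The VCF-1 header lines can be sketched in a few lines of Python. This is only an illustration, not the application's code: `header_lines` and its parameters are hypothetical names, the key spellings are copied from the table, and the INFO, FILTER and FORMAT lines are left out. The contig line is closed with “>”, as VCF meta-information lines require.

```python
from datetime import date


def header_lines(source, cmdline, reference, contigs, today=None):
    """Build the '##key=value' header lines listed in VCF-1 (hypothetical helper).

    contigs is a list of (chromosome name, length) pairs.
    """
    today = today or date.today()
    lines = [
        "##fileformat=VCFv4.1",
        f"##fileDate={today:%Y%m%d}",          # YYYYMMDD
        f"##Source={source}",
        f"##CallSomaticVariants_cmdline={cmdline}",
        f"##Reference={reference}",
    ]
    # One contig line per processed chromosome.
    lines += [f"##contig=<ID={name},length={length}>" for name, length in contigs]
    return lines
```

Passing `today` explicitly keeps the output reproducible, which is convenient when comparing generated files in tests.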

| SDS ID | Specification |
|---|---|
| VCF-2 | The application shall write the following INFO and FORMAT lines to the VCF header, if the associated configuration rule is satisfied. These lines shall have the format: “##{Key}=<ID={FieldName},Number={Number},Type={Type},Description={Description}>”. |

| Key | Field name | Number | Type | Description | Configuration Rule |
|---|---|---|---|---|---|
| INFO | DP | 1 | Integer | Total Depth | None |
| FORMAT | GT | 1 | String | Genotype | None |
| FORMAT | GQ | 1 | Integer | Genotype Quality | None |
| FORMAT | AD | . | Integer | Allele Depth | None |
| FORMAT | DP | . | Integer | Total Depth Used For Variant Calling | None |
| FORMAT | VF | . | Float | Variant Frequency. One number if 0/0 or 0/1. Two numbers for 1/2 | None |
| FORMAT | NL | 1 | Integer | Applied BaseCall Noise Level | Debug mode enabled, or outputting bias files, or strand bias threshold < 1 |
| FORMAT | SB | 1 | Float | StrandBias Score | Debug mode enabled, or outputting bias files, or strand bias threshold < 1 |
| FORMAT | NC | 1 | Float | Fraction of bases which were uncalled or with basecall quality below the minimum threshold | Report no calls enabled |
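
One way to read VCF-2 is as a table of field definitions, each guarded by a configuration rule. The sketch below follows that reading. The `Config` class, its option names and its default values are assumptions made for illustration, not the application's actual settings, and some descriptions are abbreviated. The description is wrapped in double quotes, as the VCF 4.1 specification requires for INFO and FORMAT lines.

```python
from dataclasses import dataclass


@dataclass
class Config:
    # Hypothetical option names and defaults, for illustration only.
    debug_mode: bool = False
    output_bias_files: bool = False
    strand_bias_threshold: float = 1.0
    report_no_calls: bool = False


def always(cfg):
    return True


def bias_reporting(cfg):
    # "Debug mode enabled, or outputting bias files, or strand bias threshold < 1"
    return cfg.debug_mode or cfg.output_bias_files or cfg.strand_bias_threshold < 1


def no_calls(cfg):
    return cfg.report_no_calls


# (key, field name, number, type, description, configuration rule)
FIELDS = [
    ("INFO", "DP", "1", "Integer", "Total Depth", always),
    ("FORMAT", "GT", "1", "String", "Genotype", always),
    ("FORMAT", "GQ", "1", "Integer", "Genotype Quality", always),
    ("FORMAT", "AD", ".", "Integer", "Allele Depth", always),
    ("FORMAT", "DP", ".", "Integer", "Total Depth Used For Variant Calling", always),
    ("FORMAT", "VF", ".", "Float", "Variant Frequency", always),
    ("FORMAT", "NL", "1", "Integer", "Applied BaseCall Noise Level", bias_reporting),
    ("FORMAT", "SB", "1", "Float", "StrandBias Score", bias_reporting),
    ("FORMAT", "NC", "1", "Float", "Fraction of bases which were uncalled", no_calls),
]


def field_header_lines(cfg):
    """Return the INFO/FORMAT header lines whose configuration rule is satisfied."""
    return [
        f'##{key}=<ID={name},Number={number},Type={typ},Description="{desc}">'
        for key, name, number, typ, desc, rule in FIELDS
        if rule(cfg)
    ]
```

Keeping the rule next to each field definition means a new conditional field only needs one extra row.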

| SDS ID | Specification |
|---|---|
| VCF-3 | The application shall write the following FILTER lines to the VCF header, if the associated configuration rule is satisfied. FILTER lines shall have the format “##{Key}=<ID={FieldName},Description={Description}>”. |

| Key | FieldName | Description | Configuration Rule |
|---|---|---|---|
| FILTER | q{threshold}, e.g. “q20” | Quality below {threshold} | Minimum variant score configured > 0. |
| FILTER | LowDP | Low coverage (DP tag), therefore no genotype called | Minimum coverage configured > 0. |
| FILTER | SB | One of the following, depending on the rule: | Strand bias threshold configured > 0. |
| FILTER | SB | A) Variant strand bias too high | Strand bias threshold configured > 0. |
| FILTER | SB | B) Variant support on only one strand | Filter variants on only one strand |
| FILTER | SB | C) Variant strand bias too high or coverage on only one strand | Three possible rules: |
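
As a rough illustration of VCF-3, the sketch below emits the quality and low-depth FILTER lines only when their thresholds are configured above zero. The function and parameter names are hypothetical. The SB filter is left out because its description depends on which strand-bias rule applies.

```python
def filter_header_lines(min_variant_score=0, min_coverage=0):
    """Return FILTER header lines for the configured thresholds (hypothetical helper)."""
    lines = []
    if min_variant_score > 0:
        # The filter ID embeds the threshold, e.g. "q20".
        lines.append(
            f'##FILTER=<ID=q{min_variant_score},'
            f'Description="Quality below {min_variant_score}">'
        )
    if min_coverage > 0:
        lines.append(
            '##FILTER=<ID=LowDP,'
            'Description="Low coverage (DP tag), therefore no genotype called">'
        )
    return lines
```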

| SDS ID | Specification |
|---|---|
| VCF-4 | The application shall write a data section to the VCF file as a tab-delimited table below the header section. |
| VCF-5 | The application shall write a column header line, as below, at the top of the data section of a VCF file. The column header line shall be prefixed by a single “#” and have the following format. The SampleName is set to the input BAM file name (without extension). |

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT {SampleName}
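
A minimal sketch of VCF-5, assuming the sample name is simply the BAM file name with its final extension removed. `column_header` is a hypothetical helper name.

```python
from pathlib import PurePath

COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]


def column_header(bam_path):
    """Return the tab-delimited column header line for the given BAM file."""
    sample_name = PurePath(bam_path).stem  # file name without its last extension
    return "#" + "\t".join(COLUMNS + [sample_name])
```

Note that `stem` removes only the last suffix, so a file named `Sample1.sorted.bam` would yield the sample name `Sample1.sorted`.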

| SDS ID | Specification |
|---|---|
| VCF-6 | In default VCF mode, after the column header line, the data section of a VCF file shall have one line per variant allele. |
| VCF-7 | If gVCF mode is selected, after the column header line, the data section of a VCF file shall have one line per allele (reference or variant). |
| VCF-8 | If CrushVcf mode is selected, after the column header line, the data section of a VCF file shall have one line per genomic locus. |
| VCF-9 | For each data line item, the application shall write the following values to the data section of a VCF file, as below: |

| Column Name | Value |
|---|---|
| CHROM | Chromosome name |
| POS | Reference position |
| ID | Source ID for variant, always “.”. This column is provided for downstream annotators to update as appropriate. |
| REF | Reference allele |
| ALT | Alternate (variant) allele |
| QUAL | Variant Quality Score |
| FILTER | “PASS” if no filters. Otherwise, comma-separated list of filter names, e.g. “LowDP,SB”. |
| INFO | Comma-separated list of INFO name and value pairs, in the format “{name}={value}”. Currently only the DP INFO field is supported, e.g. “DP=500”. |
| FORMAT | Colon-separated list of field names, e.g. “GT:GQ:AD”. |
| {SampleName} | Colon-separated list of FORMAT field values. |
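
The column rules in VCF-9 can be put together into one data line as sketched below. `data_line` and its parameters are hypothetical names, and the values in the usage example are made up. The FILTER column is joined with commas because that is what the table states.

```python
def data_line(chrom, pos, ref, alt, qual, filters, depth, format_fields):
    """Build one tab-delimited data line.

    filters is a list of filter names (empty means PASS);
    format_fields maps FORMAT field names to values, in output order.
    """
    columns = [
        chrom,
        str(pos),
        ".",                                        # ID is always "."
        ref,
        alt,
        str(qual),
        ",".join(filters) if filters else "PASS",
        f"DP={depth}",                              # only the DP INFO field is written
        ":".join(format_fields),                    # e.g. "GT:GQ:AD"
        ":".join(str(v) for v in format_fields.values()),
    ]
    return "\t".join(columns)
```

For example, `data_line("chr7", 1000, "C", "T", 100, ["LowDP", "SB"], 500, {"GT": "0/1"})` produces a line whose FILTER column is `LowDP,SB` and whose last two columns are `GT` and `0/1`.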

| SDS ID | Specification |
|---|---|
| VCF-10 | For each data line item, the application shall write the following FORMAT fields to the data section in a VCF file, if the associated configuration rule is satisfied: |

| FORMAT Field Name | Field Value | Configuration Rule |
|---|---|---|
| GT | Genotype | None |
| GQ | Genotype Quality Score | None |
| AD | If variant call, value is “{X},{Y}” where X is the reference depth and Y is the allele depth. If reference call, value is the allele depth. | None |
| DP | Total coverage depth used in variant calling | None |
| VF | Variant frequency | None |
| NL | Estimated basecall quality | Debug mode enabled, or outputting bias files, or strand bias threshold < 1 |
| SB | Strand bias score | Debug mode enabled, or outputting bias files, or strand bias threshold < 1 |
| NC | No call frequency or fraction | Report no calls enabled |
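
The selection rules in VCF-10 and the two shapes of the AD value can be sketched as follows. The function names, option names and default values are assumptions for illustration, not the application's actual configuration.

```python
def format_keys(debug_mode=False, output_bias_files=False,
                strand_bias_threshold=1.0, report_no_calls=False):
    """Return the FORMAT field names to write, in table order."""
    keys = ["GT", "GQ", "AD", "DP", "VF"]          # always written
    if debug_mode or output_bias_files or strand_bias_threshold < 1:
        keys += ["NL", "SB"]
    if report_no_calls:
        keys.append("NC")
    return keys


def ad_value(allele_depth, ref_depth=None):
    """Format the AD field.

    Variant call: pass ref_depth to get "{X},{Y}" (reference depth, allele depth).
    Reference call: omit ref_depth to get the allele depth alone.
    """
    if ref_depth is None:
        return str(allele_depth)
    return f"{ref_depth},{allele_depth}"
```

With the defaults assumed here, `":".join(format_keys())` gives `GT:GQ:AD:DP:VF`.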