Flags and Settings

This page describes the flags that can be applied to substitution data and the settings available to modify them.

Flags
Flag Settings

Flags

This section explains the contents of the flag.to.vcf.convert.ini file, with a more detailed explanation of why we use each flag.

id The ID of the flag. This value is written to the FILTER field if the position fails the flag.
description May contain variable names. These correspond to variables in the flag.to.vcf.convert.ini file and can be adjusted per sequence type or species.
info=[1|0] info=1 defines this as an INFO field flag ('soft flag') rather than a true filter (FILTER field).
If info=1, more fields are permitted:
type [Flag|Float etc] The type of INFO entry this corresponds to. See the VCF spec.
val [1|0] If 1, this INFO field entry has a corresponding value.
intersect 1 if this flag requires checking against a reference file.
optname The short name of the command-line option containing the file to intersect with.
filename The name of the file passed to the command-line option containing the file to intersect with.
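As an illustration of how these fields fit together, the following Python sketch parses two flag definitions and builds the VCF header line each would correspond to. It is not the tool's own code: the bracketed section headers ([depthFlag], [simpleRepeatFlag]) and the helper name vcf_header_line are assumptions made for the example, and the mapping of val to the INFO Number attribute follows the VCF spec rather than anything stated on this page.

```python
import configparser

# Hypothetical excerpt: the [section] headers are assumed for illustration.
INI_TEXT = """
[depthFlag]
info=0
id=DTH
description=Less than depthCutoffProportion mutant alleles were >= minDepthQual base quality

[simpleRepeatFlag]
info=1
val=0
id=SR
type=Flag
description=Position falls within a simple repeat using the supplied bed file
intersect=1
optname=b
filename=simple_repeats.bed.gz
"""


def vcf_header_line(section):
    """Build a VCF header line for one flag definition (illustrative helper)."""
    desc = section["description"]
    if section.get("info", "0") == "1":
        # Soft flag: Number=0 for a value-less Flag, Number=1 when val=1.
        number = "1" if section.get("val", "0") == "1" else "0"
        return (
            f'##INFO=<ID={section["id"]},Number={number},'
            f'Type={section["type"]},Description="{desc}">'
        )
    # Hard flag: reported in the FILTER field.
    return f'##FILTER=<ID={section["id"]},Description="{desc}">'


parser = configparser.ConfigParser(interpolation=None)
parser.optionxform = str  # keep key names exactly as written
parser.read_string(INI_TEXT)

headers = {name: vcf_header_line(parser[name]) for name in parser.sections()}
for line in headers.values():
    print(line)
```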

depthFlag

This flag ensures we have a reasonable depth of real alleles in the tumour sample, using base quality as a filter for realness.

info=0
id=DTH
description=Less than depthCutoffProportion mutant alleles were >= minDepthQual base quality
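One way to read the description above is sketched below in Python. This is an interpretation rather than the tool's implementation: the function name, the treatment of a position with no mutant alleles, and the parameter values used in the usage lines are all assumptions.

```python
def fails_depth_flag(mutant_base_quals, depth_cutoff_proportion, min_depth_qual):
    """Return True if too few mutant alleles reach the base quality cutoff.

    mutant_base_quals: base qualities of the mutant alleles in the tumour sample.
    """
    if not mutant_base_quals:
        # Assumption: with no mutant alleles there is no evidence, so fail.
        return True
    good = sum(1 for q in mutant_base_quals if q >= min_depth_qual)
    return good / len(mutant_base_quals) < depth_cutoff_proportion


# Hypothetical parameter values, chosen only for the example.
print(fails_depth_flag([30, 30, 10], depth_cutoff_proportion=1 / 3, min_depth_qual=25))
print(fails_depth_flag([10, 10, 10, 30], depth_cutoff_proportion=1 / 3, min_depth_qual=25))
```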

readPositionFlag

This flag tries to account for the drop in accuracy towards the end of each read.

info=0
id=RP
description=Coverage was less than minRdPosDepth and no mutant alleles were found in the first 2/3 of a read (shifted readPosBeginningOfReadIgnoreProportion from the start and extended readPosTwoThirdsOfReadExtendProportion more than 2/3 of the read length)

matchedNormalFlag

Looks for reads in the normal sample that contain high base quality mutant alleles.

info=0
id=MN
description=More than maxMatchedNormalAlleleProportion of mutant alleles that were >= minNormMutAllelequal base quality found in the matched normal

pentamericMotifFlag

Presence of the motif GGC[AT]G in sequenced orientation causes a drop in mean base quality and greatly increased base miscalling. This leads to increased false positives if not filtered.

info=0
id=PT
description=Mutant alleles all on one direction of read (1rd allowed on opposite strand) and in second half of the read. Second half of read contains the motif GGC[AT]G in sequenced orientation and the mean base quality of all bases after the motif was less than pentamerMinPassAvgQual

avgMapQualFlag

Ensures good evidence is present in the tumour for a mutation by checking the mean mapping quality of reads containing the mutant allele.

info=0
id=MQ
description=Mean mapping quality of the mutant allele reads was < minPassAvgMapQual

germlineIndelFlag

Germline indels can appear to be substitutions as part of the mapping process. This flag excludes germline indel positions.

info=0
id=GI
description=Position falls within a germline indel using the supplied bed file
intersect=1
optname=g

tumIndelDepthFlag

A high proportion of indel-containing reads covering the mutant position increases the likelihood of a false positive.

info=0
id=TI
description=More than maxTumIndelProportion percent of reads covering this position contained an indel according to mapping

sameReadPosFlag

Useful only in paired-end, non-PCR-based sequencing types. Checks for a pile-up of mutant alleles at the same read position.

info=0
id=SRP
description=More than samePosMaxPercent percent of reads contain the mutant allele at the same read position
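The check described above can be sketched in Python as follows. The function name and the input representation (one read position per mutant-allele read) are assumptions for illustration, and the sketch takes the percentage over mutant-allele reads, which is one possible reading of the description.

```python
from collections import Counter


def fails_same_read_pos(mutant_read_positions, same_pos_max_percent):
    """Return True if too many mutant alleles sit at one read position.

    mutant_read_positions: for each read carrying the mutant allele,
    the position within the read at which the allele was seen.
    """
    if not mutant_read_positions:
        return False  # assumption: nothing to pile up
    _, top_count = Counter(mutant_read_positions).most_common(1)[0]
    percent = 100.0 * top_count / len(mutant_read_positions)
    return percent > same_pos_max_percent


# Hypothetical cutoff of 80 percent, for the example only.
print(fails_same_read_pos([5, 5, 5, 5, 5, 10], same_pos_max_percent=80))
```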

simpleRepeatFlag

Simple repeats can cause sequencing to encounter something similar to slippage in capillary sequencing. Flagging simple repeats can remove false positives.

info=1
val=0
id=SR
type=Flag
description=Position falls within a simple repeat using the supplied bed file
intersect=1
optname=b
filename=simple_repeats.bed.gz

centromericRepeatFlag

Centromeric repeats may lead to mismapped regions. Excluding these reduces false positives.

info=0
id=CR
description=Position falls within a centromeric repeat using the supplied bed file
optname=b
filename=centromeric_repeats.bed.gz
intersect=1

snpFlag

A soft flag (in the INFO field) marking a position known to contain a SNP, determined by intersecting with the provided bgzipped, tabix-indexed bed file of SNPs.

info=1
val=0
id=SNP
type=Flag
description=Position matches a dbSNP entry using the supplied bed file
intersect=1
optname=b
filename=snps.bed.gz

phasingFlag

Phasing is a sequencing artefact where 'bleed through' of the adjacent bases causes a position to appear as a SNP. This is most commonly seen where the middle base of a triplet appears to be the same as neighbouring bases (ACA -> AAA).

info=0
id=PH
description=Mutant reads were on one strand (permitted proportion on other strand: maxPhasingMinorityStrandReadProportion), and mean mutant base quality was less than minPassPhaseQual

annotationFlag

Useful in sequencing types that target genes, introns etc. (e.g. WXS), where positions are expected to be annotatable. This flag uses a bgzipped, tabix-indexed bed file of annotatable positions to exclude those positions that can't be annotated. NB: This is NOT the same as the coding flag.

info=0
id=AN
description=Position could not be annotated against a transcript using the supplied bed file
intersect=1
optname=ab
filename=gene_regions.bed.gz

hiSeqDepthFlag

Filters regions of high sequencing depth, which are usually caused by mismapping during the alignment process and are therefore highly likely to yield false positives.

info=0
id=HSD
description=Position falls within a high sequencing depth region using the supplied bed file
intersect=1
optname=b
filename=hi_seq_depth.bed.gz

codingFlag

Useful in sequencing types targeting coding regions etc (eg AMPLICON) where positions are expected to be coding. This soft flag (INFO field) uses a bgzipped, tabix indexed bed file of coding positions to exclude those positions that aren't annotated as coding.

info=1
val=0
id=CA
description=Position could not be annotated to a coding region of a transcript using the supplied bed file
intersect=1
type=Flag
optname=ab
filename=codingexon_regions.sub.bed.gz

lowMutBurdenFlag

Excludes mutations where the variant allele fraction in the tumour sample is less than 10%. Useful where specificity is required. It is inadvisable to apply this flag to data where subclonal mutations are being investigated.

info=0
id=LMB
description=Proportion of mutant alleles was < 10 pct

unmatchedNormalVcfFlag

Excludes mutations where the change from reference to mutant allele matches an entry in the provided unmatched normal panel. Called VCF for historical reasons. A bed file is more efficient.

info=0
id=VUM
description=Position has >= vcfUnmatchedMinMutAlleleCvg mutant allele present in at least vcfUnmatchedMinSamplePct percent unmatched normal samples in the unmatched VCF.
optname=umv

singleEndFlag

Removes mutations with coverage on both strands but evidence only on one strand.

info=0
id=SE
description=Coverage is >= minSingleEndCoverage on each strand but mutant allele is only present on one strand
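A minimal Python sketch of this rule, assuming per-strand coverage and per-strand mutant allele counts are already available; the function and argument names are hypothetical.

```python
def fails_single_end(fwd_cov, rev_cov, fwd_mut, rev_mut, min_single_end_coverage):
    """Return True if both strands are covered but only one shows the mutant allele."""
    covered_both = (
        fwd_cov >= min_single_end_coverage and rev_cov >= min_single_end_coverage
    )
    # True when exactly one strand carries mutant alleles.
    one_strand_only = (fwd_mut > 0) != (rev_mut > 0)
    return covered_both and one_strand_only


# Hypothetical cutoff of 10 reads per strand, for the example only.
print(fails_single_end(20, 15, 4, 0, min_single_end_coverage=10))
```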

matchedNormalProportion

Filters called positions where the normal sample variant allele fraction (VAF) and the tumour sample VAF are not sufficiently different.

info=0
id=MNP
description=Tumour sample mutant allele proportion - normal sample mutant allele proportion < matchedNormalMaxMutProportion
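The comparison in the description can be written out as the Python sketch below. The function name, the count-based inputs and the handling of zero depth are assumptions for illustration; the cutoff value in the usage line is hypothetical.

```python
def fails_matched_normal_proportion(
    tum_mut, tum_depth, norm_mut, norm_depth, matched_normal_max_mut_proportion
):
    """Return True if tumour and normal mutant allele proportions are too similar."""
    # Assumption: a sample with zero depth contributes a proportion of 0.
    tum_prop = tum_mut / tum_depth if tum_depth else 0.0
    norm_prop = norm_mut / norm_depth if norm_depth else 0.0
    return (tum_prop - norm_prop) < matched_normal_max_mut_proportion


# Tumour VAF 0.25, normal VAF 0.125: the difference of 0.125 is below 0.25.
print(fails_matched_normal_proportion(10, 40, 5, 40, 0.25))
```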

alignmentScoreReadLengthAdjustedFlag

A soft flag (INFO field), this time with an associated value. Useful in BWA-MEM mapped data, where the alignment score includes the number of clipped bases.

Excessive clipping can lead to false positives and poorly mapped reads. By providing the median alignment score of reads presenting the variant allele, we give the user an opportunity to filter on it. This value is adjusted for the length of the reads, whereas alnScoreMedianFlag is not. This allows a standard cutoff to be used across all data rather than one per read length.

info=1
val=1
id=ASRD
type=Float
description=A soft flag median (read length adjusted) alignment score of reads showing the variant allele
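The exact adjustment is not spelled out on this page. The Python sketch below assumes the simplest form, dividing each read's alignment score by its read length before taking the median; both that formula and the function name are assumptions.

```python
from statistics import median


def read_length_adjusted_median_score(reads):
    """Median of alignment score / read length over variant-supporting reads.

    reads: iterable of (alignment_score, read_length) pairs.
    Assumption: the adjustment is a plain per-read division by read length.
    """
    adjusted = [score / length for score, length in reads]
    if not adjusted:
        return None
    return median(adjusted)


# Three hypothetical reads of different lengths.
print(read_length_adjusted_median_score([(100, 100), (75, 150), (120, 150)]))
```

Under this assumption, a single cutoff can be applied to data sets with different read lengths, because each score is expressed per base of read.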

clippingMedianFlag

A soft flag (INFO field), this time with an associated value. Useful in BWA-MEM mapped data, this is another flag that takes the number of clipped bases in variant-supporting reads into account.

Excessive clipping can lead to false positives and poorly mapped reads. By providing the median count of clipped bases in reads presenting the variant allele, we give the user an opportunity to filter on it.

info=1
val=1
type=Float
id=CLPM
description=A soft flag median number of soft clipped bases in variant supporting reads

alnScoreMedianFlag

A soft flag (INFO field), this time with an associated value. Useful in BWA-MEM mapped data, where the alignment score includes the number of clipped bases.

info=1
val=1
type=Float
id=ASMD
description=A soft flag median alignment score of reads showing the variant allele

cavemanMatchNormalProportionFlag

This new flag was developed alongside the DERMATLAS project. It is not applied by default and should be used with caution. It is similar to matchedNormalProportion; however, instead of using BAM file reads, the metrics for this flag are obtained from the per-sample depth outputs in the CaVEMan VCF.

info=0
id=CMNP
description=Tumour sample mutant allele proportion - normal sample mutant allele proportion < maxCavemanMatchedNormalProportion (differs from MNP in using CaVEMan only seen reads as per VCF)

withinGapRangeFlag

This new flag was developed alongside the DERMATLAS project. It is not applied by default and should be used with caution. CaVEMan sometimes calls false positive variants in misaligned reads (mostly near the beginning or end of a read) close to indels. Some of our samples have a higher than average number of indels (many of them in repeats, due to microsatellite instability), so this is not rare. We also cannot use the 'SR' flag, as we do see real mutations in simple repeats. (A large number of our samples will have a high mutation rate due to various DNA repair deficiencies and UV light exposure.)

info=0
id=GAP
description=If variant is within withinXBpOfDeletion of an indel in reads without the variant and the indel is present in at least minGapPresentInReads percent of total reads and no variant reads have the indel.

mnvFlag

Developed alongside the DERMATLAS project. If included, only MNVs will be flagged using the mnv flag. After all flags under MNVFLAGLIST are applied to each SNV base within the MNV, mnvFlag fails the MNV only if every base in it fails at least one of the applied flags. If any base passes all of them, the MNV itself will PASS. All failed flags are also stored under an INFO tag.

info=0
id=MNV
description=If any base in an MNV passes all other SNV flags, pass this variant. If all fail fail and merge failure list as well as adding INFO fields holding failed flags per base.
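The pass/fail rule for an MNV can be sketched in Python as below. The function name and the input shape (one list of failed flag IDs per base) are assumptions, and the sketch does not attempt to reproduce how the tool formats the FILTER or INFO fields.

```python
def evaluate_mnv(per_base_failed_flags):
    """Apply the MNV rule to per-base flag results.

    per_base_failed_flags: one list of failed flag IDs per SNV base in the MNV;
    an empty list means that base passed every applied flag.
    Returns (passed, merged_failures).
    """
    if any(not failed for failed in per_base_failed_flags):
        # At least one base passed all flags, so the MNV passes.
        return True, []
    # Every base failed something: merge the failures into one sorted list.
    merged = sorted({flag for failed in per_base_failed_flags for flag in failed})
    return False, merged


print(evaluate_mnv([["DTH", "MQ"], []]))
print(evaluate_mnv([["DTH"], ["MQ", "DTH"]]))
```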

Flag Settings

All the settings highlighted in the flags section above are available for modification. We provide an example file for human in our distribution (flag.to.vcf.convert.ini). The file has several sections that are required on a per-species, per-sequence-type basis. The naming pattern of the sections is as follows:

<SPECIES>_<SEQ_TYPE> <SECTION_NAME>, so the flaglist section for human genomic data would be HUMAN_WGS FLAGLIST
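The naming pattern can be illustrated with a small Python sketch. Both helper names are hypothetical, and the splitting helper assumes that the sequence type and section name contain no underscore or space of their own.

```python
def section_name(species, seq_type, section):
    """Compose a section name following <SPECIES>_<SEQ_TYPE> <SECTION_NAME>."""
    return f"{species}_{seq_type} {section}"


def split_section_name(name):
    """Split a section name back into (species, seq_type, section).

    Assumption: seq_type has no underscore and section has no space.
    """
    prefix, section = name.rsplit(" ", 1)
    species, seq_type = prefix.rsplit("_", 1)
    return species, seq_type, section


print(section_name("HUMAN", "WGS", "FLAGLIST"))
print(split_section_name("HUMAN_WGS FLAGLIST"))
```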

Flags and Settings - cancerit/cgpCaVEManPostProcessing GitHub Wiki
