qual_classifier - spiralgenetics/biograph GitHub Wiki
The biograph qual_classifier command assigns a genotype and a quality score to each variant, then filters the calls against quality score thresholds.
See Customizing the BioGraph Pipeline for an overview of how and when to use this command.
Essential Options
--vcf: the input VCF.
--model: the classifier model file. This is provided by Spiral Genetics and should match your version of BioGraph (for example, biograph_model-7.0.0.ml).
--grm: the dataframe output from the truvari anno grm command. This is only required when running the quality score classifier.
--out: the output VCF. If unspecified, the VCF will be written to STDOUT.
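As a minimal sketch of how the essential options fit together, the Python snippet below assembles an argument list for a run that includes the quality score classifier. The helper build_qual_classifier_cmd is not part of BioGraph, and the file names calls.vcf, grm.jl, and classified.vcf are hypothetical placeholders; only the flags come from the list above.

```python
import shlex


def build_qual_classifier_cmd(vcf, model, grm=None, out=None):
    """Assemble an argument list for `biograph qual_classifier`.

    Illustrative helper, not part of BioGraph. It uses only the
    essential options documented above; all paths come from the caller.
    """
    cmd = ["biograph", "qual_classifier", "--vcf", vcf, "--model", model]
    if grm is not None:
        # Only required when the quality score classifier is run.
        cmd += ["--grm", grm]
    if out is not None:
        # Without --out, the VCF is written to STDOUT.
        cmd += ["--out", out]
    return cmd


# Hypothetical file names, for illustration only.
cmd = build_qual_classifier_cmd(
    vcf="calls.vcf",
    model="biograph_model-7.0.0.ml",
    grm="grm.jl",
    out="classified.vcf",
)
print(shlex.join(cmd))
```

Once the input files exist, the list can be handed to subprocess.run(cmd, check=True). Check the --help output for your BioGraph version first, since its usage line may list further required options.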
Filter Thresholds
--filter: Calls with a quality score lower than this will be removed from the output VCF.
--lowqual_sv: Structural variants with a quality score lower than this will be included but marked lowq in the filter field.
--lowqual_ao: SNPs and indels with a quality score lower than this will be included but marked lowq in the filter field. The ao is short for "all others" (non-SVs).
--thresh_gt: Cutoff threshold for GT (default: 0.5).
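The way the three quality thresholds interact can be sketched in Python. This illustrates the behaviour described above and is not the tool's implementation: the function name, the is_sv flag, and the PASS label for unflagged calls are assumptions, and the default values are the ones printed in the --help output under Getting Help.

```python
def classify_call(quality, is_sv,
                  filter_thresh=0.1, lowqual_sv=0.352, lowqual_ao=0.22):
    """Return None if the call would be dropped, else a FILTER label.

    Illustrative only. Defaults mirror the values shown in the --help
    output; "PASS" is an assumed label for calls that are not flagged.
    """
    if quality < filter_thresh:
        # --filter: removed from the output VCF entirely.
        return None
    # SVs use --lowqual_sv; SNPs and indels ("all others") use --lowqual_ao.
    lowqual = lowqual_sv if is_sv else lowqual_ao
    if quality < lowqual:
        # Kept in the output, but marked lowq in the filter field.
        return "lowq"
    return "PASS"


# Hypothetical quality scores, for illustration only.
print(classify_call(0.05, is_sv=True))   # None: below --filter
print(classify_call(0.30, is_sv=True))   # lowq: below --lowqual_sv
print(classify_call(0.30, is_sv=False))  # PASS: above --lowqual_ao
```

Because the SV threshold is higher than the one for SNPs and indels by default, the same score can be marked lowq for a structural variant and left unflagged for a small variant.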
Other Options
--sample: When running on a multi-sample VCF, set --sample to choose the sample of interest.
--clsf: The genotype and quality classifiers are both run by default. You can run just the GT classifier with --clsf 1, or just the quality classifier with --clsf 2.
--df: A dataframe generated from the input VCF with bgvar2table.py. If not specified, a dataframe will automatically be created.
--threads: Use the specified number of threads. By default, one thread is allocated per available processor.
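To find a value to pass to --sample, you can read the sample names from the #CHROM header line of the VCF, where the VCF format places them after the FORMAT column. A minimal Python sketch follows; the function name and the example header are hypothetical, and it expects plain text lines (a gzip-compressed VCF would need to be opened with gzip.open in text mode).

```python
def vcf_sample_names(lines):
    """Return the sample names from the #CHROM header line of a VCF.

    Illustrative helper, not part of BioGraph. `lines` is any iterable
    of text lines, such as an open file handle.
    """
    for line in lines:
        if line.startswith("#CHROM"):
            fields = line.rstrip("\n").split("\t")
            # Columns 1-9 are CHROM..FORMAT; samples start at column 10.
            return fields[9:]
        if not line.startswith("#"):
            # Reached the records without finding a header line.
            break
    return []


# Hypothetical two-sample header, for illustration only.
header = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tHG002\tHG003\n",
]
print(vcf_sample_names(header))
```

With a real file, pass an open handle instead, for example vcf_sample_names(open("calls.vcf")), where calls.vcf is a placeholder path.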
Getting Help
To see a list of all biograph qual_classifier options, use the --help switch:
$ biograph qual_classifier --help
usage: qual_classifier [-h] -v VCF -d DATAFRAME -m MODEL [-o OUT] [-x GRM]
                       [-f FILTER] [-s LOWQUAL_SV] [-a LOWQUAL_AO]
                       [--sample SAMPLE] [--tmp TMP] [-t THREADS]
                       [-g THRESH_GT] [-c {GT,Qual,Both}]

Classify VCF variants

optional arguments:
  -h, --help            show this help message and exit
  -v VCF, --vcf VCF     VCF to parse
  -d DATAFRAME, --dataframe DATAFRAME
                        Coverage DataFrame frame
  -m MODEL, --model MODEL
                        Model to apply to data
  -o OUT, --out OUT     VCF to output
  -x GRM, --grm GRM     DataFrame conaining grm features from truvari
  -f FILTER, --filter FILTER
                        Maximum threshold of calls to filter (0.1)
  -s LOWQUAL_SV, --lowqual_sv LOWQUAL_SV
                        Maximum threshold for calls to mark as lowqual_sv
                        (0.352)
  -a LOWQUAL_AO, --lowqual_ao LOWQUAL_AO
                        Maximum threshold for calls to mark as lowqual_ao
                        (0.22)
  --sample SAMPLE       Sample identifier (only required for multi-sample
                        VCFs)
  --tmp TMP             Temporary directory (/tmp)
  -t THREADS, --threads THREADS
                        Number of threads to use (48)
  -g THRESH_GT, --thresh_gt THRESH_GT
                        threshold for GT
  -c {GT,Qual,Both}, --clsf {GT,Qual,Both}
                        Flag for which classifiers to run (Both)