Annotating mutations with ANNOVAR
Last updated: October 29, 2019
Introduction
When running MutEnricher's coding module, somatic VCFs must carry gene and non-silent term annotations in the VCF INFO field. Several bioinformatics tools are available for this purpose; this page describes how to add these annotations with the popular ANNOVAR tool.
Annotation procedure
1a. Ensure perl is available on your system
1b. Download ANNOVAR and databases
Download and install ANNOVAR from the ANNOVAR download and instructions page. Follow the instructions to obtain the gene annotation database of interest.
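Before moving on, it can help to confirm that perl and the compression/indexing tools used in later steps are on your PATH. The following is a minimal sketch in POSIX shell; the helper names have_tool and missing_tools are hypothetical and not part of ANNOVAR or MutEnricher.

```shell
# Return 0 if the named program is found on PATH, non-zero otherwise.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Print one "missing: NAME" line per tool that is not on PATH.
# Prints nothing if every tool is found.
missing_tools() {
    for tool in "$@"; do
        have_tool "$tool" || echo "missing: $tool"
    done
}

# perl is needed by ANNOVAR; bcftools or bgzip/tabix are used in later steps.
missing_tools perl bcftools bgzip tabix
```

Only one of bcftools or bgzip/tabix is needed, so a "missing" line for the other can be ignored.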
2. Prepare somatic VCF(s) for annotation
ANNOVAR can run on VCF files directly with the -vcfinput flag; however, we have encountered errors when running it on VCFs without the genotype (GT) field set, which may be the case for somatic VCFs depending on the program that produced them. A workaround is to trim somatic VCFs to their first 8 columns, e.g.:
# With bcftools
bcftools view sample.vcf.gz | cut -f1-8 | bcftools view -Oz > sample.cut.vcf.gz
bcftools index -t sample.cut.vcf.gz
# with bgzip/tabix
zcat sample.vcf.gz | cut -f1-8 | bgzip > sample.cut.vcf.gz
tabix -p vcf sample.cut.vcf.gz
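To see what the cut step does, here is a small sketch on a made-up VCF (the records and the TUMOR sample name are invented for illustration). cut prints lines that contain no tab unchanged, so "##" meta lines, which normally have no tabs, pass through intact, while the "#CHROM" line and the records are trimmed to the 8 fixed columns.

```shell
# Build a tiny, made-up VCF with FORMAT and one sample column (10 columns).
make_toy_vcf() {
    printf '##fileformat=VCFv4.2\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tTUMOR\n'
    printf 'chr1\t100\t.\tA\tT\t.\tPASS\tDP=30\tAD\t10,20\n'
    printf 'chr2\t200\t.\tG\tC\t.\tPASS\tDP=12\tAD\t8,4\n'
}

# Keep only columns 1-8 (CHROM through INFO); FORMAT and sample columns drop.
make_toy_vcf | cut -f1-8
```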
3. Run ANNOVAR
Run ANNOVAR's table_annovar.pl script on the native somatic VCF (or on the cut version from step 2, if necessary) with the desired gene annotation database:
# hg19, refGene models
perl /path/to/annovar/table_annovar.pl \
    /path/to/sample.vcf.gz \
    /path/to/annovar/humandb \
    -buildver hg19 \
    -out /path/to/output/directory/sample.annovar \
    -vcfinput \
    -remove \
    -protocol refGene \
    -operation g \
    -nastring .
# hg38, refGene models
perl /path/to/annovar/table_annovar.pl \
    /path/to/sample.vcf.gz \
    /path/to/annovar/humandb \
    -buildver hg38 \
    -out /path/to/output/directory/sample.annovar \
    -vcfinput \
    -remove \
    -protocol refGene \
    -operation g \
    -nastring .
# Other annotations follow the same general format #
The above command will produce several output files; the one of interest is the annotated VCF file (e.g. sample.annovar.hg19_multianno.vcf or sample.annovar.hg38_multianno.vcf, depending on the genome build).
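When annotating many samples, the same command can be generated per file. The sketch below only prints the commands (a dry run) so they can be reviewed first. The function name annovar_cmd, the ANNOVAR_DIR and OUT_DIR variables, and the sample1/sample2 paths are placeholders chosen for illustration; the flags are the ones used in the commands above.

```shell
# Placeholder locations; adjust to your installation.
ANNOVAR_DIR=/path/to/annovar
OUT_DIR=/path/to/output/directory

# Print (do not run) the table_annovar.pl command for one VCF.
# $1 = input VCF path, $2 = genome build (e.g. hg19 or hg38)
annovar_cmd() {
    vcf=$1
    build=$2
    # Derive the output prefix from the file name: sample.vcf.gz -> sample
    sample=$(basename "$vcf")
    sample=${sample%.gz}
    sample=${sample%.vcf}
    echo perl "$ANNOVAR_DIR/table_annovar.pl" "$vcf" "$ANNOVAR_DIR/humandb" \
        -buildver "$build" -out "$OUT_DIR/$sample.annovar" \
        -vcfinput -remove -protocol refGene -operation g -nastring .
}

# Dry run over a list of VCFs.
for vcf in /path/to/sample1.vcf.gz /path/to/sample2.vcf.gz; do
    annovar_cmd "$vcf" hg19
done
```

Once the printed commands look right, they can be executed by piping the loop's output to sh, provided the paths contain no spaces or shell metacharacters.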
4. Compress and index VCF file(s)
MutEnricher requires sorted, bgzipped, and tabix-indexed VCF files. Run one of the following commands on the ANNOVAR output VCF(s):
# with bcftools (hg19 annotated output example)
bcftools view sample.annovar.hg19_multianno.vcf -Oz > sample.annovar.hg19_multianno.vcf.gz
bcftools index -t sample.annovar.hg19_multianno.vcf.gz
# with bgzip/tabix directly (hg19 example)
bgzip -c sample.annovar.hg19_multianno.vcf > sample.annovar.hg19_multianno.vcf.gz
tabix -p vcf sample.annovar.hg19_multianno.vcf.gz
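A quick sanity check before handing files to MutEnricher can catch a missing index or an uncompressed file. The sketch below is a coarse check only, and vcf_ready is a hypothetical helper name: it confirms the .vcf.gz suffix, gzip readability (bgzip output is gzip-compatible), and the presence of a non-empty .tbi file. It does not verify BGZF blocking or sort order.

```shell
# Return 0 if $1 is named *.vcf.gz, passes a gzip integrity test, and has a
# non-empty .tbi index next to it. Problems are reported on stderr.
vcf_ready() {
    case "$1" in
        *.vcf.gz) ;;
        *) echo "not a .vcf.gz file: $1" >&2; return 1 ;;
    esac
    gzip -t "$1" 2>/dev/null || { echo "not gzip-readable: $1" >&2; return 1; }
    [ -s "$1.tbi" ] || { echo "missing index: $1.tbi" >&2; return 1; }
}
```

For example, `vcf_ready sample.annovar.hg19_multianno.vcf.gz && echo ok` prints "ok" only when all three conditions hold.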
5. Remove temporary files (optional)
After the annotated bgzipped VCF files and their indexes are generated (i.e. .vcf.gz and .vcf.gz.tbi files), the additional ANNOVAR files and other temporary files can be removed.
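One cautious way to clean up is to list the candidate files first and delete only after reviewing the list. The sketch below assumes the intermediates share the -out prefix used in step 3; list_intermediates is a hypothetical helper that prints every regular file with that prefix except the final .vcf.gz and .vcf.gz.tbi outputs. It deletes nothing.

```shell
# List files in directory $1 whose names start with prefix $2, skipping the
# final .vcf.gz / .vcf.gz.tbi outputs. Only prints paths; nothing is removed.
list_intermediates() {
    for f in "$1"/"$2"*; do
        [ -f "$f" ] || continue
        case "$f" in
            *.vcf.gz|*.vcf.gz.tbi) ;;   # keep final outputs
            *) echo "$f" ;;
        esac
    done
}
```

For example, `list_intermediates /path/to/output/directory sample.annovar` prints the files that could be removed; after checking the list, each printed path can be passed to rm.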