ERROR_CHROMOSOME_NOT_FOUND - pcingola/SnpEff GitHub Wiki

The error is due to a difference between the chromosome names in input VCF file and the chromosome names in SnpEff's database.

Chromosome does not exist in the reference database. Typically this means that there is a mismatch between the chromosome names in your input file and the chromosome names used in the reference genome to build SnpEff's database.

This error could be caused because you are trying to annotate using a reference genome that is different than the one you used for sequence alignment. Obviously doing this makes no sense and the annotation information you'll get will be garbage. That's why SnpEff shows you an error message.

Solution

Sometimes SnpEff database matches the reference genome for your organism, and it's just that the chromosome names are changed. In this case, you can fix the error by changing the chromosome names in your input file.

You can see the chromosome names used by SnpEff's database by using -v (verbose) option. SnpEff will show a line like this one:

$ java -Xmx4g -jar snpEff.jar -v genomeName my.vcf > my.ann.vcf
...
... 
# Chromosomes names [sizes]  : '1' [249250621] '2' [243199373]
...
...

You can see the chromosome names in your input VCF file using a command like this one

cat input.vcf | grep -v "^#" | cut -f 1 | uniq

Once you know the names of the input file and the name used by SnpEff's database, you can adjust the chromosome name using a simple sed command. For example, if you input file's chromosome name is INPUT_CHR_NAME and the name in SnpEff's database is SNPEFF_CHR_NAME, you could use the following command:

cat input.vcf | sed "s/^INPUT_CHR_NAME/SNPEFF_CHR_NAME/" > input_updated_chr.vcf