6. Minor Allele Frequency (MAF) - DianaCarolinaVergara/SNPs_pipeline GitHub Wiki

Minor Allele Frequency Filter (MAF)

Filter on minor allele frequency (MAF) to remove phylogenetically uninformative sites. The filter is applied to biallelic SNPs.
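To make the quantity being filtered concrete, here is a minimal sketch that computes a per-site MAF by hand with awk. The toy VCF, its sample names (S1 to S4) and the variable names are made up for illustration and are not part of the pipeline. The sketch assumes diploid, biallelic genotypes and counts only called (non-missing) alleles.

```shell
# Build a tiny, made-up VCF (4 diploid samples, 3 sites) purely for illustration.
tmp_vcf=$(mktemp)
printf '##fileformat=VCFv4.2\n' > "$tmp_vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\tS4\n' >> "$tmp_vcf"
printf 'c1\t10\t.\tA\tG\t.\t.\t.\tGT\t0/0\t0/1\t1/1\t0/0\n' >> "$tmp_vcf"
printf 'c1\t20\t.\tC\tT\t.\t.\t.\tGT\t0/0\t0/0\t0/0\t0/0\n' >> "$tmp_vcf"
printf 'c1\t30\t.\tG\tA\t.\t.\t.\tGT\t1/1\t./.\t1/1\t0/1\n' >> "$tmp_vcf"

# For each site, count REF (0) and ALT (1) alleles among called genotypes,
# then report CHROM, POS and the frequency of the rarer allele.
maf_table=$(awk -F'\t' '
  /^#/ { next }                        # skip header lines
  {
    ref = 0; alt = 0
    for (i = 10; i <= NF; i++) {       # sample columns start at field 10
      split($i, fld, ":")              # GT is the first FORMAT field
      n = split(fld[1], a, "[/|]")     # alleles are separated by "/" or "|"
      for (k = 1; k <= n; k++) {
        if (a[k] == "0") ref++
        else if (a[k] == "1") alt++    # "." (missing) is not counted
      }
    }
    total = ref + alt
    minor = (ref < alt) ? ref : alt
    printf "%s\t%s\t%.3f\n", $1, $2, (total > 0 ? minor / total : 0)
  }' "$tmp_vcf")
echo "$maf_table"
rm -f "$tmp_vcf"
```

In this toy file the second site is monomorphic, so its MAF is 0 and a 0.01 threshold would drop it, while the other two sites would be kept.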

With RStudio

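library(vcfR)
# maf() returns one row per locus; column 4 holds the minor allele frequency
Muricea_MAF_vcf <- maf(Muricea_DP_90Miss_filtered.vcf)
# Flag loci with MAF below 0.01 by setting their rows to NA
Muricea_MAF_vcf[Muricea_MAF_vcf[, 4] < 0.01] <- NA
Muricea_MAF_vcf_NA <- is.na(Muricea_MAF_vcf[, 4])
# Row indices of the flagged loci
Muricea_MAF_vcf_NA_loci <- which(Muricea_MAF_vcf_NA, arr.ind = TRUE, useNames = TRUE)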

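## Removing the loci that failed the MAF filter
Muricea_toRemoveMAF <- c(Muricea_MAF_vcf_NA_loci)
length(Muricea_toRemoveMAF)  # number of loci to be removed
# Note: if no locus is flagged, skip the next line, since indexing with -integer(0) drops every row
# Drop the flagged rows (variants) from the vcfR object
Muricea_filtered_no_clone_DP_Miss90_MAF_vcf_last <- Muricea_DP_90Miss_filtered.vcf[-Muricea_toRemoveMAF, ]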

Then write the new VCF file:

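# Final SNPs

# Note: vcfR's write.vcf() writes gzip-compressed output regardless of the file extension
write.vcf(Muricea_filtered_no_clone_DP_Miss90_MAF_vcf_last, "Muricea_DP_90Miss_filtered.vcf")
# Print a summary of the filtered object
Muricea_filtered_no_clone_DP_Miss90_MAF_vcf_last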

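Output (the first line is the number of loci removed by the MAF filter, followed by the summary of the filtered object):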

[1] 263
***** Object of Class vcfR *****
98 samples
759 CHROMs
10,673 variants
Object size: 65.2 Mb
20.59 percent missing data
*****        *****         *****

With VCFtools

This step is not necessary if you were able to run the previous steps in RStudio. It is only a demonstration of how the same filter can be applied with VCFtools.

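As in the RStudio version, the goal is to filter on minor allele frequency to remove phylogenetically uninformative sites. Here the command also keeps only biallelic SNPs (--min-alleles 2 --max-alleles 2).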

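The command is shown below, followed by its output. Note that the logged output comes from a run on a different input file (Muricea_DP_70Miss_filtered_copy.vcf), so the file names and counts in the log do not match the command exactly.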

vcftools --vcf Muricea_DP_90Miss_filtered.vcf --maf 0.01 --min-alleles 2 --max-alleles 2 --recode --recode-INFO-all --out Muricea_DP_90Miss_MAF01_Bialle_filtered

VCFtools - 0.1.17
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
	--vcf Muricea_DP_70Miss_filtered_copy.vcf
	--recode-INFO-all
	--maf 0.01
	--max-alleles 2
	--min-alleles 2
	--out Muricea_DP_70Miss_MAF01_Bialle_filtered
	--recode

Warning: Expected at least 2 parts in INFO entry: ID=DP4,Number=4,Type=Integer,Description="Ref+, Ref-, Alt+, Alt-">
Warning: Expected at least 2 parts in INFO entry: ID=DP4,Number=4,Type=Integer,Description="Ref+, Ref-, Alt+, Alt-">
Warning: Expected at least 2 parts in INFO entry: ID=DP4,Number=4,Type=Integer,Description="Ref+, Ref-, Alt+, Alt-">
After filtering, kept 81 out of 81 Individuals
Outputting VCF file...
After filtering, kept 32375 out of a possible 33174 Sites
Run Time = 8.00 seconds
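As a quick sanity check, the number of sites reported in the log can be compared with the number of records in the recoded file. VCFtools appends ".recode.vcf" to the --out prefix, so for the command above the file to check would be Muricea_DP_90Miss_MAF01_Bialle_filtered.recode.vcf. The sketch below is only an illustration: the helper name count_sites and the toy file are assumptions, not part of the pipeline.

```shell
# Count variant records (all lines that do not start with "#") in a VCF file.
count_sites() {
  grep -vc '^#' "$1"
}

# Demonstration on a tiny made-up file with two records. For a real run, pass
# the recoded output instead, e.g. Muricea_DP_90Miss_MAF01_Bialle_filtered.recode.vcf
demo_vcf=$(mktemp)
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > "$demo_vcf"
printf 'c1\t10\t.\tA\tG\t.\t.\t.\nc1\t20\t.\tC\tT\t.\t.\t.\n' >> "$demo_vcf"
n_sites=$(count_sites "$demo_vcf")
echo "$n_sites"
rm -f "$demo_vcf"
```

On a real output file, the printed count should equal the first number in the "After filtering, kept ... Sites" line of the VCFtools log.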