3. Extract SNPs and Indels - DianaCarolinaVergara/SNPs_pipeline GitHub Wiki

Extract SNPs

Variants in VCF data may include single nucleotide polymorphisms (SNPs) as well as indels or more complicated features. Some analyses, including ours, require that only SNPs be used (e.g., when a mutation model is applied).
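The difference between these variant types comes down to allele lengths. The hypothetical function classify_variant() below is a minimal base-R sketch, not part of vcfR. It labels each variant from its REF and ALT alleles: "SNP" if every allele is a single base, "indel" if the allele lengths differ, and "complex" for equal-length multi-base alleles such as multi-nucleotide polymorphisms (MNPs). It assumes plain nucleotide alleles, with multiple ALT alleles comma-separated as in the VCF ALT column. Symbolic alleles such as <DEL> or * are not handled.

```r
# Hypothetical helper (not part of vcfR): classify variants by allele length.
# Assumes plain nucleotide alleles; multiple ALT alleles are comma-separated.
classify_variant <- function(ref, alt) {
  alt_list <- strsplit(alt, split = ",", fixed = TRUE)
  vapply(seq_along(ref), function(i) {
    lens <- nchar(c(ref[i], alt_list[[i]]))
    if (all(lens == 1)) {
      "SNP"        # every allele is a single base
    } else if (length(unique(lens)) > 1) {
      "indel"      # allele lengths differ: insertion or deletion
    } else {
      "complex"    # equal-length multi-base alleles (e.g. an MNP)
    }
  }, character(1))
}

ref <- c("A", "AT", "G",   "CT", "C")
alt <- c("G", "A",  "GTT", "GA", "T,G")
classify_variant(ref, alt)
# "SNP" "indel" "indel" "complex" "SNP"
```

Our reading (an assumption, not documented behaviour) is that extract.indels() only makes a two-way SNP / non-SNP split. Variants in the "complex" category would therefore end up with the indels rather than the SNPs.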

In these cases it is useful to subset the data to SNPs only. The vcfR function extract.indels() does this, quickly creating a vcfR object that should contain only SNPs.

extract.indels() separates indels from SNPs. With return.indels = FALSE, only the SNPs are returned; with return.indels = TRUE, only the indels are returned.
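To see the effect of return.indels on a toy table, the sketch below mimics this split in base R. The function split_by_indel() and the toy matrix are hypothetical. The rule it applies is our assumption about how vcfR decides: a variant counts as an indel if its REF allele or any comma-separated ALT allele is longer than one base. Check the vcfR documentation before relying on it.

```r
# Hypothetical stand-in for extract.indels(), working on a plain character
# matrix with "REF" and "ALT" columns (like the fix slot of a vcfR object).
split_by_indel <- function(fix, return.indels = FALSE) {
  # Assumed rule: a multi-base REF or any multi-base ALT allele marks a non-SNP
  ref_long <- nchar(fix[, "REF"]) > 1
  alt_long <- vapply(strsplit(fix[, "ALT"], split = ",", fixed = TRUE),
                     function(a) any(nchar(a) > 1), logical(1))
  is_indel <- ref_long | alt_long
  # drop = FALSE keeps a matrix even when only one row is selected
  if (return.indels) fix[is_indel, , drop = FALSE] else fix[!is_indel, , drop = FALSE]
}

fix <- cbind(CHROM = c("c1", "c1", "c2",  "c3"),
             POS   = c("10", "25", "7",   "42"),
             REF   = c("A",  "AT", "G",   "C"),
             ALT   = c("G",  "A",  "C,T", "CAG"))

snps   <- split_by_indel(fix, return.indels = FALSE)  # rows 1 and 3
indels <- split_by_indel(fix, return.indels = TRUE)   # rows 2 and 4
```

The two calls use complementary masks, so every row lands in exactly one of the two outputs. The multiallelic site "C,T" stays with the SNPs because each of its alleles is a single base.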

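```r
library(vcfR)

# Keep only SNPs and print a summary of the resulting vcfR object
Muricea.vcf_ID_SNPs <- extract.indels(Muricea.vcf_ID, return.indels = FALSE)
Muricea.vcf_ID_SNPs

# Keep only indels and print a summary
Muricea.vcf_ID_Indels <- extract.indels(Muricea.vcf_ID, return.indels = TRUE)
Muricea.vcf_ID_Indels
```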

Results (the first summary is the SNP subset, Muricea.vcf_ID_SNPs; the second is the indel subset, Muricea.vcf_ID_Indels):

```
***** Object of Class vcfR *****
108 samples
762 CHROMs
10,966 variants
Object size: 74.1 Mb
0 percent missing data
*****        *****         *****

***** Object of Class vcfR *****
108 samples
301 CHROMs
616 variants
Object size: 32.4 Mb
0 percent missing data
*****        *****         *****
```
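A quick sanity check is that the two subsets together account for every variant in the input. The sketch below runs this check on the example data shipped with vcfR rather than on Muricea.vcf_ID. It uses data(vcfR_example), which provides an object named vcf; substitute your own object. If the split is a clean partition, Muricea.vcf_ID above should hold 10,966 + 616 = 11,582 variants.

```r
library(vcfR)

# Example data bundled with vcfR (loads the objects `vcf`, `gff` and `dna`)
data(vcfR_example)

snps   <- extract.indels(vcf, return.indels = FALSE)
indels <- extract.indels(vcf, return.indels = TRUE)

# Number of variants = number of rows in the fixed (FIX) region
n_all    <- nrow(getFIX(vcf))
n_snps   <- nrow(getFIX(snps))
n_indels <- nrow(getFIX(indels))

cat("SNPs:", n_snps, " indels:", n_indels, " total:", n_all, "\n")
stopifnot(n_snps + n_indels == n_all)
```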

Example:

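Contrasting return.indels = FALSE with return.indels = TRUE (here on a generic vcfR object named vcf_ID):

```r
# SNPs only
vcf_ID_SNPs <- extract.indels(vcf_ID, return.indels = FALSE)
vcf_ID_SNPs           # summary of the vcfR object
head(vcf_ID_SNPs)     # first records
tail(vcf_ID_SNPs)
class(vcf_ID_SNPs)    # should be "vcfR"

# Indels only, stored under its own name so the SNP subset is not overwritten
vcf_ID_Indels <- extract.indels(vcf_ID, return.indels = TRUE)
vcf_ID_Indels
head(vcf_ID_Indels)
tail(vcf_ID_Indels)
class(vcf_ID_Indels)
```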