3. Extract SNPs and Indels - DianaCarolinaVergara/SNPs_pipeline GitHub Wiki
Extract SNPs
Variants contained in VCF data may include single nucleotide polymorphisms (SNPs) as well as indels or more complicated features. Some analyses, including ours, require that only SNPs are used (e.g., when a mutation model is used).
In these cases, it is useful to subset the data to only the SNPs. The function extract.indels() can be used for this: it quickly creates a vcfR object that should contain only SNPs.
The function extract.indels() separates indels from SNPs. When the parameter return.indels is FALSE, only SNPs are returned. When return.indels is TRUE, only indels are returned.
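To make the SNP/indel distinction concrete, here is a minimal base R sketch of the general idea: a record is treated as an indel when its REF allele or any of its ALT alleles is longer than one base. The data frame, is_indel() and subset_variants() are hypothetical names invented for this illustration; this is not necessarily how vcfR implements extract.indels() internally.

```r
# Toy stand-in for the REF and ALT columns of a VCF file (hypothetical data).
variants <- data.frame(
  REF = c("A", "G", "AT", "C", "T"),
  ALT = c("G", "GA", "A", "T,G", "C,CAA"),
  stringsAsFactors = FALSE
)

# Flag a record as an indel when REF or any ALT allele is longer than one base.
is_indel <- function(ref, alt) {
  longest_alt <- vapply(
    strsplit(alt, ",", fixed = TRUE),
    function(alleles) max(nchar(alleles)),
    integer(1)
  )
  nchar(ref) > 1 | longest_alt > 1
}

# Hypothetical helper that mimics the return.indels switch described above.
subset_variants <- function(df, return.indels = FALSE) {
  mask <- is_indel(df$REF, df$ALT)
  if (return.indels) df[mask, , drop = FALSE] else df[!mask, , drop = FALSE]
}

snps   <- subset_variants(variants, return.indels = FALSE)
indels <- subset_variants(variants, return.indels = TRUE)
print(snps)
print(indels)
```

Note that a multi-allelic site such as "T,G" stays with the SNPs under this rule because every allele is a single base, while "C,CAA" is flagged because one of its alternate alleles is longer.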
# Keep only SNPs (indels removed)
Muricea.vcf_ID_SNPs <- extract.indels(Muricea.vcf_ID, return.indels = FALSE)
Muricea.vcf_ID_SNPs
# Keep only indels
Muricea.vcf_ID_Indels <- extract.indels(Muricea.vcf_ID, return.indels = TRUE)
Muricea.vcf_ID_Indels
Results (SNP-only object first, then indel-only object)
***** Object of Class vcfR *****
108 samples
762 CHROMs
10,966 variants
Object size: 74.1 Mb
0 percent missing data
***** ***** *****
***** Object of Class vcfR *****
108 samples
301 CHROMs
616 variants
Object size: 32.4 Mb
0 percent missing data
***** ***** *****
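As a quick sanity check on the two summaries above, the counts can be combined. The sketch below assumes the SNP and indel subsets are complementary, so that their sum equals the number of variants in the unfiltered object; the original total is not reported here, so the figure is implied rather than observed.

```r
# Variant counts copied from the two printed summaries above.
n_snps   <- 10966
n_indels <- 616

# Assumption: the two subsets partition the original object, so their sum
# is the number of variants before filtering.
n_total    <- n_snps + n_indels
pct_indels <- round(100 * n_indels / n_total, 1)

cat("Implied total variants:", n_total, "\n")
cat("Percent of variants that are indels:", pct_indels, "\n")

# With the real objects loaded, the same check could be written as
# (the fix slot holds one row per variant):
# nrow(Muricea.vcf_ID@fix) ==
#   nrow(Muricea.vcf_ID_SNPs@fix) + nrow(Muricea.vcf_ID_Indels@fix)
```

If the comparison on the real objects does not hold, it is worth inspecting records with missing or unusual ALT values before continuing.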
Example: contrasting return.indels = FALSE and return.indels = TRUE
vcf_ID_SNPs <- extract.indels(vcf_ID,return.indels = FALSE)
vcf_ID_SNPs
head(vcf_ID_SNPs)
tail(vcf_ID_SNPs)
class(vcf_ID_SNPs)
# return.indels = TRUE returns indels, so store them under a matching name
vcf_ID_Indels <- extract.indels(vcf_ID, return.indels = TRUE)
vcf_ID_Indels
head(vcf_ID_Indels)
tail(vcf_ID_Indels)
class(vcf_ID_Indels)