2. Packages and Files Needed - DianaCarolinaVergara/SNPs_pipeline GitHub Wiki


Packages

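We use install.packages() to install the required CRAN packages, including some that are only loaded later on (ade4, reshape2, dplyr, magrittr, phangorn and RColorBrewer). The parallel and grDevices packages ship with base R and do not need to be installed. genepopedit is not on CRAN, so it is installed from GitHub with devtools once devtools itself is installed.

install.packages("genepop")
install.packages("poppr")
install.packages("treemap")
install.packages("devtools")
install.packages("vcfR")
install.packages("ape")
install.packages("pegas")
install.packages("adegenet")
install.packages("ade4")
install.packages("pcadapt")
install.packages("hierfstat")
install.packages("ggplot2")
install.packages("reshape2")
install.packages("dplyr")
install.packages("magrittr")
install.packages("seqinr")
install.packages("phytools")
install.packages("phylotools")
install.packages("phangorn")
install.packages("colorspace")
install.packages("colorRamps")
install.packages("RColorBrewer")
devtools::install_github("rystanley/genepopedit")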
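Running the install.packages() calls above reinstalls every package each time the script is run. A minimal base-R sketch of installing only the packages that are missing; missing_packages() and install_missing() are hypothetical helper names (not part of any package), and the CRAN mirror URL is an assumption you can change:

```r
# Hypothetical helpers, not part of any package.

# Return the elements of `pkgs` that cannot be found in any library on .libPaths().
# system.file(package = p) returns "" when package p is not installed.
missing_packages <- function(pkgs) {
  found <- vapply(pkgs, function(p) nzchar(system.file(package = p)), logical(1))
  pkgs[!found]
}

# Install only the missing packages; the default CRAN mirror is an assumption.
# Returns the names of the packages it tried to install, invisibly.
install_missing <- function(pkgs, repos = "https://cloud.r-project.org") {
  todo <- missing_packages(pkgs)
  if (length(todo) > 0) {
    install.packages(todo, repos = repos)
  }
  invisible(todo)
}
```

For example, install_missing(c("vcfR", "adegenet", "poppr")) installs only those of the three that are not yet present, so the install step can be re-run safely.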

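Packages that are not on CRAN are installed next: qvalue and SNPRelate come from Bioconductor, and OutFLANK comes from GitHub. OutFLANK depends on qvalue, so qvalue is installed first. The old biocLite() installer has been retired; BiocManager::install() replaces it.

install.packages("BiocManager")
BiocManager::install(c("qvalue", "SNPRelate"))
devtools::install_github("whitlock/OutFLANK")

We then load the packages with library(), which attaches an installed package to the current R session:

library(vcfR)
library(ggplot2)
library(reshape2)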

library("genepopedit")
library("devtools")
library("pcadapt")
library("qvalue")
library("OutFLANK")
library(genepop)
library(ade4)
library(ape)
library(adegenet)
library(hierfstat)
library(poppr)
library(pegas)
library(dplyr)
library(treemap)
library(magrittr)
library("SNPRelate")
library(RColorBrewer)
library("phangorn")
library("grDevices")
library("colorspace")
library(colorRamps)
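If one of these packages is not installed, library() stops the script with an error at that line. A minimal base-R sketch that instead tries to attach every package and reports which ones failed; load_packages() is a hypothetical helper, and it relies on require(), which returns FALSE rather than raising an error when a package is missing:

```r
# Hypothetical helper, not part of any package.
# Attach each package in `pkgs` and return a named logical vector of successes.
# character.only = TRUE lets require() take the package name as a string;
# the "there is no package called ..." warning for missing packages is suppressed.
load_packages <- function(pkgs) {
  vapply(pkgs, function(p) {
    suppressWarnings(suppressPackageStartupMessages(
      require(p, character.only = TRUE, quietly = TRUE)
    ))
  }, logical(1))
}
```

For example, loaded <- load_packages(c("vcfR", "adegenet", "poppr")) followed by names(loaded)[!loaded] lists the packages that still need to be installed.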

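Import the VCF file

read.vcfR() reads the file into a vcfR object, and typing the object's name prints a summary of its contents.

Muricea.vcf <- read.vcfR("muriceacombinedref.vcf")
Muricea.vcf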
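read.vcfR() separates the file into the three slots of a vcfR object: meta (the ## header lines), fix (the eight fixed columns, CHROM to INFO) and gt (the FORMAT column plus one column per sample). The base-R sketch below illustrates that split on a tiny made-up VCF; split_vcf() is a hypothetical helper and is far less robust than read.vcfR() (it does not handle compressed files or malformed lines):

```r
# A tiny made-up VCF (two variants, two samples), for illustration only.
vcf_text <- c(
  "##fileformat=VCFv4.2",
  "##contig=<ID=locus_1>",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
  "locus_1\t15\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t0/0",
  "locus_1\t42\t.\tC\tT\t.\tPASS\t.\tGT\t1/1\t./."
)

# Hypothetical helper: split VCF lines into meta, fix and gt parts.
split_vcf <- function(lines) {
  lines     <- lines[nzchar(lines)]                    # drop blank lines
  is_meta   <- startsWith(lines, "##")                 # meta-information lines
  is_header <- startsWith(lines, "#") & !is_meta       # the single #CHROM line
  header <- strsplit(sub("^#", "", lines[is_header]), "\t", fixed = TRUE)[[1]]
  body   <- do.call(rbind, strsplit(lines[!is_meta & !is_header], "\t", fixed = TRUE))
  colnames(body) <- header
  list(
    meta = lines[is_meta],
    fix  = body[, 1:8, drop = FALSE],     # CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO
    gt   = body[, -(1:8), drop = FALSE]   # FORMAT plus one column per sample
  )
}

parts <- split_vcf(vcf_text)
```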

Here is an example of the progress messages read.vcfR() prints while reading the file, followed by the summary of the resulting vcfR object:

Scanning file to determine attributes.
File attributes:
  meta lines: 301690
  header_line: 301691
  variant count: 11582
  column count: 117
Meta line 301690 read in.
All meta lines processed.
gt matrix initialized.
Character matrix gt created.
  Character matrix gt rows: 11582
  Character matrix gt cols: 117
  skip: 0
  nrows: 11582
  row_num: 0
Processed variant: 11582
All variants processed
***** Object of Class vcfR *****
108 samples
762 CHROMs
11,582 variants
Object size: 75.2 Mb
0 percent missing data
*****        *****         *****
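The numbers in this log are related: the header line comes right after the last meta line, and the column count equals the nine columns that precede the genotypes (the eight fixed columns plus FORMAT) plus one column per sample. A quick check in R, with the values copied from the log above:

```r
# Values copied from the read.vcfR() log above.
meta_lines   <- 301690
column_count <- 117
samples      <- 108

# The eight fixed VCF columns plus FORMAT come before the sample columns.
fixed_columns <- c("CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT")

header_line <- meta_lines + 1                        # 301691, the "#CHROM ..." line
n_samples   <- column_count - length(fixed_columns)  # 108, matching "108 samples"
```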

addID() populates the ID column of the VCF data by concatenating the chromosome, the position and, optionally, an index, joined by the separator given as the second argument (here "_"). In the VCF specification (VCFv4.2), the ID column holds a semicolon-separated list of unique identifiers (for example dbSNP rs numbers); identifiers may not contain whitespace, and "." marks a missing identifier.

Muricea.vcf_ID <- addID(Muricea.vcf, "_")
Muricea.vcf_ID
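A base-R sketch of the idea behind addID() on a toy table holding only the CHROM, POS and ID columns. fill_ids() is hypothetical and is not vcfR's implementation; it assumes that IDs already present are kept and that a missing ID is either NA or ".":

```r
# Made-up stand-in for the CHROM, POS and ID columns of the fix region.
fix <- data.frame(
  CHROM = c("locus_1", "locus_1", "locus_7"),
  POS   = c(15L, 42L, 8L),
  ID    = c(NA, "rs123", "."),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not vcfR's source): fill IDs that are NA or "." with
# CHROM and POS joined by `sep`, leaving existing IDs unchanged.
fill_ids <- function(fix, sep = "_") {
  no_id <- is.na(fix$ID) | fix$ID == "."
  fix$ID[no_id] <- paste(fix$CHROM[no_id], fix$POS[no_id], sep = sep)
  fix
}

fill_ids(fix)$ID   # "locus_1_15" "rs123" "locus_7_8"
```

Joining with "_" is ambiguous when the chromosome names themselves contain underscores (locus_1_15 can be split in more than one way), so a separator that never appears in the CHROM names makes the IDs easier to parse back.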

Here we see the first few lines or rows of each slot.

head() and tail() return the first or last parts of a vector, matrix, table, data frame or function. Because they are generic functions, packages can extend them to other classes; vcfR, for example, provides a head() method that prints the first rows of each slot of a vcfR object.

head(Muricea.vcf_ID)

tail(Muricea.vcf_ID)
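vcfR objects are S4 objects, but the dispatch idea is the same in S3: defining a function named head.<class> makes head() use it for objects of that class. A toy sketch with a made-up class, toyvcf, whose head() and tail() methods return the first or last rows of each of its tables:

```r
# Made-up S3 object with two tables, loosely imitating the fix and gt slots.
toy <- structure(
  list(
    fix = data.frame(CHROM = paste0("locus_", 1:10), POS = 1:10),
    gt  = matrix("0/1", nrow = 10, ncol = 2, dimnames = list(NULL, c("S1", "S2")))
  ),
  class = "toyvcf"
)

# head() and tail() are S3 generics (they call UseMethod()), so these methods
# are picked up for objects of class "toyvcf". Each returns a list with the
# first or last n rows of every table.
head.toyvcf <- function(x, n = 6L, ...) lapply(unclass(x), utils::head, n = n)
tail.toyvcf <- function(x, n = 6L, ...) lapply(unclass(x), utils::tail, n = n)

head(toy, n = 3)
tail(toy, n = 2)
```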

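The same functions work on any vcfR object. In the example below, Abip_vcf_novo_ID_SNPs and vcf_novo are not created on this page and must already exist as vcfR objects:

head(Abip_vcf_novo_ID_SNPs)
tail(Abip_vcf_novo_ID_SNPs)

Example_ID <- addID(vcf_novo, "_")
head(Example_ID)
tail(Example_ID)
class(Example_ID)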
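class() reports that these objects belong to the S4 class vcfR, whose data are stored in three slots, meta, fix and gt, which can be read with the @ operator (for example Muricea.vcf_ID@fix). A toy S4 class with the same slot names shows how class(), slotNames() and @ behave; toyVCF is a made-up class, not vcfR's actual definition:

```r
library(methods)

# Made-up S4 class with the same slot names as a vcfR object (not vcfR's definition).
setClass("toyVCF", slots = c(meta = "character", fix = "matrix", gt = "matrix"))

obj <- new("toyVCF",
  meta = "##fileformat=VCFv4.2",
  fix  = matrix(c("locus_1", "15"), nrow = 1, dimnames = list(NULL, c("CHROM", "POS"))),
  gt   = matrix(c("GT", "0/1"), nrow = 1, dimnames = list(NULL, c("FORMAT", "S1")))
)

class(obj)       # "toyVCF", printed with a "package" attribute
slotNames(obj)   # "meta" "fix" "gt"
obj@gt[, "S1"]   # genotype of sample S1
```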

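Printed summary after adding IDs (the counts are unchanged; the object size grows from 75.2 Mb to 76.1 Mb):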

***** Object of Class vcfR *****
108 samples
762 CHROMs
11,582 variants
Object size: 76.1 Mb
0 percent missing data
*****        *****         *****