Lab 06: DESeq2 - ryandkuster/EPP_575_RNA_25 GitHub Wiki
Before you begin, you should already have the required R packages installed. If not, please follow the install guide for the relevant packages here.
The R script (salmon_deseq.R) for this lab can be found at this link, and you might already have it downloaded from our previous exercises.
If you don't already have the GitHub data downloaded:
git clone https://github.com/ryandkuster/EPP_575_RNA.git
You'll need your count file from the featureCounts step. Use Open OnDemand or scp to copy it to your local device:
/lustre/isaac24/proj/UTK0386/analysis/<your_folder>/05_count/combined.counts.txt
or
/lustre/isaac24/proj/UTK0386/completed/05_counts/combined.counts.txt
scp <your_netid>@dtn2.isaac.utk.edu:/lustre/isaac24/proj/UTK0386/analysis/<your_folder>/05_count/combined.counts.txt .
If you don't have the file, you can grab it from the completed directory:
scp <your_netid>@dtn2.isaac.utk.edu:/lustre/isaac24/proj/UTK0386/completed/05_count/combined.counts.txt .
Wherever you copy the counts file, update the following line in the R script so that it points to that folder on your computer:
setwd("~/Downloads/05_counts/")
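Before editing the script, it can help to confirm that R can actually see the copied file. The following is a minimal sketch, assuming the file was copied to ~/Downloads/05_counts/ (substitute your own location). The names `counts_dir` and `counts_file` are made up for this example and do not appear in the lab script.

```r
# Assumption: the counts file was copied here; change this to your own folder.
counts_dir <- path.expand("~/Downloads/05_counts")
counts_file <- file.path(counts_dir, "combined.counts.txt")

if (file.exists(counts_file)) {
  # Only change the working directory once the file has been found.
  setwd(counts_dir)
  message("Working directory set to: ", getwd())
} else {
  message("Could not find ", counts_file, "; check the path before running the script.")
}
```

If the second message appears, fix the folder path first; otherwise later file reads in the script will fail with less helpful errors.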
The R script DESeq_STAR.R can be found in EPP_575_RNA/data/R_materials/DESeq_STAR.R, or you can copy the text below into a new R script in RStudio.
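The script reads a sample sheet (salmon_data.csv) and uses its `sample_id`, `quant_file`, `treatment`, `time`, and `replicate` columns. As a rough sketch of the layout it appears to expect, the example below builds a small stand-in table and checks that those columns are present. The column names and the treatment and time levels are taken from the script; the sample IDs, the `quant_file` paths, and the names `samples_example`, `required_cols`, and `missing_cols` are invented for illustration and will differ from the real file. The full lab script follows after this sketch.

```r
# Hypothetical sample sheet: column names come from the lab script,
# but the sample IDs and quant_file paths are invented for illustration.
samples_example <- data.frame(
  sample_id  = c("Col_0h_1", "Col_3h_1", "minusEGTA_0h_1", "minusEGTA_3h_1"),
  quant_file = c("Col_0h_1/quant.sf", "Col_3h_1/quant.sf",
                 "minusEGTA_0h_1/quant.sf", "minusEGTA_3h_1/quant.sf"),
  treatment  = c("Col", "Col", "minusEGTA", "minusEGTA"),
  time       = c("0h", "3h", "0h", "3h"),
  replicate  = c(1, 1, 1, 1)
)

# Columns the script refers to via samples$...
required_cols <- c("sample_id", "quant_file", "treatment", "time", "replicate")
missing_cols <- setdiff(required_cols, colnames(samples_example))

if (length(missing_cols) > 0) {
  stop("Sample sheet is missing: ", paste(missing_cols, collapse = ", "))
}
str(samples_example)
```

The same `setdiff()` check can be run on the real `samples` object right after it is read in.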
library(tidyverse)
library(tximport)
library(GenomicFeatures)
library(pheatmap)
library(DESeq2)
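# install DEGreport only if it is not already available
if (!requireNamespace("DEGreport", quietly = TRUE)) BiocManager::install("DEGreport")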
library(DEGreport)
# set the working directory
setwd("/Users/ryankuster/Documents/github/EPP_575_RNA_25/data/salmon_output")
# give the full path to the salmon results directory (here it ends in "salmon_output")
salmon_dir <- "/Users/ryankuster/Documents/github/EPP_575_RNA_25/data/salmon_output"
# load the gff3 file, then create a transcript database/dataframe for use with deseq
txdb <- makeTxDbFromGFF("/Users/ryankuster/Downloads/EPP_575_RNA_25/IGV/genomic_modified.gff")
keytypes(txdb)
k <- keys(txdb, keytype = "CDSNAME")
str(k)
txdf <- AnnotationDbi::select(txdb, k, "GENEID", "CDSNAME")
samples <- read_csv("/Users/ryankuster/Downloads/EPP_575_RNA_25/DESEQ/salmon_output/salmon_data.csv")
Qfiles <- file.path(salmon_dir, samples$quant_file)
# this step imports the count data from salmon
txi <- tximport(files = Qfiles, type = "salmon", txOut = TRUE)
head(txi$counts)
colnames(txi$counts) <- samples$sample_id
# convert fields to factors
samples$treatment <- factor(samples$treatment)
samples$time <- factor(samples$time)
samples$replicate <- factor(samples$replicate)
# now we convert the txi object into a deseq-formatted object
dds <- DESeqDataSetFromTximport(txi = txi, colData = samples, design = ~ treatment + time + treatment:time)
dds <- DESeq(dds, test="LRT", reduced = ~ treatment + time)
dds <- dds[which(mcols(dds)$fullBetaConv),]
res_LRT <- results(dds)
# plot dispersion
plotDispEsts(dds)
vsd <- vst(dds)
plotPCA(vsd, intgroup = c("time"))
plotPCA(vsd, intgroup = c("replicate"))
plotPCA(vsd, intgroup = c("treatment"))
################################################################################
# summarize results
res <- results(dds)
head(res)
summary(res)
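# extract specific contrasts with an alpha cutoff (the first item in each contrast is a column of the samples object)
# note: dds was fit with test = "LRT", so the p-values below still come from the LRT; only log2FoldChange reflects the contrast
# here are contrasts we can do: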
res_treatment <- results(dds, alpha = 0.05, contrast = c("treatment", "Col", "minusEGTA"))
res_time <- results(dds, alpha = 0.05, contrast = c("time", "3h", "0h"))
plotMA(res_treatment, ylim=c(-12,12))
plotMA(res_time, ylim=c(-12,12))
res_sig <- as.data.frame(res_treatment[ which(res_treatment$padj < 0.05),])
res_sig <- res_sig[order(res_sig$padj, -abs(res_sig$log2FoldChange)),]
write.csv(res_sig, file="salmon_featurecounts_sig_results.csv")
For any interesting gene patterns you find, look up more information in the TAIR database.
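The last steps of the script keep genes with `padj < 0.05` and then sort them. To see what that filter and ordering do without running the whole pipeline, here is a small sketch on a mock table. The gene names and numbers are made up, and `mock_res` and `mock_sig` are names used only in this example.

```r
# Mock results table; gene IDs and values are invented for illustration only.
mock_res <- data.frame(
  log2FoldChange = c(2.5, -0.4, -3.1, 1.2, 0.8),
  padj           = c(0.001, 0.20, 0.001, 0.03, NA),
  row.names      = c("geneA", "geneB", "geneC", "geneD", "geneE")
)

# which() drops rows where padj is NA, so they never reach the output.
mock_sig <- mock_res[which(mock_res$padj < 0.05), ]

# Sort by adjusted p-value, breaking ties by larger absolute fold change.
mock_sig <- mock_sig[order(mock_sig$padj, -abs(mock_sig$log2FoldChange)), ]
print(mock_sig)
```

Here geneC is listed before geneA because both share the same adjusted p-value and geneC has the larger absolute fold change; geneD follows, while geneB (not significant) and geneE (NA) are dropped.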