Lab 06: DESeq2 - ryandkuster/EPP_575_RNA_25 GitHub Wiki

Setup

Before you begin, make sure the required R packages are installed. If they are not, please follow the install guide for the relevant packages here.
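If you are not sure whether everything is installed, a quick check like the one below can help. This is only a sketch: the package list is copied from the `library()` calls in the script further down, and `missing_pkgs` is just an illustrative variable name.

```r
# Packages loaded by the lab script (taken from its library() calls).
required <- c("tidyverse", "tximport", "GenomicFeatures",
              "pheatmap", "DESeq2", "DEGreport")

# requireNamespace() returns FALSE (instead of stopping with an error)
# when a package is not installed.
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!installed]

if (length(missing_pkgs) == 0) {
  message("All required packages are installed.")
} else {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
}
```

Anything reported as missing should be installed by following the install guide before you run the script.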

The R script (salmon_deseq.R) for this lab can be found at this link, and you might already have it downloaded from our previous exercises.

If you don't already have the GitHub data downloaded, clone the repository:

git clone https://github.com/ryandkuster/EPP_575_RNA.git

You'll need your count file from the featureCounts step. Use Open OnDemand or scp to copy it to your local device:

/lustre/isaac24/proj/UTK0386/analysis/<your_folder>/05_count/combined.counts.txt

or

/lustre/isaac24/proj/UTK0386/completed/05_counts/combined.counts.txt

scp <your_netid>@dtn2.isaac.utk.edu:/lustre/isaac24/proj/UTK0386/analysis/<your_folder>/05_count/combined.counts.txt .

If you don't have the file, you can grab it from the completed directory:

scp <your_netid>@dtn2.isaac.utk.edu:/lustre/isaac24/proj/UTK0386/completed/05_count/combined.counts.txt .

Wherever you copy the counts file, update the following line in the R script so that it points to that folder on your computer:

setwd("~/Downloads/05_counts/")
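To confirm that the working directory really contains the counts file, you can try reading it. The sketch below assumes the file is unmodified featureCounts output (one `#` comment line, then a tab-separated table whose first six columns are Geneid, Chr, Start, End, Strand and Length). The helper `read_feature_counts` is hypothetical and is not part of the lab script.

```r
# Hypothetical helper (not part of the lab script): read a featureCounts table
# and return a gene-by-sample count matrix.
read_feature_counts <- function(path) {
  # skip the leading "#" comment line; keep column names exactly as written
  tab <- read.delim(path, comment.char = "#", check.names = FALSE)
  # assumed layout: the first six columns are annotation, the rest are samples
  counts <- as.matrix(tab[, -(1:6), drop = FALSE])
  rownames(counts) <- tab$Geneid
  counts
}

counts_file <- "combined.counts.txt"  # assumed to sit in the working directory
if (file.exists(counts_file)) {
  counts <- read_feature_counts(counts_file)
  print(dim(counts))      # number of genes, number of samples
  print(head(counts, 3))
} else {
  message("Not found: ", file.path(getwd(), counts_file))
}
```

If you see the "Not found" message, the path in `setwd()` does not match where you copied the file.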

The R script DESeq_STAR.R can be found at EPP_575_RNA/data/R_materials/DESeq_STAR.R, or you can copy the text below into a new R script in RStudio.

DESeq_STAR.R file contents

library(tidyverse)
library(tximport)
library(GenomicFeatures)
library(pheatmap)
library(DESeq2)


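# install DEGreport only if it is not already available
if (!requireNamespace("DEGreport", quietly = TRUE)) BiocManager::install("DEGreport")
library(DEGreport)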

# set the working directory
setwd("/Users/ryankuster/Documents/github/EPP_575_RNA_25/data/salmon_output")

# give the full path to the "salmon_results" (it should end in "salmon_results")
salmon_dir <- "/Users/ryankuster/Documents/github/EPP_575_RNA_25/data/salmon_output"

# load the gff3 file, then create a transcript database/dataframe for use with deseq
txdb <- makeTxDbFromGFF("/Users/ryankuster/Downloads/EPP_575_RNA_25/IGV/genomic_modified.gff")
keytypes(txdb)
k <- keys(txdb, keytype = "CDSNAME")
str(k)

txdf = AnnotationDbi::select(txdb, k, "GENEID", "CDSNAME")

samples <- read_csv("/Users/ryankuster/Downloads/EPP_575_RNA_25/DESEQ/salmon_output/salmon_data.csv")
Qfiles <- file.path(salmon_dir, samples$quant_file)

# this step imports the count data from salmon
txi <- tximport(files = Qfiles, type = "salmon", txOut = TRUE)
head(txi$counts)
colnames(txi$counts) <- samples$sample_id

# convert fields to factors
samples$treatment = factor(samples$treatment)
samples$time = factor(samples$time)
samples$replicate = factor(samples$replicate)

# now we convert the txi object into a deseq-formatted object
dds <- DESeqDataSetFromTximport(txi = txi, colData = samples, design = ~ treatment + time + treatment:time)
dds <- DESeq(dds, test="LRT", reduced = ~ treatment + time)
dds <- dds[which(mcols(dds)$fullBetaConv),]
res_LRT <- results(dds)

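# plot dispersion estimates
plotDispEsts(dds)

# apply a variance-stabilizing transformation, then plot a PCA colored by each sample variable
vsd <- vst(dds)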
plotPCA(vsd, intgroup = c("time"))
plotPCA(vsd, intgroup = c("replicate"))
plotPCA(vsd, intgroup = c("treatment"))

################################################################################
# summarize results
res <- results(dds)
head(res)
summary(res)

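# create contrasts with an alpha cutoff; each contrast is
# c(<factor in the samples object>, <numerator level>, <reference level>)
# note: dds was fit with test = "LRT", so the p-values still come from the LRT;
# only the log2 fold changes reflect the chosen contrast
# here are contrasts we can do: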

res_treatment <- results(dds, alpha = 0.05, contrast = c("treatment", "Col", "minusEGTA"))
res_time <- results(dds, alpha = 0.05, contrast = c("time", "3h", "0h"))

plotMA(res_treatment, ylim=c(-12,12))
plotMA(res_time, ylim=c(-12,12))

res_sig <- as.data.frame(res_treatment[ which(res_treatment$padj < 0.05),])
res_sig <- res_sig[order(res_sig$padj, -abs(res_sig$log2FoldChange)),]

write.csv(res_sig, file="salmon_featurecounts_sig_results.csv")
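To see what the final filtering and sorting steps do, here is a small sketch on a toy table. The gene names and values are made up for illustration and are not lab results. The `NA` entry is there because DESeq2 can report `padj` as `NA` for some genes, for example those removed by independent filtering.

```r
# Toy results table with made-up values, standing in for res_treatment.
toy <- data.frame(
  log2FoldChange = c(2.5, -4.0, 0.3, 1.1, -0.8),
  padj           = c(0.01, 0.01, 0.40, NA, 0.03),
  row.names      = c("geneA", "geneB", "geneC", "geneD", "geneE")
)

# which() keeps only rows where the test is TRUE, so rows with NA padj are
# dropped; plain logical indexing would return an all-NA row for geneD instead.
toy_sig <- toy[which(toy$padj < 0.05), ]

# sort by padj ascending, breaking ties by larger absolute fold change first
toy_sig <- toy_sig[order(toy_sig$padj, -abs(toy_sig$log2FoldChange)), ]
print(toy_sig)
```

geneB and geneA have the same adjusted p-value, so geneB is listed first because its fold change is larger in absolute value. geneE follows, and geneC and geneD are filtered out.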

Go further!

For any genes with interesting expression patterns, look up more information in the TAIR database, for example:

https://www.arabidopsis.org/locus?name=AT4G25480
