Lab: agriGO analysis - statonlab/EPP_575_RNASeq_Workshop GitHub Wiki

GO Enrichment: agriGO analysis

R script used in lab: Day_6_GO_enrichment_2022_final.R

This lab picks up immediately after Lab: Identify Differentially Expressed Genes, in the same RStudio session, with all your variables and DESeq2 results still loaded in your environment. The lab ends on the agriGO website, where we will perform the GO enrichment analysis.

Before we get started, make sure the annotation file Athaliana_447_Araport11.annotation_info.txt is copied into your current working directory. As when you downloaded your HTSeq results, we will use scp:

scp <your_username>@sphinx.ag.utk.edu:/pickett_shared/teaching/EPP575_Jan2022/reference_genome/Athaliana_447_Araport11.annotation_info.txt .

Again, make sure the target directory is the same directory in which you performed your DESeq2 analysis.
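A quick check from the R console can confirm that the file landed where R expects it. This is only a minimal sketch; anno_file and anno_present are illustrative variable names, not part of the lab script.

```r
# Illustrative variable name; the file name is the one used in this lab.
anno_file <- "Athaliana_447_Araport11.annotation_info.txt"

# Show the working directory R is currently using.
print(getwd())

# TRUE if the annotation file is in that directory, FALSE otherwise.
anno_present <- file.exists(anno_file)
if (!anno_present) {
  message("Annotation file not found in ", getwd(),
          "; re-run scp or change directory with setwd().")
}
```

If the message appears, either copy the file again or point R at the right directory before continuing.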

Install and load required packages

install.packages("tibble")
install.packages("dplyr")
library(tibble)
library(dplyr)

Read in and prepare annotation data

The annotation file contains multiple rows for many genes, so the same value appears more than once in the locusName column. We need to get rid of the extra rows before merging the annotation with our differential expression results. First we read the file into anno and view the first lines to check that it was read in correctly:

anno <- read.csv("Athaliana_447_Araport11.annotation_info.txt", header = TRUE, sep = "\t", stringsAsFactors = FALSE)
head(anno)

Then we keep only the first row for each unique gene in locusName:

anno <- anno[!duplicated(anno$locusName), ]
head(anno)
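To see what the de-duplication step does, here is a small sketch on a toy table. The rows and the toy_ names are made up for illustration and are not taken from the real annotation file.

```r
# Toy annotation table (made-up rows): one locus appears twice.
toy_anno <- data.frame(
  transcriptName = c("AT1G01010.1", "AT1G01020.1", "AT1G01020.2"),
  locusName      = c("AT1G01010", "AT1G01020", "AT1G01020"),
  stringsAsFactors = FALSE
)

# duplicated() flags the second and later occurrences of each value.
print(duplicated(toy_anno$locusName))

# Negating it keeps only the first row seen for each locus.
toy_dedup <- toy_anno[!duplicated(toy_anno$locusName), ]
print(toy_dedup)
```

Note that this keeps whichever row happens to come first for each locus; it does not pick a "best" transcript.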

Format MaxRes_sig data frame prior to merging tables

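First, let's move the row names (the gene identifiers) into a proper first column named "VALUE". The as.data.frame() call is a safeguard in case MaxRes_sig is not already a plain data frame:

test_MaxRes_sig <- tibble::rownames_to_column(as.data.frame(MaxRes_sig), "VALUE")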

Next, let's remove the suffix ".Araport11.447" from the entries in the "VALUE" column, so that anno$locusName and test_MaxRes_sig$VALUE use the same naming convention before we merge the tables. We use base R's sub() with fixed = TRUE so that the dots in the suffix are matched literally rather than as regular-expression wildcards:

test_MaxRes_sig <- test_MaxRes_sig %>% mutate(VALUE = sub(".Araport11.447", "", VALUE, fixed = TRUE))
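The effect of this step can be checked on a toy vector. The sketch below uses base R's sub() with fixed = TRUE, so the pattern is treated as a literal string; the IDs and the toy_ names are illustrative only.

```r
# Toy gene IDs carrying the suffix that needs to be removed.
toy_ids <- c("AT1G01010.Araport11.447", "AT1G01020.Araport11.447")

# fixed = TRUE matches the pattern as a literal string, not as a regex.
toy_clean <- sub(".Araport11.447", "", toy_ids, fixed = TRUE)
print(toy_clean)

# After cleaning, the IDs should match the locus-name style exactly.
print(all(toy_clean %in% c("AT1G01010", "AT1G01020")))
```

On your real data, a similar check is to confirm that most values in test_MaxRes_sig$VALUE appear in anno$locusName.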

Merge tables

We are ready to merge the tables. We make sure our results are a data.frame object and name the column in each table that holds the gene identifier to merge on (by.x and by.y). Setting all.x = TRUE keeps every row from our results whether or not it has a match in the annotation, and all.y = FALSE drops annotation rows that have no match in our results.

myRes.anno <- merge(as.data.frame(test_MaxRes_sig), anno, by.x = "VALUE", by.y = "locusName", all.x = TRUE, all.y = FALSE)
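The following sketch shows how all.x and all.y behave on two toy tables. The gene names, values, and toy_ variable names are made up for illustration.

```r
# Toy results table: geneC has no entry in the toy annotation.
toy_res <- data.frame(
  VALUE = c("geneA", "geneB", "geneC"),
  log2FoldChange = c(2.1, -1.4, 3.0),
  stringsAsFactors = FALSE
)

# Toy annotation table: geneD is not in the toy results.
toy_anno <- data.frame(
  locusName = c("geneA", "geneB", "geneD"),
  GO = c("GO:0003677", "", "GO:0005634"),
  stringsAsFactors = FALSE
)

# Same arguments as in the lab: keep all result rows, drop unmatched annotation rows.
toy_merged <- merge(toy_res, toy_anno, by.x = "VALUE", by.y = "locusName",
                    all.x = TRUE, all.y = FALSE)
print(toy_merged)

# geneC is kept with NA in the GO column; geneD does not appear at all.
print(sum(is.na(toy_merged$GO)))
```

Counting the NA values in myRes.anno$GO in the same way tells you how many of your significant genes had no match in the annotation.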

Write results to text files to use for agriGO analysis:

To write annotated results to table:

write.table(myRes.anno, file = "myRes_sig.txt", sep = "\t", row.names = FALSE)

To write out the reference annotation file with one gene and one GO term per line:

s <- strsplit(anno$GO, split = ",")
anno.go <- data.frame(locus = rep(anno$locusName, lengths(s)), GO = unlist(s))
write.table(anno.go, file = "go_ref_anno.txt", sep = " ", quote = FALSE, row.names = FALSE, col.names = FALSE)
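The split-and-repeat pattern above can be traced on a toy table. This is a sketch with made-up genes and GO assignments; the toy_ names are illustrative.

```r
# Toy annotation: geneA has two GO terms, geneB has none, geneC has one.
toy_anno <- data.frame(
  locusName = c("geneA", "geneB", "geneC"),
  GO = c("GO:0003677,GO:0005634", "", "GO:0006355"),
  stringsAsFactors = FALSE
)

# Split each comma-separated GO string into a vector of single terms.
toy_s <- strsplit(toy_anno$GO, split = ",")

# Number of GO terms per gene; an empty string yields zero terms.
print(lengths(toy_s))

# Repeat each locus once per term so the two columns line up.
toy_go <- data.frame(locus = rep(toy_anno$locusName, lengths(toy_s)),
                     GO = unlist(toy_s))
print(toy_go)
```

A gene with an empty GO field contributes no rows, so genes without GO terms drop out of the output on their own.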

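To write out our own GO results with one gene and one GO term per line. Genes with no match in the annotation have NA in the GO column after the merge, so we drop those rows before writing:

s <- strsplit(myRes.anno$GO, split = ",")
myRes.go <- data.frame(locus = rep(myRes.anno$VALUE, lengths(s)), GO = unlist(s))
myRes.go <- myRes.go[!is.na(myRes.go$GO), ]
write.table(myRes.go, file = "go_myRes.txt", sep = " ", quote = FALSE, row.names = FALSE, col.names = FALSE)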
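Before pasting anything into agriGO, it can help to look at what these write.table() settings produce. The sketch below writes a toy table to a temporary file and reads it back; the rows and toy_ names are made up, and the pattern check is only an assumed sanity test, not an official agriGO validator.

```r
# Toy gene-to-GO table in the same two-column layout as above.
toy_go <- data.frame(
  locus = c("geneA", "geneA", "geneC"),
  GO = c("GO:0003677", "GO:0005634", "GO:0006355"),
  stringsAsFactors = FALSE
)

# Write with the same settings as the lab, but to a temporary file.
toy_file <- tempfile(fileext = ".txt")
write.table(toy_go, file = toy_file, sep = " ", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Read the raw lines back: one "locus GO:xxxxxxx" pair per line.
toy_lines <- readLines(toy_file)
print(toy_lines)

# Check that every line is an ID, one space, then a GO accession.
toy_ok <- all(grepl("^[^ ]+ GO:[0-9]+$", toy_lines))
print(toy_ok)

# Clean up the temporary file.
unlink(toy_file)
```

For your real output, readLines("go_myRes.txt", n = 5) shows the first few lines in the same way.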

agriGO analysis

  1. Navigate to agriGO to perform the analysis.
  2. Under 1. Select analysis tool:, ensure Singular Enrichment Analysis (SEA) is selected.
  3. Under 2. Select the species:, select Customized annotation and paste the content of go_myRes.txt into the box provided.
  4. Under 3. Select reference:, select Customized annotated reference and upload the reference annotation file (go_ref_anno.txt).
  5. Finally, click submit and explore the results.

Assignment: How many significant GO terms are reported in the agriGO analysis? Email your response to the workshop instructors.