Assignment 3 Journal - bcb420-2023/Angela_Uzelac GitHub Wiki
Assignment 3 - Data set Pathway and Network Analysis
Objective
- document progress while working on Assignment 3
Duration
Day 1: 2 hrs
Day 2: 5 hrs
Day 3: 4 hrs
Day 4: 4 hrs
Day 5: 10 hrs
Procedure
Preparation
- re-ran my Assignment 2 code and wrote the qlf_output_hits_withgn and normalized_count_data tables to separate tsv files
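A minimal sketch of the export step, assuming both objects are ordinary data frames. The toy values, the column names (hgnc_symbol, logFC, PValue, sample_1, sample_2) and the output folder are illustrative assumptions, not the real Assignment 2 output.

```r
# Toy stand-ins for the Assignment 2 objects (values are illustrative only).
qlf_output_hits_withgn <- data.frame(
  hgnc_symbol = c("GENE1", "GENE2", "GENE3"),  # hypothetical column name
  logFC       = c(1.2, -0.8, 0.1),
  PValue      = c(0.001, 0.02, 0.9),
  stringsAsFactors = FALSE
)
normalized_count_data <- data.frame(
  hgnc_symbol = c("GENE1", "GENE2", "GENE3"),
  sample_1    = c(10.5, 3.2, 7.7),             # hypothetical sample columns
  sample_2    = c(11.0, 2.9, 7.5),
  stringsAsFactors = FALSE
)

out_dir     <- tempdir()  # assumption: replace with the project folder
hits_path   <- file.path(out_dir, "qlf_output_hits_withgn.tsv")
counts_path <- file.path(out_dir, "normalized_count_data.tsv")

# sep = "\t" gives a tab-separated file; quote = FALSE keeps quotation marks out
write.table(qlf_output_hits_withgn, hits_path, sep = "\t",
            quote = FALSE, row.names = FALSE)
write.table(normalized_count_data, counts_path, sep = "\t",
            quote = FALSE, row.names = FALSE)

# read one file back to confirm that it round-trips
hits_check <- read.delim(hits_path, stringsAsFactors = FALSE)
```

Writing with quote = FALSE also avoids having to strip quotation marks from the file later, before importing it into other tools.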
Non-thresholded gene set enrichment analysis
Retrieving Gene sets
- referred to my GSEA journal for procedure on how to perform pre-ranked analysis
- ran the following code to get the gmt file
- this downloads the gene set file from the current release of the Bader Lab gene set collection: GO biological process plus all pathways, excluding terms inferred from electronic annotation (IEA)
# setwd("C:/Users/angel/BCB420")
# install RCurl only if it is missing
if (!requireNamespace("RCurl", quietly = TRUE)) {
  install.packages("RCurl")
}
library("RCurl")
gmt_url <- "http://download.baderlab.org/EM_Genesets/current_release/Human/symbol/"
# list all the files on the server
filenames <- getURL(gmt_url)
tc <- textConnection(filenames)
contents <- readLines(tc)
close(tc)
# find the gmt that has GO biological process and all pathways but excludes
# terms inferred from electronic annotation (IEA)
rx <- gregexpr("(?<=<a href=\")(.*GOBP_AllPathways_no_GO_iea.*)(\\.gmt)(?=\">)", contents, perl = TRUE)
gmt_file <- unlist(regmatches(contents, rx))
# stop early if the listing did not contain exactly one matching file
stopifnot(length(gmt_file) == 1)
# download the gmt into the working directory
dest_gmt_file <- file.path(getwd(), gmt_file)
download.file(paste0(gmt_url, gmt_file), destfile = dest_gmt_file)
- this is saved on my home machine at C:\Users\angel\BCB420\Human_GOBP_AllPathways_no_GO_iea_April_02_2023_symbol.gmt, and at the equivalent path in the projects folder on the docker container
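A quick sanity check on a downloaded gmt file might look like the sketch below. It assumes the usual GMT layout (set name, description, then gene symbols, all tab-separated) and uses a tiny made-up file in place of the real download; the set and gene names are hypothetical.

```r
# Write a tiny GMT-style file: name <TAB> description <TAB> gene1 <TAB> gene2 ...
# (toy content; point gmt_path at the downloaded file instead)
gmt_path <- tempfile(fileext = ".gmt")
writeLines(c(
  "SET_A\tfirst toy set\tGENE1\tGENE2\tGENE3",
  "SET_B\tsecond toy set\tGENE2\tGENE4"
), gmt_path)

# split every line on tabs, then drop the name and description fields
fields    <- strsplit(readLines(gmt_path), "\t", fixed = TRUE)
gene_sets <- lapply(fields, function(x) x[-(1:2)])
names(gene_sets) <- vapply(fields, function(x) x[1], character(1))

# number of genes per set, useful when choosing min/max set size in GSEA
set_sizes <- lengths(gene_sets)
print(set_sizes)
```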
Creating ranked gene list file
- read file qlf_output_hits_withgn.tsv into a variable
- calculated rank for each gene then ordered the genes by rank
- wrote ranked gene list to a file ranked_genelist.rnk
- note: the file can be tab-delimited, but it must end in .rnk so that GSEA treats it as a ranked list
- used htmltools::includeHTML() to display the summary of results from GSEA
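The ranking steps above can be sketched as follows. The journal does not record the rank formula, so this assumes the commonly used score -log10(p-value) multiplied by the sign of the log fold change. The column names (hgnc_symbol, logFC, PValue) and the toy values are also assumptions.

```r
# Toy stand-in for the table read from qlf_output_hits_withgn.tsv
hits <- data.frame(
  hgnc_symbol = c("GENE1", "GENE2", "GENE3", "GENE4"),  # hypothetical genes
  logFC       = c(2.0, -1.5, 0.3, -0.1),
  PValue      = c(1e-4, 1e-3, 0.5, 0.1),
  stringsAsFactors = FALSE
)

# assumed rank formula: significance signed by direction of change
hits$rank <- -log10(hits$PValue) * sign(hits$logFC)

# most upregulated genes first, most downregulated genes last
ranked <- hits[order(hits$rank, decreasing = TRUE), c("hgnc_symbol", "rank")]

# two tab-separated columns (gene, rank), no quotes, .rnk extension;
# written without a header row here, which is a choice made for this sketch
rnk_path <- file.path(tempdir(), "ranked_genelist.rnk")
write.table(ranked, rnk_path, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

With this score, strongly significant upregulated genes land at the top of the list and strongly significant downregulated genes at the bottom, which is what a pre-ranked analysis expects.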
Run GSEA
- input the gmt file and the rank file into GSEA
- set all parameters then ran GSEA
- opened the results file in HTML
- compared results between upregulated in na_pos and upregulated in na_neg
- double check: what is na_pos and what is na_neg??
- embedded summary of results into report
- went into the detailed results for both phenotypes and compared them to the results from g:Profiler; the results were quite different
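For comparing the two phenotypes outside the HTML report, one option is to read the per-phenotype report tables into R. This is only a sketch: it assumes the report is a tab-separated table with NAME, SIZE, NES and "FDR q-val" columns, uses a made-up file in place of a real GSEA report, and the 0.05 cutoff is illustrative.

```r
# Toy stand-in for one GSEA report table (for example the na_pos report)
report_path <- tempfile(fileext = ".tsv")
writeLines(c(
  "NAME\tSIZE\tNES\tFDR q-val",
  "TOY_SET_A\t25\t2.10\t0.001",
  "TOY_SET_B\t40\t1.45\t0.200",
  "TOY_SET_C\t18\t1.90\t0.030"
), report_path)

# check.names = FALSE keeps the column name "FDR q-val" intact
report <- read.delim(report_path, check.names = FALSE, stringsAsFactors = FALSE)

# keep gene sets below the (illustrative) FDR cutoff, strongest NES first
significant <- report[report[["FDR q-val"]] < 0.05, ]
significant <- significant[order(abs(significant$NES), decreasing = TRUE), ]
print(significant[, c("NAME", "NES", "FDR q-val")])
```

Running the same steps on the na_neg report gives two short tables that are easier to compare side by side with the g:Profiler output.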
Visualize in Cytoscape
- followed instructions in the EnrichmentMap pipeline resource
- Apps > App Manager > install EnrichmentMap Pipeline, then click Apps > EnrichmentMap
- loaded the entire folder with GSEA results
- this automatically filled in the required fields
- report for na_pos goes into the Enrichments Pos field, and the report for na_neg into the Enrichments Neg field
- GMT file is the ORIGINAL gmt file, not the filtered one
- analysis type: GSEA
- input ranks file: ranked_gene_list_na_pos_versus_na_neg...
- q value cutoff 0.05, check the field "filter genes by expressions", then click Build
- this is the standard threshold
- to check the number of nodes and edges: open the node table and edge table at the bottom of Cytoscape, click export, export to csv, then open the csv in Excel and count the rows (minus 1, because the first row is the heading)
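The node and edge counts can also be read from the exported csv files in R instead of Excel. A small sketch, using made-up files in place of the real Cytoscape exports (the column names here are hypothetical):

```r
# Toy stand-ins for the exported node and edge tables
node_csv <- tempfile(fileext = ".csv")
edge_csv <- tempfile(fileext = ".csv")
writeLines(c("name,NES", "SET_A,2.1", "SET_B,-1.8", "SET_C,1.6"), node_csv)
writeLines(c("name,similarity_coefficient", "SET_A (overlap) SET_B,0.4"), edge_csv)

# read.csv treats the first row as the header by default, so nrow() already
# excludes it and no manual "minus 1" is needed
n_nodes <- nrow(read.csv(node_csv))
n_edges <- nrow(read.csv(edge_csv))
cat("nodes:", n_nodes, "edges:", n_edges, "\n")
```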
Auto Annotate
- Apps > Auto Annotate > New Annotation Set > Create Annotations
- to change the names of the themes: Apps > Word Cloud > Show Word Cloud
- increase the normalization factor to get rid of words like pathway or regulation
- manually exclude words
- manually change the theme names
Publication Ready Figure
- could follow protocol in EnrichmentMap Pipeline at the end of the Navigating and interpreting the enrichment map slide
- in AutoAnnotate tab just clicked "Publication Ready"
- downloaded the svg of the legend example, edited it in PowerPoint, then added it manually to the publication-ready figure
Collapse into themes
- Edit > Preferences > Group Preferences and select "Enable attribute aggregation"
- in the AutoAnnotate menu: Collapse All
- View > Show Tool Panel > drag the scale slider to the left to pull the nodes closer together
- manually moved them around to put similar themes together
Results
GSEA
- Some of the top terms for genes that are upregulated in schizophrenia were "TYROBP CAUSAL NETWORK IN MICROGLIA", "RHO GTPASES ACTIVATE WASPS AND WAVES", and "REGULATION OF PHAGOCYTOSIS".
- The top terms for genes that are downregulated in disease were "COLLAGEN CHAIN TRIMERIZATION", "ASSEMBLY OF COLLAGEN FIBRILS AND OTHER MULTIMERIC STRUCTURES", and "SYNAPTIC_VESICLE_TRAFFICKING".
- not really similar to the g:Profiler results, but also not a straightforward comparison
Visualizing GSEA results
- only about 27 nodes, is this enough? heatmap not really showing up, don't have columns of the samples, what is wrong?
- only 1 red (upregulated), rest are blue
- changing the annotations on wordcloud is not working
- followed protocol and my map does not look like the one in the slides
- heatmap not showing up
- remember to remove quotes from the txt file because Cytoscape sometimes doesn't recognize quoted values
- April 17: fixed all problems above.
- Number of nodes in the enrichment map: 67
- Number of edges in the enrichment map: 89
- approx half blue half red. not too interconnected and not too many nodes
Pathway Analysis
- Apps > install WikiPathways
- Import > import from database > choose WikiPathways > type TYROBP causal network in microglia > choose Homo sapiens > import as pathway
- Import > import table from file > qlf output hits file that has logFC and p value
- remember to remove quotes from the txt file because Cytoscape sometimes doesn't recognize quoted values
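Stripping the quotes can be done in R before importing the table. A minimal sketch, assuming a tab-separated text file in which no real value contains a quotation mark; the file contents and column names below are made up.

```r
# Toy stand-in for a quoted, tab-separated hits file
quoted_path <- tempfile(fileext = ".txt")
writeLines(c(
  '"hgnc_symbol"\t"logFC"\t"PValue"',
  '"GENE1"\t1.2\t0.001',
  '"GENE2"\t-0.8\t0.02'
), quoted_path)

# remove every double quote, then write the cleaned lines to a new file
clean_lines <- gsub('"', "", readLines(quoted_path), fixed = TRUE)
clean_path  <- tempfile(fileext = ".txt")
writeLines(clean_lines, clean_path)

# read the cleaned file back to confirm the columns are intact
cleaned <- read.delim(clean_path, stringsAsFactors = FALSE)
```

Writing to a new file rather than overwriting keeps the original table available in case a value did legitimately contain a quote.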
Conclusion and Outlook
- results are broadly consistent with those from Assignment 2
- results also agree with the original paper, which likewise discussed synaptic vesicle trafficking and dopamine synthesis regulation
- dysregulation of these two processes has been shown to lead to psychotic symptoms