Assignment 3 Journal - bcb420-2023/Angela_Uzelac GitHub Wiki

Assignment 3 - Data set Pathway and Network Analysis

Objective

Duration

Day 1: 2 hrs

Day 2: 5 hrs

Day 3: 4 hrs

Day 4: 4 hrs

Day 5: 10 hrs

Procedure

Preparation

  • re-ran my Assignment 2 code and wrote the qlf_output_hits_withgn table and the normalized_count_data table each to its own tsv file
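
A minimal sketch of this export step, assuming both objects are ordinary data frames. The toy values, the column names, and the helper name write_tsv_table are made up for illustration; the real tables come from the Assignment 2 code. Setting quote = FALSE keeps quotation marks out of the file, which some downstream tools do not parse well.

```r
# Toy stand-ins for the Assignment 2 objects (hypothetical values and columns)
qlf_output_hits_withgn <- data.frame(
  hgnc_symbol = c("GENE1", "GENE2"),
  logFC = c(1.2, -0.8),
  PValue = c(0.001, 0.04),
  stringsAsFactors = FALSE
)
normalized_count_data <- data.frame(
  hgnc_symbol = c("GENE1", "GENE2"),
  sample_1 = c(10.5, 3.2),
  sample_2 = c(11.1, 2.9),
  stringsAsFactors = FALSE
)

# Hypothetical helper: write a data frame as a tab-separated file
# without quotes and without row names, and return the path
write_tsv_table <- function(df, path) {
  write.table(df, file = path, sep = "\t", quote = FALSE, row.names = FALSE)
  invisible(path)
}

# tempdir() is used here only so the sketch runs anywhere
out_dir <- tempdir()
hits_path <- write_tsv_table(
  qlf_output_hits_withgn,
  file.path(out_dir, "qlf_output_hits_withgn.tsv")
)
counts_path <- write_tsv_table(
  normalized_count_data,
  file.path(out_dir, "normalized_count_data.tsv")
)
```

The files can be read back later with read.delim(hits_path).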

Non-thresholded gene set enrichment analysis

Retrieving Gene sets

  • referred to my GSEA journal for procedure on how to perform pre-ranked analysis
  • ran the following code to get the gmt file
  • this gets the gene sets from the current release of the baderlab geneset collection: GO biological process, all pathways, no IEA
# setwd("C:/Users/angel/BCB420")

# install RCurl only if it is not already available
if (!requireNamespace("RCurl", quietly = TRUE)) {
  install.packages("RCurl")
}
library("RCurl")

gmt_url <- "http://download.baderlab.org/EM_Genesets/current_release/Human/symbol/"
# list all the files on the server
filenames <- getURL(gmt_url)
tc <- textConnection(filenames)
contents <- readLines(tc)
close(tc)
# get the gmt that has GO biological process and all pathways and does not
# include terms inferred from electronic annotations (IEA)
rx <- gregexpr("(?<=<a href=\")(.*GOBP_AllPathways_no_GO_iea.*)(\\.gmt)(?=\">)", contents, perl = TRUE)
gmt_file <- unlist(regmatches(contents, rx))
# stop early if the listing did not yield exactly one matching file
stopifnot(length(gmt_file) == 1)
# download the gmt file into the current working directory
dest_gmt_file <- file.path(getwd(), gmt_file)
download.file(paste0(gmt_url, gmt_file), destfile = dest_gmt_file)
  • this is saved on my home machine at C:\Users\angel\BCB420\Human_GOBP_AllPathways_no_GO_iea_April_02_2023_symbol.gmt, and equivalently in the projects folder on the docker container

Creating ranked gene list file

  • read file qlf_output_hits_withgn.tsv into variable
  • calculated rank for each gene then ordered the genes by rank
  • wrote ranked gene list to a file ranked_genelist.rnk
    • note: the file can be tab-delimited, but the extension must be .rnk so that GSEA treats it as a ranked list
  • note: htmltools::includeHTML() can be used to display the summary of results from GSEA in the report
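
A sketch of how the ranked list could be built. The journal does not record the exact rank formula, so the commonly used signed score -log10(p-value) * sign(logFC) is an assumption here, as are the column names hgnc_symbol, logFC and PValue and the toy values.

```r
# Toy stand-in for the table read from qlf_output_hits_withgn.tsv,
# e.g. hits <- read.delim("qlf_output_hits_withgn.tsv")
# (column names and values are hypothetical)
hits <- data.frame(
  hgnc_symbol = c("GENE1", "GENE2", "GENE3"),
  logFC = c(2.0, -1.5, 0.3),
  PValue = c(0.001, 0.0001, 0.5),
  stringsAsFactors = FALSE
)

# Assumed rank formula: significance signed by direction of change
hits$rank <- -log10(hits$PValue) * sign(hits$logFC)

# Order from most upregulated to most downregulated, keep gene and rank
ranked <- hits[order(hits$rank, decreasing = TRUE), c("hgnc_symbol", "rank")]

# Tab-delimited, no quotes, no row names, with the .rnk extension
rnk_path <- file.path(tempdir(), "ranked_genelist.rnk")
write.table(ranked, file = rnk_path, sep = "\t", quote = FALSE, row.names = FALSE)
```

With this score, strongly upregulated genes end up at the top of the list and strongly downregulated genes at the bottom, which is the layout a pre-ranked analysis works from.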

Run GSEA

  • input the gmt file and the rank file into GSEA
  • set all parameters then ran GSEA
  • opened the results file in HTML
  • compared results between upregulated in na_pos and upregulated in na_neg
    • double check: what is na_pos and what is na_neg??
  • embedded summary of results into report
  • went into the detailed results for both phenotypes and compared them to the results from g:Profiler; the two are pretty different
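
The comparison of top terms can also be done in R instead of by reading the HTML report. This is a sketch only: it assumes the GSEA report table has been loaded into a data frame with columns NAME, NES and FDR.q.val (the names read.delim would produce from a report with "FDR q-val"), and the helper top_terms, the placeholder term names and the values are all made up.

```r
# Toy stand-in for one GSEA report table (hypothetical terms and values)
report <- data.frame(
  NAME = c("TERM_A", "TERM_B", "TERM_C", "TERM_D"),
  NES = c(2.1, 1.4, 2.6, 1.9),
  FDR.q.val = c(0.01, 0.20, 0.001, 0.04),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep terms under the q-value cutoff and
# return the n terms with the largest absolute enrichment score
top_terms <- function(report, q_cutoff = 0.05, n = 3) {
  keep <- report[report$FDR.q.val < q_cutoff, ]
  keep <- keep[order(abs(keep$NES), decreasing = TRUE), ]
  head(keep$NAME, n)
}

top_pos <- top_terms(report)
```

Running the same helper on the na_pos and na_neg tables gives two short lists that are easy to set side by side with the g:Profiler terms.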

Visualize in cytoscape

  • followed instructions in the enrichment map pipeline resource

  • Apps > App Manager > install enrichment map pipeline, then click Apps > Enrichment Map

  • loaded the entire folder with GSEA results

  • this automatically input the required files into required fields

  • the report for na_pos goes into the Enrichments Pos field and the report for na_neg goes into the Enrichments Neg field

  • the GMT file is the ORIGINAL gmt file, not the filtered one

  • analysis type: GSEA

  • input ranks file: ranked_gene_list_na_pos_versus_na_neg...

  • set the q-value cutoff to 0.05, check the field "filter genes by expressions", then click Build

    • this is the standard threshold
  • to check the number of nodes and edges: open the node table and edge table at the bottom of the Cytoscape window, click export and export to csv, then open the csv in Excel and count the rows (minus 1, because the first row is the header)
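
The row count can also be taken in R instead of Excel. A small sketch, assuming the node and edge tables were exported as csv files with a header row; the file names and the toy contents are made up so the example runs on its own.

```r
# Toy stand-ins for the exported Cytoscape tables (hypothetical contents)
node_csv <- file.path(tempdir(), "node_table.csv")
edge_csv <- file.path(tempdir(), "edge_table.csv")
write.csv(data.frame(name = c("A", "B", "C")), node_csv, row.names = FALSE)
write.csv(
  data.frame(source = c("A", "B"), target = c("B", "C")),
  edge_csv,
  row.names = FALSE
)

# read.csv treats the first line as the header, so nrow() already
# excludes it and no manual "minus 1" is needed
n_nodes <- nrow(read.csv(node_csv))
n_edges <- nrow(read.csv(edge_csv))
```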

Auto Annotate

  • Apps > Auto Annotate > New Annotation Set > Create Annotations
  • to change the names of the themes: Apps > Word Cloud > Show Word Cloud
  • increase the normalization factor to get rid of words like pathway or regulation
  • manually exclude words
  • manually change the theme names

Publication Ready Figure

  • could follow protocol in EnrichmentMap Pipeline at the end of the Navigating and interpreting the enrichment map slide
  • in AutoAnnotate tab just clicked "Publication Ready"
  • downloaded svg of legend example then edited in powerpoint then added it manually to the publication-ready figure

Collapse into themes

  • Edit > Preferences > Group Preferences and select "Enable attribute aggregation"
  • in the menu of AutoAnnotate: Collapse All
  • View > Show Tool Panel > slide the scale slider to the left to pull the nodes closer together
  • manually moved the nodes around to put similar themes together

Results

GSEA

  • Some of the top terms for genes that are upregulated in schizophrenia were "TYROBP CAUSAL NETWORK IN MICROGLIA", "RHO GTPASES ACTIVATE WASPS AND WAVES", and "REGULATION OF PHAGOCYTOSIS".
  • The top terms for genes that are downregulated in disease were "COLLAGEN CHAIN TRIMERIZATION", "ASSEMBLY OF COLLAGEN FIBRILS AND OTHER MULTIMERIC STRUCTURES", and "SYNAPTIC_VESICLE_TRAFFICKING".
  • not really similar to the g:Profiler results, but it is also not a straightforward comparison

Visualizing GSEA results

  • only about 27 nodes, is this enough? heatmap not really showing up, don't have columns of the samples, what is wrong?

  • only 1 red (upregulated), rest are blue

  • changing the annotations on wordcloud is not working

  • followed protocol and my map does not look like the one in the slides

  • heatmap not showing up

    • remember to remove the quotes from the txt file, because quoted values are sometimes not recognized
  • April 17: fixed all problems above.

  • Number of nodes in the enrichment map: 67

  • Number of edges in the enrichment map: 89

  • approximately half blue and half red; not too interconnected and not too many nodes

Pathway Analysis

  • Apps > install WikiPathways
  • Import > Import from Database > choose WikiPathways > type TYROBP causal network in microglia > choose Homo sapiens > import as pathway
  • Import > Import Table from File > the qlf output hits file that has the log FC and p-value
    • remember to remove the quotes from the txt file, because quoted values are sometimes not recognized

Conclusion and Outlook

  • results are broadly the same as in Assignment 2
  • results agree with the original paper, which also talked about synaptic vesicle trafficking and dopamine synthesis regulation
  • dysregulation of these two processes has been shown to lead to psychotic symptoms

References