7. Assignment 3: Data set pathway and network analysis - bcb420-2022/Yuzi_Li GitHub Wiki

Objective

Summarize the work done in Assignments 1 and 2, then perform non-thresholded gene set enrichment analysis to find the major pathways that are enriched or suppressed in my data set. Visualize the results and examine the dark matter (genes without pathway annotations).

Duration

Expected duration: 5h. Actual duration: 10h.

Progress

Tasks

  1. Perform non-thresholded gene set enrichment analysis on the ranked lists of genes from the CPT1A knockdown and overexpression experiments.
  2. Visualize gene set enrichment analysis in Cytoscape.
  3. Interpret the results and write an R notebook.

Non-thresholded gene set enrichment analysis

  • Made ranked gene lists for the knockdown (kd) and overexpression (oe) comparisons and ran a GSEA preranked analysis on each
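# Rank each gene by signed significance: -log10(p-value) times the sign of the log fold change.
# qlf_hits_kd and qlf_hits_oe are the differential expression tables (PValue and logFC columns).
qlf_hits_kd[,"rank"] <- -log10(qlf_hits_kd$PValue) * sign(qlf_hits_kd$logFC)
qlf_hits_oe[,"rank"] <- -log10(qlf_hits_oe$PValue) * sign(qlf_hits_oe$logFC)

# Write two-column, tab-separated .rnk files (gene name, rank) without a header for GSEA
write.table(x=data.frame(genename=row.names(qlf_hits_kd),rank=qlf_hits_kd$rank), 
            file='cpt1a_kd.rnk',sep = '\t', 
            row.names = FALSE,col.names = FALSE,quote = FALSE)
write.table(x=data.frame(genename=row.names(qlf_hits_oe),rank=qlf_hits_oe$rank), 
            file='cpt1a_oe.rnk',sep = '\t', 
            row.names = FALSE,col.names = FALSE,quote = FALSE)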
# Paths and settings for the GSEA command-line runs
gsea_jar <- '/home/rstudio/GSEA_4.2.3/gsea-cli.sh'  # GSEA command-line launcher script
java_version <- '11'  # recorded for reference; not used in the commands below
working_dir <- getwd()
analysis_name_oe <- 'cpt1a_oe'
analysis_name_kd <- 'cpt1a_kd'
rnk_file_oe <- "cpt1a_oe.rnk"
rnk_file_kd <- "cpt1a_kd.rnk"
dest_gmt_file <- 'Human_GOBP_AllPathways_no_GO_iea_March_01_2021_symbol.gmt'

# GSEAPreRanked run for the overexpression ranks
command <- paste("", gsea_jar,  "GSEAPreRanked -gmx", dest_gmt_file, "-rnk", 
                 file.path(working_dir, rnk_file_oe), "-collapse false -nperm 1000 -scoring_scheme weighted -rpt_label ",
                 analysis_name_oe,"  -plot_top_x 20 -rnd_seed 12345  -set_max 200 -set_min 15 -zip_report false -out" ,
                 working_dir, " > gsea_output_oe.txt",sep=" ")
system(command)
# GSEAPreRanked run for the knockdown ranks; a separate log file keeps the first log from being overwritten
command <- paste("", gsea_jar,  "GSEAPreRanked -gmx", dest_gmt_file, "-rnk", 
                 file.path(working_dir, rnk_file_kd), "-collapse false -nperm 1000 -scoring_scheme weighted -rpt_label ",
                 analysis_name_kd,"  -plot_top_x 20 -rnd_seed 12345  -set_max 200 -set_min 15 -zip_report false -out" ,
                 working_dir, " > gsea_output_kd.txt",sep=" ")
system(command)
  • While checking my GSEA results, I found that the html files containing detailed descriptions of enriched Reactome gene sets could not be opened in Docker, even though the files were present in the GSEA result folders inside the rstudio directory. I worked around this by exporting the html files from Docker and opening them outside of Docker. Gene set descriptions from other pathway databases did not have this issue.
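
One detail of the ranking step is worth checking: a p-value of exactly 0 makes -log10(p) infinite, which is not a usable rank. The sketch below uses a toy table with hypothetical gene names and values (not the real data) and floors zero p-values at the smallest non-zero p-value. That floor is an assumption; other choices are possible.

```r
# Toy differential expression table (hypothetical genes and values)
toy_hits <- data.frame(
  PValue = c(1e-5, 0.02, 0, 0.5),
  logFC  = c(2.1, -0.8, -3.0, 0.1),
  row.names = c("GENE_A", "GENE_B", "GENE_C", "GENE_D")
)

# Floor p-values of exactly 0 at the smallest non-zero p-value so that
# -log10() stays finite (assumed convention, not taken from the assignment)
min_nonzero <- min(toy_hits$PValue[toy_hits$PValue > 0])
p_safe <- pmax(toy_hits$PValue, min_nonzero)

# Same signed-significance rank as in the journal code above
toy_hits$rank <- -log10(p_safe) * sign(toy_hits$logFC)

# Sort from most up-regulated to most down-regulated for inspection
toy_ranked <- toy_hits[order(toy_hits$rank, decreasing = TRUE), ]
print(toy_ranked)
```

Checking `all(is.finite(rank))` before writing the .rnk file is a cheap way to catch this case.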

Cytoscape visualization of gene set enrichments

  • Downloaded Cytoscape 3.9.1
  • Installed EnrichmentMap and AutoAnnotate
  • Learned to use EnrichmentMap to represent gene set enrichments in networks
  • Used AutoAnnotate to find clusters in the network, generate cluster annotations, and create a summarized theme network
  • Downloaded the current release (April 01 2022) of the Human Approved Drugs gmt file from the Bader Lab
  • Used the approved drugs gene sets for post analysis
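
The post analysis above was done in the EnrichmentMap interface, which relates signature gene sets (here, drug targets) to the gene sets in the network. To make the idea concrete, here is a minimal sketch of reading a gmt file in R and listing the genes each drug set shares with one pathway. A gmt file is tab-separated, one gene set per line: name, description, then the genes. The drug names, gene assignments, and the `read_gmt` helper are hypothetical, and a plain intersection is only an illustration, not the test EnrichmentMap applies.

```r
# Write a tiny gmt file to a temporary location (hypothetical gene sets)
gmt_lines <- c(
  "DRUG_X\tToy drug X targets\tCPT1A\tACACA\tFASN",
  "DRUG_Y\tToy drug Y targets\tTP53\tMDM2"
)
gmt_path <- tempfile(fileext = ".gmt")
writeLines(gmt_lines, gmt_path)

# Hypothetical helper: parse a gmt file into a named list of gene vectors
read_gmt <- function(path) {
  fields <- strsplit(readLines(path), "\t", fixed = TRUE)
  sets <- lapply(fields, function(x) x[-(1:2)])  # drop name and description
  names(sets) <- vapply(fields, function(x) x[1], character(1))
  sets
}

drug_sets <- read_gmt(gmt_path)

# Hypothetical enriched pathway from the network
pathway_genes <- c("CPT1A", "CPT2", "ACACA", "ACADM")

# Genes shared between each drug set and the pathway
overlap <- lapply(drug_sets, intersect, pathway_genes)
print(overlap)
```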

Result interpretations

  • See R Notebook: Assignment 3 html notebook
  • Added a link to this journal in the notebook
  • Compared results from thresholded vs. non-thresholded analyses
  • Corroborated results with literature findings
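
The dark matter mentioned in the objective can be found with a set difference between the ranked genes and all genes that appear in any gene set. A minimal sketch with toy, hypothetical gene lists and gene sets (the real analysis would use the .rnk files and the gmt file from above):

```r
# Toy ranked gene list (hypothetical)
ranked_genes <- c("CPT1A", "ACACA", "LINC00001", "TP53", "C1orf999")

# Toy gene sets, in the form of a named list (hypothetical)
gene_sets <- list(
  FATTY_ACID_OXIDATION = c("CPT1A", "CPT2", "ACADM"),
  P53_SIGNALLING       = c("TP53", "MDM2")
)

# All genes annotated to at least one gene set
annotated <- unique(unlist(gene_sets, use.names = FALSE))

# Dark matter: ranked genes that no gene set mentions
dark_matter <- setdiff(ranked_genes, annotated)
print(dark_matter)
```

The same comparison can be restricted to the gene sets reported as enriched, which separates genes without any annotation from genes that are annotated but absent from the enriched pathways.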

Conclusions and Outlook

  • Both thresholded and non-thresholded gene set enrichment analyses are useful, and the results of one can complement those of the other.