Entry 16: Assignment 3 Workflow - bcb420-2025/Izumi_Ando GitHub Wiki

Overview

  • Estimate of total amount of time this assignment is going to take⏰:

Day 1 - March 31st, 2025

Estimated time to complete the following tasks: 3 hours - ended up working 3.5 hours without finishing the tasks

Tasks

  • create ranked list ✅
  • run and analyze gsea ✅
  • get first cytoscape viewing - not done

Workflow

docker run -e PASSWORD=changeit  -v "$(pwd)":/home/rstudio/projects -p 8787:8787 risserlin/bcb420-base-image:winter2025-arm64
  • Got feedback from TA to reduce the coverage of the previous assignment and hide some of the code. Found out that you can hide the contents and outputs of a code block with include = FALSE in the code chunk header.
  • Initially, I couldn't get GSEA to run; it kept erroring. At first I thought the gmt file might be the issue, because that is what I struggled with in the GSEA assignment. After inspecting my ranked list, I realized that the gene names were missing, so I fixed my R code to include them. (see below)
# before: did not include gene names
# (rownames() of a plain numeric vector is NULL, so no names were assigned)
ranked_list <- 
  -log10(qlf_output_hits_table$FDR) * sign(qlf_output_hits_table$logFC)
names(ranked_list) <- rownames(ranked_list)
ranked_list <- sort(ranked_list, decreasing = TRUE)

# after: includes gene names
ranked_list <- 
  -log10(qlf_output_hits_table$FDR) * sign(qlf_output_hits_table$logFC)
names(ranked_list) <- rownames(qlf_output_hits_table)
ranked_list <- sort(ranked_list, decreasing = TRUE)
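A minimal sketch of the same fix on toy data, with a sanity check that would have caught the missing names before running GSEA. The table `toy_hits`, its values, and the output file name are hypothetical stand-ins and not the real `qlf_output_hits_table`; the two-column, tab-delimited layout assumes the usual GSEA .rnk format.

```r
# Toy stand-in for qlf_output_hits_table (hypothetical values, not real results)
toy_hits <- data.frame(
  logFC = c(2.1, -1.4, 0.3),
  FDR   = c(0.001, 0.01, 0.5),
  row.names = c("GENE_A", "GENE_B", "GENE_C")
)

# Signed -log10(FDR) score, same formula as above
toy_ranks <- -log10(toy_hits$FDR) * sign(toy_hits$logFC)

# The "before" mistake: rownames() of a plain vector is NULL,
# so the vector stays unnamed
names(toy_ranks) <- rownames(toy_ranks)
names_lost <- is.null(names(toy_ranks))  # TRUE

# The fix: take the gene names from the table the scores came from
names(toy_ranks) <- rownames(toy_hits)
toy_ranks <- sort(toy_ranks, decreasing = TRUE)

# Sanity checks before handing the list to GSEA
stopifnot(!is.null(names(toy_ranks)), !anyNA(toy_ranks))

# Write gene and score as two tab-separated columns (assumed .rnk layout)
rnk <- data.frame(gene = names(toy_ranks), rank = unname(toy_ranks),
                  stringsAsFactors = FALSE)
rnk_path <- file.path(tempdir(), "toy_ranked_list.rnk")
write.table(rnk, rnk_path, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

Note that an FDR of exactly 0 would give an infinite score with this formula, so it may be worth checking with `any(is.infinite(toy_ranks))` as well.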

EnrichmentMap in Cytoscape

Writing out the steps I took because they were not obvious - I had to rewatch all the lectures as my notes were not step by step.

  • Installed Cytoscape v3.10.3 for macOS and EnrichmentMap v3.5.0 from the website.
  • Opened the Cytoscape app, opened EnrichmentMap from the "Apps" section in the menu at the top.
  • Opened the data input panel.
  • Set the inputs and parameters as shown in the screenshot of the EnrichmentMap input page below. (image)
  • To select a cluster: cmd + click + drag.
  • In the "Style" panel, I set the mapping variable to "NES". It was very apparent that the majority of the gene sets were negatively enriched (only one visible node had a strong, positive NES value). A screenshot of this initial network is below. (image)
  • The single positively enriched node was "HALLMARK_TNFA_SIGNALING_VIANFKB".
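The visual impression from the NES colouring can be cross-checked with a quick tally of NES signs. This is only a sketch on a toy table: `toy_gsea`, its gene set names, and its values are hypothetical and stand in for whatever GSEA results table holds the `NES` column.

```r
# Toy stand-in for a GSEA results table (hypothetical names and values)
toy_gsea <- data.frame(
  NAME = c("SET_1", "SET_2", "SET_3", "SET_4"),
  NES  = c(-2.1, -1.8, -1.6, 1.9),
  stringsAsFactors = FALSE
)

# Label each gene set by the sign of its normalized enrichment score
direction <- ifelse(toy_gsea$NES > 0, "positive", "negative")

# Count how many gene sets fall in each direction
nes_counts <- table(direction)

# List the positively enriched gene sets by name
positive_sets <- toy_gsea$NAME[toy_gsea$NES > 0]
```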

Day 2 - April 1st, 2025

Estimated time to complete the following tasks: 8 hours - worked 2 hours, could not continue due to sickness

Tasks

  • get first cytoscape viewing ✅
  • create themes (groups) ✅
  • analyze one pathway (probably the one in the largest grouping) - not done
  • write report - not done

Workflow - continued from yesterday

  • Installed AutoAnnotate app v1.5.2
  • Ran the annotations with the default parameters, but the labels overlapped, so I ran it a second time. It looks good now.
  • Installed yFiles Layout Algorithms app v1.1.5

Day 3/4 - April 2nd-3rd, 2025

Mostly sick, slow progress.

Tasks

  • analyze one pathway (probably the one in the largest grouping) ✅

Workflow

  • noticed that smaller clusters come from specific databases (e.g. GO or REACTOME)

  • the largest theme, DNA strand repair, has multiple; it might be interesting to look at hub pathways (nodes connecting bigger clusters)

  • tried REACTOME, but WikiPathways was easier to use as it is integrated in Cytoscape

  • WikiPathways is an app but is accessed by "importing data" (below) (image)

  • selected "DNA REPAIR PATHWAYS FULL NETWORK" as gene set

Day 5/6 - April 4th-5th, 2025

Recovering from bad health. Expected time: 4 hours - Actual time: 8 hours

Tasks

  • do full analysis ✅
  • write up rest of the report, figure legends etc ✅

Notes

  • the individual nodes left out of clusters/themes were simply those that did not share genes with other gene sets
  • had to reread the original paper & do literature review to analyze (this took most of my time) - notes in Notion
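The first note can be illustrated with a small sketch of gene set overlap: an edge only exists between two gene sets when they share enough genes, so a set with no shared genes ends up as an unclustered node. The gene sets, the Jaccard metric, and the `cutoff` value here are hypothetical choices for illustration; the actual similarity metric and cutoff are whatever was set in the EnrichmentMap input panel.

```r
# Toy gene sets (hypothetical); SET_C shares no genes with the others
gene_sets <- list(
  SET_A = c("G1", "G2", "G3", "G4"),
  SET_B = c("G2", "G3", "G4", "G5"),
  SET_C = c("G8", "G9")
)

# Jaccard coefficient: shared genes divided by all genes in either set
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# Score every pair of gene sets
pairs <- combn(names(gene_sets), 2)
sim <- apply(pairs, 2, function(p) jaccard(gene_sets[[p[1]]], gene_sets[[p[2]]]))
edges <- data.frame(from = pairs[1, ], to = pairs[2, ], similarity = sim,
                    stringsAsFactors = FALSE)

# Keep only edges at or above an assumed similarity cutoff
cutoff <- 0.25
kept <- edges[edges$similarity >= cutoff, ]

# Gene sets with no surviving edge are the isolated nodes
singletons <- setdiff(names(gene_sets), c(kept$from, kept$to))
```

In this toy case SET_A and SET_B share three of five genes and stay connected, while SET_C has no overlap and is left on its own.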