Entry 16: Assignment 3 Workflow - bcb420-2025/Izumi_Ando GitHub Wiki
Overview
- Estimate of the total amount of time this assignment is going to take:
Day 1 - March 31st, 2025
Estimated time to complete the following tasks: 3 hours (ended up working 3.5 hours without finishing the tasks)
Tasks
- create ranked list - done
- run and analyze GSEA - done
- get first Cytoscape viewing - not done
Workflow
docker run -e PASSWORD=changeit -v "$(pwd)":/home/rstudio/projects -p 8787:8787 risserlin/bcb420-base-image:winter2025-arm64
- Got feedback from the TA to reduce the coverage of the previous assignment and hide some of the code. Found out that you can hide the contents and outputs of a code block with `include = FALSE` in the code chunk header (the chunk still runs).
- Initially, I couldn't get GSEA to run; it kept erroring. At first I thought it might be an issue with the `gmt` file, because that is what I struggled with in the GSEA assignment, but after inspecting my ranked list I realized that the gene names were omitted. I fixed my R code to change this (see below).
# before: did not include gene names
ranked_list <-
-log10(qlf_output_hits_table$FDR) * sign(qlf_output_hits_table$logFC)
names(ranked_list) <- rownames(ranked_list)
ranked_list <- sort(ranked_list, decreasing = TRUE)
# after: includes gene names
ranked_list <-
-log10(qlf_output_hits_table$FDR) * sign(qlf_output_hits_table$logFC)
names(ranked_list) <- rownames(qlf_output_hits_table)
ranked_list <- sort(ranked_list, decreasing = TRUE)
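The "before" version fails quietly because `rownames()` of a plain numeric vector is `NULL`, so no names are ever attached. The following is a minimal sketch of a sanity check that would have caught this before running GSEA. The toy table, gene IDs, values, and output file name are hypothetical placeholders, not data from the assignment; the two-column, tab-delimited layout is the usual rank file format that GSEA reads.

```r
# Toy stand-in for qlf_output_hits_table (hypothetical genes and values)
qlf_output_hits_table <- data.frame(
  logFC = c(2.1, -1.4, 0.3),
  FDR = c(0.001, 0.01, 0.5),
  row.names = c("GENE_A", "GENE_B", "GENE_C")
)

scores <- -log10(qlf_output_hits_table$FDR) * sign(qlf_output_hits_table$logFC)

# rownames() of a plain numeric vector is NULL, so the "before" code set no names
before_names <- rownames(scores)

# Take the names from the table instead, then sort from highest to lowest score
ranked_list <- scores
names(ranked_list) <- rownames(qlf_output_hits_table)
ranked_list <- sort(ranked_list, decreasing = TRUE)

# Fail early if gene names are missing, before handing the file to GSEA
stopifnot(
  is.null(before_names),
  !is.null(names(ranked_list)),
  !anyNA(names(ranked_list))
)

# Write a two-column, tab-delimited rank file (hypothetical file name)
rnk_path <- file.path(tempdir(), "ranked_list.rnk")
write.table(
  data.frame(gene = names(ranked_list), score = unname(ranked_list)),
  file = rnk_path, sep = "\t", quote = FALSE,
  row.names = FALSE, col.names = FALSE
)
```

Checking `names(ranked_list)` right after building the vector is cheap and makes this kind of error show up in R rather than as an opaque GSEA failure.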
EnrichmentMap in Cytoscape
Writing out the steps I took because they were not obvious - I had to rewatch all the lectures as my notes were not step by step.
- Installed Cytoscape v3.10.3 for macOS and EnrichmentMap v3.5.0 from the website.
- Opened the Cytoscape app, then opened EnrichmentMap from the "Apps" section in the menu at the top.
- Opened the data input panel.
- Input and parameters as shown in the screenshot of the EnrichmentMap input page below.
- Selecting a cluster: cmd + click + drag
- In the "Style" panel, I set the mapping variable to "NES", and it was very apparent that the majority of the gene sets were negatively enriched (there was only one visible node that had a strong, positive NES value). A screenshot of this initial network is below.
- The single positively enriched node was "HALLMARK_TNFA_SIGNALING_VIA_NFKB".
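The impression from the NES colour mapping can also be checked outside Cytoscape by tallying the sign of NES across gene sets. This is only a sketch on a made-up table: the column names `NAME` and `NES` are assumptions about how the GSEA results would be loaded into R, and the gene set names and values are hypothetical.

```r
# Hypothetical GSEA results table (not the real output of this analysis)
gsea_results <- data.frame(
  NAME = c("SET_A", "SET_B", "SET_C", "SET_D"),
  NES = c(1.9, -2.2, -1.7, -2.5)
)

# Label each gene set by the direction of its enrichment score
direction <- ifelse(gsea_results$NES > 0, "positive", "negative")

# Count gene sets in each direction
nes_counts <- table(direction)
print(nes_counts)

# Gene set with the highest NES
top_positive <- gsea_results$NAME[which.max(gsea_results$NES)]
print(top_positive)
```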
Day 2 - April 1st, 2025
Estimated time to complete the following tasks: 8 hours (worked 2 hours, could not continue due to sickness)
Tasks
- get first Cytoscape viewing - done
- create themes (groups) - done
- analyze one pathway (probably the one in the largest grouping) - not done
- write report - not done
Workflow - continued from yesterday
- Installed the AutoAnnotate app v1.5.2.
- Did the annotations with the default parameters, but the labels overlapped, so I ran it a second time. Looks good now.
- Installed the yFiles Layout Algorithms app v1.1.5.
Day 3/4 - April 2nd-3rd, 2025
Mostly sick, slow progress.
Tasks
- analyze one pathway (probably the one in the largest grouping) - done
Workflow
- noticed that the smaller clusters come from specific databases (e.g. GO or REACTOME)
- the largest theme, DNA strand repair, has gene sets from multiple databases; it might be interesting to look at hub pathways (nodes connecting bigger clusters)
- tried REACTOME, but WikiPathways was easier to use as it is integrated in Cytoscape
- WikiPathways is an app but is accessed by "importing data" (below)
- selected "DNA REPAIR PATHWAYS FULL NETWORK" as the gene set
Day 5/6 - April 4th-5th, 2025
Recovering from bad health. Expected time: 4 hours - Actual time: 8 hours
Tasks
- do full analysis - done
- write up rest of the report, figure legends etc. - done
Notes
- the individual nodes left out of clusters/themes were simply those that did not share genes with other gene sets
- had to reread the original paper and do a literature review for the analysis (this took most of my time) - notes in Notion
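The note about unclustered nodes follows from how an enrichment map draws edges: two gene sets are connected only when their gene overlap passes a similarity cutoff. Below is a minimal sketch of that idea using the Jaccard coefficient, one of the similarity measures EnrichmentMap offers. The gene sets and the cutoff value are hypothetical, and the metric actually used for this network is not recorded in these notes, so this is an illustration rather than a reproduction.

```r
# Hypothetical gene sets; SET_C shares no genes with the others
gene_sets <- list(
  SET_A = c("G1", "G2", "G3", "G4"),
  SET_B = c("G3", "G4", "G5"),
  SET_C = c("G8", "G9")
)

# Jaccard coefficient: shared genes divided by all genes in either set
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# Pairwise similarity matrix between gene sets
n <- length(gene_sets)
sim <- matrix(0, n, n, dimnames = list(names(gene_sets), names(gene_sets)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    sim[i, j] <- jaccard(gene_sets[[i]], gene_sets[[j]])
  }
}
diag(sim) <- 0  # ignore self-similarity

cutoff <- 0.25  # hypothetical edge cutoff

# Gene sets with no edge at or above the cutoff end up as isolated nodes
singletons <- names(gene_sets)[rowSums(sim >= cutoff) == 0]
print(singletons)
```

With these toy sets, SET_A and SET_B share two of five genes and would be linked, while SET_C has no overlap and stays on its own.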