Entry 11‐12.1: Assignment 3 - bcb420-2025/Chloe_Calica GitHub Wiki

Objective: Document my process in completing Assignment 3 with the main goal of taking the ranked genes in Assignment 2 and performing pathway and network analysis.

Expected Time: 20 hours

Actual Time: (04/02 - 4hrs, 04/03 - )

Aside: To whoever is reading this, I know I did it late again please forgive me 🥹. Senioritis has kicked in yet again. I promised not to be late this time because I did submit A2 late as well, but I should have known better not to keep my hopes up. I think I can push through this assignment though and not to worry, all my assignments in all other courses are late as well so no selectivity there 😅. If one is late, everything is late, at least I still submitted it. I still don't know if anyone is reading this but if this is inappropriate, I apologize loool. I keep ending up treating it as an actual journal albeit not scientific or professional but still related to the project regardless. Welp, thanks for listening, lez get to work.

Nevermind, who was I kidding 💀. I should have known I can't do anything when I just don't feel like it. Anyways, it's been 5 days since D-day and we still here. At least I still plan on submitting it. This is the sign the universe gave me that I should really leave school and never look back 🤡. Should have kept my minimum wage Maccads jobs.

Girl, I'm never finishing this. Should I just submit now, coz I'm too tired to care anymore.

Ok Chloe, we finally done. good job, at least u finished it, albeit 1 week later loooool.

Feedback From Assignment 2

I surprisingly did well in the previous assignment despite submitting late. Shoutout to Professor Isserlin who was nice enough to give me an extension. (Thanks a ton!). Will not beg for an extension on this assignment. I deserve the deduction after all, but I will try to get the bonus points at least: "If GSEA and Cytoscape and EM performed on the fly then you can get up to 4 bonus marks on the assignment."

No outstanding comments to be addressed from Assignment 2. Will do the same as I did last time: add specific figure captions, justify the methods used, and add a text explanation before each code output.

Introduction

  • Started the assignment by creating a summary of assignment 1 and 2 though with more detail on the second one.

  • I wasn't sure if I should follow the knitr_child example or just add a link to the assignments. Will test how it looks first and if it looks fine, i.e. not too long, then I'll maybe use it. If it's too bulky though, I might just opt to use the link so it doesn't clutter the report.

    • Yeah, it ended up adding the whole assignment, which isn't really that nice since the document would be too long.
    • Will just include a link to the HTML file on GitHub instead.
  • Keep forgetting to add the docker command here for easy reference

docker run -e PASSWORD=changeit --rm -v ${PWD}:/home/rstudio/projects -p 8787:8787 risserlin/bcb420-base-image:winter2025

Non-thresholded GSEA

Obtaining Ranked Gene List

  • Obtained the QLF Hits dataframe from A2 then performed all processing in A3 to create the rank file.
  • Referred to GSEA Lecture on how to create the rankings.
  • Searched for how the .rnk file is formatted and it's basically just two tab-separated columns, the gene and its rank score, so I created that file using write.table.
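A minimal sketch of what building that two-column file can look like. Everything specific here is an assumption for illustration: the toy data frame, the column names (gene, logFC, PValue, modelled on edgeR's topTags output), the output file name, and the signed -log10(p-value) ranking, which is one common choice and not necessarily the one used in A3.

```r
# Toy stand-in for the QLF hits table from A2 (hypothetical genes and values).
qlf_hits <- data.frame(
  gene   = c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
  logFC  = c(2.1, -1.4, 0.3, -0.2),
  PValue = c(1e-6, 1e-4, 0.2, 0.5),
  stringsAsFactors = FALSE
)

# Assumed ranking: significance, signed by the direction of change.
qlf_hits$rank <- -log10(qlf_hits$PValue) * sign(qlf_hits$logFC)

# Most up-regulated first, most down-regulated last; keep only gene and rank.
ranked <- qlf_hits[order(qlf_hits$rank, decreasing = TRUE), c("gene", "rank")]

# Tab-separated, no quotes, no row names, no header line.
rnk_path <- file.path(tempdir(), "example_ranks.rnk")  # hypothetical file name
write.table(ranked, file = rnk_path, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)
```

Writing to tempdir() keeps the sketch from touching the project folder; in a real report the path would point at wherever GSEA is told to look for the rank file.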

Running GSEA

  • When I was getting the GMT file using the code from the Bader lab, I kept running into a "No such file or directory" error. Turns out, I just wrote the output dir wrong. Instead of gseaOutput, I wrote /gseaOutput, and the leading slash makes it an absolute path from the filesystem root instead of a folder inside the project. After I fixed this, I was able to download the GMT file with no problems, and I also updated all the other variables that hold directory and file names.
  • I had a hard time running GSEA programmatically in R and I found it was because the Java installed in the Docker image was older than what GSEA requires. I further got confused because I did not realize early on that the Java on my computer is different from the Java in the Docker container. So even though I did have the latest Java version on my computer, I wasn't actually changing the version in the Docker instance.
  • In order to fix this, I consulted my dear best friend ChatGPT and I ended up in Windows PowerShell trying to update Java in my Docker instance. Here are the steps that I followed, which eventually allowed GSEA to run.
    • Restart Docker with Root User
      • I tried doing the following steps in the RStudio terminal, but I kept getting errors saying it did not have sudo capabilities, so I asked ChatGPT and this is the route it made me go instead.
      • Opened Windows PowerShell on my computer.
      • Typed the following command: docker exec -it --user root <container_name> bash
      • To determine my container name, I just went to my already open Docker Desktop app.
      • Now, I was inside the container as root.
    • Install Java 21 (the latest LTS release at the time)
      • apt update to refresh the package lists
      • apt install -y openjdk-21-jdk to install the Java 21 JDK inside the container
    • Verify that Java version is changed
      • It should automatically set the newest one as the default, but to be sure, I just typed java -version to see which one is currently set.
      • Then I just restarted the RStudio container by typing exit, then docker restart <container_name>.
      • I reran my code and I did not get any immediate errors, which was a good sign. It ended up taking a while to finish, but I now see results in the output directory I specified.
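Both problems above (the absolute output path and the outdated Java) could be caught before launching GSEA with a small pre-flight check from inside the R session. This is only a sketch: the helper names resolve_output_dir, parse_java_major and java_major_in_session are made up for illustration, and required_major = 21 simply mirrors the version installed above, so check the GSEA release notes for the real minimum.

```r
# Reject absolute paths like "/gseaOutput" and create the folder under the project root.
resolve_output_dir <- function(dir_name, project_root = getwd()) {
  if (startsWith(dir_name, "/")) {
    stop("'", dir_name, "' is an absolute path; did you mean '",
         sub("^/+", "", dir_name), "'?")
  }
  out <- file.path(project_root, dir_name)
  dir.create(out, recursive = TRUE, showWarnings = FALSE)
  out
}

# Pull the major version out of the text printed by `java -version`.
parse_java_major <- function(version_lines) {
  hit <- regmatches(version_lines, regexpr('version "[0-9._]+', version_lines))
  if (length(hit) == 0) return(NA_integer_)
  parts <- strsplit(sub('version "', "", hit[1]), "[._]")[[1]]
  major <- as.integer(parts[1])
  # Legacy numbering: Java 8 reports itself as "1.8.0_...".
  if (major == 1 && length(parts) > 1) major <- as.integer(parts[2])
  major
}

# Ask the Java that this R session (i.e. the container) actually sees.
java_major_in_session <- function() {
  if (Sys.which("java") == "") return(NA_integer_)
  out <- suppressWarnings(system2("java", "-version", stdout = TRUE, stderr = TRUE))
  parse_java_major(out)
}

out_dir <- resolve_output_dir("gseaOutput", project_root = tempdir())

required_major <- 21  # assumption, not taken from the GSEA documentation
found <- java_major_in_session()
if (is.na(found)) {
  message("No java found on the PATH of this R session.")
} else if (found < required_major) {
  message("Java ", found, " found, but ", required_major, " is expected.")
} else {
  message("Java ", found, " looks recent enough.")
}
```

Because the check runs inside RStudio, it reports the container's Java rather than the one installed on the host machine, which is exactly the mix-up described above.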

GSEA Results

  • Was able to successfully run GSEA, though I was a bit confused by the results at first. I ended up finding the index.html file, which is exactly the same as the one in the GSEA GUI version, and from there I was able to navigate to the more specific results of the analysis.
  • Asked ChatGPT if there is a way to embed HTML in the Rmarkdown and eventually found the following command, which was able to show a preview of index.html, although some of the internal links were broken. Was too lazy to try and figure it out as it wasn't that essential, so I just stayed with the following code:
# Rmd chunk header, commented out here so it doesn't actually run: {r, echo=FALSE, results='asis'}
cat('<iframe src="../gseaOutput/ControlvsBbExposure/index.html" width="100%" height="500px"></iframe>')
  • Got confused again about which is which between na_pos and na_neg, so I had to read up on how to interpret the results.
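For reference, in a preranked run the na_pos report lists gene sets enriched at the top of the ranked list (positive scores) and the na_neg report lists those enriched at the bottom (negative scores). A hedged sketch of pulling both into one table follows; the file-name pattern, the .tsv extension and the column names NAME, NES and FDR q-val are assumptions based on recent GSEA output, the helper name read_gsea_reports is made up, and the two toy files only exist so the example runs on its own.

```r
# Read every na_pos / na_neg report under a GSEA output folder into one table.
read_gsea_reports <- function(gsea_dir) {
  files <- list.files(gsea_dir,
                      pattern = "^gsea_report_for_na_(pos|neg)_.*\\.tsv$",
                      full.names = TRUE, recursive = TRUE)
  reports <- lapply(files, function(f) {
    tab <- read.delim(f, check.names = FALSE, stringsAsFactors = FALSE)
    # Label each row by which end of the ranked list the gene set sits at.
    tab$direction <- if (grepl("na_pos", basename(f))) {
      "top of ranked list"
    } else {
      "bottom of ranked list"
    }
    tab[, c("NAME", "NES", "FDR q-val", "direction")]
  })
  do.call(rbind, reports)
}

# Toy report files (hypothetical gene sets and numbers).
demo_dir <- file.path(tempdir(), "gsea_demo")
dir.create(demo_dir, showWarnings = FALSE)
toy_pos <- data.frame(NAME = "SET_UP", NES = 2.1, `FDR q-val` = 0.01, check.names = FALSE)
toy_neg <- data.frame(NAME = "SET_DOWN", NES = -1.9, `FDR q-val` = 0.02, check.names = FALSE)
write.table(toy_pos, file.path(demo_dir, "gsea_report_for_na_pos_123.tsv"),
            sep = "\t", quote = FALSE, row.names = FALSE)
write.table(toy_neg, file.path(demo_dir, "gsea_report_for_na_neg_123.tsv"),
            sep = "\t", quote = FALSE, row.names = FALSE)

summary_tab <- read_gsea_reports(demo_dir)
print(summary_tab)
```

Which biological condition "top" and "bottom" correspond to depends on how the rank scores were signed when the .rnk file was built.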

Visualization with Cytoscape

  • Followed the automation steps for Cytoscape, though I had to use the GUI to manually lay out the networks the way I wanted them.
  • Played around with the settings so it displays nicely.
  • For the specific pathway I looked at, I couldn't figure out how to overlay the p-value and the original log fold change, so I just used the rank score, the same one I used to generate the GSEA results. This was the only one that I saw among the options.
    • I also got a bit confused with this one since apparently it only needed a gene list and not our GSEA results, so I went back to my network, found the pathway in the Node Table and saw there is a column there for the genes associated with it. I copied that and pasted it into Reactome, where it did like a pathway enrichment analysis?
  • I didn't automate this last part since it's more visual work.
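If the copy-and-paste step ever needs to be repeatable, a rough sketch is to clean the copied cell into one gene per line before pasting it into Reactome. The example string, its separators, the helper name split_gene_cell and the output file name are all assumptions; the exact format of the copied Node Table cell may differ.

```r
# Split a copied gene-list cell on brackets, commas, semicolons, pipes or whitespace.
split_gene_cell <- function(cell_text) {
  pieces <- strsplit(cell_text, "[\\[\\],;|\\s]+", perl = TRUE)[[1]]
  unique(pieces[nzchar(pieces)])
}

# Hypothetical cell contents, not taken from the actual network.
copied_cell <- "[TP53, BRCA1,BRCA1 | HLA-DRB1]"
genes <- split_gene_cell(copied_cell)

# One gene per line, ready to paste or upload.
gene_file <- file.path(tempdir(), "pathway_genes.txt")  # hypothetical file name
writeLines(genes, gene_file)
```

Hyphens are deliberately not treated as separators so that symbols like HLA-DRB1 stay intact.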