Assignment #3 ‐ Data set Pathway and Network Analysis - bcb420-2024/Anna_Lai GitHub Wiki

Data set Pathway and Network Analysis

Date: April 02, 2024

These notes are not included in the final submission. For a coherent account of the data analysis, please refer to the HTML file generated for this assignment.

Links to Assignment 2:

The R Markdown file: https://github.com/bcb420-2024/Anna_Lai/blob/main/A2_AnnaLai.Rmd

The HTML file: https://github.com/bcb420-2024/Anna_Lai/blob/main/A2_AnnaLai.html

Non-Thresholded Gene Set Enrichment Analysis

GSEA is a non-thresholded gene set enrichment analysis method. For details, please refer to the R Markdown file.
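GSEA Preranked takes a ranked gene list (.rnk) rather than a thresholded gene list. A common way to build that ranking is to multiply the sign of the log fold change by -log10(p-value), so strongly up-regulated genes sit at the top and strongly down-regulated genes at the bottom. The sketch below shows that idea on a made-up table. `make_rank_table` and the toy genes and values are hypothetical, and this is not necessarily the ranking metric used in this assignment (see the R Markdown for that).

```r
# Hypothetical toy DE table (made-up genes and values, for illustration only)
de_results <- data.frame(
  Gene   = c("GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"),
  logFC  = c(-2.1, -3.0, 1.5, 0.1, -0.2),
  PValue = c(1e-6, 1e-8, 1e-4, 0.5, 0.3),
  stringsAsFactors = FALSE
)

# make_rank_table() is a hypothetical helper: rank = sign(logFC) * -log10(p)
make_rank_table <- function(de) {
  rnk <- data.frame(Gene = de$Gene,
                    rank = sign(de$logFC) * -log10(de$PValue),
                    stringsAsFactors = FALSE)
  # Sort from most up-regulated (top) to most down-regulated (bottom)
  rnk <- rnk[order(rnk$rank, decreasing = TRUE), , drop = FALSE]
  rownames(rnk) <- NULL
  rnk
}

rnk <- make_rank_table(de_results)
print(rnk)

# Two tab-separated columns with no header, quotes or row names
rnk_path <- file.path(tempdir(), "example.rnk")
write.table(rnk, rnk_path, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

Genes with a p-value of exactly 0 would get an infinite score, so in practice such values may need to be capped before ranking.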

Visualization of Gene Set Enrichment Analysis in Cytoscape

I encountered difficulties building the network, perhaps because the coverage was too high. The error message was:

[1712777096273] [INFO] Unable to create Enrichment Map: Internal Server Error Failed: Cannot read the array length because "dir_listing" is nul

My first attempt to solve the problem was to reduce the number of genes included, using this code:

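library(dplyr)
# Read the pre-ranked list written by GSEA (gene symbol, rank value)
gsea_r <- read.delim("./a3_analysis.GseaPreranked.1712263512608/edb/ranked.rnk", row.names = NULL)
colnames(gsea_r) <- c("Gene", "value")
# Keep the 1000 most positive and the 1000 most negative genes
top_positive <- slice_max(filter(gsea_r, value >= 0), value, n = 1000, with_ties = FALSE)
top_negative <- slice_min(filter(gsea_r, value < 0), value, n = 1000, with_ties = FALSE)
merged_data <- arrange(bind_rows(top_positive, top_negative), desc(value))
# Write the trimmed list back out as a tab-separated .rnk file
write.table(merged_data, "./a3_analysis.GseaPreranked.1712263512608/edb/ranked_less.rnk", sep = "\t", quote = FALSE, row.names = FALSE)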

However, I got the same error message.
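The intended trimming keeps the most extreme genes at both ends of the ranking. It can also be written as a small base-R function that does not depend on the GSEA output folder, which makes it easy to check on a toy input. This is a sketch. `trim_ranked_list` is a hypothetical helper, and the toy scores are made up.

```r
# Keep the n most positive and n most negative entries of a ranked list.
# `ranked` is assumed to have the columns Gene and value.
trim_ranked_list <- function(ranked, n = 1000) {
  pos <- ranked[ranked$value >= 0, , drop = FALSE]
  neg <- ranked[ranked$value < 0, , drop = FALSE]
  # head() returns fewer rows when a side has fewer than n genes,
  # whereas indexing with [1:n, ] would pad the result with NA rows
  top_pos <- head(pos[order(pos$value, decreasing = TRUE), , drop = FALSE], n)
  top_neg <- head(neg[order(neg$value), , drop = FALSE], n)
  out <- rbind(top_pos, top_neg)
  out <- out[order(out$value, decreasing = TRUE), , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Toy ranked list with made-up scores
toy <- data.frame(Gene = paste0("G", 1:8),
                  value = c(5, 3, 1, 0.5, -0.2, -1, -4, -6),
                  stringsAsFactors = FALSE)
trimmed <- trim_ranked_list(toy, n = 2)
print(trimmed)
```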

To solve this problem, I built the network from within Cytoscape instead of launching it from GSEA, using the parameters shown below. This time, the network was built successfully.

[Image: enrichment_param, the Enrichment Map parameters used in Cytoscape]
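Building the map inside Cytoscape can also be scripted from R through cyREST. The sketch below only assembles the command string. It assumes EnrichmentMap's `enrichmentmap build` command and its parameters `analysisType`, `gmtFile`, `enrichmentsDataset1`, `ranksDataset1`, `pvalue`, `qvalue`, `similaritycutoff` and `coefficients`. Those names should be checked against `RCy3::commandsHelp("enrichmentmap build")` before use. The helper `build_em_command`, the file paths and the threshold values are hypothetical examples, not the parameters used for this network.

```r
# build_em_command() is a hypothetical helper that only assembles the string
build_em_command <- function(gmt, enrichments, ranks, pvalue = 1.0,
                             qvalue = 0.05, similarity = 0.375,
                             coefficient = "COMBINED") {
  paste0('enrichmentmap build analysisType="gsea"',
         " gmtFile=", gmt,
         " enrichmentsDataset1=", enrichments,
         " ranksDataset1=", ranks,
         " pvalue=", pvalue,
         " qvalue=", qvalue,
         " similaritycutoff=", similarity,
         " coefficients=", coefficient)
}

# Placeholder paths (hypothetical); they are unquoted, so avoid spaces
em_command <- build_em_command(
  gmt         = "/path/to/pathways.gmt",
  enrichments = "/path/to/edb/results.edb",
  ranks       = "/path/to/edb/ranked.rnk"
)
cat(em_command, "\n")

# With Cytoscape running and EnrichmentMap installed, the string could be sent
# through cyREST, e.g. RCy3::commandsRun(em_command)
```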

Interpretation and detailed view of results

Do the enrichment results support the conclusions or mechanisms discussed in the original paper? How do these results differ from the results you got from the Assignment #2 thresholded methods?
Can you find evidence, i.e., publications, to support some of the results that you see? How does this evidence support your result?

Please see the R Markdown file for answers to these questions.

Post-analysis of the main network

I chose a drug-target post-analysis of the main network because the cells are known to carry PODOCIN protein variants. Hence, I would like to explore the differences in expression and pathway connectivity resulting from PODOCIN protein mutations.

The dataset used: https://download.baderlab.org/EM_Genesets/March_01_2024/Human/symbol/DrugTargets/

Human_DrugBank_approved_symbol.gmt
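A GMT file stores one gene set per line, tab-separated as the set name, a description, and then the member genes. The following base-R sketch reads such a file into a named list of gene vectors, which is convenient for inspecting drug-target sets outside Cytoscape. `read_gmt` is a hypothetical helper, and the two-line toy file and its drug names are made up.

```r
# Parse a GMT file: <set name> TAB <description> TAB <gene 1> TAB <gene 2> ...
read_gmt <- function(path) {
  lines <- readLines(path)
  lines <- lines[nzchar(lines)]                 # skip blank lines
  fields <- strsplit(lines, "\t", fixed = TRUE)
  sets <- lapply(fields, function(f) {
    genes <- f[-(1:2)]                          # drop name and description
    unique(genes[nzchar(genes)])                # drop empty fields, duplicates
  })
  names(sets) <- vapply(fields, `[`, character(1), 1)
  sets
}

# Toy GMT file with made-up drug names and genes
gmt_path <- tempfile(fileext = ".gmt")
writeLines(c("DRUG_X\tExample drug X\tGENE1\tGENE2\tGENE3",
             "DRUG_Y\tExample drug Y\tGENE2\tGENE4"), gmt_path)
drug_sets <- read_gmt(gmt_path)
str(drug_sets)
```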

Additional question: Add a post-analysis to your main network using specific transcription factors, microRNAs or drugs. Include the reason why you chose the specific miRs, TFs or drugs (i.e., publications indicating that they might be related to your model). What does this post-analysis show?

Please refer to the R Markdown notebook for the answer.
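EnrichmentMap's post-analysis offers several tests, including Mann-Whitney and hypergeometric tests, for deciding whether a signature gene set (here, one drug's targets) should be linked to a pathway node. This page does not state which test was used. The sketch below shows only the hypergeometric version: the probability of seeing at least the observed overlap if the drug's targets were drawn at random from the gene universe. `overlap_pvalue` and the toy gene universe are hypothetical.

```r
# P(overlap >= k) when the drug's targets are drawn at random from `universe`
overlap_pvalue <- function(drug_genes, pathway_genes, universe) {
  universe      <- unique(universe)
  drug_genes    <- intersect(unique(drug_genes), universe)
  pathway_genes <- intersect(unique(pathway_genes), universe)
  k     <- length(intersect(drug_genes, pathway_genes))  # observed overlap
  m     <- length(pathway_genes)                         # genes in the pathway
  n     <- length(universe) - m                          # genes not in the pathway
  draws <- length(drug_genes)                            # size of the drug set
  # upper tail: P(X > k - 1) = P(X >= k)
  phyper(k - 1, m, n, draws, lower.tail = FALSE)
}

# Toy universe of 100 made-up genes
universe <- paste0("GENE", 1:100)
pathway  <- paste0("GENE", 1:10)
p_hit    <- overlap_pvalue(paste0("GENE", 1:5), pathway, universe)   # full overlap
p_miss   <- overlap_pvalue(paste0("GENE", 91:95), pathway, universe) # no overlap
print(c(p_hit = p_hit, p_miss = p_miss))
```

With all five drug targets inside the ten-gene pathway, the p-value equals choose(10, 5) / choose(100, 5), while a drug with no overlap gets a p-value of 1.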

After generating the initial drug-target map, I found it messy, so I reorganized it.

It was very interesting to see some nodes connected to multiple approved drug targets, in some cases more than 10.

[Image: Super Node]

[Image: Super node 2]

There are also drugs that target multiple pathways.

[Image: Multi-target]
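Observations like these (pathway nodes linked to many drugs, and drugs linked to many pathways) amount to row and column counts of a drug-by-pathway link matrix. The sketch below builds that matrix under a deliberately simple rule: any shared gene counts as a link, with an optional minimum overlap. EnrichmentMap's post-analysis uses a statistical cutoff instead, so these counts are only a rough approximation. `link_matrix` and all set names are hypothetical.

```r
# Rows = pathways, columns = drugs; TRUE where they share >= min_overlap genes
link_matrix <- function(drug_sets, pathway_sets, min_overlap = 1) {
  m <- matrix(0L, nrow = length(pathway_sets), ncol = length(drug_sets),
              dimnames = list(names(pathway_sets), names(drug_sets)))
  for (p in seq_along(pathway_sets)) {
    for (d in seq_along(drug_sets)) {
      m[p, d] <- length(intersect(pathway_sets[[p]], drug_sets[[d]]))
    }
  }
  m >= min_overlap
}

# Toy pathway and drug gene sets (made-up names)
pathways <- list(PW_A = c("G1", "G2", "G3"), PW_B = c("G3", "G4"), PW_C = "G5")
drugs    <- list(D1 = "G1", D2 = "G3", D3 = c("G2", "G4"), D4 = "G6")

links <- link_matrix(drugs, pathways)
drugs_per_pathway <- rowSums(links)  # how many drugs hit each pathway node
pathways_per_drug <- colSums(links)  # how many pathway nodes each drug hits
print(drugs_per_pathway)
print(pathways_per_drug)
```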

Links to Assignment 3:

The R Markdown file: https://github.com/bcb420-2024/Anna_Lai/blob/main/A3_AnnaLai.Rmd

The HTML file: https://github.com/bcb420-2024/Anna_Lai/blob/main/A3_AnnaLai.html

Citations

For the R Notebook, I used the reference manager Zotero to generate the .bib file, as mentioned in the previous journal entry.

Dorison A, Ghobrial I, Graham A, Peiris T et al. Kidney Organoids Generated Using an Allelic Series of NPHS2 Point Variants Reveal Distinct Intracellular Podocin Mistrafficking. J Am Soc Nephrol 2023 Jan 1;34(1):88-109. PMID: 36167728