Assignment #3 ‐ Data set Pathway and Network Analysis - bcb420-2024/Anna_Lai GitHub Wiki
Data set Pathway and Network Analysis
Date April 02, 2024
Notes that are not included in the final submission. For a coherent story of the data, please refer to the HTML file generated for this assignment.
Links to Assignment 2:
The R Markdown file: https://github.com/bcb420-2024/Anna_Lai/blob/main/A2_AnnaLai.Rmd
The HTML file: https://github.com/bcb420-2024/Anna_Lai/blob/main/A2_AnnaLai.html
Non-Thresholded Gene set Enrichment Analysis
GSEA is a non-thresholded gene set enrichment analysis method. For details, please refer to the R Markdown file.
Visualization of Gene set Enrichment Analysis in Cytoscape
I encountered difficulties building the network, perhaps because the coverage was too high. The following error was reported:
[1712777096273] [INFO] Unable to create Enrichment Map: Internal Server Error Failed: Cannot read the array length because "dir_listing" is nul
My first attempt to solve the problem was to reduce the number of genes included, using this code:
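library(dplyr)  # provides slice_max() and slice_min()
# Load the ranked gene list written by GSEA and name its two columns
gsea_r <- read.delim("./a3_analysis.GseaPreranked.1712263512608/edb/ranked.rnk", row.names = NULL)
colnames(gsea_r) <- c("Gene", "value")
head(gsea_r)  # quick look at the first rows
# Keep the 1000 most positive and the 1000 most negative genes;
# slice_min() picks the most negative scores rather than those closest to zero
top_positive_genes <- slice_max(subset(gsea_r, value >= 0), value, n = 1000, with_ties = FALSE)
top_negative_genes <- slice_min(subset(gsea_r, value < 0), value, n = 1000, with_ties = FALSE)
merged_data <- rbind(top_positive_genes, top_negative_genes)
# Write the reduced ranked list next to the original file
write.table(merged_data, "./a3_analysis.GseaPreranked.1712263512608/edb/ranked_less.rnk", sep="\t", quote=FALSE, row.names=FALSE)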
Yet I got the same error message.
To solve this problem, I built the network directly inside Cytoscape instead of launching it from GSEA. The network was built successfully.
Interpretation and detailed view of results
- Do the enrichment results support the conclusions or mechanisms discussed in the original paper? How do these results differ from the results you got from the Assignment #2 thresholded methods?
- Can you find evidence, i.e. publications, to support some of the results that you see? How does this evidence support your result?
Please see the RMarkdown for answers to these questions.
Post analysis of the main network
I chose drug targets for the post-analysis of the main network because the cells are known to carry PODOCIN protein variants. I would therefore like to examine the differences in expression and pathway connections that result from PODOCIN mutations.
The dataset used: https://download.baderlab.org/EM_Genesets/March_01_2024/Human/symbol/DrugTargets/
Human_DrugBank_approved_symbol.gmt
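A GMT file such as this one stores one gene set per line as tab-separated fields: the set name, a description, and then the member genes. A minimal base R sketch for loading such a file into a named list could look like the following. `read_gmt` is a hypothetical helper name, and the two example lines are made-up toy data, not entries from the DrugBank file.

```r
# Hypothetical helper: parse a GMT file into a named list of gene vectors.
read_gmt <- function(path) {
  lines <- readLines(path, warn = FALSE)
  lines <- lines[nzchar(lines)]                  # drop empty lines
  fields <- strsplit(lines, "\t", fixed = TRUE)  # one character vector per gene set
  gene_sets <- lapply(fields, function(x) {
    genes <- x[-(1:2)]                           # fields 3 and onward are the member genes
    unique(genes[nzchar(genes)])
  })
  names(gene_sets) <- vapply(fields, `[`, character(1), 1)  # field 1 is the set name
  gene_sets
}

# Toy data for illustration only (not taken from the DrugBank file)
toy_path <- tempfile(fileext = ".gmt")
writeLines(c("DRUG_A\tdescription A\tGENE1\tGENE2\tGENE3",
             "DRUG_B\tdescription B\tGENE2\tGENE4"), toy_path)
drug_targets <- read_gmt(toy_path)
lengths(drug_targets)  # number of target genes per drug
```

To try this on the real data, the same function could be pointed at a local copy of Human_DrugBank_approved_symbol.gmt instead of the temporary toy file.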
Additional question: Add a post-analysis to your main network using specific transcription factors, microRNAs or drugs. Include the reason why you chose the specific miRs, TFs or drugs (i.e. publications indicating that they might be related to your model). What does this post-analysis show?
Please refer to the answer in the R Markdown notebook.
It was very interesting to see some nodes with multiple, or even more than 10, approved drug targets.
There are also drugs that target multiple pathways.
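Both observations come down to counting overlaps between drug target sets and the gene sets behind the network nodes. The sketch below illustrates that counting on made-up toy data. All gene, pathway, and drug names are hypothetical, and a plain overlap count is used as a simplifying assumption, which may differ from the test selected in the Cytoscape post-analysis.

```r
# Toy gene sets for illustration only (hypothetical names)
pathways <- list(
  PATHWAY_X = c("GENE1", "GENE2", "GENE5"),
  PATHWAY_Y = c("GENE4", "GENE6"),
  PATHWAY_Z = c("GENE7")
)
drugs <- list(
  DRUG_A = c("GENE1", "GENE2", "GENE3"),
  DRUG_B = c("GENE2", "GENE4"),
  DRUG_C = c("GENE8")
)

# overlap[p, d] = number of genes shared by pathway p and drug d
overlap <- sapply(drugs, function(d) {
  sapply(pathways, function(p) length(intersect(p, d)))
})

# How many drugs hit each pathway, and how many pathways each drug hits
drugs_per_pathway <- rowSums(overlap > 0)
pathways_per_drug <- colSums(overlap > 0)

drugs_per_pathway
pathways_per_drug
```

In this toy example PATHWAY_X is linked to two drugs, and DRUG_B is linked to two pathways, which mirrors the two patterns seen in the network.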
Links to Assignment 3:
The R Markdown file: https://github.com/bcb420-2024/Anna_Lai/blob/main/A3_AnnaLai.Rmd
The HTML file: https://github.com/bcb420-2024/Anna_Lai/blob/main/A3_AnnaLai.html
Citations
For the R Notebook, I used Zotero, a research aid application, to generate the .bib file, as mentioned in the previous journal.
Dorison A, Ghobrial I, Graham A, Peiris T et al. Kidney Organoids Generated Using an Allelic Series of NPHS2 Point Variants Reveal Distinct Intracellular Podocin Mistrafficking. J Am Soc Nephrol 2023 Jan 1;34(1):88-109. PMID: 36167728