Journal Entry: Assignment 2 - bcb420-2022/Emiliya_Stolyarova GitHub Wiki

Started: March 5, 2022. Completed: March 15, 2022.

A2 R Notebook

A2 html file

Progress

I loaded the normalized data that I obtained in Assignment 1. I tried to include the Assignment 1 Rmd file as a child document within the Assignment 2 file, but this did not appear to work. I therefore checked in the normalized data as a text file and loaded it from there.

Differential Expression

Creating a model.

In the MDS plot, the samples separate clearly into the control group and the protein-expressing group, so I could use exactTest in edgeR, which compares two groups in a single-factor design. However, sample number may still affect gene expression even though this is not apparent in the plot, so the quasi-likelihood model in edgeR, which can include sample number as an additional factor in the design, is preferable.

One of the sample groups is missing from the dataset, which makes constructing the model more difficult.

Correction and multiple hypothesis testing

The edgeR package appears to use FDR correction for multiple hypothesis testing. The R documentation for p.adjust {stats} indicates that "fdr" is an alias for the Benjamini-Hochberg ("BH") method.
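To make the correction concrete, here is a minimal sketch of the Benjamini-Hochberg adjustment. It is written in Python purely for illustration and is not the code from the A2 notebook, which relies on edgeR. The function name `benjamini_hochberg` and the example p-values are made up; the calculation follows the usual definition, where each sorted p-value is multiplied by m / rank and the result is then made monotone.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values, ordered from the largest p-value to the smallest.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    for position, idx in enumerate(order):
        rank = m - position  # rank of this p-value when sorted ascending
        candidate = p_values[idx] * m / rank
        # Walking down from the largest p-value, keep the smallest value
        # seen so far so that adjusted values never increase as rank decreases.
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values for four genes (not taken from the dataset).
example_p = [0.01, 0.04, 0.03, 0.20]
print(benjamini_hochberg(example_p))
```

A gene is then called significant when its adjusted value falls below the chosen FDR threshold.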

Heatmap

Without row normalization, the majority of values in the heatmap are shown in blue, indicating a large number of low values. Row normalization produces a much more even distribution of colours in the heatmap.
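The effect can be sketched as follows, assuming that row normalization means centring and scaling each gene (row) to z-scores. This is an illustrative Python example rather than the notebook's own code; `scale_rows` and the expression values are hypothetical, and setting constant rows to zero is a choice made for this sketch.

```python
from statistics import mean, stdev


def scale_rows(matrix):
    """Convert each row to z-scores: (value - row mean) / row standard deviation."""
    scaled = []
    for row in matrix:
        mu = mean(row)
        sd = stdev(row)  # sample standard deviation (n - 1 denominator)
        if sd == 0:
            # Assumption for this sketch: a constant row carries no pattern,
            # so it is mapped to zeros instead of dividing by zero.
            scaled.append([0.0 for _ in row])
        else:
            scaled.append([(value - mu) / sd for value in row])
    return scaled


# Made-up expression values: one lowly and one highly expressed gene.
expression = [
    [1.0, 2.0, 3.0],
    [100.0, 200.0, 300.0],
]
print(scale_rows(expression))
```

After scaling, both genes occupy the same range, so the colour scale is no longer dominated by the few highly expressed genes.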

Over-representation analysis

Annotation source

I wanted to use annotation data from the Human Phenotype Ontology (HPO). I considered the HPOSim R package for the enrichment analysis, but it has been archived on CRAN (Deng et al., 2015). I believe it is better to use a package that has not been archived for my analysis.

I was able to access HPO data through the gprofiler2 package (Kolberg et al., 2020). The top terms were hard to interpret in the context of stopping cancer progression with protein expression. It would be better to analyze terms that relate to the molecular processes behind cancer rather than to phenotypes. I have therefore decided to use Reactome instead (Gillespie et al., 2021).
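For reference, over-representation analysis is commonly based on a one-sided hypergeometric test. The sketch below shows that calculation in Python under that assumption; it is not the gprofiler2 implementation, and the function name, parameter names, and gene counts are all hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, term_size, query_size, overlap):
    """Probability of seeing at least `overlap` term genes in a random query.

    universe_size: number of annotated genes in the background
    term_size:     number of background genes annotated to the term
    query_size:    number of genes in the query list
    overlap:       number of query genes annotated to the term
    """
    max_overlap = min(term_size, query_size)
    total = comb(universe_size, query_size)
    # Sum the hypergeometric probabilities for overlap, overlap + 1, ...
    tail = sum(
        comb(term_size, k) * comb(universe_size - term_size, query_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Made-up numbers: 10 background genes, 4 in the term, 3 in the query,
# 2 of which belong to the term.
print(hypergeom_enrichment_p(10, 4, 3, 2))
```

The resulting p-values, one per term, would still need correction for multiple testing before interpretation.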

The original research article appears to use GSEA with MSigDB (Huangyang et al., 2020; Subramanian et al., 2005).

References

Deng, Y., Gao, L., Wang, B., & Guo, X. (2015). HPOSim: an R package for phenotypic similarity measure and enrichment analysis based on the human phenotype ontology. PLoS ONE, 10(2), e0115692. https://doi.org/10.1371/journal.pone.0115692

Huangyang, P., Li, F., Lee, P., Nissim, I., Weljie, A. M., Mancuso, A., Li, B., Keith, B., Yoon, S. S., & Simon, M. C. (2020). Fructose-1,6-bisphosphatase 2 inhibits sarcoma progression by restraining mitochondrial biogenesis. Cell Metabolism, 31(1), 174–188.e7. https://doi.org/10.1016/j.cmet.2019.10.012

Kolberg, L., Raudvere, U., Kuzmin, I., Vilo, J., & Peterson, H. (2020). gprofiler2 – an R package for gene list functional enrichment analysis and namespace conversion toolset g:Profiler. F1000Research, 9(ELIXIR), 709.

Subramanian, A., Tamayo, P., Mootha, V. K., Mukherjee, S., Ebert, B. L., Gillette, M. A., Paulovich, A., Pomeroy, S. L., Golub, T. R., Lander, E. S., & Mesirov, J. P. (2005). Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America, 102(43), 15545–15550. https://doi.org/10.1073/pnas.0506580102