Journal Entry: Assignment 1 - bcb420-2022/Emiliya_Stolyarova GitHub Wiki

Started: February 6, 2022. Completed: February 22, 2022.

Objective: Normalize the data in the dataset I have chosen.

Previous Journal Entry: Finding a Dataset

Progress

Starting Data Normalization

When I counted the occurrences of individual genes in my chosen dataset, each GeneID appeared only once. Since there do not seem to be any duplicates, I need to find out whether it is still necessary to filter out any genes from the dataset.
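The duplicate check described above can be sketched as follows. This is only a minimal illustration, not the code used for the assignment: the function name `find_duplicate_ids` and the example IDs are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def find_duplicate_ids(gene_ids):
    """Return a dict mapping each ID that appears more than once to its count."""
    counts = Counter(gene_ids)
    return {gene_id: n for gene_id, n in counts.items() if n > 1}


# Hypothetical IDs for illustration only; not taken from the dataset.
example_ids = ["GENE_A", "GENE_B", "GENE_C", "GENE_B"]
duplicates = find_duplicate_ids(example_ids)
print(duplicates)
```

An empty result would correspond to the situation described above, where every GeneID appears exactly once.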

Another issue I have encountered is that the number of control samples differs from the number of samples expressing the protein of interest. The paper associated with the dataset indicates that there are five controls, which were treated with the vector, and four sets of cells expressing FBP2 (Huangyang et al., 2020). The paper gives little information on why the two groups have different sample numbers. One possible explanation is that the researchers experienced complications with the protein-expressing F2 group, which is missing from the dataset, and therefore excluded it.

When I applied normalization, I noticed that the boxplot of the data showed very little change compared with the boxplot made before normalization.

Identifier mapping

HGNC is the abbreviation for the HUGO Gene Nomenclature Committee (Tweedie et al., 2021). I first tried to map the gene IDs, with their version numbers still attached, to HUGO symbols. This did not work: the mapping returned NA for every gene ID. Removing the version numbers before mapping solved the problem.
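The version-stripping step can be sketched as follows. This is a minimal illustration under assumptions: it assumes the IDs are Ensembl-style, with the version given as a trailing `.<digits>` suffix, which the entry above does not state explicitly. The function name `strip_version` and the example IDs are illustrative and are not taken from the dataset.

```python
def strip_version(gene_id):
    """Drop a trailing '.<digits>' version suffix, if one is present."""
    base, sep, suffix = gene_id.rpartition(".")
    # Only strip when there is a dot followed purely by digits.
    if sep and base and suffix.isdigit():
        return base
    return gene_id


# Illustrative Ensembl-style IDs (assumed format); not taken from the dataset.
versioned_ids = ["ENSG00000000003.14", "ENSG00000000005.5", "ENSG00000000419"]
unversioned_ids = [strip_version(gene_id) for gene_id in versioned_ids]
print(unversioned_ids)
```

IDs without a version suffix pass through unchanged, so the function can be applied to the whole ID column before the mapping step.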

References

Huangyang, P., Li, F., Lee, P., Nissim, I., Weljie, A. M., Mancuso, A., Li, B., Keith, B., Yoon, S. S., & Simon, M. C. (2020). Fructose-1,6-Bisphosphatase 2 Inhibits Sarcoma Progression by Restraining Mitochondrial Biogenesis. Cell Metabolism, 31(1), 174–188.e7. https://doi.org/10.1016/j.cmet.2019.10.012

Tweedie, S., Braschi, B., Gray, K., Jones, T., Seal, R. L., Yates, B., & Bruford, E. A. (2021). Genenames.org: the HGNC and VGNC resources in 2021. Nucleic Acids Research, 49(D1), D939–D946. https://doi.org/10.1093/nar/gkaa980