Journal Entry: Finding a Dataset - bcb420-2022/Emiliya_Stolyarova GitHub Wiki

Started: January 28, 2022. Completed: February 1, 2022.

Objective: Find a dataset from GEO (Gene Expression Omnibus) to use throughout this course.

Progress

I have installed the GEOmetadb package (Zhu et al., 2008) and downloaded the GEOmetadb.sqlite file. I am searching for datasets that interest me and that have supplementary "txt" files.

I searched for datasets that mentioned "sarcoma" in their title.
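
A search along these lines can be sketched as follows. This is a minimal illustration rather than the exact query that was run: the helper name find_gse_by_title is hypothetical, and the sketch assumes the gse table exposes gse, title, submission_date and supplementary_file columns. To keep it runnable without the large GEOmetadb.sqlite download, it queries a tiny in-memory table with made-up rows.

```r
library(DBI)
library(RSQLite)

# Hypothetical helper: list series (GSE) whose title mentions a keyword and
# whose supplementary file entry mentions "txt".
find_gse_by_title <- function(con, keyword) {
  sql <- paste(
    "SELECT gse, title, submission_date, supplementary_file",
    "FROM gse",
    "WHERE title LIKE ?",
    "AND supplementary_file LIKE '%txt%'",
    "ORDER BY submission_date DESC"
  )
  # The keyword is passed as a bound parameter instead of being pasted into the SQL.
  dbGetQuery(con, sql, params = list(paste0("%", keyword, "%")))
}

# In-memory stand-in for the gse table. These rows are invented for
# illustration and are not real GEO records.
con <- dbConnect(SQLite(), ":memory:")
dbWriteTable(con, "gse", data.frame(
  gse = c("GSE_A", "GSE_B", "GSE_C"),
  title = c("Sarcoma RNA-seq example", "Liver example", "Sarcoma array example"),
  submission_date = c("2019-09-20", "2018-01-05", "2017-03-11"),
  supplementary_file = c(
    "ftp://example/GSE_A_counts.txt.gz",
    "ftp://example/GSE_B_counts.txt.gz",
    "ftp://example/GSE_C_RAW.tar"
  ),
  stringsAsFactors = FALSE
))

hits <- find_gse_by_title(con, "sarcoma")
print(hits)
dbDisconnect(con)

# With the real database, the connection would instead be opened with:
# con <- dbConnect(SQLite(), "GEOmetadb.sqlite")
```

SQLite's LIKE is case-insensitive for ASCII text by default, so the keyword "sarcoma" also matches titles that spell it "Sarcoma".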

Dataset chosen: GSE137755

I posted my chosen dataset on the Assignment 1 sign-up page, and it was approved on February 1.

Checking the values in the dataset

In RStudio, head() shows the first six rows of the dataset, with 4 control columns, 5 vector columns, and a column of gene IDs. In those rows the values appear to range from 0 to 2. However, calling max() on a column returns a value much higher than 2, so the full dataset spans a wider range than head() suggests.
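
The difference between what head() shows and the true range can be reproduced with a small sketch. The data frame below is a made-up stand-in, not values from GSE137755, and the column names are assumptions chosen for illustration.

```r
# Hypothetical stand-in for the expression table: a gene ID column plus
# numeric sample columns. The values are invented for illustration.
expr <- data.frame(
  gene_id   = paste0("gene", 1:10),
  control_1 = c(0, 1.2, 0.4, 2, 0, 1.1, 350.5, 12, 0, 4821.7),
  vector_1  = c(0.3, 0, 1.8, 0.9, 1, 0.2, 410.2, 9.5, 0, 5120.3)
)

# head() only prints the first six rows, so large values further down
# the table are not visible here.
print(head(expr))

# Select the numeric sample columns and compute the maximum and the
# full range of each over all rows.
sample_cols <- vapply(expr, is.numeric, logical(1))
col_max   <- vapply(expr[sample_cols], max, numeric(1), na.rm = TRUE)
col_range <- vapply(expr[sample_cols], range, numeric(2), na.rm = TRUE)

print(col_max)
print(col_range)
```

Calling summary() on the sample columns gives a similar overview in a single step, including quartiles.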

Conclusions

I am now able to start data normalization with the dataset I have chosen.

References

Huangyang, P., Li, F., Lee, P., Nissim, I., Weljie, A. M., Mancuso, A., Li, B., Keith, B., Yoon, S. S., & Simon, M. C. (2020). Fructose-1,6-Bisphosphatase 2 Inhibits Sarcoma Progression by Restraining Mitochondrial Biogenesis. Cell Metabolism, 31(1), 174–188.e7. https://doi.org/10.1016/j.cmet.2019.10.012

Zhu, Y., Davis, S., Stephens, R., Meltzer, P. S., & Chen, Y. (2008). GEOmetadb: powerful alternative search engine for the Gene Expression Omnibus. Bioinformatics (Oxford, England), 24(23), 2798–2800. https://doi.org/10.1093/bioinformatics/btn520