Assignment 1: Choosing data set - bcb420-2024/Krutika_Joshi GitHub Wiki
About the publication/data set:
GSE212591: Preclinical efficacy of azacitidine and venetoclax for infant KMT2A-rearranged ALL reveals a new therapeutic strategy
How I chose my data set:
1.) I watched the videos Professor Isserlin posted in week 3 to learn how to filter data sets in GEO (Gene Expression Omnibus).
2.) I followed the steps shown in the videos and filtered the data sets so that each one:
- contained data for Homo sapiens
- was published within the last four years (this one was published one year ago)
- had supplementary files, including a txt file, in the publication
- was bulk RNA-seq data (Expression profiling by high throughput sequencing)
- had more than 5 samples (this one has 54)
- had samples collected under two different conditions (here, azacitidine and decitabine)
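I applied these filters through the GEO web interface shown in the videos. Purely as an illustration, the same criteria can be written as a single Python predicate. This is a minimal sketch under stated assumptions: the `SeriesRecord` class, its field names, the `meets_criteria` function, and the example records are all hypothetical and do not follow GEO's real metadata format, so real series metadata would first have to be parsed into this shape.

```python
from dataclasses import dataclass
from datetime import date
from typing import Tuple


# Hypothetical, simplified stand-in for a GEO series' metadata.
# These field names are illustrative only; they are NOT GEO's schema.
@dataclass
class SeriesRecord:
    accession: str
    organism: str
    study_type: str
    publication_date: date
    has_supplementary_file: bool
    sample_count: int
    conditions: Tuple[str, ...]


# Study type label for bulk RNA-seq, as written in the criteria above.
BULK_RNA_SEQ = "Expression profiling by high throughput sequencing"


def meets_criteria(record: SeriesRecord, today: date,
                   max_age_years: int = 4, more_than_samples: int = 5) -> bool:
    """Return True if the record passes every filter from step 2."""
    # "Not older than four years", approximated as 365.25 days per year.
    age_days = (today - record.publication_date).days
    return (
        record.organism == "Homo sapiens"
        and age_days <= max_age_years * 365.25
        and record.has_supplementary_file
        and record.study_type == BULK_RNA_SEQ
        and record.sample_count > more_than_samples  # "more than 5 samples"
        and len(set(record.conditions)) >= 2  # at least two distinct conditions
    )


# Example with made-up records (not real GEO entries).
candidates = [
    SeriesRecord("GSE_EXAMPLE_A", "Homo sapiens", BULK_RNA_SEQ,
                 date(2023, 1, 1), True, 54, ("drug_A", "drug_B")),
    SeriesRecord("GSE_EXAMPLE_B", "Mus musculus", BULK_RNA_SEQ,
                 date(2023, 1, 1), True, 54, ("drug_A", "drug_B")),
]
kept = [r.accession for r in candidates if meets_criteria(r, today=date(2024, 1, 15))]
print(kept)  # ['GSE_EXAMPLE_A']
```

Keeping every filter in one function makes it easy to see why a candidate was rejected, and the age and sample thresholds are parameters so they can be adjusted if the assignment requirements change.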
3.) I then went through the search results looking for a data set that was linked to a publication.
4.) For each published paper I found, I read the abstract to check that it sounded interesting and fit the scope/requirements of this course.
5.) Once I found a paper that met all the criteria posted for Assignment 1 and matched my interests, I posted its link to the student wiki page.
6.) I asked for Professor Isserlin's approval.
Why I chose this data set:
I have been interested in leukemia research for a couple of years, but I never had the chance to delve deeper into the topic. This data set gives me the opportunity to read and learn more about cancer biology. It will also let me apply my technical skill set in practice rather than only in theory, which is experience I can carry into graduate school.