Assignment 1: Choosing data set - bcb420-2024/Krutika_Joshi GitHub Wiki

About the publication/data-set:

GSE212591 Preclinical efficacy of azacitidine and venetoclax for infant KMT2A-rearranged ALL reveals a new therapeutic strategy

How did I choose my data set:

1.) I watched the videos Professor Isserlin posted in week 3 to learn how to filter search results in GEO (Gene Expression Omnibus).

2.) I followed the steps shown in the videos and filtered the results so that the data set:

  • Contained data for Homo sapiens
  • Was published 1 year ago (requirement: not older than four years)
  • Had a supplementary file and a txt file in the publication
  • Was a bulk RNA-seq study (study type: Expression profiling by high throughput sequencing)
  • Had 54 samples (requirement: more than 5 samples)
  • Was collected under two different conditions: azacitidine and decitabine

3.) Then I went through the resulting links to find a data set that is linked to a publication.

4.) Once I found a published paper, I read its abstract to make sure it sounded interesting and fit the scope/requirements of this course.

5.) Once I found a paper that met all the criteria listed under Assignment 1 and matched my interests, I posted the link on the student wiki page.

6.) I asked for Professor Isserlin's approval.
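
The filters from step 2 can also be written down as a small check. This is only a rough Python sketch of the selection criteria, not the actual GEO search I ran: the `GeoSeries` class, its field names, and the `meets_criteria` function are made up for illustration, and the dates in the example record are placeholders rather than the real GEO dates.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class GeoSeries:
    """Hypothetical summary of one GEO series (field names are assumptions)."""
    accession: str
    organism: str
    study_type: str
    publication_date: date
    n_samples: int
    has_supplementary_txt: bool
    has_linked_publication: bool


def years_before(day: date, years: int) -> date:
    """Return the same calendar day `years` earlier (Feb 29 falls back to Feb 28)."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        return day.replace(year=day.year - years, day=28)


def meets_criteria(series: GeoSeries, today: date,
                   max_age_years: int = 4, min_samples: int = 6) -> bool:
    """Check the criteria from step 2 plus the linked publication from step 3."""
    return (
        series.organism == "Homo sapiens"
        and series.study_type == "Expression profiling by high throughput sequencing"
        and series.publication_date >= years_before(today, max_age_years)
        and series.n_samples >= min_samples  # i.e. more than 5 samples
        and series.has_supplementary_txt
        and series.has_linked_publication
    )


# Placeholder dates: chosen only so that the record is about one year old.
today = date(2024, 1, 15)
candidate = GeoSeries(
    accession="GSE212591",
    organism="Homo sapiens",
    study_type="Expression profiling by high throughput sequencing",
    publication_date=date(2023, 1, 15),
    n_samples=54,
    has_supplementary_txt=True,
    has_linked_publication=True,
)
print(candidate.accession, meets_criteria(candidate, today))
```

Writing the criteria this way makes each requirement explicit, so a data set that fails (for example, one that is too old or has too few samples) can be ruled out quickly.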

Why did I choose this data set:

I have been interested in leukemia research for a couple of years now, but I never had the chance to delve deeper into the topic. This data set will give me the opportunity to read and learn more about cancer biology. In addition, it will let me apply my technical skills in practice rather than only in theory, which is experience I can then use in grad school.