Entry 7: Process of Finding a Dataset - bcb420-2025/Izumi_Ando GitHub Wiki

Criteria

  • Cancer-related dataset from Japan, or from labs at universities where I might want to work in the future
  • 6+ samples, 3 per comparable condition, all from Homo sapiens
  • bulk RNA-seq
  • published in or after 2022
  • ideally associated with a publication in a notable journal
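
The hard criteria above (species, assay type, sample counts, publication year) can be checked mechanically before reading a candidate in detail. Below is a minimal sketch of such a check. The `Candidate` record, its field names, and the `unmet_criteria` helper are hypothetical and not part of any GEO tooling; it also assumes "3 per comparable condition" means at least 3 samples in each of at least two conditions. The softer preferences (country or lab, journal reputation) are left to judgement.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record of the facts noted down for one GEO series."""
    accession: str
    organism: str
    assay: str                              # e.g. "bulk RNA-seq" or "scRNA-seq"
    samples_per_condition: dict[str, int]   # condition name -> sample count
    year: int                               # publication year


def unmet_criteria(c: Candidate) -> list[str]:
    """Return the hard criteria that the candidate fails (empty list = passes)."""
    problems = []
    counts = list(c.samples_per_condition.values())
    if c.organism.lower() != "homo sapiens":
        problems.append("not Homo sapiens")
    if c.assay != "bulk RNA-seq":
        problems.append("not bulk RNA-seq")
    if sum(counts) < 6:
        problems.append("fewer than 6 samples")
    # Assumption: at least two conditions, each with 3 or more samples.
    if len(counts) < 2 or any(n < 3 for n in counts):
        problems.append("fewer than 3 samples in a comparable condition")
    if c.year < 2022:
        problems.append("published before 2022")
    return problems


# Illustrative records only; accessions and values are made up.
passing = Candidate("EXAMPLE_A", "Homo sapiens", "bulk RNA-seq",
                    {"control": 3, "treated": 3}, 2023)
failing = Candidate("EXAMPLE_B", "Homo sapiens", "scRNA-seq",
                    {"control": 1, "treated": 1}, 2023)

print(unmet_criteria(passing))
print(unmet_criteria(failing))
```

Returning the list of failed criteria, rather than a single yes/no, keeps the reason for rejecting a candidate visible, which matches how the issues are recorded per candidate in these notes.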

Candidate 1: GSE252906

  • Japan
  • publication in Gynecologic Oncology, which has a modest but acceptable impact factor (IF)
  • Concern: the main analysis done on this dataset is differential expression analysis. Is this okay? (Update: yes, according to Prof. Isserlin.)

Candidate 2: GSE275276

  • Stanford
  • publication in Integrative Biology
  • uses enteroids (type of organoid) to test the anti-tumor effect of FLASH radiation therapy
  • Issue: the data file is broken

Candidate 3: GSE214968

  • ETH Zurich
  • publication in a Nature Portfolio journal
  • Issue: this dataset is scRNA-seq; the publication also includes bulk RNA-seq data, but with only 2 samples

Candidate 4: GSE226798

Candidate 5: GSE201427

  • ETH Zurich
  • the data file is very well organized, and the paper is readable
  • dataset of Panc1 cells treated with either control siRNA or siRNA targeting SF3B1, 3 samples of each
  • publication in a Cell Press journal