2 Expression Data - bcb420-2023/Helena_Jovic GitHub Wiki

A1: Select an Expression Data Set

Objective

  • Select an expression dataset

Time Management

Date Started: 2022-02-09
Date Completed: 2022-02-12
Estimated Time: 3 hours
Actual Time: 10 hours

Procedure

  1. Select an Expression Data Set. Choose a dataset of native, healthy human cells or tissue.
  2. Choose an interesting experiment. The cells' expression response to the experimental conditions must reflect some biological property, ideally a physiological response of some sort. It is your task to reflect on this question and choose accordingly.
  3. Make sure the coverage is as complete as possible. Experiments that measure expression for only a small subset of genes are not suitable.
  4. Choose high-quality experiments. The experiments should be performed with biological replicates (the more the better) and on mature experimental platforms, according to best-practice procedures; therefore, choose recent experiments (not older than ten years). Contact me for special permission if you want to deviate from this requirement.
  5. Claim the dataset on the dataset signup page of the Student Wiki.

Course Notes

  • GEO contains expression data collected from a variety of technologies.
  • Choose a gene expression platform such as microarrays.
  • We want expression datasets with good coverage; not much older than ten years (quality!); with sufficient numbers of replicates; collected under interesting conditions; mapped to unique human gene identifiers.

Workflow

Choose dataset

  • Visited the GEO website (https://www.ncbi.nlm.nih.gov/geo/), navigated to the Browse Content tab, and clicked on "Series".
  • Narrowed the search in the Builder with the following query: (((((count[Description]) AND txt[Description]) AND homo sapiens[Organism]) AND ("2013/01/01"[Publication Date] : "3000"[Publication Date]))) AND HIV[Title].
  • This yielded 5 results. I chose the first one, Accession Number GSE184320, "Loss of skin and mucosal CXCR3+ resident memory T cells causes irreversible tissue-confined immunodeficiency in HIV", because it seemed the most interesting to me.
  • The dataset was submitted on Sep 16, 2021, uses the Illumina HiSeq 4000 (Homo sapiens) platform, and contains 28 samples with three replicates for each group.
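
The query from the Builder can also be assembled programmatically, which makes it easier to vary the keyword or the date cutoff. The following is a minimal sketch, assuming the NCBI E-utilities `esearch` endpoint with the GEO DataSets database (`db=gds`). The helper names `build_geo_term` and `build_esearch_url` are hypothetical. The sketch only builds the URL and does not send a request, and it does not reproduce the "Series" restriction applied on the website. Because every clause is joined with AND, the nested parentheses of the original query are flattened here.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (assumed here as the programmatic
# counterpart of the GEO search page used in the workflow above).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_geo_term(title_keyword="HIV", organism="homo sapiens",
                   start_date="2013/01/01"):
    """Assemble the search term from the same clauses as the Builder query."""
    clauses = [
        "count[Description]",   # description mentions "count"
        "txt[Description]",     # description mentions "txt"
        f"{organism}[Organism]",
        f'("{start_date}"[Publication Date] : "3000"[Publication Date])',
        f"{title_keyword}[Title]",
    ]
    return " AND ".join(clauses)


def build_esearch_url(term, retmax=20):
    """Return an esearch URL for the GEO DataSets database (db=gds)."""
    params = {"db": "gds", "term": term, "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


term = build_geo_term()
url = build_esearch_url(term)
print(term)
print(url)
```

Opening the printed URL in a browser (or fetching it with any HTTP client) should return an XML list of matching record IDs, which can then be compared with the results shown on the website.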

Read the paper associated with the selected dataset:

  • Citation: Saluzzo S, Pandey RV, Gail LM, Dingelmaier-Hovorka R et al. Delayed antiretroviral therapy in HIV-infected individuals leads to irreversible depletion of skin- and mucosa-resident memory T cells. Immunity 2021 Dec 14;54(12):2842-2858.e5. PMID: 34813775

Issues

I had issues with the 'GEOmetadb.sqlite' file when following the procedure described in Lecture 3: Finding Expression Data, so I searched for my dataset manually. The first dataset I selected turned out to be unsuitable for the assignment because it had no supplemental file containing count data.
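
For reference, the check for a count-data supplemental file could be done against a working 'GEOmetadb.sqlite' roughly as follows. This is a minimal sketch, assuming the database has a `gse` table with `gse`, `title`, and `supplementary_file` columns. The function name `find_series_with_counts` is hypothetical, and the demo rows are made up so that the example runs without the real file.

```python
import sqlite3


def find_series_with_counts(conn, title_keyword):
    """Return series whose title contains the keyword and whose
    supplementary file names mention 'count'."""
    query = """
        SELECT gse, title, supplementary_file
        FROM gse
        WHERE title LIKE ?
          AND supplementary_file LIKE '%count%'
        ORDER BY gse
    """
    return conn.execute(query, (f"%{title_keyword}%",)).fetchall()


# Demo on an in-memory database with made-up rows. For the real file,
# the connection would instead be sqlite3.connect("GEOmetadb.sqlite").
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gse (gse TEXT, title TEXT, supplementary_file TEXT)"
)
conn.executemany(
    "INSERT INTO gse VALUES (?, ?, ?)",
    [
        ("GSE_DEMO_1", "HIV example series A",
         "ftp://example.org/GSE_DEMO_1_raw_counts.txt.gz"),
        ("GSE_DEMO_2", "HIV example series B",
         "ftp://example.org/GSE_DEMO_2_RAW.tar"),
        ("GSE_DEMO_3", "Unrelated series",
         "ftp://example.org/GSE_DEMO_3_counts.txt.gz"),
    ],
)

rows = find_series_with_counts(conn, "HIV")
for gse_id, title, supp in rows:
    print(gse_id, title, supp)
```

A file name that mentions "count" is only a hint, so the supplemental file still needs to be opened to confirm that it holds a usable count matrix.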