Data overview - borenstein-lab/microbiome-metabolome-curated-data GitHub Wiki

Below we describe how the data is organized, which datasets are included in the resource and how to access the data.

Data organization

Each dataset in the collection is organized into four tables, or five for shotgun metagenomics datasets:

  • metadata.tsv: sample metadata, including the subject's age, gender, study group (case or control), etc.;
  • mtb.tsv: metabolite levels per sample;
  • mtb.map.tsv: mappings of original metabolite identifiers to KEGG and HMDB IDs;
  • genera.tsv: genus-level relative abundances per sample;
  • an additional species table with species-level abundances (shotgun metagenomics datasets only).

Tables are available as tab-delimited (.tsv) text files, or as an RData object (.RData) for quick loading into R.
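As a minimal sketch of reading one of the tab-delimited tables outside R, the helper below uses only the Python standard library. It assumes that the first row of each file is a header row; the function name `read_tsv` is illustrative and not part of the resource.

```python
import csv
from pathlib import Path


def read_tsv(path):
    """Read a tab-delimited file into a list of dicts keyed by the header row.

    Assumption: the first line of the file holds the column names.
    All values are returned as strings; convert them as needed.
    """
    with Path(path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))
```

For example, `read_tsv("metadata.tsv")` would return one dictionary per row of the metadata table, given a local copy of that file.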

Datasets included

Data was obtained only from studies that met the following criteria:

  • Human cohort
  • At least 40 stool samples collected
  • Metagenomic profiles from stool samples available
  • Metabolomic profiles from same stool samples available
  • Basic metadata per sample/subject available
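The criteria above can be read as a simple conjunction of conditions. The sketch below encodes them for illustration only; the `Study` record and its field names are hypothetical and do not correspond to any file in the resource.

```python
from dataclasses import dataclass

MIN_STOOL_SAMPLES = 40  # threshold stated in the inclusion criteria


@dataclass
class Study:
    # Hypothetical record describing a candidate study.
    human_cohort: bool
    n_stool_samples: int
    has_stool_metagenomics: bool
    has_metabolomics_on_same_samples: bool
    has_basic_metadata: bool


def meets_inclusion_criteria(study: Study) -> bool:
    """Return True only if every listed criterion is satisfied."""
    return (
        study.human_cohort
        and study.n_stool_samples >= MIN_STOOL_SAMPLES
        and study.has_stool_metagenomics
        and study.has_metabolomics_on_same_samples
        and study.has_basic_metadata
    )
```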

Data from the original studies were obtained from public data repositories, supplementary information accompanying relevant publications, or directly from the authors via email.

The collection currently includes the following datasets:

| Dataset name | Ref. | Cohort description | No. samples* | No. subjects* | Metagenomics approach | Metabolomics approach |
|---|---|---|---|---|---|---|
| YACHIDA-CRC-2019** | [1] | Patients with colonoscopy findings from normal to stage 4 CRC | 347 [127] | 347 [127] | Shotgun | Targeted, CE-TOFMS |
| FRANZOSA-IBD-2019 | [2] | IBD patients and controls (PRISM cohort + a validation cohort) | 220 [56] | 220 [56] | Shotgun | Untargeted, four complementary LC-MS methods |
| SINHA-CRC-2016 | [3] | CRC patients and controls | 131 [89] | 131 [89] | 16S V3-V4 | Untargeted, HPLC-GC/MS-MS |
| HE-INFANTS-MFGM-2019 | [4] | Infants on different diets during their 1st year of life | 277 | 80 | 16S V4 | Targeted, 1H-NMR |
| iHMP-IBDMDB-2019 | [5] | HMP2 cohort: longitudinal samples from IBD patients and controls | 382 [104] | 105 [26] | Shotgun | Untargeted, four complementary LC-MS methods |
| JACOBS-IBD-RELATIVES-2016 | [6] | IBD patients and their first-degree (healthy) relatives | 90 [54] | 90 [54] | 16S V4 | Untargeted, UPLC/ToFMS |
| POYET-BIO-ML-2019 | [7] | Longitudinal samples from healthy BIO-ML (stool bank) donors | 164 | 83 | WGSS + 16S V4 | Untargeted, four complementary LC-MS methods |
| ERAWIJANTARI-GASTRIC-CANCER-2020** | [8] | Patients with a history of gastrectomy for GC | 96 [54] | 96 [54] | Shotgun | Targeted, CE-TOFMS |
| KIM-ADENOMAS-2020 | [9] | Patients with advanced colorectal adenomas, CRC, and controls | 240 [102] | 240 [102] | 16S V3-V5 | Untargeted, UPLC-MS/MS |
| MARS-IBS-2020 | [10] | Longitudinal samples from patients with IBS and controls | 444 [139] | 75 [24] | Shotgun | Targeted, 1H-NMR + LC-MS/MS |
| KANG-AUTISM-2018 | [11] | Children with autism and neurotypical children | 44 [21] | 44 [21] | 16S V2-V3 | Targeted, 1H-NMR |
| KOSTIC-INFANTS-DIABETES-2015 | [12] | Longitudinal samples from children at risk for T1D (DIABIMMUNE cohort) | 103 [37] | 19 [8] | 16S V4 | Untargeted, four complementary LC-MS methods |
| WANDRO-PRETERMS-2018 | [13] | Preterm infants during their first 6 months of life. Some developed LOS/NEC | 75 [37] | 32 [21] | 16S V3-V4 | Untargeted, GC-MS |
| WANG-ESRD-2020 | [14] | Adults with ESRD and controls | 287 [67] | 287 [67] | Shotgun | Untargeted, GC-MS |

* The number of samples/subjects from the "control" study group is noted in square brackets (relevant for case-control study designs). Longitudinal studies have multiple samples per subject, so their number of samples is higher than their number of subjects.

** Some control subjects appear in both the YACHIDA_CRC_2019 and ERAWIJANTARI_GASTRIC_CANCER_2020 datasets. The metadata tables of both datasets indicate which subjects overlap.

CRC: Colorectal cancer; IBD: Inflammatory bowel disease; GC: Gastric cancer; IBS: Irritable bowel syndrome; ASD: Autism spectrum disorder; T1D: Type 1 diabetes; LOS: Late-onset sepsis; MFGM: Milk fat globule membrane; NEC: Necrotizing enterocolitis; BIO-ML: Broad Institute-OpenBiome Microbiome Library; ESRD: End-stage renal disease.

Links to raw data, additional details and important notes per dataset can be found in the supplementary tables.
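The sample and subject counts in the dataset table use the form "total [controls]", with the bracketed part omitted for studies without a case-control design. As a small illustrative sketch (the helper name `parse_count` is hypothetical), such a cell can be split into its two numbers like this:

```python
import re

# Matches "347 [127]" as well as a bare "277".
_COUNT_PATTERN = re.compile(r"^\s*(\d+)\s*(?:\[(\d+)\])?\s*$")


def parse_count(cell):
    """Split a 'total [controls]' cell into (total, controls).

    controls is None when the cell has no bracketed control count.
    """
    match = _COUNT_PATTERN.match(cell)
    if match is None:
        raise ValueError(f"unrecognized count cell: {cell!r}")
    total = int(match.group(1))
    controls = int(match.group(2)) if match.group(2) is not None else None
    return total, controls
```

With this helper, `parse_count("347 [127]")` gives `(347, 127)`, which implies 220 non-control samples, and `parse_count("277")` gives `(277, None)`.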

Data access

All data files are in the accompanying GitHub repository, under the /data/processed_data folder. Users can either clone the repository or download only the data files they need.
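After cloning, a quick check of which tables are present for a dataset might look like the sketch below. It assumes that each dataset has its own subfolder under data/processed_data, named after the dataset; that layout, the `repo_root` argument, and the function name `find_tables` are assumptions made for illustration, so adjust them to the actual repository structure.

```python
from pathlib import Path

# Core table file names as listed under "Data organization".
CORE_TABLES = ("metadata.tsv", "mtb.tsv", "mtb.map.tsv", "genera.tsv")


def find_tables(repo_root, dataset):
    """Map each core table name to its path, for the tables that exist.

    Assumption: files live in <repo_root>/data/processed_data/<dataset>/.
    """
    dataset_dir = Path(repo_root) / "data" / "processed_data" / dataset
    return {
        name: dataset_dir / name
        for name in CORE_TABLES
        if (dataset_dir / name).is_file()
    }
```

An empty result suggests that either the dataset name or the assumed folder layout does not match the local clone.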

References

  1. Yachida, Shinichi, et al. "Metagenomic and metabolomic analyses reveal distinct stage-specific phenotypes of the gut microbiota in colorectal cancer." Nature medicine 25.6 (2019): 968-976.
  2. Franzosa, Eric A., et al. "Gut microbiome structure and metabolic activity in inflammatory bowel disease." Nature microbiology 4.2 (2019): 293-305.
  3. Sinha, Rashmi, et al. "Fecal microbiota, fecal metabolome, and colorectal cancer interrelations." PloS one 11.3 (2016): e0152126.
  4. He, Xuan, et al. "Fecal microbiome and metabolome of infants fed bovine MFGM supplemented formula or standard formula with breast-fed infants as reference: a randomized controlled trial." Scientific reports 9.1 (2019): 1-14.
  5. Lloyd-Price, Jason, et al. "Multi-omics of the gut microbial ecosystem in inflammatory bowel diseases." Nature 569.7758 (2019): 655-662.
  6. Jacobs, Jonathan P., et al. "A disease-associated microbial and metabolomics state in relatives of pediatric inflammatory bowel disease patients." Cellular and molecular gastroenterology and hepatology 2.6 (2016): 750-766.
  7. Poyet, M., et al. "A library of human gut bacterial isolates paired with longitudinal multiomics data enables mechanistic microbiome research." Nature medicine 25.9 (2019): 1442-1452.
  8. Erawijantari, Pande Putu, et al. "Influence of gastrectomy for gastric cancer treatment on faecal microbiome and metabolome profiles." Gut 69.8 (2020): 1404-1415.
  9. Kim, Minsuk, et al. "Fecal metabolomic signatures in colorectal adenoma patients are associated with gut microbiota and early events of colorectal cancer pathogenesis." MBio 11.1 (2020): e03186-19.
  10. Mars, Ruben AT, et al. "Longitudinal multi-omics reveals subset-specific mechanisms underlying irritable bowel syndrome." Cell 182.6 (2020): 1460-1473.
  11. Kang, Dae-Wook, et al. "Differences in fecal microbial metabolites and microbiota of children with autism spectrum disorders." Anaerobe 49 (2018): 121-131.
  12. Kostic, Aleksandar D., et al. "The dynamics of the human infant gut microbiome in development and in progression toward type 1 diabetes." Cell host & microbe 17.2 (2015): 260-273.
  13. Wandro, Stephen, et al. "The microbiome and metabolome of preterm infant stool are personalized and not driven by health outcomes, including necrotizing enterocolitis and late-onset sepsis." Msphere 3.3 (2018): e00104-18.
  14. Wang, Xifan, et al. "Aberrant gut microbiota alters host metabolome and impacts renal failure in humans and rodents." Gut 69.12 (2020): 2131-2142.