00.Datasets01 (A C) - sporedata/researchdesigneR GitHub Wiki

Select datasets

This section focuses on a few select healthcare datasets that hold special value in patient-centered research:

  • 200ks_Med_ResPaper_Abstracts (USA) #public - Consists of 200,000 abstracts for NLP and Sequential Sentence Classification Problems.
  • 2010 i2b2/VA (USA) #public - The 2010 i2b2/VA dataset.
  • 2011 i2b2/VA (USA) #public - The 2011 i2b2/VA dataset.
  • 2012 i2b2 (USA) #public - The 2012 i2b2 (Informatics for Integrating Biology and the Bedside) dataset.
  • 4DN-DP (USA) #public - The 4D Nucleome Data Portal.
  • AARP (USA) - The American Association of Retired Persons.
  • ACAG (USA) #public - The Atmospheric Composition Analysis Group.
  • AccessClinicalData (USA) #public - Enables access to and sharing of data sets and reports from NIAID COVID-19 and other sponsored clinical trials.
  • AC-MRI (USA) - Consists of 2,888 clinical MRIs of patients admitted with acute or early subacute stroke, including diverse protocols and MRI modalities with typical clinical resolution.
  • ACRD (USA) #private - Archived Clinical Research Datasets.
  • ACS (USA) #public - American Community Survey.
  • ADDEP (USA) #public|#private - Archive of Data on Disability to Enable Policy and Research.
  • ADDI (USA) #public - Alzheimer’s Disease Data Initiative.
  • ADI (USA) #public - Area Deprivation Index.
  • ADKP (USA) #public - AD Knowledge Portal.
  • AHA (USA) #private - American Hospital Association.
  • AHRF (USA) #public - Area Health Resources Files.
  • AHRQ (USA) #public|#private - Agency for Healthcare Research and Quality.
  • Air-Quality-and-Meteorological-Information-of-Chile (Chile) - Compiles air quality data from the National Air Quality System (S.I.N.C.A.).
  • ALFA (USA) #public - Allele Frequency Aggregator.
  • AllOfUs (USA) #public|#private - The All of Us Research Program, a large-scale biomedical data resource.
  • AMDS (The Netherlands) #public - Amsterdam's Medical Data Science.
  • AMP-PD (USA) #public|#private - Accelerating Medicines Partnership – Parkinson's Disease.
  • ANA (Brazil) #public - Brazil National Water Agency.
  • APCDs (USA) #public|#private - All-Payer Claims Databases.
  • AphasiaBank (USA) #private - A collaborative repository containing multimedia interactions aimed at researching communication in individuals with aphasia.
  • ARB (USA) #public - AgingResearchBiobank.
  • ARS (Italy) #public - Agenzia Regionale di Sanità della Toscana.
  • ARSA (USA) #public - Atlas of Rural and Small-Town America.
  • ASCQ-Me (USA) #public - Adult Sickle Cell Quality of Life Measurement.
  • AYA-HOPE (USA) #public - Adolescent & Young Adult Health Outcomes & Patient Experience Study.
  • Base de Datos de Facil Acceso del Censo 2017 de Chile (Chile) - The 2017 Chilean Census Easy Access Database provides convenient access to more than 17 million records from the 2017 Census database.
  • BCAC (UK) #public - Breast Cancer Association Consortium.
  • BCBSNC (USA) #private - Blue Cross Blue Shield of North Carolina (Blue Cross NC).
  • BDC (USA) #public|#private - BioData Catalyst.
  • Bengali_Medical_Dataset (Bangladesh, India) #public - The Bengali Medical Dataset.
  • BIDMC (USA) - Beth Israel Deaconess Medical Center.
  • BIFAP (Spain) #public - Base de Datos para la Investigación Farmacoepidemiológica en Atención Primaria.
  • BIL (USA) #public - Brain Image Library.
  • BindingDB (USA) #public - BindingDB is an open, centralized, web-based repository primarily focused on cataloging measured binding affinities.
  • BioASQ (USA) #public - Data from the BioASQ challenge on large-scale biomedical semantic indexing and question answering (QA).
  • BioBERT_QA_Model (USA) #public - BioBERT-based extractive question-and-answering model, finetuned on SQuAD 2.0.
  • BioLINCC (USA) #public|#private - Biologic Specimen and Data Repository Information Coordinating Center.
  • BioPortal (USA) #private - A comprehensive integrated repository of biomedical ontologies and controlled terminologies.
  • BioSystics-AP (USA) #private - BioSystics Analytics Platform.
  • BioVU (USA) - Vanderbilt’s de-identified DNA data bank.
  • BKAI-IGH_NeoPolyp-Small (Vietnam) #public - BKAI-IGH NeoPolyp-Small is part of a larger NeoPolyp dataset aimed at medical imaging research, specifically for polyp segmentation and detection during endoscopy.
  • BOD (Germany) - Bavaria Oncological Dataset.
  • BossDB (USA) #public|#private - Brain Observatory Storage Service & Database.
  • BRFSS (USA) #public - Behavioral Risk Factor Surveillance System.
  • Broadband Deployment Data (USA) #public - Used to develop broadband networks or infrastructure through which broadband services can be delivered.
  • BSO (Austria) #public - Bundesanstalt Statistik Österreich / Statistik Austria.
  • BV-BRC (USA) #public - Bacterial and Viral Bioinformatics Resource Center.
  • CAHPS(R) Database (USA) #public|#private - Consumer Assessment of Healthcare Providers and Systems Database.
  • caNanoLab (USA) #public - Cancer Nanotechnology Laboratory portal.
  • CAPER (USA) #private - Comprehensive Ambulatory Professional Encounter Record.
  • CART (USA) #private - Clinical Assessment Reporting and Tracking.
  • Caserta (Italy) - Contains individual claims databases with information on NHS-covered healthcare services.
  • CASI (USA) #public - Clinical Abbreviation Sense Inventory, a dataset for medical term disambiguation.
  • CCDI-CCDC (USA) #public - Childhood Cancer Data Initiative's Childhood Cancer Data Catalog.
  • CCDI-MTP (USA) #public - Childhood Cancer Data Initiative's Molecular Targets Platform.
  • CCHMC (USA) #Text - Cincinnati Children’s Hospital Medical Center ICD-9 radiology corpus.
  • CDE (USA) #public - Crime Data Explorer.
  • CDI (USA) #public - Chronic Disease Indicators.
  • CDS (USA) #public|#private - Cancer Data Service.
  • CDW (USA) #private - Veterans Health Administration’s (VHA) Corporate Data Warehouse.
  • CEDCD (USA) #public - Cancer Epidemiology Descriptive Cohort Database.
  • CEGS GRID (USA) #public - The 2016 CEGS N-GRID dataset.
  • Census Tract (USA) #public - Social Determinants of Health by US Census Tract comprises social determinants of health (SDoH) constructs for each US census tract as defined by 2010 census tract boundaries.
  • censusIncarceration (USA) - Shows the number of people incarcerated across the United States, per the 2000, 2010, and 2020 Decennial Censuses.
  • CépiDc (France) #public - Inserm CépiDc (Centre d'épidémiologie sur les causes médicales de décès).
  • CFDE (USA) #public - Common Fund Data Ecosystem.
  • CHARLS (China) #public - China Health and Retirement Longitudinal Study.
  • Chatbot (USA) #private - Contains information about University Inquiry for ordinary purposes, including a list of intents with patterns, responses, tags, and context set.
  • CHILDES (USA) #public - Child Language Data Exchange System.
  • CHRR (USA) #public - County Health Rankings and Roadmaps dataset.
  • CHSI (USA) #public - CDC's Community Health Status Indicators.
  • CI5 #public - Cancer Incidence in Five Continents database.
  • CIBMTR (USA) #public - The Center for International Blood and Marrow Transplant Research.
  • CIL (USA) #public - Cell Image Library.
  • CIMRD (USA) #private - California Independent Medical Review Dataset.
  • CKD-SS (USA) #public - Chronic Kidney Disease Surveillance System.
  • ClinicalTrials (USA) #public - ClinicalTrials.gov is a register and results database of clinical studies conducted worldwide and funded by public and private sources.
  • ClinVar (USA) #public - A publicly available archive of reports on the associations between human variations and phenotypes, along with supporting data.
  • CMDKP (USA) #public - Common Metabolic Diseases Knowledge Portal.
  • CMS claims data (USA) #public|#private - Centers for Medicare and Medicaid Services.
  • CMS Cost Reports (USA) #public - Comprises files with comprehensive cost center information from 2010-2021.
  • CMS Hospital Compare (USA) #public - Displays hospital performance data in a consistent, unified manner to ensure the availability of credible information about the care delivered in the nation’s hospitals.
  • CMS Physician Compare (USA) #public - Provides useful information about the physicians and other healthcare professionals currently enrolled in Medicare.
  • ColonoscopicImagingDatabases #public - Open-access publicly available colonoscopic imaging databases for artificial intelligence research.
  • COMETS (USA) #public - Consortium of Metabolomics Studies.
  • Connect (USA) #public - Connect for Cancer Prevention Study (“Connect”) is a prospective cohort of 200,000 adult patients aged 40-65 years.
  • Copernicus (EU) #public|#private - Earth observation program headed by the European Commission (EC) and the European Space Agency (ESA).
  • CORD-19 (USA) #public - COVID-19 Open Research Dataset Challenge dataset.
  • CORD-19_corpus-Mining (USA) #public - The "Mining CORD-19 corpus for biomedical associations" dataset captures associations between different entities in the provided Kaggle corpus.
  • CoreNLP (USA) #public - CoreNLP is a comprehensive solution for natural language processing in Java.
  • COSM (Sweden) #public - Cohort of Swedish Men dataset.
  • COUGHVID-3 (USA) #private - An expert-labeled cough dataset that can be applied to a wide range of cough audio classification tasks.
  • COVID-19 vaccinations (USA) - COVID-19 Vaccinations in the United States.
  • COVID-19_EA (USA) #public|#private - The COVID-19 Evidence Accelerator.
  • COVID-19_Graphs (USA) #public - Contains a 4D sequence encoding of SARS-CoV-2 sequences.
  • COVID-19_Translations #public - Contains translations of COVID-19 related documents.
  • COVID-19_Tweets (USA) #public - Contains tweets with hashtags associated with coronavirus.
  • COVID-19_Tweets_India (India) #public - Contains day-wise aggregated tweets from the onset of the outbreak through July 30, 2020.
  • COVID-19_Xray #public - Contains manually drawn pixel-level lung segmentations, with and without COVID.
  • CP-CHILD-A (China) - Assesses the health-related quality of life (HRQoL) of children with Cerebral Palsy.
  • CP-CHILD-B (China) - Captures the caregivers' perspective on the HRQoL across various domains for children with moderate-severe CP.
  • CPRD (UK) #public - Clinical Practice Research Datalink.
  • CRE (USA) #public - Community Resilience Estimates.
  • CRISP (USA) #private - Chesapeake Regional Information System for our Patients.
  • CRN (Norway) #public - Cancer Registry of Norway.
  • CRUK (UK) - Cancer Research UK.
  • CVC-ClinicVideoDB (Spain) #public - A collection of annotated colonoscopy videos used for research in computer-aided diagnosis (CAD) of colorectal cancer (CRC).
  • CVC-EndoSceneStill (Spain) #public - Developed to aid in research related to gastrointestinal endoscopy, particularly the study and development of computer vision algorithms for colorectal cancer detection.
  • CVC-HDClassif (Spain) #public - Associated with medical imaging, particularly in the context of computer vision and classification tasks related to gastrointestinal (GI) health.
  • CVC-PolypHD (Spain) #public - A high-definition image dataset primarily used in colonoscopy research for detecting and segmenting polyps.
  • CVRG (USA) - The CardioVascular Research Grid.