00.Datasets01 (A C) - sporedata/researchdesigneR GitHub Wiki
Select datasets
This section focuses on a few select healthcare datasets that hold special value in patient-centered research:
- 200ks_Med_ResPaper_Abstracts (USA) #public - Consists of 200,000 abstracts for NLP and sequential sentence classification problems.
- 2010 i2b2/VA (USA) #public - The 2010 i2b2/VA dataset.
- 2011 i2b2/VA (USA) #public - The 2011 i2b2/VA dataset.
- 2012 i2b2 (USA) #public - The 2012 i2b2 (Informatics for Integrating Biology and the Bedside) dataset.
- 4DN-DP (USA) #public - The 4D Nucleome Data Portal.
- AARP (USA) - The American Association of Retired Persons.
- ACAG (USA) #public - The Atmospheric Composition Analysis Group.
- AccessClinicalData (USA) #public - Enables access to and sharing of data sets and reports from NIAID COVID-19 and other sponsored clinical trials.
- AC-MRI (USA) - Consists of 2,888 clinical MRIs of patients admitted with acute or early subacute stroke, including diverse protocols and MRI modalities with typical clinical resolution.
- ACRD (USA) #private - Archived Clinical Research Datasets.
- ACS (USA) #public - American Community Survey.
- ADDEP (USA) #public|#private - Archive of Data on Disability to Enable Policy and research.
- ADDI (USA) #public - Alzheimer’s Disease Data Initiative.
- ADI (USA) #public - Area Deprivation Index.
- ADKP (USA) #public - AD Knowledge Portal.
- AHA (USA) #private - American Hospital Association.
- AHRF (USA) #public - Area Health Resources Files.
- AHRQ (USA) #public|#private - Agency for Healthcare Research and Quality.
- Air-Quality-and-Meteorological-Information-of-Chile (Chile) - Compiles air quality data from the National Air Quality System (S.I.N.C.A.).
- ALFA (USA) #public - Allele Frequency Aggregator.
- AllOfUs (USA) #public|#private - The All of Us Research Program is a major, large-scale biomedical data resource.
- AMDS (The Netherlands) #public - Amsterdam's Medical Data Science.
- AMP-PD (USA) #public|#private - Accelerating Medicines Partnership – Parkinson's Disease.
- ANA (Brazil) #public - Brazil National Water Agency.
- APCDs (USA) #public|#private - All-Payer Claims Databases.
- AphasiaBank (USA) #private - A collaborative repository containing multimedia interactions aimed at researching communication in individuals with aphasia.
- ARB (USA) #public - AgingResearchBiobank.
- ARS (Italy) #public - Agenzia Regionale di Sanità della Toscana.
- ARSA (USA) #public - Atlas of Rural and Small-Town America.
- ASCQ-Me (USA) #public - Adult Sickle Cell Quality of Life Measurement.
- AYA-HOPE (USA) #public - Adolescent & Young Adult Health Outcomes & Patient Experience Study.
- Base de Datos de Facil Acceso del Censo 2017 de Chile (Chile) - The 2017 Chilean Census Easy Access Database provides convenient access to more than 17 million records from the 2017 Census database.
- BCAC (UK) #public - Breast Cancer Association Consortium.
- BCBSNC (USA) #private - Blue Cross Blue Shield of North Carolina (Blue Cross NC).
- BDC (USA) #public|#private - BioData Catalyst.
- Bengali_Medical_Dataset (Bangladesh, India) #public - The Bengali Medical Dataset.
- BIDMC (USA) - Beth Israel Deaconess Medical Center.
- BIFAP (Spain) #public - Base de Datos para la Investigación Farmacoepidemiológica en Atención Primaria.
- BIL (USA) #public - Brain Image Library.
- BindingDB (USA) #public - BindingDB is an open, centralized, web-based repository primarily focused on cataloging measured binding affinities.
- BioASQ (USA) #public - Data from the BioASQ challenge on large-scale biomedical semantic indexing and question answering (QA).
- BioBERT_QA_Model (USA) #public - BioBERT-based extractive question-answering model, fine-tuned on SQuAD 2.0.
- BioLINCC (USA) #public|#private - Biologic Specimen and Data Repository Information Coordinating Center.
- BioPortal (USA) #private - BioPortal is the most expansive integrated repository of global biomedical ontologies and controlled terminologies.
- BioSystics-AP (USA) #private - BioSystics Analytics Platform.
- BioVU (USA) - Vanderbilt’s de-identified DNA data bank.
- BKAI-IGH_NeoPolyp-Small (Vietnam) #public - BKAI-IGH NeoPolyp-Small is part of the larger NeoPolyp dataset aimed at medical imaging research, specifically polyp segmentation and detection during endoscopy.
- BOD (Bavaria) - Bavaria Oncological Dataset.
- BossDB (USA) #public|#private - Brain Observatory Storage Service & Database.
- BRFSS (USA) #public - Behavioral Risk Factor Surveillance System.
- Broadband Deployment Data (USA) #public - Used to develop broadband networks or infrastructure through which broadband services can be delivered.
- BSO (Austria) #public - Bundesanstalt Statistik Österreich / Statistik Austria.
- BV-BRC (USA) #public - Bacterial and Viral Bioinformatics Resource Center.
- CAHPS(R) Database (USA) #public|#private - Consumer Assessment of Healthcare Providers and Systems Database.
- caNanoLab (USA) #public - Cancer Nanotechnology Laboratory portal.
- CAPER (USA) #private - Comprehensive Ambulatory Professional Encounter Record.
- CART (USA) #private - Clinical Assessment Reporting and Tracking.
- Caserta (Italy) - Contains individual claims databases with information on NHS-covered healthcare services.
- CASI (USA) #public - Clinical Abbreviation Sense Inventory, a dataset for medical term disambiguation.
- CCDI-CCDC (USA) #public - Childhood Cancer Data Initiative's Childhood Cancer Data Catalog.
- CCDI-MTP (USA) #public - Childhood Cancer Data Initiative's Molecular Targets Platform.
- CCHMC (USA) #Text - Cincinnati Children’s Hospital Medical Center ICD-9 radiology corpus.
- CDE (USA) #public - Crime Data Explorer.
- CDI (USA) #public - Chronic Disease Indicators.
- CDS (USA) #public|#private - Cancer Data Service.
- CDW (USA) #private - Veterans Health Administration’s (VHA) Corporate Data Warehouse.
- CEDCD (USA) #public - Cancer Epidemiology Descriptive Cohort Database.
- CEGS GRID (USA) #public - The 2016 CEGS N-GRID dataset.
- Census Tract (USA) #public - Social Determinants of Health by US Census Tract comprises social determinants of health (SDoH) constructs for each US census tract as defined by 2010 census tract boundaries.
- censusIncarceration (USA) - Shows the number of people incarcerated across the United States, per the 2000, 2010, and 2020 Decennial Census.
- CépiDc (France) #public - Inserm CépiDc (Centre d'épidémiologie sur les causes médicales de décès).
- CFDE (USA) #public - Common Fund Data Ecosystem.
- CHARLS (China) #public - China Health and Retirement Longitudinal Study.
- Chatbot (USA) #private - Contains information about University Inquiry for ordinary purposes, including a list of intents with patterns, responses, tags, and context set.
- CHILDES (USA) #public - Child Language Data Exchange System.
- CHRR (USA) #public - County Health Rankings and Roadmaps dataset.
- CHSI (USA) #public - CDC's Community Health Status Indicators.
- CI5 #public - Cancer Incidence in Five Continents database.
- CIBMTR (USA) #public - The Center for International Blood and Marrow Transplant Research.
- CIL (USA) #public - Cell Image Library.
- CIMRD (USA) #private - California Independent Medical Review Dataset.
- CKD-SS (USA) #public - Chronic Kidney Disease Surveillance System.
- ClinicalTrials (USA) #public - ClinicalTrials.gov is a registry and results database of clinical studies conducted worldwide and funded by public and private sources.
- ClinVar (USA) #public - A publicly available archive of reports on the associations between human variations and phenotypes, along with supporting data.
- CMDKP (USA) #public - Common Metabolic Diseases Knowledge Portal.
- CMS claims data (USA) #public|#private - Centers for Medicare and Medicaid Services.
- CMS Cost Reports (USA) #public - Comprises files with comprehensive cost center information from 2010-2021.
- CMS Hospital Compare (USA) #public - Displays hospital performance data in a consistent, unified manner to ensure the availability of credible information about the care delivered in the nation’s hospitals.
- CMS–Physician Compare (USA) #public - Provides useful information about the physicians and other healthcare professionals currently enrolled in Medicare.
- ColonoscopicImagingDatabases #public - Open-access publicly available colonoscopic imaging databases for artificial intelligence research.
- COMETS (USA) #public - Consortium of Metabolomics Studies.
- Connect (USA) #public - Connect for Cancer Prevention Study (“Connect”) is a prospective cohort of 200,000 adult patients aged 40-65 years.
- Copernicus (EU) #public|#private - The European Union's Earth observation program, headed by the European Commission (EC) and the European Space Agency (ESA).
- CORD-19 (USA) #public - COVID-19 Open Research Dataset Challenge dataset.
- CORD-19_corpus-Mining (USA) #public - The Mining CORD-19 Corpus for Biomedical Associations dataset captures associations between different entities in the provided Kaggle corpus.
- CoreNLP (USA) #public - CoreNLP is a comprehensive solution for natural language processing in Java.
- COSM (Sweden) #public - Cohort of Swedish Men dataset.
- COUGHVID-3 (USA) #private - An expert-labeled cough dataset applicable to a wide range of cough audio classification tasks.
- COVID-19 vaccinations (USA) - COVID-19 Vaccinations in the United States.
- COVID-19_EA (USA) #public|#private - The COVID-19 Evidence Accelerator.
- COVID-19_Graphs (USA) #public - Contains a 4D sequence encoding of SARS-CoV-2 sequences.
- COVID-19_Translations #public - Contains translations of COVID-19 related documents.
- COVID-19_Tweets (USA) #public - Contains tweets with hashtags associated with the coronavirus.
- COVID-19_Tweets_India (India) #public - Contains day-wise aggregated tweets from the onset of the outbreak through July 30, 2020.
- COVID-19_Xray #public - Contains manually drawn pixel-level lung segmentations for X-rays with and without COVID-19.
- CP-CHILD-A (China) - Assesses the health-related quality of life (HRQoL) of children with Cerebral Palsy.
- CP-CHILD-B (China) - Captures the caregivers' perspective on the HRQoL across various domains for children with moderate-severe CP.
- CPRD (UK) #public - Clinical Practice Research Datalink.
- CRE (USA) #public - Community Resilience Estimates.
- CRISP (USA) #private - Chesapeake Regional Information System for our Patients.
- CRN (Norway) #public - Cancer Registry of Norway.
- CRUK (UK) - Cancer Research UK.
- CVC-ClinicVideoDB (Spain) #public - A collection of annotated colonoscopy videos used for research in computer-aided diagnosis (CAD) of colorectal cancer (CRC).
- CVC-EndoSceneStill (Spain) #public - Developed to aid in research related to gastrointestinal endoscopy, particularly the study and development of computer vision algorithms for colorectal cancer detection.
- CVC-HDClassif (Spain) #public - Associated with medical imaging, particularly in the context of computer vision and classification tasks related to gastrointestinal (GI) health.
- CVC-PolypHD (Spain) #public - A high-definition image dataset primarily used in colonoscopy research for detecting and segmenting polyps.
- CVRG (USA) - The CardioVascular Research Grid.