00.Datasets02 (D L) - sporedata/researchdesigneR GitHub Wiki

Select datasets

This section focuses on a few select healthcare datasets that hold special value in patient-centered research:

  • DABI (USA) #private - Data Archive for the BRAIN Initiative.
  • DANDI (USA) #public - Distributed Archives for Neurophysiology Data Integration; individual datasets in the archive are called Dandisets.
  • DARR (USA) #public - Dietary Assessment Research Resources.
  • DARWIN (UK) #public - Data Analysis and Real World Interrogation Network initiative.
  • DASH (USA) #public - Data and Specimen Hub.
  • Databrary (USA) #public - A video data library specialized for storing and sharing sensitive and identifiable research data.
  • Datasets through NBER (USA) #public - Comprises a mix of publicly available demographic, economic, and enterprise data, provided for the specific requests of NBER-affiliated researchers.
  • DAVID (USA) #public - Database for Annotation, Visualization, and Integrated Discovery.
  • DAWN (USA) - A nationally representative public health surveillance system that continuously monitors drug-related visits to hospital emergency departments.
  • dbGaP (USA) #public|#private - Database of Genotypes and Phenotypes.
  • dbSNP (USA) #public - Database of Single Nucleotide Polymorphisms.
  • dbVar (USA) #public - Database of human genomic structural Variation.
  • DCI (USA) #public - Distressed Communities Index.
  • DDCH (Denmark) #private - The Danish Diet, Cancer and Health cohort.
  • Delphi Epidata API (USA) - The COVIDcast Epidata API provides data on the spread and impact of the COVID-19 pandemic across the United States, most of it available at the county level and updated daily.
  • DEERS (USA) #private - Defense Eligibility Enrollment Reporting System.
  • dGTEx (USA) #public - Developmental Genotype-Tissue Expression.
  • DHDS (USA) #public - Disability and Health Data System.
  • DICA (The Netherlands) #private - Dutch Institute for Clinical Auditing.
  • dkNET (USA) #public - NIDDK Information Network.
  • DLCA (The Netherlands) #public - The Dutch Lung Cancer Audit.
  • DLSA (The Netherlands) - The Dutch Lung Surgery Audit.
  • DRM (USA) #public - Disaster Response Messages.
  • Drugs+MedConditions (USA) #public - Drugs Related to Medical Conditions dataset.
  • Drugs+SideEffects+MedConditions (USA) #public - Drugs, Side Effects and Medical Condition dataset.
  • DSCA (The Netherlands) #private - Dutch Surgical Colorectal Audit.
  • DSDR (USA) #public|#private - Data Sharing for Demographic Research.
  • DSID (USA) #public - Dietary Supplement Ingredient Database.
  • DSLD (USA) #public - NIH's Dietary Supplement Label Database.
  • Dyson (Australia, Singapore, USA) #private - Dyson indoor air quality dataset, drawn from 6 million indoor air quality sensors measuring particulate matter (PM2.5 and PM10), volatile organic compounds, and nitrogen dioxide.
  • DZchatbot (Algeria) #public - A medical assistant chatbot with 2,150 general medicine-related questions and answers written in Algerian Arabic.
  • Earthdata (USA) #public - Earth Data.
  • ECHO (USA) #public - Environmental influences on Child Health Outcomes.
  • ECIS (Europe) #public - European Cancer Information System.
  • EDD2020 (France, Italy) #public - Endoscopy Disease Detection and Segmentation.
  • EDG (USA) #public|#private - Environmental Dataset Gateway.
  • EHDEN (UK) #public - European Health Data & Evidence Network Academy.
  • EHIF (Estonia) #public - Estonian Health Insurance Fund.
  • EHIS (UK) #public - European Health Interview Survey.
  • EHR (USA) #private - Electronic Health Record.
  • eHOMD (USA) #public - Expanded Human Oral Microbiome Database.
  • ELSA (England) #public - English Longitudinal Study of Ageing.
  • ENCODE (USA) #public - Encyclopedia of DNA Elements.
  • English_language_Sentences (USA) #public - English-language sentence data gathered for analysis and model training. It is not specific to one field and spans medical, news, and other domains.
  • EPA (USA) #public - Environmental Protection Agency.
  • EPIC-Europe (UK) #public - European Prospective Investigation into Cancer and Nutrition.
  • EQI (USA) #public - Environmental Quality Index.
  • ERS (Poland) - Endoscopic Reference System dataset.
  • ESMÉ (France) #public|#private - Epidemiological Strategy and Medical Economics (Épidémio-Stratégie Médico-Économique programme).
  • EstHIS (Estonia) #public|#private - Estonian Health Interview Survey.
  • ETH Medical Data Science: ricu (USA) - ICU data with R.
  • EU-SILC (UK) #public - EU Statistics on Income and Living Conditions.
  • EUROCARE (Europe) #public - EUROpean CAncer REgistry.
  • EUSOMA (Europe) - European Society of Breast Cancer Specialists.
  • exRNA (USA) - The exRNA Atlas includes exRNA profiles derived from various biofluids and conditions and stores data profiled from small RNA sequencing assays.
  • FARA (USA) #public - Food Access Research Atlas.
  • FaceBase (USA) #public - A publicly accessible repository offering a wide range of data types to support dental, oral, and craniofacial research and related fields.
  • FARS (USA) #public - Fatality Analysis Reporting System.
  • FEA (USA) #public - Food Environment Atlas.
  • FHR (Germany) #public - Federal Health Reporting.
  • FITBIR (USA) #public|#private - Federal Interagency Traumatic Brain Injury Research Informatics System.
  • FluencyBank (USA) #private - A shared database for the study of fluency development.
  • FoodData (USA) #public - FoodData Central offers a comprehensive data system that enhances nutrient profile data and links to relevant agricultural and experimental studies.
  • FSCC (Sweden) - Family Studies of Childhood Cancer - Genealogy Database.
  • GAAIN (USA) #public - Global Alzheimer’s Association Interactive Network.
  • GCO-CToday (Worldwide) #public - Global Cancer Observatory Cancer Today subsite.
  • GCO-CTomorrow (Worldwide) #public - Global Cancer Observatory Cancer Tomorrow subsite.
  • GenBank (USA) #public - Collects, maintains, and displays detailed nucleotide sequence information and related annotations from worldwide sources.
  • GENIA (USA) #public - Contains over 8,000 sentences with labeled trigger words for event detection.
  • GEOS-Chem (USA) #public - GEOS-Chem is driven by assimilated meteorological input and observations to solve various atmospheric composition problems.
  • GePaRD (Germany) #private - German Pharmacoepidemiological Research Database.
  • GlyGen (USA) #public - Computational and Informatics Resources for Glycoscience.
  • GNOHIE (USA) #private - Greater New Orleans Health Information Exchange.
  • GTEx (USA) #public - Genotype-Tissue Expression Project.
  • GTR (USA) #public - Genetic Testing Registry.
  • GUS (Poland) #public - Główny Urząd Statystyczny / Statistics Poland.
  • GVA (USA) #public - Gun Violence Archive.
  • HamlynCentreEndoscopicVid (UK) #public - A collection of endoscopic video data curated by the Hamlyn Centre for Robotic Surgery at Imperial College London.
  • HCMI (USA) #public - Human Cancer Models Initiative.
  • HCUP (USA) #public|#private - Healthcare Cost and Utilization Project.
  • HDE (USA) #public - Hackathon Disease Extraction: Saving lives with AI dataset.
  • HDRP (USA) #public - Healthcare Delivery Research Program.
  • Healthcare_NLP (USA) - "Healthcare NLP: LLMs, Transformers, Datasets", a collection of medical data and models intended to promote data science in healthcare.
  • healthdatacsv (USA) - Provides users with access to data from the healthdata.gov catalog.
  • HealthMeasures (USA) - Comprises four measurement systems that assess physical, mental, and social health, symptoms, and life satisfaction, along with cognitive, motor, and sensory function.
  • HEDIS (USA) #public - Healthcare Effectiveness Data and Information Set.
  • HERC (USA) - Health Economics Research Centre.
  • HEROiC (USA) - Health Economics Research on Cancer.
  • HFA-DB (EU) #public - European Health for All database.
  • HHEAR (USA) #public - Human Health Exposure Analysis Resource.
  • HHS (USA) - HHS Protect Public Data Hub.
  • HINTS (USA) #public|#private - Health Information National Trends Survey.
  • HIS (Ireland) #public|#private - Healthy Ireland Survey.
  • HIV-DAU (USA) #public - NIAID-DOE LANL HIV Database and Analysis Unit.
  • House_MD_Transcripts (USA) #public - Contains 72,286 rows and 2 columns of data scraped from the complete scripts of the Fox medical drama House, M.D.
  • HPV-VU (USA) #public - HPV Vaccine Uptake.
  • HRD (USA) #public|#private - Health Risk Data.
  • HRS (USA) #public|#private - Health and Retirement Study.
  • HS2019 (Austria) - Health Survey 2019.
  • HSE-PCRS (Ireland) #public|#private - Health Service Executive - Primary Care Reimbursement Services database.
  • HSM (Austria) #private - Health Survey Microdata sets.
  • HSPW (USA) - Human Salivary Proteome, a catalog of more than 1,000 salivary proteins derived from saliva samples of two dozen men and women of various ethnic backgrounds.
  • HSR (USA) - Homeless Services Registry.
  • HuBMAP (USA) #public|#private - Human BioMolecular Atlas Program.
  • HyperKvasir (Norway) - A collection of medical images and videos used primarily in the field of gastroenterology for the study and development of computer-aided diagnostic (CAD) systems for endoscopic procedures.
  • i2b2/UTHealth (USA) - The 2014 i2b2/UTHealth dataset.
  • IAC (USA) #public - Immunization Action Coalition - State Vaccine Mandate.
  • IAHDS (USA) #public - Interactive Atlas of Heart Disease and Stroke.
  • IBM-Marketscan (USA) #private - IBM MarketScan Research databases.
  • ICPSR (USA) #public|#private - Inter-university Consortium for Political and Social Research.
  • ICS (Iceland) #public - Icelandic Cancer Society Cancer Registry.
  • ID_Entities_HealthCareData (USA) #public - A custom named-entity recognition (NER) dataset designed to extract diseases and their treatments.
  • IDC (USA) #public - Imaging Data Commons.
  • IDD (USA) - Curates drug lists from the drug regulatory agencies of 44 countries, mapped to the standardized RxNorm drug vocabulary. This makes it well suited to identifying proprietary drug names, particularly those of multinational origin.
  • IDG (USA) #public - Illuminating the Druggable Genome program.
  • IEDB (USA) #public - Immune Epitope Database.
  • IHS (Ireland) #public - Irish Health Survey.
  • ImmPort (USA) #public - Immunology Database and Analysis Portal.
  • ImmuneSpace (USA) - A platform for data exploration, management, and analysis that uses computational tools to enable integrative modeling of human immunological data.
  • IMRD (UK) #private - IQVIA Medical Research Data.
  • INCLASNS (Spain) #public - INdicadores CLAve del Sistema Nacional de Salud (Key Indicators of the National Health System).
  • INCLUDE (USA) #public - The INCLUDE Data Coordinating Center (INCLUDE-DCC) provides the data access and analysis tools needed for research to improve the health and quality of life of people with Down syndrome (DS).
  • INE (Spain) #public - Instituto Nacional de Estadística (National Institute of Statistics).
  • InGef (Germany) #private - Institut für Angewandte Gesundheitsforschung Berlin GmbH (Institute for Applied Health Research Berlin Database).
  • INMET (Brazil) - Instituto Nacional de Meteorologia (National Institute of Meteorology).
  • iPCQ (UK) - iMTA Productivity Cost Questionnaire.
  • IPUMS (USA) #public - Integrated Public Use Microdata Series.
  • ISD (USA) #public - Integrated Surface Database.
  • ISIT-UMR (France) - The ISIT-UMR Colonoscopy database.
  • ISNA-CoronaNews (Iran) #public - Comprises articles published by the Iranian Students' News Agency (ISNA) on the spread of the coronavirus in Iran.
  • Kaggle-datasets (USA) - A list of 42 NLP-related medical datasets.
  • KF-DRC (USA) - Gabriella Miller Kids First Data Resource Center.
  • KUMC (USA) - University of Kansas Medical Center Colonoscopy dataset.
  • LAI (USA) #public - Location Affordability Index.
  • LASI (India) #public - Longitudinal Ageing Study in India.
  • LATCH (USA) #public - Local Area Transportation Characteristics for Households dataset.
  • LAUS (USA) #public - Local Area Unemployment Statistics program.
  • LCSDP (USA) - Lung Cancer Screening Demonstration Project.
  • LDbase (USA) #public|#private - Learning & Development Data Repository.
  • LDPolypVideo (China) #public - A medical imaging dataset and evaluation benchmark for the detection, segmentation, and classification of polyps during colonoscopy.
  • LONI (USA) #private - Laboratory of Neuro Imaging database.
  • LPHI (USA) #public - Louisiana Public Health Institute database.
  • Luna (USA) - Connects researchers with research participants and gives participants direct control over how their data are used.