00.Datasets02 (D L) - sporedata/researchdesigneR GitHub Wiki
Select datasets
This section focuses on a few select healthcare datasets that hold special value in patient-centered research:
- DABI (USA) #private - Data Archive for the BRAIN Initiative.
- DANDI (USA) #public - Distributed Archives for Neurophysiology Data Integration; its datasets are called Dandisets.
- DARR (USA) #public - Dietary Assessment Research Resources.
- DARWIN (UK) #public - Data Analysis and Real World Interrogation Network initiative.
- DASH (USA) #public - Data and Specimen Hub.
- Databrary (USA) #public - A video data library that is specialized for storing and sharing sensitive and identifiable research data.
- Datasets through NBER (USA) #public - Comprises a mix of publicly available demographic, economic, and enterprise data assembled for the specific requests of NBER-affiliated researchers.
- DAVID (USA) #public - Database for Annotation, Visualization, and Integrated Discovery.
- DAWN (USA) - Drug Abuse Warning Network, a nationally representative public health surveillance system that continuously monitors drug-related visits to hospital emergency departments.
- dbGaP (USA) #public|#private - Database of Genotypes and Phenotypes.
- dbSNP (USA) #public - Database of Single Nucleotide Polymorphisms.
- dbVar (USA) #public - Database of Human Genomic Structural Variation.
- DCI (USA) #public - Distressed Communities Index.
- DDCH (Denmark) #private - The Danish Diet, Cancer and Health cohort.
- Delphi Epidata API - COVIDcast Epidata API (USA) - Provides data on the spread and impact of the COVID-19 pandemic across the United States, most of which is available at the county level and updated daily.
- DEERS (USA) #private - Defense Eligibility Enrollment Reporting System.
- dGTEx (USA) #public - Developmental Genotype-Tissue Expression.
- DHDS (USA) #public - Disability and Health Data System.
- DICA (The Netherlands) #private - Dutch Institute for Clinical Auditing.
- dkNET (USA) #public - NIDDK Information Network.
- DLCA (The Netherlands) #public - The Dutch Lung Cancer Audit.
- DLSA (The Netherlands) - The Dutch Lung Surgery Audit.
- DRM (USA) #public - Disaster Response Messages.
- Drugs+MedConditions (USA) #public - Drugs Related to Medical Conditions dataset.
- Drugs+SideEffects+MedConditions (USA) #public - Drugs, Side Effects and Medical Condition dataset.
- DSCA (The Netherlands) #private - Dutch Surgical Colorectal Audit.
- DSDR (USA) #public|#private - Data Sharing for Demographic Research.
- DSID (USA) #public - Dietary Supplement Ingredient Database.
- DSLD (USA) #public - NIH's Dietary Supplement Label Database.
- Dyson (Australia, Singapore, USA) #private - Dyson indoor air quality is a unique dataset drawn from 6 million indoor air quality sensors that measure particulate matter (PM2.5 and PM10), volatile organic compounds, and nitrogen dioxide.
- DZchatbot (Algeria) #public - A medical assistant chatbot with 2,150 general medicine-related questions and answers written in Algerian Arabic.
- Earthdata (USA) #public - Earth Data.
- ECHO (USA) #public - Environmental influences on Child Health Outcomes.
- ECIS (Europe) #public - European Cancer Information System.
- EDD2020 (France, Italy) #public - Endoscopy Disease Detection and Segmentation (EDD2020).
- EDG (USA) #public|#private - Environmental Dataset Gateway.
- EHDEN (UK) #public - European Health Data & Evidence Network Academy.
- EHIF (Estonia) #public - Estonian Health Insurance Fund.
- EHIS (UK) #public - European Health Interview Survey.
- EHR (USA) #private - Electronic Health Record.
- eHOMD (USA) #public - Expanded Human Oral Microbiome Database.
- ELSA (England) #public - English Longitudinal Study of Ageing.
- ENCODE (USA) #public - Encyclopedia of DNA Elements.
- English_language_Sentences (USA) #public - Contains sentence data in English gathered to perform analysis and model training. It is not related to any particular field but includes medical, news, and other domains.
- EPA (USA) #public - Environmental Protection Agency.
- EPIC-Europe (UK) #public - European Prospective Investigation into Cancer and Nutrition.
- EQI (USA) #public - Environmental Quality Index.
- ESMÉ (France) #public|#private - Epidemiological Strategy and Medical Economics (Épidémio-Stratégie Médico-Economique programme).
- ERS (Poland) - Endoscopic Reference System dataset.
- EstHIS (Estonia) #public|#private - Estonian Health Interview Survey.
- ETH Medical data science -- ricu (USA) - ICU data with R.
- EU-SILC (UK) #public - EU Statistics on Income and Living Conditions.
- EUROCARE (Europe) #public - EUROpean CAncer REgistry.
- EUSOMA (Europe) - European Society of Breast Cancer Specialists.
- exRNA (USA) - The exRNA Atlas includes exRNA profiles derived from various biofluids and conditions and stores data profiled from small RNA sequencing assays.
- FARA (USA) #public - Food Access Research Atlas.
- FaceBase (USA) #public - FaceBase is a publicly accessible repository that offers a wide range of data types, all aimed at supporting research in dental, oral, and craniofacial areas and related fields.
- FARS (USA) #public - Fatality Analysis Reporting System.
- FEA (USA) #public - Food Environment Atlas.
- FHR (Germany) #public - Federal Health Reporting.
- FITBIR (USA) #public|#private - Federal Interagency Traumatic Brain Injury Research Informatics System.
- FluencyBank (USA) #private - FluencyBank is a shared database for the study of fluency development.
- FoodData (USA) #public - FoodData Central offers a comprehensive data system that enhances nutrient profile data and connects to relevant agricultural and experimental studies.
- FSCC (Sweden) - Family Studies of Childhood Cancer - Genealogy Database.
- GAAIN (USA) #public - Global Alzheimer’s Association Interactive Network.
- GCO-CToday (Worldwide) #public - Global Cancer Observatory Cancer Today subsite.
- GCO-CTomorrow (Worldwide) #public - Global Cancer Observatory Cancer Tomorrow subsite.
- GenBank (USA) #public - GenBank collects, maintains, and displays detailed nucleotide sequence information and related annotations from worldwide sources.
- GENIA (USA) #public - Contains over 8,000 sentences with labeled trigger words to detect events.
- GEOS-Chem (USA) #public - GEOS-Chem is driven by assimilated meteorological input and observations to solve various atmospheric composition problems.
- GePaRD (Germany) #private - German Pharmacoepidemiological Research Database.
- GlyGen (USA) #public - Computational and Informatics Resources for Glycoscience.
- GNOHIE (USA) #private - Greater New Orleans Health Information Exchange.
- GTEx (USA) #public - Genotype-Tissue Expression Project.
- GTR (USA) #public - Genetic Testing Registry.
- GUS (Poland) #public - Główny Urząd Statystyczny / Statistics Poland.
- GVA (USA) #public - Gun Violence Archive.
- HamlynCentreEndoscopicVid (UK) #public - A collection of endoscopic video data curated by the Hamlyn Centre for Robotic Surgery at Imperial College London.
- HCMI (USA) #public - Human Cancer Models Initiative.
- HCUP (USA) #public|#private - Healthcare Cost and Utilization Project.
- HDE (USA) #public - Hackathon Disease Extraction: Saving lives with AI dataset.
- HDRP (USA) #public - Healthcare Delivery Research Program.
- Healthcare_NLP (USA) - Healthcare NLP: LLMs, Transformers, Datasets comprises medical data and models to promote data science in healthcare.
- healthdatacsv (USA) - Provides users with access to data from the healthdata.gov catalog.
- HealthMeasures (USA) - Comprises four comprehensive and precise measurement systems that assess mental, physical, and social health, life satisfaction, and symptoms, along with cognitive, motor, and sensory function.
- HEDIS (USA) #public - Healthcare Effectiveness Data and Information Set.
- HERC (USA) - Health Economics Research Centre.
- HEROiC (USA) - Health Economics Research on Cancer.
- HFA-DB (EU) #public - European Health for All database.
- HHEAR (USA) #public - Human Health Exposure Analysis Resource.
- HHS (USA) - HHS Protect Public Data Hub.
- HINTS (USA) #public|#private - Health Information National Trends Survey.
- HIS (Ireland) #public|#private - Healthy Ireland Survey.
- HIV-DAU (USA) #public - NIAID-DOE LANL HIV Database and Analysis Unit.
- House_MD_Transcripts (USA) #public - Contains 72,286 rows and 2 columns of data scraped from the complete scripts of the Fox medical drama House, M.D.
- HPV-VU (USA) #public - HPV Vaccine Uptake.
- HRD (USA) #public|#private - Health Risk Data.
- HRS (USA) #public|#private - Health and Retirement Study.
- HS2019 (Austria) - Health Survey 2019.
- HSE-PCRS (Ireland) #public|#private - Health Service Executive - Primary Care Reimbursement Services database.
- HSM (Austria) #private - Health Survey Microdata sets.
- HSPW (USA) - The Human Salivary Proteome represents saliva samples from two dozen men and women of various ethnic backgrounds, yielding a saliva catalog of more than 1,000 proteins.
- HSR (USA) - Homeless Services Registry.
- HuBMAP (USA) #public|#private - Human BioMolecular Atlas Program.
- HyperKvasir (Norway) - A collection of medical images and videos used primarily in the field of gastroenterology for the study and development of computer-aided diagnostic (CAD) systems for endoscopic procedures.
- i2b2/UTHealth (USA) - The 2014 i2b2/UTHealth dataset.
- IAC (USA) #public - Immunization Action Coalition - State Vaccine Mandate.
- IAHDS (USA) #public - Interactive Atlas of Heart Disease and Stroke.
- IBM-Marketscan (USA) #private - IBM MarketScan Research databases.
- ICPSR (USA) #public|#private - Inter-university Consortium for Political and Social Research.
- ICS (Iceland) #public - Icelandic Cancer Society Cancer Registry.
- ID_Entities_HealthCareData (USA) #public - A custom named entity recognition (NER) dataset designed to extract the list of diseases and their treatments.
- IDC (USA) #public - Imaging Data Commons.
- IDD (USA) - Curates drug lists from the drug regulatory agencies of 44 countries, mapped to the standardized drug vocabulary RxNorm. As a result, it is well suited to identifying proprietary drug names, particularly those of multinational origin.
- IDG (USA) #public - Illuminating the Druggable Genome program.
- IEDB (USA) #public - Immune Epitope Database.
- IHS (Ireland) #public - Irish Health Survey.
- ImmuneSpace (USA) - A powerful engine designed to facilitate data exploration, management, and analysis, using state-of-the-art computational tools to enable integrative modeling of human immunological data.
- ImmPort (USA) #public - Immunology Database and Analysis Portal.
- IMRD (UK) #private - IQVIA Medical Research Data.
- INCLASNS (Spain) #public - INdicadores CLAve del Sistema Nacional de Salud (Key Indicators of the National Health System).
- INCLUDE (USA) #public - INCLUDE-DCC was designed to provide the world with the data access and analysis tools required to improve the health and quality of life for people with Down syndrome (DS).
- INE (Spain) #public - Instituto Nacional de Estadística (National Institute of Statistics).
- InGef (Germany) #private - Institut für Angewandte Gesundheitsforschung Berlin GmbH (Institute for Applied Health Research Berlin Database).
- INMET (Brazil) - Instituto Nacional de Meteorologia (National Institute of Meteorology).
- iPCQ (UK) - Productivity Cost Questionnaire.
- IPUMS (USA) #public - Integrated Public Use Microdata Series.
- ISD (USA) #public - Integrated Surface Database.
- ISIT-UMR (France) - The ISIT-UMR Colonoscopy database.
- ISNA-CoronaNews (Iran) #public - Comprises articles published by the Iranian Students News Agency (ISNA) on the status of coronavirus transmission in Iran.
- Kaggle-datasets (USA) - A list of 42 NLP-related medical datasets.
- KF-DRC (USA) - Gabriella Miller Kids First Data Resource Center.
- KUMC (USA) - University of Kansas Medical Center Colonoscopy dataset.
- LAI (USA) #public - Location Affordability Index.
- LASI (India) #public - Longitudinal Ageing Study in India.
- LATCH (USA) #public - Local Area Transportation Characteristics for Households dataset.
- LAUS (USA) #public - Local Area Unemployment Statistics program.
- LCSDP (USA) - Lung Cancer Screening Demonstration Project.
- LDbase (USA) #public|#private - Learning & Development Data Repository.
- LDPolypVideo (China) #public - A dataset and evaluation benchmark primarily used in the field of medical imaging, particularly for the detection, segmentation, and classification of polyps during colonoscopy procedures.
- LONI (USA) #private - Laboratory of Neuro Imaging database.
- LPHI (USA) #public - Louisiana Public Health Institute database.
- Luna (USA) - Luna improves connections between researchers and research participants and gives the latter direct control over how their data are used.