00.Datasets02 (D L) - sporedata/researchdesigneR GitHub Wiki

Select datasets

This section focuses on a few select healthcare datasets that hold special value in patient-centered research:

  • DABI (USA) - The Data Archive for the BRAIN Initiative (DABI) was conceived to accelerate the pace of discovery in the neurosciences.
  • DANDI (USA) - The Distributed Archives for Neurophysiology Data Integration (DANDI) Archive, whose individual datasets are called Dandisets.
  • DARR (USA) - The Dietary Assessment Research Resources (DARR) provides access to data collection tools, diet analysis tools, dietary collection resources, and food composition databases.
  • DARWIN (UK) - Data Analysis and Real World Interrogation Network (DARWIN EU) initiative.
  • DASH (USA) #public - Data and Specimen Hub (DASH) is a centralized database for researchers to access data and associated biospecimens.
  • Databrary (USA) #public - A video data library that is specialized for storing and sharing sensitive and identifiable research data.
  • Datasets through NBER (USA) - Comprise a mix of publicly available demographic, economic, and enterprise data assembled for the specific requests of (NBER-affiliated) researchers.
  • DAVID (USA) - The Database for Annotation, Visualization and Integrated Discovery (DAVID).
  • DAWN (USA) - A nationally representative public health surveillance system that continuously monitors drug-related visits to hospital emergency departments.
  • dbGaP (USA) - The database of Genotypes and Phenotypes.
  • dbSNP (USA) - The Single Nucleotide Polymorphism Database.
  • dbVar (USA) - The database of human genomic structural Variation.
  • DCI (USA) - The Distressed Communities Index (DCI) analyzes financial health at the zip code level to give a thorough overview of the uneven distribution of American affluence.
  • Delphi Epidata API - COVIDcast Epidata API (USA) - Provides data on the spread and impact of the COVID-19 pandemic across the United States, most of which is available at the county level and updated daily.
  • DEERS (USA) - The Defense Eligibility Enrollment Reporting System (DEERS) is the legal source of record that determines eligibility for care and access priority.
  • dGTEx (USA) - The Developmental Genotype-Tissue Expression (dGTEx) Project builds on the Genotype-Tissue Expression (GTEx).
  • DHDS (USA) - The Disability and Health Data System (DHDS) provides state-level data on adults with disabilities.
  • dkNET (USA) - The NIDDK Information Network (dkNET) aims to foster robust and reproducible science by keeping researchers informed about new tools, services, and mandates.
  • DLCA (Netherlands) - The Dutch Lung Cancer Audit.
  • DRM (USA) #public - The Disaster Response Messages (DRM) contains data related to disaster response and classified in 36 different categories.
  • Drugs+MedConditions (USA) #public - The Drugs Related to Medical Conditions dataset comprises data on drugs used for various medical conditions, including Acne, Cancer, and Heart Disease.
  • Drugs+SideEffects+MedConditions (USA) #public - The Drugs, Side Effects, and Medical Conditions dataset comprises data on drugs, their side effects, and the medical conditions they are used for.
  • DSDR (USA) - Data Sharing for Demographic Research.
  • DSID (USA) - The Dietary Supplement Ingredient Database (DSID) offers calculated estimates of ingredient concentrations in dietary supplements available in the US.
  • DSLD (USA) - The National Institutes of Health's Dietary Supplement Label Database.
  • Dyson (USA, Australia, Singapore) - Dyson indoor air quality is a unique dataset of 6 million indoor air quality sensors measuring particulate matter (PM2.5 and PM10), volatile organic compounds, and nitrogen dioxide.
  • DZchatbot (Algeria) #public - A medical assistant chatbot with 2,150 general medicine-related questions and answers written in Algerian Arabic.
  • Earthdata (USA) - NASA's Earthdata portal for Earth science data.
  • ECHO (USA) - Environmental influences on Child Health Outcomes.
  • ECIS (Europe) - European Cancer Information System.
  • EDG (USA) - The Environmental Dataset Gateway (EDG) gives users access to the Environmental Protection Agency's Open Data resources.
  • EHDEN (UK) - European Health Data & Evidence Network (EHDEN) Academy.
  • EHIF (Estonia) - The Estonian Health Insurance Fund contains claims data from Estonian healthcare providers, including details on each medical contact as well as prescription drugs, medical devices, and disability benefits.
  • EHIS (UK) - The European Health Interview Survey is a comprehensive, periodic assessment designed to gather information on the health, lifestyle determinants, and healthcare utilization of adults within the European Union (EU) Member States.
  • EHR (USA) - The Electronic Health Record (EHR) is connected to healthcare providers to update global patient health records in real-time.
  • eHOMD (USA) - The expanded Human Oral Microbiome Database offers a comprehensive curated collection of information about bacteria in the human oral cavity and aerodigestive tract.
  • ELSA (England) - English Longitudinal Study of Ageing.
  • ENCODE (USA) #public - The Encyclopedia of DNA Elements (ENCODE) is a public research consortium designed to identify all functional elements in the human and mouse genomes.
  • English_language_Sentences (USA) #public - Contains sentence data in English gathered to perform analysis and model training. It is not related to any particular field but includes medical, news, and other domains.
  • EPA (USA) - The Environmental Protection Agency contains the mass quantity data on dioxin and dioxin-like compounds reported on the Toxics Release Inventory (TRI) Reporting Form R Schedule 1, with associated toxic equivalency data.
  • EPIC-Europe (UK) - The European Prospective Investigation into Cancer and Nutrition (EPIC-Europe) is a comprehensive, multi-national collaborative endeavor.
  • EQI (USA) - The Environmental Quality Index presents data in five domains: air, water, land, built, and sociodemographic environments to provide a county-by-county snapshot of overall environmental quality across the entire U.S.
  • ESMÉ (France) - The Épidémio-Stratégie Médico-Economique programme.
  • ETH Medical data science -- ricu (USA) - ICU data with R.
  • EUROCARE (Europe) - Europe's most extensive research effort into cancer patient survival and prevalence.
  • EUSOMA (Europe) - European Society of Breast Cancer Specialists.
  • exRNA (USA) - The exRNA Atlas includes exRNA profiles derived from various biofluids and conditions and stores data profiled from small RNA sequencing assays.
  • FARA (USA) - The Food Access Research Atlas uses measures of supermarket accessibility to create food access indicators for low-income and other census tracts.
  • FaceBase (USA) - FaceBase is a publicly accessible repository that offers a wide range of data types, all aimed at supporting research in dental, oral, and craniofacial areas and related fields.
  • FARS (USA) - The Fatality Analysis Reporting System (FARS) is a national survey that collects information on fatal road collisions.
  • FEA (USA) - The Food Environment Atlas (FEA) aims to compile data on food environment indicators to promote research into the factors that influence food choices and diet quality.
  • FITBIR (USA) - The Federal Interagency Traumatic Brain Injury Research Informatics System.
  • FluencyBank (USA) #private - FluencyBank is a shared database for the study of fluency development.
  • FoodData (USA) #public - FoodData Central offers a comprehensive data system that enhances nutrient profile data and connects to relevant agricultural and experimental studies.
  • GAAIN (USA) - The Global Alzheimer’s Association Interactive Network (GAAIN).
  • GBD - The Global Burden of Disease Study 2019 (GBD 2019).
  • GenBank (USA) #public - GenBank collects, maintains, and displays detailed nucleotide sequence information and related annotations from worldwide sources.
  • GENIA (USA) #public - Contains over 8000 sentences with labeled trigger words to detect events.
  • GEOS-Chem (USA) - GEOS-Chem is driven by assimilated meteorological input and observations to solve various atmospheric composition problems.
  • GePaRD (Germany) - German claims data from statutory health insurance providers.
  • GlyGen (USA) - Computational and Informatics Resources for Glycoscience.
  • GTEx (USA) - The Genotype-Tissue Expression (GTEx) Project aims to increase our understanding of how gene changes contribute to common human diseases to improve health care for future generations.
  • GTR (USA) - The Genetic Testing Registry is a freely accessible database of orderable clinical and research genetic test descriptions and the laboratories that provide them.
  • Gun Violence Archive (GVA) (USA) #Free - Freely available dataset on mass shootings.
  • HCMI (USA) - Human Cancer Models Initiative (HCMI) Searchable Catalog is a continually updated resource for querying the available next-generation models developed by the HCMI.
  • HCUP (USA) - The Healthcare Cost and Utilization Project is a group of healthcare databases and related software tools that includes the most extensive collection of hospital care data in the United States.
  • HDE (USA) #public - The Hackathon Disease Extraction: Saving lives with AI dataset.
  • HDRP (USA) #public - The Healthcare Delivery Research Program (HDRP) supports developing and maintaining a range of data resources and analytical tools designated for research.
  • Healthcare_NLP (USA) - Healthcare NLP: LLMs, Transformers, Datasets comprises medical data and models to promote data science in healthcare.
  • healthdatacsv (USA) - Provides users with access to data from the healthdata.gov catalog.
  • HealthMeasures (USA) - Comprises four comprehensive and precise measurement systems that assess mental, physical, and social health, life satisfaction, symptoms, along with cognitive, motor, and sensory function.
  • HEDIS (USA) - The Healthcare Effectiveness Data and Information Set (HEDIS).
  • HERC (USA) - The Health Economics Research Centre estimates the cost of inpatient and outpatient care.
  • HEROiC (USA) - The Health Economics Research on Cancer aims to enhance the availability of superior cancer care and lessen the economic impact of cancer in the United States.
  • HFA-DB (EU) - The European Health for All database.
  • HHEAR (USA) - The Human Health Exposure Analysis Resource (HHEAR) Data Center serves as a repository for epidemiological and biomarker information from CHEAR and HHEAR research studies.
  • HHS (USA) - The HHS Protect Public Data Hub.
  • HINTS (USA) - The Health Information National Trends Survey (HINTS) was designed to monitor changes in the rapidly evolving field of health communication.
  • HIV-DAU (USA) - NIAID-DOE LANL HIV Database and Analysis Unit.
  • House_MD_Transcripts (USA) #public - Contains 72286 rows and 2 columns of data scraped from the complete scripts of the Fox medical drama House M.D.
  • HPV-VU (USA) - HPV Vaccine Uptake is aimed at boosting HPV vaccination rates in areas with historically low uptake.
  • HRD (USA) - Health Risk Data.
  • HRS (USA) - Health and Retirement Study.
  • HSE-PCRS (Ireland) - Health Service Executive Primary Care Reimbursement Services Database (HSE-PCRS).
  • HSPW (USA) - The Human Salivary Proteome represents saliva samples from two dozen men and women of various ethnic backgrounds, accounting for a saliva catalog containing more than 1,000 proteins.
  • HSR (USA) - The Homeless Services Registry (HSR) contains a listing of veterans and their demographic data, along with a listing of the VHA homeless services each veteran has received.
  • HuBMAP (USA) - Human BioMolecular Atlas Program (HuBMAP) is an open, global atlas of the human body at the cellular level.
  • IAC (USA) - Immunization Action Coalition (IAC) - State Vaccine Mandate uses educational materials to increase immunization rates and prevent disease, thus promoting the delivery of safe and effective immunization services.
  • IAHDS (USA) - Interactive Atlas of Heart Disease and Stroke (IAHDS) is a useful resource for public health professionals, researchers, community leaders, and anyone interested in tracking CVD trends, prioritizing research, and organizing patient services.
  • IBM-Marketscan (USA) - IBM MarketScan Research Databases.
  • ICPSR (USA) - Inter-university Consortium for Political and Social Research (ICPSR) offers leadership and training in data access, curation, and analysis methods for the social science research community.
  • ICS (Iceland) - The Icelandic Cancer Society.
  • ID_Entities_HealthCareData (USA) #public - A custom named-entity recognition (NER) dataset designed to extract a list of diseases and their treatments.
  • IDC (USA) - Imaging Data Commons (IDC) is a cloud-based repository of publicly available cancer imaging data.
  • IDD (USA) - Curates drug lists from 44 countries' drug regulatory agencies and is mapped to the standardized drug vocabulary RxNorm. As a result, it is ideal for identifying lists of proprietary drug names, particularly of multi-national origin.
  • IDG (USA) - The Illuminating the Druggable Genome (IDG) program.
  • IEDB (USA) - The Immune Epitope Database (IEDB) and Analysis Resource is an open-access platform that compiles experimental findings on antibody and T-cell epitopes.
  • IKNL (Netherlands) - The Integrated Cancer Center of the Netherlands.
  • ImmuneSpace (USA) - A powerful engine designed to facilitate data exploration, management, and analysis, using state-of-the-art computational tools to enable integrative modeling of human immunological data.
  • ImmPort (USA) - The Immunology Database and Analysis Portal (ImmPort) is a public repository for sharing immunology research data.
  • IMRD (UK) - Anonymized electronic patient health records gathered from UK GP clinical systems.
  • INCLUDE (USA) #public - INCLUDE-DCC was designed to provide the world with the data access and analysis tools required to improve the health and quality of life for people with Down syndrome (DS).
  • InGef (Germany) - The Institut für angewandte Gesundheitsforschung Berlin GmbH (Institute for Applied Health Research Berlin) database is a de-identified administrative database containing claims data from over 60 German statutory health insurance (SHI) providers.
  • INMET (Brazil) - Instituto Nacional de Meteorologia (National Institute of Meteorology).
  • IPUMS (USA) - Census and survey data for the United States and other countries.
  • ISD - The Integrated Surface Database integrates hourly and synoptic surface observations from numerous sources.
  • ISNA-CoronaNews (Iran) #public - Comprises articles published by the Iranian Students News Agency (ISNA) on the status of the transmission of the Coronavirus in Iran.
  • Kaggle-datasets (USA) - A list of 42 NLP-related medical datasets.
  • KF-DRC (USA) - The Gabriella Miller Kids First Data Resource Center (Kids First DRC) is a new, collaborative, pediatric research effort focused on accelerating research and understanding the genetic causes and linkages between childhood cancer and structural birth defects.
  • LAI (USA) - The Location Affordability Index (LAI) estimates household housing and transportation costs at the neighborhood level.
  • LASI (India) - Longitudinal Ageing Study in India.
  • LATCH (USA) - The Local Area Transportation Characteristics for Households (LATCH) data provides statistics on the average weekday household person-miles traveled at census tract level.
  • LAUS (USA) - The Local Area Unemployment Statistics (LAUS) Map displays data on unemployment rates by month and 12-month net changes.
  • LCSDP (USA) - Contains information about patients who participated in the Lung Cancer Screening Demonstration Project.
  • LDbase (USA) - A Learning & Development Data Repository.
  • LONI (USA) - LONI is a global leader in neuroscience data management and informatics solutions that facilitate data preservation, exploration, and sharing.