00.Datasets02 (D L) - sporedata/researchdesigneR GitHub Wiki
This section lists selected healthcare datasets that are especially valuable for patient-centered research:
- DABI (USA) - The Data Archive for the BRAIN Initiative (DABI) was created to accelerate the pace of discovery in the neurosciences.
- DANDI (USA) - The Distributed Archives for Neurophysiology Data Integration (DANDI) archive, whose published datasets are called Dandisets.
- DARR (USA) - The Dietary Assessment Research Resources (DARR) provides access to data collection tools, diet analysis tools, dietary collection resources, and food composition databases.
- DARWIN (Europe) - The Data Analysis and Real World Interrogation Network (DARWIN EU) initiative.
- DASH (USA) #public - Data and Specimen Hub (DASH) is a centralized database for researchers to access data and associated biospecimens.
- Databrary (USA) #public - A video data library specialized in storing and sharing sensitive and identifiable research data.
- Datasets through NBER (USA) - A mix of publicly available demographic, economic, and enterprise data, compiled for specific requests from NBER-affiliated researchers.
- DAVID (USA) - The Database for Annotation, Visualization and Integrated Discovery (DAVID).
- DAWN (USA) - The Drug Abuse Warning Network (DAWN) is a nationally representative public health surveillance system that continuously monitors drug-related visits to hospital emergency departments.
- dbGaP (USA) - The database of Genotypes and Phenotypes.
- dbSNP (USA) - The Single Nucleotide Polymorphism Database.
- dbVar (USA) - The database of human genomic structural variation.
- DCI (USA) - The Distressed Communities Index (DCI) analyzes economic well-being at the zip code level to give a thorough overview of the uneven distribution of American affluence.
- Delphi Epidata API - COVIDcast Epidata API (USA) - Provides data on the spread and impact of the COVID-19 pandemic across the United States, most of it available at the county level and updated daily.
- DEERS (USA) - The Defense Eligibility Enrollment Reporting System (DEERS) is the legal source of record that determines eligibility for care and access priority.
- dGTEx (USA) - The Developmental Genotype-Tissue Expression (dGTEx) Project builds on the Genotype-Tissue Expression (GTEx) Project.
- DHDS (USA) - The Disability and Health Data System (DHDS) provides state-level data on adults with disabilities.
- dkNET (USA) - The NIDDK Information Network (dkNET) aims to foster robust and reproducible science by keeping researchers informed about new tools, services, and mandates.
- DLCA (Netherlands) - The Dutch Lung Cancer Audit.
- DRM (USA) #public - The Disaster Response Messages (DRM) dataset contains disaster response messages classified into 36 categories.
- Drugs+MedConditions (USA) #public - The Drugs Related to Medical Conditions dataset comprises data on drugs used for various medical conditions, including Acne, Cancer, and Heart Disease.
- Drugs+SideEffects+MedConditions (USA) #public - The Drugs, Side Effects and Medical Conditions dataset contains data on drugs, their side effects, and the medical conditions they are used to treat.
- DSDR (USA) - Data Sharing for Demographic Research.
- DSID (USA) - The Dietary Supplement Ingredient Database (DSID) offers calculated estimates of ingredient concentrations in dietary supplements available in the US.
- DSLD (USA) - The National Institutes of Health's Dietary Supplement Label Database.
- Dyson (USA, Australia, Singapore) - The Dyson indoor air quality dataset draws on 6 million indoor air quality sensors measuring particulate matter (PM2.5 and PM10), volatile organic compounds, and nitrogen dioxide.
- DZchatbot (Algeria) #public - A medical assistant chatbot with 2,150 general medicine-related questions and answers written in Algerian Arabic.
- Earthdata (USA) - NASA's Earth science data portal.
- ECHO (USA) - Environmental influences on Child Health Outcomes.
- ECIS (Europe) - European Cancer Information System.
- EDG (USA) - The Environmental Dataset Gateway (EDG) gives users access to the Environmental Protection Agency's Open Data resources.
- EHDEN (UK) - European Health Data & Evidence Network (EHDEN) Academy.
- EHIF (Estonia) - The Estonian Health Insurance Fund contains claims data from Estonian healthcare providers, including details on each medical contact, as well as prescription drugs, medical devices, and disability benefits.
- EHIS (Europe) - The European Health Interview Survey is a comprehensive, periodic survey designed to gather information on the health, lifestyle determinants, and healthcare utilization of adults in the European Union (EU) Member States.
- EHR (USA) - Electronic Health Record (EHR) data, which connected healthcare providers update in real time as patients receive care.
- eHOMD (USA) - The expanded Human Oral Microbiome Database offers a comprehensive curated collection of information about bacteria in the human oral cavity and aerodigestive tract.
- ELSA (England) - English Longitudinal Study of Ageing.
- ENCODE (USA) #public - The Encyclopedia of DNA Elements (ENCODE) is a public research consortium designed to identify all functional elements in the human and mouse genomes.
- English_language_Sentences (USA) #public - Contains English sentences collected for analysis and model training. It is not specific to any field and includes medical, news, and other domains.
- EPA (USA) - This Environmental Protection Agency dataset contains mass quantity data on dioxin and dioxin-like compounds reported on the Toxics Release Inventory (TRI) Reporting Form R Schedule 1, with associated toxic equivalency data.
- EPIC-Europe (UK) - The European Prospective Investigation into Cancer and Nutrition (EPIC-Europe) is a comprehensive, multi-national collaborative endeavor.
- EQI (USA) - The Environmental Quality Index presents data in five domains: air, water, land, built, and sociodemographic environments to provide a county-by-county snapshot of overall environmental quality across the entire U.S.
- ESMÉ (France) - The Épidémio-Stratégie Médico-Economique programme.
- ETH Medical Data Science - ricu (USA) - ricu: Intensive Care Unit (ICU) data with R.
- EUROCARE (Europe) - Europe's most extensive research effort into cancer patient survival and prevalence.
- EUSOMA (Europe) - European Society of Breast Cancer Specialists.
- exRNA (USA) - The exRNA Atlas includes exRNA profiles derived from various biofluids and conditions and stores data profiled from small RNA sequencing assays.
- FARA (USA) - The Food Access Research Atlas uses measures of supermarket accessibility to create food access indicators for low-income and other census tracts.
- FaceBase (USA) - FaceBase is a publicly accessible repository that offers a wide range of data types, all aimed at supporting research in dental, oral, and craniofacial areas and related fields.
- FARS (USA) - The Fatality Analysis Reporting System (FARS) is a nationwide census that collects information on fatal motor vehicle crashes.
- FEA (USA) - The Food Environment Atlas (FEA) aims to compile data on food environment indicators to promote research into the factors that influence food choices and diet quality.
- FITBIR (USA) - The Federal Interagency Traumatic Brain Injury Research Informatics System.
- FluencyBank (USA) #private - FluencyBank is a shared database for the study of fluency development.
- FoodData (USA) #public - FoodData Central offers a comprehensive data system that enhances nutrient profile data and connects to relevant agricultural and experimental studies.
- GAAIN (USA) - The Global Alzheimer’s Association Interactive Network (GAAIN).
- GBD - The Global Burden of Disease Study 2019 (GBD 2019).
- Genbank (USA) #public - GenBank collects, maintains, and displays detailed nucleotide sequence information and related annotations from worldwide sources.
- GENIA (USA) #public - Contains over 8,000 sentences with labeled trigger words for event detection.
- GEOS-Chem (USA) - GEOS-Chem is driven by assimilated meteorological input and observations to solve various atmospheric composition problems.
- GePaRD (Germany) - German claims data from statutory health insurance providers.
- GlyGen (USA) - Computational and Informatics Resources for Glycoscience.
- GTEx (USA) - The Genotype-Tissue Expression (GTEx) Project aims to increase our understanding of how gene changes contribute to common human diseases to improve health care for future generations.
- GTR (USA) - The Genetic Testing Registry is a freely accessible database of orderable clinical and research genetic test descriptions and the laboratories that provide them.
- Gun Violence Archive (GVA) (USA) #Free - Freely available dataset on mass shootings.
- HCMI (USA) - Human Cancer Models Initiative (HCMI) Searchable Catalog is a continually updated resource for querying the available next-generation models developed by the HCMI.
- HCUP (USA) - The Healthcare Cost and Utilization Project is a group of healthcare databases and related software tools that includes the most extensive collection of hospital care data in the United States.
- HDE (USA) #public - The Hackathon Disease Extraction: Saving lives with AI dataset.
- HDRP (USA) #public - The Healthcare Delivery Research Program (HDRP) supports the development and maintenance of a range of data resources and analytical tools designed for research.
- Healthcare_NLP (USA) - Healthcare NLP: LLMs, Transformers, Datasets comprises medical data and models to promote data science in healthcare.
- healthdatacsv (USA) - Provides users with access to data from the healthdata.gov catalog.
- HealthMeasures (USA) - Comprises four comprehensive and precise measurement systems that assess mental, physical, and social health, life satisfaction, and symptoms, along with cognitive, motor, and sensory function.
- HEDIS (USA) - The Healthcare Effectiveness Data and Information Set (HEDIS).
- HERC (USA) - The VA Health Economics Resource Center (HERC) estimates the cost of inpatient and outpatient care.
- HEROiC (USA) - The Health Economics Research on Cancer aims to enhance the availability of superior cancer care and lessen the economic impact of cancer in the United States.
- HFA-DB (EU) - The European Health for All database.
- HHEAR (USA) - The Human Health Exposure Analysis Resource (HHEAR) Data Center serves as a repository for epidemiological and biomarker information from CHEAR and HHEAR research studies.
- HHS (USA) - The HHS Protect Public Data Hub.
- HINTS (USA) - The Health Information National Trends Survey (HINTS) was designed to monitor changes in the rapidly evolving field of health communication.
- HIV-DAU (USA) - NIAID-DOE LANL HIV Database and Analysis Unit.
- House_MD_Transcripts (USA) #public - Contains 72,286 rows and 2 columns of data scraped from the complete scripts of the Fox medical drama House, M.D.
- HPV-VU (USA) - HPV Vaccine Uptake aims to boost HPV vaccination rates in areas with historically low uptake.
- HRD (USA) - Health Risk Data.
- HRS (USA) - Health and Retirement Study.
- HSE-PCRS (Ireland) - Health Service Executive—Primary Care Reimbursement Services Database (HSE-PCRS).
- HSPW (USA) - The Human Salivary Proteome is built from saliva samples from two dozen men and women of various ethnic backgrounds and catalogs more than 1,000 salivary proteins.
- HSR (USA) - The Homeless Services Registry (HSR) contains a list of veterans, their demographic data, and the VHA homeless services each veteran has received.
- HuBMAP (USA) - Human BioMolecular Atlas Program (HuBMAP) is an open, global atlas of the human body at the cellular level.
- IAC (USA) - Immunization Action Coalition (IAC) - State Vaccine Mandates. IAC uses educational materials to increase immunization rates and prevent disease, promoting the delivery of safe and effective immunization services.
- IAHDS (USA) - The Interactive Atlas of Heart Disease and Stroke (IAHDS) is a useful resource for public health professionals, researchers, community leaders, and anyone interested in tracking CVD trends, prioritizing research, and organizing patient services.
- IBM-Marketscan (USA) - IBM MarketScan Research Databases.
- ICPSR (USA) - Inter-university Consortium for Political and Social Research (ICPSR) offers leadership and training in data access, curation, and analysis methods for the social science research community.
- ICS (Iceland) - The Icelandic Cancer Society.
- ID_Entities_HealthCareData (USA) #public - A custom named entity recognition (NER) dataset designed to extract diseases and their treatments.
- IDC (USA) - Imaging Data Commons (IDC) is a cloud-based repository of publicly available cancer imaging data.
- IDD (USA) - Curates drug lists from 44 countries' drug regulatory agencies and is mapped to the standardized drug vocabulary RxNorm. As a result, it is ideal for identifying lists of proprietary drug names, particularly of multi-national origin.
- IDG (USA) - The Illuminating the Druggable Genome (IDG) program.
- IEDB (USA) - The Immune Epitope Database (IEDB) and Analysis Resource is an open-access platform that compiles experimental findings on antibody and T-cell epitopes.
- IKNL (Netherlands) - The Netherlands Comprehensive Cancer Organisation (Integraal Kankercentrum Nederland).
- ImmuneSpace (USA) - An engine designed to facilitate data exploration, management, and analysis, using state-of-the-art computational tools to enable integrative modeling of human immunological data.
- ImmPort (USA) - The Immunology Database and Analysis Portal (ImmPort) is a repository for sharing immunology research data with the public.
- IMRD (UK) - Anonymized electronic patient health records gathered from UK GP clinical systems.
- INCLUDE (USA) #public - INCLUDE-DCC was designed to provide the world with the data access and analysis tools required to improve the health and quality of life for people with Down syndrome (DS).
- InGef (Germany) - The database of the Institut für angewandte Gesundheitsforschung Berlin GmbH (Institute for Applied Health Research Berlin) is a de-identified administrative database containing claims data from over 60 German statutory health insurances (SHIs).
- INMET (Brazil) - Instituto Nacional de Meteorologia (National Institute of Meteorology).
- IPUMS (USA) - Integrated census and survey microdata for the United States and other countries.
- ISD - The Integrated Surface Database (ISD) integrates hourly and synoptic surface observations from numerous sources.
- ISNA-CoronaNews (Iran) #public - Comprises articles published by the Iranian Students News Agency (ISNA) on the transmission of the coronavirus in Iran.
- Kaggle-datasets (USA) - A list of 42 NLP-related medical datasets.
- KF-DRC (USA) - The Gabriella Miller Kids First Data Resource Center (Kids First DRC) is a new, collaborative, pediatric research effort focused on accelerating research and understanding the genetic causes and linkages between childhood cancer and structural birth defects.
- LAI (USA) - The Location Affordability Index (LAI) estimates household housing and transportation costs at the neighborhood level.
- LASI (India) - Longitudinal Ageing Study in India.
- LATCH (USA) - The Local Area Transportation Characteristics for Households (LATCH) data provides statistics on the average weekday household person-miles traveled at census tract level.
- LAUS (USA) - The Local Area Unemployment Statistics (LAUS) Map displays data on unemployment rates by month and 12-month net changes.
- LCSDP (USA) - Contains information about patients who participated in the Lung Cancer Screening Demonstration Project.
- LDbase (USA) - A Learning & Development Data Repository.
- LONI (USA) - The Laboratory of Neuro Imaging (LONI) provides neuroscience data management and informatics solutions that facilitate data preservation, exploration, and sharing.