00.Datasets03 (M R) - sporedata/researchdesigneR GitHub Wiki

Select datasets

This section focuses on a few select healthcare datasets that hold special value in patient-centered research:

  • MAA (USA) - Maps 104,000+ medical abbreviations and acronyms to 170,000+ different meanings.
  • Map the Meal Gap (USA) - Map the Meal Gap is an interactive map of food insecurity and child food insecurity in the United States.
  • mapMECFS (USA) - mapMECFS is an interactive data portal that provides access to research results across multiple biological disciplines from studies focused on advancing our understanding of Myalgic Encephalomyelitis / Chronic Fatigue Syndrome (ME/CFS).
  • MBSAQIP (USA) - Metabolic and Bariatric Surgery Accreditation and Quality Improvement Program
  • MCBS (USA) - Medicare Current Beneficiary Survey is used to monitor and evaluate beneficiaries' health and health-care policies.
  • MeDAL (USA) #public - Curated for abbreviation disambiguation and designed for understanding natural language pre-training in the medical domain.
  • Medical_Transcriptions (USA) #public - Contains samples of medical transcriptions for different medical specialties.
  • MedicalNewsToday (USA) #public - Contains 2,000 approved NLP-related medical articles from @MedicalNewsToday.
  • Medicine-Graph (USA) #private - Offers a special collection of co-occurrence matrices that quantify the pairwise mentions of 3 million terms mapped onto 1 million clinical concepts.
  • MedNLI (USA) #private - A Natural Language Inference Dataset For The Clinical Domain.
  • MEPS (USA) - Medical Expenditure Panel Survey provides comprehensive data on the types of health care services Americans use, how frequently they use them, the cost of the services, and who pays for them.
  • MetaMap (USA) - MetaMap is a versatile and adaptable tool crafted to link biomedical texts to the UMLS Metathesaurus, enabling the identification of Metathesaurus concepts mentioned within the text.
  • MHOS (USA) - The Medicare Health Outcomes Survey gathers valid, reliable, and clinically meaningful data to enhance the understanding of health-related quality of life (HRQOL) among cancer patients and survivors in Medicare Advantage Organizations (MAOs). MHOS can also be linked to SEER-Medicare.
  • MHS (USA) - The Military Health System contains records of all healthcare events paid for by the MHS, regardless of setting.
  • MI (USA) - The Monarch Initiative is a comprehensive knowledge graph and a suite of tools designed to support clinicians, researchers, and scientists.
  • MIDRC (USA) - The Medical Imaging and Data Resource Center was designed to develop a high-quality repository for medical images related to acute and long-term COVID-19 and associated clinical data.
  • MIMIC-III (USA) #private - Medical Information Mart for Intensive Care.
  • MIMIC-IV (USA) #private - Medical Information Mart for Intensive Care.
  • MIMIC-CXR (USA) #private - Medical Information Mart for Intensive Care Chest X-ray (MIMIC-CXR).
  • MoTrPAC (USA) #public - The Molecular Transducers of Physical Activity Consortium (MoTrPAC) Data Hub is a national research portal to access data generated by the MoTrPAC.
  • MPOG (USA) - Multicenter Perioperative Outcomes Group.
  • MVP (USA) - Million Veteran Program is one of the world's most extensive programs on genetics and health.
  • MW (USA) - The Metabolomics Workbench was created to enhance the United States' capabilities in the field of metabolomics.
  • Myers_Abortion_Facility_Database (USA) - Provides comprehensive information on the dates of operation, identities, and locations of all publicly identifiable abortion facilities in the United States from January 1, 2009, to June 1, 2021.
  • n2c2 NLP (USA) - Unstructured notes from the Research Patient Data Registry at Partners Healthcare (originally developed during the i2b2 project).
  • N3C (USA) #private - National COVID Cohort Collaborative (N3C) is a large national data repository designed to analyze patient-level data from multiple clinical centers to reveal patterns in COVID-19 patients.
  • NACC (USA) - The National Alzheimer’s Coordinating Center provides a valuable resource for both exploratory and explanatory Alzheimer's disease research.
  • NACDA (USA) - The National Archive of Computerized Data on Aging (NACDA) acquires, preserves, and shares data relevant to gerontological research.
  • NAHDAP (USA) - The National Addiction & HIV Data Archive Program is a NIDA-funded data archive program conceived to promote research on drug addiction and HIV infection.
  • NASA (USA) #public - National Aeronautics and Space Administration (NASA) is America's civil space program and the forerunner in space exploration.
  • NCATS-ODP (USA) - The NCATS OpenData Portal is a platform established to swiftly and transparently distribute screening data and assay details.
  • NCBI (USA) - National Center for Biotechnology Information.
  • NCCOR (USA) - National Collaborative on Childhood Obesity Research.
  • NCDB (USA) - National Cancer Database.
  • NCHHSTP_AtlasPlus (USA) - The National Center for HIV, Viral Hepatitis, STD, and TB Prevention (NCHHSTP) AtlasPlus provides instant access to over 15 years' worth of the CDC's surveillance data on HIV, sexually transmitted diseases (STDs), tuberculosis (TB), and viral hepatitis.
  • NCHS Data Linkage (USA) - NCHS Data Linkage Activities.
  • NCI-IND (USA) - The CIP (Cancer Imaging Program) IND Directory is a centralized resource designed to facilitate the sharing of IND information.
  • NCR (Ireland) - National Cancer Registry.
  • NCS (USA) - National Congregations Study (NCS). Zip code is a restricted variable.
  • NDA (USA) - The National Institute of Mental Health Data Archive (NDA) was established to support autism research but has since been developed to facilitate data sharing amongst the mental health and other research communities.
  • NDB (Japan) - The National Database of Health Insurance Claims and Specific Health Checkups of Japan is a comprehensive database of health insurance claims data under Japan’s National Health Insurance system.
  • NDC (USA) - The National Drug Code Database contains information on active and certified finished and unfinished drugs submitted to FDA in structured product labeling (SPL) electronic listing files by labelers.
  • NDEx (USA) - The Network Data Exchange operates as a digital commons where researchers can upload, exchange, and widely disseminate biological networks and pathway models.
  • NDI (USA) - National Death Index
  • NEI (USA) - The National Eye Institute (NEI) Data Commons.
  • NER_CRF_Medical (USA) #public - Connects medical communities with patients across the country
  • Neuro-QoL (USA) #public - The Quality of Life in Neurological Disorders (Neuro-QoL) Measure is a self-report of health-related quality of life (HRQOL) in 17 domains and sub-domains for adults and 11 for children with neurological disorders.
  • News_Title_Dataset_CSV (USA) #public - News Title dataset with 4 categories
  • NHANES (USA) - National Health and Nutrition Examination Survey
  • NHATS (USA) - National Health and Aging Trends Study
  • NHCDR (USA) - NINDS Human Cell and Data Repository
  • NHGRI-EBI (USA) - The NHGRI-EBI GWAS Catalog is a freely available and FAIR knowledgebase providing detailed, interoperable, standardized, and structured human genome-wide association study (GWAS) data.
  • NHIS-CCS (USA) - NHIS's annual Cancer Control Supplement focuses on issues pertaining to attitudes, knowledge, and practices of cancer-related health behaviors, screening, and risk assessment.
  • NHIS-NSC (South Korea) - National Health Insurance Service-National Sample Cohort.
  • NHIS (USA) - National Health Interview Survey is unique for its ability to categorize health characteristics by a variety of demographic and socioeconomic characteristics.
  • NIAAA (USA) - National Institute on Alcohol Abuse and Alcoholism
  • NIAGADS (USA) - National Institute on Aging Genetics of Alzheimer's Disease Data Storage Site
  • NIDA (USA) - National Institute on Drug Abuse was designed to produce, store, and distribute clinical data and biomaterials for Research on the Genetics of Addiction.
  • NIDDK-CR (USA) - The Central Repository of the National Institute of Diabetes and Digestive and Kidney Diseases.
  • NIF (USA) - The Neuroscience Information Framework (NIF) boasts the most comprehensive searchable database of neuroscience data.
  • NIH-Toolbox (USA) - The NIH Toolbox includes over 80 stand-alone measures available in 30-minute batteries to assess Cognition, Emotion, Motor skills, and Sensation.
  • NITRC (USA) - The NeuroImaging Tools and Resources Collaboratory is an award-winning web-based resource that provides free and comprehensive access to an ever-expanding scope of neuroinformatics software and data.
  • NLTK (USA) - The Natural Language Toolkit (NLTK) is a premier toolkit for creating Python applications that handle human language data.
  • NORDCAN (Nordic countries) - Association of the Nordic Cancer Registries.
  • NPI (USA) - The National Provider Identifier (NPI) facilitates effective electronic transmission of health information.
  • NSQIP (USA) - National Surgical Quality Improvement Program
  • NSRR (USA) - National Sleep Research Resource (NSRR) is focused on sharing large amounts of sleep data from clinical trials, multiple cohorts, and other data sources.
  • NTDB (USA) - National Trauma Data Bank
  • NUMI (USA) - The National Utilization Management Integration
  • OASIS (USA) #image - The Open Access Series of Imaging Studies (OASIS) is an initiative designed to provide the scientific community with open access to neuroimaging brain data sets.
  • ODC-SCI (USA) - The Open Data Commons for Spinal Cord Injury (ODC-SCI) is a cloud-based, community-driven repository designed to share, store, and publish spinal cord injury research data.
  • OHSUMED (USA) - A set of 348,566 references from MEDLINE consisting of titles and/or abstracts from approximately 270 medical journals over a five-year period.
  • ONR (USA) - The Office of Nutrition Research (ONR) at the National Institutes of Health (NIH) is dedicated to enhancing nutrition science with the goal of improving health and reducing the prevalence of diseases related to diet and nutritional disparities.
  • openICPSR (USA) - openICPSR is a self-publishing platform for behavioral, health science, and social research data.
  • OpenNeuro (USA) - OpenNeuro is a facilitator for data sharing and analysis, specifically focusing on raw data from EEG, iEEG, MEG, MRI, and PET modalities.
  • OpenPain (USA) - OpenPain is an open access data sharing platform focused on brain imaging studies of human pain.
  • PbEHDE (Taiwan) - Population-Based Electronic Health Data Environment.
  • PBM (USA) - The Pharmacy Benefits Management is a national database containing information about all prescriptions dispensed within the VHA systems beginning in 1999.
  • PBS (Australia) - Pharmaceutical Benefits Scheme is Australia's national drug subsidy program and comprises information about PBS scripts and payments, patients, prescribers, and dispensing pharmacies.
  • PCORNET (USA) - Patient-Centered Clinical Research Network
  • PDB (USA) - The Protein Data Bank aims to expand the frontiers of fundamental biology, biotechnology, energy, and health through open and sustainable access to the 3D structure, function, and evolution of biological macromolecules.
  • PDBP (USA) - The Parkinson's Disease Biomarker Program supports clinical and laboratory-based discovery projects designed to facilitate the identification of promising diagnostic and progression biomarkers for Parkinson's disease.
  • PDTS (USA) - The Pharmacy Data Transaction Service (PDTS) is an operational centralized database system that provides a comprehensive patient medication profile for each DoD beneficiary.
  • Pedianet (Italy) - Collects data from outpatient family paediatricians in Italy for clinical and epidemiological research
  • PeptideAtlas (USA) - PeptideAtlas is a multi-organism collection of peptides identified through extensive tandem mass spectrometry proteomics experiments.
  • PGS (USA) - The PGS Catalog is an open database of published polygenic scores (PGS) and the relevant metadata required for accurate application and evaluation.
  • PHARMO (The Netherlands) - Provides a unique opportunity to gain insight into the complete patient journey and healthcare.
  • PhonBank (USA) - An open database for studying early phonological development using the Phon program.
  • PhysioNet (USA) - A repository of freely available medical research data designed to conduct and catalyze biomedical research and education.
  • PLACES (USA) - Offers model-based, population-level analysis and community estimates of health metrics for all counties
  • PLCO (USA) #public - The Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial
  • POC (USA) - The Patterns of Care (POC) initiative was launched in 1987, with SEER cases serving as controls for a study examining the provision of state-of-the-art therapy.
  • Population Density Maps (USA) - Population Density Maps
  • PPCR (USA) - The Pediatric Proton/Photon Consortium Registry (PPCR) is the most comprehensive multi-institutional radiation-based pediatric patient registry.
  • PRECIS-2 (USA) - PRagmatic Explanatory Continuum Indicator Summary-2 (PRECIS-2) tool guides the design of RCTs.
  • Premier (USA) - Premier Healthcare Database
  • PRO-CTCAE (USA) - The Patient-Reported Outcomes version of the Common Terminology Criteria for Adverse Events.
  • PROMIS (USA) - Patient-Reported Outcomes Measurement Information System.
  • PROSPR (USA) - Population-based Research to Optimize the Screening PRocess (PROSPR) DataShare (PDS)
  • PS (USA) - Physician Surveys evaluate how physicians adopt new and traditional cancer control technologies.
  • PubChem (USA) #public - An open chemistry database at the NIH and the world's most extensive collection of freely accessible chemical information.
  • PubMed (USA) #public - Comprises over 29 million citations for biomedical literature from life science journals, MEDLINE, and online books.
  • PubMed_200k_RCT (USA) #public - Consists of about 200,000 abstracts of randomized controlled trials (RCTs), totaling 2.3 million sentences.
  • R4R (USA) #public - The Resources for Researchers (R4R) is a comprehensive listing of NIH-supported tools and services designed to aid cancer researchers.
  • Radiologist_Notes (USA) #private - Offers a semantic understanding of developing automated pipelines and terminologies for captioning medical conditions related to the lumbar spine.
  • RAI-MDS (USA) - The Resident Assessment Instrument-Minimum Data Set.
  • RAMQ (Canada) - Régie de l'assurance maladie du Québec (Quebec Health Insurance Board) contains data on medical services delivered to all Québec citizens, as well as prescription drugs delivered to persons enrolled by the RAMQ's Prescription Drug Insurance Plan.
  • RedditData (USA) #public - The Domain-specific RedditData: Medical and Finance dataset constitutes Reddit posts for Natural Language Summarization.
  • RHIhub (USA) - The Am I Rural? service can be used to help determine whether a specific location is considered rural based on various definitions of rural, including definitions that are used as eligibility criteria for federal programs.
  • Ru_DrugAddiction #private - Early Russian News Articles on Drug Addiction dataset.
  • Rural Hospital Closures (USA) - Provides data on Rural Hospital Closures since January 2005.
  • Russian_Voice - Russian Voice dataset contains over 2000 recordings put together by the Central Research Institute of General Speech Pathology of the Russian Academy of Medical Sciences.