00.Datasets04 (S Z) - sporedata/researchdesigneR GitHub Wiki

Select datasets

This section focuses on a few select healthcare datasets that hold special value in patient-centered research:

  • SADR (USA) - The Standard Ambulatory Data Record (SADR) contains information on outpatient visits from 1997 forward.
  • SAGE (USA) - Study on global AGEing and adult health
  • SAMD (USA) #public - Sentiment Analysis for Medical Drugs dataset
  • SANAD #public - A collection of Arabic news articles
  • SAVEE (USA) #public - Contains annotated speech emotion data for emotion recognition systems.
  • scikit-learn (USA) - Scikit-learn is an open-source Python library that provides efficient tools for data mining and data analysis, including methods for classification, clustering, and regression.
  • SCORCH (USA) #public - The Single Cell Opioid Responses in the Context of HIV (SCORCH) program was designed to create resources for detailed tissue analysis at the single-cell level.
  • SCRCR (Sweden) #public - The Swedish Colorectal Cancer Registry.
  • SDOH (USA) - The Social Determinants of Health Database comprises data corresponding to five key domains of the social determinants of health (SDoH).
  • SEDD (USA) #private - The State Emergency Department Databases collect discharge information for all emergency department visits that do not result in an admission and contain data on social determinants of health (SDoH).
  • SEER-CAHPS (USA) #private - Provides data on the experiences of Medicare beneficiaries with their care at different stages of the cancer care continuum.
  • SEER-Medicaid (USA) #private - A unique population-based resource that can be used for various epidemiological and health services research.
  • SEER-Medicare (USA) #private - Surveillance, Epidemiology and End Results - Medicare
  • SEER-MHOS (USA) #private - Designed to promote understanding of the health-related quality of life of cancer patients and survivors registered in Medicare Advantage Organizations.
  • SEER (USA) #private - Can be used to conduct studies on care patterns for persons with cancer before a cancer diagnosis, throughout initial diagnosis and treatment, and during long-term follow-up.
  • SenNet - The Cellular Senescence Network
  • SHARE - Survey of Health, Ageing, and Retirement in Europe
  • ShARe_corpus - The Shared Annotated Resources (ShARe) disorders corpus consists of 531 deidentified clinical notes (a blend of discharge summaries and radiology reports) from the MIMIC II clinical database version 2.5.
  • SIDIAP (Spain) - The Information System for Research in Primary Care contains data from primary care patient records for use in biomedical research.
  • SIDR (USA) - The Standard Inpatient Data Record (SIDR) contains information about hospitalizations covering all individuals in the Army admitted to military medical treatment facilities (MTFs) and civilian hospitals.
  • SLORA (Slovenia) - The Cancer Registry of the Republic of Slovenia.
  • SNIRAM-SNDS-HDH (France) - The French national health insurance claims database.
  • SpanishTweets_Depression (Spain) #public - A curated selection of Spanish Tweets indicating symptoms of depression.
  • SPARC (USA) - The Stimulating Peripheral Activity to Relieve Conditions program was designed to transform our understanding of nerve-organ interactions to advance bioelectronic medicine toward treatments that change lives.
  • SPARCS (USA) - Statewide Planning and Research Cooperative System collects hospital information, including discharges, ambulatory surgery, and outpatient services visits, and patient-level detail on patient characteristics.
  • SQuAD_Translated_To_Persian (Iran) #public - Contains the English SQuAD (v1) columns with their corresponding translation.
  • SRA (USA) #public - Sequence Read Archive (SRA) stores alignment information and raw sequencing data to promote reproducibility and facilitate new discoveries through data analysis.
  • SRTR/UNOS (USA) #private - United Network for Organ Sharing provides accurate, clear, and timely information on the status of solid organ allocation and transplantation and the transplantation system in the United States.
  • SSA (USA) - Social Security Administration keeps track of Social Security and Supplemental Security Income beneficiaries, as well as applicants for Social Security numbers.
  • Stanford-AIMI (USA) - A collection of de-identified annotated medical imaging data to foster transparent and reproducible collaborative research.
  • STAR (USA) - Includes patient-level data on transplant recipients, deceased and live donors, and waiting list candidates going as far back as October 1, 1987.
  • StatFin (Finland) - National Statistical Institute of Finland
  • Stroke_MRIs (USA) #public - A dataset of annotated clinical MRIs and metadata of patients with acute and subacute stroke.
  • STS (USA) - Society of Thoracic Surgeons National Database
  • SUS (Brazil) - Sistema Único de Saúde (Unified Health System) database.
  • SVI (USA) - Social Vulnerability Index determines the social vulnerability of each census tract using census data.
  • SVS-VQI (USA) - Vascular Quality Initiative from Society for Vascular Surgery
  • Symptom2Disease (USA) #private - Comprises 1,200 data points of symptom descriptions.
  • T4SA (Italy) #public - Twitter for Sentiment Analysis.
  • TBPP (USA) #public - TB Portals Program is an international cooperative initiative focused on sharing and analyzing tuberculosis (TB) data to propel TB research forward.
  • TCIA (USA) #public - The Cancer Imaging Archive.
  • TEDI (USA) #public - The Institutional Purchased Care Data (TEDI).
  • TEDNI (USA) #public - The Non-Institutional Purchased Care Data (TEDNI).
  • The State of Senior Hunger (USA) - The State of Senior Hunger in America report series examines the demographics and characteristics of seniors who lack access to enough nutritious food.
  • THIN (Italy) #private - The Health Improvement Network is a non-invasive medical data collection and analysis project comprising anonymized longitudinal patient data representing approximately 6% of the UK population since 1994.
  • THYME_corpus (USA) - The Temporal Histories of Your Medical Event (THYME) corpus consists of 1,254 deidentified clinical notes from the Mayo Clinic.
  • TLMS (USA) #public - Tobacco Longitudinal Mortality Study consists of a database developed to study the effects of demographic and socio-economic characteristics on differentials in U.S. mortality rates, emphasizing tobacco use.
  • TMDS (USA) #public - The Theater Medical Data Store (TMDS) provides web-based access to service member information collected at theater-based medical treatment facilities (MTFs).
  • Tracking Network (USA) #public - The National Environmental Public Health Tracking Network has statistics and information on environments and risks, health effects, population health, and social determinants of health (SDoH).
  • Trans_Law (Spain) #public - Trans Law, or the Spanish Trans Law Twitter Dataset, contains approximately 1.5 million tweets pertaining to the Spanish Transsexuality Law.
  • TrialShare (USA) - A transformative approach to data sharing that promotes clinical trial transparency.
  • TRICARE (USA) #private - Military Health System Tricare Encounter Data
  • TriNetX (USA) - Longitudinal real-world data from Epic
  • TUS-CPS (USA) #public - Tobacco Use Supplement to the Current Population Survey (TUS–CPS)
  • UFAL (USA) #private - UFAL Medical Corpus v. 1.0 is a collection of parallel corpora that aims at a more reliable machine translation of medical texts.
  • UK_Biobank (UK) #public - The UK Biobank comprises anonymized in-depth genetic and health information from over half a million UK participants.
  • UMLS (USA) #public - The Unified Medical Language System (UMLS) comprises a collection of files and software designed to unify various health and biomedical vocabularies and standards.
  • UniProt (USA) #public - The Universal Protein Resource (UniProt) is the world’s leading comprehensive, freely accessible, and high-quality resource of protein sequence and functional annotation data.
  • USRDS (USA) #private - United States Renal Data System database collects, analyzes, and distributes information about chronic kidney disease and end-stage renal disease (ESRD) in the United States.
  • VA-databases (USA) #private - List of databases from the Veterans Affairs Health Service, including the Corporate Data Warehouse (CDW)
  • VAHS-CDW (USA) #private - Provides a repeatable, large-scale data warehouse for business management, clinical and administrative research, and healthcare system innovation.
  • VASQIP (USA) #private - Veterans Affairs Surgical Quality Improvement Program
  • VEuPathDB (USA) #public - The Eukaryotic Pathogen, Vector and Host Informatics Resource.
  • VHA-SP (USA) #public - The Veterans Health Administration Surgery Program (VHA-SP) contains information on VHA leadership and Program Office reference from a national and regional perspective.
  • Video_Transcript_Summarization (USA) #private - Comprises NLP video transcripts from 26 different categories.
  • VIIRS (USA) - Visible Infrared Imaging Radiometer Suite
  • VPF (USA) - The Vulnerable Population Footprint is a mapping and reporting instrument that identifies areas with significant proportions of people living in poverty and people without a high school education.
  • WIdO (Germany) - Wissenschaftliches Institut der AOK (AOK Scientific Institute).
  • WikiMed_Q+A (Iran) #public|#text - A Persian Q&A dataset from Wikipedia about medicine.
  • WLS (USA) #public - The Wisconsin Longitudinal Study (WLS) is an extensive, long-term social science study involving a random sample of 10,317 graduates from Wisconsin high schools in 1957.
  • WONDER (USA) #public - The Wide-ranging Data for Epidemiologic Research (WONDER) is a straightforward, menu-based system designed to distribute the CDC's information resources to both public health experts and the wider public.
  • WVS-Database (USA) - Includes data on socio-cultural and political change worldwide. Since 1981 a worldwide network of social scientists has conducted representative national surveys in over 90 countries.
  • ZfKD (Germany) - The German Centre for Cancer Registry Data (Zentrum für Krebsregisterdaten, ZfKD).