HCUP SID Healthcare Cost and Utilization Project, State Inpatient Database - onetomapanalytics/Meta_Data GitHub Wiki

HCUP SID - Healthcare Cost and Utilization Project, State Inpatient Database

General description

  1. Database primary purpose - Provide inpatient discharge records from non-federal hospitals in individual participating states; since it encompasses all patients, SID provides a unique view of inpatient care in a defined market or state over time.
  2. Overall data type - Health outcomes
  3. Dataset type - Cross-sectional
  4. Data source - Claims
  5. Data level - Patient level
  6. Geographic location of the data collection sites - California, Florida, Iowa, Maryland, Massachusetts, New York, and Washington
  7. Sponsor, manager, or home institution - Agency for Healthcare Research and Quality (AHRQ)
  8. Date range - California: 2003 - 2011, Florida: 2004 - 2019, Iowa: 2009 - 2014, Maryland: 2009 - 2017, Massachusetts: 2012 - 2014, New York: 2009 - 2014, North Carolina: 2010 - 2015, Washington: 2009 - 2015, and Wisconsin: 2013 - 2015
  9. Geolocation data - Hospital's state postal code, patient's state postal code, zip code, state/county FIPS code
  10. Dates - Admission and discharge hour, day, month, and year; days to event
  11. Hospital identifiers - HCUP-specific hospital identifier (Medicare Provider ID) and AHA hospital identifier (availability may vary by state and data organization)
  12. Physician identifiers - HCUP provides de-identified physician identifiers, which can be used to distinguish between physicians
  13. Longitudinal tracking - Track patients within and across hospitals (up to one year), track hospitals
  14. Financial variables - Contains charge information and provides supplemental files containing cost-to-charge ratios
  15. Clinical areas of interest - all
  16. Variables that are uniquely present in this dataset - Includes the universe of inpatient discharge abstracts from participating states, translated into a standard format to permit multistate comparisons and analyses. Also includes a core set of clinical and nonclinical information on all visits, regardless of the expected payer, including but not limited to Medicare, Medicaid, private insurance, self-pay, or those billed as 'no charge'.
  17. Database caveats and limitations - Not all data elements are available from every state or in every year. Also, patients cannot be followed longitudinally, except for readmissions
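
As an illustration of item 14, charges can be converted into estimated costs by multiplying each discharge's total charges by the cost-to-charge ratio of its hospital. The sketch below is a minimal example under stated assumptions: the field names (record_id, hospital_id, total_charges), the hospital identifiers, and all numeric values are hypothetical and are not taken from the SID or its supplemental files.

```python
from typing import Optional


def estimate_cost(total_charges: Optional[float],
                  cost_to_charge_ratio: Optional[float]) -> Optional[float]:
    """Estimate cost as charges multiplied by the cost-to-charge ratio.

    Returns None when either input is missing, so that discharges
    without a usable ratio are not silently treated as zero cost.
    """
    if total_charges is None or cost_to_charge_ratio is None:
        return None
    return total_charges * cost_to_charge_ratio


# Hypothetical discharge records (made-up values for illustration only).
discharges = [
    {"record_id": 1, "hospital_id": "H001", "total_charges": 20000.0},
    {"record_id": 2, "hospital_id": "H002", "total_charges": 5000.0},
    {"record_id": 3, "hospital_id": "H999", "total_charges": 12000.0},  # no ratio on file
]

# Hypothetical hospital-level cost-to-charge ratios.
ccr_by_hospital = {"H001": 0.5, "H002": 0.25}

# Look up each discharge's hospital ratio and compute the estimated cost.
estimated_costs = {
    d["record_id"]: estimate_cost(d["total_charges"],
                                  ccr_by_hospital.get(d["hospital_id"]))
    for d in discharges
}
```

Keeping unmatched discharges as missing rather than dropping them makes it easier to report how many records lacked a ratio, which matters because not every state or year provides the same data elements.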

Applicable methods

  1. Association methods, such as multivariable analysis (1, 2, 3), logistic regression models (4, 5, 6, 7), generalized linear mixed-effect models (8, 9, 10), Poisson regression (11)
  2. Descriptive statistics (12, 13)
  3. Exploratory analysis (14)
  4. Interrupted time series (15)
  5. Machine learning (16, 17)
  6. Time to event (18)
  7. Propensity score (19, 12, 20, 21)

High impact designs

  • Enrichment of the SID dataset through linkage to other datasets, such as Medicare inpatient claims data (22), AHA (23, 24, 25), NY SPARCS (24), SEDD (26), CMS Hospital Compare (27)

  • Evaluate quality and cost of surgery at safety-net hospitals (28, 20)

  • Assess the effects of different geographic measures of socioeconomic status and deprivation on surgical outcomes (30)

  • Develop a systematic approach to detect surgical access disparities (31)

  • Assess between-hospital variation in interventions (32)

  • Develop a scale to predict readmission rates (33)

  • Evaluate racial/ethnic disparities (34, 35, 36)

  • Assess socioeconomic disparities (37)

  • Compare outcomes between states that did and did not implement Medicaid expansion (38)

  • Develop a method to delineate hospital service areas (HSAs) and hospital referral regions (HRRs) (39)

  • Compare methods between cohorts over time (40)

  • Investigate clinical features, management strategies, and outcomes (41)

  • Evaluate outcomes related to Medicare's Nonpayment Program (42)

  • Assess factors associated with the length of readmission following a procedure (43)
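
Several of the designs above depend on identifying readmissions. One possible approach, sketched here under stated assumptions, groups stays by a de-identified patient linkage key, orders them by a relative timing value, and flags an index stay when the next admission begins within a chosen window after discharge. The field names visit_link, days_to_event, and los are hypothetical stand-ins for whatever linkage, timing, and length-of-stay elements a given state file provides, and the 30-day window and all values are illustrative.

```python
from collections import defaultdict


def flag_readmissions(discharges, window_days=30):
    """Return the record_ids of index stays followed by a readmission.

    Assumes each record has:
      visit_link    - de-identified patient linkage key (hypothetical name)
      days_to_event - relative admission day for that patient (hypothetical name)
      los           - length of stay in days (hypothetical name)
    """
    stays_by_patient = defaultdict(list)
    for d in discharges:
        stays_by_patient[d["visit_link"]].append(d)

    flagged = set()
    for stays in stays_by_patient.values():
        # Order each patient's stays chronologically by relative admission day.
        stays.sort(key=lambda s: s["days_to_event"])
        for index_stay, next_stay in zip(stays, stays[1:]):
            discharge_day = index_stay["days_to_event"] + index_stay["los"]
            gap = next_stay["days_to_event"] - discharge_day
            if 0 <= gap <= window_days:
                flagged.add(index_stay["record_id"])
    return flagged


# Made-up example records for two patients.
example_stays = [
    {"record_id": 1, "visit_link": "A", "days_to_event": 100, "los": 5},
    {"record_id": 2, "visit_link": "A", "days_to_event": 120, "los": 3},
    {"record_id": 3, "visit_link": "A", "days_to_event": 200, "los": 2},
    {"record_id": 4, "visit_link": "B", "days_to_event": 50, "los": 1},
]

readmitted_index_stays = flag_readmissions(example_stays)
```

In this example only record 1 is flagged: the next admission for patient A starts 15 days after discharge, while the gap before record 3 is 77 days. Transfers and same-day events would need additional rules that this sketch does not attempt.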

Data dictionary

To access the HCUP SID data dictionary, click here

Variable categories

  1. Patient demographics [e.g., age, sex, race, ethnicity, language, residence indicator (i.e., homeless), marital status]
  2. Hospital discharge records (e.g., primary discharge diagnosis, dates of admission and discharge, length of stay (LOS), patient discharge status, etc.)
  3. Charges (expected payer, total charges)
  4. Injury information (i.e., type and intent)
  5. Diagnosis codes
  6. Procedure codes

Linkage to other datasets

  • The SID can be linked to hospital-level data from the American Hospital Association's Annual Survey of Hospitals and county-level data from the Bureau of Health Professions' Area Resource File, except in those States that do not allow the release of hospital identifiers.

  • SID can also be linked to social determinants of health data using patient ZIP codes (e.g., Distressed Communities Index Data)
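
The ZIP-code linkage described above amounts to a left join from discharge records to an area-level lookup table. The following is a minimal sketch, assuming hypothetical field names (patient_zip, distress_score) and made-up values; it does not reflect the actual layout of the SID or of any social determinants of health file.

```python
def attach_area_measures(discharges, measures_by_zip, fields):
    """Left-join area-level measures onto discharge records by patient ZIP code.

    Every discharge is kept. Records whose ZIP code is missing or absent
    from the lookup receive None for each requested field.
    """
    linked = []
    for d in discharges:
        measures = measures_by_zip.get(d["patient_zip"])
        row = dict(d)  # copy so the input records are not modified
        for field in fields:
            row[field] = measures[field] if measures is not None else None
        linked.append(row)
    return linked


# Hypothetical discharges; ZIP codes are strings so leading zeros survive.
discharges = [
    {"record_id": 1, "patient_zip": "52240"},
    {"record_id": 2, "patient_zip": "02115"},
    {"record_id": 3, "patient_zip": None},  # ZIP code not available
]

# Hypothetical area-level lookup keyed by ZIP code (made-up scores).
measures_by_zip = {
    "52240": {"distress_score": 12.5},
    "02115": {"distress_score": 40.0},
}

linked = attach_area_measures(discharges, measures_by_zip, ["distress_score"])
```

Storing ZIP codes as strings rather than integers avoids dropping leading zeros, and keeping unmatched records makes the match rate easy to audit before analysis.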