SEER Surveillance, Epidemiology, and End Results program - onetomapanalytics/Meta_Data GitHub Wiki
SEER - Surveillance, Epidemiology, and End Results program
General description
- Database primary purpose - Source for cancer statistics, including incidence, prevalence, and survival data for cancer patients collected and reported over time.
- Overall data type - Health outcomes
- Dataset type - Cross-sectional (mean values for patient block)
- Data source - Registry
- Data level - Patient level (mean values for patient block)
- Geographic location of the data collection sites - United States
- Sponsor, manager, or home institution - National Cancer Institute (NCI), Division of Cancer Control and Population Sciences (DCCPS), Surveillance Research Program (SRP)
- Date range - 1975 - 2016
- Geolocation data - County and state
- Dates - Year and month of diagnosis, year of birth
- Longitudinal tracking - Track patients within and across hospitals (the patient ID, combined with the SEER registry, uniquely identifies a patient)
- Clinical areas of interest - Cancer
- Variables that are uniquely present in this dataset - SEER is an authoritative source for cancer statistics in the U.S.
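The longitudinal tracking note above implies that a patient ID is unique only within a registry, so records are best keyed on the pair. A minimal Python sketch of that idea follows; the field names `registry`, `patient_id`, and `year_dx` are hypothetical placeholders (not the actual SEER column names), and the records are made up for illustration.

```python
from collections import defaultdict


def group_by_patient(records):
    """Group tumor records by the (registry, patient_id) composite key."""
    grouped = defaultdict(list)
    for rec in records:
        # The patient ID alone may repeat across registries,
        # so the registry is part of the key.
        key = (rec["registry"], rec["patient_id"])
        grouped[key].append(rec)
    return dict(grouped)


# Illustrative, made-up records; field names are assumptions.
records = [
    {"registry": "A", "patient_id": 1, "year_dx": 2001},
    {"registry": "A", "patient_id": 1, "year_dx": 2009},
    {"registry": "B", "patient_id": 1, "year_dx": 2005},
]
patients = group_by_patient(records)
```

Here the two records from registry "A" collapse into one patient, while the record from registry "B" stays separate even though it shares the same patient ID.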
Applicable methods
- Association, such as logistic regression (1, 2), generalized linear models (3), hierarchical linear models (4)
- Cost-Effectiveness Analysis (5, 6)
- Machine learning (7, 8)
- Sensitivity analysis (9)
- Time-series (10)
- Univariate analysis (11, 12)
High-impact designs
- Overall cancer statistics, such as general (13, 14), by type (15, 16, 17), by treatment and survivorship (18), by demographics (19)
- Evaluate the effect of advanced treatments on population mortality (20)
- Trends in the incidence, prevalence, and survival outcomes in cancer patients (21)
- Methods for predicting survival and individual treatment recommendations (22)
- Characterize disease-specific mortality risk for multiple cancer sites (23)
- Evaluate disparities in treatment and survival outcomes related to race (24, 25, 26), ethnicity (27, 28), gender (29), geographic location (30, 31), socioeconomic status (32, 33), education (31)
- Cancer pain management (34)
- Evaluate risk factors (35)
- Enriching SEER by linking it to another data source, such as the U.S. Census Bureau (36), National Lung Screening Trial (NLST) (5), CMS (5), Medicare (37, 38), Medicare Health Outcomes Survey (MHOS) (39), Medicare Consumer Assessment of Healthcare Providers and Systems (CAHPS) (40)
Data dictionary
To access the SEER*Stat data dictionary, click here
Variable categories
- Patient demographics (e.g., age, sex, race, ethnicity, year of diagnosis, marital status, and geographic areas)
- Tumor characteristics (e.g., histologic type, size, behavior, grade, marker)
- Treatment (e.g., surgery)
- Follow-up
- Cancer type records and specific variables
- Death (i.e., cause-specific or other cause)
- Survival months
Linkage to other datasets
- Linkages can be established with other U.S. datasets through SEER*Stat software as follows:
- U.S. mortality (1969 - 2020): mortality data collected and maintained by the National Center for Health Statistics (NCHS)
- U.S. county population (1969 - 2020): interagency data, including U.S. Census Bureau's Population Estimates Program, NCHS, and NCI, for county population estimates by age, sex, race, and Hispanic origin
- U.S. census tract population (2006 - 2019): produced by Woods & Poole Economics, Inc. (W&P) with support from NCI, includes data for annual residential population estimates by age group, race/ethnicity, and gender
- Standard populations (millions) for age-adjusted data: age distributions used as weights to create age-adjusted statistics; standard millions are available for U.S., Canada, Europe (Scandinavian and EU), and "World" (Segi 1960 and WHO)
- County/tract attributes: link attribute data collected at different time points to cancer cases/deaths, either regardless of the diagnosis/death year or matched to it
- Expected survival life tables: general population life tables, which can be used to calculate expected survival, relative survival, and crude probability of death statistics. Two sets are available: SES/Geography/Race Annual, available from 1992, and the U.S. annual, available from 1970 with less detail by race
- Linkages can also be established with large CMS databases:
- SEER-Medicare: provides detailed information about Medicare beneficiaries with cancer
- SEER-MHOS: provides information about the health-related quality of life (HRQOL) of Medicare Advantage Organization enrollees
- SEER-CAHPS: a resource for quality of cancer care research
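The standard populations listed above serve as weights for age-adjusted statistics. As a rough illustration, the sketch below applies direct age standardization: each age-specific rate is weighted by the standard population's share of that age group. The function name `age_adjusted_rate`, the three age groups, and all counts are hypothetical. This is not how SEER*Stat is implemented, only the general arithmetic.

```python
def age_adjusted_rate(cases, population, standard, per=100_000):
    """Direct age standardization.

    cases, population, standard: one entry per age group.
    Returns the weighted rate per `per` persons.
    """
    if not (len(cases) == len(population) == len(standard)):
        raise ValueError("all inputs must have one entry per age group")
    total_std = sum(standard)
    rate = 0.0
    for c, p, s in zip(cases, population, standard):
        # Age-specific rate times the standard population weight.
        rate += (c / p) * (s / total_std)
    return rate * per


# Made-up numbers for three age groups (not SEER data).
cases = [10, 50, 200]
population = [100_000, 100_000, 50_000]
standard = [500_000, 300_000, 200_000]  # sums to one "standard million"

adjusted = age_adjusted_rate(cases, population, standard)
crude = sum(cases) / sum(population) * 100_000
```

With these made-up inputs the adjusted rate comes to about 100 per 100,000, against a crude rate of about 104, because the standard population gives less weight to the oldest, highest-rate group than the observed population does. Normalizing by the sum of the standard weights means the sketch works whether or not the weights add up to exactly one million.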