BCSC Risk estimation dataset - onetomapanalytics/Meta_Data GitHub Wiki
BCSC Risk estimation dataset
General description
- Database primary purpose - Provide data so investigators may explore the modification of associations, risk factors, or statistical issues such as the effect of data imputation for missing values or alternative estimation models.
- Overall data type - Health outcomes
- Dataset type - Cross-sectional
- Data source - Registry, survey
- Data level - Patient level
- Geographic location of the data collection sites - United States
- Sponsor, manager, or home institution - Breast Cancer Surveillance Consortium
- Date range - 1996 - 2002
- Clinical areas of interest - Breast Cancer
- Number of records - 2,392,998 screening mammograms (each called the "index mammogram"). To reduce file size, the data have been aggregated by cross-classifying risk factors and outcomes, with a count variable giving the frequency of each combination. This reduces the dataset to 280,660 records.
- Variables that are uniquely present in this dataset - Index mammograms from women who had no previous diagnosis of breast cancer and no breast imaging in the nine months preceding the index screening mammogram. All women had a previous mammogram within the prior five years (though not within the last nine months). Cancer registry and pathology data were linked to the mammography data to assess incident breast cancer (invasive or ductal carcinoma in situ) diagnosed within one year after the index screening mammogram.
- Database caveats and limitations -
  - (1) Covariates (menopausal status, age group, race, ethnicity, BMI, age at first birth, number of first-degree relatives with breast cancer, previous breast procedure, surgical menopause, and current hormone therapy) are self-reported at the time of the index mammogram. Breast density is judged by the radiologist from the screening mammogram, and the result of the last mammogram before the index mammogram is taken from recorded data.
  - (2) A recent change in the BI-RADS breast density definition may lead to greater use of the more extreme density categories.
  - (3) Post-menopausal women include all women who report that their periods have stopped permanently, who are on hormone replacement therapy, or who are aged 55 or older. Women under age 45 who report being post-menopausal are excluded from the dataset.
  - (4) Pre-menopausal women are women under age 55 who report that their periods have not stopped permanently. Unknown menopausal status includes women aged 35-54 whose menopausal information was unknown.
  - (5) The data contain a variable ("training data") indicating whether an observation is in the training data (a 75% random sample) or the validation data (the remaining 25%). This variable can be ignored when the entire dataset is used.
  - (6) If a positive family history in first-degree relatives was reported but the number of relatives with breast cancer could not be determined, it was coded as "1".
  - (7) Because each record represents a combination of covariates and outcomes, the count variable must be used (e.g., as a frequency weight) to obtain correct estimates.
  - (8) Although the sponsor believes the data to be correct and reliable, it acknowledges that errors are always possible when collecting such diverse data.
  - (9) The data in the mammography registries are updated annually. However, this dataset is static, and there are no plans to update it.
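Because each of the 280,660 aggregated rows stands for many index mammograms, any rate or proportion has to be weighted by the count variable. The sketch below is a minimal Python illustration, assuming hypothetical field names `cancer`, `training`, and `count` with 0/1 coding for the first two; the actual names and codes should be checked against the data dictionary. It also shows the optional restriction to the training or validation subset.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class AggregatedRecord:
    """One row of the aggregated file (hypothetical field names and coding)."""
    cancer: int    # assumed: 1 = breast cancer within one year of the index mammogram, 0 = none
    training: int  # assumed: 1 = training sample (75%), 0 = validation sample (25%)
    count: int     # number of index mammograms sharing this covariate/outcome combination


def weighted_cancer_rate(records: Iterable[AggregatedRecord],
                         training: Optional[int] = None) -> float:
    """Return cancers per index mammogram, using `count` as a frequency weight.

    Pass training=1 or training=0 to restrict to one subset; the default
    (None) uses the entire dataset and ignores the split flag.
    """
    mammograms = 0
    cases = 0
    for r in records:
        if training is not None and r.training != training:
            continue
        mammograms += r.count
        cases += r.count * r.cancer
    if mammograms == 0:
        raise ValueError("no records selected")
    return cases / mammograms


# Toy rows for illustration only; these are not real BCSC values.
example_rows = [
    AggregatedRecord(cancer=0, training=1, count=900),
    AggregatedRecord(cancer=1, training=1, count=10),
    AggregatedRecord(cancer=0, training=0, count=300),
    AggregatedRecord(cancer=1, training=0, count=5),
]

if __name__ == "__main__":
    naive = sum(r.cancer for r in example_rows) / len(example_rows)
    print(f"unweighted share of rows: {naive:.4f}")
    print(f"count-weighted rate (all): {weighted_cancer_rate(example_rows):.4f}")
    print(f"count-weighted rate (training): {weighted_cancer_rate(example_rows, training=1):.4f}")
```

On these toy rows the unweighted share of rows with cancer is 0.5, while the count-weighted rate is 15/1215 (about 0.0123), which is why ignoring the count variable gives badly wrong estimates.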
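The menopausal-status rules in caveats (3) and (4) can be written as a small classification function. This is a hypothetical sketch, not the BCSC's own derivation: the function and argument names are illustrative, and where the rules could overlap (for example, a woman under 55 on hormone replacement who reports ongoing periods) it assumes the post-menopausal criteria take precedence, after first applying the under-45 exclusion.

```python
from typing import Optional


def menopausal_status(age: int, periods_stopped: Optional[bool],
                      on_hormone_replacement: bool) -> Optional[str]:
    """Classify menopausal status following the dataset's stated rules.

    periods_stopped: True if periods stopped permanently, False if not,
    None if unknown. Returns "post", "pre", "unknown", or None when the
    woman falls outside these categories (e.g. excluded from the dataset).
    The order in which the rules are applied is an assumption of this sketch.
    """
    # Women under 45 who report being post-menopausal are excluded.
    if age < 45 and periods_stopped is True:
        return None
    # Post-menopausal: periods stopped, on hormone replacement, or age 55+.
    if periods_stopped is True or on_hormone_replacement or age >= 55:
        return "post"
    # Pre-menopausal: under 55 (guaranteed here) and periods not stopped.
    if periods_stopped is False:
        return "pre"
    # Unknown status is defined only for women aged 35-54.
    if 35 <= age <= 54:
        return "unknown"
    return None


if __name__ == "__main__":
    for args in [(60, None, False), (48, False, False), (48, None, False), (40, True, False)]:
        print(args, "->", menopausal_status(*args))
```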
Applicable methods
High-impact designs
- Compare BCSC screening rates with other datasets, such as the National Breast and Cervical Cancer Early Detection Program (NBCCEDP, United States) and the National Health Service Breast Screening Program (NHSBSP, United Kingdom) (1)
- Evaluate biopsy rates and yield in the 90 days following screening (3)
Data dictionary
To access the BCSC Risk estimation Dataset dictionary, click here
Variable categories
- Patient demographics (e.g., age group, race, ethnicity)
- Patient overall characteristics (e.g., BMI group, menopausal status, age at first birth, number of first-degree relatives with breast cancer)
- Cancer-related data (e.g., BI-RADS breast density, hormone replacement therapy, diagnosis, previous breast procedure, surgical menopause, current hormone therapy)