# Generating CHI Rate Estimates

## Introduction
The `apde.chi.tools` package provides a standardized workflow for preparing King County Community Health Indicators (CHI) estimates. This vignette demonstrates how to use the package's core functions to generate estimates of rates that use population denominators. If you need to calculate prevalences, proportions, or means, please refer to the separate vignette for that purpose.
For simplicity, we'll walk through the analysis pipeline for calculating the adolescent birth rate as an example. Estimation of CHARS and death rates follows the same basic process, albeit with some additional complexity that can be seen in the most recent code in the CHI repository. To view a complete list of documented functions available from the `apde.chi.tools` package, enter `help(package = 'apde.chi.tools')` at the R prompt.
The CHI standards are documented in SharePoint > Community Health Indicators > CHI-Vizes > CHI-Standards-TableauReady Output.xlsx. We’ll follow those standards throughout our analysis.
Finally, please remember that you can always get more information about a specific function from its help file, e.g., `?chi_count_by_age`, `?chi_generate_tro_shell`, etc.
## Load Packages
```r
library(glue)            # For creating dynamic strings
library(future)          # For parallel processing
library(Microsoft365R)   # For SharePoint connections
library(DBI)             # For SQL Server connections
library(openxlsx)        # For Excel output
library(rads)            # For APDE analyses
library(data.table)      # For wicked fast data manipulation
library(apde.chi.tools)  # The package we're demonstrating
```
## Analysis Configuration
First, let’s set up our configuration parameters. This step defines key variables and paths used throughout the analysis. Doing this once at the top of your code will help you maintain and adapt it for subsequent years.
```r
# Specify the most recent year available in the raw data
latest_year <- 2023

# Specify a directory for saving the output in the CHI SharePoint site
sharepoint_output_dir <- paste0('JUNK_testing/', latest_year, '-update/')
```
## Getting the Raw Data
Next, let's retrieve the birth data we'll be analyzing. The `rads::get_data_birth()` function pulls data from SQL, filtered to our specifications.
> [!NOTE]
> **What is `race3_hispanic` and why do I need it?**
>
> Generally, CHI has two versions of composite race/ethnicity data. In `race4`, Hispanic is defined as a race and overwrites other OMB race categories. For example, if someone is Black and Hispanic, in `race4` the person would be categorized as Hispanic. In contrast, `race3` defines Hispanic as an ethnicity. This means a person can be any OMB race (e.g., AIAN, Asian, Black, etc.) AND be of Hispanic ethnicity. A single analytic-ready variable cannot contain both race and ethnicity data since it can only have one value. Therefore, every time you want to process `race3` estimates, you must download and use both the `'race3'` and `'race3_hispanic'` columns from the analytic-ready data that you get with the `rads::get_data_*()` functions. The `apde.chi.tools` functions will "know" how to appropriately use `race3` and `race3_hispanic`, but you must get both from the analytic-ready data.
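To make the distinction concrete, here is a toy sketch. The rows and codings below are hypothetical, purely for illustration, and are not the exact values used in the analytic-ready tables:

```r
# Toy illustration (hypothetical coding): race3 carries the OMB race while
# race3_hispanic carries Hispanic ethnicity, so one person can have both,
# whereas race4 collapses everything into a single value
library(data.table)

toy <- data.table(
  id             = 1:3,
  race3          = c("Black", "Asian", "Black"),
  race3_hispanic = c("Hispanic", NA, NA),          # person 1 is Black AND Hispanic
  race4          = c("Hispanic", "Asian", "Black") # Hispanic overwrites race
)
toy
```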
```r
# Get birth data for the past 10 years
birthsdt <- get_data_birth(
  cols = c("bigcities", "chi_geo_kc", "chi_geo_region", "chi_race_aic_asianother",
           "chi_race_aic_chinese", "chi_race_aic_filipino",
           "chi_race_aic_guam_or_chamorro", "chi_race_aic_hawaiian",
           "chi_race_aic_his_cuban", "chi_race_aic_his_mexican",
           "chi_race_aic_his_puerto_rican", "chi_race_aic_indian",
           "chi_race_aic_japanese", "chi_race_aic_korean", "chi_race_aic_samoan",
           "chi_race_aic_vietnamese", "edu_grp", "hra20_name", "mage5",
           "pov200grp", "race3", "race3_hispanic", "race4",
           "teen1517", "chi_age", "chi_year", "creation_date"),
  year = (latest_year - 9):latest_year, # latest_year was defined above
  kingco = TRUE)
```
## Getting the Analysis Set
Each CHI data source has an analysis set: a compact summary of all calculations needed for all of the CHI indicators. It may be saved in the appropriate GitHub repo sub-directory, along with your annual CHI code. Typically, however, the `chi_generate_analysis_set()` function should be used to create a fresh copy based on the latest year's results in the production server. For example, the following line of code creates an analysis set based on the contents of `[PHExtractStore].[APDE].[birth_results]` on `KCITSQLPRPHIP40`.
```r
# Generate the analysis set for birth data
analysis_sets <- chi_generate_analysis_set('birth')
```
Curious what an analysis set looks like? Let’s take a peek at a few rows:
cat1 | cat1_varname | _kingcounty | _wastate | demgroups | crosstabs | trends | set | set_indicator_keys |
---|---|---|---|---|---|---|---|---|
Big cities | bigcities | NA | NA | x | x | x | 1 | breastfed, bw_low, bw_low_sing, bw_norm, bw_norm_sing, bw_vlow, bw_vlow_sing, csec_lowrisk, infmort, kotelchuck, pnc_lateno, preterm, smoking_dur |
Big cities | bigcities | NA | NA | x | x | x | 2 | teen1517 |
Neighborhood poverty | pov200grp | NA | NA | x | x | NA | 2 | teen1517 |
The analysis set contains important information about:

- the category variables to use in the analyses (`cat1`, `cat1_varname`)
- the types of analyses to perform (`_kingcounty`, `_wastate`, `demgroups`, etc.)
- the indicators that share a common pattern of utilized category variables and analysis types (`set` and `set_indicator_keys`)
## Generating Instructions

We use `chi_generate_tro_shell()` to create a standardized set of calculation instructions based on the analysis set created in the previous step.
```r
myinstructions <- chi_generate_tro_shell(
  ph.analysis_set = analysis_sets, # from chi_generate_analysis_set()
  end.year = latest_year,          # latest year used in analyses
  year.span = 5,                   # number of years in a single analysis period
  trend.span = 3,                  # number of years in a single trend period
  trend.periods = 10               # max number of trend time periods
)
```
As stated in the introduction, for this vignette we will focus on the adolescent birth rate (`teen1517`). This is because it is the only true rate, with a population denominator, among all of the birth indicators.
```r
myinstructions <- myinstructions[indicator_key == 'teen1517']
```
Let’s examine the top six rows of our instructions:
indicator_key | tab | cat1 | cat1_varname | cat2 | cat2_varname | end | start |
---|---|---|---|---|---|---|---|
teen1517 | _kingcounty | King County | chi_geo_kc | NA | NA | 2023 | 2019 |
teen1517 | demgroups | Big cities | bigcities | NA | NA | 2023 | 2019 |
teen1517 | demgroups | Birthing person’s ethnicity | race3_hispanic | NA | NA | 2023 | 2019 |
teen1517 | demgroups | Birthing person’s race | race3 | NA | NA | 2023 | 2019 |
teen1517 | demgroups | Birthing person’s race/ethnicity | race4 | NA | NA | 2023 | 2019 |
teen1517 | demgroups | King County | chi_geo_kc | NA | NA | 2023 | 2019 |
## Tidying Instructions
Now, let’s clean up our instructions to prevent illogical cross-tabulations like Seattle HRAs in East King County. Note that this is one place where the analyst will need to think carefully and deeply about illogical cross-tabulations that should be removed. You should assume that this step will be specific to each analysis.
```r
# Remove crosstabs of big cities with HRAs or Regions
myinstructions <- myinstructions[is.na(cat2) |
                                   !(cat1_varname == 'bigcities' &
                                       cat2_varname %in% c('hra20_name', 'chi_geo_region'))]

# Remove crosstabs of HRAs or Regions with big cities
myinstructions <- myinstructions[is.na(cat2) |
                                   !(cat2_varname == 'bigcities' &
                                       cat1_varname %in% c('hra20_name', 'chi_geo_region'))]
```
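As a defensive habit, you might follow the filtering with a quick check that no illogical combinations survived. This is an optional sketch that uses the same column names as the code above:

```r
# Sanity check: no big-city x HRA/region crosstabs should remain in either direction
stopifnot(nrow(myinstructions[
  (cat1_varname == 'bigcities' & cat2_varname %in% c('hra20_name', 'chi_geo_region')) |
  (cat2_varname == 'bigcities' & cat1_varname %in% c('hra20_name', 'chi_geo_region'))]) == 0)
```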
## Performing Calculations

### Configure Parallel Processing
The future package allows you to use parallel processing to speed up certain operations. Setting it up correctly will allow you to run some of the `apde.chi.tools` functions, like `chi_count_by_age()`, substantially faster. The example code below designates:

- the creation of parallel worker processes on all available cores except one (leaving that core for your main R session)
- a 2 GB limit on how much data can be transferred to each worker process

As of May 2025, APDE's performance laptops offer more available cores than our virtual machines (VMs). Therefore, `chi_calc()` is generally faster on a performance laptop than on a VM.
```r
# Configure parallel processing using multiple sessions, reserving one core
future::plan(future::multisession, workers = future::availableCores() - 1)

# Set the maximum memory (in GB) allowed per future process
future.GB <- 2
options(future.globals.maxSize = future.GB * 1024^3)
```
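If you want to confirm the plan took effect before launching a long computation, the future package can report how many workers it will use:

```r
# Should return availableCores() - 1 while the multisession plan is active
future::nbrOfWorkers()
```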
### Calculating the Numerator with `chi_count_by_age()`

As its name implies, `chi_count_by_age()` creates a detailed breakdown of counts by age for CHI data analysis. Summing the data by age is critical for calculating age-adjusted rates after merging with population denominators. Like `chi_calc()`, which is used for proportions and means, `chi_count_by_age()` processes data according to the instructions created by `chi_generate_tro_shell()` and handles demographic groupings with special treatment for race and ethnicity variables.
> [!NOTE]
> **What's going on under the hood?**
>
> The `chi_count_by_age()` function shares many validation steps with `chi_calc()`, including checking inputs against `rads.data::misc_chi_byvars` to ensure they follow CHI encoding standards. After validation, the function uses `rads::calc()` to generate counts by single-year age values (default = 0-100). Like `chi_calc()`, `chi_count_by_age()` applies special handling for race and ethnicity variables, particularly the relationship between `race3` and `race3_hispanic`, to ensure consistent representation in output data.
>
> A key feature of `chi_count_by_age()` is its handling of missing combinations. When no data exist for a specific demographic-age group combination (e.g., a particular racial group at age 97), the function creates a complete reference table using Cartesian products so that all possible combinations are represented with counts of zero rather than being omitted entirely. This comprehensive approach enables proper age standardization in downstream analyses.
>
> The demographic grouping standards in `rads.data::misc_chi_byvars` can be traced back to SharePoint > Community Health Indicators > CHI-Vizes > CHI-Standards-TableauReady Output.xlsx. However, directly referencing a SharePoint file from within an R package would create a fragile dependency on external, user-specific infrastructure that could break portability, reproducibility, and automation.
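The Cartesian-product zero-filling described above can be sketched in a few lines of data.table. This is a simplified illustration of the idea with toy data, not the package's actual internals:

```r
library(data.table)

# Observed counts are missing the (B, 16), (A, 17), and (B, 17) combinations
observed <- data.table(group   = c("A", "A", "B"),
                       chi_age = c(15, 16, 15),
                       count   = c(4, 2, 1))

# CJ() builds the full Cartesian product of groups and ages ...
full_grid <- CJ(group = c("A", "B"), chi_age = 15:17)

# ... and joining the observed counts onto that grid surfaces the missing
# combinations as NA, which are then set to zero rather than omitted
complete <- observed[full_grid, on = c("group", "chi_age")]
complete[is.na(count), count := 0]
complete
```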
```r
# Get counts
mycounts <- chi_count_by_age(ph.data = birthsdt,
                             ph.instructions = myinstructions,
                             source_date = unique(birthsdt$creation_date))

# Limit counts to the ages of adolescents (counts for other rows are all zero anyway)
mycounts <- mycounts[chi_age %in% 15:17]
```
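Two optional sanity checks at this point, using columns from the `mycounts` output shown below:

```r
# All remaining rows should be adolescent ages
stopifnot(all(mycounts$chi_age %in% 15:17))

# A quick feel for data volume by analysis type
mycounts[, .(total_count = sum(count)), by = tab]
```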
Here are a few rows from `mycounts` that show the structure and contents of the `chi_count_by_age()` output:
indicator_key | year | tab | cat1 | cat1_varname | cat1_group | cat2 | cat2_varname | cat2_group | chi_age | count | source_date |
---|---|---|---|---|---|---|---|---|---|---|---|
teen1517 | 2019-2023 | _kingcounty | King County | chi_geo_kc | King County | NA | NA | NA | 15 | 60 | 2025-05-06 |
teen1517 | 2019-2023 | demgroups | Big cities | bigcities | Auburn city | NA | NA | NA | 16 | 12 | 2025-05-06 |
teen1517 | 2019-2023 | crosstabs | Big cities | bigcities | Auburn city | Birthing person’s ethnicity | race3 | Hispanic | 17 | 13 | 2025-05-06 |
teen1517 | 2021-2023 | trends | King County | chi_geo_kc | King County | NA | NA | NA | 15 | 37 | 2025-05-06 |
## Calculating the Denominator

### Use `chi_generate_instructions_pop()`

We pass `mycounts` to `chi_generate_instructions_pop()` to create instructions for downloading corresponding population data.
Note that the `povgeo` parameter depends on how the data source defines the `'pov200grp'` indicator in the ETL process. In birth data it is defined by blocks, but in some other data sources it is defined by ZIP codes.
```r
mypop.instructions <- chi_generate_instructions_pop(mycount.data = mycounts,
                                                    povgeo = 'blk')
```
Here are a few rows of `mypop.instructions` that show the structure and contents of the `chi_generate_instructions_pop()` output:
year | cat1 | cat1_varname | cat2 | cat2_varname | tab | start | stop | race_type | geo_type | group_by1 | group_by2 |
---|---|---|---|---|---|---|---|---|---|---|---|
2019-2023 | King County | chi_geo_kc | NA | NA | _kingcounty | 2019 | 2023 | race_eth | kc | NA | NA |
2019-2023 | Ethnicity | race3 | NA | NA | demgroups | 2019 | 2023 | race_eth | kc | race_eth | NA |
2019-2023 | Big cities | bigcities | Race | race3 | crosstabs | 2019 | 2023 | race | hra | NA | race |
2021-2023 | Big cities | bigcities | NA | NA | trends | 2021 | 2023 | race_eth | hra | NA | NA |
### Use `chi_get_proper_pop()`

We can now pass `mypop.instructions` to `chi_get_proper_pop()` to download and structure the population data.
```r
mypop <- chi_get_proper_pop(pop.template = mypop.instructions,
                            pop.genders = 'f',  # females are the denominator
                            pop.ages = 15:17,   # limit to adolescents
                            is_chars = FALSE)   # not a CHARS analysis, so FALSE
```
Now let's peek at the population table to see what we've created:
chi_age | year | cat1 | cat1_varname | cat1_group | cat2 | cat2_varname | cat2_group | pop | tab |
---|---|---|---|---|---|---|---|---|---|
15 | 2019-2023 | Big cities | bigcities | Auburn city | Ethnicity | race3 | Hispanic | 692.6477 | crosstabs |
15 | 2019-2023 | Big cities | bigcities | Bellevue city | Ethnicity | race3 | Hispanic | 576.1097 | crosstabs |
15 | 2019-2023 | Big cities | bigcities | Federal Way city | Ethnicity | race3 | Hispanic | 780.5977 | crosstabs |
15 | 2019-2023 | Big cities | bigcities | Kent city | Ethnicity | race3 | Hispanic | 978.4379 | crosstabs |
15 | 2019-2023 | Big cities | bigcities | Kirkland city | Ethnicity | race3 | Hispanic | 363.1643 | crosstabs |
15 | 2019-2023 | Big cities | bigcities | Redmond city | Ethnicity | race3 | Hispanic | 238.9179 | crosstabs |
15 | 2019-2023 | Big cities | bigcities | Renton city | Ethnicity | race3 | Hispanic | 770.0707 | crosstabs |
15 | 2019-2023 | Big cities | bigcities | Seattle city | Ethnicity | race3 | Hispanic | 2143.1132 | crosstabs |
### Tidy the `chi_get_proper_pop()` output

Race and ethnicity `cat1` and `cat2` values in `mypop` need to be brought into alignment with the values in `mycounts`. This step is specific to the quirks of the CHI birth standards.
```r
mypop[cat1 == 'Ethnicity', cat1 := "Birthing person's ethnicity"]
mypop[cat1 == 'Race', cat1 := "Birthing person's race"]
mypop[cat1 == 'Race/Ethnicity', cat1 := "Birthing person's race/ethnicity"]
mypop[cat1 == 'Detailed Race/Ethnicity', cat1 := "Birthing person's detailed race/ethnicity"]

mypop[cat2 == 'Ethnicity', cat2 := "Birthing person's ethnicity"]
mypop[cat2 == 'Race', cat2 := "Birthing person's race"]
mypop[cat2 == 'Race/Ethnicity', cat2 := "Birthing person's race/ethnicity"]
mypop[cat2 == 'Detailed Race/Ethnicity', cat2 := "Birthing person's detailed race/ethnicity"]
```
## Restoring Sequential Processing

Now that we've completed the steps that use parallel processing, it's a good idea to set `future::plan(future::sequential)`. This returns R to its normal, single-threaded mode and prevents surprises in subsequent code.
```r
future::plan(future::sequential)
```
## Merging Counts and Populations

Merge counts and populations on the CHI columns.
```r
mycombo <- merge(mycounts,
                 mypop,
                 by = c("year", "tab", "cat1", "cat1_varname", "cat1_group",
                        "cat2", "cat2_varname", "cat2_group", "chi_age"),
                 all = TRUE)
```
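Since this is a full outer join (`all = TRUE`), it can be worth inspecting rows where one side failed to match before moving on. A non-empty result here usually signals misaligned category labels between the counts and the population data (an optional check):

```r
# Rows with a missing numerator or denominator after the merge
mycombo[is.na(count) | is.na(pop)]
```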
Tidy the combined data.
```r
# There are zero 'Very high poverty areas' in North King County, so drop these cross-tabulations
droprows <- mycombo[(cat1_group == 'North' & cat2_group == 'Very high poverty areas') |
                      (cat2_group == 'North' & cat1_group == 'Very high poverty areas')]
mycombo <- fsetdiff(mycombo, droprows)

# We need a variable named 'age' for rads::age_standardize()
setnames(mycombo, 'chi_age', 'age')
```
Due to the way that APDE decided to display `race3` (Hispanic as ethnicity) and `race4` (Hispanic as race), we need to manipulate the results to align them with CHI standards.
```r
# For race4 categories (both cat1 and cat2)
mycombo[tab %in% c('demgroups', 'crosstabs') & cat1_varname == 'race4',
        cat1 := "Birthing person's race"]
mycombo[tab %in% c('demgroups', 'crosstabs') & cat2_varname == 'race4',
        cat2 := "Birthing person's race"]

# For race3 categories - default to race, override for Hispanic ethnicity
mycombo[tab %in% c('demgroups', 'crosstabs') & cat1_varname == 'race3',
        cat1 := "Birthing person's race"]
mycombo[tab %in% c('demgroups', 'crosstabs') & cat1_varname == 'race3' & cat1_group == 'Hispanic',
        cat1 := "Birthing person's ethnicity"]
mycombo[tab %in% c('demgroups', 'crosstabs') & cat2_varname == 'race3',
        cat2 := "Birthing person's race"]
mycombo[tab %in% c('demgroups', 'crosstabs') & cat2_varname == 'race3' & cat2_group == 'Hispanic',
        cat2 := "Birthing person's ethnicity"]

# Update trend data labels (both race3 and race4 at once)
mycombo[tab == 'trends' & cat1_varname %in% c('race3', 'race4'),
        cat1 := "Birthing person's race/ethnicity"]
mycombo[tab == 'trends' & cat2_varname %in% c('race3', 'race4'),
        cat2 := "Birthing person's race/ethnicity"]
```
## Calculating Rates

We use `rads::age_standardize()` to generate the crude and adjusted rates.
```r
myrates <- rads::age_standardize(ph.data = mycombo,
                                 ref.popname = "2000 U.S. Std Population (11 age groups)",
                                 collapse = TRUE,
                                 my.count = 'count',
                                 my.pop = 'pop',
                                 per = 1000, # adolescent birth rate is per 1,000, not 100,000
                                 conf.level = 0.90,
                                 group_by = c("indicator_key", "year", "tab", "cat1", "cat1_group",
                                              "cat1_varname", "cat2", "cat2_group", "cat2_varname"))
```
## Tidy Rates

Since `rads::age_standardize()` is not specific to CHI, you will have to massage the estimates a bit to align them with CHI standards.
> [!NOTE]
> **Approximating the standard error (SE) and relative standard error (RSE)**
>
> The method APDE uses to calculate the confidence intervals for rates is the one recommended by WA DOH, i.e., the Fay-Feuer method. You can find our implementation by typing `View(rads::adjust_direct)` in your R console. While our confidence intervals align perfectly with those from DOH (when given the same underlying data), this method does not have a corresponding SE. APDE consulted with a WA DOH biostatistician who said, "SE's are not particularly useful for approximating the sampling distribution on the scale of adjusted rates." We therefore calculate approximations of the SE and RSE, as suggested by various state health departments. This means that the SE will be internally inconsistent with the confidence intervals ... c'est la vie!
>
> For reference, here are the approximations (the code below expresses the RSE as a percentage, i.e., multiplied by 100):
>
> $$SE = \frac{\text{adjusted rate}}{\sqrt{\text{number of cases}}}$$
>
> $$RSE = \frac{1}{\sqrt{\text{number of cases}}}$$
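As a quick numeric illustration of these approximations, using the King County `teen1517` figures that appear later in the metadata table (rate = 2.3 per 1,000 and count = 89):

```r
rate  <- 2.3  # rate per 1,000
cases <- 89   # number of births

se  <- rate / sqrt(cases) # ~0.24
rse <- 100 / sqrt(cases)  # ~10.6; expressed as a percentage, as in the code below

round(c(se = se, rse = rse), 2)
```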
```r
# Drop the name of the reference population
myrates[, reference_pop := NULL]

# Set constants
myrates[, data_source := 'birth']
myrates[, chi := 1]
myrates[, source_date := max(birthsdt$creation_date)]
myrates[, run_date := as.Date(Sys.Date(), "%Y%m%d")]

# Use CRUDE estimates because these rates are for an age stratum, not all ages.
# In CHI, rates are rounded to 1 decimal place, SE to 2, and RSE to 3.
myrates[, result := round2(crude.rate, 1)]
myrates[, lower_bound := round2(crude.lci, 1)]
myrates[, upper_bound := round2(crude.uci, 1)]

# Approximate the SE
myrates[count != 0, se := round2(crude.rate / sqrt(count), 2)]
myrates[count == 0 & (!is.na(pop) & pop != 0), se := 0]

# Approximate the RSE (as a percentage)
myrates[count != 0, rse := round2(100 / sqrt(count), 3)]

# Set numerator and denominator
myrates[, numerator := count]
myrates[, denominator := round2(pop)]

# Apply primary and secondary suppression
myrates <- apde.chi.tools::chi_suppress_results(
  ph.data = myrates,
  suppress_range = c(1, 9),
  secondary = TRUE,
  secondary_exclude = cat1_varname != 'race3')

# Only keep CHI columns
myrates <- myrates[, chi_get_cols(), with = FALSE]
```
## Updating Metadata

This step uses `chi_generate_metadata()` to combine existing metadata with our current estimates calculated above.
```r
# Connect to the production server where we stored last year's metadata
db_chi_prod <- odbc::dbConnect(
  odbc::odbc(),
  Driver = "SQL Server",
  Server = "KCITSQLPRPHIP40",
  Database = "PHExtractStore")

# Retrieve existing metadata from the database
metadata_old <- setDT(odbc::dbGetQuery(
  conn = db_chi_prod,
  statement = glue::glue_sql("SELECT * FROM [PHExtractStore].[APDE].[birth_metadata]
                              WHERE indicator_key IN ({unique(myrates$indicator_key)*})",
                             .con = db_chi_prod)))

# Generate updated metadata
mymetadata <- chi_generate_metadata(meta.old = metadata_old,
                                    est.current = myrates)
```
This is what the metadata table looks like:
indicator_key | data_source | result_type | valence | latest_year | latest_year_result | latest_year_kc_pop | latest_year_count | map_type | unit | valid_years | chi | run_date |
---|---|---|---|---|---|---|---|---|---|---|---|---|
teen1517 | birth | rate | negative | 2023 | 2.3 | 38001 | 89 | region | rate per 1,000 females 15-17 | 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 | 1 | 2025-05-23 |
## Quality Assurance

After calculation, we need to perform quality assurance checks to ensure our estimates and metadata conform to CHI standards. `chi_qa_tro()` checks whether the CHI estimates and metadata are properly formatted, complete, and compliant with required standards, ensuring that column names, values, and data types meet specified criteria. The function returns 1 for a pass and 0 for a failure to meet CHI standards. It also provides warnings when estimates have unexpected patterns that do not necessarily violate CHI standards.
> [!NOTE]
> **What's going on under the hood?**
>
> `chi_qa_tro()` uses reference data from:
>
> - internal YAML configurations accessed via `chi_get_yaml()`
> - standard column names from `chi_get_cols()`
> - category validation using `rads.data::misc_chi_byvars`
> - field type / data class validation using `rads::tsql_validate_field_types()`
>
> It performs numerous checks, including proper rounding based on result type, absence of missing critical data, and data integrity rules such as ensuring confidence intervals are properly bounded (lower_bound ≤ result ≤ upper_bound) and that proportions fall within [0, 1].
```r
# Perform QA checks
qa_result <- chi_qa_tro(CHIestimates = myrates,
                        CHImetadata = mymetadata,
                        acs = FALSE,
                        verbose = FALSE)
```

```
🙂 Success! Your desired TSQL data types are suitable for your dataset.
🙂 Success! Your desired TSQL data types are suitable for your dataset.
```

```r
print(qa_result)
```

```
[1] 1
```
## Comparing to Previous Estimates

An important validation step is comparing our new estimates with previous ones to identify 'notable differences'. The notable-differences criteria were specified by Joie McCracken and are the same for each data source. They are used both for human QA and for sharing high-level summaries to accompany new releases of CHI estimates. If there is an issue, the row will have `notable == 1`; otherwise it will have `is.na(notable)`.
```r
# Get previous _kingcounty and demgroups estimates from the database
rates_old <- setDT(DBI::dbGetQuery(
  conn = db_chi_prod,
  statement = glue::glue_sql("SELECT * FROM [PHExtractStore].[APDE].[birth_results]
                              WHERE tab IN ('_kingcounty', 'demgroups') AND chi = 1 AND
                              indicator_key IN ({unique(myrates$indicator_key)*})",
                             .con = db_chi_prod)))

# Compare old and new estimates
mycomparison <- chi_compare_estimates(OLD = rates_old,
                                      NEW = myrates,
                                      OLD.year = paste0(latest_year - 5, '-', latest_year - 1),
                                      NEW.year = paste0(latest_year - 4, '-', latest_year),
                                      META = mymetadata)
```
Let's examine three rows from `mycomparison` to see the table structure:
indicator_key | absolute.diff | relative.diff | result_type | tab | cat1 | cat1_group | cat1_varname | cat2 | cat2_group | cat2_varname | year.OLD | year.NEW | result.OLD | result.NEW | lower_bound.OLD | lower_bound.NEW | upper_bound.OLD | upper_bound.NEW | numerator.OLD | numerator.NEW | denominator.OLD | denominator.NEW | se.OLD | se.NEW | qa_type | notable |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
teen1517 | 0.7 | 17.9 | rate | demgroups | Birthing person’s race | AIAN | race3 | NA | NA | NA | 2018-2022 | 2019-2023 | 3.9 | 4.6 | 2.0 | 2.7 | 7.0 | 7.3 | 11 | 13 | 2815 | 2844 | 1.18 | 1.27 | relative | NA |
teen1517 | 0.6 | 15.0 | rate | demgroups | Birthing person’s race | Black | race3 | NA | NA | NA | 2018-2022 | 2019-2023 | 4.0 | 3.4 | 3.1 | 2.7 | 5.1 | 4.3 | 62 | 54 | 15547 | 15775 | 0.51 | 0.47 | relative | NA |
teen1517 | 0.5 | 13.5 | rate | demgroups | Birthing person’s race | Black | race4 | NA | NA | NA | 2018-2022 | 2019-2023 | 3.7 | 3.2 | 2.8 | 2.4 | 4.8 | 4.1 | 53 | 46 | 14346 | 14536 | 0.51 | 0.47 | relative | NA |
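In practice, the quickest way to triage the comparison is to filter for the flagged rows; an empty result means nothing crossed the notable-difference thresholds:

```r
# Review only the rows flagged as notable differences
mycomparison[notable == 1]
```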
## Export Analyses to SharePoint

Now that we've finished our analyses, let's save our results to SharePoint.

### Exporting Estimates & Metadata
```r
# Connect to SharePoint
team <- get_team("Community Health Indicators")
drv <- team$get_drive("CHI-Vizes")

# Create an empty Excel workbook
wb <- openxlsx::createWorkbook()

# Add estimates worksheet and write data
openxlsx::addWorksheet(wb, "Estimates")
openxlsx::writeDataTable(wb, "Estimates",
                         x = myrates,
                         tableStyle = "TableStyleMedium9")

# Add metadata worksheet and write data
openxlsx::addWorksheet(wb, "Metadata")
openxlsx::writeDataTable(wb, "Metadata",
                         x = mymetadata,
                         tableStyle = "TableStyleMedium9")

# Save workbook to a temporary file
tempy <- tempfile(fileext = ".xlsx")
openxlsx::saveWorkbook(wb,
                       file = tempy,
                       overwrite = TRUE)

# Upload to SharePoint
drv$upload_file(src = tempy,
                dest = paste0(sharepoint_output_dir,
                              "Tableau_Ready_",
                              latest_year - 4, "_", latest_year, ".xlsx"))
rm(tempy)
```
### Exporting `mycomparison`

```r
# Connect to SharePoint
team <- get_team("Community Health Indicators")
drv <- team$get_drive("CHI-Vizes")

# Create a temporary file to store mycomparison as an Excel file
tempy <- tempfile(fileext = ".xlsx")

# Write mycomparison to the temporary Excel file
openxlsx::write.xlsx(x = mycomparison,
                     file = tempy,
                     asTable = TRUE,   # Ensure data is written as a table
                     overwrite = TRUE, # Allow overwriting the file if it exists
                     tableStyle = "TableStyleMedium9")

# Upload the Excel file to SharePoint
drv$upload_file(src = tempy,
                dest = paste0(sharepoint_output_dir,
                              "qa_result_old_vs_new_",
                              latest_year - 4, "_", latest_year, ".xlsx"))
rm(tempy)
```
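Because both exports repeat the same create-write-save-upload pattern, you could optionally wrap it in a small helper. The function below is a sketch of our own; `upload_xlsx()` is a hypothetical name, not part of `apde.chi.tools`:

```r
# Hypothetical convenience wrapper (not part of apde.chi.tools): writes a named
# list of tables to one Excel workbook and uploads it to a SharePoint drive
upload_xlsx <- function(drv, sheets, dest) {
  wb <- openxlsx::createWorkbook()
  for (nm in names(sheets)) {
    openxlsx::addWorksheet(wb, nm)
    openxlsx::writeDataTable(wb, nm, x = sheets[[nm]],
                             tableStyle = "TableStyleMedium9")
  }
  tempy <- tempfile(fileext = ".xlsx")
  openxlsx::saveWorkbook(wb, file = tempy, overwrite = TRUE)
  drv$upload_file(src = tempy, dest = dest)
  unlink(tempy) # clean up the temporary file
}

# For example, the estimates & metadata export above would collapse to:
# upload_xlsx(drv,
#             sheets = list(Estimates = myrates, Metadata = mymetadata),
#             dest = paste0(sharepoint_output_dir, "Tableau_Ready_",
#                           latest_year - 4, "_", latest_year, ".xlsx"))
```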
## Saving Estimates & Metadata to SQL Server

Finally, we need to save our results and metadata to the development SQL Server using `chi_update_sql()`. Later, once the data pass human QA, they will be transferred to the production SQL Server.
```r
chi_update_sql(CHIestimates = myrates,
               CHImetadata = mymetadata,
               table_name = 'junk', # use the actual data source name, e.g., 'birth', 'brfss', etc.
               server = 'development',
               replace_table = TRUE)
```
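If you want to confirm the upload landed, a quick row count against the development tables works. This sketch assumes the `[APDE_WIP]` table naming shown in the cleanup step below:

```r
# Connect to the dev server (the cleanup step below reuses this connection)
db_chi_dev <- odbc::dbConnect(
  odbc::odbc(),
  Driver = "SQL Server",
  Server = "KCITSQLUATHIP40",
  Database = "PHExtractStore")

# The row count should match nrow(myrates)
DBI::dbGetQuery(db_chi_dev,
                "SELECT COUNT(*) AS n FROM [PHExtractStore].[APDE_WIP].[junk_results]")
```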
## Delete Temporary Tables

In the process of creating this vignette, we created some temporary tables on SharePoint and SQL Server. Let's delete them to keep our servers clean. Obviously, in your real analyses, you'd skip this step.
```r
# Drop the SharePoint directory
SharePoint_Parent <- strsplit(sharepoint_output_dir, "/")[[1]][1]
fff <- drv$get_item(SharePoint_Parent)$delete(confirm = FALSE)

# Connect to the dev server (skip if already connected for the optional check above)
db_chi_dev <- odbc::dbConnect(
  odbc::odbc(),
  Driver = "SQL Server",
  Server = "KCITSQLUATHIP40", # dev server
  Database = "PHExtractStore")

# Drop the SQL Server tables
DBI::dbExecute(conn = db_chi_dev, "DROP TABLE [PHExtractStore].[APDE_WIP].[junk_results]")
DBI::dbExecute(conn = db_chi_dev, "DROP TABLE [PHExtractStore].[APDE_WIP].[junk_metadata]")
```
## Conclusion

Congratulations on completing the CHI rate analysis workflow! This workflow provides a standardized process for generating rate estimates, ensuring consistency and traceability. Follow these steps to streamline your analyses and maintain CHI standards across datasets.
New functions you used:

- `chi_generate_analysis_set()` to create an analysis set from last year's production estimates
- `chi_generate_tro_shell()` to generate calculation instructions based on the output of `chi_generate_analysis_set()`
- `chi_count_by_age()` to generate a detailed breakdown of counts by age that serves as the numerator for CHI rate analyses
- `chi_generate_instructions_pop()` to create an instruction set for downloading population denominator data
- `chi_get_proper_pop()` to generate a table of population denominators based on the output of `chi_generate_instructions_pop()`
- `chi_generate_metadata()` to create an updated metadata table based on the current estimates and last year's production metadata
- `chi_qa_tro()` to perform quality assurance checks on the estimates and metadata
- `chi_compare_estimates()` to identify notable differences between the new estimates and previous ones
- `chi_update_sql()` to save estimates and metadata to SQL Server
If you encounter issues or believe you've found a bug, please submit a GitHub issue at [Issues · PHSKC-APDE/apde.chi.tools](https://github.com/PHSKC-APDE/apde.chi.tools/issues).
Remember that this workflow is specifically for rate calculations. For calculations of proportions or means, please refer to the dedicated vignette.
*Updated May 23, 2025 (apde.chi.tools v2025.0.3)*