# Generating CHI Rate Estimates

## Introduction
The `apde.chi.tools` package provides a standardized workflow for preparing King County Community Health Indicators (CHI) estimates. This vignette demonstrates how to use the package's core functions to generate estimates of rates that use population denominators. If you need to calculate prevalences, proportions, or means, please refer to the separate vignette for that purpose.
For simplicity, we'll walk through the analysis pipeline for calculating the adolescent birth rate as an example. Estimation of CHARS and death rates follows the same basic process, albeit with some additional complexity that can be seen in the most recent code in the CHI repository. To view a complete list of documented functions available from the `apde.chi.tools` package, enter `help(package = 'apde.chi.tools')` at the R prompt.
The CHI standards are documented in SharePoint > Community Health Indicators > CHI-Vizes > CHI-Standards-TableauReady Output.xlsx. We’ll follow those standards throughout our analysis.
Finally, please remember that you can always get more information about a specific function from its help file, e.g., `?chi_count_by_age`, `?chi_generate_tro_shell`, etc.
## Load Packages
```r
library(glue)            # For creating dynamic strings
library(future)          # For parallel processing
library(Microsoft365R)   # For SharePoint connections
library(DBI)             # For SQL Server connections
library(openxlsx)        # For Excel output
library(rads)            # For APDE analyses
library(data.table)      # For wicked fast data manipulation
library(apde.chi.tools)  # The package we're demonstrating
```
## Analysis Configuration
First, let’s set up our configuration parameters. This step defines key variables and paths used throughout the analysis. Doing this once at the top of your code will help you maintain and adapt it for subsequent years.
```r
# Specify the most recent year available in the raw data
latest_year <- 2023

# Specify a directory for saving the output in the CHI SharePoint site
sharepoint_output_dir <- paste0('JUNK_testing/', latest_year, '-update/')
```
## Getting the Raw Data
Next, let's retrieve the birth data we'll be analyzing. The `rads::get_data_birth()` function pulls data from SQL, filtered to our specifications.
> [!NOTE]
> **What is `race3_hispanic` and why do I need it?**
>
> Generally, CHI has two versions of composite race/ethnicity data. In `race4`, Hispanic is defined as a race and overwrites other OMB race categories. For example, if someone is Black and Hispanic, in `race4` the person would be categorized as Hispanic. In contrast, `race3` defines Hispanic as an ethnicity. This means a person can be any OMB race (e.g., AIAN, Asian, Black, etc.) AND be of Hispanic ethnicity. A single analytic-ready variable cannot contain both race and ethnicity data since it can only have one value. Therefore, every time you want to process `race3` estimates, you must download and use both the `'race3'` and `'race3_hispanic'` columns from the analytic-ready data that you get with the `rads::get_data_*()` functions. The `apde.chi.tools` functions will "know" how to appropriately use `race3` and `race3_hispanic`, but you must get both from the analytic-ready data.
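To make the distinction concrete, here is a toy sketch. The rows and codings below are hypothetical, purely for illustration, and are not the exact values used in the analytic-ready tables:

```r
# Toy illustration (hypothetical coding): race3 carries the OMB race while
# race3_hispanic carries Hispanic ethnicity, so one person can have both,
# whereas race4 collapses everything into a single value
library(data.table)

toy <- data.table(
  id             = 1:3,
  race3          = c("Black", "Asian", "Black"),
  race3_hispanic = c("Hispanic", NA, NA),          # person 1 is Black AND Hispanic
  race4          = c("Hispanic", "Asian", "Black") # Hispanic overwrites race
)
toy
```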
```r
# Get birth data for the past 10 years
birthsdt <- get_data_birth(
  cols = c("bigcities", "chi_geo_kc", "chi_geo_region", "chi_race_aic_asianother",
           "chi_race_aic_chinese", "chi_race_aic_filipino",
           "chi_race_aic_guam_or_chamorro", "chi_race_aic_hawaiian",
           "chi_race_aic_his_cuban", "chi_race_aic_his_mexican",
           "chi_race_aic_his_puerto_rican", "chi_race_aic_indian",
           "chi_race_aic_japanese", "chi_race_aic_korean", "chi_race_aic_samoan",
           "chi_race_aic_vietnamese", "edu_grp", "hra20_name", "mage5",
           "pov200grp", "race3", "race3_hispanic", "race4",
           "teen1517", "chi_age", "chi_year", "creation_date"),
  year = (latest_year - 9):latest_year, # latest_year was defined above
  kingco = TRUE)
```
## Getting the Analysis Set
Each CHI data source has an analysis set: a compact summary of all calculations needed for all of the CHI indicators. It may be saved in the appropriate GitHub repo sub-directory, along with your annual CHI code. Typically, however, the `chi_generate_analysis_set()` function should be used to create a fresh copy based on the latest year's results in the production server. For example, the following line of code creates an analysis set based on the contents of `[PHExtractStore].[APDE].[birth_results]` on `KCITSQLPRPHIP40`.
```r
# Generate the analysis set for birth data
analysis_sets <- chi_generate_analysis_set('birth')
```
Curious what an analysis set looks like? Let’s take a peek at a few rows:
cat1 | cat1_varname | _kingcounty | _wastate | demgroups | crosstabs | trends | set | set_indicator_keys |
---|---|---|---|---|---|---|---|---|
Big cities | bigcities | NA | NA | x | x | x | 1 | breastfed, bw_low, bw_low_sing, bw_norm, bw_norm_sing, bw_vlow, bw_vlow_sing, csec_lowrisk, infmort, kotelchuck, pnc_lateno, preterm, smoking_dur |
Big cities | bigcities | NA | NA | x | x | x | 2 | teen1517 |
Neighborhood poverty | pov200grp | NA | NA | x | x | NA | 2 | teen1517 |
The analysis set contains important information about:

- the category variables to use in the analyses (`cat1`, `cat1_varname`)
- the types of analyses to perform (`_kingcounty`, `_wastate`, `demgroups`, etc.)
- the indicators that share a common pattern of utilized category variables and analysis types (`set` and `set_indicator_keys`)
## Generating Instructions

We use `chi_generate_tro_shell()` to create a standardized set of calculation instructions based on the analysis set created in the previous step.
```r
myinstructions <- chi_generate_tro_shell(
  ph.analysis_set = analysis_sets, # from chi_generate_analysis_set()
  end.year = latest_year,          # latest year used in analyses
  year.span = 5,                   # number of years in a single analysis period
  trend.span = 3,                  # number of years in a single trend period
  trend.periods = 10               # max number of trend time periods
)
```
As stated in the introduction, for this vignette we will focus on the adolescent birth rate (`teen1517`). This is because it is the only true rate, with a population denominator, among all of the birth indicators.
```r
myinstructions <- myinstructions[indicator_key == 'teen1517']
```
Let’s examine the top six rows of our instructions:
indicator_key | tab | cat1 | cat1_varname | cat2 | cat2_varname | end | start |
---|---|---|---|---|---|---|---|
teen1517 | _kingcounty | King County | chi_geo_kc | NA | NA | 2023 | 2019 |
teen1517 | demgroups | Big cities | bigcities | NA | NA | 2023 | 2019 |
teen1517 | demgroups | Birthing person’s ethnicity | race3_hispanic | NA | NA | 2023 | 2019 |
teen1517 | demgroups | Birthing person’s race | race3 | NA | NA | 2023 | 2019 |
teen1517 | demgroups | Birthing person’s race/ethnicity | race4 | NA | NA | 2023 | 2019 |
teen1517 | demgroups | King County | chi_geo_kc | NA | NA | 2023 | 2019 |
## Tidying Instructions
Now, let’s clean up our instructions to prevent illogical cross-tabulations like Seattle HRAs in East King County. Note that this is one place where the analyst will need to think carefully and deeply about illogical cross-tabulations that should be removed. You should assume that this step will be specific to each analysis.
```r
# Remove crosstabs of big cities with HRAs or Regions
myinstructions <- myinstructions[is.na(cat2) |
                                   !(cat1_varname == 'bigcities' &
                                       cat2_varname %in% c('hra20_name', 'chi_geo_region'))]

# Remove crosstabs of HRAs or Regions with big cities
myinstructions <- myinstructions[is.na(cat2) |
                                   !(cat2_varname == 'bigcities' &
                                       cat1_varname %in% c('hra20_name', 'chi_geo_region'))]
```
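As a defensive habit, you might follow the filtering with a quick check that no illogical combinations survived. This is an optional sketch that uses the same column names as the code above:

```r
# Sanity check: no big-city x HRA/region crosstabs should remain in either direction
stopifnot(nrow(myinstructions[
  (cat1_varname == 'bigcities' & cat2_varname %in% c('hra20_name', 'chi_geo_region')) |
  (cat2_varname == 'bigcities' & cat1_varname %in% c('hra20_name', 'chi_geo_region'))]) == 0)
```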
## Performing Calculations

### Configure Parallel Processing
The future package allows you to use parallel processing to speed up certain operations. Setting it up correctly will allow you to run some of the `apde.chi.tools` functions, like `chi_count_by_age()`, substantially faster. The example code below designates:

- the creation of parallel worker processes on all available cores except one (leaving that core for your main R session)
- a 2 GB limit on how much data can be transferred to each worker process

As of May 2025, APDE's performance laptops offer more available cores than our virtual machines (VMs). Therefore, `chi_calc()` is generally faster on a performance laptop than on a VM.
```r
# Configure parallel processing using multiple sessions, reserving one core
future::plan(future::multisession, workers = future::availableCores() - 1)

# Set the maximum memory (in GB) allowed per future process
future.GB <- 2
options(future.globals.maxSize = future.GB * 1024^3)
```
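If you want to confirm the plan took effect before launching a long computation, the future package can report how many workers it will use:

```r
# Should return availableCores() - 1 while the multisession plan is active
future::nbrOfWorkers()
```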
### Calculating the Numerator with `chi_count_by_age()`

As its name implies, `chi_count_by_age()` creates a detailed breakdown of counts by age for CHI data analysis. Summing the data by age is critical for calculating age-adjusted rates after merging with population denominators. Like `chi_calc()`, which is used for proportions and means, `chi_count_by_age()` processes data according to the instructions created by `chi_generate_tro_shell()` and handles demographic groupings with special treatment for race and ethnicity variables.
> [!NOTE]
> **What's going on under the hood?**
>
> The `chi_count_by_age()` function shares many validation steps with `chi_calc()`, including checking inputs against `rads.data::misc_chi_byvars` to ensure they follow CHI encoding standards. After validation, the function uses `rads::calc()` to generate counts by single-year age values (default = 0-100). Like `chi_calc()`, `chi_count_by_age()` applies special handling for race and ethnicity variables, particularly the relationship between `race3` and `race3_hispanic`, to ensure consistent representation in output data.
>
> A key feature of `chi_count_by_age()` is its handling of missing combinations. When no data exist for a specific demographic-age group combination (e.g., a particular racial group at age 97), the function creates a complete reference table using Cartesian products so that all possible combinations are represented with counts of zero rather than being omitted entirely. This comprehensive approach enables proper age standardization in downstream analyses.
>
> The demographic grouping standards in `rads.data::misc_chi_byvars` can be traced back to SharePoint > Community Health Indicators > CHI-Vizes > CHI-Standards-TableauReady Output.xlsx. However, directly referencing a SharePoint file from within an R package would create a fragile dependency on external, user-specific infrastructure that could break portability, reproducibility, and automation.
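The Cartesian-product zero-filling described above can be sketched in a few lines of data.table. This is a simplified illustration of the idea with toy data, not the package's actual internals:

```r
library(data.table)

# Observed counts are missing the (B, 16), (A, 17), and (B, 17) combinations
observed <- data.table(group   = c("A", "A", "B"),
                       chi_age = c(15, 16, 15),
                       count   = c(4, 2, 1))

# CJ() builds the full Cartesian product of groups and ages ...
full_grid <- CJ(group = c("A", "B"), chi_age = 15:17)

# ... and joining the observed counts onto that grid surfaces the missing
# combinations as NA, which are then set to zero rather than omitted
complete <- observed[full_grid, on = c("group", "chi_age")]
complete[is.na(count), count := 0]
complete
```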
```r
# Get counts
mycounts <- chi_count_by_age(ph.data = birthsdt,
                             ph.instructions = myinstructions,
                             source_date = unique(birthsdt$creation_date))

# Limit counts to the ages of adolescents (counts for other rows are all zero anyway)
mycounts <- mycounts[chi_age %in% 15:17]
```
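Two optional sanity checks at this point, using columns from the `mycounts` output shown below:

```r
# All remaining rows should be adolescent ages
stopifnot(all(mycounts$chi_age %in% 15:17))

# A quick feel for data volume by analysis type
mycounts[, .(total_count = sum(count)), by = tab]
```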
Here are a few rows from `mycounts` that show the structure and contents of the `chi_count_by_age()` output:
indicator_key | year | tab | cat1 | cat1_varname | cat1_group | cat2 | cat2_varname | cat2_group | chi_age | count | source_date |
---|---|---|---|---|---|---|---|---|---|---|---|
teen1517 | 2019-2023 | _kingcounty | King County | chi_geo_kc | King County | NA | NA | NA | 15 | 60 | 2025-05-06 |
teen1517 | 2019-2023 | demgroups | Big cities | bigcities | Auburn city | NA | NA | NA | 16 | 12 | 2025-05-06 |
teen1517 | 2019-2023 | crosstabs | Big cities | bigcities | Auburn city | Birthing person’s ethnicity | race3 | Hispanic | 17 | 13 | 2025-05-06 |
teen1517 | 2021-2023 | trends | King County | chi_geo_kc | King County | NA | NA | NA | 15 | 37 | 2025-05-06 |
## Calculating the Denominator

### Use `chi_generate_instructions_pop()`

We pass `mycounts` to `chi_generate_instructions_pop()` to create instructions for downloading corresponding population data.
Note that the `povgeo` parameter depends on how the data source defines the `'pov200grp'` indicator in the ETL process. In birth data it is defined by blocks, but in some other data sources it is defined by ZIP codes.
```r
mypop.instructions <- chi_generate_instructions_pop(mycount.data = mycounts,
                                                    povgeo = 'blk')
```
Here are a few rows of `mypop.instructions` that show the structure and contents of the `chi_generate_instructions_pop()` output:
year | cat1 | cat1_varname | cat2 | cat2_varname | tab | start | stop | race_type | geo_type | group_by1 | group_by2 |
---|---|---|---|---|---|---|---|---|---|---|---|
2019-2023 | King County | chi_geo_kc | NA | NA | _kingcounty | 2019 | 2023 | race_eth | kc | NA | NA |
2019-2023 | Ethnicity | race3 | NA | NA | demgroups | 2019 | 2023 | race_eth | kc | race_eth | NA |
2019-2023 | Big cities | bigcities | Race | race3 | crosstabs | 2019 | 2023 | race | hra | NA | race |
2021-2023 | Big cities | bigcities | NA | NA | trends | 2021 | 2023 | race_eth | hra | NA | NA |
### Use `chi_get_proper_pop()`

We can now pass `mypop.instructions` to `chi_get_proper_pop()` to download and structure the population data.
```r
mypop <- chi_get_proper_pop(pop.template = mypop.instructions,
                            pop.genders = 'f',  # females are the denominator
                            pop.ages = 15:17,   # limit to adolescents
                            is_chars = FALSE)   # not a CHARS analysis, so FALSE
```
Now let's peek at the population table to see what we've created:
chi_age | year | cat1 | cat1_varname | cat1_group | cat2 | cat2_varname | cat2_group | pop | tab |
---|---|---|---|---|---|---|---|---|---|
15 | 2019-2023 | Big cities | bigcities | Auburn city | Ethnicity | race3 | Hispanic | 692.6477 | crosstabs |
15 | 2019-2023 | Big cities | bigcities | Bellevue city | Ethnicity | race3 | Hispanic | 576.1097 | crosstabs |
15 | 2019-2023 | Big cities | bigcities | Federal Way city | Ethnicity | race3 | Hispanic | 780.5977 | crosstabs |
15 | 2019-2023 | Big cities | bigcities | Kent city | Ethnicity | race3 | Hispanic | 978.4379 | crosstabs |
15 | 2019-2023 | Big cities | bigcities | Kirkland city | Ethnicity | race3 | Hispanic | 363.1643 | crosstabs |
15 | 2019-2023 | Big cities | bigcities | Redmond city | Ethnicity | race3 | Hispanic | 238.9179 | crosstabs |
15 | 2019-2023 | Big cities | bigcities | Renton city | Ethnicity | race3 | Hispanic | 770.0707 | crosstabs |
15 | 2019-2023 | Big cities | bigcities | Seattle city | Ethnicity | race3 | Hispanic | 2143.1132 | crosstabs |
### Tidy the `chi_get_proper_pop()` output

Race and ethnicity `cat1` and `cat2` values in `mypop` need to be brought into alignment with the values in `mycounts`. This step is specific to the quirks of the CHI birth standards.
```r
mypop[cat1 == 'Ethnicity', cat1 := "Birthing person's ethnicity"]
mypop[cat1 == 'Race', cat1 := "Birthing person's race"]
mypop[cat1 == 'Race/Ethnicity', cat1 := "Birthing person's race/ethnicity"]
mypop[cat1 == 'Detailed Race/Ethnicity', cat1 := "Birthing person's detailed race/ethnicity"]

mypop[cat2 == 'Ethnicity', cat2 := "Birthing person's ethnicity"]
mypop[cat2 == 'Race', cat2 := "Birthing person's race"]
mypop[cat2 == 'Race/Ethnicity', cat2 := "Birthing person's race/ethnicity"]
mypop[cat2 == 'Detailed Race/Ethnicity', cat2 := "Birthing person's detailed race/ethnicity"]
```
## Restoring Sequential Processing

Now that we've completed the steps that use parallel processing, it's a good idea to set `future::plan(future::sequential)`. This returns R to its normal, single-threaded mode and prevents surprises in subsequent code.
```r
future::plan(future::sequential)
```
## Merging Counts and Populations

Merge counts and populations on the CHI columns.
```r
mycombo <- merge(mycounts,
                 mypop,
                 by = c("year", "tab", "cat1", "cat1_varname", "cat1_group",
                        "cat2", "cat2_varname", "cat2_group", "chi_age"),
                 all = TRUE)
```
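Since this is a full outer join (`all = TRUE`), it can be worth inspecting rows where one side failed to match before moving on. A non-empty result here usually signals misaligned category labels between the counts and the population data (an optional check):

```r
# Rows with a missing numerator or denominator after the merge
mycombo[is.na(count) | is.na(pop)]
```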
Tidy the combined data.
```r
# There are zero 'Very high poverty areas' in North King County, so drop these cross-tabulations
droprows <- mycombo[(cat1_group == 'North' & cat2_group == 'Very high poverty areas') |
                      (cat2_group == 'North' & cat1_group == 'Very high poverty areas')]
mycombo <- fsetdiff(mycombo, droprows)

# We need a variable named 'age' for rads::age_standardize()
setnames(mycombo, 'chi_age', 'age')
```
Due to the way that APDE decided to display `race3` (Hispanic as ethnicity) and `race4` (Hispanic as race), we need to manipulate the results to align them with CHI standards.
```r
# For race4 categories (both cat1 and cat2)
mycombo[tab %in% c('demgroups', 'crosstabs') & cat1_varname == 'race4',
        cat1 := "Birthing person's race"]
mycombo[tab %in% c('demgroups', 'crosstabs') & cat2_varname == 'race4',
        cat2 := "Birthing person's race"]

# For race3 categories - default to race, override for Hispanic ethnicity
mycombo[tab %in% c('demgroups', 'crosstabs') & cat1_varname == 'race3',
        cat1 := "Birthing person's race"]
mycombo[tab %in% c('demgroups', 'crosstabs') & cat1_varname == 'race3' & cat1_group == 'Hispanic',
        cat1 := "Birthing person's ethnicity"]
mycombo[tab %in% c('demgroups', 'crosstabs') & cat2_varname == 'race3',
        cat2 := "Birthing person's race"]
mycombo[tab %in% c('demgroups', 'crosstabs') & cat2_varname == 'race3' & cat2_group == 'Hispanic',
        cat2 := "Birthing person's ethnicity"]

# Update trend data labels (both race3 and race4 at once)
mycombo[tab == 'trends' & cat1_varname %in% c('race3', 'race4'),
        cat1 := "Birthing person's race/ethnicity"]
mycombo[tab == 'trends' & cat2_varname %in% c('race3', 'race4'),
        cat2 := "Birthing person's race/ethnicity"]
```
## Calculating Rates

We use `rads::age_standardize()` to generate the crude and adjusted rates.
```r
myrates <- rads::age_standardize(ph.data = mycombo,
                                 ref.popname = "2000 U.S. Std Population (11 age groups)",
                                 collapse = TRUE,
                                 my.count = 'count',
                                 my.pop = 'pop',
                                 per = 1000, # adolescent birth rate is per 1,000, not 100,000
                                 conf.level = 0.90,
                                 group_by = c("indicator_key", "year", "tab", "cat1", "cat1_group",
                                              "cat1_varname", "cat2", "cat2_group", "cat2_varname"))
```
## Tidy Rates

Since `rads::age_standardize()` is not specific to CHI, you will have to massage the estimates a bit to align them with CHI standards.
> [!NOTE]
> **Approximating the standard error (SE) and relative standard error (RSE)**
>
> The method APDE uses to calculate the confidence intervals for rates is the one recommended by WA DOH, i.e., the Fay-Feuer method. You can find our implementation by typing `View(rads::adjust_direct)` in your R console. While our confidence intervals align perfectly with those from DOH (when given the same underlying data), this method does not have a corresponding SE. APDE consulted with a WA DOH biostatistician who said, "SE's are not particularly useful for approximating the sampling distribution on the scale of adjusted rates." We therefore calculate approximations of the SE and RSE, as suggested by various state health departments. This means that the SE will be internally inconsistent with the confidence intervals ... c'est la vie!
>
> For reference, here are the approximations (the code below expresses the RSE as a percentage, i.e., multiplied by 100):
>
> $$SE = \frac{\text{adjusted rate}}{\sqrt{\text{number of cases}}}$$
>
> $$RSE = \frac{1}{\sqrt{\text{number of cases}}}$$
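As a quick numeric illustration of these approximations, using the King County `teen1517` figures that appear later in the metadata table (rate = 2.3 per 1,000 and count = 89):

```r
rate  <- 2.3  # rate per 1,000
cases <- 89   # number of births

se  <- rate / sqrt(cases) # ~0.24
rse <- 100 / sqrt(cases)  # ~10.6; expressed as a percentage, as in the code below

round(c(se = se, rse = rse), 2)
```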
```r
# Drop the name of the reference population
myrates[, reference_pop := NULL]

# Set constants
myrates[, data_source := 'birth']
myrates[, chi := 1]
myrates[, source_date := max(birthsdt$creation_date)]
myrates[, run_date := as.Date(Sys.Date(), "%Y%m%d")]

# Use CRUDE estimates because these rates are for an age stratum, not all ages.
# In CHI, rates are rounded to 1 decimal place, SE to 2, and RSE to 3.
myrates[, result := round2(crude.rate, 1)]
myrates[, lower_bound := round2(crude.lci, 1)]
myrates[, upper_bound := round2(crude.uci, 1)]

# Approximate the SE
myrates[count != 0, se := round2(crude.rate / sqrt(count), 2)]
myrates[count == 0 & (!is.na(pop) & pop != 0), se := 0]

# Approximate the RSE (as a percentage)
myrates[count != 0, rse := round2(100 / sqrt(count), 3)]

# Set numerator and denominator
myrates[, numerator := count]
myrates[, denominator := round2(pop)]

# Apply primary and secondary suppression
myrates <- apde.chi.tools::chi_suppress_results(
  ph.data = myrates,
  suppress_range = c(1, 9),
  secondary = TRUE,
  secondary_exclude = cat1_varname != 'race3')

# Only keep CHI columns
myrates <- myrates[, chi_get_cols(), with = FALSE]
```
## Updating Metadata

This step uses `chi_generate_metadata()` to combine existing metadata with our current estimates calculated above.
```r
# Connect to the production server where we stored last year's metadata
db_chi_prod <- odbc::dbConnect(
  odbc::odbc(),
  Driver = "SQL Server",
  Server = "KCITSQLPRPHIP40",
  Database = "PHExtractStore")

# Retrieve existing metadata from the database
metadata_old <- setDT(odbc::dbGetQuery(
  conn = db_chi_prod,
  statement = glue::glue_sql("SELECT * FROM [PHExtractStore].[APDE].[birth_metadata]
                              WHERE indicator_key IN ({unique(myrates$indicator_key)*})",
                             .con = db_chi_prod)))

# Generate updated metadata
mymetadata <- chi_generate_metadata(meta.old = metadata_old,
                                    est.current = myrates)
```
This is what the metadata table looks like:
indicator_key | data_source | result_type | valence | latest_year | latest_year_result | latest_year_kc_pop | latest_year_count | map_type | unit | valid_years | chi | run_date |
---|---|---|---|---|---|---|---|---|---|---|---|---|
teen1517 | birth | rate | negative | 2023 | 2.3 | 38001 | 89 | region | rate per 1,000 females 15-17 | 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 | 1 | 2025-05-23 |
## Quality Assurance

After calculation, we need to perform quality assurance checks to ensure our estimates and metadata conform to CHI standards. `chi_qa_tro()` checks whether the CHI estimates and metadata are properly formatted, complete, and compliant with required standards, ensuring that column names, values, and data types meet specified criteria. The function returns 1 for a pass and 0 for a failure to meet CHI standards. It also provides warnings when estimates have unexpected patterns that do not necessarily violate CHI standards.
> [!NOTE]
> **What's going on under the hood?**
>
> `chi_qa_tro()` uses reference data from:
>
> - internal YAML configurations accessed via `chi_get_yaml()`
> - standard column names from `chi_get_cols()`
> - category validation using `rads.data::misc_chi_byvars`
> - field type / data class validation using `rads::tsql_validate_field_types()`
>
> It performs numerous checks, including proper rounding based on result type, absence of missing critical data, and data integrity rules such as ensuring confidence intervals are properly bounded (lower_bound ≤ result ≤ upper_bound) and that proportions fall within [0, 1].
```r
# Perform QA checks
qa_result <- chi_qa_tro(CHIestimates = myrates,
                        CHImetadata = mymetadata,
                        acs = FALSE,
                        verbose = FALSE)
```

```
🙂 Success! Your desired TSQL data types are suitable for your dataset.
🙂 Success! Your desired TSQL data types are suitable for your dataset.
```

```r
print(qa_result)
```

```
[1] 1
```
## Comparing to Previous Estimates

An important validation step is comparing our new estimates with previous ones to identify 'notable differences'. The notable-differences criteria were specified by Joie McCracken and are the same for each data source. They are used both for human QA and for sharing high-level summaries to accompany new releases of CHI estimates. If there is an issue, the row will have `notable == 1`; otherwise it will have `is.na(notable)`.
```r
# Get previous _kingcounty and demgroups estimates from the database
rates_old <- setDT(DBI::dbGetQuery(
  conn = db_chi_prod,
  statement = glue::glue_sql("SELECT * FROM [PHExtractStore].[APDE].[birth_results]
                              WHERE tab IN ('_kingcounty', 'demgroups') AND chi = 1 AND
                              indicator_key IN ({unique(myrates$indicator_key)*})",
                             .con = db_chi_prod)))

# Compare old and new estimates
mycomparison <- chi_compare_estimates(OLD = rates_old,
                                      NEW = myrates,
                                      OLD.year = paste0(latest_year - 5, '-', latest_year - 1),
                                      NEW.year = paste0(latest_year - 4, '-', latest_year),
                                      META = mymetadata)
```
Let's examine three rows from `mycomparison` to see the table structure:
indicator_key | absolute.diff | relative.diff | result_type | tab | cat1 | cat1_group | cat1_varname | cat2 | cat2_group | cat2_varname | year.OLD | year.NEW | result.OLD | result.NEW | lower_bound.OLD | lower_bound.NEW | upper_bound.OLD | upper_bound.NEW | numerator.OLD | numerator.NEW | denominator.OLD | denominator.NEW | se.OLD | se.NEW | qa_type | notable |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
teen1517 | 0.7 | 17.9 | rate | demgroups | Birthing person’s race | AIAN | race3 | NA | NA | NA | 2018-2022 | 2019-2023 | 3.9 | 4.6 | 2.0 | 2.7 | 7.0 | 7.3 | 11 | 13 | 2815 | 2844 | 1.18 | 1.27 | relative | NA |
teen1517 | 0.6 | 15.0 | rate | demgroups | Birthing person’s race | Black | race3 | NA | NA | NA | 2018-2022 | 2019-2023 | 4.0 | 3.4 | 3.1 | 2.7 | 5.1 | 4.3 | 62 | 54 | 15547 | 15775 | 0.51 | 0.47 | relative | NA |
teen1517 | 0.5 | 13.5 | rate | demgroups | Birthing person’s race | Black | race4 | NA | NA | NA | 2018-2022 | 2019-2023 | 3.7 | 3.2 | 2.8 | 2.4 | 4.8 | 4.1 | 53 | 46 | 14346 | 14536 | 0.51 | 0.47 | relative | NA |
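In practice, the quickest way to triage the comparison is to filter for the flagged rows; an empty result means nothing crossed the notable-difference thresholds:

```r
# Review only the rows flagged as notable differences
mycomparison[notable == 1]
```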
## Export Analyses to SharePoint

Now that we've finished our analyses, let's save our results to SharePoint.

### Exporting Estimates & Metadata
```r
# Connect to SharePoint
team <- get_team("Community Health Indicators")
drv <- team$get_drive("CHI-Vizes")

# Create an empty Excel workbook
wb <- openxlsx::createWorkbook()

# Add estimates worksheet and write data
openxlsx::addWorksheet(wb, "Estimates")
openxlsx::writeDataTable(wb, "Estimates",
                         x = myrates,
                         tableStyle = "TableStyleMedium9")

# Add metadata worksheet and write data
openxlsx::addWorksheet(wb, "Metadata")
openxlsx::writeDataTable(wb, "Metadata",
                         x = mymetadata,
                         tableStyle = "TableStyleMedium9")

# Save workbook to a temporary file
tempy <- tempfile(fileext = ".xlsx")
openxlsx::saveWorkbook(wb,
                       file = tempy,
                       overwrite = TRUE)

# Upload to SharePoint
drv$upload_file(src = tempy,
                dest = paste0(sharepoint_output_dir,
                              "Tableau_Ready_",
                              latest_year - 4, "_", latest_year, ".xlsx"))
rm(tempy)
```
### Exporting `mycomparison`

```r
# Connect to SharePoint
team <- get_team("Community Health Indicators")
drv <- team$get_drive("CHI-Vizes")

# Create a temporary file to store mycomparison as an Excel file
tempy <- tempfile(fileext = ".xlsx")

# Write mycomparison to the temporary Excel file
openxlsx::write.xlsx(x = mycomparison,
                     file = tempy,
                     asTable = TRUE,   # Ensure data is written as a table
                     overwrite = TRUE, # Allow overwriting the file if it exists
                     tableStyle = "TableStyleMedium9")

# Upload the Excel file to SharePoint
drv$upload_file(src = tempy,
                dest = paste0(sharepoint_output_dir,
                              "qa_result_old_vs_new_",
                              latest_year - 4, "_", latest_year, ".xlsx"))
rm(tempy)
```
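Because both exports repeat the same create-write-save-upload pattern, you could optionally wrap it in a small helper. The function below is a sketch of our own; `upload_xlsx()` is a hypothetical name, not part of `apde.chi.tools`:

```r
# Hypothetical convenience wrapper (not part of apde.chi.tools): writes a named
# list of tables to one Excel workbook and uploads it to a SharePoint drive
upload_xlsx <- function(drv, sheets, dest) {
  wb <- openxlsx::createWorkbook()
  for (nm in names(sheets)) {
    openxlsx::addWorksheet(wb, nm)
    openxlsx::writeDataTable(wb, nm, x = sheets[[nm]],
                             tableStyle = "TableStyleMedium9")
  }
  tempy <- tempfile(fileext = ".xlsx")
  openxlsx::saveWorkbook(wb, file = tempy, overwrite = TRUE)
  drv$upload_file(src = tempy, dest = dest)
  unlink(tempy) # clean up the temporary file
}

# For example, the estimates & metadata export above would collapse to:
# upload_xlsx(drv,
#             sheets = list(Estimates = myrates, Metadata = mymetadata),
#             dest = paste0(sharepoint_output_dir, "Tableau_Ready_",
#                           latest_year - 4, "_", latest_year, ".xlsx"))
```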
## Saving Estimates & Metadata to SQL Server

Finally, we need to save our results and metadata to the development SQL Server using `chi_update_sql()`. Later, once the data pass human QA, they will be transferred to the production SQL Server.
```r
chi_update_sql(CHIestimates = myrates,
               CHImetadata = mymetadata,
               table_name = 'junk', # use the actual data source name, e.g., 'birth', 'brfss', etc.
               server = 'development',
               replace_table = TRUE)
```
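If you want to confirm the upload landed, a quick row count against the development tables works. This sketch assumes the `[APDE_WIP]` table naming shown in the cleanup step below:

```r
# Connect to the dev server (the cleanup step below reuses this connection)
db_chi_dev <- odbc::dbConnect(
  odbc::odbc(),
  Driver = "SQL Server",
  Server = "KCITSQLUATHIP40",
  Database = "PHExtractStore")

# The row count should match nrow(myrates)
DBI::dbGetQuery(db_chi_dev,
                "SELECT COUNT(*) AS n FROM [PHExtractStore].[APDE_WIP].[junk_results]")
```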
## Delete Temporary Tables

In the process of creating this vignette, we created some temporary tables on SharePoint and SQL Server. Let's delete them to keep our servers clean. Obviously, in your real analyses, you'd skip this step.
```r
# Drop the SharePoint directory
SharePoint_Parent <- strsplit(sharepoint_output_dir, "/")[[1]][1]
fff <- drv$get_item(SharePoint_Parent)$delete(confirm = FALSE)

# Connect to the dev server (skip if already connected for the optional check above)
db_chi_dev <- odbc::dbConnect(
  odbc::odbc(),
  Driver = "SQL Server",
  Server = "KCITSQLUATHIP40", # dev server
  Database = "PHExtractStore")

# Drop the SQL Server tables
DBI::dbExecute(conn = db_chi_dev, "DROP TABLE [PHExtractStore].[APDE_WIP].[junk_results]")
DBI::dbExecute(conn = db_chi_dev, "DROP TABLE [PHExtractStore].[APDE_WIP].[junk_metadata]")
```
## Conclusion

Congratulations on completing the CHI rate analysis workflow! This workflow provides a standardized process for generating rate estimates, ensuring consistency and traceability. Follow these steps to streamline your analyses and maintain CHI standards across datasets.
New functions you used:

- `chi_generate_analysis_set()` to create an analysis set from last year's production estimates
- `chi_generate_tro_shell()` to generate calculation instructions based on the output of `chi_generate_analysis_set()`
- `chi_count_by_age()` to generate a detailed breakdown of counts by age that serves as the numerator for CHI rate analyses
- `chi_generate_instructions_pop()` to create an instruction set for downloading population denominator data
- `chi_get_proper_pop()` to generate a table of population denominators based on the output of `chi_generate_instructions_pop()`
- `chi_generate_metadata()` to create an updated metadata table based on the current estimates and last year's production metadata
- `chi_qa_tro()` to perform quality assurance checks on the estimates and metadata
- `chi_compare_estimates()` to identify notable differences between the new estimates and previous ones
- `chi_update_sql()` to save estimates and metadata to SQL Server
If you encounter issues or believe you've found a bug, please submit a GitHub issue at [Issues · PHSKC-APDE/apde.chi.tools](https://github.com/PHSKC-APDE/apde.chi.tools/issues).
Remember that this workflow is specifically for rate calculations. For calculations of proportions or means, please refer to the dedicated vignette.
*Updated May 23, 2025 (apde.chi.tools v2025.0.3)*