Converting 2010 to 2020 geographies - NYCPlanning/db-factfinder GitHub Wiki

Overview

In order to facilitate time-series analysis of demographic trends, data originally released in 2010 census geographies need to be converted to 2020 census geographies. This involves allocating count values in 2010 census tracts to 2020 tracts, accounting for tract splits or merges. In cases of tract splits, counts from the 2010 tract-level data are distributed to the multiple 2020 tracts in a way that is proportional to the 2010 population distribution within the tract. DCP uses a one-to-one relationship between 2010 blocks and 2020 blocks in order to estimate the proportion of 2010 population contained within each new tract.

For example:

2010 tract 1 (containing 8 blocks) is split into 2020 tracts 1.1 (containing blocks 1, 2, 3, and 4) and 1.2 (containing blocks 5, 6, 7, and 8).
In 2010, tract 1 had a total population of 4000, made up of:
- Block 1: 1000
- Block 2: 500
- Block 3: 1000
- Block 4: 500
- Block 5: 200
- Block 6: 200
- Block 7: 500
- Block 8: 100
The 2010 population contained in the blocks now associated with each of the 2020 tracts is:
- Tract 1.1 (blocks 1-4): 1000 + 500 + 1000 + 500 = 3000
- Tract 1.2 (blocks 5-8): 200 + 200 + 500 + 100 = 1000
The proportion of total 2010 population in the blocks now associated with each of the 2020 tracts is:
- Tract 1.1: 75%
- Tract 1.2: 25%

These proportions are contained in ratio.csv, in the following format (using the above example for demonstration):

2020 Tract	2010 Tract	ratio
1.1	1	.75
1.2	1	.25
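The ratio calculation above can be sketched in a few lines of plain Python. This is an illustration only, not the code DCP uses to build ratio.csv: the function name `compute_ratios` and the `block_to_tracts` mapping are hypothetical, and the block populations are the example values from this page.

```python
from collections import defaultdict

# 2010 block populations for 2010 tract "1" (example values from this page).
block_pop_2010 = {1: 1000, 2: 500, 3: 1000, 4: 500, 5: 200, 6: 200, 7: 500, 8: 100}

# Hypothetical lookup: block -> (2010 tract, 2020 tract).
block_to_tracts = {
    1: ("1", "1.1"), 2: ("1", "1.1"), 3: ("1", "1.1"), 4: ("1", "1.1"),
    5: ("1", "1.2"), 6: ("1", "1.2"), 7: ("1", "1.2"), 8: ("1", "1.2"),
}


def compute_ratios(block_pop, block_to_tracts):
    """Return {(2020 tract, 2010 tract): share of the 2010 tract's population}."""
    pair_pop = defaultdict(int)        # population per (2020 tract, 2010 tract) pair
    tract_2010_pop = defaultdict(int)  # total population per 2010 tract
    for block, pop in block_pop.items():
        ct2010, ct2020 = block_to_tracts[block]
        pair_pop[(ct2020, ct2010)] += pop
        tract_2010_pop[ct2010] += pop
    # Assumes every 2010 tract has a nonzero population; a zero-population
    # tract would need separate handling to avoid division by zero.
    return {pair: pop / tract_2010_pop[pair[1]] for pair, pop in pair_pop.items()}


ratios = compute_ratios(block_pop_2010, block_to_tracts)
for (ct2020, ct2010), ratio in sorted(ratios.items()):
    print(ct2020, ct2010, ratio)
# prints:
# 1.1 1 0.75
# 1.2 1 0.25
```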

For cases of merges, ratios are 1 if the entireties of multiple 2010 tracts are combined into a new, larger 2020 tract.

These ratios are used to proportionately allocate count values from 2010 to 2020 tracts. Tract-to-tract conversion is the first step before higher-level spatial aggregation. For more information about aggregating census tracts into larger geographies, see the vertical aggregation documentation page.

Calculating 2020 tract-level estimates and MOEs in cases of splits

Conversion of 2010 tract-level estimates and MOEs occurs in the AggregateGeography class for year 2010_to_2020. This class contains a method ct2010_to_ct2020, which takes a DataFrame of 2010 tract-level data and returns a DataFrame of 2020 tract-level data (as estimated using the proportional allocation of total population).

Consider the example tract split described above, along with the following example 2010 tract-level estimates:

2010 Tract	Workers Under 16 Estimate	Workers Under 16 MOE
1	1000	100

In order to estimate the number of workers under 16 in 2020 tracts 1.1 and 1.2, we assume that the spatial distribution of workers under 16 is well-approximated by the spatial distribution of total population within the tract.

First, we merge the 2010 data with the ratios described in the previous section, yielding:

2020 Tract	2010 Tract	Ratio	Estimate (2010)	MOE (2010)
1.1	1	.75	1000	100
1.2	1	.25	1000	100

2020 tract-level estimates are simply the 2010 estimate multiplied by the ratio:

2020 Tract	2010 Tract	Ratio	Estimate (2010)	MOE (2010)	Estimate (2020)
1.1	1	.75	1000	100	1000 * .75 = 750
1.2	1	.25	1000	100	1000 * .25 = 250
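As a minimal sketch of the join-and-multiply step, assuming the ratios and the 2010 data are held in plain Python structures rather than DataFrames (the names `ratio_rows`, `data_2010`, and `allocate_estimates` are hypothetical and do not come from the db-factfinder code):

```python
# Ratio rows in the shape of ratio.csv: (2020 tract, 2010 tract, ratio).
ratio_rows = [("1.1", "1", 0.75), ("1.2", "1", 0.25)]

# 2010 tract-level data: 2010 tract -> (estimate, MOE).
data_2010 = {"1": (1000, 100)}


def allocate_estimates(ratio_rows, data_2010):
    """Join each ratio row to its 2010 tract and scale the estimate by the ratio."""
    allocated = []
    for ct2020, ct2010, ratio in ratio_rows:
        estimate_2010, moe_2010 = data_2010[ct2010]
        allocated.append({
            "ct2020": ct2020,
            "ct2010": ct2010,
            "ratio": ratio,
            "estimate_2010": estimate_2010,
            "moe_2010": moe_2010,
            "estimate_2020": estimate_2010 * ratio,  # proportional allocation
        })
    return allocated


allocated = allocate_estimates(ratio_rows, data_2010)
for row in allocated:
    print(row["ct2020"], row["estimate_2020"])
# prints:
# 1.1 750.0
# 1.2 250.0
```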

Calculating 2020 MOEs depends on an empirically-derived formula, convert_moe:

If the ratio is 1 (not a tract split), 2020 MOE is the same as 2010 MOE
If the 2020 estimate is 0 (prior to any rounding), the 2020 MOE is NULL
If ((ratio * 100)^(0.56901)) * 7.96309 >= 100, the 2020 MOE is the same as the 2010 MOE
Otherwise, the 2020 MOE is equal to: ((((ratio * 100)^(0.56901)) * 7.96309) / 100) * (2010 MOE)
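The four rules above can be written as a small function. This is a sketch based only on the rules as listed here, not the library's actual convert_moe: the name `convert_moe_sketch`, its signature, and the order in which the rules are checked (taken from the order of the list) are assumptions.

```python
def convert_moe_sketch(ratio, estimate_2020, moe_2010):
    """Estimate a 2020 MOE from a 2010 MOE, following the rules listed above.

    Hypothetical helper: rules are checked in the order the list gives them.
    """
    if ratio == 1:
        # Not a tract split: carry the 2010 MOE over unchanged.
        return moe_2010
    if estimate_2020 == 0:
        # Unrounded 2020 estimate is zero: the MOE is NULL.
        return None
    # Modeled 2020 MOE as a percent of the 2010 MOE.
    moe_pct = 7.96309 * (ratio * 100) ** 0.56901
    if moe_pct >= 100:
        # Cap at the 2010 MOE.
        return moe_2010
    return moe_pct / 100 * moe_2010


print(round(convert_moe_sketch(0.75, 750, 100), 4))  # 92.8988
print(round(convert_moe_sketch(0.25, 250, 100), 4))  # 49.7191
```

With these coefficients the modeled percentage reaches 100 at a ratio of roughly 0.85, so any split piece holding more than about 85% of the 2010 population keeps the 2010 MOE unchanged.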

This formula comes from an empirical model capturing the relationships between published block group MOEs as a percent of published tract MOEs and block group estimates as a percent of tract estimates, with R-squared of 0.81:

(block group MOEs as a percent of tract MOEs) = 7.96309 * (block group estimates as a percent of tract estimates)^0.56901

This formula is based on the following 10 selected variables, for 314 random NYC block groups:

Males 85 years and older
Non-Hispanic of 2 or more races
Single female household with children
65 years and older living alone
Household income $200,000 or more
Worked from home
Employed civilians 16 years and older
Occupied housing with a mortgage
Vacant housing units
GRAPI 30% to 34.9%

The nested relationship of block groups within tracts mimics the relationship of 2020 tracts within 2010 tracts in cases of a tract split.

Using the example above, MOE is calculated as follows:

2020 Tract	2010 Tract	Ratio	Estimate (2010)	MOE (2010)	Estimate (2020)	MOE (2020)
1.1	1	.75	1000	100	750	((7.96309 * (75)^0.56901) / 100) * 100 = 92.8988
1.2	1	.25	1000	100	250	((7.96309 * (25)^0.56901) / 100) * 100 = 49.7191

MOEs and estimates are rounded in the final cleaning and rounding step.

Calculating 2020 tract-level estimates and MOEs in cases of merges

Cases of tract merges are much simpler, and generally follow the same logic as other small-to-large spatial aggregation.

Consider the following example, representing a complete merge of 2.1 and 2.2 into 2:

2020 Tract	2010 Tract	ratio
2	2.1	1
2	2.2	1

In this case, joining with an example 2010 tract-level dataset would produce:

2020 Tract	2010 Tract	ratio	Estimate (2010)	MOE (2010)	Estimate (2020)	MOE (2020)
2	2.1	1	100	10	100 * 1 = 100	10
2	2.2	1	200	20	200 * 1 = 200	20

At this point, rows of the joined table are aggregated to get 2020 tract-level data. Estimates are summed, and MOEs are aggregated in agg_moe using the square root of the sum of squares.

2020 Tract	Estimate (2020)	MOE (2020)
2	100 + 200 = 300	SQRT(10^2 + 20^2) = 22.3607
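The aggregation for the merge example can be sketched as follows. The function name `aggregate_to_2020` and the row layout are hypothetical; this only illustrates the sum and root-sum-of-squares arithmetic described above, not the library's agg_moe implementation.

```python
import math
from collections import defaultdict

# Joined rows from the merge example above (ratio is 1 for both).
rows = [
    {"ct2020": "2", "ct2010": "2.1", "estimate_2020": 100, "moe_2020": 10},
    {"ct2020": "2", "ct2010": "2.2", "estimate_2020": 200, "moe_2020": 20},
]


def aggregate_to_2020(rows):
    """Sum estimates and combine MOEs by root-sum-of-squares per 2020 tract."""
    estimate_sum = defaultdict(float)
    moe_sq_sum = defaultdict(float)
    for row in rows:
        estimate_sum[row["ct2020"]] += row["estimate_2020"]
        moe_sq_sum[row["ct2020"]] += row["moe_2020"] ** 2
    return {
        tract: (estimate_sum[tract], math.sqrt(moe_sq_sum[tract]))
        for tract in estimate_sum
    }


aggregated = aggregate_to_2020(rows)
estimate, moe = aggregated["2"]
print(estimate, round(moe, 4))  # 300.0 22.3607
```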

Converting from 2010 tracts to other 2020 geographies

If the requested geography type is not a tract, but is instead another 2020 geography type (NTA, CDTA, etc.), other methods in the AggregateGeography class first call ct2010_to_ct2020 to estimate 2020 tract-level data. From there, aggregation proceeds using the same techniques as other data years. For example, the method tract_to_nta:

Converts 2010 tracts to 2020 tracts using the workflow described above
Joins the resulting 2020 tract-level data with the 2010_to_2020 lookup_geo (for more information about spatial lookups, see here)
Groups by 2020 NTA field, nta_geoid, using aggregation techniques defined in create_output
Renames nta_geoid as geoid and sets geotype to "NTA" to standardize format
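The join, group, and rename steps above might look like the following in plain Python. Everything here is illustrative: the lookup contents, the NTA code "XX01", and the function name `tracts_to_nta_sketch` are made up, and the aggregation rule (sum of estimates, root-sum-of-squares of MOEs) is an assumption about what create_output specifies.

```python
import math
from collections import defaultdict

# 2020 tract-level rows, as produced by the split example on this page.
tract_rows_2020 = [
    {"ct2020": "1.1", "estimate": 750.0, "moe": 92.8988},
    {"ct2020": "1.2", "estimate": 250.0, "moe": 49.7191},
]

# Hypothetical spatial lookup: 2020 tract -> nta_geoid ("XX01" is a made-up code).
lookup_geo = {"1.1": "XX01", "1.2": "XX01"}


def tracts_to_nta_sketch(tract_rows, lookup_geo):
    """Join tracts to the lookup, group by nta_geoid, and standardize the output."""
    estimate_sum = defaultdict(float)
    moe_sq_sum = defaultdict(float)
    for row in tract_rows:
        nta_geoid = lookup_geo[row["ct2020"]]  # join step
        estimate_sum[nta_geoid] += row["estimate"]
        moe_sq_sum[nta_geoid] += row["moe"] ** 2
    # Rename nta_geoid to geoid and tag the geography type.
    return [
        {
            "geoid": nta_geoid,
            "geotype": "NTA",
            "estimate": estimate_sum[nta_geoid],
            "moe": math.sqrt(moe_sq_sum[nta_geoid]),
        }
        for nta_geoid in estimate_sum
    ]


nta_rows = tracts_to_nta_sketch(tract_rows_2020, lookup_geo)
print(nta_rows[0]["geoid"], nta_rows[0]["geotype"], nta_rows[0]["estimate"])
# prints: XX01 NTA 1000.0
```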

Entire PFF workflow in cases of converting 2010 to 2020 data

The following image shows the entire workflow for converting 2010 to 2020 geographies. The example shown is transforming 2019 ACS data into 2020 NTA-level data for the PFF variable capturing the population of South Asian origin, or asnsouth.

[Image: factfinder_convert, the workflow for converting 2010 to 2020 geographies]