Converting 2010 to 2020 geographies - NYCPlanning/db-factfinder GitHub Wiki
Overview
In order to facilitate time-series analysis of demographic trends, data originally released in 2010 census geographies need to be converted to 2020 census geographies. This involves allocating count values in 2010 census tracts to 2020 tracts, accounting for tract splits or merges. In cases of tract splits, counts from the 2010 tract-level data are distributed to the multiple 2020 tracts in a way that is proportional to the 2010 population distribution within the tract. DCP uses a one-to-one relationship between 2010 blocks and 2020 blocks in order to estimate the proportion of 2010 population contained within each new tract.
For example:
- 2010 tract 1 (containing 8 blocks) split into 2020 tracts 1.1 (containing blocks 1, 2, 3, and 4) and 1.2 (containing blocks 5, 6, 7, and 8)
- In 2010, tract 1 had a total population of 4000, made up of:
  - Block 1: 1000
  - Block 2: 500
  - Block 3: 1000
  - Block 4: 500
  - Block 5: 200
  - Block 6: 200
  - Block 7: 500
  - Block 8: 100
- The 2010 population contained in the blocks now associated with each of the 2020 tracts is:
  - Tract 1.1 (blocks 1-4): 1000 + 500 + 1000 + 500 = 3000
  - Tract 1.2 (blocks 5-8): 200 + 200 + 500 + 100 = 1000
- The proportion of total 2010 population in the blocks now associated with each of the 2020 tracts is:
  - Tract 1.1: 75%
  - Tract 1.2: 25%
These proportions are contained in `ratio.csv`, in the following format (using the above example for demonstration):
2020 Tract | 2010 Tract | ratio |
---|---|---|
1.1 | 1 | .75 |
1.2 | 1 | .25 |
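The ratio calculation above can be sketched in plain Python. This is an illustration only, not the code DCP uses to build `ratio.csv`: the function name `compute_ratios` and the `block_to_tracts` mapping are hypothetical, and the population figures are the ones from the example.

```python
from collections import defaultdict

# 2010 block populations for 2010 tract "1" (values from the example above)
block_pop_2010 = {1: 1000, 2: 500, 3: 1000, 4: 500, 5: 200, 6: 200, 7: 500, 8: 100}

# Hypothetical block assignments: block -> (2010 tract, 2020 tract)
block_to_tracts = {b: ("1", "1.1" if b <= 4 else "1.2") for b in block_pop_2010}


def compute_ratios(block_pop, block_to_tracts):
    """Return {(2020 tract, 2010 tract): share of the 2010 tract's population}."""
    pair_pop = defaultdict(int)   # 2010 population per (2020 tract, 2010 tract) pair
    tract_pop = defaultdict(int)  # total 2010 population per 2010 tract
    for block, pop in block_pop.items():
        ct2010, ct2020 = block_to_tracts[block]
        pair_pop[(ct2020, ct2010)] += pop
        tract_pop[ct2010] += pop
    # Assumes every 2010 tract has a nonzero population (true for this example)
    return {pair: pop / tract_pop[pair[1]] for pair, pop in pair_pop.items()}


ratios = compute_ratios(block_pop_2010, block_to_tracts)
print(ratios)  # {('1.1', '1'): 0.75, ('1.2', '1'): 0.25}
```

A 2010 tract with zero population would need separate handling, which this sketch does not attempt.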
For cases of merges, ratios are 1 if the entireties of multiple 2010 tracts are combined into a new, larger 2020 tract.
These ratios are used to proportionately allocate count values from 2010 to 2020 tracts. Tract-to-tract conversion is the first step before higher-level spatial aggregation. For more information about aggregating census tracts into larger geographies, see the vertical aggregation documentation page.
Calculating 2020 tract-level estimates & MOEs in cases of splits
Conversion of 2010 tract-level estimates and MOEs occurs in the `AggregateGeography` class for year `2010_to_2020`. This class contains a method, `ct2010_to_ct2020`, which takes a DataFrame of 2010 tract-level data and returns a DataFrame of 2020 tract-level data (as estimated using the proportional allocation of total population).
Consider the example tract split described above, along with the following example 2010 tract-level estimates:
2010 Tract | Workers Under 16 Estimate | Workers Under 16 MOE |
---|---|---|
1 | 1000 | 100 |
In order to estimate the number of workers under 16 in 2020 tracts 1.1 and 1.2, we assume that the spatial distribution of workers under 16 is well-approximated by the spatial distribution of total people within the tract.
First, we merge the 2010 data with the ratios described in the previous section, yielding:
2020 Tract | 2010 Tract | Ratio | Estimate (2010) | MOE (2010) |
---|---|---|---|---|
1.1 | 1 | .75 | 1000 | 100 |
1.2 | 1 | .25 | 1000 | 100 |
2020 tract-level estimates are simply the 2010 estimate multiplied by the ratio:
2020 Tract | 2010 Tract | Ratio | Estimate (2010) | MOE (2010) | Estimate (2020) |
---|---|---|---|---|---|
1.1 | 1 | .75 | 1000 | 100 | 1000 * .75 = 750 |
1.2 | 1 | .25 | 1000 | 100 | 1000 * .25 = 250 |
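The join-and-multiply step can be sketched as follows. This is a minimal standard-library illustration under stated assumptions, not the body of `ct2010_to_ct2020` (which works on DataFrames); the function name `allocate_estimates` and the dictionary keys are hypothetical.

```python
# Rows of the ratio table for the example split: (2020 tract, 2010 tract, ratio)
ratio_rows = [("1.1", "1", 0.75), ("1.2", "1", 0.25)]

# 2010 tract-level data from the example: tract -> (estimate, MOE)
data_2010 = {"1": (1000, 100)}


def allocate_estimates(ratio_rows, data_2010):
    """Join each ratio row to its 2010 tract and scale the estimate by the ratio."""
    joined = []
    for ct2020, ct2010, ratio in ratio_rows:
        estimate_2010, moe_2010 = data_2010[ct2010]
        joined.append(
            {
                "ct2020": ct2020,
                "ct2010": ct2010,
                "ratio": ratio,
                "estimate_2010": estimate_2010,
                "moe_2010": moe_2010,
                # 2020 estimate is the 2010 estimate multiplied by the ratio
                "estimate_2020": estimate_2010 * ratio,
            }
        )
    return joined


rows_2020 = allocate_estimates(ratio_rows, data_2010)
print([(r["ct2020"], r["estimate_2020"]) for r in rows_2020])
# [('1.1', 750.0), ('1.2', 250.0)]
```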
Calculating 2020 MOEs depends on an empirically-derived formula, `convert_moe`:
- If the ratio is 1 (not a tract split), the 2020 MOE is the same as the 2010 MOE
- If the 2020 estimate is 0 (prior to any rounding), the 2020 MOE is NULL
- If `((ratio * 100)^(0.56901)) * 7.96309 >= 100`, the 2020 MOE is the same as the 2010 MOE
- Otherwise, the 2020 MOE is equal to: `((((ratio * 100)^(0.56901)) * 7.96309) / 100) * (2010 MOE)`
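The rules above can be sketched as a small function. This is an illustration written from the rules as listed, not the repository's `convert_moe` itself: the name `convert_moe_sketch`, its argument order, and the order in which the rules are checked (the same order as the list) are assumptions.

```python
def convert_moe_sketch(ratio, estimate_2020, moe_2010):
    """Estimate a 2020 MOE from a 2010 MOE using the empirical formula."""
    # Not a split: the MOE carries over unchanged
    if ratio == 1:
        return moe_2010
    # Unrounded 2020 estimate of zero: MOE is NULL (None here)
    if estimate_2020 == 0:
        return None
    # Modeled 2020 MOE as a percent of the 2010 MOE
    pct_of_moe = ((ratio * 100) ** 0.56901) * 7.96309
    # Never let the allocated MOE exceed the original 2010 MOE
    if pct_of_moe >= 100:
        return moe_2010
    return (pct_of_moe / 100) * moe_2010


# Example split: ratios .75 and .25, 2010 estimate 1000, 2010 MOE 100
moe_1_1 = convert_moe_sketch(0.75, 750, 100)
moe_1_2 = convert_moe_sketch(0.25, 250, 100)
print(moe_1_1, moe_1_2)  # approximately 92.90 and 49.72
```

With these coefficients the modeled percentage reaches 100 at a ratio of roughly 0.85, so any ratio above that leaves the MOE unchanged.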
This formula comes from an empirical model capturing the relationships between published block group MOEs as a percent of published tract MOEs and block group estimates as a percent of tract estimates, with R-squared of 0.81:
(block group MOEs as a percent of tract MOEs) = 7.96309 * (block group estimates as a percent of tract estimates)^0.56901
This formula is based on 10 selected variables, for 314 randomly selected NYC block groups:
- Males 85 years and older
- Non-Hispanic of 2 or more races
- Single female household with children
- 65 years and older living alone
- Household income $200,000 or more
- Worked from home
- Employed civilians 16 years and older
- Occupied housing with a mortgage
- Vacant housing units
- GRAPI 30% to 34.9%
The nested relationship of block groups within tracts mimics the relationship of 2020 tracts within 2010 tracts in cases of a tract split.
Using the example above, MOE is calculated as follows:
2020 Tract | 2010 Tract | Ratio | Estimate (2010) | MOE (2010) | Estimate (2020) | MOE (2020) |
---|---|---|---|---|---|---|
1.1 | 1 | .75 | 1000 | 100 | 750 | (7.96309 * (75)^0.56901 / 100) * 100 = 92.8988 |
1.2 | 1 | .25 | 1000 | 100 | 250 | (7.96309 * (25)^0.56901 / 100) * 100 = 49.7191 |
MOEs and estimates are rounded in the final cleaning and rounding step.
Calculating 2020 tract-level estimates and MOEs in cases of merges
Cases of tract merges are much simpler, and generally follow the same logic as other small-to-large spatial aggregation.
Consider the following example, representing a complete merge of 2.1 and 2.2 into 2:
2020 Tract | 2010 Tract | ratio |
---|---|---|
2 | 2.1 | 1 |
2 | 2.2 | 1 |
In this case, joining with an example 2010 tract-level dataset would produce:
2020 Tract | 2010 Tract | ratio | Estimate (2010) | MOE (2010) | Estimate (2020) | MOE (2020) |
---|---|---|---|---|---|---|
2 | 2.1 | 1 | 100 | 10 | 100 * 1 = 100 | 10 |
2 | 2.2 | 1 | 200 | 20 | 200 * 1 = 200 | 20 |
At this point, rows of the joined table are aggregated to get 2020 tract-level data. Estimates are summed, and MOEs are aggregated using the square root of the sum of squares, in `agg_moe`.
2020 Tract | Estimate (2020) | MOE (2020) |
---|---|---|
2 | 100 + 200 = 300 | SQRT(10^2 + 20^2) = 22.3607 |
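The merge aggregation can be sketched as below. The function name `agg_moe_sketch` and the row layout are hypothetical; the sketch only reproduces the arithmetic described above (sum of estimates, square root of the sum of squared MOEs) and assumes no MOE is NULL.

```python
import math
from collections import defaultdict


def agg_moe_sketch(moes):
    """Combine MOEs as the square root of the sum of their squares."""
    return math.sqrt(sum(m ** 2 for m in moes))


# Joined rows for the example merge: (2020 tract, 2010 tract, estimate, MOE)
merge_rows = [("2", "2.1", 100, 10), ("2", "2.2", 200, 20)]

# Group the joined rows by 2020 tract
by_ct2020 = defaultdict(list)
for ct2020, _ct2010, estimate, moe in merge_rows:
    by_ct2020[ct2020].append((estimate, moe))

# Sum estimates and aggregate MOEs within each 2020 tract
merged = {
    ct2020: (sum(e for e, _ in pairs), agg_moe_sketch([m for _, m in pairs]))
    for ct2020, pairs in by_ct2020.items()
}
print(merged["2"][0], round(merged["2"][1], 4))  # 300 22.3607
```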
Converting from 2010 tracts to other 2020 geographies
If the requested geography type is not a tract, but is instead another 2020 geography type (NTA, CDTA, etc.), other methods in the `AggregateGeography` class first call `ct2010_to_ct2020` to estimate 2020 tract-level data. From there, aggregation proceeds using the same techniques as other data years. For example, the method `tract_to_cdta`:
- Converts 2010 tracts to 2020 tracts using the workflow described above
- Joins the resulting 2020 tract-level data with the `2010_to_2020` `lookup_geo` (for more information about spatial lookups, see here)
- Groups by 2020 NTA field, `nta_geoid`, using aggregation techniques defined in `create_output`
- Renames `nta_geoid` as `geoid` and sets `geotype` to "NTA" to standardize format
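The grouping and renaming steps in the list above might look roughly like this. It is a sketch under several assumptions: the lookup contents and the NTA code `NTA_A` are invented for illustration, the function name `tracts_to_nta_sketch` is hypothetical, and the MOE aggregation uses the square root of the sum of squares described for merges, which may differ from what `create_output` actually does.

```python
import math
from collections import defaultdict

# 2020 tract-level rows, e.g. the result of the tract conversion for the split example
ct2020_rows = [
    {"ct2020": "1.1", "estimate": 750.0, "moe": 92.8988},
    {"ct2020": "1.2", "estimate": 250.0, "moe": 49.7191},
]

# Hypothetical lookup: 2020 tract -> nta_geoid (made-up code)
lookup_geo = {"1.1": "NTA_A", "1.2": "NTA_A"}


def tracts_to_nta_sketch(rows, lookup):
    """Join 2020 tracts to NTAs, aggregate, and standardize the output columns."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[lookup[row["ct2020"]]].append(row)
    output = []
    for nta_geoid, members in grouped.items():
        output.append(
            {
                "geoid": nta_geoid,  # nta_geoid renamed to geoid
                "geotype": "NTA",    # standardized geography type label
                "estimate": sum(r["estimate"] for r in members),
                "moe": math.sqrt(sum(r["moe"] ** 2 for r in members)),
            }
        )
    return output


nta_rows = tracts_to_nta_sketch(ct2020_rows, lookup_geo)
print(nta_rows[0]["geoid"], nta_rows[0]["geotype"], nta_rows[0]["estimate"])
# NTA_A NTA 1000.0
```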
Entire PFF workflow in cases of converting 2010 to 2020 data
The following image shows the entire workflow for converting 2010 to 2020 geographies. The example shown transforms 2019 ACS data into 2020 NTA-level data for the PFF variable capturing the population of South Asian origin, `asnsouth`.