Vertical aggregation: small to large geographies - NYCPlanning/db-factfinder GitHub Wiki
Aggregating small areas to larger areas
If the requested geography type is not a Census geography, tract- or block group-level PFF data are aggregated to calculate e, m, p, and z for larger geographies. For example, rows containing tract-level estimates and MOEs for a given PFF variable are combined to produce NTA-level estimates and MOEs.
In general, aggregate geography types include:
- NTA
- CDTA (post 2020)
- Portion of Community Districts within the 100-year floodplain
- Portion of Community Districts within the 500-year floodplain
- Portion of Community Districts within walking distance of a park
Note: when converting data published in 2010 geographies to 2020 geographies, 2020 census tracts function as an aggregate geography type.
Geographic relationships
Relationships between geographic areas are maintained in the directory data/lookup_geo. Lookups are specific to the latest decennial census year, since tract boundaries change each decade. When the Calculate class is initialized, the specified geography year determines which spatial lookup is referenced.
AggregateGeography class
Each decennial year corresponds to a different version of the AggregateGeography class. These classes are defined in year-specific Python files in the geography directory.
Class properties and methods
lookup_geo: The AggregateGeography class (for both years) contains a property lookup_geo. This property is a DataFrame with parsed columns from the geographic lookups in the data directory.
options: This property maps each Census geography type to the aggregate geography types it can be combined into, along with the function that converts the raw geography type to each aggregate geography type. For example, one record in the 2010 lookup is:
"tract": {"NTA": self.tract_to_nta, "cd": self.tract_to_cd}
Both NTA- and CD-level data are built from tract-level raw data. The function for aggregating tract-level data into NTA-level data is tract_to_nta, and the function for aggregating tract-level data into CD-level data is tract_to_cd.
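A minimal sketch of how a lookup shaped like this can be used to pick the right conversion method. The class below is a hypothetical stand-in, not the project's AggregateGeography: only the shape of the options record and the names tract_to_nta and tract_to_cd come from the text above. The convert helper and the placeholder method bodies are assumptions made for illustration.

```python
class AggregateGeographySketch:
    """Hypothetical stand-in that only illustrates the lookup structure."""

    @property
    def options(self):
        # raw geography type -> {aggregate geography type -> conversion method}
        return {"tract": {"NTA": self.tract_to_nta, "cd": self.tract_to_cd}}

    def tract_to_nta(self, df):
        # Placeholder body: the real method aggregates tract rows into NTAs.
        return "tract -> NTA"

    def tract_to_cd(self, df):
        # Placeholder body: the real method aggregates tract rows into CDs.
        return "tract -> cd"

    def convert(self, df, from_geotype, to_geotype):
        # Hypothetical helper: look up the method by type pair and call it.
        return self.options[from_geotype][to_geotype](df)


geo = AggregateGeographySketch()
result = geo.convert(None, "tract", "NTA")
```

Keeping the methods in a nested dictionary means that supporting a new aggregate geography type only requires adding one entry and one method, without changing the calling code.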
aggregated_geography: This property is a list of all aggregated geography types for a given year, e.g. ["nta", "cd", "cd_fp_500", "cd_fp_100", "cd_park_access"].
format_geoid and format_geotype: These methods convert FIPS census geoids and types into the format displayed in Planning Labs' application, as implemented in labs_geotype. See the final output cleaning documentation page for more info.
Functions to convert smaller to larger geography types
The majority of methods in the AggregateGeography class aggregate tract- or block group-level data into larger geographies. While the methods are specific to the geography type, they follow a similar structure. Consider the example:
Tract-level data
geoid | Estimate | MOE |
---|---|---|
tract 1 | 1 | 2 |
tract 2 | 3 | 4 |
tract 3 | 5 | 6 |
tract 4 | 7 | 8 |
Geo-lookup
tract_geoid | nta_geoid | ... |
---|---|---|
tract 1 | NTA 1 | ... |
tract 2 | NTA 1 | ... |
tract 3 | NTA 2 | ... |
tract 4 | NTA 3 | ... |
- Join tract/block-group data with lookup_geo (which defines how small geographies nest within larger ones) on geoid_tract or geoid_block_group. Following the example data above, this would produce:
geoid | Estimate | MOE | nta_geoid |
---|---|---|---|
tract 1 | 1 | 2 | NTA 1 |
tract 2 | 3 | 4 | NTA 1 |
tract 3 | 5 | 6 | NTA 2 |
tract 4 | 7 | 8 | NTA 3 |
- Call the function create_output to group by the aggregate geography geoid. Within each group, estimates are summed and MOEs are aggregated using the square root of the sum of squares, as defined in agg_moe.
nta_geoid | Estimate | MOE |
---|---|---|
NTA 1 | (1 + 3) = 4 | SQRT(2^2 + 4^2) = SQRT(20) |
NTA 2 | 5 | 6 |
NTA 3 | 7 | 8 |
- Rename the geoid column to census_geoid to standardize the output:
census_geoid | Estimate | MOE |
---|---|---|
NTA 1 | (1 + 3) = 4 | SQRT(2^2 + 4^2) = SQRT(20) |
NTA 2 | 5 | 6 |
NTA 3 | 7 | 8 |
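The three steps above can be sketched with the example data. This is an illustration under stated assumptions, not the project's code: it uses plain Python dictionaries instead of DataFrames so that it runs without dependencies. The aggregate function and the variable names are hypothetical. Only the name agg_moe and its square-root-of-sum-of-squares rule come from the text.

```python
import math
from collections import defaultdict

# Example tract-level data and tract-to-NTA lookup from the tables above.
tract_rows = [
    {"geoid": "tract 1", "estimate": 1, "moe": 2},
    {"geoid": "tract 2", "estimate": 3, "moe": 4},
    {"geoid": "tract 3", "estimate": 5, "moe": 6},
    {"geoid": "tract 4", "estimate": 7, "moe": 8},
]
tract_nta_lookup = {
    "tract 1": "NTA 1",
    "tract 2": "NTA 1",
    "tract 3": "NTA 2",
    "tract 4": "NTA 3",
}


def agg_moe(moes):
    """Combine MOEs as the square root of the sum of their squares."""
    return math.sqrt(sum(m ** 2 for m in moes))


def aggregate(rows, lookup):
    """Hypothetical helper: join, group, aggregate, and rename the geoid."""
    # Step 1: attach the aggregate geoid to each tract row (the join).
    joined = [{**row, "nta_geoid": lookup[row["geoid"]]} for row in rows]

    # Step 2: group rows by the aggregate geoid.
    groups = defaultdict(list)
    for row in joined:
        groups[row["nta_geoid"]].append(row)

    # Steps 2-3: sum estimates, combine MOEs, and emit a standardized
    # census_geoid key in place of nta_geoid.
    return [
        {
            "census_geoid": geoid,
            "estimate": sum(r["estimate"] for r in members),
            "moe": agg_moe(r["moe"] for r in members),
        }
        for geoid, members in groups.items()
    ]


output = aggregate(tract_rows, tract_nta_lookup)
```

For NTA 1 this gives an estimate of 4 and an MOE of SQRT(20), about 4.47. NTA 2 and NTA 3 each contain a single tract, so their values pass through unchanged.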
Special case: Converting 2010 tracts to 2020 aggregate geographies
When converting 2010 input geographies to 2020 outputs, the methods described in the previous section contain an additional step. Before step one, in which tract-level data are joined with lookup_geo, 2010 tracts are converted to 2020 tracts using the method ct2010_to_ct2020. The remaining steps proceed as described above, using the resulting 2020 tract-level data as the input to further vertical aggregation.
For more information about tract-to-tract conversion, see the 2010 to 2020 geography conversion documentation page.