Vertical aggregation: small to large geographies - NYCPlanning/db-factfinder GitHub Wiki

Aggregating small areas to larger areas

If the requested geography type is not a Census geography, tract- or block group-level PFF data are aggregated to calculate e, m, p, and z for the larger geography. For example, rows containing tract-level estimates and MOEs for a given PFF variable are combined to produce NTA-level estimates and MOEs.

In general, aggregate geography types include:

NTA
CDTA (post 2020)
Portion of Community Districts within 100-year floodplain
Portion of Community Districts within 500-year floodplain
Portion of Community Districts within walking distance of a park

Note: when converting data published in 2010 geographies to 2020 geographies, 2020 census tracts function as an aggregate geography type.

Geographic relationships

Relationships between geographic areas are maintained in the directory data/lookup_geo. Lookups are specific to the latest decennial census year, since tract boundaries change each decade. When the Calculate class is initialized, the specified geography year determines which spatial lookup is referenced.

`AggregateGeography` Class

Each decennial census year has its own version of the AggregateGeography class. These classes are defined in year-specific Python files in the geography directory.

Class properties and methods

lookup_geo: The AggregateGeography class (for both years) contains a property lookup_geo. This property is a DataFrame with parsed columns from the geographic lookups in the data directory.

options: This property contains a lookup between Census geography types, aggregate geography types they can be combined into, and the function necessary for converting raw geography types to aggregate geography types. For example, one record in the 2010 lookup is:

"tract": {"NTA": self.tract_to_nta, "cd": self.tract_to_cd}

Both NTA- and CD-level data are built from tract-level raw data. The function for aggregating tract-level data into NTA-level data is tract_to_nta, and the function for aggregating tract-level data into CD-level data is tract_to_cd.
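As a rough illustration of how a lookup like options can be used for dispatch, here is a minimal sketch. The class AggregateGeographySketch and the bodies of its methods are hypothetical placeholders, not the package's implementation; only the shape of the options record comes from the example above.

```python
class AggregateGeographySketch:
    """Hypothetical stand-in for a year-specific AggregateGeography class."""

    def tract_to_nta(self, data):
        # Placeholder body: the real method joins to lookup_geo and aggregates.
        return f"NTA-level aggregation of {data}"

    def tract_to_cd(self, data):
        # Placeholder body, as above.
        return f"CD-level aggregation of {data}"

    @property
    def options(self):
        # Raw Census geography type -> {aggregate type: converter method}
        return {"tract": {"NTA": self.tract_to_nta, "cd": self.tract_to_cd}}


geography = AggregateGeographySketch()

# Look up the converter for a (raw type, aggregate type) pair, then call it.
converter = geography.options["tract"]["NTA"]
result = converter("tract-level data")
```

Storing bound methods in a nested dictionary keeps the choice of converter a plain lookup rather than a chain of conditionals.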

aggregated_geography: This property is a list of all aggregated geography types for a given year, e.g. ["nta", "cd", "cd_fp_500", "cd_fp_100", "cd_park_access"].

format_geoid and format_geotype: These methods convert FIPS census geoids and types into the format displayed in Planning Labs' application, as implemented in labs_geotype. See the final output cleaning documentation page for more info.

Functions to convert smaller to larger geography types

The majority of methods in the AggregateGeography class aggregate tract- or block group-level data into larger geographies. While the methods are specific to the geography type, they follow a similar structure. Consider the example:

Tract-level data

geoid	Estimate	MOE
tract 1	1	2
tract 2	3	4
tract 3	5	6
tract 4	7	8

Geo-lookup

tract_geoid	nta_geoid	...
tract 1	NTA 1	...
tract 2	NTA 1	...
tract 3	NTA 2	...
tract 4	NTA 3	...

Join the tract- or block group-level data with lookup_geo (which defines how small geographies nest within larger ones) on geoid_tract or geoid_block_group. With the example data above, this produces:

geoid	Estimate	MOE	nta_geoid
tract 1	1	2	NTA 1
tract 2	3	4	NTA 1
tract 3	5	6	NTA 2
tract 4	7	8	NTA 3

Call create_output to group the joined rows by the aggregate geography's geoid. Within each group, estimates are summed and MOEs are combined as the square root of the sum of squares, as defined in agg_moe.

nta_geoid	Estimate	MOE
NTA 1	(1 + 3) = 4	SQRT(2^2 + 4^2) = SQRT(20)
NTA 2	5	6
NTA 3	7	8

Rename the geoid column to census_geoid to standardize the output.

census_geoid	Estimate	MOE
NTA 1	(1 + 3) = 4	SQRT(2^2 + 4^2) = SQRT(20)
NTA 2	5	6
NTA 3	7	8
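The join, group, and rename steps can be sketched in plain Python using the example data. This is an illustration under stated assumptions, not the package's code: the package works on DataFrames, and the names tract_to_nta_sketch and agg_moe_sketch are hypothetical stand-ins for the methods described above.

```python
import math
from collections import defaultdict

# Example tract-level rows and tract -> NTA lookup from the tables above.
tract_rows = [
    {"geoid": "tract 1", "Estimate": 1, "MOE": 2},
    {"geoid": "tract 2", "Estimate": 3, "MOE": 4},
    {"geoid": "tract 3", "Estimate": 5, "MOE": 6},
    {"geoid": "tract 4", "Estimate": 7, "MOE": 8},
]
lookup_geo = {
    "tract 1": "NTA 1",
    "tract 2": "NTA 1",
    "tract 3": "NTA 2",
    "tract 4": "NTA 3",
}


def agg_moe_sketch(moes):
    """Square root of the sum of squared MOEs (hypothetical helper)."""
    return math.sqrt(sum(m ** 2 for m in moes))


def tract_to_nta_sketch(rows, lookup):
    # Join: attach the NTA geoid to each tract row.
    joined = [{**row, "nta_geoid": lookup[row["geoid"]]} for row in rows]

    # Group the joined rows by the aggregate geography's geoid.
    groups = defaultdict(list)
    for row in joined:
        groups[row["nta_geoid"]].append(row)

    # Sum estimates, aggregate MOEs, and emit the geoid as census_geoid.
    return [
        {
            "census_geoid": nta,
            "Estimate": sum(r["Estimate"] for r in members),
            "MOE": agg_moe_sketch(r["MOE"] for r in members),
        }
        for nta, members in groups.items()
    ]


output = tract_to_nta_sketch(tract_rows, lookup_geo)
```

For NTA 1 this gives an estimate of 1 + 3 = 4 and an MOE of SQRT(20), roughly 4.47. NTA 2 and NTA 3 each contain a single tract, so their estimates and MOEs pass through unchanged.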

Special case: Converting 2010 tracts to 2020 aggregate geographies

When converting 2010 input geographies to 2020 outputs, the methods described in the previous section include an additional step. Before the first step, in which tract-level data are joined with lookup_geo, 2010 tracts are converted to 2020 tracts using the method ct2010_to_ct2020. The remaining steps proceed as described above, with the converted 2020 tract-level data as the input to vertical aggregation.

For more information about tract-to-tract conversion, see the 2010 to 2020 geography conversion documentation page.