Downloading data from the API - NYCPlanning/db-factfinder GitHub Wiki
Downloading Census Data
Overview
Input data for all population factfinder calculations come from the US Census Bureau's API, accessed through the census python wrapper package.
Download Class
The Download class accesses input data for all population factfinder calculations by formatting a geoquery and calling the appropriate US Census Bureau API endpoint. When initialized, this class contains the following properties, all necessary for selecting endpoints and creating queries:
- The census API access key, contained in a .env file
- The year of data to access (for 5-year ACS data, this is the final year; for example, 5-year data from the 2015-2019 rolling sample corresponds to year = 2019)
- The source type (i.e. acs, decennial)
- Necessary state and county FIPS codes, set by default to the five NYC counties within NY state
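The properties listed above can be pictured as a small configuration object. The following is only a minimal sketch, not the actual Download class: the DownloadConfig name, its field names, and the API_KEY environment variable name are all assumptions made for illustration, and loading the .env file into the environment is assumed to happen elsewhere. The FIPS codes are the standard ones for New York State (36) and the five NYC counties.

```python
import os
from dataclasses import dataclass

# Standard county FIPS codes: Bronx, Kings, New York, Queens, Richmond.
NYC_COUNTY_FIPS = ("005", "047", "061", "081", "085")


@dataclass
class DownloadConfig:
    """Hypothetical container for the properties described above."""

    api_key: str
    year: int                          # final year of a 5-year ACS sample
    source: str = "acs"                # "acs" or "decennial"
    state: str = "36"                  # New York State FIPS code
    counties: tuple = NYC_COUNTY_FIPS  # defaults to the five NYC counties

    def __post_init__(self):
        if self.source not in ("acs", "decennial"):
            raise ValueError(f"unsupported source: {self.source}")


# The variable name API_KEY is an assumption; a placeholder keeps the sketch runnable.
config = DownloadConfig(
    api_key=os.environ.get("API_KEY", "placeholder-key"),
    year=2019,  # 2015-2019 rolling sample
)
```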
The geoqueries method uses state and county FIPS codes to generate an appropriate query for the requested spatial unit. For example, calling geoqueries('tract') will return the string query expected by the US Census Bureau API (via the census python wrapper) to download all tracts within the five NYC counties.
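To illustrate how FIPS codes can be turned into geography queries, here is a rough sketch. It is not the package's geoqueries method: the build_geoqueries name, its signature, and the choice to return one query per county are assumptions. The query shape follows the 'for' / 'in' geography convention of the Census API, which the census wrapper accepts as a dictionary.

```python
NYC_COUNTY_FIPS = ("005", "047", "061", "081", "085")


def build_geoqueries(geotype, state="36", counties=NYC_COUNTY_FIPS):
    """Hypothetical helper: return one geography query per county."""
    if geotype == "tract":
        # All tracts within a single county of the given state.
        return [
            {"for": "tract:*", "in": f"state:{state} county:{county}"}
            for county in counties
        ]
    if geotype == "county":
        # The county itself, nested within the state.
        return [
            {"for": f"county:{county}", "in": f"state:{state}"}
            for county in counties
        ]
    raise ValueError(f"unsupported geotype: {geotype}")


tract_queries = build_geoqueries("tract")
```

Under these assumptions, each dictionary in tract_queries would be passed as the geography argument of a separate API call, and the per-county results combined afterwards.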
The download_variable method then calls either download_e_m or download_e_m_p_z. These methods set the census client based on the specified source, identify the census variable codes associated with the pff_variable name using Metadata, identify the appropriate geoquery for the requested geotype, then call client.get to store data in a pandas dataframe. Upon download, types are enforced (set to float64), outliers are replaced with NULLs, and MOEs for zero estimates are set to zero.
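The post-download cleaning steps can be sketched with pandas as follows. This is an illustration under stated assumptions rather than the package's own code: the clean_estimates name, the e and m column names, and the list of outlier codes (modelled on the large negative sentinel values the Census Bureau uses to annotate estimates) are assumed here.

```python
import pandas as pd

# Assumed sentinel codes treated as outliers; the package's actual list may differ.
ASSUMED_OUTLIERS = [
    -999999999, -888888888, -666666666,
    -555555555, -333333333, -222222222,
]


def clean_estimates(df, e_col="e", m_col="m"):
    """Hypothetical cleanup of estimate (e) and margin-of-error (m) columns."""
    out = df.copy()
    for col in (e_col, m_col):
        # Enforce a float64 dtype; unparseable values become NaN.
        out[col] = pd.to_numeric(out[col], errors="coerce").astype("float64")
        # Replace sentinel outlier codes with nulls (NaN).
        out[col] = out[col].mask(out[col].isin(ASSUMED_OUTLIERS))
    # Where the estimate is zero, set the MOE to zero.
    out.loc[out[e_col] == 0, m_col] = 0.0
    return out


raw = pd.DataFrame({
    "e": ["10", "0", "-666666666"],
    "m": ["3", "-555555555", "5"],
})
cleaned = clean_estimates(raw)
```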
To improve performance, the Download class writes the results of each call to a cache (via utils.write_to_cache). Before re-downloading, the Download class checks the cache for previously stored results.
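The check-then-download pattern described here can be sketched with the standard library alone. This is not utils.write_to_cache itself: the cached_fetch helper, the JSON file format, and the cache key naming are assumptions chosen to keep the example self-contained.

```python
import json
import tempfile
from pathlib import Path


def cached_fetch(cache_dir, key, fetch):
    """Hypothetical helper: return cached results, downloading only on a miss."""
    path = Path(cache_dir) / f"{key}.json"
    if path.exists():
        # Cache hit: skip the download entirely.
        return json.loads(path.read_text())
    result = fetch()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result))
    return result


calls = []


def fake_fetch():
    # Stand-in for an API call; records each time it is invoked.
    calls.append(1)
    return [{"GEO_ID": "example", "e": 1.0, "m": 0.5}]


with tempfile.TemporaryDirectory() as tmp:
    first = cached_fetch(tmp, "acs_2019_example_tract", fake_fetch)
    second = cached_fetch(tmp, "acs_2019_example_tract", fake_fetch)
```

In this sketch the second call reads from disk, so fake_fetch runs only once.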