Downloading data from the API - NYCPlanning/db-factfinder GitHub Wiki

Downloading Census Data

Overview

Input data for all population factfinder calculations comes from the US Census Bureau's API, accessed through the census Python wrapper package.

Download Class

The Download class accesses input data for all population factfinder calculations by formatting a geoquery and calling the appropriate US Census Bureau API endpoint. When initialized, this class contains the following properties, all necessary for selecting endpoints and creating queries:

  • The census API access key, contained in a .env file
  • The year of data to access. For 5-year ACS data, this is the final year of the sample: data from the 2015-2019 rolling sample corresponds to year = 2019.
  • The source type (i.e. acs, decennial)
  • Necessary state and county FIPS codes, set by default to the five NYC counties within NY state
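As a rough illustration of the properties listed above, the sketch below groups them in a small dataclass. It is not the actual Download class: the name DownloadConfig, the field names, and the API_KEY environment variable name are all hypothetical, and the key is read from the process environment on the assumption that the .env file has already been loaded.

```python
import os
from dataclasses import dataclass, field

# FIPS codes: New York State is 36. The five NYC counties are
# Bronx (005), Kings (047), New York (061), Queens (081), Richmond (085).
NY_STATE_FIPS = "36"
NYC_COUNTY_FIPS = ["005", "047", "061", "081", "085"]


@dataclass
class DownloadConfig:
    """Hypothetical stand-in for the properties held by Download."""

    api_key: str
    year: int = 2019          # final year of the 2015-2019 5-year sample
    source: str = "acs"       # "acs" or "decennial"
    state: str = NY_STATE_FIPS
    counties: list = field(default_factory=lambda: list(NYC_COUNTY_FIPS))

    def __post_init__(self):
        # Reject source types this sketch does not know about.
        if self.source not in ("acs", "decennial"):
            raise ValueError(f"unknown source: {self.source!r}")


# Assumption: the key is exposed as an environment variable named API_KEY.
config = DownloadConfig(api_key=os.environ.get("API_KEY", ""))
```

Defaulting the state and county codes keeps the common NYC case short while still allowing other geographies to be passed in.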

The geoqueries method uses the state and county FIPS codes to generate a query for the requested spatial unit. For example, calling geoqueries('tract') returns the query expected by the US Census Bureau API (via the census Python wrapper) to download all tracts within the five NYC counties.
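To make the idea concrete, here is a minimal sketch of how such a query could be assembled from FIPS codes. The exact return shape of geoqueries is not shown on this page, so this is an assumption: the hypothetical build_geoqueries function below returns one dictionary per county, using the for/in predicate form that the Census API accepts.

```python
NY_STATE_FIPS = "36"
NYC_COUNTY_FIPS = ["005", "047", "061", "081", "085"]


def build_geoqueries(geotype, state=NY_STATE_FIPS, counties=NYC_COUNTY_FIPS):
    """Hypothetical helper: one Census-style geography query per county."""
    if geotype == "tract":
        # All tracts within each county of the given state.
        return [
            {"for": "tract:*", "in": f"state:{state} county:{county}"}
            for county in counties
        ]
    if geotype == "county":
        # The counties themselves, within the given state.
        return [
            {"for": f"county:{county}", "in": f"state:{state}"}
            for county in counties
        ]
    raise ValueError(f"unsupported geotype: {geotype!r}")


tract_queries = build_geoqueries("tract")
```

Issuing one query per county keeps each request small, at the cost of five API calls instead of one.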

The download_variable method then calls either download_e_m or download_e_m_p_z. These methods set the census client based on the specified source, identify the census variable codes associated with the pff_variable name using Metadata, identify the appropriate geoquery for the requested geotype, then call client.get and store the results in a pandas dataframe. After download, types are enforced (set to float64), outliers are replaced with NULLs, and MOEs for zero estimates are set to zero.
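The post-download cleanup described above can be sketched with pandas as follows. This is an illustration, not the package's code: the function name clean_estimates and the column names e and m are hypothetical, and treating the Census Bureau's large negative placeholder values as the "outliers" is an assumption.

```python
import pandas as pd

# Assumption: "outliers" means the large negative placeholder values the
# Census Bureau uses in place of missing or suppressed figures.
SENTINELS = [
    -999999999, -888888888, -666666666,
    -555555555, -333333333, -222222222,
]


def clean_estimates(df, e_col="e", m_col="m"):
    """Hypothetical cleanup: enforce floats, null outliers, zero MOEs."""
    out = df.copy()
    for col in (e_col, m_col):
        # Enforce float64; values that cannot be parsed become NaN.
        out[col] = pd.to_numeric(out[col], errors="coerce").astype("float64")
        # Replace placeholder values with NaN (NULL).
        out[col] = out[col].mask(out[col].isin(SENTINELS))
    # A zero estimate gets a zero margin of error.
    out.loc[out[e_col] == 0, m_col] = 0.0
    return out


raw = pd.DataFrame(
    {
        "geoid": ["36005000100", "36005000200", "36005000400"],
        "e": ["120", "0", "-666666666"],
        "m": ["15", "-555555555", "-222222222"],
    }
)
clean = clean_estimates(raw)
```

In this example the second row ends up with a margin of error of zero because its estimate is zero, while the third row's estimate and margin both become NULL.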

To improve performance, the Download class writes the results of each call to a cache (via utils.write_to_cache). Before re-downloading, the Download class checks the cache for previously stored results.
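The check-then-download pattern can be sketched as below. The signature of utils.write_to_cache is not documented on this page, so this example does not use it: the helper names, the pickle file format, and the file naming scheme are all hypothetical.

```python
import pickle
import tempfile
from pathlib import Path


def cache_path(cache_dir, year, source, variable, geotype):
    """Hypothetical naming scheme: one file per unique download request."""
    return Path(cache_dir) / f"{year}_{source}_{variable}_{geotype}.pkl"


def get_or_download(cache_dir, year, source, variable, geotype, download_fn):
    """Return cached results if present, otherwise download and cache them."""
    path = cache_path(cache_dir, year, source, variable, geotype)
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)
    result = download_fn()
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(result, f)
    return result


calls = []


def fake_download():
    # Stands in for a real API call; records how often it is invoked.
    calls.append(1)
    return {"36005000100": 120.0}


with tempfile.TemporaryDirectory() as tmp:
    first = get_or_download(tmp, 2019, "acs", "pop", "tract", fake_download)
    second = get_or_download(tmp, 2019, "acs", "pop", "tract", fake_download)
```

The second call is served from the cache file, so the stand-in download function runs only once.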