HUD Crosswalk Mappings - stlrda/211Dashboard-Workflows GitHub Wiki

Understanding the Data

The U.S. Department of Housing and Urban Development (HUD) supplies the Crosswalk files for mapping zip codes to census geographies, and vice versa. The summary below comes directly from the HUD website and provides valuable insight into the nature and purpose of these data files. For additional information, please visit the HUD crosswalk site.

One of the many challenges that social science researchers and practitioners face is the difficulty of relating United States Postal Service (USPS) ZIP codes to Census Bureau geographies. There are valuable data available only at the ZIP code level that, when combined with demographic data tabulated at various Census geography levels, could open up new avenues of exploration.

While some acceptable methods of combining ZIP codes and Census geography exist, they have limitations. To provide additional avenues for merging these data, PD&R has released the HUD-USPS Crosswalk Files. These unique files are derived from data in the quarterly USPS Vacancy Data. They originate directly from the USPS; are updated quarterly, making them highly responsive to changes in ZIP code configurations; and reflect the locations of both business and residential addresses. The latter feature is of particular interest to housing researchers because many of the phenomena that they study are based on housing unit or address. By using an allocation method based on residential addresses rather than by area or by population, analysts can take into account not only the spatial distribution of population, but also the spatial distribution of residences. This enables a slightly more nuanced approach to allocating data between disparate geographies. Please note that the USPS Vacancy Data is constructed from ZIP+4 data that contains records of addresses, it does not contain ZIP+4 data that are associated with ZIP codes that exclusively serve Postal Office Boxes (PO Boxes). As a result, ZIP codes that only serve PO Boxes will not appear in the files.

From its conception, the 211Dashboard project has aimed to provide both county-level and zip-code-level data for its end users. However, because some of our data sources (e.g. Census, unemployment data) aren't provided at the zip-code level, we decided to use the HUD Crosswalk files to map data to zip codes (and the reverse). We understand that this method yields an extremely rough estimate, particularly when mapping county-level data to the zip-code level. That's why we've also included the Census Tract data, so that we can map tracts to zip codes with better precision. Still, it should be emphasized that all data derived from the various HUD mapping techniques is only a rough estimate. Now that you know why we are using these files, let's outline the collection process.

Collecting the Data

Eventually, this data should be collected via API request. However, because we did not realize there was an API until late in the development process, and because project maintainers already collect census data manually, we figured it'd be easiest (for the time being) to make this process manual as well.

Steps

Create a folder named crosswalk.
Navigate to www.huduser.gov/portal/datasets/usps_crosswalk.html.
Select the Data tab.
Under "Select Crosswalk Type" select ZIP_COUNTY, COUNTY_ZIP, and TRACT_ZIP, one at a time.
- Note: You need to download all three file types.
Under "Select Data Year and Quarter" choose the most recent quarter (e.g. 1st Quarter 2020).
Make sure all three .xlsx downloads are placed into the crosswalk folder.
Upload the folder (and its contents) to S3 bucket: aws s3 cp crosswalk s3://uw211dashboard-workbucket/crosswalk --recursive
- Note: Make sure you are in the directory with the crosswalk folder when running the aws s3 command above.
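
Before running the upload, it can help to confirm that all three crosswalk types are actually present in the folder. The following is a minimal Python sketch of such a check, not part of the project's workflow. The helper names missing_crosswalk_types and check_folder are hypothetical, and the sketch assumes each downloaded file name begins with its crosswalk type (for example ZIP_COUNTY_032020.xlsx); adjust the matching if your downloads are named differently.

```python
from pathlib import Path

# The three crosswalk types named in the steps above.
REQUIRED_TYPES = ("ZIP_COUNTY", "COUNTY_ZIP", "TRACT_ZIP")


def missing_crosswalk_types(filenames):
    """Return the required crosswalk types that have no matching .xlsx file.

    Assumption: each file name starts with its crosswalk type,
    e.g. "ZIP_COUNTY_032020.xlsx". This naming is not confirmed by the wiki.
    """
    found = set()
    for name in filenames:
        upper = name.upper()
        if not upper.endswith(".XLSX"):
            continue  # ignore anything that is not an Excel download
        for crosswalk_type in REQUIRED_TYPES:
            if upper.startswith(crosswalk_type):
                found.add(crosswalk_type)
    return [t for t in REQUIRED_TYPES if t not in found]


def check_folder(folder="crosswalk"):
    """List the files in `folder` and report any missing crosswalk types."""
    names = [p.name for p in Path(folder).iterdir() if p.is_file()]
    return missing_crosswalk_types(names)
```

Calling check_folder() from the directory that contains the crosswalk folder returns an empty list when all three types are present, and otherwise lists the types still to be downloaded.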

Mapping the Data

Each of the three HUD mapping tables has one attribute for each geography in the mapping. For example, the TRACT_ZIP table has a tract_cd and a zip_cd attribute. Each table also has a res_ratio column. The res_ratio attribute gives the share of residential addresses in a single instance of the first geographic attribute (e.g. census tract) that fall within the second geographic attribute (e.g. zip code). To derive the estimates for the second geographic attribute, you simply multiply the values associated with the first geographic attribute by the res_ratio attribute. This provides a proportional mapping of data to whatever level you are mapping. More specifically, when mapping TRACT to ZIP, you take a census tract value (like total_population) and multiply it by the res_ratio of TRACT to ZIP. The result is that tract's estimated contribution to the zip code's total_population, based on the census-reported value for the tract; summing these contributions over every tract mapped to the zip code gives the estimate for the zip code as a whole. If that was confusing (which I'm sure it was), see the SQL code in Unemployment Data, which may help in better understanding the mapping method.
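
The multiplication described above can also be sketched in Python. This is an illustrative sketch only, not the project's SQL. The function name map_tract_to_zip, the tuple layout of the crosswalk rows, and all of the numbers are made up for the example. The sketch also adds up the weighted values of every tract that maps to the same zip code, which is an assumption about how the per-tract estimates are combined.

```python
from collections import defaultdict


def map_tract_to_zip(tract_values, crosswalk_rows):
    """Estimate zip-code values from tract values using res_ratio.

    tract_values:   {tract_cd: value}, e.g. total_population per tract.
    crosswalk_rows: iterable of (tract_cd, zip_cd, res_ratio) tuples,
                    mirroring the TRACT_ZIP table (layout is hypothetical).
    """
    zip_estimates = defaultdict(float)
    for tract_cd, zip_cd, res_ratio in crosswalk_rows:
        if tract_cd not in tract_values:
            continue  # no value reported for this tract, so skip the row
        # Weight the tract value by the share of its residential
        # addresses that fall inside this zip code, then accumulate.
        zip_estimates[zip_cd] += tract_values[tract_cd] * res_ratio
    return dict(zip_estimates)


# Made-up illustrative numbers, not real HUD or Census data.
tract_values = {"T1": 1000, "T2": 500}
crosswalk_rows = [
    ("T1", "63101", 0.75),  # 75% of T1's residential addresses are in 63101
    ("T1", "63102", 0.25),  # the remaining 25% are in 63102
    ("T2", "63102", 1.0),   # all of T2 lies in 63102
]
estimates = map_tract_to_zip(tract_values, crosswalk_rows)
```

Here tract T1 is split between two zip codes, so 63101 receives 1000 × 0.75 and 63102 receives 1000 × 0.25 plus all 500 from T2. The same pattern applies to the ZIP_COUNTY and COUNTY_ZIP tables with the corresponding attributes swapped in.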