Percent Estimates and Percent MOEs - NYCPlanning/db-factfinder GitHub Wiki

Calculating Percent Variables

The Calculate class method calculate_c_e_m_p_z first calculates the estimate and margin of error of a variable, then calculates percent and percent MOE as appropriate.

Calculating pff_variable and base_variable estimate and MOEs

Calculating percent estimates and MOEs requires the estimate and MOE of the numerator (the PFF variable of interest), as well as the estimate and MOE of the denominator (the variable representing the population of which the PFF variable is a subset). For example, to calculate the estimated percent of workers 16+ who commute by walking, the numerator is the estimated count of workers 16+ who commute by walking, while the denominator is the estimated count of workers 16+. In our metadata and code, the denominator variables are referred to as "base variables". Base variables for each PFF variable are stored in the metadata, and are accessible as the base_variable property of an instance of the Variable class.

In the most basic case, calculate_e_m is called first to calculate the estimate and MOE of the pff_variable. Then calculate_e_m is called a second time to calculate the estimate and MOE of the associated base variable.

As with other pff_variables, a base variable can be a special variable or a median. If the base variable is a special variable, calculate_e_m_special gets called in place of calculate_e_m. Similarly, if the base variable is a median, calculate_e_m_median gets called in place of calculate_e_m. For more information on exceptions when calculating estimates and MOEs, see here.

Combining pff_variable and base_variable results to calculate p and z

Once estimates and MOEs are calculated for both the pff_variable and its base_variable, the two resulting DataFrames get merged, with e and m of the base variable renamed as e_agg and m_agg.
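The merge step described above can be sketched with pandas. This is only an illustration: the join column name geoid, the helper name merge_with_base, and the sample numbers are assumptions made for the example, not taken from the db-factfinder code.

```python
import pandas as pd


def merge_with_base(df_var: pd.DataFrame, df_base: pd.DataFrame, on: str = "geoid") -> pd.DataFrame:
    """Attach the base variable's e and m to the pff_variable rows as e_agg and m_agg."""
    # Keep only the join key plus e and m of the base variable, then rename them
    base = df_base[[on, "e", "m"]].rename(columns={"e": "e_agg", "m": "m_agg"})
    # Left join so every pff_variable row is kept even if the base row is missing
    return df_var.merge(base, on=on, how="left")


# Made-up sample data for two geographies
df_var = pd.DataFrame({"geoid": ["A", "B"], "e": [50.0, 30.0], "m": [10.0, 8.0]})
df_base = pd.DataFrame({"geoid": ["A", "B"], "e": [200.0, 120.0], "m": [20.0, 15.0]})

merged = merge_with_base(df_var, df_base)
print(merged)
```

A left join is one reasonable choice here because it leaves e_agg and m_agg empty, rather than dropping the row, when a geography has no base-variable result.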

Once merged, the percent estimate is calculated using get_p:

Percent estimate = (Estimate of PFF variable) / (Estimate of Base Variable) * 100 if (Estimate of Base Variable) is not NULL
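As a rough sketch of the percent estimate step, the function below mirrors the rule stated above. It is not the actual get_p implementation: the signature, the use of None for NULL, the 0 to 100 scale, and the handling of a zero base estimate are assumptions made for illustration.

```python
from typing import Optional


def get_p(e: Optional[float], e_agg: Optional[float]) -> Optional[float]:
    """Percent estimate on a 0-100 scale, or None (NULL) when it cannot be computed."""
    # NULL numerator or NULL base estimate: no percent can be derived
    if e is None or e_agg is None:
        return None
    # Assumption: a zero base estimate also yields NULL, to avoid dividing by zero
    if e_agg == 0:
        return None
    return e / e_agg * 100


print(get_p(50, 200))    # 50 of 200 is 25 percent
print(get_p(50, None))   # base estimate missing
```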

The percent MOE is calculated using get_z, which is based on the methodology outlined in the Census Bureau's guidance on calculating MOEs of derived estimates.

If the PFF percent estimate is 0 or 100, percent MOE is NULL
If the base variable estimate is 0, percent MOE is NULL
Otherwise,
    Percent MOE = (Square root of 
                      (Squared PFF MOE - Squared (PFF estimate * Base MOE / Base Estimate))
                  ) / Base Estimate * 100

In cases where the value under the square root is negative, the ratio MOE formula is used instead:
    Percent MOE = (Square root of 
                      (Squared PFF MOE + Squared (PFF estimate * Base MOE / Base Estimate))
                  ) / Base Estimate * 100
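The rules above can be sketched in a few lines of Python. This is a hedged illustration rather than the actual get_z code: the argument names follow the e, m, e_agg, m_agg columns described earlier, while the signature and the use of None for NULL are assumptions.

```python
import math
from typing import Optional


def get_z(
    e: Optional[float],
    m: Optional[float],
    p: Optional[float],
    e_agg: Optional[float],
    m_agg: Optional[float],
) -> Optional[float]:
    """Percent MOE following the proportion formula, with the ratio formula as fallback."""
    # Percent estimate of 0 or 100 (or missing): percent MOE is NULL
    if p is None or p == 0 or p == 100:
        return None
    # Missing inputs or a zero base estimate: percent MOE is NULL
    if e is None or m is None or e_agg is None or m_agg is None or e_agg == 0:
        return None
    scaled_base_moe_sq = (e * m_agg / e_agg) ** 2
    under_root = m**2 - scaled_base_moe_sq       # proportion formula
    if under_root < 0:
        under_root = m**2 + scaled_base_moe_sq   # ratio formula
    return math.sqrt(under_root) / e_agg * 100


# Proportion case: 10^2 - (50 * 20 / 200)^2 = 75, which is non-negative
z_proportion = get_z(e=50, m=10, p=25, e_agg=200, m_agg=20)
# Ratio case: 2^2 - 25 is negative, so 2^2 + 25 = 29 is used instead
z_ratio = get_z(e=50, m=2, p=25, e_agg=200, m_agg=20)
print(z_proportion, z_ratio)
```

The two calls differ only in the PFF MOE, which is enough to move the value under the square root from positive to negative and trigger the fallback.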

Exceptions to calculating base estimate and MOE

Profile-only variables: There are several variables where percent and percent MOE are available directly from the Census API. The Census Bureau variable documentation and table column headers indicate these with suffixes "PE" and "PM". If available, we do not calculate the base estimate and MOE, and instead pull estimate, MOE, percent estimate, and percent MOE from the API directly using calculate_e_m_p_z, as called by calculate_c_e_m_p_z. This is only possible for census geography types (i.e. non-aggregated geography types), and for variables from the DP-prefixed profile tables. The variables that have percent and percent MOEs available directly from the Census API are listed in the profile_only_variables property of the Metadata class. There are 10 DP-prefixed profile variables that do not have percent estimates and MOEs available, which are listed in the profile_only_exceptions property of the Metadata class. For aggregated geography types, the percent and percent MOE are calculated using a base variable as described above.

Poverty variables: There are three poverty-related variables where the percent and percent MOE are available directly from the Census API (after 2010): "pbwpv", "pu18bwpv", and "p65plbwpv". Unlike the profile-only variables above, the percent and percent MOE are not indicated with suffixes "PE" and "PM", but are instead stored as estimates ("E" suffix) and MOEs ("M" suffix) of a separate variable. These separate percent variables are in the PFF metadata as {pff_variable}_pct. The function calculate_poverty_p_z calls calculate_e_m on {pff_variable}_pct, then renames e and m of the results as p and z respectively. The function calculate_poverty_p_z is called by calculate_c_e_m_p_z in place of calling calculate_e_m for the base variable. For aggregated geography types, the percent and percent MOE are calculated using a base variable as described above.
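The renaming step for the poverty variables can be illustrated with pandas. The helper name, the geoid column, and the sample values below are hypothetical; only the e to p and m to z renaming comes from the description above.

```python
import pandas as pd


def rename_pct_results(df_pct: pd.DataFrame) -> pd.DataFrame:
    """Treat the e and m of a {pff_variable}_pct result as the p and z of the pff_variable."""
    return df_pct.rename(columns={"e": "p", "m": "z"})


# Made-up result, standing in for the output of calculate_e_m on a _pct variable
df_pct = pd.DataFrame({"geoid": ["A"], "e": [27.5], "m": [1.5]})
poverty_p_z = rename_pct_results(df_pct)
print(poverty_p_z)
```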

Variables without p and z

Several variables do not have p or z values. This is indicated in the metadata where base_variable is "nan". These include variables that are means, rent/cost values or burdens, and variables that already represent a percent of a population. When calculate_c_e_m_p_z is called for these variables, p and z are set to NULL.

For base variables, p is set to 100 if the geography type is either a city or borough. Otherwise, p is NULL. In both cases, z is NULL.
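The rule for base variables is small enough to sketch directly. The function name and the geography type strings "city" and "borough" are assumptions for illustration; the actual identifiers used in the code may differ.

```python
from typing import Optional, Tuple


def base_variable_p_z(geotype: str) -> Tuple[Optional[int], None]:
    """Return (p, z) for a base variable: p is 100 for city or borough, else NULL; z is always NULL."""
    # Assumed geography type labels; adjust to whatever the metadata actually uses
    p = 100 if geotype in {"city", "borough"} else None
    return p, None


print(base_variable_p_z("borough"))
print(base_variable_p_z("tract"))
```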