update_uprn_addresses - zebrainvest/postgres GitHub Wiki

update_listing_scrapper_data Stored Procedure

Requirement

The UPRN (uprn) value in table certificates_depc is often NULL or NaN, however it is common for the same property (postcode, address) to have multiple records in this table, some of which with a UPRN value. Updating this table's UPRN values when they're NULL/NaN is very costly as this is such a large table and is used very frequently for various processes, and locking it for updates is problematic. A solution is to use a separate table to keep postcode + address + uprn values, and get the UPRN from there instead, by matching postcode + address values from both tables.

Implementation

This is a stored procedure in the production propertytoolkit database.

Adds new addresses with UPRN to the table uprn_addresses (UA), sourced from certificates_depc (DEPC), wherever the UPRN in DEPC is new to UA.

The stored procedure only adds new records, ignoring any already existing records with the same UPRN value.

Notes:

  • This table uses collations for postcode (ignore_punct_case) to ensure that spaces/symbols and case are ignored when comparing to other strings, and another collation in address (num_ignore_punct_case) to also ignores spaces/symbols and case, but treating numbers as a whole in such a way that '1/23' compared to '123' is different, but compared to '01.23' or ' 1 23' is the same.
  • A system cron job (root user's crontab) runs this stored procedure at half past midnight.