20200507 DB Q&A Notes - hackforla/311-data GitHub Wiki
Attendees:
- Joel Henderson
- John Ritchey
- Adam Kendis
- Jake Mensch
- John Darragh
- Cynthia Kiser
- Tim Eccleston
Resources:
Meeting Video Recording
PG Performance Test
Postgres SQL Range Types
Intro to materialized views:
PG Pass
Allows for Env File auto look at startup
Purpose(s) of the meeting:
- Background: 3 teams were looking for help with their postgres DB. We reached out to CFA and got Tim and Joel (with H4LA) to advise our devs.
- 311 - Needs help improving DB performance to have much faster response times for user requests prior to beta release in time for meeting with the mayor. We only write to the database once per day. The user only reads from the database. 32 fields. 5.2M rows. Only use 9 of those fields. User Submits filters. When broad enough, can take 10-15 seconds for a typical query. Redis running. Looking for strategies to improve read times.
- Food Oasis - Needs some help with postgres. John has a lot of experience with DB Administration
- Civic Tech Index - Needs to setup a database for the project.
Notes:
Make DB Smaller / (Slower for quick response user response times)
- Normalize our 1 table
- Remove unused cols
311-Speed up Solutions
Postgres is a great transactional DB. For Speed, do more with Redis or similarly designed solution. Build things as ready to serve as possible:
Pre-Join everything together
Have redundant data and pull out one doc at a time
Pagination
-
1st Filter Set needs to hit Postgres
-
Faster Open Source Alternatives to Postgres: Redis / MongoDB.
-
Date Range Queries are very slow. Make a clustered index that starts with that date field.
-
Make dates a day or week number as an index so searching is faster. (strings, vchar etc. are very slow. Needs to be enum. Will be 10x faster)
-
Have pages for most common queries
Food Oasis Q & A
PG Performance Test
Joel, using a typical MacBook, typical PG 12, and I generated a demo table and data. A few findings are below.
- No meaningful speed difference between using categories as text vs. categories as enum. I’m presuming that the categories are generally short words. Adding an index gave 4x speed.
- The PG time comparison functions are even slower than I thought. Adding an index to the timestamp column gave a 30x improvement.
- Pre-calculating a column “recent_week” and its index, as we discussed, was 10x /worse/ than just using the timestamp field and its index. EXPLAIN shows PG ignores the recent_week index, instead doing a full scan.