Database investigation tools - CDCgov/prime-simplereport GitHub Wiki

Query Performance Investigations

That is a bummer, but it'll be ok! You'll get through this. We've set up a few things to help you get some data about your queries to act on.

Please read the entirety of this doc and any comments in the linked file.

After reading, head on over to this file in your IDE to make the change needed for an investigation. Turn on the settings by following the instructions in each configuration block.

log_min_duration_statement

Postgresql log_min_duration_statement docs.

This is a straightforward logging option used for logging slow queries. It can be used to determine which queries are taking a long time; that may be enough information to work from.

To view these logs, head on over to the azure portal, visit the Postgresql Flexible Server service, select the database you want to view the logs of and select the Logs blade. You can use the following query to filter the records. Feel free to add a timeframe and additional filters to fit your needs.

AzureDiagnostics
| where ResourceProvider =="MICROSOFT.DBFORPOSTGRESQL" 
| where Category == "PostgreSQLLogs"
| where Message contains "duration:"

Once you've found a troublesome query, you could try running EXPLAIN ANALYZE against that query to get even more information.

pg_stat_statements

PostgreSQL pg_stat_statements docs.

pg_stat_statements provides a means for tracking execution statistics of all SQL statements executed by a server. We can uncover which queries may be hogging memory, CPU, and I/O using this. We can also use it to find efficient queries that run often, adding load.

To get everything out of pg_stat_statements, you may need to enable other configuration options to get all the data you want. An example of this is track_io_timing. This needs to be enabled to get accurate read/write load data. PostgreSQL track_io_timing docs.

Query the pg_stat_statements table

Here are some potentially helpful queries. (if you write or run a query that you've found to be helpful, please add it to the list!)

Top 10 I/O intensive queries:

SELECT userid::regrole, dbid, query
FROM pg_stat_statements
ORDER BY (blk_read_time + blk_write_time) / calls DESC
LIMIT 10;

Top 10 time-consuming queries:

SELECT userid::regrole, dbid, mean_exec_time, query
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;

Top 10 response-time outliers:

SELECT userid::regrole, dbid, stddev_exec_time, query
FROM pg_stat_statements
ORDER BY stddev_exec_time DESC
LIMIT 10;

Top 10 queries by memory usage:

SELECT userid::regrole, dbid, query
FROM pg_stat_statements
ORDER BY (shared_blks_hit + shared_blks_dirtied) DESC
LIMIT 10;

auto_explain

PostgreSQL auto_explain docs.

auto_explain allows you to log the analysis of queries that take longer than the threshold set. The power here is that it lets you see what a query does. If you suspect that a query plan may be unstable, this is a great option to explore.

To view these logs, head on over to the azure portal, visit the Postgresql Flexible Server service, select the database you want to view the logs of and select the Logs blade. You can use the following query to filter the records. Feel free to add a timeframe and additional filters to fit your needs.

AzureDiagnostics
| where ResourceProvider =="MICROSOFT.DBFORPOSTGRESQL" 
| where Category == "PostgreSQLLogs"
| where Message contains "plan:"

Hint: you can export your findings for easier viewing

To do so, you can run the psql connection command with a -c containing the query you want to run, then pipe that into a file for viewing or export.

psql "FILL_IN_DB_CONNECTION_DETAILS" -c "select userid::regrole, dbid, query from pg_stat_statements order by (blk_read_time+blk_write_time)/calls desc limit 10;" > /export/io_intensive_queries.txt