postgres explain performance query auto_explain - ghdrako/doc_snipets GitHub Wiki

The first row in the plan is always a node, and the other node rows are indented to the right and have an arrow as a prefix (->). The other lines in the plan provide information about the node they are under; therefore, in the preceding example, the Sort Key row is additional information to the Sort node.

Another approach to distinguish node rows from additional information is to consider that every node line has the cost, rows, and width attributes in parentheses.

Once you have discovered the nodes of the query, you have to find out the very first node, which is usually the most indented one, and also the one with the lowest startup cost;

EXPLAIN ( FORMAT JSON ) SELECT * FROM categories;

outputs in TEXT, XML, JSON, and YAML

JSON format provides not only a different structure to the query plan but also a different and more rich set of information.

EXPLAIN ANALYZE

The ANALYZE mode of EXPLAIN enhances the command by effectively running the query to explain. It prints out the best plan to execute the query and it runs the query, also reporting back some statistical information.

While EXPLAIN can only estimate the costs of a node, the EXPLAIN ANALYZE provides feedback on the execution time (expressed in milliseconds), the effective number of rows, and how many times a node has been executed (loops).

EXPLAIN ANALYZE SELECT * FROM posts;

EXPLAIN ANALYZE can also be invoked by passing ANALYZE as an option to EXPLAIN, as follows:

EXPLAIN ( ANALYZE ) SELECT * FROM categories ORDER BY title DESC;

The option form of EXPLAIN ANALYZE is handy when you want to add other options to EXPLAIN, as shown in the following subsection.

Explain opttions

  • VERBOSE option
EXPLAIN (VERBOSE on) SELECT * FROM categories;
  • SETTING option
SET work_mem TO '32MB';
EXPLAIN (SETTINGS on) SELECT * FROM posts ORDER BY created_on DESC;
  • BUFFERS option
EXPLAIN (ANALYZE, BUFFERS on) SELECT * FROM posts;
  • WAL option
EXPLAIN (ANALYZE on, WAL on, FORMAT yaml) INSERT INTO... 
EXPLAIN (analyze, verbose, costs, timing, buffers)
  • analyze true will actually execute the query
  • verbose true will add some more information to the plan (such as column information). costs true will show information about costs.
  • timing true is equally important, as it will provide us with good runtime data so that we can see where in the plan time gets lost.
  • buffers true, which can be very enlightening.

auto_explain

auto-explain triggers when a running query is slower than a specif i ed thresh-old, and then dumps the execution plan of the query in the PostgreSQL logs.

The auto-explain module is configured via a set of auto_explain parameter options that can be inserted in the PostgreSQL configuration (the postgresql.conf file), but you need to remember that in order to activate the module, you need to restart the cluster.

All the settings for auto-explain are named in the namespace auto_explain, therefore any pa-rameter has a prefix that starts with auto_explain.; the main settings are the following ones:

  • auto_explain.log_min_duration is the threshold of time a statement must take before it is logged. Any statement requiring more time than this setting will appear in the cluster logs with its EXPLAIN output.
  • auto_explain.log_format and auto_explain.log_level control the format of the out-put, in terms of text, JSON, YAML, XML and the level at which such output will be logged (e.g., INFO).
  • auto_explain.log_verbose, if turned on, provides more verbose information in the out-put.
  • auto_explain.sample_rate is a value between 0 and 1 indicating the sampling rate of a session. For example, 0.5 means that one statement out of two will be logged.
  • auto_explain.log_nested_statements is a boolean value that determines whether “in-ner” statements have to be logged on their own. For example, if this option is turned on when logging a function call, statements that also happen inside the function will be logged and explained.
  • auto_explain.log_analyze is a boolean value that indicates whether the logged state-ment must also report EXPLAIN ANALYZE values, mainly the actual timing. Be aware that taking per-node timing information can be resource-demanding and thus can slow down the whole query. When this parameter is turned on, other settings can be turned on to provide the same information that EXPLAIN ANALYZE does:
  • auto_explain.log_buffers, when turned on, provides information about the buffer’s utilization.
  • auto_explain.log_wal, when turned on, provides information about the WAL produced by a query.
  • auto_explain.log_timing, when turned on, provides per-node timing information.
  • auto_explain.log_triggers, when turned on, provides information about trigger exe-cutions within a statement.
  • auto_explain.log_settings, when turned on, reports settings different from the clus-ter-wide conf i guration.

In order to install and conf i gure the module, let’s start simple and add the following two settings to the cluster conf i guration in postgresql.conf:

session_preload_libraries = 'auto_explain' 
auto_explain.log_min_duration = '100ms' 

The first line tells PostgreSQL to load the library related to the auto-explain module, while the second instruments the module to trigger whenever a query takes longer than 100 milliseconds to conclude. Of course, you can raise the query duration or lower it, depending on your needs.

test=# LOAD 'auto_explain';
test=# SET auto_explain.log_analyze TO on;
test=# SET auto_explain.log_min_duration TO 500;
session_preload_libraries = 'auto_explain';