postgres execution plan - ghdrako/doc_snipets GitHub Wiki

To build a plan, the optimizer uses transformation rules, heuristics, and cost-based optimization algorithms. A rule converts a plan into an equivalent plan with a better cost. For example, filter and project operations reduce the size of the dataset and should therefore be executed as early as possible; a rule might reorder operations so that filter and project operations run sooner. An optimization algorithm chooses the plan with the lowest cost estimate. However, the number of possible plans (called the plan space) for a query containing several operations is huge, far too large for the algorithm to consider every single possible plan. After all, time spent choosing a plan contributes to the total execution time of the query. Heuristics are used to reduce the number of plans evaluated by the optimizer.
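
A minimal illustration of the cost estimates the planner produces, assuming a hypothetical table t with a single integer column id (the numbers are illustrative and depend on table size, statistics, and cost settings):

test=# EXPLAIN SELECT * FROM t WHERE id < 100;
                      QUERY PLAN
-------------------------------------------------------
 Seq Scan on t  (cost=0.00..170.00 rows=98 width=4)
   Filter: (id < 100)
(2 rows)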

The cost of each execution plan depends on

  • Cost formulas of algorithms used in the plan
  • Statistical data on tables and indexes, including distribution of values
  • System settings (parameters and preferences), such as join_collapse_limit or cpu_index_tuple_cost
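
The cost parameters and the statistics the planner relies on can be inspected directly; the table name orders below is only a placeholder:

test=# SHOW cpu_index_tuple_cost;
test=# SHOW random_page_cost;
test=# SELECT attname, n_distinct, most_common_vals
         FROM pg_stats
        WHERE tablename = 'orders';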

Since PostgreSQL uses one process per connection, and LLVM JIT-compiled code is process specific, that code cannot be shared among all connections to the database.

Plans suffer from the same limitation: they are in-memory data structures that are not designed to be shared between different processes.
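
JIT compilation is therefore decided per backend and per query. It is controlled by a couple of standard settings (the values below are the stock defaults on a recent installation); EXPLAIN (ANALYZE) prints a JIT section at the end of the plan whenever JIT was actually used:

test=# SHOW jit;
 jit
-----
 on
(1 row)
test=# SHOW jit_above_cost;
 jit_above_cost
----------------
 100000
(1 row)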

Because each worker executes the parallel portion of the plan to completion, it is not possible to simply take an ordinary query plan and run it using multiple workers. Each worker would produce a full copy of the output result set, so the query would not run any faster than normal but would produce incorrect results. Instead, the parallel portion of the plan must be what is known internally to the query optimizer as a partial plan; that is, it must be constructed so that each process that executes the plan will generate only a subset of the output rows in such a way that each required output row is guaranteed to be generated by exactly one of the cooperating processes.
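
The partial plan is visible in EXPLAIN output below the Gather node. A sketch, assuming a hypothetical large table big_t (the costs and row counts are illustrative; the per-worker row estimate is roughly the total divided among the processes):

test=# EXPLAIN SELECT count(*) FROM big_t;
                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=97331.80..97331.81 rows=1 width=8)
   ->  Gather  (cost=97331.58..97331.79 rows=2 width=8)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=96331.58..96331.59 rows=1 width=8)
               ->  Parallel Seq Scan on big_t  (cost=0.00..85914.87 rows=4166687 width=0)
(5 rows)

Everything below the Gather node is the partial plan: each worker scans a distinct subset of the table's blocks, and the leader combines the partial aggregates into the final result.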

The execution of a plan starts at the leaves and ends at the root. This means that the operation executed first appears on the line with the deepest (rightmost) indentation. Of course, a plan may contain several leaf nodes (lines starting with ->) that are executed independently; see the example plan after the list below.

  • bitmap index scan - the engine accesses the index and compiles a list of blocks that could contain the needed records
  • bitmap heap scan - reads the actual blocks from the table, using the bitmap produced by the bitmap index scan
  • recheck - re-verifies the index condition on the rows read, because a lossy bitmap is only exact down to the block level
  • filter - applies any remaining conditions to the rows after they have been read
  • sequential scan - reads the whole table, block by block
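
For example, a query against a hypothetical table t with an index t_a_idx on column a might produce a plan like this (costs suppressed with COSTS OFF for readability):

test=# EXPLAIN (COSTS OFF) SELECT * FROM t WHERE a < 100 AND b = 5;
             QUERY PLAN
-------------------------------------
 Bitmap Heap Scan on t
   Recheck Cond: (a < 100)
   Filter: (b = 5)
   ->  Bitmap Index Scan on t_a_idx
         Index Cond: (a < 100)
(5 rows)

The Bitmap Index Scan is the leaf and runs first; its output (the bitmap of candidate blocks) is consumed by the Bitmap Heap Scan above it.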

The following settings influence the planner:

# - Planner Method Configuration -
#enable_async_append = on
#enable_bitmapscan = on
#enable_gathermerge = on
#enable_hashagg = on
#enable_hashjoin = on
#enable_incremental_sort = on
#enable_indexscan = on
#enable_indexonlyscan = on
#enable_material = on
#enable_memoize = on
#enable_mergejoin = on
#enable_nestloop = on
#enable_parallel_append = on
#enable_parallel_hash = on
#enable_partition_pruning = on
#enable_partitionwise_join = off
#enable_partitionwise_aggregate = off
#enable_seqscan = on
#enable_sort = on
#enable_tidscan = on
#enable_group_by_reordering = on

These settings should only be used to speed up individual queries, not to turn off plan types globally.
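
A common pattern is to disable a plan type only for the current session while investigating a query, and to reset it afterwards (the table name t is just a placeholder; enable_seqscan = off does not forbid sequential scans, it only makes them look very expensive to the planner):

test=# SET enable_seqscan TO off;
test=# EXPLAIN SELECT * FROM t WHERE a < 100;
test=# RESET enable_seqscan;

Inside a transaction, SET LOCAL restores the previous value automatically at commit or rollback.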

Partition-wise operations (enable_partitionwise_join and enable_partitionwise_aggregate) are turned off by default. This is done to reduce planning time. However, if you happen to run heavy analytic queries against partitioned tables, it can make a lot of sense to turn those settings on.
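
For example, they can be switched on just for an analytic session (only an illustration; measure the planning-time impact on your own workload):

test=# SET enable_partitionwise_join TO on;
test=# SET enable_partitionwise_aggregate TO on;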

Genetic Query Optimization (GEQO)

For queries with many joins, PostgreSQL approaches the join-order search like a traveling salesman problem and encodes the possible join orders as integer strings. For example, 4-1-3-2 means joining 4 and 1 first, then 3, and then 2. The numbers represent the relations’ IDs. First, the genetic optimizer generates a random set of plans. Those plans are then inspected: the bad ones are discarded, and new ones are generated based on the genes of the good ones. This way, potentially even better plans emerge. The process can be repeated as often as desired.
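
The search itself can be tuned with a handful of standard GEQO parameters; the defaults noted in the comments are the stock values:

test=# SHOW geqo_effort;        -- trades planning time for plan quality (1..10, default 5)
test=# SHOW geqo_pool_size;     -- individuals per generation (default 0 = choose automatically)
test=# SHOW geqo_generations;   -- number of iterations (default 0 = choose automatically)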

GEQO can be turned on and off by adjusting the geqo variable, as shown in the following lines of code:

test=# SHOW geqo;
 geqo
------
 on
(1 row)
test=# SET geqo TO off;

The listing shows how GEQO can be turned on and off. By default, GEQO kicks in only if a statement exceeds a certain level of complexity, which is controlled by the following variable:

test=# SHOW geqo_threshold ;
 geqo_threshold
----------------
 12
(1 row)

If your queries are large enough to reach this threshold, it certainly makes sense to play with this setting and see how the plans produced by the planner change. As a general rule, however, I would suggest avoiding GEQO for as long as possible and trying to fix the join order first by using the join_collapse_limit variable. Note that every query is different, so it certainly helps to experiment and gain more experience by learning how the planner behaves under different circumstances.
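
join_collapse_limit can be set per session; with a value of 1 the planner keeps explicit JOIN clauses in exactly the order they are written, which effectively pins the join order for a problematic query (a sketch with placeholder tables a, b, and c):

test=# SHOW join_collapse_limit;
 join_collapse_limit
---------------------
 8
(1 row)
test=# SET join_collapse_limit TO 1;
test=# EXPLAIN SELECT * FROM a JOIN b ON a.id = b.a_id JOIN c ON b.id = c.b_id;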

The geqo_threshold configuration parameter specifies the maximum number of relations in a query for which the near-exhaustive search of the best join sequence is performed. Once the number of tables exceeds that maximum, the join order is determined by heuristics. Heuristics cut out parts of the plan space that are unlikely to contain optimal plans, reducing the number of plans examined by the optimization algorithm. While this helps the optimizer select an execution plan more quickly, it can also affect performance negatively: there is a risk that the best execution plan is discarded before the cost comparison ever sees it.