postgres execution plan - ghdrako/doc_snipets GitHub Wiki

To build a plan, the optimizer uses transformation rules, heuristics, and cost-based optimization algorithms. A rule converts a plan into an equivalent plan with a better cost. For example, filter and project operations reduce the size of the dataset and should therefore be executed as early as possible; a rule might reorder operations so that filter and project operations run sooner. An optimization algorithm chooses the plan with the lowest cost estimate. However, the number of possible plans (called the plan space) for a query containing several operations is huge, far too large for the algorithm to consider every single possible plan. After all, time spent choosing a plan contributes to the total execution time of the query. Heuristics are used to reduce the number of plans evaluated by the optimizer.
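
A minimal illustration of the cost estimates the planner produces, assuming a hypothetical table t with a single integer column id (the numbers are illustrative and depend on table size, statistics, and cost settings):

test=# EXPLAIN SELECT * FROM t WHERE id < 100;
                      QUERY PLAN
-------------------------------------------------------
 Seq Scan on t  (cost=0.00..170.00 rows=98 width=4)
   Filter: (id < 100)
(2 rows)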

The cost of each execution plan depends on

  • Cost formulas of algorithms used in the plan
  • Statistical data on tables and indexes, including distribution of values
  • System settings (parameters and preferences), such as join_collapse_limit or cpu_index_tuple_cost
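
The cost parameters and the statistics the planner relies on can be inspected directly; the table name orders below is only a placeholder:

test=# SHOW cpu_index_tuple_cost;
test=# SHOW random_page_cost;
test=# SELECT attname, n_distinct, most_common_vals
         FROM pg_stats
        WHERE tablename = 'orders';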

Since PostgreSQL uses one process per connection, and LLVM JIT-compiled code is process specific, that code cannot be shared among all connections to the database.

Plans suffer from the same limitation: they are in-memory data structures that are not designed to be shared between different processes.
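
JIT compilation is therefore decided per backend and per query. It is controlled by a couple of standard settings (the values below are the stock defaults on a recent installation); EXPLAIN (ANALYZE) prints a JIT section at the end of the plan whenever JIT was actually used:

test=# SHOW jit;
 jit
-----
 on
(1 row)
test=# SHOW jit_above_cost;
 jit_above_cost
----------------
 100000
(1 row)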

Because each worker executes the parallel portion of the plan to completion, it is not possible to simply take an ordinary query plan and run it using multiple workers. Each worker would produce a full copy of the output result set, so the query would not run any faster than normal but would produce incorrect results. Instead, the parallel portion of the plan must be what is known internally to the query optimizer as a partial plan; that is, it must be constructed so that each process that executes the plan will generate only a subset of the output rows in such a way that each required output row is guaranteed to be generated by exactly one of the cooperating processes.
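
The partial plan is visible in EXPLAIN output below the Gather node. A sketch, assuming a hypothetical large table big_t (the costs and row counts are illustrative; the per-worker row estimate is roughly the total divided among the processes):

test=# EXPLAIN SELECT count(*) FROM big_t;
                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=97331.80..97331.81 rows=1 width=8)
   ->  Gather  (cost=97331.58..97331.79 rows=2 width=8)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=96331.58..96331.59 rows=1 width=8)
               ->  Parallel Seq Scan on big_t  (cost=0.00..85914.87 rows=4166687 width=0)
(5 rows)

Everything below the Gather node is the partial plan: each worker scans a distinct subset of the table's blocks, and the leader combines the partial aggregates into the final result.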

The execution of a plan starts at the leaves and ends at the root. This means that the operation executed first appears on the line with the deepest (rightmost) indentation. Of course, a plan may contain several leaf nodes (lines starting with ->) that are executed independently; see the example plan after the list below.

  • bitmap index scan - the engine accesses the index and compiles a list of blocks that could contain the needed records
  • bitmap heap scan - reads the actual blocks from the table, using the bitmap produced by the bitmap index scan
  • recheck - re-verifies the index condition on the rows read, because a lossy bitmap is only exact down to the block level
  • filter - applies any remaining conditions to the rows after they have been read
  • sequential scan - reads the whole table, block by block
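
For example, a query against a hypothetical table t with an index t_a_idx on column a might produce a plan like this (costs suppressed with COSTS OFF for readability):

test=# EXPLAIN (COSTS OFF) SELECT * FROM t WHERE a < 100 AND b = 5;
             QUERY PLAN
-------------------------------------
 Bitmap Heap Scan on t
   Recheck Cond: (a < 100)
   Filter: (b = 5)
   ->  Bitmap Index Scan on t_a_idx
         Index Cond: (a < 100)
(5 rows)

The Bitmap Index Scan is the leaf and runs first; its output (the bitmap of candidate blocks) is consumed by the Bitmap Heap Scan above it.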

The following settings influence the planner:

# - Planner Method Configuration -
#enable_async_append = on
#enable_bitmapscan = on
#enable_gathermerge = on
#enable_hashagg = on
#enable_hashjoin = on
#enable_incremental_sort = on
#enable_indexscan = on
#enable_indexonlyscan = on
#enable_material = on
#enable_memoize = on
#enable_mergejoin = on
#enable_nestloop = on
#enable_parallel_append = on
#enable_parallel_hash = on
#enable_partition_pruning = on
#enable_partitionwise_join = off
#enable_partitionwise_aggregate = off
#enable_seqscan = on
#enable_sort = on
#enable_tidscan = on
#enable_group_by_reordering = on

These settings should only be used to speed up individual queries, not to turn off plan types globally.
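
A common pattern is to disable a plan type only for the current session while investigating a query, and to reset it afterwards (the table name t is just a placeholder; enable_seqscan = off does not forbid sequential scans, it only makes them look very expensive to the planner):

test=# SET enable_seqscan TO off;
test=# EXPLAIN SELECT * FROM t WHERE a < 100;
test=# RESET enable_seqscan;

Inside a transaction, SET LOCAL restores the previous value automatically at commit or rollback.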

Partition-wise operations (enable_partitionwise_join and enable_partitionwise_aggregate) are turned off by default. This is done to reduce planning time. However, if you happen to run heavy analytic queries against partitioned tables, it can make a lot of sense to turn those settings on.
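
For example, they can be switched on just for an analytic session (only an illustration; measure the planning-time impact on your own workload):

test=# SET enable_partitionwise_join TO on;
test=# SET enable_partitionwise_aggregate TO on;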

Genetic Query Optimization (GEQO)

For queries with many joins, PostgreSQL approaches the join-order search like a traveling salesman problem and encodes the possible join orders as integer strings. For example, 4-1-3-2 means joining 4 and 1 first, then 3, and then 2. The numbers represent the relations’ IDs. First, the genetic optimizer generates a random set of plans. Those plans are then inspected: the bad ones are discarded, and new ones are generated based on the genes of the good ones. This way, potentially even better plans emerge. The process can be repeated as often as desired.
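
The search itself can be tuned with a handful of standard GEQO parameters; the defaults noted in the comments are the stock values:

test=# SHOW geqo_effort;        -- trades planning time for plan quality (1..10, default 5)
test=# SHOW geqo_pool_size;     -- individuals per generation (default 0 = choose automatically)
test=# SHOW geqo_generations;   -- number of iterations (default 0 = choose automatically)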

GEQO can be turned on and off by adjusting the geqo variable, as shown in the following lines of code:

test=# SHOW geqo;
 geqo
------
 on
(1 row)
test=# SET geqo TO off;

The listing shows how GEQO can be turned on and off. By default, GEQO kicks in only if a statement exceeds a certain level of complexity, which is controlled by the following variable:

test=# SHOW geqo_threshold ;
 geqo_threshold
----------------
 12
(1 row)

If your queries are large enough to reach this threshold, it certainly makes sense to play with this setting and see how the plans produced by the planner change. As a general rule, however, I would suggest avoiding GEQO for as long as possible and trying to fix the join order first by using the join_collapse_limit variable. Note that every query is different, so it certainly helps to experiment and gain more experience by learning how the planner behaves under different circumstances.
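
join_collapse_limit can be set per session; with a value of 1 the planner keeps explicit JOIN clauses in exactly the order they are written, which effectively pins the join order for a problematic query (a sketch with placeholder tables a, b, and c):

test=# SHOW join_collapse_limit;
 join_collapse_limit
---------------------
 8
(1 row)
test=# SET join_collapse_limit TO 1;
test=# EXPLAIN SELECT * FROM a JOIN b ON a.id = b.a_id JOIN c ON b.id = c.b_id;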

The geqo_threshold configuration parameter specifies the maximum number of relations in a query for which the near-exhaustive search of the best join sequence is performed. Once the number of tables exceeds that maximum, the join order is determined by heuristics. Heuristics cut out parts of the plan space that are unlikely to contain optimal plans, reducing the number of plans examined by the optimization algorithm. While this helps the optimizer select an execution plan more quickly, it can also affect performance negatively: there is a risk that the best execution plan is discarded before the cost comparison ever sees it.