postgres parallel query - ghdrako/doc_snipets GitHub Wiki
Parallel query
If the optimizer thinks that parallel processing would reduce the execution time, it will plan additional parallel worker processes. These processes are created by the query executor and live only for the duration of a single SQL statement. The parallel worker processes calculate intermediate results, which eventually are collected at the original backend process. This collection happens in the "Gather" node of the execution plan: PostgreSQL executes all steps below the Gather node in parallel, while everything above Gather is single-threaded.
Limitations
While parallel query can speed up query execution, it incurs some overhead:
- starting parallel worker processes is an expensive operation
- exchanging data between worker processes and the backend requires inter-process communication via dynamic shared memory segments (DSM)
The PostgreSQL optimizer takes this overhead into account by planning parallel query only for expensive statements that process large tables or indexes. There are some additional limits to prevent parallel query from consuming too many resources:
- max_worker_processes limits the total number of background worker processes for the entire cluster
- max_parallel_workers limits the number of parallel workers for the entire database cluster at any given time (it cannot exceed max_worker_processes)
- max_parallel_workers_per_gather limits the number of parallel workers that a single statement can use
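These limits can be inspected and changed per session. A minimal sketch (the parameter names are from the source; the value 4 is purely illustrative):

```sql
-- Show the cluster-wide and per-statement limits
SHOW max_worker_processes;
SHOW max_parallel_workers;
SHOW max_parallel_workers_per_gather;

-- Allow up to 4 workers per Gather node for the current session only
-- (illustrative value; tune it to your core count and workload)
SET max_parallel_workers_per_gather = 4;
```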
Conditions for the optimizer to use parallel query, and other important notes
- max_parallel_workers_per_gather must be greater than 0
- dynamic_shared_memory_type must not be set to none
- the database must not be running in single-user mode
- the query must not write data or lock rows
- the query must not use a function marked PARALLEL UNSAFE (user-defined functions are PARALLEL UNSAFE by default)
- a parallel plan is chosen only if it is still cheaper after the optimizer adds the parallel_setup_cost and parallel_tuple_cost overhead
- parallel index scans are only supported for B-tree indexes, so specialized index types such as hash indexes will not be used
- the inner side of a nested loop join or merge join is non-parallel
- hash joins can run in parallel, but each worker process creates its own copy of the hash table; keep this in mind when the hash table is large
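Since user-defined functions block parallel plans unless declared otherwise, one remedy is to mark safe functions explicitly. A minimal sketch with a hypothetical function (the name and body are illustrative, not from the source):

```sql
-- Hypothetical example: a pure arithmetic function can safely run
-- inside a parallel worker, so declare it PARALLEL SAFE explicitly
-- (CREATE FUNCTION defaults to PARALLEL UNSAFE).
CREATE FUNCTION add_tax(amount numeric) RETURNS numeric
AS $$ SELECT amount * 1.23 $$
LANGUAGE SQL IMMUTABLE PARALLEL SAFE;
```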
If the pool defined by max_parallel_workers is exhausted, the query executor cannot start all the parallel processes that the optimizer planned.
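This can be observed at run time: EXPLAIN (ANALYZE) reports both the planned and the actually launched workers, so an exhausted pool shows up as "Workers Launched" being lower than "Workers Planned". A sketch using the bar table from this page's example:

```sql
-- With ANALYZE, the Gather node reports how many workers actually
-- started, e.g. "Workers Planned: 2" but a lower "Workers Launched"
-- when the max_parallel_workers pool is exhausted.
EXPLAIN (ANALYZE, COSTS OFF) SELECT count(*) FROM bar;
```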
EXPLAIN (COSTS OFF) SELECT count(*) FROM bar;
QUERY PLAN
────────────────────────────────────────────
Finalize Aggregate
-> Gather
Workers Planned: 2
-> Partial Aggregate
-> Parallel Seq Scan on bar
To ensure that only a single process works on a statement (no parallel workers), use the following command:
SET max_parallel_workers_per_gather TO 0;
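After disabling parallel query this way, the same statement from above should be planned without a Gather node:

```sql
SET max_parallel_workers_per_gather TO 0;

-- The plan should now be a plain Aggregate over a Seq Scan,
-- with no Gather or Partial Aggregate nodes.
EXPLAIN (COSTS OFF) SELECT count(*) FROM bar;
```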