postgres parallel query - ghdrako/doc_snipets GitHub Wiki
Parallel query
If the optimizer thinks that parallel processing would reduce the execution time, it will plan additional parallel worker processes. These processes are created by the query executor and live only for the duration of a single SQL statement. The parallel worker processes calculate intermediate results, which eventually are collected at the original backend process. This collection happens in the "Gather" node of the execution plan: PostgreSQL executes all steps below the Gather node in parallel, while everything above Gather is single-threaded.
Limitations
While parallel query can speed up query execution, it incurs some overhead:
- starting parallel worker processes is an expensive operation
- exchanging data between worker processes and the backend requires inter-process communication via dynamic shared memory segments (DSM)
The PostgreSQL optimizer takes this overhead into account by planning parallel query only for expensive statements that process large tables or indexes. There are some additional limits to prevent parallel query from consuming too many resources:
- max_worker_processes limits the total number of background worker processes for the entire cluster
- max_parallel_workers limits the number of parallel workers for the entire database cluster at any given time (it cannot exceed max_worker_processes)
- max_parallel_workers_per_gather limits the number of parallel workers that a single statement can use
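These limits can be inspected and changed per session. A minimal sketch (the parameter names are from the source; the value 4 is purely illustrative):

```sql
-- Show the cluster-wide and per-statement limits
SHOW max_worker_processes;
SHOW max_parallel_workers;
SHOW max_parallel_workers_per_gather;

-- Allow up to 4 workers per Gather node for the current session only
-- (illustrative value; tune it to your core count and workload)
SET max_parallel_workers_per_gather = 4;
```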
Conditions for the optimizer to use parallel query, and other important notes
- max_parallel_workers_per_gather must be greater than 0
- dynamic_shared_memory_type must not be set to none
- the database must not be running in single-user mode
- the query must not write data or lock rows
- the query must not use a function marked PARALLEL UNSAFE (user-defined functions are PARALLEL UNSAFE by default)
- a parallel plan is chosen only if it is still cheaper after the optimizer adds the parallel_setup_cost and parallel_tuple_cost overhead
- parallel index scans are only supported for B-tree indexes, so specialized index types such as hash indexes will not be used
- the inner side of a nested loop join or merge join is non-parallel
- hash joins can run in parallel, but each worker process creates its own copy of the hash table; keep this in mind when the hash table is large
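Since user-defined functions block parallel plans unless declared otherwise, one remedy is to mark safe functions explicitly. A minimal sketch with a hypothetical function (the name and body are illustrative, not from the source):

```sql
-- Hypothetical example: a pure arithmetic function can safely run
-- inside a parallel worker, so declare it PARALLEL SAFE explicitly
-- (CREATE FUNCTION defaults to PARALLEL UNSAFE).
CREATE FUNCTION add_tax(amount numeric) RETURNS numeric
AS $$ SELECT amount * 1.23 $$
LANGUAGE SQL IMMUTABLE PARALLEL SAFE;
```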
If the pool defined by max_parallel_workers is exhausted, the query executor cannot start all the parallel processes that the optimizer planned.
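This can be observed at run time: EXPLAIN (ANALYZE) reports both the planned and the actually launched workers, so an exhausted pool shows up as "Workers Launched" being lower than "Workers Planned". A sketch using the bar table from this page's example:

```sql
-- With ANALYZE, the Gather node reports how many workers actually
-- started, e.g. "Workers Planned: 2" but a lower "Workers Launched"
-- when the max_parallel_workers pool is exhausted.
EXPLAIN (ANALYZE, COSTS OFF) SELECT count(*) FROM bar;
```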
EXPLAIN (COSTS OFF) SELECT count(*) FROM bar;
QUERY PLAN
────────────────────────────────────────────
Finalize Aggregate
-> Gather
Workers Planned: 2
-> Partial Aggregate
-> Parallel Seq Scan on bar
To ensure that only a single process works on a statement (no parallel workers), use the following command:
SET max_parallel_workers_per_gather TO 0;
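After disabling parallel query this way, the same statement from above should be planned without a Gather node:

```sql
SET max_parallel_workers_per_gather TO 0;

-- The plan should now be a plain Aggregate over a Seq Scan,
-- with no Gather or Partial Aggregate nodes.
EXPLAIN (COSTS OFF) SELECT count(*) FROM bar;
```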