postgres parallel
Parallel query
If the optimizer thinks that parallel processing would reduce the execution time, it plans additional parallel worker processes. These processes are created by the query executor and live only for the duration of a single SQL statement. The parallel worker processes calculate intermediate results, which are eventually collected at the original backend process. This collection happens in the "Gather" node of the execution plan: PostgreSQL executes all steps below the Gather node in parallel, while everything above Gather runs single-threaded.
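To experiment with this, a hypothetical test table can be used; the name `bar` is chosen only to match the execution plan shown further down, and the row count is a guess at a size large enough for the planner to consider a parallel scan:

```sql
-- Hypothetical setup: a table big enough (tens of MB) that the planner
-- considers a parallel sequential scan worthwhile.
CREATE TABLE bar (id bigint, payload text);
INSERT INTO bar
SELECT i, md5(i::text)
FROM generate_series(1, 1000000) AS i;
ANALYZE bar;  -- refresh statistics so the planner sees the table size
```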
Limitation
While parallel query can speed up query execution, it incurs some overhead:
- starting parallel worker processes is an expensive operation
- exchanging data between worker processes and the backend requires inter-process communication via dynamic shared memory segments (DSM)
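The planner models this overhead with dedicated cost settings; a quick way to inspect them (the values in the comments are the stock defaults and may differ on a tuned system):

```sql
SHOW parallel_setup_cost;            -- cost charged for starting the worker pool (default 1000)
SHOW parallel_tuple_cost;            -- cost per row shipped from a worker to the leader (default 0.1)
SHOW min_parallel_table_scan_size;   -- smaller tables are scanned serially (default 8MB)
```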
The PostgreSQL optimizer takes this overhead into account by planning parallel query only for expensive statements that process large tables or indexes. There are some additional limits to prevent parallel query from consuming too many resources:
- max_parallel_workers limits the number of parallel workers for the entire database cluster at any given time
- max_parallel_workers_per_gather limits the number of parallel workers that a single statement can use
If the pool defined by max_parallel_workers is exhausted, the query executor cannot start all the parallel processes that the optimizer planned.
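Both limits can be inspected with SHOW, and the per-statement limit can be adjusted per session; the values in the comments are the stock defaults:

```sql
SHOW max_parallel_workers;             -- cluster-wide worker pool (default 8)
SHOW max_parallel_workers_per_gather;  -- per-statement limit (default 2)
-- Raise the per-statement limit for the current session only:
SET max_parallel_workers_per_gather = 4;
```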
```sql
EXPLAIN (COSTS OFF) SELECT count(*) FROM bar;

                 QUERY PLAN
────────────────────────────────────────────
 Finalize Aggregate
   ->  Gather
         Workers Planned: 2
         ->  Partial Aggregate
               ->  Parallel Seq Scan on bar
```
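With EXPLAIN (ANALYZE) the plan additionally reports how many workers actually started; a sketch of how a shortfall shows up when the pool is exhausted:

```sql
EXPLAIN (ANALYZE, COSTS OFF) SELECT count(*) FROM bar;
-- The Gather node then prints both counters, e.g.:
--   Workers Planned: 2
--   Workers Launched: 0   <- pool was exhausted, the leader did all the work
```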
To ensure that a statement never uses more than a single core, disable parallel query for the current session:

```sql
SET max_parallel_workers_per_gather TO 0;
```
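Re-running the earlier query then yields a purely serial plan; the expected shape is sketched in the comment:

```sql
EXPLAIN (COSTS OFF) SELECT count(*) FROM bar;
-- Expected plan, with no Gather node and no parallel workers:
--  Aggregate
--    ->  Seq Scan on bar
```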