PostgreSQL performance parameters tuning
General steps
1. Tune your operating system for Postgres (which is effectively tuning your OS for just about any database).
2. Tune your Postgres server parameters for your hardware and OS.
3. Ensure there is no excessive table bloat involved with the queries you are concerned about.
4. Evaluate whether you have any query operations that are spilling over to disk.
5. Check that your table statistics are up to date.
6. Tune the queries themselves only if steps 1-5 didn't fix your performance issues.
Tuning
- https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
- https://pgtune.leopard.in.ua
- https://severalnines.com/blog/pgtune-alternatives-clustercontrol-postgresql-configuration/
Tuning tools
- dbForge Studio for PostgreSQL helps to identify productivity bottlenecks and provides PostgreSQL performance tuning.
- postgresqltuner.pl is a script that can analyze the configuration and make tuning recommendations.
- PgBadger analyzes PostgreSQL logs to generate performance reports.
- pgMustard provides tuning advice based on EXPLAIN ANALYZE output.
- pgTuning helps you tune the PostgreSQL configuration.
- ClusterControl is an alternative to pgTuning.
- explain.depesz.com makes PostgreSQL's EXPLAIN ANALYZE output readable.
Memory management
- https://www.postgresql.org/docs/current/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-MEMORY
- https://www.enterprisedb.com/postgres-tutorials/how-tune-postgresql-memory
- https://severalnines.com/blog/what-check-if-postgresql-memory-utilization-high/
- https://dba.stackexchange.com/questions/185880/analyze-memory-usage-of-postgresql-why-is-it-growing-constantly/285423#285423
- https://dba.stackexchange.com/questions/18484/tuning-postgresql-for-large-amounts-of-ram
According to release-notes section E.2.3.1.4. General Performance, PostgreSQL had the following behavior until version 13: "Previously, hash aggregation was avoided if it was expected to use more than work_mem memory. [...] once hash aggregation had been chosen, the hash table would be kept in memory no matter how large it got — which could be very large if the planner had misestimated". Using a high work_mem value results in hash aggregation getting chosen more often, and you can end up going over any set memory limits as a result. With PostgreSQL 13.0 or greater, memory usage will not go above the work_mem setting; PostgreSQL will instead use temporary files on disk to handle the resource requirements if the planner estimate is badly off. With version 12.x or earlier, the actual memory usage is unlimited if hash aggregation is chosen due to planner misestimation.
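As a quick check of this behavior on version 13 or later, you can lower work_mem and look at the HashAggregate node in EXPLAIN (ANALYZE) output; this is only a sketch, and the table and grouping column are placeholders:
SET work_mem = '4MB';
EXPLAIN (ANALYZE, BUFFERS)
SELECT customer_id, count(*)
FROM orders            -- hypothetical table used for illustration
GROUP BY customer_id;
-- On 13+ a spilling HashAggregate reports "Batches" and "Disk Usage";
-- on 12.x and older the hash table simply keeps growing in memory.
RESET work_mem;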
Updates
PostgreSQL never performs updates in place. Instead, a new version of an item (e.g., a table row) is inserted and stored in free space in the same or a newly allocated block, while the previous version is not overwritten immediately. A VACUUM operation removes old versions and consolidates the free space in a block once they are no longer needed for currently running transactions.
In general, new versions are inserted into different locations, and therefore all indexes on the table must be modified. To reduce this negative effect, PostgreSQL uses a technique sometimes referred to as HOT (heap-only tuples): an attempt is made to insert the new version into the same block. If the block has sufficient free space and the update does not involve modifying any indexed columns, then there is no need to modify any indexes.
PostgreSQL utilizes the first-commit-wins strategy.
Referential integrity constraints in PostgreSQL are implemented using system triggers, so all observations regarding integrity constraints are applicable to triggers as well.
Parameters
- https://www.crunchydata.com/blog/optimize-postgresql-server-performance
- https://pgtune.leopard.in.ua/#/ # calculate tuning settings for postgres
- https://sites.google.com/site/itmyshare/database-tips-and-examples/postgres/useful-sqls-to-check-contents-of-postgresql-shared_buffer
- http://www.fibrevillage.com/database/134-useful-sqls-to-check-contents-of-postgresql-shared-buffer
- https://www.postgresql.org/docs/current/runtime-config-resource.html
SHOW ALL ;
select * from pg_settings;
TABLE pg_settings ;
select * from pg_file_settings ;
TABLE pg_file_settings ;
-- shared_buffers expressed in MB (the setting is stored as a number of 8kB blocks)
select pg_size_bytes(setting) * 8192/1024/1024 as shared_buffers_mb
from pg_settings
where name = 'shared_buffers';
select current_setting('block_size');
-- main memory parameters together with their units
select p.name, p.setting, p.unit from pg_settings p where p.name in ('shared_buffers','wal_buffers','effective_cache_size','work_mem','maintenance_work_mem');
Local Memory Area
In PostgreSQL, each backend process allocates local memory for query processing; each area is divided into sub-areas whose sizes are either fixed or variable.
The sub-areas are as follows.
work_mem
work_mem speeds up grouping and sorting.
SHOW work_mem ;
SET work_mem TO default; -- 4MB
SET work_mem TO '1 GB';
work_mem can be allocated for every sort or hash operation in a query, so a query with 5 hash joins might cost 5*work_mem. For worst-case scenarios, you should also think of multiple sessions consuming this amount at the same time (again a reason to keep max_connections low).
The executor uses this area for sorting tuples by ORDER BY
and DISTINCT operations. It also uses it for joining tables by merge-join and hash-join operations.
This parameter specifies the amount of memory each sort or hash operation (and each parallel worker running one) may use before spilling to disk. Increasing this value can improve query performance by allowing more data to be sorted or hashed in memory.
The value of work_mem is used for complex sort operations, and defines the maximum amount of memory to be used for intermediate results, such as hash tables, and for sorting.
It is important to ensure that the work_mem value is not set too high, because the system may allocate work_mem several times over for concurrent sort operations.
Because of this important caveat, it's ideal to set the global value for work_mem at a relatively low value, and then alter any specific queries themselves to use a higher work_mem value:
SET LOCAL work_mem = '256MB';
SELECT * FROM db ORDER BY LOWER(name);
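One way to find the operations that spill to disk because work_mem is too small (step 4 of the general steps) is to log every temporary file; this sketch assumes superuser access:
ALTER SYSTEM SET log_temp_files = 0;   -- 0 = log temp files of any size
SELECT pg_reload_conf();
-- spilling queries then show up in the server log as
-- LOG:  temporary file: path "base/pgsql_tmp/...", size ...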
maintenance_work_mem
This parameter is used for some kinds of maintenance operations (VACUUM, REINDEX).
While work_mem specifies how much memory is used for complex sort operations, maintenance_work_mem specifies how much memory is used for routine maintenance tasks, such as VACUUM, CREATE INDEX, and similar.
Unlike work_mem, however, only one of these maintenance operations can be executed at a time by a database session. As a result, most systems do not have many of these processes running concurrently, so it's typically safe to set this value much larger than work_mem, as the larger amounts of available memory could improve the performance of vacuuming and database dump restores.
The default value for maintenance_work_mem is 64MB.
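For a one-off maintenance job you can raise the value just for the current session; big_table below is a placeholder name:
SET maintenance_work_mem = '1GB';
VACUUM VERBOSE big_table;   -- hypothetical table
RESET maintenance_work_mem;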
max_parallel_maintenance_workers
- https://www.cybertec-postgresql.com/en/postgresql-parallel-create-index-for-better-performance
PostgreSQL 11+ is able to build btree indexes in parallel. max_parallel_maintenance_workers controls the maximum number of worker processes that can be used by CREATE INDEX. As with every parallel operation, PostgreSQL determines the number of workers based on table size. When indexing large tables, index creation can see drastic improvements.
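A minimal sketch of a parallel btree build (PostgreSQL 11+); the index and table names are hypothetical, and the planner may still pick fewer workers:
SET max_parallel_maintenance_workers = 4;
CREATE INDEX idx_orders_customer_id ON orders (customer_id);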
temp_buffers
The executor uses this area for storing temporary tables; it is allocated per session and used only when the session accesses temporary tables.
Shared Memory Area
Shared buffer pool
This parameter specifies the amount of memory allocated for shared memory buffers used for caching data. Increasing this value can improve read performance by reducing the frequency of disk reads.
PostgreSQL uses 'double buffering', meaning that PostgreSQL uses its own internal buffer as well as kernel buffered IO. In short, this means that data is stored in memory twice.
Generally speaking, the value for shared_buffers
should be roughly 25% of the total system RAM for a dedicated DB server. The value for shared_buffers
should never be set to reserve all of the system RAM for PostgreSQL. A value over 25% of the system RAM can be useful if, for example, it is set such that the entire database working set of data can fit in cache, as this would greatly reduce the amount of time reading from disk.
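As an example, on a dedicated server with 16GB RAM (an assumption) the 25% guideline would look like this; shared_buffers only takes effect after a restart:
ALTER SYSTEM SET shared_buffers = '4GB';
-- restart required, e.g. pg_ctl restart -D $PGDATA  (or systemctl restart postgresql)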
wal_buffers
The WAL buffer is a buffering area for WAL data before it is written to persistent storage.
Write-Ahead Logging (WAL) is a standard method for ensuring integrity of data. Much like in the shared_buffers setting, PostgreSQL writes WAL records into buffers and then these buffers are flushed to disk.
The size of the buffer is set by the wal_buffers setting; the default (-1) sizes it automatically to 1/32 of shared_buffers, up to 16MB. If the system being tuned has a large number of concurrent connections, then a higher value for wal_buffers can provide better performance.
effective_cache_size
The effective_cache_size value provides a 'rough estimate' of how much memory is available for disk caching by the operating system and within the database itself, after taking into account what's used by the OS itself and other applications.
This value is used only by the PostgreSQL query planner to figure out whether plans it's considering would be expected to fit in RAM or not. As such, it's a bit of a fuzzy number to define for general use cases.
A conservative value for effective_cache_size would be 1/2 of the total memory available on the system. Most commonly, the value is set to 75% of total system memory on a dedicated DB server, but it can vary depending on the specific needs of a particular server workload.
If the value for effective_cache_size is too low, then the query planner may decide not to use some indexes, even if they would help greatly increase query speed.
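Continuing the same assumed 16GB dedicated server, the 75% guideline can be applied with a plain reload, since effective_cache_size does not require a restart:
ALTER SYSTEM SET effective_cache_size = '12GB';
SELECT pg_reload_conf();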
Commit Log
The commit log (CLOG) keeps the states of all transactions and is part of the concurrency control mechanism. The commit log is allocated in shared memory and used throughout transaction processing.
PostgreSQL defines the following four transaction states.
- IN_PROGRESS
- COMMITTED
- ABORTED
- SUB-COMMITTED
max_connections
This parameter specifies the maximum number of concurrent connections allowed to the server. Increasing this value lets more clients connect simultaneously, but each connection reserves memory (e.g., its own work_mem allocations), which is one reason to keep max_connections low.
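Before raising it, it is worth comparing current usage against the limit; note that changing max_connections itself requires a server restart:
SHOW max_connections;
SELECT count(*) AS current_connections FROM pg_stat_activity;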
Disk - tuning parameters
random_page_cost is the cost of a seek+read on the disk. It is relative to seq_page_cost, which is 1. The default (4) for random_page_cost is set too high for modern machines and network storage; normally it can be lowered to between 2 and 1.x. For SSD disks you could even set it close to 1.0, since seeking is almost free on SSDs.
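A sketch of lowering the cost globally, or only for a tablespace that sits on SSD (the tablespace name is hypothetical):
ALTER SYSTEM SET random_page_cost = 1.1;
SELECT pg_reload_conf();
-- or only for a specific tablespace:
ALTER TABLESPACE fast_ssd SET (random_page_cost = 1.1);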
Query plan
explain (analyze) select * from users where customer_id < 'ID_HERE'::uuid
When you run ANALYZE, what that's doing behind the scenes is computing stats about the distribution of data in your tables. One of those stats is called histogram_bounds, which is a set of bucket boundaries that divide the sampled values into groups of roughly equal population, built by ANALYZE after randomly sampling the table.
You can see what's in that histogram via the pg_stats view:
SELECT array_length(histogram_bounds, 1) histogram_size, tablename, attname, histogram_bounds FROM pg_stats WHERE tablename = 'users';
analyze users;
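If the planner still misestimates on that column, a finer histogram can be collected for it; this sketch reuses the users/customer_id names from the example above, and 500 is an arbitrary target (the default is 100):
ALTER TABLE users ALTER COLUMN customer_id SET STATISTICS 500;
ANALYZE users;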
Unlogged table
An UNLOGGED table can be created with less I/O because it does not write WAL. It is better than a temporary table because it doesn't disappear if you disconnect and then reconnect. However, by choosing an unlogged table you are telling PostgreSQL that you are able to recreate the contents of that table in the (unlikely) event of a crash, because you lose its contents after a crash.
CREATE UNLOGGED TABLE dup_cust AS
SELECT *
FROM ...
Lock table
To prevent any INSERT, UPDATE, or DELETE operations:
LOCK TABLE new_cust IN SHARE ROW EXCLUSIVE MODE;
Row identity
In Postgres we can use the hidden column named ctid. It denotes the physical location of the row you are observing – for example, duplicate rows will all have different ctid values. We can use it to remove duplicates:
BEGIN;
LOCK TABLE new_cust IN SHARE ROW EXCLUSIVE MODE;
CREATE TEMPORARY TABLE dups_cust AS
SELECT customerid, min(ctid) AS min_ctid
FROM new_cust
GROUP BY customerid
HAVING count(*) > 1;
DELETE FROM new_cust
USING dups_cust
WHERE new_cust.customerid = dups_cust.customerid
AND new_cust.ctid != dups_cust.min_ctid;
COMMIT;
VACUUM new_cust;
The BEGIN and COMMIT commands wrap the LOCK and DELETE commands into a single transaction, which is required; otherwise, the lock will be released immediately after being taken. Another reason to use a single transaction is that we can always roll back if anything goes wrong.
JIT
When JIT compilation is enabled, PostgreSQL will check your query, and if it happens to be time-consuming enough, highly optimized code for your query will be created on the fly (just-in-time).
- Configuring JIT
To use JIT, it has to be added at compile time (when running ./configure). The following configuration options are available:
--with-llvm build with LLVM based JIT support
...
LLVM_CONFIG path to llvm-config command
Some Linux distributions ship an extra package containing support for JIT. If you want to make use of JIT, make sure those packages are installed. Once you have made sure that JIT is available, the following configuration parameters will be available so that you can fine-tune JIT compilation for your queries:
#jit = on # allow JIT compilation
#jit_provider = 'llvmjit' # JIT implementation to use
#jit_above_cost = 100000 # perform JIT compilation if
# available
# and query more expensive,
# -1 disables
#jit_optimize_above_cost = 500000 # optimize JITed functions
# if query is
# more expensive, -1 disables
#jit_inline_above_cost = 500000 # attempt to inline operators and
# functions if query is
# more expensive,
# -1 disables
jit_above_cost means that JIT is only considered if the expected cost is at least 100,000 units.
- Running queries
SET jit TO off;
SET jit TO on;
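To see JIT in action without crafting an expensive query, you can temporarily drop the cost threshold to zero (this assumes the server was built with --with-llvm; the table and column below are placeholders):
SET jit = on;
SET jit_above_cost = 0;
EXPLAIN (ANALYZE) SELECT sum(amount) FROM orders;   -- hypothetical table/column
-- the plan output ends with a "JIT:" block (Functions, Options, Timing)
RESET jit_above_cost;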