PostgreSQL performance parameters tuning - ghdrako/doc_snipets GitHub Wiki

General steps

  1. Tune your operating system for Postgres (which is effectively tuning your OS for just about any database).
  2. Tune your Postgres server parameters for your hardware and OS.
  3. Ensure there is no excessive table bloat involved with the queries you are concerned about.
  4. Evaluate whether you have any query operations that are spilling over to disk.
  5. Check that your table statistics are up to date.
  6. Tune the queries themselves if steps 1-5 didn't fix your performance issues.
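Step 4 can be checked quickly from the pg_stat_database view, which counts the temporary files written when operations spill past work_mem (a sketch; persistently growing values suggest spilling):

```sql
-- Temporary files/bytes written since the last statistics reset;
-- growth here suggests sorts or hashes spilling to disk.
SELECT datname, temp_files, temp_bytes
FROM pg_stat_database
WHERE datname = current_database();
```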

Tuning

Memory management

Updates

PostgreSQL never performs updates in place. Instead, a new version of an item (e.g., a table row) is inserted and stored in free space in the same or a newly allocated block, while the previous version is not overwritten immediately. When old versions are no longer needed by currently running transactions, a VACUUM operation removes them and consolidates the free space in a block.

In general, new versions are inserted into different locations, and therefore all indexes on the table must be modified. To reduce this negative effect, PostgreSQL uses a technique sometimes referred to as HOT (heap-only tuples): an attempt is made to insert the new version into the same block. If the block has sufficient free space and the update does not modify any indexed columns, then there is no need to modify any indexes.
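How often updates actually take the HOT path can be estimated from pg_stat_user_tables (a sketch; the ratio is only meaningful on tables that receive updates):

```sql
-- Share of updates that were heap-only (HOT); a low percentage on a
-- heavily updated table suggests indexed columns are being modified
-- or blocks lack free space for the new row version.
SELECT relname, n_tup_upd, n_tup_hot_upd,
       round(100.0 * n_tup_hot_upd / nullif(n_tup_upd, 0), 1) AS hot_pct
FROM pg_stat_user_tables
ORDER BY n_tup_upd DESC;
```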

PostgreSQL utilizes a first-commit-wins strategy.

Referential integrity constraints in PostgreSQL are implemented using system triggers, so all observations regarding integrity constraints also apply to triggers.

Parameters

SHOW ALL;
SELECT * FROM pg_settings;
TABLE pg_settings;
SELECT * FROM pg_file_settings;
TABLE pg_file_settings;

-- shared_buffers in MB (the setting is stored as a number of 8 kB blocks)
select pg_size_bytes(setting) * 8192/1024/1024
   from pg_settings
   where name = 'shared_buffers';

select current_setting('block_size');

select p.name, p.setting, p.unit from pg_settings p where p.name in ('shared_buffers','wal_buffers','effective_cache_size','work_mem','maintenance_work_mem');

Local Memory Area

In PostgreSQL, each backend process allocates local memory for query processing; each area is divided into sub-areas whose sizes are either fixed or variable.

The sub-areas are as follows.

work_mem

work_mem speeds up grouping and sorting.

SHOW work_mem ;
SET work_mem TO default; -- 4MB
SET work_mem TO '1 GB';

work_mem applies per sort or hash operation, so a query with 5 hash joins might use up to 5 × work_mem. For worst-case scenarios, you should also think of multiple sessions consuming this amount (again, a reason to keep max_connections low).

The executor uses this area for sorting tuples by ORDER BY and DISTINCT operations. It also uses it for joining tables by merge-join and hash-join operations.
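Whether a given sort fits into work_mem shows up in EXPLAIN (ANALYZE) output: an in-memory sort reports Sort Method: quicksort, while a spilled one reports external merge along with the disk space used (table and column names below are illustrative):

```sql
SET work_mem TO '64kB';
EXPLAIN (ANALYZE) SELECT * FROM users ORDER BY name;
-- expect "Sort Method: external merge" if the table is large enough

SET work_mem TO '256MB';
EXPLAIN (ANALYZE) SELECT * FROM users ORDER BY name;
-- expect "Sort Method: quicksort" once the sort fits in memory
```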

This parameter specifies the amount of memory each query operation (and each parallel worker) may use for sorting and hashing before spilling to disk. Increasing this value can improve query performance by allowing more data to be sorted or hashed in memory.

The value of work_mem is used for complex sort operations, and defines the maximum amount of memory to be used for intermediate results, such as hash tables, and for sorting.

It is important to ensure that work_mem is not set too high, because the system may allocate work_mem several times over for concurrent sort operations.

Because of this important caveat, it's ideal to set the global value for work_mem relatively low, and then raise it locally for the specific queries that need more:

BEGIN;
SET LOCAL work_mem = '256MB';
SELECT * FROM db ORDER BY LOWER(name);
COMMIT;

maintenance_work_mem

This parameter is used for some kinds of maintenance operations (VACUUM, REINDEX).

While work_mem specifies how much memory is used for complex sort operations, maintenance_work_mem specifies how much memory is used for routine maintenance tasks, such as VACUUM, CREATE INDEX, and similar.

Unlike work_mem, however, only one of these maintenance operations can be executed at a time by a database session. As a result, most systems do not have many of these processes running concurrently, so it's typically safe to set this value much larger than work_mem, as the larger amounts of available memory could improve the performance of vacuuming and database dump restores.

The default value for maintenance_work_mem is 64MB.
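For a one-off maintenance task, the value can be raised in just that session, for example before an index build (names and size illustrative):

```sql
-- Only one maintenance operation runs at a time per session,
-- so a much larger value than work_mem is usually safe here.
SET maintenance_work_mem TO '1GB';
CREATE INDEX users_email_idx ON users (email);
RESET maintenance_work_mem;
```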

max_parallel_maintenance_workers

This parameter sets the maximum number of parallel workers that a single utility command (such as a B-tree CREATE INDEX or VACUUM) can start. The default is 2.

temp_buffers

The executor uses this area for storing temporary tables.
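temp_buffers is session-local and can only be changed before the first use of temporary tables in a session, for example (names illustrative):

```sql
-- Must be set before the session touches any temporary table.
SET temp_buffers TO '256MB';
CREATE TEMPORARY TABLE tmp_orders AS SELECT * FROM orders;
```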

Shared Memory Area

Shared buffer pool

The shared_buffers parameter specifies the amount of memory allocated for shared memory buffers used for caching data. Increasing this value can improve read performance by reducing the frequency of disk reads.

PostgreSQL uses 'double buffering', meaning that PostgreSQL uses its own internal buffer as well as kernel buffered IO. In short, this means that data is stored in memory twice.

Generally speaking, the value for shared_buffers should be roughly 25% of the total system RAM for a dedicated DB server. The value for shared_buffers should never be set to reserve all of the system RAM for PostgreSQL. A value over 25% of the system RAM can be useful if, for example, it is set such that the entire database working set of data can fit in cache, as this would greatly reduce the amount of time reading from disk.
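Applying the 25% guideline on, say, a dedicated server with 32 GB of RAM would look like this in postgresql.conf (the figure is illustrative; changing shared_buffers requires a server restart):

```
# postgresql.conf: roughly 25% of 32 GB system RAM
shared_buffers = 8GB
```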

wal_buffers

The WAL buffer is a staging area for WAL data before it is written to persistent storage.

Write-Ahead Logging (WAL) is a standard method for ensuring integrity of data. Much like in the shared_buffers setting, PostgreSQL writes WAL records into buffers and then these buffers are flushed to disk.

The size of the buffer is set by the wal_buffers setting; by default (-1) it is sized automatically as 1/32 of shared_buffers, up to a maximum of 16MB. If the system being tuned has a large number of concurrent connections, then a higher value for wal_buffers can provide better performance.

effective_cache_size

The effective_cache_size value provides a rough estimate of how much memory is available for disk caching by the operating system and within the database itself, after taking into account what's used by the OS and other applications.

This value is used only by the PostgreSQL query planner to figure out whether plans it's considering would be expected to fit in RAM or not. As such, it's a bit of a fuzzy number to define for general use cases.

A conservative value for effective_cache_size would be 1/2 of the total memory available on the system. Most commonly, the value is set to 75% of the total system memory on a dedicated DB server, but can vary depending on the specific discrete needs on a particular server workload.

If the value for effective_cache_size is too low, then the query planner may decide not to use some indexes, even if they would help greatly increase query speed.
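Applying the 75% guideline on, say, a 32 GB dedicated server might look like this (value illustrative; unlike shared_buffers, this setting takes effect without a restart):

```sql
ALTER SYSTEM SET effective_cache_size = '24GB';
SELECT pg_reload_conf();
```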

Commit Log

The commit log (CLOG) keeps the states of all transactions, and is part of the concurrency control mechanism. The commit log is allocated in shared memory and used throughout transaction processing.

PostgreSQL defines the following four transaction states.

  • IN_PROGRESS
  • COMMITTED
  • ABORTED
  • SUB-COMMITTED

max_connections

This parameter specifies the maximum number of concurrent connections allowed to the server. Increasing this value can improve scalability by allowing more clients to connect to the server simultaneously.
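How close the server is to the limit can be checked against pg_stat_activity (a sketch):

```sql
-- Current backends versus the configured ceiling
SELECT count(*) AS connections,
       current_setting('max_connections')::int AS max_connections
FROM pg_stat_activity;
```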

Disk - tuning parameters

random_page_cost is the cost of a seek+read on the disk. It is relative to seq_page_cost, which is 1. The default (4) for random_page_cost is set too high for modern machines and network storage; normally it can be lowered to between 2 and 1.x. For SSD disks you could even set it to 1.0, since seeking is almost free on SSDs.
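On SSD-backed storage, lowering it cluster-wide might look like this (value illustrative):

```sql
ALTER SYSTEM SET random_page_cost = 1.1;
SELECT pg_reload_conf();
```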

Query plan

explain (analyze) select * from users where customer_id < 'ID_HERE'::uuid;

When you run ANALYZE, what it does behind the scenes is compute stats about the distribution of data in your tables. One of those stats is called histogram_bounds, which is a range of equally spaced buckets that ANALYZE built after randomly sampling the table.

You can see what's in that histogram via the pg_stats view:

SELECT array_length(histogram_bounds, 1) histogram_size, tablename, attname, histogram_bounds FROM pg_stats WHERE tablename = 'users';

analyze users;

Unlogged table

An UNLOGGED table can be created with less I/O because it does not write WAL. It is better than a temporary table because it doesn't disappear if you disconnect and then reconnect. However, by choosing an unlogged table you are telling PostgreSQL that you can recreate the table's contents in the (unlikely) event of a crash, because its contents are lost after a crash.

CREATE UNLOGGED TABLE dup_cust AS
SELECT *
FROM ...

Lock table

To prevent any INSERT, UPDATE, or DELETE operations on a table:

LOCK TABLE new_cust IN SHARE ROW EXCLUSIVE MODE;

Row identity

In Postgres, every row has a hidden column named ctid. It denotes the physical location of the row you are observing: for example, duplicate rows will all have different ctid values.

We can use it to remove duplicates:

BEGIN;
LOCK TABLE new_cust IN SHARE ROW EXCLUSIVE MODE;

CREATE TEMPORARY TABLE dups_cust AS
SELECT customerid, min(ctid) AS min_ctid
FROM new_cust
GROUP BY customerid
HAVING count(*) > 1;

DELETE FROM new_cust
USING dups_cust
WHERE new_cust.customerid = dups_cust.customerid
AND new_cust.ctid != dups_cust.min_ctid;

COMMIT;

VACUUM new_cust;

The BEGIN and COMMIT commands wrap the LOCK and DELETE commands into a single transaction, which is required. Otherwise, the lock will be released immediately after being taken.

Another reason to use a single transaction is that we can always roll back if anything goes wrong.

JIT

When JIT compilation is enabled, PostgreSQL will check your query, and if it happens to be time-consuming enough, highly optimized code for it will be created on the fly (just-in-time).

  • Configuring JIT

To use JIT, it has to be enabled at compile time (when running ./configure). The following configuration options are available:

 --with-llvm build with LLVM based JIT support
...
 LLVM_CONFIG path to llvm-config command

Some Linux distributions ship an extra package containing support for JIT. If you want to make use of JIT, make sure those packages are installed. Once you have made sure that JIT is available, the following configuration parameters will be available so that you can fine-tune JIT compilation for your queries:

#jit = on                         # allow JIT compilation
#jit_provider = 'llvmjit'         # JIT implementation to use
#jit_above_cost = 100000          # perform JIT compilation if
                                  # available
                                  # and query more expensive,
                                  # -1 disables
#jit_optimize_above_cost = 500000 # optimize JITed functions
                                  # if query is
                                  # more expensive, -1 disables
#jit_inline_above_cost = 500000   # attempt to inline operators and
                                  # functions if query is
                                  # more expensive,
                                  # -1 disables

jit_above_cost means that JIT is only considered if the expected cost is at least 100,000 units.
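Assuming the server was built with JIT support, a quick way to watch it kick in is to lower the threshold in a session and look for the JIT section at the end of the EXPLAIN (ANALYZE) output (a sketch):

```sql
SET jit TO on;
SET jit_above_cost TO 10;  -- make JIT trigger even for cheap queries
EXPLAIN (ANALYZE) SELECT sum(x) FROM generate_series(1, 100000) AS t(x);
```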

  • Running queries
SET jit TO off;
SET jit TO on;