postgres tempfile work_mem sort - ghdrako/doc_snipets GitHub Wiki

There are 3 types of sort algorithms in PostgreSQL:

  • external sort Disk
  • quicksort
  • top-N heapsort

Parameters

  • temp_tablespaces is a parameter, which allows administrators to control the location of temporary files sent to disk. Using a separate tablespace for temporary files can also help to speed up sorting.
  • temp_file_limit - By default, this has a value of -1 which means there is no limit on the size of the temp file. Set value to this parameter !!!

Disk-Based Storage for Sorting:

  • When work_mem is insufficient, PostgreSQL uses temporary disk files (not temporary tables) to handle the excess data.
  • The data is divided into manageable chunks that fit into work_mem, each chunk is sorted in memory, and the sorted chunks are written to disk.
  • A merge sort is performed on these disk-based sorted chunks to produce the final sorted output.

If you run out of temp file space you could see a PGError that reads something like could not write to temporary file: No space left on device. This will cause the queries in question to fail to execute entirely.

How knowing if you’re using temp files \

  • In EXPLAIN ANALYZE E.g Sort Method: external merge Disk:
  • By enabling log_temp_files on your database, you will be able to search your Postgres logs for temporary file to see if your queries are using them.
  • Some of the things returned from running SELECT * FROM pg_stat_database; is temp_files and temp_bytes, referring respectively to the number of temp files and the number of bytes written to temp files. These numbers however, return an aggregation of all the files created over the lifetime of the database, and do not reflect the current temp file usage. These stats can be reset with select pg_stat_statements_reset()
SELECT interval '1 millisecond' * total_time AS total_exec_time,
to_char(calls, 'FM999G999G999G990') AS ncalls,
total_time / calls AS avg_exec_time_ms,
interval '1 millisecond' * (blk_read_time + blk_write_time) AS sync_io_time,
temp_blks_written,
query AS query
FROM pg_stat_statements WHERE userid = (SELECT usesysid FROM pg_user WHERE usename = current_user LIMIT 1)
AND temp_blks_written > 0
ORDER BY temp_blks_written DESC
LIMIT 20;
  1. Włączenie logowania tworzenia plików tymczasowych
log_line_prefix = '%m [%p] %u@%d '  
  • %m to timestamp,
  • %p to PID procesu,
  • %u to user,
  • %d to database.
# loguj wszystkie pliki tymczasowe większe niż 0 KB
log_temp_files = 0

# (opcjonalnie) jeśli chcesz logować tylko duże pliki, np. > 1 MB
# log_temp_files = 1024

Przeladowanie konfiguracji:

sudo systemctl reload postgresql
pg_ctl reload -D /ścieżka/do/data
2025-05-14 14:22:11.123 CEST [12345] app_user@mydb LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp12345.0" size  10485760
  1. Wlaczenie rejestracji zapytan
log_statement = 'all'               # loguje WSZYSTKIE zapytania
log_min_duration_statement = 0      # loguj WSZYSTKIE zapytania i ich czasy
2025-05-14 15:00:01.234 CEST [12345] user@db LOG:  statement: SELECT * FROM big_table ORDER BY col;
...
2025-05-14 15:00:01.567 CEST [12345] user@db LOG:  temporary file: path "base/pgsql_tmp/..." size 10485760
  1. Podnoszenie work_mem
  • Globalnie w postgresql.conf:
work_mem = '16MB'
  • dotyczy wszystkich nowych sesji.
ALTER ROLE twoj_uzytkownik SET work_mem = '32MB';
  • Per-session ad hoc:
    SET work_mem = '64MB';
Uwaga: work_mem jest przydzielane per-operacja w zapytaniu. Jeśli masz w jednym zapytaniu np. trzy sorty i dwa hash-joiny, każda z tych operacji może dostać do work_mem tyle pamięci. Uważaj więc, żeby nie przekroczyć pamięci RAM na serwerze.