postgres table and index bloat - ghdrako/doc_snipets GitHub Wiki

Causes of Table Bloat in PostgreSQL

  • Multi-Version Concurrency Control (MVCC) - allow concurrent transactions without locking entire tables
  • Frequent Updates or Deletes - When a row is updated, PostgreSQL creates a new version while keeping the old one until all referencing transactions are complete
  • Long-Running Transactions - PostgreSQL tracks row visibility using transaction IDs (XIDs). If a transaction remains open for a long time, dead tuples cannot be reused because the database does not know if the transaction still needs them.
  • Ineffective VACUUM Process
  • Lack of or Delayed Autovacuum
  • TOAST tables can bloat if large object columns are frequently updated or deleted without corresponding vacuuming, contributing to overall table bloat
  • Indexes can also bloat similarly to tables

Effects of Table Bloat

  • Increased Disk Space Usage
  • Slower Query Performance
  • Inefficient Indexes
  • Increased I/O: Bloat leads to increased disk I/O as PostgreSQL reads and writes more data than necessary, impacting overall system performance.

Identifying Table Bloat

  • pg_stat_user_tables : This system view provides information about tables, including the count of live and dead tuples.
SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE n_dead_tup > 0;
  • pgstattuple extension: This extension provides detailed information about table bloat.
CREATE EXTENSION pgstattuple;
SELECT * FROM pgstattuple('your_table_name');

pg_freespacemap : This extension shows free space in tables and indexes.

Preventing and Reducing Table Bloat

  • VACUUM Frequently
  • Tune Autovacuum
    • autovacuum_vacuum_threshold
    • autovacuum_vacuum_scale_factor
    • autovacuum_naptime
  • VACUUM FULL for Severe Bloat VACUUM FULL table_name;
  • Rebuild Indexes REINDEX TABLE table_name;
  • Partitioning - Partition large tables so that older, less frequently accessed data can be vacuumed separately, reducing bloat in frequently updated partitions.
  • Avoid Long-Running Transactions - Keep transactions as short as possible to prevent dead tuples from being vacuumed and reclaimed.
  • Archiving or Deleting Old Data - Regularly archive or delete old or unnecessary data to maintain manageable table sizes.

pgcompacttable

reindex

In PostgreSQL 12, REINDEX gained support for the CONCURRENTLY option. This works similarly to how pg_repack does, but it’s native to PostgreSQL. By being native, it continues to receive improvements in newer versions.

REINDEX (VERBOSE)  INDEX  CONCURRENTLY index_trips_on_driver_id;

Providing a table name as an option to REINDEX performs a reindex for all indexes that are on the specified table.

REINDEX (VERBOSE)  TABLE  CONCURRENTLY trips;