postgres table and index bloat - ghdrako/doc_snipets GitHub Wiki
- https://github.com/ioguix/pgsql-bloat-estimation/blob/master/table/table_bloat.sql
- https://github.com/ioguix/pgsql-bloat-estimation/blob/master/btree/btree_bloat.sql
- https://github.com/reorg/pg_repack
- https://hakibenita.com/postgresql-unused-index-size
- https://github.com/ioguix/pgsql-bloat-estimation
- https://www.percona.com/blog/pg_repack-rebuild-postgresql-database-objects-online
- https://www.percona.com/blog/pg_repack-rebuild-postgresql-database-objects-online
- https://github.com/andyatkinson/pg_scripts/blob/main/top_updated_tables.sql
- https://docs.crunchybridge.com/insights-metrics/bloat-and-vacuum#
Causes of Table Bloat in PostgreSQL
- Multi-Version Concurrency Control (MVCC) - allow concurrent transactions without locking entire tables
- Frequent Updates or Deletes - When a row is updated, PostgreSQL creates a new version while keeping the old one until all referencing transactions are complete
- Long-Running Transactions - PostgreSQL tracks row visibility using transaction IDs (XIDs). If a transaction remains open for a long time, dead tuples cannot be reused because the database does not know if the transaction still needs them.
- Ineffective VACUUM Process
- Lack of or Delayed Autovacuum
- TOAST tables can bloat if large object columns are frequently updated or deleted without corresponding vacuuming, contributing to overall table bloat
- Indexes can also bloat similarly to tables
Effects of Table Bloat
- Increased Disk Space Usage
- Slower Query Performance
- Inefficient Indexes
- Increased I/O: Bloat leads to increased disk I/O as PostgreSQL reads and writes more data than necessary, impacting overall system performance.
Identifying Table Bloat
pg_stat_user_tables
: This system view provides information about tables, including the count of live and dead tuples.
SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE n_dead_tup > 0;
pgstattuple extension
: This extension provides detailed information about table bloat.
CREATE EXTENSION pgstattuple;
SELECT * FROM pgstattuple('your_table_name');
• pg_freespacemap
: This extension shows free space in tables and indexes.
Preventing and Reducing Table Bloat
- VACUUM Frequently
- Tune Autovacuum
- autovacuum_vacuum_threshold
- autovacuum_vacuum_scale_factor
- autovacuum_naptime
- VACUUM FULL for Severe Bloat
VACUUM FULL table_name;
- Rebuild Indexes
REINDEX TABLE table_name;
- Partitioning - Partition large tables so that older, less frequently accessed data can be vacuumed separately, reducing bloat in frequently updated partitions.
- Avoid Long-Running Transactions - Keep transactions as short as possible to prevent dead tuples from being vacuumed and reclaimed.
- Archiving or Deleting Old Data - Regularly archive or delete old or unnecessary data to maintain manageable table sizes.
pgcompacttable
- https://github.com/dataegret/pgcompacttable It works by reordering tuples from the end of a table towards the front of a table, which allows the table size to shrink.
reindex
In PostgreSQL 12, REINDEX
gained support for the CONCURRENTLY
option. This works similarly to how pg_repack
does, but it’s native to PostgreSQL. By being native, it continues to receive improvements in newer versions.
REINDEX (VERBOSE) INDEX CONCURRENTLY index_trips_on_driver_id;
Providing a table name as an option to REINDEX performs a reindex for all indexes that are on the specified table.
REINDEX (VERBOSE) TABLE CONCURRENTLY trips;