postgres table and index bloat - ghdrako/doc_snipets GitHub Wiki
- https://github.com/ioguix/pgsql-bloat-estimation/blob/master/table/table_bloat.sql
- https://github.com/ioguix/pgsql-bloat-estimation/blob/master/btree/btree_bloat.sql
- https://github.com/reorg/pg_repack
- https://hakibenita.com/postgresql-unused-index-size
- https://github.com/ioguix/pgsql-bloat-estimation
- https://www.percona.com/blog/pg_repack-rebuild-postgresql-database-objects-online
- https://www.percona.com/blog/pg_repack-rebuild-postgresql-database-objects-online
- https://github.com/andyatkinson/pg_scripts/blob/main/top_updated_tables.sql
- https://docs.crunchybridge.com/insights-metrics/bloat-and-vacuum#
Causes of Table Bloat in PostgreSQL
- Multi-Version Concurrency Control (MVCC) - allow concurrent transactions without locking entire tables
- Frequent Updates or Deletes - When a row is updated, PostgreSQL creates a new version while keeping the old one until all referencing transactions are complete
- Long-Running Transactions - PostgreSQL tracks row visibility using transaction IDs (XIDs). If a transaction remains open for a long time, dead tuples cannot be reused because the database does not know if the transaction still needs them.
- Ineffective VACUUM Process
- Lack of or Delayed Autovacuum
- TOAST tables can bloat if large object columns are frequently updated or deleted without corresponding vacuuming, contributing to overall table bloat
- Indexes can also bloat similarly to tables
Effects of Table Bloat
- Increased Disk Space Usage
- Slower Query Performance
- Inefficient Indexes
- Increased I/O: Bloat leads to increased disk I/O as PostgreSQL reads and writes more data than necessary, impacting overall system performance.
Identifying Table Bloat
pg_stat_user_tables: This system view provides information about tables, including the count of live and dead tuples.
SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE n_dead_tup > 0;
SELECT n_live_tup live_tupes, n_dead_tup dead_tuples,
round((n_dead_tup::numeric/n_live_tup)*100,2) percent_dead_tuples
FROM pg_stat_user_tables WHERE relname='rental';
pgstattuple extension: This extension provides detailed information about table bloat, which includes how much free space is currently available in the table.
Note that using pgstattuple requires scanning the entire table, which can cause a significant load, but also provides more accurate information.
CREATE EXTENSION pgstattuple;
SELECT * FROM pgstattuple('your_table_name');
• pg_freespacemap : This extension shows free space in tables and indexes.
Preventing and Reducing Table Bloat
- VACUUM Frequently
- Tune Autovacuum
- autovacuum_vacuum_threshold
- autovacuum_vacuum_scale_factor
- autovacuum_naptime
- VACUUM FULL for Severe Bloat
VACUUM FULL table_name; - Rebuild Indexes
REINDEX TABLE table_name; - Partitioning - Partition large tables so that older, less frequently accessed data can be vacuumed separately, reducing bloat in frequently updated partitions.
- Avoid Long-Running Transactions - Keep transactions as short as possible to prevent dead tuples from being vacuumed and reclaimed.
- Archiving or Deleting Old Data - Regularly archive or delete old or unnecessary data to maintain manageable table sizes.
pgcompacttable
- https://github.com/dataegret/pgcompacttable It works by reordering tuples from the end of a table towards the front of a table, which allows the table size to shrink.
reindex
In PostgreSQL 12, REINDEX gained support for the CONCURRENTLY option. This works similarly to how pg_repack does, but it’s native to PostgreSQL. By being native, it continues to receive improvements in newer versions.
REINDEX (VERBOSE) INDEX CONCURRENTLY index_trips_on_driver_id;
Providing a table name as an option to REINDEX performs a reindex for all indexes that are on the specified table.
REINDEX (VERBOSE) TABLE CONCURRENTLY trips;