postgres visibility map - ghdrako/doc_snipets GitHub Wiki
In PostgreSQL, the Visibility Map (VM) is a specialized data structure that optimizes the VACUUM process and enhances the performance of index-only scans. The VM tracks whether all tuples (rows) on a specific page of a table are visible to all active transactions. This enables PostgreSQL to skip unnecessary operations during VACUUM and perform faster index-only scans when possible.
Key Functions Visibility Map
- Optimizing VACUUM
- The VM indicates whether a page contains any dead tuples (deleted or updated rows that need cleanup).
- If a page is marked as " all visible ," the VACUUM process can skip it entirely since there are no dead tuples to remove. This increases VACUUM efficiency, especially for large tables, by avoiding unnecessary scans.
- Index-Only Scans:
- The VM helps optimize index-only scans, allowing PostgreSQL to return query results using only index data without reading the corresponding table (heap) pages.
- For an index-only scan to be possible, PostgreSQL must ensure that all tuples in the relevant heap pages are visible to all transactions (no row visibility checks needed). The VM provides this information.
- If a page is marked " all visible " in the VM, PostgreSQL can skip fetching actual data from the table and return results based on the index alone.
Working of the Visibility Map
- The VM is a bitmap where each bit corresponds to a page (8 KB block) in a table.
- Each bit indicates whether all tuples on that page are visible to every transaction.
- If a bit is set (marked as " all visible "), PostgreSQL knows that no further visibility checks are needed for that page.
- If the bit is clear (not marked), it means some tuples on the page might not be visible to all transactions, necessitating a scan if required.
- The VM is stored as a separate file alongside the table (heap) data and is updated by the VACUUM process.
Updating the Visibility Map
- During VACUUM :
- When VACUUM processes a table, it identifies pages containing only live tuples (visible to all active transactions) and updates the VM to mark those pages as " all visible ."
- If dead tuples are found on a page, VACUUM cleans them up, but thepage will not be marked " all visible " until the next VACUUM , assuming no further changes occur.
- When VACUUM processes a table, it identifies pages containing only live tuples (visible to all active transactions) and updates the VM to mark those pages as " all visible ."
- When Rows are Modified:
- When a page is modified (for example, rows inserted, updated, or deleted), the bit in the VM for that page is cleared. This signals that the page may contain rows requiring further visibility checks, and it cannot be skipped by future VACUUM operations or index-only scans.
- The bit will be set again after the next successful VACUUM run if no further changes have been made to the page.
Example Use Case: VACUUM Optimization
- Suppose you have a large table with 1 million pages, but only 100,000 have been modified recently (updated or deleted rows).
- The VM will mark the 900,000 unmodified pages as " all visible ."
- When VACUUM runs, it only needs to scan the 100,000 modified pages, significantly reducing work and speeding up the process.
Example Use Case: Index-Only Scan
- For a query like:
SELECT id FROM users WHERE id > 1000;
*If an index exists on the id column and the corresponding pages in the table are marked " all visible " in the VM, PostgreSQL can perform an index-only scan. - Instead of fetching actual rows from the table, PostgreSQL uses the index alone to satisfy the query, improving performance by avoiding disk I/O.
Limitations and Considerations
Visibility Map is Approximate
: The VM provides a broad indication of tuple visibility at the page level but does not store tuple-level visibility information. If even a single tuple on a page is not visible to all transactions, the entire page will be marked as not " all visible ."Needs VACUUM to be Effective
: For the VM to remain accurate, VACUUM (or autovacuum) must run periodically. Without VACUUM , the map will not be updated, impacting both index-only scans and VACUUM efficiency.Disk Space Overhead
: The VM is stored in a separate file on disk, introducing a small storage overhead (approximately 1 byte per page of the table).