postgres index types - ghdrako/doc_snipets GitHub Wiki

Index types

  • Balanced Tree (B-Tree) - the default index type in PostgreSQL. Besides equality (=), a B-Tree index works well with the following operators and constructs:
    • <
    • >
    • <=
    • >=
    • IS NULL
    • IS NOT NULL
    • BETWEEN
    • IN

B-tree indexes can also be used to retrieve data in sorted order.
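As a minimal sketch of the points above, the following creates a B-tree index and runs a range query and a sorted query that such an index can serve. The table `orders` and its columns are hypothetical names used only for illustration, and whether the planner actually picks the index depends on table size and statistics (check with EXPLAIN).

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE orders (
    id          bigint PRIMARY KEY,
    customer_id bigint NOT NULL,
    created_at  timestamptz NOT NULL
);

-- USING btree is the default and could be omitted.
CREATE INDEX orders_created_at_idx ON orders USING btree (created_at);

-- A range predicate (BETWEEN) that a B-tree index can serve.
SELECT id
FROM orders
WHERE created_at BETWEEN '2024-01-01' AND '2024-01-31';

-- The same index can return rows in sorted order,
-- which may let the planner skip a separate sort step.
SELECT id, created_at
FROM orders
ORDER BY created_at DESC
LIMIT 10;
```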

  • Hash index - this index is built on the result of a 32-bit hash function applied to the value of the indexed column (a hash index covers a single column). It is important to note that a hash index can be used only for equality operators, not for range or inequality operators. Because the index stores hash values, it cannot compare two entries to determine their ordering; only equality (which produces the very same hash value) can be evaluated.
  • Block Range Index (BRIN) is a type of index based on the range of values stored in groups of adjacent data blocks. The idea is that every block range has a minimum and a maximum value, and the index stores that pair of values for each block range. When a query searches for a particular value, the index identifies the block ranges in which the value may be found, but all the tuples in those blocks must still be evaluated.

A table for which a BRIN index is created is considered a sequence of block ranges, where each range consists of a fixed number of adjacent blocks. For each range, a BRIN index entry contains a summary of column values contained in the block range. For example, a summary may contain the minimum and maximum values of the timestamp column in an event log table. To find any value of the indexed attribute, it is sufficient to find an appropriate block range (using the index) and then scan all blocks in the range.

The structure of the summary depends on the type of the column being indexed. For intervals, a summary may be an interval containing all intervals in the block range. For spatial data, a summary can be a bounding box containing all boxes in the block range. If the column values do not follow the physical order of the rows in the table, a scan of a BRIN index will return many block ranges to be scanned, and the index loses most of its benefit.

Summarization is expensive. Therefore, PostgreSQL provides multiple choices for BRIN index maintenance: summarization of newly filled block ranges can be requested automatically when the autosummarize storage parameter is enabled; alternatively, delayed summarization is done together with vacuum or can be started manually with brin_summarize_new_values().
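The event log case described above could look roughly like this. The table and index names are hypothetical; `pages_per_range` and `autosummarize` are BRIN storage parameters, and the values chosen here are only illustrative.

```sql
-- Hypothetical append-only table where event_time grows with insertion order.
CREATE TABLE event_log (
    event_time timestamptz NOT NULL,
    payload    text
);

-- pages_per_range: number of blocks summarized by one index entry (default 128).
-- autosummarize: queue summarization of a block range once the next one starts to fill.
CREATE INDEX event_log_time_brin ON event_log
    USING brin (event_time)
    WITH (pages_per_range = 64, autosummarize = on);

-- Manually summarize any block ranges that have no summary yet.
SELECT brin_summarize_new_values('event_log_time_brin');

-- A range query that can skip block ranges whose min/max do not match.
SELECT count(*)
FROM event_log
WHERE event_time >= '2024-01-01' AND event_time < '2024-02-01';
```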

  • Generalized Inverted Index (GIN) is a type of index in which an entry, instead of pointing to a single tuple, maps one key to all the tuples that contain it. Usually, this kind of index is used in full-text search scenarios, where the indexed text contains many duplicated keys (for example, the same word or term) that occur in different places (for example, the same word in different phrases and lines).
    • GIN is designed for handling cases where the items to be indexed are composite values, and the queries to be handled by the index need to search for element values that appear within the composite items. For example, the items could be documents, and the queries could be searches for documents containing specific words.
    • GIN indexes are “inverted indexes” which are appropriate for data values that contain multiple component values, such as arrays. An inverted index contains a separate entry for each component value. Such an index can efficiently handle queries that test for the presence of specific component values.
    • The GIN access method is the foundation for the PostgreSQL Full Text Search support.
  • Generalized Search Tree (GiST) is a platform on top of which new index types can be built. The idea is to provide a pluggable infrastructure where you can define operators and features that can index a data structure. Its implementation in PostgreSQL supports 2-dimensional data types such as the geometric point type, as well as the range data types. Those data types don’t support a total order and as a consequence can’t be indexed properly in a B-tree index.

GiST is a family of index structures, each of which supports a certain data type and can be configured to implement several different tree-based index structures. In particular, it implements the index structure for spatial data known as an R-tree. Support for R-tree and a few other index structures is included in the PostgreSQL distribution.

An R-tree index supports a search on spatial data. An index key for an R-tree always represents a rectangle in a multidimensional space. A search returns all objects having a non-empty intersection with the query rectangle. The structure of an R-tree is similar to the structure of a B-tree; however, splitting overflowed nodes is much more complicated. R-tree indexes are efficient for a small number of dimensions (typically, two to three).
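A minimal sketch of the rectangle-intersection search described above, using the built-in `box` type and the `&&` (overlaps) operator. The table `landmark` and its columns are hypothetical names.

```sql
-- Hypothetical table holding one bounding rectangle per landmark.
CREATE TABLE landmark (
    name text NOT NULL,
    area box  NOT NULL
);

-- GiST index over the rectangles (R-tree style behavior).
CREATE INDEX landmark_area_gist ON landmark USING gist (area);

-- Return all landmarks whose rectangle overlaps the query rectangle.
SELECT name
FROM landmark
WHERE area && box '((0,0),(10,10))';
```

The same `USING gist` form applies to range columns (for example `tstzrange`), where `&&` tests whether two ranges overlap.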

  • Space-Partitioned GiST (SP-GiST), a spatial index used in geographical applications. SP-GiST is the only PostgreSQL index access method that supports non-balanced disk-based data structures, such as quadtrees, k-d trees, and radix trees (tries). This is useful when you want to index 2-dimensional data with very different densities.

For text search, a GiST index is efficient for collections of documents where the total number of different terms is small. This is uncommon with texts in natural languages, so GIN indexes are usually more efficient in this case.
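For the full-text case, a GIN index is typically built over a `tsvector` expression, as in this sketch. The table `article` is a hypothetical name, and the `'english'` text search configuration is an assumption about the documents' language.

```sql
-- Hypothetical table of documents.
CREATE TABLE article (
    id   bigint PRIMARY KEY,
    body text NOT NULL
);

-- Index the lexemes of each document. The two-argument form of
-- to_tsvector is used because an index expression must be immutable.
CREATE INDEX article_body_gin ON article
    USING gin (to_tsvector('english', body));

-- The query repeats the indexed expression so the index can be matched.
SELECT id
FROM article
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'index & scan');
```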

  • A Bloom filter is a space-efficient data structure that is used to test whether an element is a member of a set. In the case of an index access method, it allows fast exclusion of non-matching tuples via signatures whose size is determined at index creation. This type of index is most useful when a table has many attributes and queries test arbitrary combinations of them. A traditional B-tree index is faster than a Bloom index, but it can require many B-tree indexes to support all possible queries where one needs only a single Bloom index. Note however that Bloom indexes only support equality queries, whereas B-tree indexes can also perform inequality and range searches.
    • The Bloom filter index is implemented as a PostgreSQL extension starting in PostgreSQL 9.6, so to be able to use this access method it’s necessary to first run CREATE EXTENSION bloom.

Both Bloom indexes and BRIN indexes are mostly useful when covering multiple columns. In the case of Bloom indexes, they are useful when the queries themselves are referencing most or all of those columns in equality comparisons.
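A sketch of a single Bloom index covering several columns, assuming the `bloom` extension is available on the server. The table `product` and its columns are hypothetical; `length` and `colN` are the extension's index parameters and are shown here with their default values.

```sql
-- The access method lives in an extension and must be installed first.
CREATE EXTENSION IF NOT EXISTS bloom;

-- Hypothetical table with several attributes queried in arbitrary combinations.
CREATE TABLE product (
    color    integer,
    size     integer,
    category integer,
    vendor   integer
);

-- length: signature size in bits; colN: bits generated for the N-th column.
CREATE INDEX product_bloom ON product
    USING bloom (color, size, category, vendor)
    WITH (length = 80, col1 = 2, col2 = 2, col3 = 2, col4 = 2);

-- Any combination of equality tests on the indexed columns can use the index.
SELECT *
FROM product
WHERE size = 3 AND vendor = 42;
```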

CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] name ] ON [ ONLY ] table_name [ USING method ]
    ( { column_name | ( expression ) } [ COLLATE collation ] [ opclass ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] )
    [ INCLUDE ( column_name [, ...] ) ]
    [ WITH ( storage_parameter = value [, ... ] ) ]
    [ TABLESPACE tablespace_name ]
    [ WHERE predicate ]

It is possible to store an index in another tablespace than that of the underlying table, and this can be useful to store important indexes in faster storage.

  • INCLUDE clause allows you to specify some extra columns of the underlying table that are going to be stored in the index, even if not indexed. The idea is that if the index is useful for an index-only scan, you can still get extra information without the trip to the underlying table. Of course, having a covering index (which is the name of an INCLUDE clause index) means that the index is going to grow in size and, at the same time, every tuple update could require extra index update effort.
  • USING clause allows the specification of the type of index to be built, and if none is specified, the default B-Tree is used.
  • CONCURRENTLY clause allows the creation of an index in a concurrent way. Normally, while an index is being built, the underlying table is locked against changes (reads are still allowed) so that the build can finish indexing the tuple values. In a concurrent index creation, the table allows changes even during index creation, but once the index has been built, another pass on the underlying table is required to “adjust” what has changed in the meantime.
CREATE INDEX CONCURRENTLY index_name ON table_name USING btree (column_name);
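The clauses above can be combined, as in this sketch of a covering index built concurrently. The table `account` and the index name are hypothetical. Whether the query is answered by an index-only scan also depends on how recently the table was vacuumed.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE account (
    id           bigint PRIMARY KEY,
    email        text NOT NULL,
    display_name text
);

-- email is the indexed key; display_name is only stored in the index (INCLUDE).
-- Note: CREATE INDEX CONCURRENTLY cannot run inside a transaction block.
CREATE UNIQUE INDEX CONCURRENTLY account_email_idx
    ON account USING btree (email)
    INCLUDE (display_name);

-- Both referenced columns are present in the index,
-- so an index-only scan is possible.
SELECT display_name
FROM account
WHERE email = 'a@example.com';
```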

Advanced indexes

  • Multicolumn indexes - up to 32 columns
  • Indexes and ORDER BY
  • Combining multiple indexes
  • Unique indexes
  • Indexes on expressions - In order to be used in an index, a user-defined function must be declared as IMMUTABLE, which means its output must be the same for the very same input.
  • Partial indexes
  • Partial unique indexes
  • Index-only scans