postgres indexes
- https://www.postgresql.org/docs/current/btree-implementation.html
- https://www.postgresql.org/docs/current/sql-createindex.html
- https://www.postgresql.org/docs/current/indexes.html
- Percona - PostgreSQL Indexes Tutorial Course
- https://use-the-index-luke.com/
Indexes do not change the result of a query, but they can speed it up.
PostgreSQL offers several kinds of indexes (see the sketch after this list):
- a classic index on a single table column;
- a composite (multicolumn) index on several table columns;
- a partial index, which restricts the indexed rows with a WHERE clause;
- a functional (expression) index, which indexes the result of a function applied to one or more table columns;
- a covering index, which contains more columns than needed for filtering, so that the table does not have to be read at all and an Index Only Scan becomes possible.
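A minimal sketch of each kind, assuming a hypothetical orders table; all names are illustrative only:
CREATE INDEX idx_orders_customer ON orders (customer_id);                        -- single column
CREATE INDEX idx_orders_cust_date ON orders (customer_id, order_date);           -- multicolumn
CREATE INDEX idx_orders_open ON orders (order_date) WHERE status = 'open';       -- partial
CREATE INDEX idx_orders_email_lower ON orders (lower(customer_email));           -- functional (expression)
CREATE INDEX idx_orders_covering ON orders (customer_id) INCLUDE (total_amount); -- covering (PostgreSQL 11+)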
Creating indexes is the developer's responsibility. The only exceptions are the indexes created automatically when a primary key or unique constraint is declared; in that case creation is automatic. Foreign key constraints require that the referenced table already have a primary key (or unique index), but they do not automatically create an index on the table that holds the foreign key.
Constraints that are transformed into an index:
- PRIMARY KEY
- UNIQUE
- EXCLUDE USING
An index in PostgreSQL can be built on a single column or on multiple columns at once; PostgreSQL supports indexes with up to 32 columns. When creating multi-column indexes, always place the most selective columns first: PostgreSQL considers a multi-column index from the first column onward, so if the leading columns are the most selective, the index access method will be the cheapest.
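A short sketch of why column order matters, using a hypothetical orders table where customer_id is assumed to be far more selective than status:
-- The more selective column goes first:
CREATE INDEX idx_orders_cust_status ON orders (customer_id, status);
-- Can use the index efficiently (the leading column is constrained):
SELECT * FROM orders WHERE customer_id = 42 AND status = 'open';
-- Cannot constrain the leading column, so the index is of little help:
SELECT * FROM orders WHERE status = 'open';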
CREATE INDEX index_name ON table_name [USING method]
(
column_name [ASC | DESC] [NULLS {FIRST | LAST }],
...
);
The method can be one of btree, hash, gist, spgist, gin, or brin; PostgreSQL uses btree by default.
To rebuild an existing index there are two choices (a sketch follows this list):
- use the REINDEX command (not suggested);
- drop the index and build it again (suggested).
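A minimal sketch of the suggested approach, done without blocking writes; the index and table names are hypothetical:
DROP INDEX CONCURRENTLY IF EXISTS idx_orders_customer;
CREATE INDEX CONCURRENTLY idx_orders_customer ON orders (customer_id);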
- Multicolumn indexes – an index that includes multiple columns in its definition
- Partial indexes – an expression in the index definition that reduces the set of table rows covered; shown with an example at https://hakibenita.com/postgresql-unused-index-size
- Covering indexes – an index that covers all columns needed by a query
- Functional (expression) indexes – an index on the result of an expression rather than a plain column
- Operator classes – operators to be used by the index, specific to a column
- https://www.postgresql.org/docs/current/indexes-index-only-scans.html
- https://github.com/ghdrako/doc_snipets/wiki/postgres-extension-pg_trgm
PostgreSQL version 11 added the INCLUDE keyword, which can be used to create covering indexes. The INCLUDE keyword specifies columns that only supply data, called "payload" columns. Payload columns cannot be used for filtering in WHERE clauses or JOIN conditions, but they supply data for columns listed in the SELECT clause.
CREATE UNIQUE INDEX clients_idx1 ON clients (id_client) INCLUDE (nom_client) ;
SELECT id_client,nom_client FROM clients WHERE id_client > 100 ; -- Index Only Scan
A covering index solves this by keeping in the index not only the columns used as search criteria but also the columns needed for the result. A covering index can therefore enable an index-only scan, because the table no longer has to be visited. To benefit from this, every column returned by the query must be present in the index; if even one is missing, an index-only scan is not possible.
Another advantage of an Index Only Scan: the matching records are contiguous in the index (because it is sorted), so the number of disk accesses is much lower, which can bring huge gains for selective queries.
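A minimal sketch, reusing the clients example above; index-only scans also depend on the visibility map being fresh, which VACUUM maintains:
VACUUM clients; -- refreshes the visibility map so the scan can skip heap fetches
EXPLAIN SELECT id_client, nom_client FROM clients WHERE id_client > 100;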
CREATE TABLE evenements ( id int primary key, traite bool NOT NULL, type text NOT NULL, payload text );
-- 10,000 processed events
INSERT INTO evenements (id, traite, type) (
SELECT i, true,
CASE WHEN i % 3 = 0 THEN 'FACTURATION'
WHEN i % 3 = 1 THEN 'EXPEDITION'
ELSE 'COMMANDE'
END
FROM generate_series(1, 10000) as i);
-- and 10 not yet processed
INSERT INTO evenements (id, traite, type) (
SELECT i, false,
CASE WHEN i % 3 = 0 THEN 'FACTURATION'
WHEN i % 3 = 1 THEN 'EXPEDITION'
ELSE 'COMMANDE'
END
FROM generate_series(10001, 10010) as i);
\d evenements
Table "public.evenements"
 Column  |  Type   | Collation | Nullable | Default
---------+---------+-----------+----------+---------
 id      | integer |           | not null |
 traite  | boolean |           | not null |
 type    | text    |           | not null |
 payload | text    |           |          |
Indexes:
    "evenements_pkey" PRIMARY KEY, btree (id)
CREATE INDEX index_partiel on evenements (type) WHERE NOT traite ;
EXPLAIN SELECT * FROM evenements WHERE type = 'FACTURATION' ;
QUERY PLAN
----------------------------------------------------------------
Seq Scan on evenements (cost=0.00..183.12 rows=50 width=69)
Filter: (type = 'FACTURATION'::text)
EXPLAIN SELECT * FROM evenements WHERE type = 'FACTURATION' AND NOT traite ;
QUERY PLAN
----------------------------------------------------------------------------
Bitmap Heap Scan on evenements (cost=8.22..54.62 rows=25 width=69)
Recheck Cond: ((type = 'FACTURATION'::text) AND (NOT traite))
-> Bitmap Index Scan on index_partiel (cost=0.00..8.21 rows=25 width=0)
Index Cond: (type = 'FACTURATION'::text)
CREATE INDEX index_complet ON evenements (type);
SELECT idxname, pg_size_pretty(pg_total_relation_size(idxname::text)) FROM (VALUES ('index_complet'), ('index_partiel')) as a(idxname);
idxname | pg_size_pretty
---------------+----------------
index_complet | 88 kB
index_partiel | 16 kB
Note: for a partial index to be used, the query condition must be the logical equivalent of the condition in the index definition.
For example, in the previous queries the criterion traite IS FALSE instead of NOT traite does not use the index (in fact it is not the same criterion because of NULL: NULL = false returns NULL, whereas NULL IS false returns false).
On the other hand, conditions that are mathematically more restrictive than the index condition do allow the index to be used:
CREATE INDEX commandes_recentes_idx ON commandes (client_id)
WHERE date_commande > '2015-01-01' ;
EXPLAIN (COSTS OFF) SELECT * FROM commandes WHERE date_commande > '2016-01-01' AND client_id = 17 ;
QUERY PLAN
------------------------------------------------------
Index Scan using commandes_recentes_idx on commandes
Index Cond: (client_id = 17)
Filter: (date_commande > '2016-01-01'::date)
However, this partial index will not be used for a criterion earlier than 2015. Similarly, if the partial index contains a list of values with IN() or NOT IN(), it remains usable in principle:
CREATE INDEX commandes_1_3 ON commandes (numero_commande)
WHERE mode_expedition IN (1,3);
EXPLAIN (COSTS OFF) SELECT * FROM commandes WHERE mode_expedition = 1 ;
QUERY PLAN
---------------------------------------------
Index Scan using commandes_1_3 on commandes
Filter: (mode_expedition = 1)
DROP INDEX commandes_1_3 ;
CREATE INDEX commandes_not34 ON commandes (numero_commande)
WHERE mode_expedition NOT IN (3,4);
EXPLAIN (COSTS OFF) SELECT * FROM commandes WHERE mode_expedition = 1 ;
QUERY PLAN
-----------------------------------------------
Index Scan using commandes_not34 on commandes
Filter: (mode_expedition = 1)
DROP INDEX commandes_not34 ;
A typical use case for a partial index is the previous example: an application with hot data, frequently accessed and processed, and cold data that is mostly kept for history or archiving. For example, an online sales system will probably want indexes on orders whose status is not "closed": it will process orders that are pending, awaiting shipment, or being delivered, but will not need to process orders that have already been delivered, which are only used afterwards for statistical analysis. For example, still on the same table, statistical queries about shipments can take advantage of this index:
CREATE INDEX index_partiel_expes ON evenements (id)
WHERE type = 'EXPEDITION' ;
EXPLAIN SELECT count(id) FROM evenements WHERE type = 'EXPEDITION' ;
QUERY PLAN
----------------------------------------------------------------
 Aggregate (cost=106.68..106.69 rows=1 width=8)
   -> Index Only Scan using index_partiel_expes on evenements (cost=0.28..98.34 rows=3337 width=4)
- Avoid indexes where the indexed column is also the one filtered in the WHERE clause:
CREATE INDEX ON matable ( champ_filtre ) WHERE champ_filtre = …
- Preferred:
CREATE INDEX ON matable ( champ_resultat ) WHERE champ_filtre = …
Generally speaking, a partial index should index a column other than the one being filtered (which is, after all, already known). So in the previous example the indexed column (type) is not the column of the WHERE clause: the criterion is fixed, and what interests us is the type of the events to handle. Another useful partial index could be on id WHERE NOT traite, which simply returns the list of unprocessed ids of all types.
The goal is a very targeted and compact index, saving disk space and CPU load. However, a partial index only makes sense if it is much smaller than the "general" index (at least by half). With specialized partial indexes it is possible to "precompute" certain critical queries.
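A minimal sketch of the partial index suggested above, on the evenements table; the index name is illustrative:
-- "Precompute" the list of unprocessed event ids, whatever their type:
CREATE INDEX evenements_a_traiter_idx ON evenements (id) WHERE NOT traite;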
CREATE INDEX ON commandes (date_commande) ;
EXPLAIN (COSTS OFF) SELECT * FROM commandes WHERE extract('year' from date_commande) = 2011;
QUERY PLAN
--------------------------------------------------------------------------
Gather
  Workers Planned: 2
-> Parallel Seq Scan on commandes
Filter: (EXTRACT(year FROM date_commande) = '2011'::numeric)
EXPLAIN (COSTS OFF) SELECT * FROM commandes
WHERE date_commande BETWEEN '01-01-2011'::date AND '31-12-2011'::date;
QUERY PLAN
--------------------------------------------------------------------------
Index Scan using commandes_date_commande_idx on commandes
Index Cond: ((date_commande >= '2011-01-01'::date) AND (date_commande <= '2011-12-31'::date))
CREATE INDEX annee_commandes_idx ON commandes( extract('year' from date_commande) ) ;
EXPLAIN (COSTS OFF) SELECT * FROM commandes WHERE extract('year' from date_commande) = 2011;
QUERY PLAN
--------------------------------------------------------------------------
Bitmap Heap Scan on commandes
  Recheck Cond: (EXTRACT(year FROM date_commande) = '2011'::numeric)
-> Bitmap Index Scan on annee_commandes_idx
Index Cond: (EXTRACT(year FROM date_commande) = '2011'::numeric)
SELECT PG_SIZE_PRETTY(PG_TOTAL_RELATION_SIZE('index_name'));
SELECT
pg_tables.schemaname AS schema,
pg_tables.tablename AS table,
pg_stat_all_indexes.indexrelname AS index,
pg_stat_all_indexes.idx_scan AS number_of_scans,
pg_stat_all_indexes.idx_tup_read AS tuples_read,
pg_stat_all_indexes.idx_tup_fetch AS tuples_fetched
FROM pg_tables
LEFT JOIN pg_class ON(pg_tables.tablename = pg_class.relname)
LEFT JOIN pg_index ON(pg_class.oid = pg_index.indrelid)
LEFT JOIN pg_stat_all_indexes USING(indexrelid)
WHERE pg_tables.schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_tables.schemaname, pg_tables.tablename;
Warning: **Some index accesses may not update the statistics**. You should evaluate these results carefully and not blindly trust them.
SELECT *
FROM pg_class, pg_index
WHERE pg_index.indisvalid = false
AND pg_index.indexrelid = pg_class.oid;
SELECT pg_size_pretty( pg_relation_size('posts') ) AS table_size,
       pg_size_pretty( pg_relation_size('idx_posts_date') ) AS idx_date_size,
       pg_size_pretty( pg_relation_size('idx_posts_author') ) AS idx_author_size;
The pg_stat_user_indexes view provides information about how many times an index has been used and how.
SELECT indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE relname = 'posts';
- idx_scan – how many times the index has been used
The PostgreSQL catalog view pg_stat_all_indexes shows the total number of index uses (scans, reads, and fetches) since the last statistics reset.
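If you need a clean baseline before measuring, the counters can be cleared (assuming you have the required privileges); a minimal sketch:
SELECT pg_stat_reset(); -- resets statistics counters for the current database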
Note that some primary key indexes are never used for data retrieval; however, they are vital for data integrity and should not be removed.
SELECT l.relation::regclass,
l.transactionid,
l.mode,
l.GRANTED,
s.query,
s.query_start,
age(now(), s.query_start) AS "age",
s.pid
FROM pg_stat_activity s JOIN pg_locks l
ON l.pid = s.pid
WHERE mode= 'ShareUpdateExclusiveLock'
ORDER BY s.query_start;
- https://www.cybertec-postgresql.com/en/postgresql-parallel-create-index-for-better-performance/
The best tuning method for creating indexes is a very high value for maintenance_work_mem.
SET maintenance_work_mem TO '4 GB';
In PostgreSQL 11 parallel index creation is on by default. The parameter in charge of this is max_parallel_maintenance_workers, which can be set in postgresql.conf:
SHOW max_parallel_maintenance_workers;
SET max_parallel_maintenance_workers TO 0; -- disables parallel index builds
SET max_parallel_maintenance_workers TO 2;
ALTER TABLE t_demo SET (parallel_workers = 4);
SET max_parallel_maintenance_workers TO 4;
Set PostgreSQL to use larger checkpoint distances by setting postgresql.conf to the following values:
checkpoint_timeout = 120min
max_wal_size = 50GB
min_wal_size = 80MB
Those settings can be activated by reloading the config file:
SELECT pg_reload_conf();
Using tablespaces in PostgreSQL to speed up indexing
CREATE TABLESPACE indexspace LOCATION '/ssd1/tabspace1';
CREATE TABLESPACE sortspace LOCATION '/ssd2/tabspace2';
SET temp_tablespaces TO sortspace;
CREATE INDEX idx6 ON t_demo (data) TABLESPACE indexspace;
DDL changes like CREATE INDEX on tables with high row counts and high transaction volume (transactions per second (TPS) or queries per second) should be run with the CONCURRENTLY option. This applies both to creating and to dropping an index. If your index creation gets canceled, you'll need to raise your statement_timeout and try again (see the sketch after the list below).
An index created concurrently does not lock the table against writes, which means you can still insert or update rows. In order to perform the (slightly slower) index creation, Postgres does the following:
- scans the table once to build the index;
- scans it a second time for rows added or updated since the first pass.
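A hedged sketch combining the points above; the table, column, and index names are hypothetical:
SET statement_timeout = 0; -- or a generous value, so the long-running build is not canceled
CREATE INDEX CONCURRENTLY idx_orders_created_at ON orders (created_at);
DROP INDEX CONCURRENTLY IF EXISTS idx_orders_obsolete;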
CREATE INDEX CONCURRENTLY works without locking the table, allowing concurrent updates to it. Building an index concurrently consists of three phases.
- Phase 1: At the start of the first phase, the system catalogs are populated with the new index information. This obviously includes information about the columns used by the new index. As soon as information about the new index is available in the system catalog and is seen by other backends, they will start honouring the new index and ensure that the HOT chain's property is preserved. CIC must wait for all existing transactions to finish before starting the second phase of the index build. This guarantees that no new broken HOT chains are created after the second phase begins.
- Phase 2: So when the second phase starts, we guarantee that new transactions cannot create more broken HOT chains (i.e. HOT chains which do not satisfy the HOT property) with respect to the old indexes as well as the new index. We now take a new MVCC snapshot and start building the index by indexing every visible row in the table. While indexing we use the column value from the visible version and TID of the root of the HOT chain. Since all subsequent updates are guaranteed to see the new index, the HOT property is maintained beyond the version that we are indexing in the second phase. That means if a row is HOT updated, the new version will be reachable from the index entry just added (remember we indexed the root of the HOT chain).
During the second phase, if some other transaction updates a row in a way that changes none of the indexed columns, a HOT update is possible. On the other hand, if the update changes an indexed column, a non-HOT update is performed.
- Phase 3: You must have realised that while second phase was running, there could be transactions inserting new rows in the table or updating existing rows in a non-HOT manner. Since the index was not open for insertion during phase 2, it will be missing entries for all these new rows. This is fixed by taking a new MVCC snapshot and doing another pass over the table. During this pass, we index all rows which are visible to the new snapshot, but are not already in the index. Since the index is now actively maintained by other transactions, we only need to take care of the rows missed during the second phase.
The index is fully ready when the third pass finishes. It is now being actively maintained by all other backends, following the usual HOT rules. But the problem with old transactions remains: they could see rows that are indexed in neither the second nor the third phase. After all, their snapshots could see rows older than what the snapshots used for building the index could see.
The new index is not usable for such old transactions. CIC deals with the problem by waiting for all such old transactions to finish before marking the index ready for queries. Remember that the index was marked for insertion at the end of the second phase, but it becomes usable for reads only after the third phase finishes and all old transactions are gone.
Once all old transactions are gone, the index becomes fully usable by all future transactions. The catalogs are once again updated with the new information and cache invalidation messages are sent to other processes.
create index concurrently "index11" on test using btree (id);
create index concurrently "index12" on test using btree (id, subject_name);
create unique index concurrently "index14" on test using btree (id);
create index concurrently "index15" on toy using btree(availability) where availability is true;
create index concurrently on employee ((lower (name)));
SELECT
tablename,
indexname,
indexdef
FROM
pg_indexes
WHERE
tablename LIKE 'c%'
ORDER BY
tablename,
indexname;
\d employee
Find out the current state of all your indexes
SELECT
t.tablename,
indexname,
c.reltuples AS num_rows,
pg_size_pretty(pg_relation_size(quote_ident(t.tablename)::text)) AS table_size,
pg_size_pretty(pg_relation_size(quote_ident(indexrelname)::text)) AS index_size,
CASE WHEN indisunique THEN 'Y'
ELSE 'N'
END AS UNIQUE,
idx_scan AS number_of_scans,
idx_tup_read AS tuples_read,
idx_tup_fetch AS tuples_fetched
FROM pg_tables t
LEFT OUTER JOIN pg_class c ON t.tablename=c.relname
LEFT OUTER JOIN
( SELECT c.relname AS ctablename, ipg.relname AS indexname, x.indnatts AS number_of_columns, idx_scan, idx_tup_read, idx_tup_fetch, indexrelname, indisunique FROM pg_index x
JOIN pg_class c ON c.oid = x.indrelid
JOIN pg_class ipg ON ipg.oid = x.indexrelid
JOIN pg_stat_all_indexes psai ON x.indexrelid = psai.indexrelid )
AS foo
ON t.tablename = foo.ctablename
WHERE t.schemaname='ebk'
ORDER BY 1,2;
If an index was created with the CONCURRENTLY clause, the operation can fail and leave the index in place but invalid, and therefore useless. The problem does not arise when REINDEX … CONCURRENTLY fails, because the old version of the index still exists and can still be used.
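A minimal sketch of cleaning up such a leftover invalid index (the index and table names are hypothetical; the pg_index query shown earlier can be used to find it):
DROP INDEX CONCURRENTLY idx_orders_created_at;
CREATE INDEX CONCURRENTLY idx_orders_created_at ON orders (created_at);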
When you declare a unique constraint, a primary key constraint, or an exclusion constraint, PostgreSQL creates an index for you.
PostgreSQL implements several index access methods. An access method is a generic algorithm with a clean API that can be implemented for compatible data types. Each algorithm is well adapted to certain use cases.
SELECT relname, relpages, reltuples,
i.indisunique, i.indisclustered, i.indisvalid,
pg_catalog.pg_get_indexdef(i.indexrelid, 0, true)
FROM pg_class c JOIN pg_index i on c.oid = i.indrelid
WHERE c.relname = 'posts';
The pg_get_indexdef() special function provides a textual representation of the CREATE INDEX statement used to produce every index and can be very useful to decode and learn how to build complex indexes.
To improve these filtering operations with the WHERE clause, Postgres keeps statistics about the selectivity of the values contained in each table column. This information is stored in the pg_statistic catalog, and there is a more human-readable view called pg_stats.
SELECT
  attname AS "column_name",
  n_distinct AS "distinct_rate",
  array_to_string(most_common_vals, E'\n') AS "most_common_values",
  array_to_string(most_common_freqs, E'\n') AS "most_common_frequencies"
FROM pg_stats
WHERE tablename = 'address'
  AND attname = 'postal_code';
- distinct_rate shows the proportion of distinct values versus the total number of rows. For a column with only unique values, such as the primary key, this rate is 100% and is represented by the value -1. The column in the example is close to -1, meaning almost all values are distinct.
- most_common_values gives insight into the postal_code values that repeat the most; in this case four values: null, 22474, 9668, and 52137.
- most_common_frequencies shows the frequency of each of the most common values: the nulls are 0.66% of the values, and each of the other values represents 0.33% of the sampled values.
We can verify this information as follows:
WITH total AS (SELECT count(*)::numeric cnt FROM address)
SELECT
  postal_code,
  count(address.*),
  round(count(address.*)::numeric * 100 / total.cnt, 2) AS "percentage"
FROM address, total
GROUP BY address.postal_code, total.cnt
HAVING count(address.*) > 1
ORDER BY 2 DESC;
You need to invalidate an index as an administrator user, even if you are the user that created the index. This is due to the fact that you need to manipulate the system catalog, which is an activity restricted to administrator users only.
UPDATE pg_index SET indisvalid = false
WHERE indexrelid = ( SELECT oid FROM pg_class
WHERE relkind = 'i'
AND relname = 'idx_author_created_on' );
The index is then marked as INVALID to indicate that PostgreSQL will never consider it for its execution plans. You can, of course, reset the index to its original status by running the same UPDATE and setting the indisvalid column back to true.
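A minimal sketch of re-enabling the index, mirroring the UPDATE above:
UPDATE pg_index SET indisvalid = true
WHERE indexrelid = ( SELECT oid FROM pg_class
                     WHERE relkind = 'i'
                     AND relname = 'idx_author_created_on' );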
REINDEX [ ( option [, ...] ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } [ CONCURRENTLY ] name
where option can be one of:
- CONCURRENTLY [ boolean ] – the reindex is performed concurrently (online)
- TABLESPACE new_tablespace – the rebuilt index is placed in a new tablespace
- VERBOSE [ boolean ] – progress is logged while reindexing
REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } [CONCURRENTLY ] name
REINDEX TABLE payment;
You can decide to rebuild a single index by means of the INDEX argument followed by the name of the index, or you can rebuild all the indexes of a table by means of the TABLE argument followed, as you can imagine, by the table name. Going further, you can rebuild all the indexes of all the tables within a specific schema by means of the SCHEMA argument (followed by the name of the schema) or the whole set of indexes of a database using the DATABASE argument and the name of the database you want to reindex. Lastly, you can also rebuild indexes on system catalog tables by means of the SYSTEM argument. You can execute REINDEX within a transaction block but only for a single index or table, which means only for the INDEX and TABLE options. All the other forms of the REINDEX command cannot be executed in a transaction block. The CONCURRENTLY option prevents the command from acquiring exclusive locks on the underlying table in a way similar to that of building a new index.
Check for time spent doing sequential scans of your data with a filter step, as that's the part that a proper index might be able to optimize.
If a column is of a numeric type, it can be modified by adding zero to its value. For example, the condition attr1+0=p_value will block the usage of an index on column attr1. For any data type, the coalesce() function will always block the usage of indexes, so assuming attr2 is not nullable, the condition can be modified to something like coalesce(t1.attr2, '0')=coalesce(t2.attr2, '0').
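A minimal sketch of deliberately blocking index usage in a join, using the expressions from the text above (tables t1 and t2 are hypothetical):
-- attr1 + 0 hides the bare column reference, so an index on t1.attr1 is not considered;
-- coalesce() around attr2 does the same for any data type:
SELECT *
FROM t1 JOIN t2
  ON t1.attr1 + 0 = t2.attr1
 AND coalesce(t1.attr2, '0') = coalesce(t2.attr2, '0');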
Operator Classes and Operator Families
CREATE INDEX name ON table (column opclass [ ( opclass_options ) ] [sort options] [, ...]);
SELECT * FROM directories WHERE path LIKE '/1/2/3/%'
CREATE INDEX CONCURRENTLY ON directories (path);
The default B-tree index created by PostgreSQL is not optimal for queries with LIKE on a text prefix, because this index works best for queries that rely on the full ordering (from the smallest to the largest value). For "prefix" queries (searching by the initial part of a string), such an index may not be used effectively by PostgreSQL.
The index builds a structure that helps find a specific value or a range of values quickly in sorted order, which works well for queries such as:
SELECT * FROM directories WHERE path = '/1/2/3/'
or
SELECT * FROM directories WHERE path > '/1/2/3/'
In both cases PostgreSQL can use the B-tree index, because the = and > operators follow the full ordering in which a B-tree is efficient.
For the query LIKE '/1/2/3/%', PostgreSQL needs to check all values starting with /1/2/3/, without any further ordering on the characters that follow the prefix. The default B-tree operator class does not efficiently support searches based only on the beginning of the text, so PostgreSQL may ignore such an index and perform a full table scan (which is slower).
Using the text_pattern_ops operator class tells PostgreSQL that you want to use the index for prefix matching. Such an index is optimized for "starts with" searches, so PostgreSQL can quickly find results with a given prefix instead of scanning the whole table.
CREATE INDEX CONCURRENTLY ON directories (path text_pattern_ops);
text_pattern_ops, varchar_pattern_ops, and bpchar_pattern_ops support B-tree indexes on the types text, varchar, and char respectively. The difference from the default operator classes is that the values are compared strictly character by character rather than according to the locale-specific collation rules. This makes these operator classes suitable for use by queries involving pattern matching expressions (LIKE or POSIX regular expressions) when the database does not use the standard "C" locale.
Note that you should also create an index with the default operator class if you want queries involving ordinary <, <=, >, or >= comparisons to use an index. Such queries cannot use the xxx_pattern_ops operator classes. (Ordinary equality comparisons can use these operator classes, however.) It is possible to create multiple indexes on the same column with different operator classes. If you do use the C locale, you do not need the xxx_pattern_ops operator classes, because an index with the default operator class is usable for pattern-matching queries in the C locale.
psql has the commands \dAc, \dAf, and \dAo to show operator classes and operator families.
SELECT first_name
FROM users
WHERE first_name LIKE 'Jane%'
LIMIT 5;
LIKE and ILIKE support the % symbol, which matches zero or more characters. The _ (underscore) character can be used instead to match exactly one character. Collations are how text types are sorted and compared.
The text_pattern_ops operator class can be used with a B-Tree index to speed up text matching queries.
-- Use `varchar_pattern_ops` Operator Class
CREATE INDEX idx_users_first_name_vpo
ON users(first_name VARCHAR_PATTERN_OPS);
-- Find missing indexes (look at seq_scan counts)
-- https://stackoverflow.com/a/12818168/126688
SELECT
relname AS TableName,
TO_CHAR(seq_scan, '999,999,999,999') AS TotalSeqScan,
TO_CHAR(idx_scan, '999,999,999,999') AS TotalIndexScan,
TO_CHAR(n_live_tup, '999,999,999,999') AS TableRows,
PG_SIZE_PRETTY(PG_RELATION_SIZE(relname::REGCLASS)) AS TableSize
FROM pg_stat_all_tables
WHERE schemaname = 'public' -- change schema name, i.e. 'rideshare' if not 'public'
-- AND 50 * seq_scan > idx_scan -- more than 2%, add filters to narrow down results
-- AND n_live_tup > 10000 -- narrow down results for bigger tables
-- AND pg_relation_size(relname::REGCLASS) > 5000000
ORDER BY totalseqscan DESC;
-- missing indexes from GCP docs:
-- Optimize CPU usage
-- https://cloud.google.com/sql/docs/postgres/optimize-cpu-usage
SELECT
relname,
idx_scan,
seq_scan,
n_live_tup
FROM
pg_stat_user_tables
WHERE
seq_scan > 0
ORDER BY
n_live_tup DESC;
- https://www.cybertec-postgresql.com/en/b-tree-index-deduplication/ PostgreSQL 13 introduced B-tree index deduplication, which stores duplicate index entries (including NULLs, which are treated as equal for this purpose) only once.
UPDATE creates a new version of a table row, and each row version has to be indexed. Consequently, even a unique index can contain the same index entry multiple times. The new feature is useful for unique indexes because it reduces the index bloat that normally occurs if the same table row is updated frequently
Deduplication results in a smaller index size for indexes with repeating entries. This saves disk space and, even more importantly, RAM when the index is cached in shared buffers. Scanning the index becomes faster, and index bloat is reduced.
If you want to make use of the new feature after upgrading with pg_upgrade, you'll have to REINDEX promising indexes that contain many duplicate entries. Upgrading with pg_dumpall and restore or using logical replication rebuilds the index, and deduplication is automatically enabled.
If you want to retain the old behavior, you can disable deduplication by setting deduplicate_items = off on the index.
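A minimal sketch of turning deduplication off; the index and table names are hypothetical:
CREATE INDEX idx_orders_status ON orders (status) WITH (deduplicate_items = off);
-- or for an existing index (affects newly inserted entries; REINDEX to rewrite existing ones):
ALTER INDEX idx_orders_status SET (deduplicate_items = off);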