postgres extension pg_trgm - ghdrako/doc_snipets GitHub Wiki
pg_trgm extension is an official extension and is included in the official PostgreSQL contribs package.
The PostgreSQL extension pg_trgm
provides functions and operators for determining the similarity of alphanumeric text based on trigram matching, as well as index operator classes that support fast searching for similar strings.
Trigrams, Similarity and Searches
select show_trgm('tomy') as tomy,
show_trgm('Tomy') as "Tomy",
show_trgm('tom torn') as "tom torn",
similarity('tomy', 'tom'),
similarity('dim', 'tom');
The similarity
function returns a number that indicates how similar the two arguments are. The range of the result is zero (indicating that the two strings are completely dissimilar) to one (indicating that the two strings are identical).The default similarity threshold is 0.3 and you can tweak it by using the GUC setting pg_trgm.similarity_threshold
.
select artist, title
from lastfm.track
where title % 'love'
group by artist, title
order by title <-> 'love'
limit 10;
the ``pg_trgm
extension introduce several new operator:
- The operator
%
reads similar to and involves comparing trigrams of both its leff and right arguments - The operator
<->
computes the βdistanceβ between the arguments,i.e.one minus the similarity() value. Trigrams are computed in a case insensitive way.
select show_trgm('peace') as "peace",
show_trgm('Peace') as "Peace";
β[ RECORD 1 ]
ββββββββββββββββββββββββββ
peace β {" p"," pe",ace,"ce ",eac,pea}
Peace β {" p"," pe",ace,"ce ",eac,pea}
select artist, title 2 from lastfm.track 3 where title %> 'peace';
The operator %>
uses the word_similarity
function introduced above, and takes into account that its left operand is a longer string, and its right operand is a single word to search. We could use the <%
operator, where left and right operands are used the other way round:word <%
phrase.
select artist, title
from lastfm.track
where title %> 'peas'
order by title <-> 'peas'
limit 5;
Find suggestion and limit to top 5 in ordered by how close the titles are to the search term.
Comparing to regexp search its not return sny result is search string is mispealing
select artist, title
from lastfm.track
where title ~* 'peas';
This query returns
Trigram Indexing
B-tree, using the varchar_pattern_ops
operator class, is able to index text queries for sentences that begin with search%
, but cannot index text queries for sentences that end in %search
or contain %search%
:
set enable_seqscan to 'off'; -- force Post-greSQL to use an index if it exists
create index on categories using btree(title varchar_ pattern_ops);
explain analyze select * from categories where title like 'Da%'; -- use index
explain analyze select * from categories where title like '%Da%'; -- not use index
To improve this kind of search, we can use the pg_trgm extension. When we use this extension, PostgreSQL splits every word into a set of trigrams and makes a GIST or GIN index on it.
create extension pg_trgm;
\dx+ pg_trgm
create index on categories using gin (title gin_trgm_ ops);
explain analyze select * from categories where title like 'Da%'; -- use index
pg_trgm extension solves the access index for tables of all types of like and ilike queries;
The pg_trgm extension comes with specific indexing algorithms to take care of searching for similarity. Moreover, it covers searching for regular expressions too. Hereβs how to build our index:
create index on lastfm.track using gist(title gist_trgm_ops);
explain (analyze, costs off)
select artist, title
from lastfm.track
where title %> 'peas'
order by title <-> 'peas'
limit 5;
This plan step is able to stop the index scan as soon as it has sent the first five rows, because the index scan is known to return them in order: