postgres extension pg_trgm - ghdrako/doc_snipets GitHub Wiki

pg_trgm extension is an official extension and is included in the official PostgreSQL contribs package.

The PostgreSQL extension pg_trgm provides functions and operators for determining the similarity of alphanumeric text based on trigram matching, as well as index operator classes that support fast searching for similar strings.

Trigrams, Similarity and Searches


select show_trgm('tomy') as tomy, 
       show_trgm('Tomy') as "Tomy", 
       show_trgm('tom torn') as "tom torn", 
       similarity('tomy', 'tom'), 
       similarity('dim', 'tom');

The similarity function returns a number that indicates how similar the two arguments are. The range of the result is zero (indicating that the two strings are completely dissimilar) to one (indicating that the two strings are identical).The default similarity threshold is 0.3 and you can tweak it by using the GUC setting pg_trgm.similarity_threshold.

select artist, title 
 from lastfm.track 
 where title % 'love' 
 group by artist, title 
 order by title <-> 'love' 
 limit 10;

the ``pg_trgm extension introduce several new operator:

  • The operator % reads similar to and involves comparing trigrams of both its leff and right arguments
  • The operator <->computes the “distance” between the arguments,i.e.one minus the similarity() value. Trigrams are computed in a case insensitive way.
select show_trgm('peace') as "peace", 
       show_trgm('Peace') as "Peace";
─[ RECORD 1 ]
────────────────────────── 
peace │ {" p"," pe",ace,"ce ",eac,pea}
Peace │ {" p"," pe",ace,"ce ",eac,pea} 
select artist, title 2 from lastfm.track 3 where title %> 'peace';

The operator %> uses the word_similarity function introduced above, and takes into account that its left operand is a longer string, and its right operand is a single word to search. We could use the <% operator, where left and right operands are used the other way round:word <% phrase.

select artist, title 
 from lastfm.track 
 where title %> 'peas' 
 order by title <-> 'peas' 
 limit 5;

Find suggestion and limit to top 5 in ordered by how close the titles are to the search term.

Comparing to regexp search its not return sny result is search string is mispealing

select artist, title 
 from lastfm.track 
 where title ~* 'peas';

This query returns

Trigram Indexing

B-tree, using the varchar_pattern_ops operator class, is able to index text queries for sentences that begin with search%, but cannot index text queries for sentences that end in %search or contain %search%:

set enable_seqscan to 'off'; -- force Post-greSQL to use an index if it exists
create index on categories using btree(title varchar_ pattern_ops);
explain analyze select * from categories where title like 'Da%'; -- use index
explain analyze select * from categories where title like '%Da%'; -- not use index

To improve this kind of search, we can use the pg_trgm extension. When we use this extension, PostgreSQL splits every word into a set of trigrams and makes a GIST or GIN index on it.

create extension pg_trgm;
\dx+ pg_trgm
create index on categories using gin (title gin_trgm_ ops);
explain analyze select * from categories where title like 'Da%'; -- use index

pg_trgm extension solves the access index for tables of all types of like and ilike queries;

The pg_trgm extension comes with specific indexing algorithms to take care of searching for similarity. Moreover, it covers searching for regular expressions too. Here’s how to build our index:

create index on lastfm.track using gist(title gist_trgm_ops);
explain (analyze, costs off) 
 select artist, title 
 from lastfm.track 
 where title %> 'peas' 
 order by title <-> 'peas' 
 limit 5;

This plan step is able to stop the index scan as soon as it has sent the first five rows, because the index scan is known to return them in order: