Virtuoso full text indexing - dkmfbk/knowledgestore GitHub Wiki

Virtuoso maintains a full text index over object literals, which is what bif:contains queries run against. The feature is described in the Virtuoso documentation.
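As a minimal sketch of the kind of query this index serves, the statement below could be run from isql. The search word "knowledge" and the result limit are placeholders chosen for illustration, not values taken from this setup.

```sql
-- Find literals containing the word "knowledge" via the full text index.
-- The search term and the LIMIT are illustrative placeholders.
SPARQL
SELECT ?s ?p ?o
WHERE {
  ?s ?p ?o .
  ?o bif:contains "knowledge" .
}
LIMIT 10;
```

Multi-word phrases are usually wrapped in an extra pair of single quotes inside the string, e.g. "'full text'".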

Important notes

By default, the full text index is updated asynchronously after data modification operations, so right after data is loaded (e.g., via ttlp) the index may be incomplete. It is not clear whether, or to what extent, using log_enable(2) for bulk loading affects this behavior.

To bring the full text index up to date with the loaded data, execute the following command:

DB.DBA.VT_INC_INDEX_DB_DBA_RDF_OBJ ();

If for whatever reason it doesn't work, the following command forces the full text index to be recreated from scratch (it may take some time):

DB.DBA.RDF_OBJ_FT_RECOVER();

While the documentation says it "inserts all missing free-text index items", in practice it seems to rebuild the index from scratch, or nearly so. For reference, on a database of 95M DBpedia triples it took 570 s (roughly 160-170K triples/s).
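The load-then-index sequence described in these notes might look like the following isql session. This is a sketch under assumptions: the file path, the graph URI http://example.org/graph and the search word are hypothetical, and the file must be in a directory that the server is allowed to read.

```sql
-- 1. Load a Turtle file into a graph (path and graph URI are hypothetical).
DB.DBA.TTLP (file_to_string_output ('/tmp/data.ttl'), '', 'http://example.org/graph', 0);

-- 2. Bring the full text index up to date with the data just loaded.
DB.DBA.VT_INC_INDEX_DB_DBA_RDF_OBJ ();

-- 3. Check that the new literals are now visible to bif:contains.
SPARQL
SELECT ?s ?o
FROM <http://example.org/graph>
WHERE {
  ?s ?p ?o .
  ?o bif:contains "example" .
}
LIMIT 10;

-- 4. Only if step 3 still misses data: rebuild the index (slow on large stores).
-- DB.DBA.RDF_OBJ_FT_RECOVER ();
```

Keeping the recover call commented out reflects its cost: it is a fallback, not part of the normal loading procedure.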

Real-time full text indexing (doesn't work in our experience)

According to the documentation, the following command should switch to real-time full text indexing:

DB.DBA.VT_BATCH_UPDATE ('DB.DBA.RDF_OBJ', 'OFF', null);

The following command instead configures periodic indexing every 10 minutes:

DB.DBA.VT_BATCH_UPDATE ('DB.DBA.RDF_OBJ', 'ON', 10);

We tried the first command, but it had no effect on subsequent ttlp loads: we inserted new data and ttlp returned, yet bif:contains queries returned no results for the newly inserted triples. So it seems we have to stick with asynchronous indexing.

Indexing rules (no need to mess with them)

Internally, Virtuoso has a table DB.DBA.RDF_OBJ_FT_RULES, where each row is a rule stating which literals should be indexed. A row appears to consist of a graph URI, a property URI and a reason, the last being a human-readable label (such as 'All') explaining why those literals are indexed. The following command lists the content of that table:

SELECT * FROM DB.DBA.RDF_OBJ_FT_RULES;

It should contain a row (null, null, 'All'), which is the rule requiring all literals to be indexed, for any predicate and any graph. The following commands add or delete rules (we never had a reason to call them):

DB.DBA.RDF_OBJ_FT_RULE_ADD (null, null, 'All');
DB.DBA.RDF_OBJ_FT_RULE_DEL (null, null, 'All');
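For illustration only, a narrower rule could restrict indexing to one predicate in one graph. The graph URI and the reason string below are hypothetical, and we have not used such rules ourselves, so treat this as a sketch rather than a tested recipe.

```sql
-- Index only rdfs:label literals in one graph (graph URI is hypothetical).
DB.DBA.RDF_OBJ_FT_RULE_ADD (
  'http://example.org/graph',
  'http://www.w3.org/2000/01/rdf-schema#label',
  'rdfs:label in example graph');

-- Inspect the rules currently in force.
SELECT * FROM DB.DBA.RDF_OBJ_FT_RULES;

-- Remove the rule again, passing the same three arguments.
DB.DBA.RDF_OBJ_FT_RULE_DEL (
  'http://example.org/graph',
  'http://www.w3.org/2000/01/rdf-schema#label',
  'rdfs:label in example graph');
```

Changing the rules presumably does not by itself reindex literals that are already loaded, so a call to DB.DBA.RDF_OBJ_FT_RECOVER() may be needed afterwards.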