Virtuoso full text indexing - dkmfbk/knowledgestore GitHub Wiki
Virtuoso maintains a full text index over object literals, which is used to answer bif:contains queries. The feature is described in the Virtuoso documentation.
Important notes
By default, the full text index is updated asynchronously after data modification operations; i.e., right after data is loaded (e.g., via ttlp), the full text index may be incomplete. It is unclear whether, and to what extent, using log_enable(2) for bulk loading affects this behavior.
To force the completion of the full text index and align it to loaded data, the following command should be executed:
DB.DBA.VT_INC_INDEX_DB_DBA_RDF_OBJ ();
If for whatever reason it doesn't work, the following command forces the full text index to be recreated from scratch (it may take some time):
DB.DBA.RDF_OBJ_FT_RECOVER();
While the documentation says it "inserts all missing free-text index items", in practice it seems to recreate the index from scratch, or nearly so. As a reference, on a DB of 95M DBpedia triples it took 570 s (about 167 Ktriples/s).
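Putting the notes above together, a load followed by a forced index update could look like the following isql sketch. The file path, the graph IRI and the search word are placeholders, not values from our setup; the sketch also assumes the file's directory is listed under DirsAllowed in virtuoso.ini, otherwise file_to_string_output refuses to read it.

```sql
-- Load a Turtle file into a graph (path and graph IRI are placeholders).
DB.DBA.TTLP (file_to_string_output ('/tmp/data.ttl'), '', 'http://example.org/graph', 0);

-- Align the full text index with the data just loaded.
DB.DBA.VT_INC_INDEX_DB_DBA_RDF_OBJ ();

-- The newly loaded literals should now be reachable through bif:contains.
SPARQL
SELECT ?s ?o
FROM <http://example.org/graph>
WHERE { ?s ?p ?o . ?o bif:contains "example" }
LIMIT 10;
```

If the query still returns nothing for literals known to be in the graph, DB.DBA.RDF_OBJ_FT_RECOVER () is the slower fallback described above.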
Real-time full text indexing (doesn't work in our experience)
According to the documentation, the following command should switch to real-time full text indexing:
DB.DBA.VT_BATCH_UPDATE ('DB.DBA.RDF_OBJ', 'OFF', null);
The following command instead configures periodic indexing every 10 minutes:
DB.DBA.VT_BATCH_UPDATE ('DB.DBA.RDF_OBJ', 'ON', 10);
We tried the first one, and it had no effect on subsequent ttlp population commands: we inserted new data and ttlp returned, but bif:contains queries returned no results for the newly loaded triples. So it seems we have to stick to asynchronous indexing.
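The experiment described above can be reproduced with a small probe, sketched here for isql. The graph IRI, the subject and predicate IRIs and the literal "zzprobeword" are made up for illustration; any distinctive word not already present in the store would do.

```sql
-- Request real-time indexing (in our experience this has no visible effect).
DB.DBA.VT_BATCH_UPDATE ('DB.DBA.RDF_OBJ', 'OFF', null);

-- Insert a single triple carrying a distinctive literal (all IRIs are placeholders).
DB.DBA.TTLP ('<http://example.org/s> <http://example.org/p> "zzprobeword" .', '', 'http://example.org/probe', 0);

-- With working real-time indexing, this should find the triple right away.
SPARQL
SELECT ?s
FROM <http://example.org/probe>
WHERE { ?s ?p ?o . ?o bif:contains "zzprobeword" };

-- Otherwise update the index explicitly and run the query again.
DB.DBA.VT_INC_INDEX_DB_DBA_RDF_OBJ ();
```

An empty result from the first query and a hit after the explicit update would match the behavior we observed.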
Indexing rules (no need to mess with them)
Internally, Virtuoso has a table DB.DBA.RDF_OBJ_FT_RULES, where each tuple is a rule saying which literals should be indexed. A tuple looks roughly like (graph URI, property URI, reason), where the reason is a human-readable label, such as 'All', explaining why those literals are indexed. The following command lists the content of that table:
SELECT * FROM DB.DBA.RDF_OBJ_FT_RULES;
It should contain a tuple (null, null, 'All'), which stands for a rule requiring all literals, for any predicate and graph, to be indexed. The following commands can be used to add or delete rules (we never had a reason to call them):
DB.DBA.RDF_OBJ_FT_RULE_ADD (null, null, 'All');
DB.DBA.RDF_OBJ_FT_RULE_DEL (null, null, 'All');
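For completeness, a narrower rule could be sketched as follows. We have not run this ourselves; the graph IRI and the reason string 'labels only' are placeholders, and the arguments simply follow the (graph, property, reason) order used by the commands above.

```sql
-- Index only rdfs:label literals of one graph (graph IRI is a placeholder).
DB.DBA.RDF_OBJ_FT_RULE_ADD ('http://example.org/graph', 'http://www.w3.org/2000/01/rdf-schema#label', 'labels only');

-- Check that the rule is now listed.
SELECT * FROM DB.DBA.RDF_OBJ_FT_RULES;

-- Remove the rule again, passing the same three arguments.
DB.DBA.RDF_OBJ_FT_RULE_DEL ('http://example.org/graph', 'http://www.w3.org/2000/01/rdf-schema#label', 'labels only');
```

A rule added after data is already loaded presumably only affects later loads, so existing literals would likely need DB.DBA.RDF_OBJ_FT_RECOVER () to be picked up.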