xtf - snac-pilot/data-model GitHub Wiki

XTF indexes the content and structure of XML documents, using Lucene indexes for the words and "lazy tree" indexes for the XML structure.

Right now, indexing happens in batch mode. eScholarship -- the XTF instance run by Martin Haye and Kirk Hastings -- does not use the stock XTF ./bin/textIndexer command to do incremental indexes. Rather, it has a special directory-based queue that watches for new files to come in. The Python script they have set up runs ./bin/textIndexer with special options that allow XTF to reindex small batches of documents quickly and with low overhead.

A queue-based re-indexer needs to be written for the SNAC pilot: participants will expect their changes to be published soon after their edits are approved, and they will not want to wait overnight.

https://github.com/snac-pilot/xtf-reindex-queue
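
As a rough illustration of the directory-based queue idea described above, here is a minimal Python sketch. It is not taken from the eScholarship script or from the repository linked above. The directory layout, the function names (`drain_queue`, `run_text_indexer`, `watch`), the batch size, and the arguments in `INDEXER_CMD` are all assumptions made for the example; the "special options" eScholarship passes to ./bin/textIndexer are not documented on this page.

```python
import shutil
import subprocess
import time
from pathlib import Path

# Assumption: placeholder arguments. The options the eScholarship script
# actually passes to textIndexer are not listed on this page.
INDEXER_CMD = ["./bin/textIndexer", "-index", "default"]


def run_text_indexer(batch):
    """Hypothetical wrapper: run the indexer once for a batch of queued files."""
    subprocess.run(INDEXER_CMD, check=True)


def drain_queue(queue_dir, done_dir, run_indexer=run_text_indexer, batch_size=10):
    """Process the files currently waiting in queue_dir in small batches.

    Files are handled oldest first. After the indexer succeeds for a batch,
    its files are moved to done_dir. Returns the number of files processed.
    """
    queue_dir, done_dir = Path(queue_dir), Path(done_dir)
    done_dir.mkdir(parents=True, exist_ok=True)
    pending = sorted(
        (p for p in queue_dir.iterdir() if p.is_file()),
        key=lambda p: (p.stat().st_mtime, p.name),
    )
    processed = 0
    for start in range(0, len(pending), batch_size):
        batch = pending[start:start + batch_size]
        run_indexer(batch)  # raises on failure, leaving the batch in the queue
        for path in batch:
            shutil.move(str(path), str(done_dir / path.name))
        processed += len(batch)
    return processed


def watch(queue_dir, done_dir, poll_seconds=30):
    """Poll the queue directory forever, draining it on each pass."""
    while True:
        drain_queue(queue_dir, done_dir)
        time.sleep(poll_seconds)
```

Passing the indexer in as a function keeps the queue logic separate from the command line, so the loop can be exercised with a stand-in before the real options are settled. Because files are moved only after the indexer returns successfully, a failed run leaves its batch in the queue to be retried.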