xtf - snac-pilot/data-model GitHub Wiki
XTF indexes the content and structure of XML documents using Lucene indexes of words and "lazy tree" indexes of XML structure.
Right now, this happens in batch mode. eScholarship, the XTF instance run by Martin Haye and Kirk Hastings, does not invoke the stock XTF ./bin/textIndexer command directly to do incremental indexing.
Rather, it has a special directory-based queue that watches for new files to come in. A Python script they have set up runs ./bin/textIndexer with special options that allow XTF to reindex small batches of documents quickly and with low overhead.
A queue-based re-indexer needs to be written for the SNAC pilot. Participants will expect their changes to be published soon after their edits are approved; they do not want to wait overnight.
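
As a starting point, the directory-based queue described above might be sketched roughly as follows. This is not the eScholarship script, which is not reproduced on this page; the names `next_batch`, `drain_once`, `watch`, the `queue_dir`/`done_dir` layout, the batch size, and the polling interval are all assumptions made for illustration. The indexer call itself is passed in as a callable (`run_indexer`) because the special textIndexer options are not listed here.

```python
import time
from pathlib import Path
from typing import Callable, List


def next_batch(queue_dir: Path, batch_size: int) -> List[Path]:
    """Return up to batch_size queued files, oldest first (name breaks ties)."""
    files = [p for p in queue_dir.iterdir() if p.is_file()]
    files.sort(key=lambda p: (p.stat().st_mtime, p.name))
    return files[:batch_size]


def drain_once(queue_dir: Path, done_dir: Path,
               run_indexer: Callable[[List[Path]], bool],
               batch_size: int = 20) -> int:
    """Index one small batch and move its files out of the queue on success.

    Returns the number of files processed. Returns 0 if the queue is empty
    or if run_indexer reports failure, in which case the files stay queued
    so that a later pass retries them.
    """
    batch = next_batch(queue_dir, batch_size)
    if not batch:
        return 0
    if not run_indexer(batch):
        return 0
    done_dir.mkdir(parents=True, exist_ok=True)
    for path in batch:
        path.replace(done_dir / path.name)
    return len(batch)


def watch(queue_dir: Path, done_dir: Path,
          run_indexer: Callable[[List[Path]], bool],
          poll_seconds: float = 30.0) -> None:
    """Poll the queue forever, sleeping only when there is nothing to do."""
    while True:
        if drain_once(queue_dir, done_dir, run_indexer) == 0:
            time.sleep(poll_seconds)
```

In a real deployment, `run_indexer` would presumably wrap a call to ./bin/textIndexer (for example via Python's `subprocess` module) using whatever incremental options the eScholarship script passes, and return True only when the command exits successfully. Keeping that call behind a callable lets the queue logic be tested without an XTF install.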