ZXINFO (STEP 03) ‐ FSCrawler (document indexing) - thomasheckmann/zxinfo-es GitHub Wiki

This section describes how to use FSCrawler to index all .txt and .pdf files so that they can be searched.

Indexing files for SpectrumComputing / ZXInfo runs in two jobs:

  • Indexing old WoS documents, job name wos (only needs to run once)
  • Indexing SC documents, job name zxdb (run after every ZXDB update)
```shell
cd ~/Public/HETZNER_SITES/ZXINFO/elastic/fscrawler

# INDEX WoS documents, only needs to be run once
# --restart: crawl from scratch, ignoring the job's saved status
# --loop 1:  run a single crawl, then exit
docker run -it --rm \
     -v "$PWD/.fscrawler:/home/fscrawler/.fscrawler" \
     -v "/Volumes/M2_SSD/kolbeck/Public/HETZNER_SITES/ZXINFO/elastic/assets/spectrumcomputing.co.uk/pub/sinclair/games-info/:/tmp/es:ro" \
     dadoonet/fscrawler:noocr wos --restart --loop 1

# INDEX ZXDB, needs to be run after every ZXDB/SC update
docker run -it --rm \
     -v "$PWD/.fscrawler:/home/fscrawler/.fscrawler" \
     -v "/Volumes/M2_SSD/kolbeck/Public/HETZNER_SITES/ZXINFO/elastic/assets/spectrumcomputing.co.uk/zxdb/sinclair/entries/:/tmp/es:ro" \
     dadoonet/fscrawler:noocr zxdb --restart --loop 1
```
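The two docker commands differ only in the job name and in the host folder that is mounted read-only at /tmp/es, so they can be wrapped in a small shell function. This is a minimal sketch, not part of zxinfo-es: the function name run_fscrawler_job and the DRY_RUN switch are hypothetical, and it assumes it is called from the fscrawler directory so that $PWD/.fscrawler holds the job settings, as in the commands above.

```shell
#!/bin/sh
# Hypothetical helper (not part of zxinfo-es): runs one FSCrawler job a
# single time (--loop 1), from scratch (--restart), against a folder that
# is mounted read-only at /tmp/es inside the container.
# Set DRY_RUN=1 to print the docker command instead of executing it.
run_fscrawler_job() {
  job="$1"        # FSCrawler job name, e.g. wos or zxdb
  docs_dir="$2"   # host folder containing the .txt/.pdf files
  # Build the argument list with set -- so paths with spaces stay intact
  set -- docker run -it --rm \
    -v "$PWD/.fscrawler:/home/fscrawler/.fscrawler" \
    -v "$docs_dir:/tmp/es:ro" \
    dadoonet/fscrawler:noocr "$job" --restart --loop 1
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

After a ZXDB update only the zxdb job needs to be rerun, passing the .../zxdb/sinclair/entries/ folder as the second argument; DRY_RUN=1 is handy for checking the generated command first.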

After the jobs have run, two new indices should exist: zxdb_doc and zxdb_doc_folder.
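One way to confirm this is to list the index names through Elasticsearch's _cat/indices API and check for both. A minimal sketch: the function check_fscrawler_indices is hypothetical, and the address http://localhost:9200 in the usage line is an assumption (adjust it, and add credentials if security is enabled, for your own setup).

```shell
#!/bin/sh
# Hypothetical helper: reads index names on stdin (one per line, as printed
# by `_cat/indices?h=index`) and checks for the two FSCrawler indices.
# Returns 0 when both are present, 1 when at least one is missing.
check_fscrawler_indices() {
  # Keep only the first field so any padding whitespace from _cat is ignored
  found=$(awk '{print $1}')
  missing=0
  for idx in zxdb_doc zxdb_doc_folder; do
    # -x: whole-line match, -F: fixed string (zxdb_doc must not match zxdb_doc_folder)
    if printf '%s\n' "$found" | grep -qxF "$idx"; then
      echo "ok: $idx"
    else
      echo "missing: $idx"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage (assuming Elasticsearch listens on localhost:9200): curl -s 'http://localhost:9200/_cat/indices?h=index' | check_fscrawler_indices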