ZXINFO (STEP 03) ‐ FSCrawler (document indexing) - thomasheckmann/zxinfo-es GitHub Wiki
This section describes how to index all .txt and .pdf files for searching using FSCrawler.
Indexing files for SpectrumComputing / ZXInfo runs in two jobs:
- Indexing old WoS documents, job name wos (only requires an initial run)
- Indexing SC documents, job name zxdb (run after every ZXDB update)
cd ~/Public/HETZNER_SITES/ZXINFO/elastic/fscrawler
# Index WoS documents; this only needs to be run once
docker run -it --rm \
-v "$PWD/.fscrawler:/home/fscrawler/.fscrawler" \
-v "/Volumes/M2_SSD/kolbeck/Public/HETZNER_SITES/ZXINFO/elastic/assets/spectrumcomputing.co.uk/pub/sinclair/games-info/:/tmp/es":ro \
dadoonet/fscrawler:noocr wos --restart --loop 1
# Index ZXDB documents; this needs to be run after every ZXDB/SC update
docker run -it --rm \
-v "$PWD/.fscrawler:/home/fscrawler/.fscrawler" \
-v "/Volumes/M2_SSD/kolbeck/Public/HETZNER_SITES/ZXINFO/elastic/assets/spectrumcomputing.co.uk/zxdb/sinclair/entries/:/tmp/es":ro \
dadoonet/fscrawler:noocr zxdb --restart --loop 1
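The two commands above differ only in the job name and the mounted document directory, so they could be wrapped in one small helper. The following is a minimal sketch, not part of the original setup: the function name `run_fscrawler_job` and the `DRY_RUN` switch are hypothetical, and it assumes it is called from the fscrawler directory so that `$PWD/.fscrawler` resolves as in the commands above.

```shell
# Hypothetical helper: run one FSCrawler job against a document directory.
# $1 = job name (wos or zxdb), $2 = host directory holding the documents.
# Set DRY_RUN=1 to print the docker command instead of executing it.
run_fscrawler_job() {
  job="$1"
  docs="$2"
  # Build the same docker invocation as the manual commands above.
  set -- docker run -it --rm \
    -v "$PWD/.fscrawler:/home/fscrawler/.fscrawler" \
    -v "$docs:/tmp/es:ro" \
    dadoonet/fscrawler:noocr "$job" --restart --loop 1
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

Running it with `DRY_RUN=1` first is a cheap way to check the mount paths before starting a full crawl.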
Two new indices should have been created: zxdb_doc & zxdb_doc_folder
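One way to confirm that the indices exist is to ask Elasticsearch for its index list and compare it with the expected names. This is a sketch under assumptions: the cluster URL `http://localhost:9200` and the helper name `missing_indices` are not from the original setup, and a secured cluster would also need credentials on the curl call.

```shell
# Assumption: Elasticsearch listens here; override ES_URL for your setup.
ES_URL="${ES_URL:-http://localhost:9200}"

# Read index names (one per line) on stdin and print each expected name
# (given as arguments) that is not present in that list.
missing_indices() {
  # Keep only the first column, dropping any padding whitespace.
  present=$(awk '{print $1}')
  for name in "$@"; do
    printf '%s\n' "$present" | grep -qxF -- "$name" || printf '%s\n' "$name"
  done
}

# Example call (needs a running cluster):
#   curl -s "$ES_URL/_cat/indices?h=index" | missing_indices zxdb_doc zxdb_doc_folder
```

If the example call prints nothing, both indices were found; any name it prints is still missing.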