Using QLever for Wikidata

Setting up a QLever instance for a fresh copy of the complete Wikidata is very easy.

Prerequisites

You need a machine with at least 32 GB of RAM (better 64 GB) and 2 TB of disk space (an SSD is best, but an HDD also works). Download the qlever script and follow the simple instructions given on that page. Once you have downloaded and started the script, it is largely self-explanatory.
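
For example, at the time of writing the qlever script is distributed via PyPI, so one common way to get it is via pip (this sketch assumes Python 3 with pip is available; check the script's page for the currently recommended installation method):

pip install qlever
qlever --help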

Get the data and build the index

The following commands create a Qleverfile (QLever's config file for everything) for Wikidata, download the dataset (around 100 GB compressed), load the data into QLever (aka build an index), start the server, start the UI, and enable fast autocompletion (if you want that).

mkdir wikidata && cd wikidata
qlever setup-config wikidata
qlever get-data
qlever index
qlever start
qlever ui
qlever autocompletion-warmup
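
The UI is the most convenient way to explore the data, but the running server also speaks the standard SPARQL protocol over HTTP, so you can query it from the command line. A minimal sketch with curl, assuming the server listens on port 7001 (check the PORT variable in your generated Qleverfile for the actual value); the query counts all triples in the dataset:

curl -s http://localhost:7001 --data-urlencode "query=SELECT (COUNT(*) AS ?count) WHERE { ?s ?p ?o }"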

Performance statistics

The following statistics on the data loading (= index building) time and index size are from a build on 21.01.2024, on a PC with an AMD Ryzen 9 5900X processor (16 cores), 128 GB of RAM, and 7.3 TB of NVMe SSD space.

Parse input             :   1.4 h
Build vocabularies      :   0.4 h
Convert to global IDs   :   0.2 h
Permutation SPO & SOP   :   0.6 h
Permutation OSP & OPS   :   0.9 h
Permutation PSO & POS   :   0.9 h

TOTAL index build time  :   4.4 h

54G	wikidata.index.ops
108G	wikidata.index.ops.meta
51G	wikidata.index.osp
108G	wikidata.index.osp.meta
2.8G	wikidata.index.patterns
72G	wikidata.index.pos
2.3M	wikidata.index.pos.meta
73G	wikidata.index.pso
2.3M	wikidata.index.pso.meta
39G	wikidata.index.sop
94G	wikidata.index.sop.meta
41G	wikidata.index.spo
94G	wikidata.index.spo.meta
206G	wikidata.vocabulary.external
54G	wikidata.vocabulary.external.idsAndOffsets.mmap
908K	wikidata.vocabulary.internal
992G	total
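
The size listing above is plain du output; assuming the index files sit in the current directory and were built with the base name wikidata, you can reproduce it for your own build with:

du -hc wikidata.*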