QLever's distinguishing features - ad-freiburg/qlever GitHub Wiki

We are often asked what distinguishes QLever from other RDF databases (also known as triple stores) and how it achieves its high performance. This preliminary list provides some answers.

Efficient on commodity hardware

QLever can index and query hundreds of billions of triples on a single commodity PC. QLever tries to keep all of the central metrics as small as possible: indexing time, index size on disk, query time, and RAM consumption during query processing.

Data layout

QLever's indexes are laid out so that typical queries access exactly the data they need (no more and no less) with the highest possible locality of access. In other words, no more data is read than necessary and there are no more random accesses than necessary. This is crucial for good performance when the index resides on disk, in particular on a disk with slow random access, like an HDD. However, the data layout is also beneficial when the data resides on an SSD or fully in RAM: on all three media, random accesses are more expensive than sequential accesses, only the factor varies.
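The locality argument can be illustrated with a toy example. The following is a minimal sketch, not QLever's actual index format: it assumes triples are encoded as tuples of integer IDs and kept in a single sorted list, and the names `pso` and `scan_predicate` are hypothetical. Because the list is sorted by predicate first, all triples for one predicate form one contiguous range, which can be located with two binary searches and then read sequentially.

```python
from bisect import bisect_left

# Hypothetical toy data: triples encoded as (predicate, subject, object) integer IDs.
triples = [
    (2, 10, 30),
    (1, 10, 20),
    (2, 11, 31),
    (1, 12, 21),
    (3, 10, 40),
]

# Sorting by (predicate, subject, object) places all triples that share a
# predicate next to each other.
pso = sorted(triples)


def scan_predicate(index, predicate):
    """Return the contiguous slice of `index` whose first component is `predicate`.

    Two binary searches find the boundaries of the range; everything in
    between is read sequentially, and nothing outside the range is touched.
    """
    lo = bisect_left(index, (predicate,))      # first triple with this predicate
    hi = bisect_left(index, (predicate + 1,))  # first triple with a larger predicate
    return index[lo:hi]


print(scan_predicate(pso, 2))
```

Under these assumptions, the cost of the scan is two logarithmic lookups plus one sequential read of exactly the matching triples, which is the access pattern the paragraph above describes.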

RDF native

QLever is built from scratch and does not build on other frameworks. The only libraries it uses are for low-level operations, such as asynchronous IO, JSON parsing, or basic data structures. As a result, the storage layer and the query processor are tightly coupled, which enables many RDF-specific optimizations for typical SPARQL queries that are otherwise very expensive to compute. In contrast, many other engines are built on top of a separate data storage layer, such as a general-purpose key-value store (for example, RocksDB) or a full-featured relational database. In particular, this is true for Virtuoso, Stardog, Oracle Spatial and Graph, Amazon's Neptune, Oxigraph, and SAP's knowledge graph engine.

State-of-the-art algorithms and data structures

QLever originated at the Chair for Algorithms and Data Structures. Consequently, great care has been taken throughout the code to employ algorithms and data structures with the right complexity for the task. Note that this is not always the theoretically best algorithm: depending on the circumstances, a carefully engineered baseline algorithm can be better. The important point is to know when to use which. In contrast, many other engines use baseline algorithms throughout their codebase, even when more sophisticated algorithms would provide significant performance improvements.
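As an illustration of "knowing when to use which", consider intersecting two sorted lists of IDs. The following is a minimal sketch, not taken from QLever's code: the function names and the `ratio` threshold of 32 are assumptions chosen for the example. A linear merge is the simple baseline and works well when both inputs have similar size; when one input is much smaller, looking up each of its elements by binary search touches far fewer elements of the large input.

```python
from bisect import bisect_left


def merge_join(a, b):
    """Intersect two sorted lists of unique IDs with a linear scan.

    Runs in O(len(a) + len(b)) and accesses both inputs sequentially.
    """
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1
        elif a[i] > b[j]:
            j += 1
        else:
            out.append(a[i])
            i += 1
            j += 1
    return out


def binary_search_join(small, large):
    """Intersect by looking up each element of `small` in `large`.

    Runs in O(len(small) * log(len(large))). Each search starts at the
    position of the previous one, because both inputs are sorted.
    """
    out, lo = [], 0
    for x in small:
        lo = bisect_left(large, x, lo)
        if lo == len(large):
            break  # all remaining elements of `small` are larger than `large`
        if large[lo] == x:
            out.append(x)
    return out


def join(a, b, ratio=32):
    """Pick a strategy based on the size ratio (the threshold is an assumption)."""
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    if len(small) * ratio < len(large):
        return binary_search_join(small, large)
    return merge_join(a, b)


print(join([3, 500, 999], list(range(0, 1000, 2))))
```

Both strategies return the same result; only the cost differs. In a real engine the threshold would be determined by measurement on the target hardware rather than fixed in advance.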

C++ and algorithm engineering

Choosing the conceptually right algorithm is one thing; implementing it properly so that it takes maximal advantage of the available hardware is another. This is known as algorithm engineering, and it requires experience, a deep understanding of the implemented algorithms, and a programming language like C++ that supports writing code that is potentially as efficient as code written in a low-level language like C. Despite claims to the contrary, this kind of optimization is possible only to a limited extent in programming languages like Java. Many other engines are written in Java, in particular GraphDB, Stardog, and Apache Jena.

Special features

QLever provides a variety of special features that are not offered by other engines. These include: efficient context-sensitive autocompletion (for computing automatic suggestions for completing a partially typed SPARQL query), combined SPARQL and text search (not just keyword search in literals, but keyword search in an arbitrary text corpus that has been entity-linked to the RDF dataset), and efficient spatial search (surpassing all spatial databases from the relational world, including the widely used PostgreSQL+PostGIS, regarding performance).

Powerful user interfaces

QLever comes with Qlue-ls, the first comprehensive implementation of the Language Server Protocol (LSP) for SPARQL. In particular, Qlue-ls provides automatic formatting, configurable autocompletion based on QLever's efficient context-sensitive autocompletion, and a variety of useful code actions. QLever's powerful UI is built on Qlue-ls. QLever also features a powerful Map View that provides interactive visualization of very large numbers of geometric objects on a map (up to hundreds of millions, whereas most map applications already start to lag at tens of thousands of objects).