Findings on the Redland internal implementation - nbaksalyar/redland-rs GitHub Wiki

  • Redland has OOP-style interfaces and implementations. Iterator is an abstract module that provides a generic iterator interface in C.

  • Model provides an API to work with RDF graphs in an abstract way.

  • Redland has a single storage interface but multiple storage implementations. Storages implement backends for triple stores, such as MySQL, PostgreSQL, or key-value storages. The key-value storage is, somewhat confusingly, named hashes.

  • The hashes storage implementation depends on the hashes interface, which provides hash-table functionality (essentially a general-purpose key-value store). It also has multiple implementations: memory (in-memory key-value storage) and bdb (Berkeley DB-backed key-value storage). Somewhat confusingly, the hashes interface is also used for some internal purposes, such as handling the storage options.

  • The storage interface is used internally by model to look up triples and iterate over the RDF statements. When iterating, a new iterator object is created in librdf_hash_get_all and each call to librdf_iterator_get_object is delegated to the storage implementation. In the case of the librdf_storage_hashes storage, the call relies on the internal representation of k/v pairs in the librdf_hashes module and decodes the stored value.

  • The idea for Mutable Data storage is to copy the keys/values stored in rdf_hashes and transform them into a Vec<EntryAction> which can be applied directly in a mutate_mdata_entries operation. This should allow efficient insertion and deletion operations, without overwriting an entire RDF graph. We should also account for updates; this, however, should be simple, as it should be sufficient to overwrite the values that have the same keys.

    Data retrieval is trickier, but not by much. We just need to restore the state of a librdf_hash object. For that, it should be sufficient to dump all MData keys & values, transform them into librdf_hash_datum structs, and call librdf_hashes_put(item) for each one. Then the internal state of the librdf_storage_hashes object must be synchronised with the internal state of the librdf_hash object.

    It should be noted that the redland-rs hash table implementation supports storing multiple values for a single key (one-to-many). This needs to be translated into Mutable Data, which supports only one-to-one mappings.

    Subsequently, while iterating over the storage, the stored encoded values will be decoded into RDF terms and streamed into a model, providing the full scope of librdf functionality, including queries.
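
The Mutable Data mapping described above can be sketched in Rust as follows. This is only an illustration under stated assumptions, not the actual redland-rs code: `EntryAction` here is a simplified, hypothetical stand-in for the real Mutable Data type (which may carry additional fields such as entry versions), and the one-to-many problem is handled by packing all values of a key into one length-prefixed byte string, which is just one of several possible encodings. The names `MultiHash`, `Entries`, `flatten`, `diff` and `restore` are invented for this example.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical stand-in for the Mutable Data entry action type;
/// the real type may carry extra fields such as entry versions.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum EntryAction {
    Ins(Vec<u8>),
    Update(Vec<u8>),
    Del,
}

/// One-to-many contents of a hash: each key maps to a set of values.
pub type MultiHash = BTreeMap<Vec<u8>, BTreeSet<Vec<u8>>>;
/// One-to-one entries, as a Mutable Data object would hold them.
pub type Entries = BTreeMap<Vec<u8>, Vec<u8>>;

/// Pack all values of one key into a single byte string:
/// each value is prefixed with its length as a big-endian u32.
pub fn encode_values(values: &BTreeSet<Vec<u8>>) -> Vec<u8> {
    let mut out = Vec::new();
    for v in values {
        out.extend_from_slice(&(v.len() as u32).to_be_bytes());
        out.extend_from_slice(v);
    }
    out
}

/// Inverse of `encode_values`; returns None on truncated input.
pub fn decode_values(mut bytes: &[u8]) -> Option<BTreeSet<Vec<u8>>> {
    let mut values = BTreeSet::new();
    while !bytes.is_empty() {
        if bytes.len() < 4 {
            return None;
        }
        let (len_bytes, rest) = bytes.split_at(4);
        let len = u32::from_be_bytes([len_bytes[0], len_bytes[1], len_bytes[2], len_bytes[3]])
            as usize;
        if rest.len() < len {
            return None;
        }
        let (value, tail) = rest.split_at(len);
        values.insert(value.to_vec());
        bytes = tail;
    }
    Some(values)
}

/// Translate the one-to-many hash into one-to-one entries.
/// Keys without any values are skipped.
pub fn flatten(hash: &MultiHash) -> Entries {
    hash.iter()
        .filter(|(_, vs)| !vs.is_empty())
        .map(|(k, vs)| (k.clone(), encode_values(vs)))
        .collect()
}

/// Compute the actions that turn `stored` into `current`, so that
/// only changed entries are sent instead of the whole graph.
pub fn diff(stored: &Entries, current: &Entries) -> Vec<(Vec<u8>, EntryAction)> {
    let mut actions = Vec::new();
    for (key, value) in current {
        match stored.get(key) {
            None => actions.push((key.clone(), EntryAction::Ins(value.clone()))),
            Some(old) if old != value => {
                actions.push((key.clone(), EntryAction::Update(value.clone())))
            }
            Some(_) => {} // unchanged entry, nothing to send
        }
    }
    for key in stored.keys() {
        if !current.contains_key(key) {
            actions.push((key.clone(), EntryAction::Del));
        }
    }
    actions
}

/// Rebuild the one-to-many hash from fetched entries (the retrieval path).
/// Returns None if any stored value fails to decode.
pub fn restore(entries: &Entries) -> Option<MultiHash> {
    entries
        .iter()
        .map(|(k, v)| Some((k.clone(), decode_values(v)?)))
        .collect()
}
```

With this encoding, adding or removing one value of a key shows up as an `Update` of that key's single entry, which matches the note above that overwriting the value under the same key should be enough. An alternative would be to fold the value into the entry key itself, which avoids rewriting the packed list at the cost of making updates impossible to express.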