Storage Backend Overview - andrew-nguyen/titan GitHub Wiki

Titan persists all data in a storage backend. It supports several storage backends, selected via the storage.backend configuration option. The choice of backend determines the transactional guarantees and scalability of a particular Titan graph instance. In other words, Titan can accommodate whatever level of isolation, consistency, scalability, or availability a particular application requires, by choosing a backend that provides it.
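As a minimal sketch of how the storage.backend option is set (the hostname below is a placeholder), a configuration properties file might look like this:

```properties
# titan.properties — hypothetical example configuration
# Selects Cassandra as the storage backend; see the comparison table
# below for the other recognized values (hbase, berkeleyje, persistit).
storage.backend=cassandra
# Address of a Cassandra node (placeholder value)
storage.hostname=127.0.0.1
```

Such a file is typically passed to TitanFactory.open(...) when opening the graph.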

However, the CAP theorem stipulates that any practical database can provide only two of the three desirable properties: Consistency, Availability, and Partition tolerance (i.e., scalability). The choice of storage backend is therefore a tradeoff guided by the requirements of a particular use case.

Titan currently supports four storage backends, covering all three edges of the CAP theorem triangle as shown in the figure below.



Storage Backend Comparison

| Name | storage.backend value | Consistency | Availability | Scalability | Replication | Persistence |
| ---- | --------------------- | ----------- | ------------ | ----------- | ----------- | ----------- |
| Cassandra | cassandra | eventually consistent | highly available | linear scalability | yes | disk |
| HBase | hbase | vertex consistent | failover recovery | linear scalability | yes | disk |
| BerkeleyDB | berkeleyje | ACID | single point of failure | single machine | HA mode available | disk |
| Persistit | persistit | ACID | single point of failure | single machine | none | disk |

Choosing a Storage Backend

  1. Decide between disk-backed and in-memory storage backends.
    • Disks/SSDs are significantly less expensive than RAM, so for large graphs cost can determine this decision.
    • In-memory storage backends can answer queries with lower latency. However, this greatly depends on the queries: if only a small subset of the graph is queried, disk-backed storage backends will cache that part of the data in memory as well. Latencies increase significantly only once disk access becomes inevitable.
  2. Decide between the availability, consistency, and partitionability characteristics of the storage backend based on your application's requirements.
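To make the second decision concrete, here is a hedged sketch of how each CAP tradeoff might look in configuration (all hostnames and paths are placeholder values, not recommendations):

```properties
# Option A: favor availability and linear scalability (AP) — hypothetical
# storage.backend=cassandra
# storage.hostname=10.0.0.1

# Option B: favor consistency with failover recovery (CP) — hypothetical
# storage.backend=hbase
# storage.hostname=10.0.0.2

# Option C: favor ACID transactions on a single machine (CA)
storage.backend=berkeleyje
# Local directory where BerkeleyDB stores its data (placeholder path)
storage.directory=/tmp/titan
```

Only one storage.backend can be active per graph instance; the commented-out options illustrate the alternatives.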