Storage Backend Overview - andrew-nguyen/titan GitHub Wiki
Titan persists all data in a storage backend. It supports multiple storage backends, selected with the `storage.backend` configuration option. The choice of backend determines the transactional guarantees and scalability of a particular Titan graph instance, which means Titan can accommodate whatever level of isolation, consistency, scalability, or availability a particular application requires.
However, the CAP theorem stipulates that any practical database can provide only two of the three desirable properties: Consistency, Availability, and Partitionability (i.e. scalability). The choice of storage backend is therefore a tradeoff guided by the requirements of a particular use case.
Titan currently supports four storage backends covering all three edges of the CAP theorem triangle, as summarized in the table below.
| Name | `storage.backend` value | Consistency | Availability | Scalability | Replication | Persistence |
|---|---|---|---|---|---|---|
| Cassandra | `cassandra` | eventually consistent | highly available | linear scalability | yes | disk |
| HBase | `hbase` | vertex consistent | failover recovery | linear scalability | yes | disk |
| BerkeleyDB | `berkeleyje` | ACID | single point of failure | single machine | HA mode available | disk |
| Persistit | `persistit` | ACID | single point of failure | single machine | none | disk |
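Because the backend is selected purely through configuration, switching backends does not require code changes. The following properties-file sketch selects the Cassandra backend; the hostname value is an illustrative assumption for a local deployment, not taken from this page:

```properties
# Select the Cassandra backend (value from the storage.backend column above)
storage.backend=cassandra
# Illustrative: address of a reachable Cassandra node; adjust for your deployment
storage.hostname=127.0.0.1
```

Such a configuration is typically passed to `TitanFactory.open(...)` when opening the graph.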
- Decide between disk-backed and in-memory storage backends.
  - Disks/SSDs are significantly less expensive than RAM, so for large graphs cost can determine this decision.
  - In-memory storage backends can answer queries with lower latency. However, this depends greatly on the queries: if only a small subset of the graph is queried, disk-backed storage backends will cache that part of the data in memory as well, and latencies increase significantly only once disk access becomes inevitable.
- Decide between the availability, consistency, and partitionability characteristics of the storage backend based on your application's requirements.
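As an example of applying this checklist: an application that runs on a single machine and needs ACID transactions would choose BerkeleyDB from the table above, again via configuration alone. A minimal sketch, where the storage directory is an illustrative assumption:

```properties
# ACID, single-machine backend from the table above
storage.backend=berkeleyje
# Illustrative: local directory where BerkeleyDB keeps the graph data
storage.directory=/tmp/titan
```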