The Benefits of Titan - andrew-nguyen/titan GitHub Wiki
Titan is designed to support the processing of graphs so large that they require storage and computational capacities beyond what a single machine can provide. This is Titan’s foundational benefit. This section will discuss the various specific benefits of Titan and its underlying, supported persistence solutions.
- Support for very large graphs. Titan graphs scale with the number of machines in the cluster.
- Support for very many concurrent transactions. Titan’s transactional capacity scale with the number of machines in the cluster.
- Support for geo, numeric range, and full text search for vertices and edges on very large graphs.
- Native support for the popular property graph data model exposed by Blueprints.
- Native support for the graph traversal language Gremlin.
- Easy integration with the Rexster graph server for programming language agnostic connectivity.
- Numerous graph-level configurations provide knobs for tuning performance.
- Vertex-centric indices provide vertex-level querying to alleviate issues with the infamous super node problem.
- Provides an optimized disk representation to allow for efficient use of storage and speed of access.
- Open source under the liberal Apache 2 license.
- Continuously available with no single point of failure.
- No read/write bottlenecks to the graph as there is no master/slave architecture.
- Elastic scalability allows for the introduction and removal of machines.
- Caching layer ensures that continuously accessed data is available in memory.
- Increase the size of the cache by adding more machines to the cluster.
- Integration with Hadoop.
- Open source under the liberal Apache 2 license.
- Tight integration with the Hadoop ecosystem.
- Native support for strong consistency.
- Linear scalability with the addition of more machines.
- Strictly consistent reads and writes.
- Convenient base classes for backing Hadoop MapReduce jobs with HBase tables.
- Support for exporting metrics via JMX.
- Open source under the liberal Apache 2 license.
When using a database, the CAP theorem should be thoroughly considered (C=Consistency, A=Availability, P=Partitionability). Titan is distributed with 3 supporting backends: Cassandra, HBase, and BerkeleyDB. Their tradeoffs with respect to the CAP theorem are represented in the diagram below. Note that BerkeleyDB is a non-distributed database and as such, is typically only used with Titan for testing and exploration purposes.
“Despite your best efforts, your system will experience enough faults that it will have to make a choice between reducing yield (i.e., stop answering requests) and reducing harvest (i.e., giving answers based on incomplete data). This decision should be based on business requirements.” – Coda Hale
HBase gives preference to consistency at the expense of yield, i.e. the probability of completing a request. Cassandra gives preference to availability at the expense of harvest, i.e. the completeness of the answer to the query (data available/complete data).
Through the Fulgora extension Titan can also be used as an in-memory graph database for low latency applications. The Fulgora extension provides support for in-memory storage backends and currently supports the Hazelcast distributed in-memory data grid.