Using Hazelcast
Hazelcast is a distributed in-memory data grid that provides fast access to large amounts of data spread across a cluster of machines. The Hazelcast storage backend for Titan is a low-latency alternative that excels at read-mostly workloads that access the graph uniformly. Note that Hazelcast does not provide durable persistence. In production deployments, Hazelcast is operated as a data grid in which data is replicated to multiple machines for safety. Hazelcast runs in the same JVM as Titan.
Note that the Hazelcast storage backend was first included in Titan 0.4.0 and is currently considered experimental.
"Hazelcast is the leading open source in-memory data grid. It provides Java developers an easy-to-use and powerful solution for data distribution that can be used as a clustering solution, a distributed cache and a NoSQL key-value store." (Hazelcast Homepage)
Since Hazelcast runs in the same JVM as Titan, connecting the two only requires a simple configuration and no additional setup:
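import org.apache.commons.configuration.BaseConfiguration;
import org.apache.commons.configuration.Configuration;
import com.thinkaurelius.titan.core.TitanFactory;
import com.thinkaurelius.titan.core.TitanGraph;
Configuration conf = new BaseConfiguration();
conf.setProperty("storage.directory", "/tmp/graph");
conf.setProperty("storage.backend", "hazelcastcache");
TitanGraph graph = TitanFactory.open(conf);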
Currently, the Hazelcast storage backend adapter stores some properties in local files (i.e., not in the data grid), which is why a local directory must be provided. This will likely change in future releases.
In addition to the general Titan Graph Configuration, the following Hazelcast-specific Titan configuration options are available:
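The same two settings can also be kept in a properties file instead of being set in code. The following is a minimal sketch using only the JDK; the class name `HazelcastTitanConfig` and the temporary file are hypothetical, and it assumes that `TitanFactory.open` is given the path of the resulting file, which is shown only as a comment because it needs Titan on the classpath.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical helper, not part of Titan.
class HazelcastTitanConfig {

    // Builds the same two settings used in the snippet above.
    static Properties build(String directory) {
        Properties props = new Properties();
        props.setProperty("storage.backend", "hazelcastcache");
        props.setProperty("storage.directory", directory);
        return props;
    }

    // Writes the settings in standard key=value properties format.
    static void write(Properties props, Path file) throws IOException {
        try (Writer out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            props.store(out, "Titan with the Hazelcast storage backend");
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("titan-hazelcast", ".properties");
        write(build("/tmp/graph"), file);
        System.out.println("Wrote " + file);
        // Assumption: with Titan on the classpath, the graph could then be opened with
        // TitanGraph graph = TitanFactory.open(file.toString());
    }
}
```

Keeping the settings in a file makes it easier to share one configuration between several Titan instances that join the same grid.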
Option | Description | Value | Default | Modifiable |
---|---|---|---|---|
storage.transactions | Enables transactions and detects conflicting database operations. CAUTION: While disabling transactions can improve performance, it can lead to inconsistencies and even corrupt the database if multiple Titan instances interact with the same Hazelcast grid. | true or false | true | yes |
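The caution in the table can be encoded as a small guard so that transactions are never switched off by accident. This is only one conservative reading of the warning, not Titan's own logic: the class `TransactionSetting`, its method names, and the `preferThroughput` flag are hypothetical, and the sketch merely computes the value to store under `storage.transactions`.

```java
import java.util.Properties;

// Hypothetical helper, not part of Titan.
class TransactionSetting {

    // Transactions stay enabled (the default) unless exactly one Titan
    // instance uses the Hazelcast grid and the caller explicitly opts out.
    static boolean transactionsEnabled(int titanInstances, boolean preferThroughput) {
        if (titanInstances < 1) {
            throw new IllegalArgumentException("at least one Titan instance is required");
        }
        return titanInstances > 1 || !preferThroughput;
    }

    // Stores the decision under the option name from the table above.
    static Properties apply(Properties props, int titanInstances, boolean preferThroughput) {
        boolean enabled = transactionsEnabled(titanInstances, preferThroughput);
        props.setProperty("storage.transactions", String.valueOf(enabled));
        return props;
    }

    public static void main(String[] args) {
        Properties single = apply(new Properties(), 1, true);
        Properties shared = apply(new Properties(), 3, true);
        System.out.println(single.getProperty("storage.transactions")); // false
        System.out.println(shared.getProperty("storage.transactions")); // true
    }
}
```

The resulting value can be passed to the Titan configuration in the same way as `storage.backend` and `storage.directory`.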
Further configuration options can be set in the Hazelcast configuration XML file. Please see the Hazelcast documentation for more information.
The Hazelcast storage backend is best suited for read-mostly graph workloads on medium-sized graphs that fit entirely into the memory of one or more machines. Read-mostly means that most transactions read data from the graph and very few updates occur. For write-heavy applications, other storage backends will yield much better performance.
For this storage backend to be cost-effective, most of the graph should also be accessed regularly. If large parts of the graph go stale (e.g. old elements are rarely accessed), a disk-backed storage backend may be more cost-effective, because rarely accessed data can then reside on disk instead of occupying memory.
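Whether a graph fits entirely into memory can be estimated with simple arithmetic. The sketch below is a rough back-of-the-envelope check, not a Hazelcast or Titan API: the class `GridSizing`, the idea of counting whole backup copies, and every number in `main` are illustrative assumptions, and the in-memory size of a real graph has to be measured.

```java
// Hypothetical helper, not part of Titan or Hazelcast.
class GridSizing {

    // Smallest number of machines whose combined usable heap holds the graph
    // plus its backup copies. All inputs are estimates supplied by the caller.
    static long minMachines(long graphBytes, int backupCopies, long usableHeapBytesPerMachine) {
        if (graphBytes < 0 || backupCopies < 0 || usableHeapBytesPerMachine <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative and heap must be positive");
        }
        // One primary copy plus the requested number of backups.
        long totalBytes = Math.multiplyExact(graphBytes, backupCopies + 1L);
        // Ceiling division: a partially filled machine still counts as a machine.
        long machines = Math.addExact(totalBytes, usableHeapBytesPerMachine - 1) / usableHeapBytesPerMachine;
        // Backups only add safety if each copy can live on a different machine.
        return Math.max(machines, backupCopies + 1L);
    }

    public static void main(String[] args) {
        long gib = 1L << 30;
        // Assumed figures: 20 GiB graph, 1 backup copy, 8 GiB usable heap per machine.
        System.out.println(minMachines(20 * gib, 1, 8 * gib)); // prints 5
    }
}
```

If the estimate calls for far more memory than the regularly accessed part of the graph needs, that is a hint that a disk-backed storage backend may be the cheaper choice.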
For uniformly read graphs with few updates, the Hazelcast storage backend provides low-latency queries that are hard to match with disk-backed storage backends.