Database Cache - andrew-nguyen/titan GitHub Wiki
Titan provides a database level cache that caches parts of the graph for faster subsequent access. In contrast to the transaction level caches, the database level cache does not expire immediately after a transaction is closed. Hence, the database level cache significantly speeds up graph traversals for read-heavy workloads across transactions.
The Graph Configuration page lists all of the configuration options that pertain to Titan’s database level cache. This page attempts to explain their usage.
Most importantly, the database level cache is disabled by default in the current release version of Titan. To enable it, set cache.db-cache=true.
The most important setting for performance and query behavior is the cache expiration time, which is configured via cache.db-cache-time. The cache will hold graph elements for at most that many milliseconds. If an element expires, the data will be re-read from the storage backend on the next access.
If there is only one Titan instance accessing the storage backend or if this instance is the only one modifying the graph, the cache expiration can be set to 0 which disables cache expiration. This allows the cache to hold elements indefinitely (unless they are evicted due to space constraints or on update) which provides the best cache performance. Since no other Titan instance is modifying the graph, there is no danger of holding on to stale data.
If there are multiple Titan instances accessing the storage backend, the expiration time should be set to the longest delay that is acceptable between another Titan instance modifying the graph and this Titan instance seeing the modification.
If any change should be immediately visible to all Titan instances, the database level cache should be disabled in a distributed setup. However, for most applications it is acceptable that a particular Titan instance sees remote modifications with some delay. The larger the maximally allowed delay, the better the cache performance.
Note that a given Titan instance will always immediately see its own modifications to the graph, irrespective of the configured cache expiration time.
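The expiration rule described above can be illustrated with a toy model. This is a minimal sketch and not Titan's implementation: the class ExpiringCacheSketch, its methods, and the simulated backend read are all hypothetical, and time is passed in explicitly so the behavior is easy to follow.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a time-bounded cache. Hypothetical; NOT Titan's implementation. */
final class ExpiringCacheSketch {

    private static final class Entry {
        final String value;
        final long loadedAtMillis;

        Entry(String value, long loadedAtMillis) {
            this.value = value;
            this.loadedAtMillis = loadedAtMillis;
        }
    }

    private final long expirationMillis; // 0 disables expiration
    private final Map<String, Entry> entries = new HashMap<>();
    private int backendReads = 0;        // counts simulated storage backend reads

    ExpiringCacheSketch(long expirationMillis) {
        this.expirationMillis = expirationMillis;
    }

    /** Returns the cached value, re-reading it if the entry is missing or expired. */
    String get(String key, long nowMillis) {
        Entry e = entries.get(key);
        boolean expired = e != null
                && expirationMillis > 0
                && nowMillis - e.loadedAtMillis >= expirationMillis;
        if (e == null || expired) {
            backendReads++; // stands in for a read from the storage backend
            e = new Entry("value-of-" + key, nowMillis);
            entries.put(key, e);
        }
        return e.value;
    }

    int backendReads() {
        return backendReads;
    }
}
```

With an expiration time of 10000, reads of the same key at 0 ms and 5000 ms are served by one backend read, while a read at 10000 ms triggers a second one. With an expiration time of 0, the entry is never re-read because of age, which is only safe when no other instance modifies the graph.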
The configuration option cache.db-cache-size controls how much heap space Titan’s database level cache is allowed to consume. The larger the cache, the more effective it will be. However, large cache sizes can lead to excessive GC and poor performance.
The cache size can be configured as a percentage (expressed as a decimal between 0 and 1) of the total heap space available to the JVM running Titan or as an absolute number of bytes.
Note that the cache size refers to the amount of heap space that is exclusively occupied by the cache. Titan’s other data structures and each open transaction will occupy additional heap space. If additional software layers are running in the same JVM (e.g. Rexster or embedded Cassandra), those may occupy a significant amount of heap space as well. Be conservative in your heap memory estimation. Configuring a cache that is too large can lead to out-of-memory exceptions and excessive GC.
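To estimate how many bytes a given setting reserves, the two interpretations of the cache size can be sketched as follows. The class CacheSizeSketch and the method resolveBytes are hypothetical helpers for this page, not part of Titan. Treating a value of exactly 1 or more as an absolute byte count is an assumption of this sketch.

```java
/** Sketch of interpreting a cache size setting. Hypothetical helper, not part of Titan. */
final class CacheSizeSketch {

    /**
     * Values strictly between 0 and 1 are read as a fraction of the heap.
     * Values of 1 or more are read as an absolute number of bytes (assumption).
     */
    static long resolveBytes(double configured, long maxHeapBytes) {
        if (configured <= 0) {
            throw new IllegalArgumentException("cache size must be positive: " + configured);
        }
        if (configured < 1.0) {
            return (long) (configured * maxHeapBytes);
        }
        return (long) configured;
    }

    public static void main(String[] args) {
        // Maximum heap the JVM will attempt to use, e.g. as set by -Xmx.
        long heap = Runtime.getRuntime().maxMemory();
        System.out.println("0.25 of heap   = " + resolveBytes(0.25, heap) + " bytes");
        System.out.println("absolute value = " + resolveBytes(512L * 1024 * 1024, heap) + " bytes");
    }
}
```

For a 4 GiB heap, a setting of 0.25 corresponds to 1 GiB reserved for the cache alone, which leaves the remainder for transactions, Titan’s other data structures, and anything else running in the same JVM.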
When a vertex is locally modified (e.g. an edge is added) all of the vertex’s related database level cache entries are marked as expired and eventually evicted. This will cause Titan to refresh the vertex’s data from the storage backend on the next access and re-populate the cache.
However, when the storage backend is eventually consistent, the modifications that triggered the eviction may not yet be visible. When cache.db-cache-clean-wait is configured, the cache waits at least this many milliseconds before repopulating itself with the entry retrieved from the storage backend.
If Titan runs locally or against a storage backend that guarantees immediate visibility of modifications, this value can be set to 0.
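The options discussed on this page can be combined as in the following sketch, which assembles them with java.util.Properties. The class name DbCacheSettingsSketch and all values are illustrative assumptions rather than recommendations. The enabling key is written as cache.db-cache here, in the same namespace as the other three options; check the Graph Configuration page for the exact names in your release.

```java
import java.io.IOException;
import java.util.Properties;

/** Assembles example cache settings. Values are illustrative, not recommendations. */
final class DbCacheSettingsSketch {

    static Properties exampleSettings() {
        Properties p = new Properties();
        p.setProperty("cache.db-cache", "true");          // enable the database level cache
        p.setProperty("cache.db-cache-time", "60000");    // accept up to 60 s of staleness
        p.setProperty("cache.db-cache-size", "0.25");     // a quarter of the JVM heap
        p.setProperty("cache.db-cache-clean-wait", "50"); // wait 50 ms before repopulating
        return p;
    }

    public static void main(String[] args) throws IOException {
        // Prints the settings in .properties format.
        exampleSettings().store(System.out, "Titan database level cache (example values)");
    }
}
```

The values model a multi-instance deployment on an eventually consistent backend. For a single instance on a backend with immediate visibility, both cache.db-cache-time and cache.db-cache-clean-wait could be set to 0, as described above.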