Graph Configuration - andrew-nguyen/titan GitHub Wiki
This page summarizes Titan’s configuration options. To configure a TitanGraph, invoke the TitanFactory.open() method with a commons configuration object or the file name of a properties configuration file. The TitanFactory will return a TitanGraph according to the configuration.
The following example opens a TitanGraph with the configured local storage directory.
Configuration conf = new BaseConfiguration();
conf.setProperty("storage.directory","/tmp/titan");
TitanGraph g = TitanFactory.open(conf);
Alternatively, we could have written the configuration into a properties file and opened a TitanGraph as follows:
TitanGraph g = TitanFactory.open("/tmp/titan/configuration.properties");
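
As an illustration of the properties-file route, the following sketch writes the storage.directory option to a configuration.properties file using only the Java standard library and reads it back. The class and helper names are hypothetical and not part of Titan; the resulting file path is what one would pass to TitanFactory.open().

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical helper, not part of Titan: prepares a properties file for TitanFactory.open(String).
final class TitanPropertiesFileSketch {

    // Writes the given storage directory into <dir>/configuration.properties.
    static Path writeConfig(Path dir, String storageDirectory) {
        Properties props = new Properties();
        props.setProperty("storage.directory", storageDirectory);
        Path file = dir.resolve("configuration.properties");
        try (OutputStream out = Files.newOutputStream(file)) {
            props.store(out, "Titan graph configuration");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return file;
    }

    // Loads the options back so they can be inspected before opening the graph.
    static Properties readConfig(Path file) {
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(file)) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Creates a scratch directory for the example.
    static Path newTempDir() {
        try {
            return Files.createTempDirectory("titan-config");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = writeConfig(newTempDir(), "/tmp/titan");
        System.out.println(file + " -> " + readConfig(file));
    }
}
```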
It is very important to note that some configuration options should NEVER be modified after the TitanGraph is first initialized. Doing so may corrupt the graph or lead to unexpected behavior. The Modifiable column indicates whether a particular configuration option can be changed after the graph has been created.
Titan leverages various storage backends for persistence. Use the following options to configure one of the pre-defined storage backends or your own. Particular storage backends may provide or require additional configuration options. For more information on how to configure a TitanGraph over Cassandra or HBase, please review the respective wiki pages.
Option | Description | Value | Default | Modifiable |
---|---|---|---|---|
storage.backend | Full class name of the StorageManager implementation defining the storage backend to be used for persistence or one of the following pre-defined storage backends: cassandra, hbase, hazelcastcache, persistit, berkeleyje | Class name or short-hand | local | no |
storage.directory | Storage directory for those storage backends that require local storage | path | – | no |
storage.read-only | Specifies whether write operations are supported on the graph | true or false | false | yes |
storage.batch-loading | Enables batch loading, which improves write performance but assumes that only one thread is interacting with the graph and that vertices retrieved by id exist. Under these assumptions, locking and some read operations can be avoided. Furthermore, the configured storage backend will make backend-specific configurations that facilitate loading performance. Careful: enabling batch loading when the assumptions are violated can result in an inconsistent or partially corrupt graph. We strongly suggest disabling automatic type creation when batch loading is enabled. | true or false | false | yes |
storage.buffer-size | Buffers graph mutations locally up to the specified number before persisting them against the storage backend. Set to 0 to disable buffering. Buffering is disabled automatically if the storage backend does not support buffered mutations. | >=0 | 1024 | yes |
storage.write-attempts | Number of times the database attempts to persist the transactional state to the storage layer. | >0 | 5 | yes |
storage.read-attempts | Number of times the database attempts to execute a read operation against the storage layer in the current transaction. | >0 | 3 | yes |
storage.attempt-wait | Time in milliseconds that Titan waits after an unsuccessful storage attempt before retrying. | >=0 | 250 | yes |
storage.page-size | Number of results to pull over the wire when iterating over a distributed storage backend. | >=0 | 100 | yes |
storage.username | Username to authenticate access to the storage backend (if necessary) | String | – | yes |
storage.password | Password to authenticate access to the storage backend (if necessary) | String | – | yes |
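
To make the interplay of storage.write-attempts, storage.read-attempts and storage.attempt-wait more concrete, here is a minimal retry loop. It only illustrates the semantics described in the table, assuming a failed attempt surfaces as a runtime exception; it is not Titan's implementation, and the class and method names are hypothetical.

```java
import java.util.function.Supplier;

// Hypothetical illustration of "attempts" plus "attempt-wait"; not Titan code.
final class StorageRetrySketch {

    // Runs op up to `attempts` times, sleeping `attemptWaitMs` after each failed attempt.
    // Rethrows the last failure once all attempts are used up.
    static <T> T withRetries(Supplier<T> op, int attempts, long attemptWaitMs) {
        if (attempts < 1) {
            throw new IllegalArgumentException("attempts must be > 0");
        }
        RuntimeException last = null;
        for (int i = 1; i <= attempts; i++) {
            try {
                return op.get();
            } catch (RuntimeException e) {
                last = e;
                if (i < attempts) {
                    sleep(attemptWaitMs);
                }
            }
        }
        throw last;
    }

    private static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting to retry", e);
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated storage operation that fails twice before succeeding.
        String result = withRetries(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("temporary storage failure");
            }
            return "persisted";
        }, 5, 250L); // the defaults listed above for write attempts and attempt wait
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```
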
Titan contains a database-level cache that is shared among all transactions and can significantly speed up traversal times, in particular on read-heavy workloads.
Option | Description | Value | Default | Modifiable |
---|---|---|---|---|
cache.db-cache | Enables the database level cache | true or false | false | yes |
cache.db-cache-size | The size of the database level cache. If this value is >0.0 and <1.0, it is interpreted as a fraction of the total heap space available to the JVM Titan is running in. If this value is bigger than 1.0, it is interpreted as an absolute size in bytes. | number | 0.3 | yes |
cache.db-cache-time | The default expiration time in milliseconds for elements held in the database level cache. This is the time period before Titan will check against the storage backend for newer results. Setting this value to 0 will cache elements forever (unless they get evicted due to space constraints or modifications). | >=0 | 10000 | yes |
cache.db-cache-clean-wait | How long (in milliseconds) the database level cache will keep keys expired while the mutations that triggered the expiration are being persisted. This value should be larger than the time it takes for persisted mutations to become visible. This setting only ever makes sense for distributed storage backends where writes may be accepted but are not immediately readable. | >=0 | 50 | yes |
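
The two interpretations of cache.db-cache-size can be sketched as a small function. This is only an illustration of the rule stated in the table, not Titan's code; the class and method names are hypothetical, and because the page does not say how a value of exactly 1.0 or a non-positive value is treated, the sketch rejects those.

```java
// Hypothetical illustration of how cache.db-cache-size could be resolved; not Titan code.
final class DbCacheSizeSketch {

    // Values strictly between 0.0 and 1.0 are a share of the heap; values above 1.0 are bytes.
    static long resolveBytes(double configured, long maxHeapBytes) {
        if (configured > 0.0 && configured < 1.0) {
            return (long) (configured * maxHeapBytes);
        }
        if (configured > 1.0) {
            return (long) configured;
        }
        // Assumption: the page leaves these cases unspecified, so the sketch refuses them.
        throw new IllegalArgumentException("unsupported cache size: " + configured);
    }

    public static void main(String[] args) {
        long heap = Runtime.getRuntime().maxMemory();
        System.out.println("default 0.3 of heap: " + resolveBytes(0.3, heap) + " bytes");
        System.out.println("absolute setting:    " + resolveBytes(64.0 * 1024 * 1024, heap) + " bytes");
    }
}
```
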
These configuration settings control how Titan allocates and assigns ids.
Option | Description | Value | Default | Modifiable |
---|---|---|---|---|
ids.block-size | Size of the id block to be acquired. Larger block sizes require fewer block applications but also leave a larger fraction of the id pool occupied and potentially lost. For write heavy applications, larger block sizes should be chosen. | positive integer | 10,000 | No* |
ids.flush | If flush ids is enabled, vertices and edges are assigned ids immediately upon creation. If not, then ids are only assigned when the transaction is committed. | true or false | true | yes |
ids.renew-timeout | Number of milliseconds Titan’s id pool manager will wait while attempting to acquire a new id block before failing. | Time in ms | 60000 | yes |
ids.renew-percentage | Threshold percentage of ids left in the pool before Titan’s id pool manager starts acquiring a new id block. A smaller percentage reduces potential id waste but can lead to delays when the current id block is exhausted. | Percentage | 0.3 | yes |
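
The relationship between ids.block-size and ids.renew-percentage can be pictured with a small threshold check. The sketch below assumes that renewal starts once the share of unused ids in the current block drops to the configured threshold; that reading, along with the class and method names, is an assumption for illustration and not taken from Titan's id pool manager.

```java
// Hypothetical illustration of the renewal threshold; not Titan's id pool manager.
final class IdBlockRenewalSketch {

    // Returns true once the ids remaining in the block are at or below the threshold share.
    static boolean shouldRenew(long blockSize, long idsUsed, double renewPercentage) {
        if (blockSize <= 0 || idsUsed < 0 || idsUsed > blockSize) {
            throw new IllegalArgumentException("invalid block state");
        }
        long remaining = blockSize - idsUsed;
        return remaining <= blockSize * renewPercentage;
    }

    public static void main(String[] args) {
        long blockSize = 10_000L;      // default ids.block-size
        double threshold = 0.3;        // default ids.renew-percentage
        for (long used : new long[] {1_000L, 6_000L, 9_000L}) {
            System.out.println(used + " ids used, renew: " + shouldRenew(blockSize, used, threshold));
        }
    }
}
```

A smaller threshold delays the request for the next block, which matches the trade-off described in the table: less potential id waste, but a higher chance of waiting when the current block runs out.
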
When Titan is used in distributed mode over multiple machines, locking may be required to ensure consistency of certain operations. In particular, locking is required when allocating id blocks for id assignment to individual TitanGraph instances. Locking is also required for certain key type configuration as described in Type Configuration. These options control how locks are acquired.
Option | Description | Value | Default | Modifiable |
---|---|---|---|---|
storage.machine-id | A unique identifier for the machine running the TitanGraph instance. Ensures that no other machine accessing the storage backend can have the same identifier. | String | Machine IP | Yes |
storage.machine-id-appendix | A locally unique identifier for a particular TitanGraph instance. This only needs to be configured when multiple TitanGraph instances are running on the same machine. A unique machine specific appendix guarantees a globally unique identifier. | short integer | 0 | Yes |
storage.lock-wait-time | The number of milliseconds the system waits for a lock application to be acknowledged by the storage backend. Also, the time waited at the end of all lock applications before verifying that the applications were successful. This value should be a small multiple of the average consistent write time. | positive integer | 100 | No |
storage.lock-retries | Number of times the system attempts to acquire a lock before giving up and throwing an exception. | positive integer | 3 | Yes |
storage.lock-expiry-time | Number of milliseconds after which a lock is considered to have expired. Lock applications that were not released are considered expired after this time and released. This value should be larger than the maximum time a transaction can take in order to guarantee that no correctly held applications are expired prematurely and as small as possible to avoid deadlock. | positive integer | 300,000 | No |
storage.idauthority-wait-time | The number of milliseconds the system waits for an id block application to be acknowledged by the storage backend. Also, the time waited after the application before verifying that the application was successful. | positive integer | 300 | No |
storage.idauthority-retries | Number of times the system attempts to acquire a unique id block before giving up and throwing an exception. | positive integer | 20 | Yes |
If the Modifiable column shows No*, the option cannot be adjusted while Titan instances are running. To change the option, all Titan instances must be shut down and the value must be changed across the entire cluster before starting instances again. Also note that while all configuration options mentioned on this page must be identical for all Titan instances running in the same cluster, storage.machine-id and storage.machine-id-appendix must be configured uniquely for each Titan instance unless the default values are chosen.
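
One way to keep the shared options identical while still giving each instance on a machine its own storage.machine-id-appendix is to derive per-instance settings from a common base. The following sketch uses plain java.util.Properties; the class and method names are hypothetical, and the example values are for illustration only.

```java
import java.util.Properties;

// Hypothetical helper for deriving per-instance settings from shared cluster settings.
final class InstanceConfigSketch {

    // Copies the shared options and adds an appendix that is unique on this machine.
    static Properties forInstance(Properties shared, short appendix) {
        Properties instance = new Properties();
        instance.putAll(shared);
        instance.setProperty("storage.machine-id-appendix", Short.toString(appendix));
        return instance;
    }

    public static void main(String[] args) {
        Properties shared = new Properties();
        shared.setProperty("storage.backend", "cassandra");
        shared.setProperty("storage.lock-wait-time", "100");

        // Two TitanGraph instances on the same machine get different appendices.
        System.out.println(forInstance(shared, (short) 1));
        System.out.println(forInstance(shared, (short) 2));
    }
}
```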
These configuration options apply when Titan is used with a distributed storage backend, i.e. the data is stored across multiple machines. They control how Titan communicates with the storage backend cluster.
Option | Description | Value | Default | Modifiable |
---|---|---|---|---|
storage.setup-wait | When Titan is started, it will attempt to communicate with the storage backend. If the backend cluster is in the process of starting up or otherwise not yet fully available, Titan will repeatedly re-attempt communication and wait for this long before failing. | Time in ms | 60000 | Yes |
storage.page-size | Some Titan operations, like retrieving all vertices, require a lot of data to be retrieved from the storage cluster. Such operations are broken down into smaller batches and this option configures their size. The larger this value, the fewer requests need to be made but the longer each request takes. | >=0 | 100 | Yes |
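
The trade-off behind storage.page-size comes down to simple arithmetic: a larger page means fewer round trips for the same number of results. The sketch below computes the number of requests for a given result count. It assumes a positive page size, since the page does not describe what a value of 0 means, and the class and method names are hypothetical.

```java
// Hypothetical illustration of how page size affects the number of backend requests.
final class PageSizeSketch {

    // Number of batches needed to fetch totalResults when each request returns up to pageSize.
    static long requestsNeeded(long totalResults, int pageSize) {
        if (totalResults < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("totalResults must be >= 0 and pageSize > 0");
        }
        return (totalResults + pageSize - 1) / pageSize; // ceiling division
    }

    public static void main(String[] args) {
        long vertices = 1_000_000L;
        for (int pageSize : new int[] {100, 1_000, 10_000}) {
            System.out.println("page size " + pageSize + ": "
                    + requestsNeeded(vertices, pageSize) + " requests");
        }
    }
}
```
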
Titan can interface with multiple external index systems to provide support for geo, numeric range, and full-text search. This configuration option tells Titan which indexing backend to use for a particular index identified by name.
Option | Description | Value | Default | Modifiable |
---|---|---|---|---|
storage.index.INDEX-NAME.backend | Implementation of the index backend configured for INDEX-NAME. Additional configuration options specific to this index are configured in the same configuration namespace. Please refer to the documentation page of the index backend for more information | elasticsearch or lucene | none | No |
For more information on external indexes and how to configure a particular index system, please refer to Indexing Backend Overview.
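
Since each index is configured in its own namespace, the option key depends on the index name. The sketch below builds the key for a given name; the index name "search" is a hypothetical example, as are the class and method names.

```java
import java.util.Properties;

// Hypothetical helper that builds the per-index option key described above.
final class IndexConfigSketch {

    // Replaces INDEX-NAME in storage.index.INDEX-NAME.backend with the given name.
    static String backendKey(String indexName) {
        if (indexName == null || indexName.isEmpty()) {
            throw new IllegalArgumentException("index name must not be empty");
        }
        return "storage.index." + indexName + ".backend";
    }

    // Returns properties that select the given backend for the named index.
    static Properties withIndex(String indexName, String backend) {
        Properties props = new Properties();
        props.setProperty(backendKey(indexName), backend);
        return props;
    }

    public static void main(String[] args) {
        // "search" is an example index name chosen for this sketch.
        System.out.println(withIndex("search", "elasticsearch"));
    }
}
```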
This section contains general configuration options to customize Titan’s behavior.
Option | Description | Value | Default | Modifiable |
---|---|---|---|---|
autotype | Specifies the factory to use when automatically creating types. See Default Type Creation for more details. Set to none to disable automatic type creation | blueprints or none | blueprints | Yes |
tx-cache-size | The maximum number of vertices to hold in the transaction local cache. Each transaction maintains a cache of up to this size. Larger cache size means possibly faster transactions while consuming more memory | integer | 20000 | yes |
fast-property | Whether all properties should be pre-fetched for a vertex with the first property lookup on that vertex. Significantly speeds up multiple property retrievals. Disable on graphs with vertices that have lots of properties. | boolean | true if storage backend is distributed else false | yes |
This page lists all the common configuration options for Titan. Learn more about some special purpose configuration:
Each storage backend has additional configuration options which are listed on the following pages:
- Cassandra Configuration
- HBase Configuration
- Hazelcast Configuration
- Persistit Configuration
- BerkeleyDB Configuration
Each index backend has additional configuration options which are listed on the following pages: