Write Batch With Index - cz2h/rocksdb GitHub Wiki
Write Batch With Index
The WBWI (Write Batch With Index) encapsulates a WriteBatch and an Index into that WriteBatch. The index in use is a Skip List. The purpose of the WBWI is to sit above the DB, and offer the same basic operations as the DB, i.e. Writes - Put
, Delete
, and Merge
, and Reads - Get
, and newIterator
.
Write operations on the WBWI are serialized into the WriteBatch (of the WBWI) rather than acting directly on the DB. The WriteBatch can later be written atomically to the DB by calling db.write(wbwi)
.
Read operations can either be solely against the WriteBatch (e.g. GetFromBatch
), or they can be read-through operations. A read-through operation, (e.g. GetFromBatchAndDB
), first tries to read from the WriteBatch, if there is no updated entry in the WriteBatch then it subsequently reads from the DB.
The WriteBatch itself is a thin wrapper around a std::string
field called rep
. The WriteBatch can be thought of as append-only. Each new write operation simply appends to the WriteBatch rep
, see: https://github.com/facebook/rocksdb/blob/main/db/write_batch.cc#L10
The purpose of the Index within the WBWI is to track which keys that have been written to the WriteBatch. The index is a mapping from the key to the offset of the serialized operation within the WriteBatch (i.e. the position within the rep
string).
This index is used for read-operations against the WriteBatchWithIndex to determine whether a write operation on the WriteBatch has previously occurred for the key that is being written. If there is an entry in the Index the offset into the WriteBatch is used to read the value, if not, then the database can be consulted. The Index allows us to avoid having to scan the WriteBatch when performing Get
or Seek
(for iterator) operations.
Transaction Building Block
The WBWI can be used as a component if one wishes to build Transaction Semantics atop RocksDB. The WBWI by itself isolates the Write Path to a local in-memory store and allows you to RYOW (Read-Your-Own-Writes) before data is atomically written to the database.
It is a key component in RocksDB's Pessimistic and Optimistic Transaction utility classes.
Modes of Operation
The WBWI has two modes of operation, 1) where all write operations to the same key can be retrieved by iteration, and 2) where only the latest write operation for a key can be retrieved by iteration. Regardless of the mode, all write operations themselves are preserved in the append-only WriteBatch, the mode only controls what is retrievable from the WriteBatch when RYOW.
The mode of operation is controlled by use of the WriteBatchWithIndex constructor argument named overwrite_key
. The default mode of operation is to allow all write operations to be retrieved (i.e. overwrite_key=false
).
Not all WBWI functions are supported in both modes, see the table below:
WBWI Function | overwrite_key=false | overwrite_key=true |
---|---|---|
Put | Yes | Yes |
Delete | Yes | Yes |
DeleteRange | No | No |
Merge | Yes | Yes |
GetFromBatch | Yes (1) | Yes (2) |
GetFromBatchAndDB | Yes (3) | Yes (4) |
NewIterator | One-or-more entries per Key in the batch | One entry per Key in the batch |
NewIteratorWithBase | No - nullptr | Yes |
- If a Merge happened before GetFromBatch the value of the Merge Operator and the values of any previous Put operations on the batch are returned; not the result of the Merged Operation.
- If a Merge happened before GetFromBatch the error status
kMergeInProgress
is returned. - If a Merge happened before GetFromBatchAndDB the value of the Merge Operator, the values of any previous Put operations on the batch, and the values of any matching keys in the DB are returned; not the result of the Merged Operation.
- If a Merge happened before GetFromBatchAndDB the error status
kMergeInProgress
is returned.
overwrite_key=false
Index Entries when TODO
overwrite_key=true
Index Entries when TODO