Database Versioning - Jmanuel4SandMan/BIMserver GitHub Wiki
Introduction
BIMserver stores all data in a key-value store, which is accessed through the KeyValueStore interface. Currently there is only one implementation, which uses the open-source BerkeleyDB Java Edition.
A key value store is defined as:
- A set of named tables
- Each table has 2 columns, key and value
- Both the key and the value column can contain a byte array of arbitrary size; the size may vary per record
- All keys in a table are always ordered
- No duplicate keys can exist
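The properties above can be illustrated with a minimal in-memory sketch. `SketchTable` is a hypothetical class, not BIMserver's KeyValueStore interface, and the unsigned lexicographic key order is an assumption about how byte-array keys are compared.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.TreeMap;

// Hypothetical in-memory table; this is NOT BIMserver's KeyValueStore interface.
final class SketchTable {
    // Assumption: keys are ordered lexicographically, treating each byte as 0..255.
    private static final Comparator<byte[]> KEY_ORDER = Arrays::compareUnsigned;

    // A sorted map keeps all keys ordered at all times.
    private final TreeMap<byte[], byte[]> records = new TreeMap<>(KEY_ORDER);

    // Stores the record and returns true, or returns false if the key already exists.
    boolean store(byte[] key, byte[] value) {
        return records.putIfAbsent(key.clone(), value.clone()) == null;
    }

    // Returns a copy of the value for the key, or null if there is no such record.
    byte[] get(byte[] key) {
        byte[] value = records.get(key);
        return value == null ? null : value.clone();
    }

    // Returns a copy of the smallest key; the table must not be empty.
    byte[] firstKey() {
        return records.firstKey().clone();
    }
}
```

Keys and values are copied on the way in and out so that callers cannot change a stored record by mutating their own arrays.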
Principles
- Models are stored in projects
- Each new version of the model is stored as a new revision
- Each revision of the project should always be accessible
- Revisions, once stored, can never be changed
- Objects in a project can only reference other objects in the same project.
Details
Each record in a table has the following layout:
Key | Value |
---|---|
Pid (4 bytes) + Oid (8 bytes) + Rid (4 bytes) | See description of value |
- Pid = Project Id
- Oid = Object Id
- Rid = Revision Id
One other term that is used is Cid (Class Id). This is a short (2 bytes) that serves as a more compact way to reference a class than its complete class name.
Records are only added, never modified or deleted.
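As a rough sketch of the key layout in the table above, the 16-byte key could be built as follows. `RecordKey` is a hypothetical helper, and big-endian byte order (the `ByteBuffer` default) is an assumption that the text above does not state.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for the Pid + Oid + Rid key layout described above.
final class RecordKey {
    static final int SIZE = 4 + 8 + 4; // Pid (int) + Oid (long) + Rid (int)

    // Packs the three ids into one 16-byte key (big-endian is an assumption).
    static byte[] encode(int pid, long oid, int rid) {
        return ByteBuffer.allocate(SIZE).putInt(pid).putLong(oid).putInt(rid).array();
    }

    static int pid(byte[] key) {
        return ByteBuffer.wrap(key).getInt(0);
    }

    static long oid(byte[] key) {
        return ByteBuffer.wrap(key).getLong(4);
    }

    static int rid(byte[] key) {
        return ByteBuffer.wrap(key).getInt(12);
    }
}
```

With big-endian encoding and non-negative ids, comparing keys byte by byte gives the same order as comparing (Pid, Oid, Rid) numerically, so all records of one object in one project sit next to each other in an ordered table.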
Versioning
The image below shows four revisions of a project. There are three tables: A, B and C. The diagrams at the top show the objects and their relations for each revision. The number between the parentheses is the Oid (Object Id). The tables at the bottom show the complete database contents at the end of each revision. The number before the "." is the Oid (Object Id); the number after it is the Rid (Revision Id).
Value
Please read this first: [Link to description of EMF]
For every structural feature of the object's class, some bytes are written to the value part of the record. The values are written in the order defined by the EMF model and include the structural features of all superclasses.
Single attributes
Type | Size (bytes) | Serialisation | Null representation |
---|---|---|---|
String | 2 + size of UTF-8 encoded bytes | UTF-8 encoded | -1 (as a short) |
Integer | 4 | Default java serialisation | Cannot be null |
Long | 8 | Default java serialisation | Cannot be null |
Float | 4 | Default java serialisation | Cannot be null |
Double | 8 | Default java serialisation | Cannot be null |
Boolean | 1 | 0 for false, 1 for true | Cannot be null |
Date | 8 | Number of milliseconds since January 1, 1970, 00:00:00 GMT | -1 (as a long) |
Tristate | 1 | 0 for true, 1 for false, 2 for undefined | Cannot be null |
ByteArray | 4 + Length | 4 bytes (int) for length + bytes | 0 (as an int) |
Enum | 4 | Enum literal (int) | Cannot be null |
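A few rows of the table above could be sketched like this. `AttributeCodec` is a hypothetical class, not BIMserver code, and big-endian byte order is an assumption. The sketch also assumes that the UTF-8 length of a string fits in a short.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Date;

// Hypothetical codec covering the String, Boolean and Date rows of the table above.
final class AttributeCodec {
    // String: 2-byte length followed by the UTF-8 bytes; null is written as (short) -1.
    static void writeString(ByteBuffer out, String s) {
        if (s == null) {
            out.putShort((short) -1);
            return;
        }
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        out.putShort((short) utf8.length); // assumes the length fits in a short
        out.put(utf8);
    }

    static String readString(ByteBuffer in) {
        short length = in.getShort();
        if (length == -1) {
            return null;
        }
        byte[] utf8 = new byte[length];
        in.get(utf8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // Boolean: one byte, 0 for false and 1 for true.
    static void writeBoolean(ByteBuffer out, boolean b) {
        out.put((byte) (b ? 1 : 0));
    }

    // Date: milliseconds since the epoch as a long; null is written as -1.
    static void writeDate(ByteBuffer out, Date d) {
        out.putLong(d == null ? -1L : d.getTime());
    }
}
```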
Single references
A null reference is stored as:
Short |
---|
-1 |
A non-null reference is stored as:
Short | Long |
---|---|
Cid | Oid |
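The two reference layouts above might be written and read as in the following sketch. `ReferenceCodec` is a hypothetical name, and big-endian byte order is again an assumption.

```java
import java.nio.ByteBuffer;

// Hypothetical codec for single references as laid out in the tables above.
final class ReferenceCodec {
    // A null reference is a single short with the value -1.
    static void writeNull(ByteBuffer out) {
        out.putShort((short) -1);
    }

    // A non-null reference is the Cid (short) followed by the Oid (long).
    static void write(ByteBuffer out, short cid, long oid) {
        out.putShort(cid);
        out.putLong(oid);
    }

    // Returns the referenced Oid, or null when a null reference was stored.
    static Long readOid(ByteBuffer in) {
        short cid = in.getShort();
        if (cid == -1) {
            return null; // no Oid follows a null reference
        }
        return in.getLong();
    }
}
```

A reader only has to inspect the first two bytes to decide whether eight more bytes follow, so a null reference costs 2 bytes and a non-null reference costs 10.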
Multiple attributes
Multiple attributes (such as a list of integers) are stored inline. The first two bytes indicate the length of the list; after that, each value is serialized like a normal single attribute.
Multiple references
Multiple references (lists of references to objects) are also stored inline. The first two bytes indicate the length of the list; after that, each value is serialized like a normal single reference.
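The inline list layout can be sketched for a list of integers as follows. `ListCodec` is a hypothetical class, and big-endian byte order is an assumption. A list of references would follow the same pattern, with a Cid and an Oid written per element instead of an int.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical codec for an inline list of integers.
final class ListCodec {
    // Writes the 2-byte list length, then each element as a normal 4-byte int.
    static void writeIntList(ByteBuffer out, List<Integer> values) {
        out.putShort((short) values.size()); // assumes the size fits in a short
        for (int value : values) {
            out.putInt(value);
        }
    }

    // Reads the length prefix, then that many ints.
    static List<Integer> readIntList(ByteBuffer in) {
        int size = in.getShort();
        List<Integer> values = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            values.add(in.getInt());
        }
        return values;
    }
}
```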
Example
Let's say we have two classes, Person and Company:
The Person class has Cid 1 and the Company class has Cid 2. Each class has one instance: the Person has Oid 100 and the Company has Oid 101.
When serialized, the values will be:
The Person:
Name | Age | Company |
---|---|---|
(short)4 + 4 bytes | (int)80 | (short)2 + (long)101 |
The Company:
Name | Employees |
---|---|
(short)9 + 9 bytes | (short)1 + (short)1 + (long)100 |
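The two values above can be reproduced with a short sketch. The names "John" (4 bytes) and "Acme Corp" (9 bytes) are hypothetical placeholders chosen only to match the lengths in the tables, `ExampleRecords` is not BIMserver code, and big-endian byte order is an assumption.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch that builds the Person and Company values from the example.
final class ExampleRecords {
    static final short PERSON_CID = 1;
    static final short COMPANY_CID = 2;
    static final long PERSON_OID = 100L;
    static final long COMPANY_OID = 101L;

    // Name + Age + reference to the Company.
    static byte[] person(String name, int age) {
        ByteBuffer out = ByteBuffer.allocate(256);
        putString(out, name);
        out.putInt(age);
        out.putShort(COMPANY_CID).putLong(COMPANY_OID);
        return toArray(out);
    }

    // Name + list of Employees containing one reference to the Person.
    static byte[] company(String name) {
        ByteBuffer out = ByteBuffer.allocate(256);
        putString(out, name);
        out.putShort((short) 1); // list length
        out.putShort(PERSON_CID).putLong(PERSON_OID);
        return toArray(out);
    }

    // 2-byte length followed by the UTF-8 bytes (null handling omitted here).
    private static void putString(ByteBuffer out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        out.putShort((short) utf8.length).put(utf8);
    }

    // Copies only the bytes written so far.
    private static byte[] toArray(ByteBuffer out) {
        byte[] bytes = new byte[out.position()];
        out.flip();
        out.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        System.out.println(person("John", 80).length);   // 6 + 4 + 10 = 20
        System.out.println(company("Acme Corp").length); // 11 + 2 + 10 = 23
    }
}
```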