Datastore Service - OpenSlides/OpenSlides GitHub Wiki

Basics

The datastore consists of a chain of events. An event captures a single modification of a single model, which may be a creation, an update, a deletion or a restoration. The kind of modification is saved in the event_type (compare EventType below).

Apart from the information about the event itself (e.g. the updated fields), every event is assigned a position. Every call to the write method generates a new position. Multiple events can share the same position if they are sent in the same request. You can think of the position as a kind of "transaction number", except that there is no actual transaction happening. Instead, the datastore follows an optimistic concurrency control (OCC) pattern.
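The relation between write calls, events and positions can be illustrated with a small in-memory model. This is only a sketch: the names `Event`, `EventLog` and `max_position` are hypothetical and are not part of the datastore's code.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Event:
    type: str                      # "create", "update", "delete" or "restore"
    fqid: str                      # e.g. "c/1"
    fields: dict[str, Any] = field(default_factory=dict)
    position: int = 0              # assigned by the log when the event is written


class EventLog:
    """Hypothetical append-only chain of events."""

    def __init__(self) -> None:
        self.events: list[Event] = []
        self.max_position = 0

    def write(self, events: list[Event]) -> int:
        # Every write call creates exactly one new position,
        # which is shared by all events of that call.
        self.max_position += 1
        for event in events:
            event.position = self.max_position
            self.events.append(event)
        return self.max_position


log = EventLog()
first = log.write([
    Event("create", "c/1", {"value": 100}),
    Event("create", "c/2", {"value": 7}),
])
second = log.write([Event("update", "c/1", {"value": 200})])
positions = [event.position for event in log.events]
```

Both create events end up at the same position, while the later update gets the next one.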

Migrations

See Datastore Migrations.

Optimistic concurrency control

Since OpenSlides does not have many concurrent write requests, the datastore does not lock models or even whole collections and does not have a classical transaction model. Instead, with every call to one of the reader's methods the current position of the datastore is returned (meaning the global maximum of all positions). If locking of fields, models or collections is needed, this always happens after some data has been requested from the datastore [citation needed], so the returned position can be sent as a locking indicator in the write request.

Locking

In every write request, the locked_fields parameter can be given to indicate what to lock and on which position the underlying information was acquired. Suppose I want to fetch the model c/1 from the datastore, update its value based on the current value and then write it back. The get response could look like this:

{
    // ...
    "value": 100,
    "meta_position": 42,
    // ...
}

I want to add 100 to value and then write it back to the datastore. After building the rest of the update event, I can add the following:

{
    // ...
    "locked_fields": {
        "c/1/value": 42
    }
}

where 42 is the position returned from the get request. If there was no event after the get request which modified the field c/1/value, this request can just be executed since the value must still be 100. If the value was modified in the meantime, the request is rejected with a ModelLocked exception and has to be restarted by the client.

The keys of the locked_fields object can be fqfields, fqids or whole collections.
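The read, lock and retry cycle described in this section can be sketched with a tiny in-memory store. Everything here is an assumption made for illustration: `TinyStore`, its methods and the prefix matching of keys are hypothetical, and `meta_position` is simplified to the global maximum position.

```python
class ModelLocked(Exception):
    """Raised when a locked key was modified after the given position."""

    def __init__(self, keys):
        super().__init__(f"locked keys: {keys}")
        self.keys = keys


class TinyStore:
    """Hypothetical in-memory stand-in for the datastore."""

    def __init__(self):
        self.models = {}          # fqid -> {field: value}
        self.last_modified = {}   # fqfield -> position of the last change
        self.max_position = 0

    def get(self, fqid):
        # Simplification: report the global maximum position.
        return {**self.models[fqid], "meta_position": self.max_position}

    def _last_change(self, key):
        # A key covers itself and everything below it, so "c/1" covers "c/1/value".
        return max(
            (pos for fqfield, pos in self.last_modified.items()
             if fqfield == key or fqfield.startswith(key + "/")),
            default=0,
        )

    def write(self, fqid, fields, locked_fields=None):
        broken = [key for key, position in (locked_fields or {}).items()
                  if self._last_change(key) > position]
        if broken:
            raise ModelLocked(broken)
        self.max_position += 1
        model = self.models.setdefault(fqid, {})
        for name, value in fields.items():
            model[name] = value
            self.last_modified[f"{fqid}/{name}"] = self.max_position
        return self.max_position


store = TinyStore()
store.write("c/1", {"value": 100})                 # position 1
snapshot = store.get("c/1")                        # read at position 1
store.write("c/1", {"value": 150})                 # another client writes, position 2

try:
    store.write("c/1", {"value": snapshot["value"] + 100},
                locked_fields={"c/1/value": snapshot["meta_position"]})
    outcome = "written"
except ModelLocked:
    outcome = "rejected"
    # Restart: read again and repeat the update with the fresh position.
    fresh = store.get("c/1")
    store.write("c/1", {"value": fresh["value"] + 100},
                locked_fields={"c/1/value": fresh["meta_position"]})
```

The first attempt is rejected because c/1/value changed after the position of the read. The retry is based on the fresh value and goes through.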

Null values

To simplify communication, the datastore assumes null === undefined. This means that writing null to any field or model equals the deletion of it. As a consequence, the datastore will never return null values on any read request.
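A minimal sketch of this rule, assuming a model is held as a plain dictionary (the helper `apply_fields` is hypothetical):

```python
def apply_fields(model: dict, fields: dict) -> dict:
    """Return a copy of model with fields applied; None removes the key."""
    result = dict(model)
    for name, value in fields.items():
        if value is None:
            # Writing null equals deleting the field.
            result.pop(name, None)
        else:
            result[name] = value
    return result


model = {"title": "Agenda", "value": 100}
updated = apply_fields(model, {"value": None, "note": "draft"})
```

Since the key is dropped instead of being stored with a null value, a later read of this model cannot contain null.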

PostgreSQL backend limitations

The current implementation of the datastore uses PostgreSQL as its database backend. To allow better indexing, the keys (collections, ids and fields) are stored with a fixed length instead of a variable one. The following maximum length restrictions apply when using the PostgreSQL backend:

collection: 32
id: 16
field: 207

Longer keys will be rejected with an InvalidFormat error.
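A client could check these limits before sending a request. The following sketch assumes that the limits count characters and that an fqfield has the form collection/id/field; the function `check_fqfield` and the local `InvalidFormat` class are hypothetical.

```python
MAX_LENGTHS = {"collection": 32, "id": 16, "field": 207}


class InvalidFormat(Exception):
    """Local stand-in for the datastore's InvalidFormat error."""


def check_fqfield(fqfield: str) -> None:
    """Raise InvalidFormat if a part of the fqfield is empty or too long."""
    parts = fqfield.split("/")
    if len(parts) != 3:
        raise InvalidFormat(f"'{fqfield}' is not of the form collection/id/field")
    for (name, limit), part in zip(MAX_LENGTHS.items(), parts):
        if not part or len(part) > limit:
            raise InvalidFormat(
                f"{name} '{part}' is empty or longer than {limit} characters"
            )


check_fqfield("c/1/value")  # passes silently

try:
    check_fqfield("x" * 33 + "/1/value")
    too_long_rejected = False
except InvalidFormat:
    too_long_rejected = True
```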

Potential Optimizations

See Potential Optimizations.

Interface

Enum EventType {
    Create,
    Update,
    Delete,
    Restore,
}

Exception ModelDoesNotExist(model: Fqid);
Exception ModelExist(model: Fqid);
Exception ModelNotDeleted(model: Fqid);
Exception ModelLocked(key: (Fqid | Fqfield | CollectionField)[]);
Exception InvalidFormat(msg: string);
Exception InvalidRequest(msg: string);
Exception InvalidDatastoreState(msg: string);

// Note: Errors are returned via HTTP 400:
interface ErrorResponse {
    error: InvalidFormatData |
        InvalidRequestData |
        ModelDoesNotExistData |
        ModelExistData |
        ModelNotDeletedData |
        ModelLockedData |
        InvalidDatastoreState;
}
interface InvalidFormatData {
    type: 1;
    msg: string;
}
interface InvalidRequestData {
    type: 2;
    msg: string;
}
interface ModelDoesNotExistData {
    type: 3;
    fqid: string;
}
interface ModelExistData {
    type: 4;
    fqid: string;
}
interface ModelNotDeletedData {
    type: 5;
    fqid: string;
}
interface ModelLockedData {
    type: 6;
    keys: string[];
}
interface InvalidDatastoreState {
    type: 7;
    msg: string;
}
interface DatastoreNotEmpty {
    type: 8;
    msg: string;
}

// Writer
// Note: Different host and port than the reader!

/**
 * Writes Events into the datastore.
 * If multiple WriteRequests are given, they are fully executed one-by-one, meaning
 * if an earlier event invalidates the locked_fields of a later WriteRequest, an
 * exception is thrown.
 * Sets `information` to null, if it is an empty value ([], {}, "", 0, false)
 * Url: POST to /internal/datastore/writer/write
 *
 * @throws ModelDoesNotExist
 * @throws ModelExist
 * @throws ModelLocked
 * @throws InvalidFormat
 * @throws ModelNotDeleted
 * @throws InvalidDatastoreState
 * @throws DatastoreNotEmpty if a migration index is given but the datastore is not empty.
 */
write(request: WriteRequest | WriteRequest[]): void publishes ModifiedFieldsEvent

interface WriteRequest {
    events: (CreateEvent | RestoreEvent | UpdateEvent | DeleteEvent)[];
    information: JSON;
    user_id: number;
    locked_fields: {
        <fqid>: Position;
        <fqfield>: Position;
        <CollectionField>: Position | CollectionFieldLock | CollectionFieldLock[];
    }
    migration_index?: number;
}

interface CreateEvent {
    type: 'create';
    fqid: Fqid;
    fields: {
        <field>: Value;
    }
}

/**
 * Note: For deleting keys, they must be set to `None`. These keys will be removed from
 * the model.
 * list_fields can be used to partially update list fields: the values in `add` will be
 * appended to the given field, the values in `remove` will be removed from the field.
 * Either fields or list_fields must be given or an error will be thrown.
 * An exception will be thrown if:
 * - a field in list_fields is not empty and not a list
 * - a field in list_fields contains other entries than strings or ints
 * Other edge cases:
 * - an element should be added that is already in the list: this element is ignored,
 *   other potentially given elements are still added as normal
 * - an element should be removed that is not in the list: this element is ignored,
 *   other potentially given elements are still removed as normal
 * - the field does not yet exist on the model:
 *      - add: same function as if the value was given in `fields`
 *      - remove: nothing happens
 */
interface UpdateEvent {
    type: 'update';
    fqid: Fqid;
    fields: {
        <field>: Value;
    }
    list_fields: {
        add: {
            <field>: Value[];
        }
        remove: {
            <field>: Value[];
        }
    }
}
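The edge cases listed for list_fields can be sketched in memory as follows. This is an illustration under assumptions, not the datastore's implementation: `apply_list_fields` is a hypothetical helper, the type check is applied to the entries already stored in the field, and Python's bool is not special-cased even though it counts as an int.

```python
class InvalidFormat(Exception):
    """Local stand-in for the datastore's InvalidFormat error."""


def _current_list(model, name):
    value = model.get(name)
    if not value:
        return []  # a missing or empty value is treated as an empty list
    if not isinstance(value, list):
        raise InvalidFormat(f"'{name}' is not empty and not a list")
    if not all(isinstance(entry, (str, int)) for entry in value):
        raise InvalidFormat(f"'{name}' contains entries other than strings or ints")
    return list(value)


def apply_list_fields(model, add=None, remove=None):
    """Return a copy of model with the partial list updates applied."""
    result = dict(model)
    for name, values in (add or {}).items():
        if name not in result:
            # Field does not exist yet: same as giving the value in `fields`.
            result[name] = list(values)
            continue
        current = _current_list(result, name)
        for value in values:
            if value not in current:  # elements already present are ignored
                current.append(value)
        result[name] = current
    for name, values in (remove or {}).items():
        if name not in result:
            continue  # removing from a missing field does nothing
        current = _current_list(result, name)
        result[name] = [entry for entry in current if entry not in values]
    return result


model = {"tags": ["a", "b"], "title": "x"}
updated = apply_list_fields(
    model,
    add={"tags": ["b", "c"], "ids": [1]},
    remove={"tags": ["a", "z"], "missing": [1]},
)
```

In this example "b" is already present and "z" is absent, so both are ignored, while "c" is added and "a" is removed.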

interface RestoreEvent {
    type: 'restore';
    fqid: Fqid;
}

interface DeleteEvent {
    type: 'delete';
    fqid: Fqid;
}

// Collection fields can not only be locked to a specific position, but also filtered
// first, e.g. when selecting all models from a specific meeting. WARNING: the filter
// should always contain an equals check with the meeting_id, since this will be
// indexed. Other filters can lead to long query times.
// If no filter is given, it has the same meaning as just giving the position.
interface CollectionFieldLock {
    position: Position;
    filter: Filter | null;
}

// Note: The modified fqfields include:
// - all updated fqfields
// - all deleted fqfields
// - all fqfields of all deleted models
// - all fqfields of all created models
// - all fqfields of all restored models (logically the same as created)
Event ModifiedFieldsEvent on topic ModifiedFields {
    modified: Fqfield[];
}

/**
 * Reserves multiple sequential ids for the given collection and returns them.
 * Url: POST to /internal/datastore/writer/reserve_ids
 */
reserveIds(collection: Collection, amount: number): Id[]

/**
 * Deletes all history information from all positions. Use with caution, because this will
 * make it impossible for the end user to access these positions!
 * Url: POST to /internal/datastore/writer/delete_history_information
 */
delete_history_information(): void


// Reader
// Note: Different host and port than the writer!

/** Common notes:
 * - parameter `position`: Optional. If given, the data is read as of this position.
 * - parameter `mapped_fields`: Optional. If given, only these fields are present in the response.
 * - parameter `get_deleted_models`: Optional, defines which models to return
 *    - DeletedModelsBehaviour.NO_DELETED:   (Default) only non-deleted models are returned.
 *                                           get throws a ModelDoesNotExist error if the given
 *                                           model is deleted.
 *    - DeletedModelsBehaviour.ONLY_DELETED: only deleted models are returned. get throws
 *                                           a ModelNotDeleted if the given model is not deleted.
 *    - DeletedModelsBehaviour.ALL_MODELS:   all models are returned
 * - All operations add the fields `meta_position` and `meta_deleted` to the models.
 * - The InvalidFormat exception can always be thrown, if the requested formats are
 *   wrong, including something like empty collections, ...
 */

Enum DeletedModelsBehaviour {
    NO_DELETED = 1,
    ONLY_DELETED = 2,
    ALL_MODELS = 3
}

/**
 * Returns a model by fqid.
 * Url: POST to /internal/datastore/reader/get
 *
 * @throws ModelDoesNotExist
 * @throws InvalidFormat
 */
get(fqid: Fqid, mapped_fields?: Field[], position?: Position, get_deleted_models?: DeletedModelsBehaviour): Partial<Model>;

/**
 * Returns multiple models.
 * Url: POST to /internal/datastore/reader/get_many
 *
 * Can either be called with a list of Fqfields or with a list of specific request
 * objects that map a collection to the needed ids and fields. If both the lower and
 * the higher level mapped_fields are given, the higher level one is merged into all
 * lower level ones. If Fqfields are given, the mapped_fields are ignored.
 * If an id is not found, it is not included in the response instead of throwing a
 * ModelDoesNotExist.
 *
 * @returns A mapping of collection to ids to models. Example:
 *          {
 *              "collection1": {
 *                  "id1": {
 *                      "field1": "foo",
 *                      "field2": "bar",
 *                  },
 *              },
 *              "collection2": {
 *                  "id2": {
 *                      "field3": 42,
 *                  },
 *              },
 *          }
 *
 * @throws InvalidFormat
 */
get_many(requests: GetManyRequest[] | Fqfield[], mapped_fields?: Field[], position?: Position, get_deleted_models?: DeletedModelsBehaviour): Map<Collection, Map<Id, Partial<Model>>>;

interface GetManyRequest {
    collection: Collection;
    ids: Id[];
    mapped_fields?: Field[];
}

/**
 * Returns all models of one collection.
 * Url: POST to /internal/datastore/reader/get_all
 *
 * It is not possible to specify a position, so this method cannot be used if the user
 * browses the history. Using this method is strongly discouraged because it might
 * return a huge amount of data.
 *
 * @returns see get_many
 * @throws InvalidFormat
 */
get_all(collection: Collection, mapped_fields?: Field[], get_deleted_models?: DeletedModelsBehaviour): Map<Id, Partial<Model>>;

/**
 * Returns all models.
 * Url: POST to /internal/datastore/reader/get_everything
 *
 * This is a dev route only!
 *
 * @returns The example data format: A mapping of a collection to a list of models.
 */
get_everything(get_deleted_models?: DeletedModelsBehaviour): Map<Collection, Map<Id, Model>>;

interface FilterResponse {
    position: Position;
    data: Map<Id, Partial<Model>>;
}

/**
 * Returns all models of one collection that satisfy the filter condition.
 * Url: POST to /internal/datastore/reader/filter
 *
 * The global max position of the datastore is returned next to the filtered data.
 * This method does not take a position and cannot be used when browsing the history.
 *
 * @returns see get_many
 * @throws InvalidFormat
 */
filter(collection: Collection, filter: Filter, mapped_fields?: Field[]): FilterResponse

/**
 * Url: POST to /internal/datastore/reader/exists
 *
 * See `filter`, returns true, if at least one model was found. The returned position is
 * the highest position in the complete datastore.
 *
 * @throws InvalidFormat
 */
exists(collection: Collection, filter: Filter): {exists: boolean; position: Position;}

/**
 * Url: POST to /internal/datastore/reader/count
 *
 * See `filter`, returns the amount of found models. The returned position is
 * the highest position in the complete datastore.
 *
 * @throws InvalidFormat
 */
count(collection: Collection, filter: Filter): {count: number; position: Position;}

/**
 * Url: POST to /internal/datastore/reader/min
 *
 * Executes a min aggregation on all models of one collection on
 * the given field that satisfy the filter condition.
 * The field is cast to int by default. If aggregation of another field type is needed,
 * a valid type can be passed via the type parameter.
 *
 * @throws InvalidFormat
 */
min(collection: Collection, filter: Filter, field: Field, type?: string): {min: Value; position: Position;}

/**
 * Url: POST to /internal/datastore/reader/max
 * Analogous to min.
 *
 * @throws InvalidFormat
 */
max(collection: Collection, filter: Filter, field: Field, type?: string): {max: Value; position: Position;}

Type Filter = And | Or | Not | FilterOperator

/**
 * The filter predicate. M[field] states the value of the field of a model M.
 * For all operations the predicate is true if `M[field] <op> value` is true.
 *
 * In this case, `~=` is the case-insensitive equality operator and `%=` translates
 * to the Postgres `ILIKE` operator with support for the wildcards `_` and `%`. 
 */
Interface FilterOperator {
    field: Field;
    value: Value | null;
    operator: '=' | '!=' | '<' | '>' | '>=' | '<=' | '~=' | '%=';
}

Interface Not {
    not_filter: Filter;
}

Interface And {
    and_filter: Filter[];
}

Interface Or {
    or_filter: Filter[];
}
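To make the filter semantics concrete, the following sketch evaluates a Filter against a model held as a dictionary. It is an in-memory approximation only: the datastore evaluates filters in PostgreSQL, and the helpers `matches` and `ilike` as well as the handling of missing values are assumptions.

```python
import operator
import re

OPERATORS = {
    "=": operator.eq, "!=": operator.ne,
    "<": operator.lt, ">": operator.gt,
    ">=": operator.ge, "<=": operator.le,
}


def ilike(text: str, pattern: str) -> bool:
    """Case-insensitive match where % means any string and _ any single character."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, text, flags=re.IGNORECASE | re.DOTALL) is not None


def matches(model: dict, flt: dict) -> bool:
    if "and_filter" in flt:
        return all(matches(model, part) for part in flt["and_filter"])
    if "or_filter" in flt:
        return any(matches(model, part) for part in flt["or_filter"])
    if "not_filter" in flt:
        return not matches(model, flt["not_filter"])
    actual, expected, op = model.get(flt["field"]), flt["value"], flt["operator"]
    if op in ("=", "!="):
        return OPERATORS[op](actual, expected)
    if actual is None or expected is None:
        return False  # assumption: other operators never match missing values
    if op == "~=":
        return str(actual).lower() == str(expected).lower()
    if op == "%=":
        return ilike(str(actual), str(expected))
    return OPERATORS[op](actual, expected)


models = {
    1: {"meeting_id": 5, "title": "Budget 2024"},
    2: {"meeting_id": 5, "title": "minutes"},
    3: {"meeting_id": 6, "title": "Budget"},
}
flt = {"and_filter": [
    {"field": "meeting_id", "operator": "=", "value": 5},
    {"field": "title", "operator": "%=", "value": "budget%"},
]}
hits = [model_id for model_id, model in models.items() if matches(model, flt)]
```

The example filter combines an equality check on meeting_id with a wildcard match on the title, so only model 1 is selected.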

/**
 * Url: POST to /internal/datastore/reader/history_information
 * Returns a list of HistoryInformation for the provided fqids.
 * If a model does not exist, it is not included in the response.
 *
 * The returned timestamp is the unixtime as a number.
 *
 * @throws InvalidFormat
 */
history_information(fqids: Fqid[]): {[fqid: Fqid]: HistoryInformation[]}

Interface HistoryInformation {
    position: Position;
    user_id: number;
    information: JSON;
    timestamp: number;
}