Autoupdate Service - OpenSlides/OpenSlides GitHub Wiki

About the Autoupdate

A detailed explanation of how the autoupdate works and why.

Interface

/**
 * SyntaxError is returned when the syntax of the request body is wrong. This
 * error is returned at the beginning of a request with http-status-code 400.
 */
Exception SyntaxError(msg: string);

/**
 * JsonError is returned when the body does not contain valid json. This error
 * is returned at the beginning of a request with http-status-code 400.
 */
Exception JsonError(msg: string);

/**
 * ValueError is returned when the value of a field does not have the expected
 * format, e.g. a relation to a key is indicated, but the data are no valid
 * ids/fqids. The exception may also happen while the stream is used at
 * runtime, because this cannot be detected when the caller makes the request.
 */
Exception ValueError(msg: string);

/**
 * InternalError is an unexpected error on the server side. When it happens at
 * the beginning of a request, http-status-code 500 is used. But it can also
 * happen after the first data has been streamed to the client. The error does
 * not contain any useful information. More information can be found in the
 * server log. This is the only error that generates a server log message.
 */
Exception InternalError(msg: string);

/**
 * This method subscribes to a list of given requests. The response is a stream
 * (language dependent) updating all models according to the ModelRequests when
 * new data is available. On subscription, the initial data must be pushed to
 * the caller as soon as possible. The subscription can be ended by closing the
 * stream (e.g. the underlying network connection).
 *
 * @throws SyntaxError
 * @throws JsonError
 * @throws ValueError
 * @throws InternalError
 */
subscribe(request: ModelRequest[]): stream<ModelData>;

/**
 * This is the main interface for requesting models in a structured, nested way.
 * The initial request targets some models as the root models of one collection,
 * all with the same fields. This builds a tree of dependencies, because the
 * value of some fields may be a GenericRelationFieldDescriptor again.
 *
 * For a description of `fields` and `collection`, see
 * GenericRelationFieldDescriptor and RelationFieldDescriptor.
 *
 * `ids`: A list of ids of the collection's models that should be provided. All
 * models the user can see must be included in the response. The model fields
 * are handled according to `GenericRelationFieldDescriptor`.
 */
interface ModelRequest extends Fields {
    ids: ID[];
    collection: Collection;
}

interface Fields {
    fields: {
        [field: Field]: GenericRelationFieldDescriptor
            | RelationFieldDescriptor
            | null;
    }
}

/**
 * For an overview, see `ModelRequest`.
 *
 * `fields` (inherited from `Fields`, see above):
 * Regardless of the value of a field, the restricted values are given in the
 * response. Even if the restricted value is null, the field must be included.
 *
 * If the value is not null, it indicates that there is a reference to follow.
 * There are two types of values:
 * - GenericRelationFieldDescriptor: The reference is a generic one. This means
 *   that the actual value from the model is a fqid or an array of fqids.
 * - RelationFieldDescriptor: A collection is given, so it can be expected that
 *   the actual model value is an id or an array of ids.
 */
interface GenericRelationFieldDescriptor extends Fields {
    type: 'generic-relation' | 'generic-relation-list';
}

/**
 * For an overview, see `ModelRequest`. For `fields`, see
 * GenericRelationFieldDescriptor.
 *
 * `collection`:
 * This is the collection the ids are associated with. The ids are provided in
 * two different ways:
 * - In a ModelRequest, the ids are given explicitly.
 * - If this interface is used in a field indicating a relation, the id(s) are
 *   given by the actual model data.
 */
interface RelationFieldDescriptor extends Fields {
    type: 'relation' | 'relation-list';
    collection: Collection;
}


/**
 * This structure holds all data returned by the service as a map of fqfields
 * to the fields' values.
 */
interface ModelData {
    [fqfield: Fqfield]: Value;
}

The cmd folder:

Creates a binary from each subfolder.

cmd/autoupdate:

Builds the actual service. In addition, this package takes over the role of dependency injection: individual packages expect abstract interfaces as dependencies, and here the specific implementations are chosen, e.g. Redis as the message bus.

cmd/datastore:

This is a "mock datastore" which only supports the get_many function, and only in the variant with fqfields. It delivers the "example data". The keys can be changed via stdin; the changed keys are then written to the Redis message bus.

cmd/performance:

A small debug tool which measures the speed when 5000 clients connect and retrieve a key.

The internal folder:

Contains packages that are only relevant for this repository (service).

internal/autoupdate:

The actual business logic. Processes single client connections. The requested key-value pairs are restricted and then returned. If the data changes, the corresponding connections are notified.

internal/http:

Defines the http-handlers (currently three). The user ID is read from the request, and the request is passed to the internal/autoupdate package.

internal/keysbuilder:

The query language of the autoupdate service is defined here.

internal/projector:

This is where the projection/content field is computed.

internal/projector/slides:

The individual slides used to write projection/content are defined here.

internal/restrict:

Here the restrictor is defined. All data and a user ID are passed to it before the data is sent to the client. The restrictor removes and/or modifies data.

internal/restrict/permission:

Contains the code of the old permission service. Currently, it is decided per collection whether a fqfield may be seen by a user ID. The package is deprecated; it should get a new API and be integrated into the internal/restrict package.

The pkg folder:

Contains packages that are also relevant for other services (icc, vote, etc).

pkg/auth:

An implementation of our auth system in Go. It reads the user ID from a request and validates the session. If the session has expired, it is renewed. In addition, the message bus is read to detect when a user logs out. A Go context is returned which is closed when the user logs out.

pkg/datastore:

A connection to the datastore. Only the datastore method get_many with fqfields is used. Retrieved keys are cached, so a request to the datastore only occurs when a fqfield is retrieved for the first time. The package is initialized with the message bus and updates the cache when data coming over the message bus has changed. Other packages can subscribe to changes; this way, the internal/autoupdate package is informed when data changes. Calculated fields can also be registered: this way, the internal/projector package registers that requests to projection/content are not sent to the database but are calculated. The cache is still WIP.

pkg/redis:

This is the implementation of the message bus on redis. It listens to the redis stream and passes each message to pkg/datastore.

How it works:

Connection of a client:

When establishing a connection, this middleware is traversed first, then this handler is called.

First, the user ID is read from the request. Then the request body is passed to the keysbuilder. A keysbuilder object is created that tells us which keys/fqfields were requested.

This object is passed to the autoupdate.Live function:

This function creates a "Connection" object whose conn.Next() is called in an infinite loop; the return value (a map[string]json.RawMessage) is sent to the client in json-line format.

conn.Next() gets the keys from the keysbuilder, pulls the data for those keys from the datastore, and passes the data to the restrictor. At the end, the data goes through a filter which remembers which data was last sent to the client; keys that have not changed are removed. The function blocks until there is data that needs to be sent to the client.

The query in the datastore happens in this function:

The cache.GetOrSet() function is used to retrieve the keys that are already in the cache directly; for the other keys, the passed set function is called.

This set function sorts the requested keys into those that are retrieved from the datastore and those that are calculated (currently only projection/content). The normal keys are retrieved from the datastore reader here:

Autoupdates/data changes:

When data changes, the datastore-write writes it to a redis stream, which is retrieved here:

The Redis command is blocking, so the connection to Redis is kept alive until there is new data. The function is called in an infinite loop from the datastore in a background job. The background job runs only once, not once per client:

The keys/fqfields and the corresponding values are returned.

The data is written to the cache with SetIfExist. Therefore, data that is not in the cache (because it has never been requested) is ignored. After that, the "calculated fields" are recalculated (currently only projection/content).

Afterwards, the change listeners are called. This is a list of callbacks to which the modified data is passed. This is the following function in the package internal/autoupdate:

Finally, all keys are written to a topic.Topic. This is a pub-sub data structure that works on the pull principle, like a Redis stream. A producer (the function above) writes the data into the topic, and the client connections read the data from it.

So far, this was all part of a single goroutine and therefore single-threaded. Only at this point are all client connections "woken up", which then continue processing the data in parallel.

The code is inside the connection object described above. At this point, the keysbuilder object is recalculated. Afterwards, it is determined which new keys a client wants to get and which old ones have changed. For these, the data is pulled from the datastore (also described above), passed to the restrictor, and then filtered so that only changed data is actually sent to the client (see above).

All of this happens event-based. There is no component that polls at time intervals. All waiting functions (in Redis, as well as per client in topic.Receive) wake up immediately when the data is available.

Other background tasks:

Here, old data is deleted from the topic: every minute, all keys older than 10 minutes are removed. This frees up memory, but would be a problem if the connection to a client were so slow that it lags behind by more than 10 minutes.

Currently, all data is deleted from the cache every 10 seconds. This value should later be set much higher, e.g. every 24 hours; it is currently this small to surface possible race conditions. The point is that keys that have been retrieved once but are no longer needed will eventually be removed from the cache. However, this system might be replaced with one that automatically deletes old keys.

The auth service writes to the message bus when a session has been terminated. From the system's point of view, this works the same way as the events from the datastore writer. At this point, the events are queried and the corresponding sessions are closed. Technically, this also works via a topic, from which all data older than 15 minutes is deleted every 5 minutes (a session expires after 10 minutes anyway).
