Dataset Concepts - dsriseah/ursys GitHub Wiki

Note: this is more of a design doc that tracks the evolution of the system.

New to URSYS in 2024 is the Dataset architecture for managing a set of related data collections under one address.

A Dataset consists of several DataBins that have a unique binID name and a binType. A client application can request a particular Dataset by specifying its dataURI and authentication credentials. After the dataset request succeeds, the client can then perform CRUD, Search, and Query operations on any given DataBin by specifying its binID and operation.

The API for Datasets is exposed through the modules sna-dataclient.ts and sna-dataserver.mts. The client's copy of a dataset is kept synchronized with the server's loaded instance. From the client's perspective, data mutations use a write-then-notify pattern that is accessible through a URSYS EventMachine pub/sub interface. Data reads, by comparison, come from the cached dataset, which is guaranteed to be "up to date" at the time of the request.
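The write-then-notify pattern can be illustrated with a small in-memory model. This is a minimal sketch, not the sna-dataclient implementation: `FakeServer`, `CachedClient`, `flush()` and the `Item` shape are hypothetical names chosen for illustration. It shows that a client write is not visible in the local cache until the server notifies, while reads are served from the cache alone.

```typescript
// Hypothetical sketch of write-then-notify with cached reads.
// FakeServer and CachedClient are illustrative names, not URSYS APIs.
type Item = { _id: string; [key: string]: unknown };
type Notify = (changed: Item[]) => void;

class FakeServer {
  private pending: Item[] = [];
  private subscribers: Notify[] = [];

  onChange(fn: Notify): void {
    this.subscribers.push(fn);
  }

  // Accept a write; it stays invisible to clients until flush().
  write(items: Item[]): void {
    this.pending.push(...items);
  }

  // Commit pending writes, then notify every subscriber.
  flush(): void {
    const changed = this.pending;
    this.pending = [];
    for (const fn of this.subscribers) fn(changed);
  }
}

class CachedClient {
  private cache = new Map<string, Item>();
  private server: FakeServer;

  constructor(server: FakeServer) {
    this.server = server;
    // The cache changes only in response to server notifications.
    server.onChange(changed => {
      for (const item of changed) this.cache.set(item._id, item);
    });
  }

  // Reads never touch the server.
  read(id: string): Item | undefined {
    return this.cache.get(id);
  }

  // Writes go to the server; the local cache is left alone.
  write(item: Item): void {
    this.server.write([item]);
  }
}

const server = new FakeServer();
const client = new CachedClient(server);
client.write({ _id: "c1", text: "hello" });
console.log(client.read("c1")); // undefined: no notification yet
server.flush();
console.log(client.read("c1")?.text); // hello
```

Keeping the cache passive means every mirror changes through the same notification path, which is what lets reads be trusted without a round trip.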

Core Architecture

New as of March 17, 2025

Note

This is the reworked dataset architecture, taking the newly clarified idea of a layered schema metadata architecture into account for defining properties in settings. It further defines the high-level relationship of _schemaID and _storageMap properties to a dataset.

Core Concepts

  • datasets are defined by a dataURI, a _schemaID, and a _storageMap that distinguishes between different kinds of schemas: entity, collection, manifest, etc. A dataset is a collection of nested data objects.
  • schema registry is a module that stores/loads schema definitions
  • storage map registry is a module that manages manifest files by _storageMap
  • dataset registry is a module that manages the "live dataset collection" consisting of different named collections of objects (aka a data-group), loading/saving datasets from the dataURI and the selected _storageMap.
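The dataset registry's role of holding "live" datasets keyed by dataURI can be sketched as follows. This assumes a dataset definition is just the triple (dataURI, _schemaID, _storageMap) plus a map of named data-groups; the class and field names are hypothetical, and the actual loading/saving through the storage map registry is left out.

```typescript
// Hypothetical sketch of a dataset registry; loading/saving is omitted.
type DataObject = { _id: string; [key: string]: unknown };

interface DatasetDef {
  dataURI: string;
  _schemaID: string;
  _storageMap: string;
}

interface LiveDataset extends DatasetDef {
  groups: Map<string, DataObject[]>; // named collections ("data-groups")
}

class DatasetRegistry {
  private live = new Map<string, LiveDataset>();

  // Return the live dataset for dataURI, creating an empty one if needed.
  open(def: DatasetDef): LiveDataset {
    const existing = this.live.get(def.dataURI);
    if (existing) {
      // One dataURI must not be reopened under a different schema or storage map.
      if (existing._schemaID !== def._schemaID || existing._storageMap !== def._storageMap)
        throw new Error(`${def.dataURI} is already open with a different definition`);
      return existing;
    }
    const ds: LiveDataset = { ...def, groups: new Map() };
    this.live.set(def.dataURI, ds);
    return ds;
  }

  get(dataURI: string): LiveDataset | undefined {
    return this.live.get(dataURI);
  }

  close(dataURI: string): boolean {
    return this.live.delete(dataURI);
  }
}
```

A full registry would pass the definition to the storage map registry to locate and load the stored data; this sketch only shows the bookkeeping.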

Supporting Concepts

  • base schema is a data structure, written in YAML, that defines groups of named properties, each with a type. These schemas are identified by a _schemaID
  • object mapper is a module that serializes/deserializes data objects. It also validates data objects against a schema defined in the schema registry.
  • storage mapper is a module that loads/saves serialized data objects to the dataURI according to the _storageMap in use, using a storage adapter
  • storage adapter is a platform-specific module that converts the dataURI into an addressable persistence object (e.g. a folder of files in a filesystem, or an AWS resource)
  • UI renderer is a module which creates a property editor that is capable of combining layered metadata on top of the base schema, reading/writing data objects according to the schema. Our new system is based on generated web components.
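The object mapper's validation step can be sketched against a base schema that has already been parsed from YAML. YAML parsing is skipped to keep the sketch dependency-free. The `BaseSchema` shape, the three property types, the flat data object, and the `schemaRegistry` Map are all assumptions for illustration, not the URSYS schema format.

```typescript
// Hypothetical in-memory form of a base schema after its YAML has been parsed.
// Assumes a flat data object: groups organize properties but do not nest values.
type PropType = "string" | "number" | "boolean";

interface BaseSchema {
  _schemaID: string;
  groups: Record<string, Record<string, PropType>>; // group -> property -> type
}

const schemaRegistry = new Map<string, BaseSchema>();

// Return a list of validation errors (empty when the object conforms).
function validate(obj: Record<string, unknown>, schemaID: string): string[] {
  const schema = schemaRegistry.get(schemaID);
  if (!schema) return [`unknown schema '${schemaID}'`];
  const errors: string[] = [];
  for (const props of Object.values(schema.groups)) {
    for (const [name, type] of Object.entries(props)) {
      const value = obj[name];
      // Missing properties are allowed; present ones must match the declared type.
      if (value !== undefined && typeof value !== type)
        errors.push(`${name}: expected ${type}, got ${typeof value}`);
    }
  }
  return errors;
}

schemaRegistry.set("comment", {
  _schemaID: "comment",
  groups: {
    content: { text: "string" },
    meta: { votes: "number", hidden: "boolean" },
  },
});

console.log(validate({ text: "hi", votes: "3" }, "comment"));
// [ 'votes: expected number, got string' ]
```

Returning a list of errors rather than throwing lets a UI renderer show every problem with an object at once.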

Additionally, the client-server architecture requires the following modules:

  • server dataset manager - this is the registry that serves clients that request a particular dataset by providing a dataURI with an optional _schemaID and _storageMap. The server can implement various dataset protocol listeners to handle different connection types. The default protocol listener is URSYS messaging (SYNC:SRV_DSET and SYNC:SRV_DATA) over a web socket.
  • client dataset manager - this is a mirror of the server version, capable of running independently in the browser or synchronized via a dataset protocol connector. The default protocol connector uses URSYS messaging, dispatching SYNC:SRV_DSET and SYNC:SRV_DATA over a web socket, and receiving synchronization state via SYNC:CLI_DATA from the web socket.

NOTE: The connection is authenticated using the URSYS connection authentication protocol, followed by the user credentials that determine permissions and access control. This is a separate system, but the server dataset manager is expected to check it.

--- REFERENCE MATERIAL FOLLOWS ---

Component: Dataserver

source: sna-dataserver.mts

The dataserver is a SNA Component that receives URSYS SYNC messages for "whole dataset" and "data CRUD+Query" operations from multiple dataclients on the web. The dataserver is the single source of truth for data; client-based data operations are synchronized to and from the dataserver in normal sync mode.

Dataserver Configuration

  • The PreConfig object must contain runtime_dir which is used to resolve the "file bucket" address where all runtime data is stored.
  • The dataclient must execute the SELECT DATASET command to load the data from permanent storage before any of the API methods work
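The two configuration rules above amount to a pair of guards. The sketch below is hypothetical: `DataserverGate`, `selectDataset()` and `requireDataset()` are illustrative names, only `runtime_dir` comes from the text, and actually loading the dataset from permanent storage is omitted.

```typescript
// Hypothetical sketch of the two configuration rules above.
interface ServerPreConfig {
  runtime_dir?: string; // named in the text; other fields omitted
}

class DataserverGate {
  readonly runtimeDir: string;
  private selectedURI: string | undefined;

  constructor(preConfig: ServerPreConfig) {
    // Rule 1: runtime_dir must be present to resolve the file bucket.
    if (!preConfig.runtime_dir) throw new Error("PreConfig.runtime_dir is required");
    this.runtimeDir = preConfig.runtime_dir;
  }

  // Stand-in for the SELECT DATASET command (loading from storage omitted).
  selectDataset(dataURI: string): void {
    this.selectedURI = dataURI;
  }

  // Rule 2: every API method would call this before doing any work.
  requireDataset(): string {
    if (this.selectedURI === undefined)
      throw new Error("no dataset selected: run SELECT DATASET first");
    return this.selectedURI;
  }
}
```

Failing loudly at both points surfaces startup-order mistakes immediately instead of as empty query results later.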

Dataserver Message API

  • SYNC:SRV_DSET handles "whole dataset" operations declared in DatasetOp on a provided dataURI, which is sent from a dataclient's remote adapter. At the time of this writing, the operations are LOAD, UNLOAD, PERSIST, GET_MANIFEST and GET_DATA
  • SYNC:SRV_DATA handles "databin CRUDQ" operations declared in DataSyncOp on an optionally provided dataURI; if none is provided, data operations default to the "selected dataset". Current operations are CLEAR, GET, ADD, UPDATE, WRITE, DELETE, REPLACE, FIND and QUERY.
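The operation lists above map naturally onto TypeScript string-literal unions. In this sketch, the operation names are copied from the text, but the `SrvDataMsg` payload shape and the helper functions are assumptions, not the actual URSYS message format. It also shows the "default to the selected dataset" rule for SYNC:SRV_DATA.

```typescript
// Operation names are copied from the lists above; message shapes are assumed.
type DatasetOp = "LOAD" | "UNLOAD" | "PERSIST" | "GET_MANIFEST" | "GET_DATA";
type DataSyncOp =
  | "CLEAR" | "GET" | "ADD" | "UPDATE" | "WRITE"
  | "DELETE" | "REPLACE" | "FIND" | "QUERY";

// Hypothetical payload for a SYNC:SRV_DATA message.
interface SrvDataMsg {
  op: DataSyncOp;
  binID: string;
  dataURI?: string;
  items?: unknown[];
}

const DSET_OPS: readonly string[] = ["LOAD", "UNLOAD", "PERSIST", "GET_MANIFEST", "GET_DATA"];
const DATA_OPS: readonly string[] = [
  "CLEAR", "GET", "ADD", "UPDATE", "WRITE", "DELETE", "REPLACE", "FIND", "QUERY",
];

// Type guards for validating op strings received over the wire.
function isDatasetOp(op: string): op is DatasetOp {
  return DSET_OPS.includes(op);
}

function isDataSyncOp(op: string): op is DataSyncOp {
  return DATA_OPS.includes(op);
}

// SYNC:SRV_DATA falls back to the selected dataset when dataURI is omitted.
function targetDataURI(msg: SrvDataMsg, selectedURI?: string): string {
  const uri = msg.dataURI ?? selectedURI;
  if (!uri) throw new Error(`${msg.op} on ${msg.binID}: no dataURI and no selected dataset`);
  return uri;
}
```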

Dataserver Direct API

In addition to the message-based API, DataSet provides direct API methods. These are currently unused, but are provided for future server-side dataset access.

  • LoadDataset()
  • CloseDataset()
  • PersistDataset()
  • OpenBin()
  • CloseBin()

Dataserver Extensibility

The SYNC:SRV_DSET and SYNC:SRV_DATA protocols are designed to be independent of the filesystem, but the dataserver is a default implementation that uses a default dataobject adapter to serialize and deserialize data to the filesystem.

Component: Dataclient

source: sna-dataclient.ts

The dataclient is a SNA Component that sends URSYS SYNC messages to select datasets to use and perform "data CRUD+QUERY" operations on the various databins in the dataset. It also receives update messages when the dataserver updates.

Dataclient Configuration

  • The PreConfig object must have a dataset property containing uri (providing a dataURI that is sent to the server) and mode (default 'sync' to enable two-way synchronization)
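The dataset entry described above can be typed and normalized as shown below. This is a hypothetical sketch: only `dataset.uri`, `dataset.mode` and the `'sync'` default come from the text. `'sync-ro'` is included only because a later section mentions it as a future mode, and the dataURI value is a placeholder.

```typescript
// Hypothetical typing of the dataclient PreConfig described above.
// 'sync-ro' is listed only because a later section mentions it as a future mode.
type SyncMode = "sync" | "sync-ro";

interface ClientPreConfig {
  dataset?: { uri?: string; mode?: SyncMode };
}

// Normalize the dataset settings, applying the documented 'sync' default.
function readDatasetConfig(cfg: ClientPreConfig): { uri: string; mode: SyncMode } {
  const uri = cfg.dataset?.uri;
  if (!uri) throw new Error("PreConfig.dataset.uri is required");
  return { uri, mode: cfg.dataset?.mode ?? "sync" };
}

// The dataURI value below is a placeholder, not a real address.
const preConfig: ClientPreConfig = { dataset: { uri: "example:bucket/app/project" } };
console.log(readDatasetConfig(preConfig));
// { uri: 'example:bucket/app/project', mode: 'sync' }
```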

The dataclient otherwise initializes itself through its built-in PreHook declarations to call Configure() and Activate() during the application startup cycle, making use of the PreConfig object parameters.

(TBD) Dataclient Authentication

The SYNC:SRV_DSET and SYNC:SRV_DATA protocols for data operations will rely on an access token that is derived from the URSYS authentication token. Currently, though, this support is only stubbed-in and ignored by dataserver. The idea is that once a web app using dataclient is logged-in, the SELECT DATASET operation will negotiate the handshake.

Dataclient Direct API

No message API is exposed to users, as the direct API is sufficient. Behind the scenes, however, the dataclient sends SYNC:SRV_DSET and SYNC:SRV_DATA messages to talk to the dataserver and receives SYNC:CLI_DATA messages to synchronize its dataset instance.

The following methods assume that the selected dataset was set through PreConfig as dataURI:

  • Get(binID, ...)
  • Add(binID, ...)
  • Update(binID, ...)
  • Write(binID, ...)
  • Delete(binID, ...)
  • DeleteIDs(binID, ...)
  • Replace(binID, ...)
  • Clear(binID, ...)
  • Find(binID, ...)
  • Query(binID, ...)
  • Subscribe(evt, callback)
  • Unsubscribe(evt, callback)

In the default case, syncMode=='sync', these calls are routed to the dataserver, but changes are not applied locally until the server sends a SYNC:CLI_DATA message. Because every web app using the dataclient module handles this message, everyone receives the same change.
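The convergence argument in the paragraph above can be modeled with one master and several mirrors, without any networking. This is a hypothetical sketch: `Master`, `Mirror`, `ChangeMsg` and `applyChange()` are illustrative names, and only ADD and DELETE are modeled. The sender forwards its change and, like every other mirror, applies it only when the broadcast arrives.

```typescript
// Hypothetical in-memory model of sync mode: no sockets, only the message flow.
type Item = { _id: string; [key: string]: unknown };
type ChangeMsg = { binID: string; op: "ADD" | "DELETE"; items: Item[] };
type Bins = Map<string, Map<string, Item>>;

function applyChange(bins: Bins, msg: ChangeMsg): void {
  let bin = bins.get(msg.binID);
  if (!bin) {
    bin = new Map<string, Item>();
    bins.set(msg.binID, bin);
  }
  for (const item of msg.items) {
    if (msg.op === "ADD") bin.set(item._id, item);
    else bin.delete(item._id);
  }
}

class Master {
  private bins: Bins = new Map();
  private mirrors: Mirror[] = [];

  connect(mirror: Mirror): void {
    this.mirrors.push(mirror);
  }

  // Stand-in for the SYNC:SRV_DATA handler: update the master, then broadcast.
  handle(msg: ChangeMsg): void {
    applyChange(this.bins, msg);
    // Stand-in for SYNC:CLI_DATA: every mirror, including the sender, gets it.
    for (const m of this.mirrors) m.receive(msg);
  }
}

class Mirror {
  readonly bins: Bins = new Map();
  private master: Master;

  constructor(master: Master) {
    this.master = master;
    master.connect(this);
  }

  // Like Add()/Delete() in sync mode: forwarded, not applied locally.
  send(msg: ChangeMsg): void {
    this.master.handle(msg);
  }

  receive(msg: ChangeMsg): void {
    applyChange(this.bins, msg);
  }
}
```

Because every mirror, including the sender, applies the same broadcast through the same function, all mirrors end up holding identical bins.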

(TBD) Accessing Other Datasets through DataClient

In the future it will be possible to access read-only datasets by specifying syncMode=='sync-ro', but in the meantime it's possible to use DatasetAdapter.getDataObj(dataURI) to request the dataset object associated with the dataURI, subject to (TBD) access control.

The DataClient also specifies two implemented methods:

  • async DS_RemoteFind(dataURI, binID, matchCrit?) to find matching items in specified dataURI/binID
  • async DS_RemoteQuery(dataURI, binID, query?) to query matching items and return them in a RecordSet.

The latter is intended to be used for accessing read-only shared data across the dataset server, subject to (TBD) access controls.
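The text does not specify what `matchCrit` looks like. This sketch assumes it is a partial object whose key/value pairs must all equal the item's values. `remoteStore` and `remoteFindSketch()` are hypothetical stand-ins for the server round trip that DS_RemoteFind performs, and access control is not modeled.

```typescript
// Hypothetical sketch of the find step behind DS_RemoteFind.
// Assumption: matchCrit is a partial object whose key/value pairs must all match.
type Item = { _id: string; [key: string]: unknown };
type MatchCrit = Record<string, unknown>;

function findItems(items: Item[], matchCrit?: MatchCrit): Item[] {
  if (!matchCrit) return items.slice();
  const entries = Object.entries(matchCrit);
  return items.filter(item => entries.every(([key, value]) => item[key] === value));
}

// Stand-in for server-side storage, keyed by "dataURI|binID" (hypothetical).
const remoteStore = new Map<string, Item[]>();

async function remoteFindSketch(dataURI: string, binID: string, matchCrit?: MatchCrit): Promise<Item[]> {
  // A real client would send this over the network, subject to access control.
  const items = remoteStore.get(`${dataURI}|${binID}`) ?? [];
  return findItems(items, matchCrit);
}

remoteStore.set("example:shared|glossary", [
  { _id: "g1", term: "databin", lang: "en" },
  { _id: "g2", term: "dataset", lang: "en" },
  { _id: "g3", term: "Datensatz", lang: "de" },
]);
remoteFindSketch("example:shared", "glossary", { lang: "en" }).then(found =>
  console.log(found.map(i => i._id)), // [ 'g1', 'g2' ]
);
```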

Dataset Portability

The SNA dataset client/server is a default implementation built on top of a protocol that assumes the following:

  • a dataURI that uniquely identifies a set of named dataBins (collections), each persisted as platform-independent serialized content.
  • a set of common CRUD+QUERY operations based on DataObjects with an _id field.
  • a message protocol that maps to the operations for managing datasets (DatasetOp) and the collections within (DataOp)
  • a write through cache implementation that synchronizes the master dataset on the server with multiple client dataset mirrors that are updated behind-the-scenes
  • a notification system through which clients can be aware of changes that were made behind-the-scenes
  • an authentication and access token system to determine access
  • the Dataset class holds the collection
  • the abstract-class-databin class is the ancestor providing the common set of CRUD+QUERY operations across multiple collection types
  • the Dataset designated by a dataURI has an associated manifest object that maps dataBins by name to their platform-dependent storage locations
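The shared-ancestor idea for databins can be sketched with a TypeScript abstract class. This is not the URSYS abstract-class-databin. `DataBinSketch`, `ItemListSketch`, the method signatures and the `binID + counter` `_id` scheme are all hypothetical, and only add/get/update/delete are shown.

```typescript
// Hypothetical sketch: an abstract databin with common operations and one
// concrete collection type. Names and signatures are illustrative only.
type DataObj = { _id: string; [key: string]: unknown };

abstract class DataBinSketch {
  readonly binID: string;
  constructor(binID: string) {
    this.binID = binID;
  }
  abstract add(fields: Record<string, unknown>[]): DataObj[];
  abstract get(ids?: string[]): DataObj[];
  abstract update(patches: DataObj[]): DataObj[];
  abstract delete(ids: string[]): DataObj[];
}

// A simple list-like collection keyed by _id.
class ItemListSketch extends DataBinSketch {
  private items = new Map<string, DataObj>();
  private nextNum = 1;

  // Assign a new _id to each incoming object (hypothetical id scheme).
  add(fields: Record<string, unknown>[]): DataObj[] {
    return fields.map(f => {
      const obj: DataObj = { ...f, _id: `${this.binID}${this.nextNum++}` };
      this.items.set(obj._id, obj);
      return obj;
    });
  }

  get(ids?: string[]): DataObj[] {
    if (!ids) return Array.from(this.items.values());
    return ids.map(id => this.items.get(id)).filter((o): o is DataObj => o !== undefined);
  }

  // Merge each patch into the stored object; unknown _ids are skipped.
  update(patches: DataObj[]): DataObj[] {
    const updated: DataObj[] = [];
    for (const patch of patches) {
      const prev = this.items.get(patch._id);
      if (!prev) continue;
      const next: DataObj = { ...prev, ...patch };
      this.items.set(next._id, next);
      updated.push(next);
    }
    return updated;
  }

  delete(ids: string[]): DataObj[] {
    const removed = this.get(ids);
    for (const obj of removed) this.items.delete(obj._id);
    return removed;
  }
}
```

Each operation returns the affected objects, which is the information a server needs to build a change notification.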

While the default protocol is URSYS messaging, the actual implementation of the above protocol is up to the developer. In our case, sna-dataserver.mts and sna-dataclient.ts are our default implementation of the dataset concept, and we've approached it as follows:

  • the dataset and data operations are conducted over URSYS with the SYNC:SRV_DSET, SYNC:SRV_DATA, and SYNC:CLI_DATA messages
  • the dataURI is mapped to a directory on the server's filesystem
  • the dataset collections are serialized as JSON files in the dataURI's associated directory
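The directory-per-dataURI, JSON-file-per-collection layout can be sketched with Node's `path` module. The dataURI syntax is not defined here, so this sketch assumes segments separated by `:` or `/`. The helper names and the `binID.json` filename convention are hypothetical, and no files are actually written.

```typescript
import * as path from "node:path";

// Hypothetical mapping of a dataURI to a directory and of each databin to a
// JSON file. Assumes dataURI segments are separated by ':' or '/'.
function dataURIToDir(runtimeDir: string, dataURI: string): string {
  const segments = dataURI.split(/[:/]/).filter(s => s.length > 0);
  if (segments.length === 0) throw new Error("empty dataURI");
  for (const seg of segments) {
    // Reject segments that could escape runtimeDir or produce odd filenames.
    if (seg === "." || seg === ".." || !/^[\w.-]+$/.test(seg))
      throw new Error(`unsafe dataURI segment: ${seg}`);
  }
  return path.join(runtimeDir, ...segments);
}

function binFilePath(runtimeDir: string, dataURI: string, binID: string): string {
  if (!/^[\w-]+$/.test(binID)) throw new Error(`invalid binID: ${binID}`);
  return path.join(dataURIToDir(runtimeDir, dataURI), `${binID}.json`);
}

// Serialize one collection the way it might be written to that file.
function serializeBin(items: { _id: string; [key: string]: unknown }[]): string {
  return JSON.stringify(items, null, 2);
}

console.log(binFilePath("/srv/runtime", "example:bucket/app", "comments"));
// On POSIX: /srv/runtime/example/bucket/app/comments.json
```

Rejecting `..` and unexpected characters keeps a client-supplied dataURI from addressing files outside the runtime directory.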

These functions have been isolated into adapters:

  • sna-dataclient makes use of an object using the interface IDS_DatasetAdapter which provides selectDatabase(), getDataObj() and syncData() methods. The default implementation uses URSYS messaging and stores accessToken, sending SYNC:SRV_DSET, SYNC:SRV_DATA, and receiving SYNC:CLI_DATA.
  • sna-dataserver implements separate URSYS message handlers for SYNC:SRV_DSET and SYNC:SRV_DATA, sending SYNC:CLI_DATA whenever the master Dataset instance is updated.
  • sna-dataserver implements the interface IDS_DataObjectAdapter which is the bridge between the filesystem and data objects that represent the persisted data of a dataset. The abstract base class abstract-dataobj.adapter.ts is extended by the default implementation sna-dataobj-adapter.mts, providing methods getDatasetInfo(), readDatasetObj(), readDataBinObj(), writeDatasetObj() and writeDataBinObj().
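The five method names listed for IDS_DataObjectAdapter suggest an interface like the one below. The names come from the text; every parameter list, return type and the `BinObj`/`DatasetObj`/`DatasetInfo` shapes are assumptions. The memory-backed implementation illustrates how a replacement storage backend could be swapped in, as the note below describes.

```typescript
// Method names come from the text above; every signature and type is assumed.
type DataObj = { _id: string; [key: string]: unknown };
type BinObj = { items: DataObj[] };       // hypothetical serialized bin
type DatasetObj = Record<string, BinObj>; // binID -> bin
type DatasetInfo = { dataURI: string; binIDs: string[] };

interface DataObjectAdapterSketch {
  getDatasetInfo(dataURI: string): Promise<DatasetInfo>;
  readDatasetObj(dataURI: string): Promise<DatasetObj>;
  readDataBinObj(dataURI: string, binID: string): Promise<BinObj | undefined>;
  writeDatasetObj(dataURI: string, dsObj: DatasetObj): Promise<void>;
  writeDataBinObj(dataURI: string, binID: string, binObj: BinObj): Promise<void>;
}

// Memory-backed adapter that stores each bin as a JSON string, so reads
// return fresh copies just as reading files from disk would.
class MemoryDataObjectAdapter implements DataObjectAdapterSketch {
  private store = new Map<string, Map<string, string>>(); // dataURI -> binID -> JSON

  // Existing bins for dataURI, or an empty map (reads never create entries).
  private peek(dataURI: string): Map<string, string> {
    return this.store.get(dataURI) ?? new Map<string, string>();
  }

  async getDatasetInfo(dataURI: string): Promise<DatasetInfo> {
    return { dataURI, binIDs: Array.from(this.peek(dataURI).keys()) };
  }

  async readDatasetObj(dataURI: string): Promise<DatasetObj> {
    const out: DatasetObj = {};
    this.peek(dataURI).forEach((json, binID) => {
      out[binID] = JSON.parse(json) as BinObj;
    });
    return out;
  }

  async readDataBinObj(dataURI: string, binID: string): Promise<BinObj | undefined> {
    const json = this.peek(dataURI).get(binID);
    return json === undefined ? undefined : (JSON.parse(json) as BinObj);
  }

  async writeDatasetObj(dataURI: string, dsObj: DatasetObj): Promise<void> {
    for (const [binID, binObj] of Object.entries(dsObj)) {
      await this.writeDataBinObj(dataURI, binID, binObj);
    }
  }

  async writeDataBinObj(dataURI: string, binID: string, binObj: BinObj): Promise<void> {
    let bins = this.store.get(dataURI);
    if (!bins) {
      bins = new Map<string, string>();
      this.store.set(dataURI, bins);
    }
    bins.set(binID, JSON.stringify(binObj));
  }
}
```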

Note

The server-side message handler could be considered an "adapter", but because all of sna-dataserver.mts is a default implementation, this handler isn't broken out into its own module like a ProtocolAdapter. The DataObjectAdapter, however, could be replaced in this dataserver implementation to use a different storage backend.


ToDo List

@run-sna.mts

  • uses SNA to build a project directory
  • SNA.Build() scans for .mts files for server
  • SNA.Build() scans for .ts files for web client

server.mts

  • loads SNA server module
  • invokes SNA.Start()

sna-node.mts

  • registers sna-dataserver SNA_MOD
  • implements SNA_Start() export as Start()

sna-dataserver

  • exports SNA_MOD PreHook for 'EXPRESS_READY'
  • handles 'SYNC:SRV_DATA' and 'SYNC:SRV_DSET'
  • receives dataURI from configured project info
  • can load or generate manifest from dataURI
  • can initialize dataset from manifest
  • can manage multiple datasets loaded
  • can load dataset object from disk
  • can serialize dataset object to disk
  • implements DATASET and DATA operations through protocol

app.ts

  • imports SNA client module
  • prompts for login creds
  • successfully authenticates websocket
  • receives dataURI, authToken from appserver
  • saves session dataURI, authToken, userID
  • registers dc-comments as an SNA_MOD
  • invokes SNA.Start() to kick stuff off

sna-web.ts

  • registers sna-dataclient as an SNA_MOD
  • hook DOM_READY for SNA_NetConnect()
  • hook NET_CONNECT for hot module reloading
  • implements SNA_Start() export as Start()

sna-dataclient

  • exports SNA_MOD PreHook for 'NET_DATASET'
  • can get session dataURI
  • can submit session authToken to request dataset
  • can determine syncmode from response
  • can initialize dataset instance from getDataset()
  • handle SYNC:CLI_DATA protocol conditionally on syncmode
  • provides means to select dataset by dataURI
  • provides CRUDQ interface for dataset and its databins
  • provides eventmachine notify pubsub

dc-comments

  • exports SNA_MOD PreHook for 'LOAD_DATA'
  • exports SNA_MOD PreConfig to receive GlobalConfig
  • select dataURI and open 'comments' ItemList
  • provide methods for performing comment data manipulation
  • provide eventmachine notify pubsub