DataModel_01Tree - 1and1/soma GitHub Wiki

The SOMA data model - 1. The Tree

SOMA is a tree-based configuration system. A tree can contain the following object types:

  1. Tree
  2. Repository
  3. Bucket
  4. Group
  5. Cluster
  6. Node

As a rule, every object in a SOMA tree in a stable state has either:

  1. No parent object if it is the root Tree object
  2. Exactly 1 parent object in all other cases

Objects can switch to a floating state with 0 parents during tree manipulation. A floating state cannot be persisted into a stable state; an object left floating causes the entire tree operation to fail and roll back.
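
The parent rule and the rollback behaviour can be illustrated with a minimal Go sketch. The names `Object`, `checkStable` and `reparent` are hypothetical and chosen for illustration only; SOMA's actual types and rollback mechanism are not described here and may look quite different.

```go
package main

import (
	"errors"
	"fmt"
)

// Object is a hypothetical, simplified tree element (not SOMA's real type).
type Object struct {
	Name   string
	IsRoot bool    // true only for the root Tree object
	Parent *Object // nil for the root, or while the object is floating
}

// ErrFloating marks a non-root object that has no parent.
var ErrFloating = errors.New("object is floating")

// checkStable verifies the stable-state rule: the root has no parent,
// every other object has exactly one.
func checkStable(objs []*Object) error {
	for _, o := range objs {
		switch {
		case o.IsRoot && o.Parent != nil:
			return fmt.Errorf("%s: root must not have a parent", o.Name)
		case !o.IsRoot && o.Parent == nil:
			return fmt.Errorf("%s: %w", o.Name, ErrFloating)
		}
	}
	return nil
}

// reparent moves o below newParent as one tree operation. If the tree is
// not stable afterwards, the previous parent is restored (the rollback)
// and the error is returned.
func reparent(objs []*Object, o, newParent *Object) error {
	old := o.Parent
	o.Parent = newParent
	if err := checkStable(objs); err != nil {
		o.Parent = old
		return err
	}
	return nil
}

func main() {
	root := &Object{Name: "tree", IsRoot: true}
	repo := &Object{Name: "foobar", Parent: root}
	objs := []*Object{root, repo}

	// Leaving the repository without a parent is rejected and undone.
	fmt.Println(reparent(objs, repo, nil))
	fmt.Println(repo.Parent == root)
}
```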

Tree

The root of the tree. It functions mainly as an internal management object. It can have only one child of type repository.

Tree objects act as a concurrency border, with each tree running in a separate goroutine.
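
One common way to realise such a concurrency border in Go is to let a single goroutine own each tree and to send it requests over a channel. The sketch below follows that pattern. The `request` type, `startTree` and the reply format are assumptions for illustration and do not describe SOMA's internals.

```go
package main

import "fmt"

// request is a hypothetical message sent to the goroutine owning a tree.
type request struct {
	action string
	reply  chan string
}

// startTree launches one goroutine that exclusively owns the state of the
// named tree. It returns the request channel and a channel that is closed
// once the goroutine has stopped.
func startTree(name string) (chan<- request, <-chan struct{}) {
	in := make(chan request)
	stopped := make(chan struct{})
	go func() {
		defer close(stopped)
		applied := 0 // only this goroutine touches the counter, so no lock is needed
		for req := range in {
			applied++
			req.reply <- fmt.Sprintf("%s: applied %q (#%d)", name, req.action, applied)
		}
	}()
	return in, stopped
}

func main() {
	// Two trees, two independent goroutines.
	foo, fooStopped := startTree("foobar")
	baz, bazStopped := startTree("bazqux")

	reply := make(chan string, 1)
	foo <- request{action: "create bucket", reply: reply}
	fmt.Println(<-reply)
	baz <- request{action: "assign node", reply: reply}
	fmt.Println(<-reply)

	close(foo)
	close(baz)
	<-fooStopped
	<-bazStopped
}
```

Because each tree's state is touched by exactly one goroutine, operations on different trees can proceed in parallel without shared locks.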

Repository

The user-visible top of the configuration tree. Mostly a metadata container for information such as which team this repository belongs to. A repository cannot contain assets that do not also belong to that team. Teams are not limited in the number of repositories they have.

Repositories are also permission borders for per-repository permissions.

Repositories can only be attached to a tree, which happens automatically.

Repositories can have an arbitrary number of children of type bucket.

Repository names are globally unique and must be at least 4 characters long.
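
The two naming rules for repositories can be checked as in the following sketch. The `repositoryNames` type and its `register` method are hypothetical, and counting characters as Unicode code points rather than bytes is an assumption.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// minRepositoryNameLen is the minimum length stated in the text.
const minRepositoryNameLen = 4

// repositoryNames is a hypothetical stand-in for whatever store tracks
// the repository names that already exist.
type repositoryNames map[string]struct{}

// register enforces both rules: minimum length and global uniqueness.
func (r repositoryNames) register(name string) error {
	// Assumption: "characters" means Unicode code points, not bytes.
	if utf8.RuneCountInString(name) < minRepositoryNameLen {
		return fmt.Errorf("repository name %q is shorter than %d characters",
			name, minRepositoryNameLen)
	}
	if _, taken := r[name]; taken {
		return fmt.Errorf("repository name %q already exists", name)
	}
	r[name] = struct{}{}
	return nil
}

func main() {
	names := repositoryNames{}
	fmt.Println(names.register("foobar")) // accepted
	fmt.Println(names.register("foo"))    // too short
	fmt.Println(names.register("foobar")) // duplicate
}
```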

Bucket

Buckets are an artificial sharding layer below repositories. Their main purpose is to:

  1. combine a repository with an environment
  2. exist below a repository so that inheritance can start above where environments are defined (more on inheritance later)

From a data organization standpoint, they can be thought of as branches inside the repository, analogous to a Puppet environment defined via a Git branch in a control-style Puppet Git repository.

Bucket names are globally unique and must begin with the repository name followed by an underscore. Examples could be:

  1. foobar_master
  2. foobar_default
  3. foobar_dev
  4. foobar_reallylongnamethatisbadtotype

Be aware that the naming has no influence on the environment the bucket is considered to be in, i.e. one can create a bucket called foobar_qa with an environment definition of prelive.
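
A short sketch of the bucket naming rule follows. The `Bucket` struct and `newBucket` are hypothetical names. The check for global uniqueness is left out for brevity, and whether a name consisting of the bare prefix (such as foobar_) is acceptable is not stated in this document, so the sketch does not reject it.

```go
package main

import (
	"fmt"
	"strings"
)

// Bucket is a hypothetical, simplified bucket record.
type Bucket struct {
	Name        string
	Repository  string
	Environment string // set explicitly, never derived from Name
}

// newBucket checks that the bucket name begins with the repository name
// followed by an underscore. The environment is taken as given.
func newBucket(repository, name, environment string) (*Bucket, error) {
	prefix := repository + "_"
	if !strings.HasPrefix(name, prefix) {
		return nil, fmt.Errorf("bucket name %q must begin with %q", name, prefix)
	}
	return &Bucket{Name: name, Repository: repository, Environment: environment}, nil
}

func main() {
	// The name suggests "qa", but the environment is prelive.
	b, err := newBucket("foobar", "foobar_qa", "prelive")
	fmt.Println(b.Name, b.Environment, err)

	_, err = newBucket("foobar", "qa", "prelive")
	fmt.Println(err)
}
```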

Buckets can only be attached to a repository.

Buckets can have an arbitrary number of children of the following types:

  1. group
  2. cluster
  3. node

Objects that are direct children of a bucket are in the state standalone, since the user has not performed any tree organisation on them. The bucket is simply the highest point in the tree at which such an object can be placed.

Group

Groups are the general-purpose grouping objects in SOMA.

They can be attached as children to the following types:

  1. bucket
  2. group

They can have the following types of children:

  1. group
  2. cluster
  3. node

Groups cannot form very small cycles by attaching as children to themselves.

Group names are unique per bucket. This gives far wider flexibility but has the disadvantage that the bucket has to be specified to uniquely address a group.
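
The two group rules can be sketched as follows. `Group`, `groupKey` and `attach` are hypothetical names. The text only rules out a group attaching to itself; rejecting attachment below one of its own descendants as well is an assumption of this sketch, made because that would equally produce a cycle.

```go
package main

import (
	"errors"
	"fmt"
)

// Group is a hypothetical, simplified group object.
type Group struct {
	Bucket string
	Name   string
	Parent *Group // nil when the group sits directly below its bucket
}

// groupKey addresses a group: the name alone is ambiguous across buckets.
type groupKey struct {
	Bucket string
	Name   string
}

var errCycle = errors.New("attaching would create a cycle")

// attach makes parent the parent of child. It walks from parent up to the
// bucket level and refuses if it meets child on the way. The first step
// covers the self-attachment case from the text; the remaining steps are
// an assumed extension to longer cycles.
func attach(child, parent *Group) error {
	for g := parent; g != nil; g = g.Parent {
		if g == child {
			return errCycle
		}
	}
	child.Parent = parent
	return nil
}

func main() {
	// The same group name can exist once per bucket.
	index := map[groupKey]*Group{
		{Bucket: "foobar_dev", Name: "web"}:    {Bucket: "foobar_dev", Name: "web"},
		{Bucket: "foobar_master", Name: "web"}: {Bucket: "foobar_master", Name: "web"},
	}
	web := index[groupKey{Bucket: "foobar_dev", Name: "web"}]

	fmt.Println(attach(web, web)) // rejected: a group cannot be its own child
}
```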

Cluster

Clusters are the special-purpose grouping objects in SOMA.

They can be attached as children to the following types:

  1. bucket
  2. group

They can have the following types as children:

  1. node

Cluster names are unique per bucket. This has the same benefits and drawbacks as for groups.

Only allowing node objects as children allows for simpler mechanics when checks are created on cluster objects. Getting the number of children does not require a tree traversal, and the children are guaranteed to be of type node, not possibly some meta grouping object.
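
The difference can be seen in a small sketch that counts nodes for both grouping types. The struct layout and the `nodeCount` methods are hypothetical; they only illustrate why a cluster needs no traversal while a group does.

```go
package main

import "fmt"

// Hypothetical, simplified object types.
type Node struct{ Name string }

type Cluster struct {
	Name  string
	Nodes []*Node // the only permitted child type
}

type Group struct {
	Name     string
	Groups   []*Group
	Clusters []*Cluster
	Nodes    []*Node
}

// nodeCount for a cluster is a plain length lookup: every child is a node.
func (c *Cluster) nodeCount() int { return len(c.Nodes) }

// nodeCount for a group has to walk the subtree, because children may be
// further groups or clusters that contain nodes of their own.
func (g *Group) nodeCount() int {
	n := len(g.Nodes)
	for _, c := range g.Clusters {
		n += c.nodeCount()
	}
	for _, sub := range g.Groups {
		n += sub.nodeCount()
	}
	return n
}

func main() {
	db := &Cluster{Name: "db", Nodes: []*Node{{Name: "db1"}, {Name: "db2"}}}
	inner := &Group{Name: "inner", Clusters: []*Cluster{db}}
	outer := &Group{Name: "outer", Groups: []*Group{inner}, Nodes: []*Node{{Name: "lb1"}}}

	fmt.Println(db.nodeCount(), outer.nodeCount())
}
```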

Node

Nodes are SOMA's objects that represent a specific execution environment. This can be:

  1. full OS installation on bare metal
  2. full OS installation inside a hypervisor virtualization
  3. full OS installation inside a kernel virtualization (container)
  4. reduced runtime inside a kernel virtualization (container)
  5. some unikernel/rumpkernel weirdness
  6. it really does not matter

It is just some uniquely identifiable and enumerated computational node.

One or more nodes run on a hardware server. Layers in between are not tracked by SOMA. The link to the hardware server is only maintained to connect the node to the physical datacenter location. Otherwise hardware relations are of little interest to SOMA.

For cloud deployments, this can be adapted to create a datacenter per region, and then one server for every availability zone within that region.

Node names are globally unique since SOMA is not designed as the primary asset tracking source-of-truth. This means that node information is imported into SOMA, and node objects can therefore exist without being part of a repository, which, for example, bucket, group and cluster objects cannot.

For this reason node objects have to be assigned to a bucket, whereas a cluster object is created inside a bucket.
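
The distinction between assigning and creating can be sketched like this. `Inventory`, `importNode`, `assign` and `newCluster` are hypothetical names used for illustration only.

```go
package main

import "fmt"

// Node is imported from an external source and may exist without a bucket.
type Node struct {
	Name   string
	Bucket string // empty until the node is assigned
}

// Cluster cannot exist outside a bucket.
type Cluster struct {
	Name   string
	Bucket string
}

// Inventory is a hypothetical store of imported nodes, keyed by the
// globally unique node name.
type Inventory struct {
	nodes map[string]*Node
}

// importNode records a node that is not yet part of any repository.
func (inv *Inventory) importNode(name string) error {
	if _, ok := inv.nodes[name]; ok {
		return fmt.Errorf("node %q already exists", name)
	}
	inv.nodes[name] = &Node{Name: name}
	return nil
}

// assign places an already existing node into a bucket.
func (inv *Inventory) assign(name, bucket string) error {
	n, ok := inv.nodes[name]
	if !ok {
		return fmt.Errorf("node %q is unknown", name)
	}
	n.Bucket = bucket
	return nil
}

// newCluster creates a cluster; a bucket is required from the start.
func newCluster(bucket, name string) (*Cluster, error) {
	if bucket == "" {
		return nil, fmt.Errorf("cluster %q needs a bucket", name)
	}
	return &Cluster{Name: name, Bucket: bucket}, nil
}

func main() {
	inv := &Inventory{nodes: map[string]*Node{}}
	fmt.Println(inv.importNode("web01"))            // node exists, no bucket yet
	fmt.Println(inv.assign("web01", "foobar_dev")) // now part of the tree
	_, err := newCluster("", "db")
	fmt.Println(err) // a cluster without a bucket is rejected
}
```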

Nodes can be attached as children to the following types:

  1. bucket
  2. group
  3. cluster

Nodes cannot themselves have any children. They are guaranteed to be tree leaves.

ASCII Graph

Given the above information, the following crude ASCII graph displays all possible parent/child relations that can be formed in SOMA.

tree
|
repository
 \
  bucket
       |\
       | group
       |      \
       |       group
       |           |\
       |           | cluster
       |            \      |\
       |             node  | node
       |                    \
       |                     node
       |\
       | cluster
        \      |\
         node  | node
                \
                 node
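
The same parent/child relations can be written down as a lookup table. The following Go sketch encodes the rules listed in the sections above; the `Kind` type, `allowedChildren` and `canAttach` are hypothetical names and not taken from the SOMA code base.

```go
package main

import "fmt"

// Kind names one of the six object types.
type Kind string

const (
	Tree       Kind = "tree"
	Repository Kind = "repository"
	Bucket     Kind = "bucket"
	Group      Kind = "group"
	Cluster    Kind = "cluster"
	Node       Kind = "node"
)

// allowedChildren lists, per parent type, the child types it accepts.
var allowedChildren = map[Kind][]Kind{
	Tree:       {Repository},
	Repository: {Bucket},
	Bucket:     {Group, Cluster, Node},
	Group:      {Group, Cluster, Node},
	Cluster:    {Node},
	Node:       {}, // nodes are always leaves
}

// canAttach reports whether an object of type child may be attached
// below an object of type parent.
func canAttach(child, parent Kind) bool {
	for _, k := range allowedChildren[parent] {
		if k == child {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canAttach(Node, Cluster))  // allowed
	fmt.Println(canAttach(Group, Cluster)) // not allowed: clusters hold only nodes
	fmt.Println(canAttach(Group, Group))   // allowed: groups can nest
}
```

The table only covers type compatibility. Rules such as a tree having a single repository or groups not attaching to themselves need separate checks.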

Opt-in

It is not required to model a tree in SOMA. While the chain of tree, repository and bucket is required, of these three the bucket is the only one with significant user interaction.

It is perfectly acceptable to simply throw nodes into a bucket and call it a day.