CheckDeploymentWorkflow - 1and1/soma GitHub Wiki
SOMA Check Deployment Workflow
User step, check configuration
An unsuspecting user uses somaadm checks create to send a
check_configuration to SOMA. This check_configuration is the entity
that users can interact with; all other derived objects are internally
managed.
Internal step, job phase: check creation
The user request is saved as a pending job in the database and
acknowledged to the user with 202/Accepted. It is then put into the
appropriate job queue and processed asynchronously.
Based on the specification in the check_configuration a check is
created on the selected object of the tree. If inheritance was true
and the object has children, the check creation request is passed down
and every child object creates a check as well.
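The pass-down behaviour can be sketched as a recursive walk over the tree. This is a minimal illustration, not SOMA's implementation: the TreeNode class, its fields, and the create_check function are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class TreeNode:
    """Hypothetical stand-in for an object in the SOMA tree."""
    name: str
    children: list = field(default_factory=list)
    checks: list = field(default_factory=list)


def create_check(node, config_id, inheritance):
    """Create a check on node; with inheritance, pass the request down."""
    node.checks.append(config_id)
    if inheritance:
        for child in node.children:
            create_check(child, config_id, inheritance)


leaf_a = TreeNode("node-a")
leaf_b = TreeNode("node-b")
group = TreeNode("group", children=[leaf_a, leaf_b])

# With inheritance enabled, the group and both children receive a check.
create_check(group, "cfg-1", inheritance=True)
```

Without inheritance, only the selected object itself would receive the check.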
Internal step, job phase: check instance creation
Every tree object evaluates all its checks and their constraints and
creates the appropriate number of check_instances. Together
with the check_instance, a check_instance_configuration is also
created in state awaiting_computation. Check instance configurations
are versioned and a check instance can have multiple configurations.
Internal step, job phase: check instance configuration computation
For every check_instance_configuration in state awaiting_computation
the deployment details are assembled. The state transitions into
computed.
Internal step, job phase: check instance configuration ordering
If a check_instance_configuration is the first configuration for the
instance, it transitions into state awaiting_rollout and the check
instance is updated with the id of its current configuration.
The update_available flag is set.
If a previous configuration was found, that configuration is loaded and the deployment details of both are compared. If the new version is the same as the current one, the new configuration is discarded. This deep compare ignores:
- values that must be different, i.e. the check instance configuration id and the version number
- array element order, i.e. [a, b] == [b, a] is true
If a difference between the two versions was found, the new
configuration is moved into state blocked. The registered unblocking
condition is that the old configuration reaches state deprovisioned.
The old configuration is transitioned into state awaiting_deprovision.
The update_available flag is set.
This ordering step means that SOMA never sends out an update deployment. If there is a change, the destination monitoring system first receives an undeployment of the exact same deployment details used for the deployment. Due to this, the deployment/undeployment on the client side can be completely stateless. It should also be order independent and idempotent.
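The deep compare used by the ordering step could look roughly like the sketch below. It is an assumption-laden illustration: the field names in IGNORED_KEYS and the shape of the deployment details are invented for the example, and the ignored keys are stripped at every nesting level, which may differ from what SOMA does.

```python
import json

# Hypothetical field names for the values that must differ between versions.
IGNORED_KEYS = {"check_instance_config_id", "version"}


def normalize(value):
    """Return a copy with ignored keys removed and all lists sorted."""
    if isinstance(value, dict):
        return {k: normalize(v) for k, v in value.items()
                if k not in IGNORED_KEYS}
    if isinstance(value, list):
        # Sort by canonical JSON text so element order does not matter.
        return sorted((normalize(v) for v in value),
                      key=lambda v: json.dumps(v, sort_keys=True))
    return value


def same_deployment(old, new):
    """True if both deployment details are equal under the relaxed rules."""
    return normalize(old) == normalize(new)


old = {
    "check_instance_config_id": "aaa",
    "version": 1,
    "services": ["a", "b"],
    "thresholds": [{"level": "warn", "value": 5},
                   {"level": "crit", "value": 9}],
}
new = {
    "check_instance_config_id": "bbb",
    "version": 2,
    "services": ["b", "a"],
    "thresholds": [{"level": "crit", "value": 9},
                   {"level": "warn", "value": 5}],
}
changed = dict(new, thresholds=[{"level": "crit", "value": 10}])
```

Here same_deployment(old, new) holds, so the new configuration would be discarded, while changed differs from old and would be moved into state blocked.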
This concludes the part of the workflow that is run as part of the
add_check_to_${foo} user requested job.
Internal step, life cycle phase: ghost removal
This is the first step by the internal life cycle component that activates every 20 seconds.
It performs three tasks:
- configurations in state awaiting_rollout flagged as deleted with active update_available flag are transitioned directly to awaiting_deletion since the destination monitoring system has not yet picked them up
- configurations flagged as deleted in state rollout_failed are transitioned directly to awaiting_deletion since there is nothing to deprovision
- configurations flagged as deleted in state deprovisioned are transitioned to awaiting_deletion
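The three ghost removal rules can be sketched as a single pass over the configurations. The Config class is hypothetical, and for brevity the sketch stores the update_available flag on the configuration itself, whereas in SOMA the flag belongs to the check instance.

```python
from dataclasses import dataclass


@dataclass
class Config:
    """Hypothetical check instance configuration record."""
    state: str
    deleted: bool
    update_available: bool  # simplification: flag kept on the config


def remove_ghosts(configs):
    """Move deleted configurations that need no deprovisioning to awaiting_deletion."""
    for cfg in configs:
        if not cfg.deleted:
            continue
        if cfg.state == "awaiting_rollout" and cfg.update_available:
            # Never picked up by the monitoring system, nothing to undo.
            cfg.state = "awaiting_deletion"
        elif cfg.state in ("rollout_failed", "deprovisioned"):
            # Nothing deployed, or already deprovisioned.
            cfg.state = "awaiting_deletion"


configs = [
    Config("awaiting_rollout", deleted=True, update_available=True),
    Config("awaiting_rollout", deleted=True, update_available=False),
    Config("rollout_failed", deleted=True, update_available=False),
    Config("deprovisioned", deleted=True, update_available=False),
    Config("active", deleted=True, update_available=False),
    Config("deprovisioned", deleted=False, update_available=False),
]
remove_ghosts(configs)
```

Configurations that are deleted but still active are left alone here; they are handled by the later active deletions step.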
Internal step, life cycle phase: remove blocked deleted
During this next step, all configurations in state blocked that belong
to a deleted check instance are transitioned directly to
awaiting_deletion and their registered unblocking condition is deleted.
Internal step, life cycle phase: unblock configurations
This step is only executed if there was no error during the previous
remove blocked deleted step.
Every registered unblock condition is evaluated. If the condition is
true, the condition is deleted and the configuration transitioned to
either awaiting_rollout or awaiting_deprovision.
The update_available flag for the check instance is set.
Internal step, life cycle phase: active deletions
This step transitions all configurations flagged as deleted in state
active to state awaiting_deprovision and sets the update_available
flag on the instance.
Internal step, life cycle phase: poke
This step takes all check instances with the update_available flag
set that are provisioned on a monitoring system with a notification
callback registered. For every such check instance, the monitoring
system receives a poke on its callback.
The update_available flag is cleared if the poke was successful.
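A rough sketch of the poke step follows. The Instance class and the send_poke callable are assumptions made for the example; the actual transport and payload of a poke are not described here, so the sender is injected rather than implemented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instance:
    """Hypothetical check instance as seen by the poke step."""
    instance_id: str
    update_available: bool
    callback_url: Optional[str]  # None: no notification callback registered


def poke_all(instances, send_poke):
    """Poke every eligible instance; clear the flag only on success."""
    poked = []
    for inst in instances:
        if not inst.update_available or inst.callback_url is None:
            continue
        if send_poke(inst.callback_url, inst.instance_id):
            inst.update_available = False
            poked.append(inst.instance_id)
    return poked


def fake_send(url, instance_id):
    """Stand-in sender for the example: one callback is unreachable."""
    return url != "http://down.example/poke"


instances = [
    Instance("i-1", True, "http://up.example/poke"),
    Instance("i-2", True, "http://down.example/poke"),
    Instance("i-3", True, None),
    Instance("i-4", False, "http://up.example/poke"),
]
result = poke_all(instances, fake_send)
```

Because the flag stays set after a failed poke, the instance is simply retried on the next life cycle run.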
This step is the transition point where a check instance deployment leaves the SOMA application server.
External step, fetch deployment
Using the id received with the poke, the destination monitoring system fetches the deployment information from SOMA. This GET request has a side effect and transitions the workflow!
The following transitions can be triggered by request:
- awaiting_rollout -> rollout_in_progress
- rollout_in_progress -> rollout_in_progress
- active -> active
- rollout_failed -> rollout_in_progress
- awaiting_deprovision -> deprovision_in_progress
- deprovision_in_progress -> deprovision_in_progress
- deprovision_failed -> deprovision_in_progress
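These fetch transitions can be written down as a lookup table, as in the sketch below. The function name is hypothetical, and leaving unlisted states unchanged is an assumption; how SOMA responds to a fetch in any other state is not specified here.

```python
# Transitions triggered by fetching a deployment, as listed above.
FETCH_TRANSITIONS = {
    "awaiting_rollout": "rollout_in_progress",
    "rollout_in_progress": "rollout_in_progress",
    "active": "active",
    "rollout_failed": "rollout_in_progress",
    "awaiting_deprovision": "deprovision_in_progress",
    "deprovision_in_progress": "deprovision_in_progress",
    "deprovision_failed": "deprovision_in_progress",
}


def state_after_fetch(state):
    """Return the state a configuration is in after it has been fetched."""
    # Assumption: states not listed above are left unchanged.
    return FETCH_TRANSITIONS.get(state, state)
```

The table shows two useful properties: fetching twice leads to the same state as fetching once, and a fetch of a configuration in a failed state puts it back in progress, so a client can retry by fetching again.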
External step, deployment result
The destination monitoring system must, after processing the deployment request, send feedback about the deployment result. This transitions the check instances as follows:
Feedback: success
- rollout_in_progress -> active
- deprovision_in_progress -> deprovisioned
Feedback: failed
- rollout_in_progress -> rollout_failed
- deprovision_in_progress -> deprovision_failed
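The feedback transitions fit into a small table keyed by current state and result. This is only a sketch: apply_feedback is a hypothetical name, and rejecting feedback for a configuration that is not in progress is an assumption, since the behaviour for other states is not described here.

```python
# (current state, success?) -> next state
FEEDBACK_TRANSITIONS = {
    ("rollout_in_progress", True): "active",
    ("deprovision_in_progress", True): "deprovisioned",
    ("rollout_in_progress", False): "rollout_failed",
    ("deprovision_in_progress", False): "deprovision_failed",
}


def apply_feedback(state, success):
    """Return the state after the monitoring system reported its result."""
    try:
        return FEEDBACK_TRANSITIONS[(state, success)]
    except KeyError:
        # Assumption: feedback is only valid while a rollout or
        # deprovisioning is in progress.
        raise ValueError("no feedback expected in state " + state) from None
```

A successful rollout therefore ends in active, and a successful deprovisioning ends in deprovisioned, which is the state the unblocking conditions wait for.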
Polling step, list deployments
Monitoring systems that do not have a registered callback address can poll SOMA for updates. Registering a callback requires implementing a REST'ish service that SOMA can contact.
This request returns all instance ids that have the update_available
flag set and clears the flag. This means every deployment is part of
this list exactly once.
With this list of IDs, the destination monitoring system can fetch the deployments the same way as if it had received pokes for them.
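The read-and-clear behaviour of this request can be illustrated with a small in-memory model. The UpdateFlags class and its method names are hypothetical and only mimic the flag handling described above.

```python
class UpdateFlags:
    """Hypothetical in-memory model of the update_available flags."""

    def __init__(self):
        self._flags = {}

    def mark_updated(self, instance_id):
        """Set the update_available flag for an instance."""
        self._flags[instance_id] = True

    def list_deployments(self):
        """Return all flagged instance ids and clear their flags."""
        ids = sorted(i for i, flagged in self._flags.items() if flagged)
        for instance_id in ids:
            self._flags[instance_id] = False
        return ids


flags = UpdateFlags()
flags.mark_updated("i-2")
flags.mark_updated("i-1")
first = flags.list_deployments()   # both ids are reported once
second = flags.list_deployments()  # nothing new since the last poll
```

A polling client that loses the result of a call will not see those ids again through this request, which is what the list all deployments request is for.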
Polling step, list all deployments
This request returns all instance ids with configurations in one of the
following states, regardless of the update_available flag. If the flag
is active, it is cleared.
- awaiting_rollout
- rollout_in_progress
- awaiting_deprovision
- deprovision_in_progress
This request can be used to resynchronize pending requests.
A REST'ish configuration service can use it on startup, whether clean or after a crash, to fetch all pending deployments again. This allows these services to be fully stateless with regard to which deployments they have already fetched.
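As a sketch of the resynchronization request, the function below selects instances by the state of their configuration and ignores the flag when filtering. The dictionary layout and the function name are assumptions for the example, as is clearing the flag only for the instances that are returned.

```python
PENDING_STATES = {
    "awaiting_rollout",
    "rollout_in_progress",
    "awaiting_deprovision",
    "deprovision_in_progress",
}


def list_all_deployments(instances):
    """Return ids of all instances with pending work and clear their flags."""
    ids = []
    for instance_id, inst in instances.items():
        if inst["state"] in PENDING_STATES:
            # Assumption: only returned instances have their flag cleared.
            inst["update_available"] = False
            ids.append(instance_id)
    return ids


instances = {
    "i-1": {"state": "awaiting_rollout", "update_available": True},
    "i-2": {"state": "rollout_in_progress", "update_available": False},
    "i-3": {"state": "active", "update_available": False},
    "i-4": {"state": "deprovision_in_progress", "update_available": False},
}
pending = list_all_deployments(instances)
```

Unlike the plain list request, calling this twice in a row returns the same ids as long as no configuration has changed state in between.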
User step, check deletion
Sometimes users wish to delete a check configuration via somaadm checks delete.
Internal step, job phase: check deletion
The check deletion job deletes the following objects from the in-memory tree:
- all checks for the configuration
- all check instances spawned by those checks
This results in the following objects being flagged as deleted in the database:
- the check configuration
- all checks derived from the check configuration
- all check instances derived from the checks
At this point the life cycle component will pick this up and deprovision
all currently active configurations, ultimately moving them into state
awaiting_deletion.
Internal step, cleanup pruning
At some point we may have to clean up the database of all the things
either in state awaiting_deletion or simply flagged as deleted. At
that point, we also need to decide how much deleted history to keep
around and whether to simply delete or archive these old records.
That point has not yet come.