Guide for developers - snowplow-archive/sauna GitHub Wiki

HOME > GUIDE FOR DEVELOPERS

Who this guide is for

We welcome contributions to Sauna! This guide is for developers that wish to contribute to the core platform or add a new integration.

If you are a data engineer looking to integrate Sauna into ETL pipelines, please check out the Guide for analysts first.

Sauna architecture

This guide covers Sauna v.0.1.0.

Sauna 0.1.0 is built on top of the Akka akka [Actor model] actor-model, representing most of its' core entities as actors - independent units of computation, storing a state and connection to parents and (optionally) children.

Configuration

Sauna's entities are configured using self-describing Avro entities. The SaunaOptions object contains a parser which parses command-line arguments and extracts SaunaSettings, which in turn are immutable snapshots of configurations.

Entities

Mediator

Sauna starts with a root "supervisor" actor, called Mediator. This is the most stateful actor, mediating other entities and routing messages between them. It stores a state with information about which events were sent to which entities to maintain consistency and warn the user if some events are lost (instead of just awaiting for response from the responder, which can take quite a while to process.)

Unlike other actors, which can immediately recover from a crash, a crash of Mediator is effectively a crash of the entire application. (This also means Mediator is essentially a singleton. So far Sauna can be run only on a single machine, but in the future it will be a cluster handled by a similar mediator.)

Mediator holds references to all core entities: observers, responders and loggers. These core actors can communicate with Mediator simply by sending a message to context.parent. The references are constructed depending on the SaunaSettings object, which is passed as an argument of Mediator's constructor.

Observers

Observers are actors which are a source of lowest-level events. Observer events are usually raw ("file created", "HTTP request received") and not necessary awaited by responders. Currently only two "batch" events - LocalFilePublished and S3FilePublished - are supported, both of which can be intercepted and processed by any of Responders. In the future, we're planning to support a wide variety of different events coming from various SaaS vendors and applications, which can have streaming/batch/singleton/etc. nature.

Usually when an Observer emits an event, it "bubbles" up to Mediator and then gets broadcasted to all Responders, which in turn decide whether this event should be processed and acted on or skipped. Observer events must contain a path to an underlying file or S3 object, streamContent to iterate over the file's/object's content and observer to help Mediator keep track of sources that are currently processing events.

Observers can use some auxiliary constructs, such as dedicated threads to poll services.

Responders

Responders are the bread and butter of Sauna. These entities contain processing logic and carry out various actions depending on events they receive. Each responder extends the Responder[RE] trait, where RE is a specific responder event type.

A responder event is an entity which can be extracted by a specific responder from an observer event (described above) using the extractEvent method. If the extraction is successful, the enriched event is passed to the extracted event will be passed to the process method which contains all custom logic. process should be non-blocking and must send a message inherited from ResponderResult when processing is finished.

Apis

Objects in this package are designed to communicate with third-party APIs. Optimizely and Sendgrid have implemented several necessary features.

Loggers

Loggers provide feedback to end users based on messages received from observers and/or responders. There are two types of messages:

  • Structured Manifestation (conforms to a specified schema; can have several required fields).
  • Unstructured Notification (a simple wrapper over String).

DynamodbLogger can be used for structured messages, HipchatLogger - for unstructured, and StdoutLogger for both of them at once.

Utils

Shares common variables (TSVFormat, WSClient) in single instance.

⚠️ **GitHub.com Fallback** ⚠️