Building a Production Service Based on prosEO

A Production Service Using prosEO – The Processing System for Earth Observation Data

DISCLAIMER: Not all of the functionalities described below are available in the current version of the code. Some features are design goals, for which an implementation will be added at a later stage.

Introduction

The “prosEO” software system is an open-source processing control system designed to perform all activities required to process Earth Observation satellite data (e. g. Sentinel data), generating user-level data, engineering data and/or housekeeping telemetry data as desired by a configured mission. The technical infrastructure used to deliver the Production Service is cloud-native and multi-mission by design, with strict separation of competences and concerns.

Three types of Production Service can be delivered:

  • Systematic Production from satellite raw data (CADUs) to user-level products (Level-0, Level-1, Level-2 or higher);
  • On-demand Production for a specific user-level output product and from a given user-level input product, which may already reside within the Production Service or may be retrieved from some external Archiving Service;
  • Bulk Reprocessing of past user-level data from archived lower level data.

For all three service types the technical capacities and performance can be adjusted to a large degree thanks to the cloud-based approach built on Docker and Kubernetes, as detailed below.

For the orchestration of the processing prosEO uses a novel data-driven approach, which abandons the use of fixed workflows or processing chains in favour of a dynamic generation of processing networks from configured product class dependencies.

prosEO centres around the idea of describing dependencies between product classes by selection rules. Selection rules define the input (source) product classes (including lower-level data products, auxiliary data or even configuration data) necessary to produce a given output (target) product class (which again can be a user-level data product, an engineering product, an intermediate auxiliary product etc.), and they specify the retrieval policies to apply to each of the input product classes (e. g. “ValIntersect”, meaning “select all products whose validity period intersects the validity period of the output product”).
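
To make this concrete, the selection rules for a hypothetical Level-2 product class might look like the following sketch (the product types are invented, and the exact grammar is defined by prosEO's selection rule language, so this is for illustration only):

```
FOR L1B_RADIANCE SELECT ValIntersect(0, 0)
FOR AUX_METEO SELECT LatestValIntersect(1 H, 1 H)
```

The first rule would request all Level-1b radiance products overlapping the validity period of the target product; the second would request the latest meteorological auxiliary file whose validity period, padded by one hour on each side, overlaps that of the target product.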

prosEO core data model

Thinking in product dependencies like this turns the usual logic of specifying processing orders around, as the core data model in the figure above shows:

  1. The static configuration for a mission describes product classes and their interdependencies as selection rules, and it defines which processor classes are able to create products of a certain class.
  2. Processor classes materialize in multiple versions (usually, but not necessarily, consecutive), and they may have multiple configurations, which are tied to one or more processor versions by “configured processors”. The validity of selection rules may be restricted to certain configured processors, since a processor's need for input and auxiliary data may vary as the processor evolves.
  3. Processing orders are defined by the product classes they request as output (in contrast to some conventional workflows, which define the input), by the time frame for which the products shall be generated, and optionally by a certain set of configured processors to use for the processing (thus allowing for multiple configurations and processor versions to be used in parallel on a single prosEO instance). Future versions will also include spatial selection options in addition to or instead of a time frame.
  4. The production planning process generates jobs for each time slice (e. g. orbit) in the selected time frame and job steps for each final output product requested and each intermediate product required as a prerequisite (recursively, over as many processing levels as configured processors are defined for the respective product types).

Some of the advantages of such an approach over classic, workflow-based approaches are:

  • Each job step can be processed as soon as the required input products are available (there is no need to adhere to a strict order), which means that products with few dependencies are produced as early as possible.
  • Foreseen “processing chains” can start at any point “in the middle”, provided the required products are already present (which is often the case in both systematic production and reprocessing scenarios).
  • The configuration of processing networks is much simpler, since possible parallelisations of tasks need not be analysed in advance, but are handled implicitly by the Production Planner component.
  • Multiple processor and configuration versions may exist in parallel, allowing for easy switching between processor and configuration versions or even running different configurations in parallel (this includes the ability to fall back on any earlier processor or configuration version in case of a malfunction of a processor CFI).

The prosEO system consists of the control instance (a set of microservices) performing the dispositive tasks (the “brain”) and an arbitrary number of processing facilities for the actual workload of processing the satellite data (the “hands”). Note that the brain and the hands are not required to reside on the same cloud system; however, to ensure scalability the hands rely on the Kubernetes infrastructure.
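
Before moving on to the system components, the core data model described above can be condensed into a compact Java sketch; all class and field names below are illustrative stand-ins, not prosEO's actual model classes:

```java
import java.time.Instant;
import java.util.List;

// Hypothetical, heavily simplified view of the prosEO core data model
// (illustrative names only, not the actual prosEO entity classes).
record SelectionRule(String sourceProductType, String policy,          // e. g. "ValIntersect"
                     String applicableConfiguredProcessor) {}          // optional restriction
record ProductClass(String productType, List<SelectionRule> selectionRules) {}
record ConfiguredProcessor(String processorClass, String processorVersion,
                           String configurationVersion) {}
record ProcessingOrder(List<String> requestedProductClasses,           // output, not input!
                       Instant startTime, Instant stopTime,
                       List<ConfiguredProcessor> processorsToUse) {}
record JobStep(String outputProductType, ConfiguredProcessor processor) {}
record Job(Instant sliceStart, Instant sliceStop, List<JobStep> jobSteps) {}
```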

prosEO components

Operational Scenarios

prosEO is intended for the three operational scenarios described below.

Systematic Production

Systematic Production is defined as:

  • Gathering the data stream acquired by the Sentinel satellites and made available at pick-up points by the Acquisition services;
  • Processing it into a pre-defined set of data products meeting the expected specifications;
  • And delivering them systematically and fully to the production delivery point within a given timeliness.

Systematic Production in the context of a prosEO-based production service is implemented as follows (based on a production stream beginning with CADU to L0 processing, but any other start level would also be possible):

  1. A polling process scans configurable Acquisition Interface Points (XBIP, EDIP) for new satellite raw data.
  2. When data is found, the prosEO Ingestor ingests it into the prosEO Storage Manager as input product(s), and an order to process Level-0 products and accompanying products (engineering data, HKTM) is created and immediately released. Where necessary, some pre-processing may be performed before data ingestion.
  3. Optionally (depending on the mission requirements) additional orders for higher user-level products are created and released (this may also be triggered by a timer that pre-fabricates orders in a pool; these orders are then released as Level-0 and subsequent input data become available).
  4. The prosEO Production Planner creates jobs and job steps for the orders; each job step is started immediately upon availability of all required input products (if sufficient system resources are available).
  5. Upon completion each job step delivers its output product(s) to the Storage Manager and notifies the Production Planner of its completion and (indirectly via the Ingestor) of the availability of the new product, thereby potentially in turn triggering the start of other job steps.
  6. Delivered products are visible on the PRIP and can be retrieved by the authorized consumers.

This process can be fully automated and does not require operator intervention except in case of anomalies. In parallel to this process a continuous stream of auxiliary data is ingested from the configured Auxiliary Data providers [REQ-040].
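
As an illustration of steps 1 and 2 above, a polling process might take the following shape; the pickup directory, the Ingestor endpoint and the payload format are assumptions for illustration, not actual prosEO APIs, and bookkeeping of already ingested files is omitted:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical polling loop for an Acquisition Interface Point (illustrative only).
public class PickupPointPoller {
    private static final Path PICKUP_DIR = Path.of("/mnt/xbip/pickup");                    // assumed mount
    private static final URI INGESTOR = URI.create("http://proseo-ingestor:8080/ingest");  // assumed endpoint

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        while (true) {
            try (DirectoryStream<Path> newFiles = Files.newDirectoryStream(PICKUP_DIR, "*.raw")) {
                for (Path cadu : newFiles) {
                    // Notify the Ingestor of the new raw data file (payload format is illustrative);
                    // a real implementation would track which files were already ingested
                    HttpRequest request = HttpRequest.newBuilder(INGESTOR)
                        .header("Content-Type", "application/json")
                        .POST(HttpRequest.BodyPublishers.ofString(
                            "{\"productFile\": \"" + cadu.toAbsolutePath() + "\"}"))
                        .build();
                    client.send(request, HttpResponse.BodyHandlers.discarding());
                }
            }
            Thread.sleep(30_000L);   // poll every 30 seconds
        }
    }
}
```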

On-demand Production

On-demand Production is defined as allowing the generation of higher-level data (typically Level-1/2) from archived lower-level data (typically Level-0) available in a Long Term Archive (LTA), as per user requests coming from some ordering service for a product not readily available (typically a historic product).

The production process in prosEO is as follows:

  1. The ordering service (or any other authorised entity) sends a workflow query request to the On-Demand Interface point (the “OD” part of the ODPRIP), which is answered with a list of available workflows (modelled in prosEO as “configured processors”).

  2. The Data Distribution service sends a production order specifying the workflow (configured processor) desired and the input product to use, from which prosEO derives the actually requested output product type and validity period.

    As an extension to the On-demand interface it is proposed that the client may just specify the requested output product type and the requested validity period and leave the decision on the required input files to the Production Service. This would avoid the need to define the rules for generating output products from input products twice, once in prosEO and once in the ordering service.

  3. The prosEO Production Planner checks for the availability of the requested output product (may be cached from a previous request), input product(s) (may be cached) and auxiliary products (may be cached). For any missing input products a retrieval request is sent to one or more configured Long-term Archives, for missing auxiliary products to one or more configured Auxiliary Data providers.

  4. As soon as the requested products have been ingested, the Production Planner starts the processing job steps, which deliver their outputs to the prosEO Storage Manager and hence to the PRIP (same as step 5 of the Systematic Production scenario).

  5. If a notification endpoint is given in the production order, the client will be notified of the order completion actively, otherwise it may poll the ODPRIP for the order state. This step is optional.

  6. The output product can be retrieved from the PRIP either by searching for it or by the download link given in the order completion message.

This process can be fully automated and does not require operator intervention except in case of anomalies.
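
Steps 1 and 2 of this sequence might look as follows from the client's side; the endpoint paths, workflow name and payload structure are illustrative assumptions, not the normative ODPRIP ICD:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Illustrative On-demand Production client (endpoints and payload are assumptions;
// refer to the applicable ESA ICD for the real API).
public class OnDemandClient {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // Step 1: query the available workflows (modelled in prosEO as configured processors)
        HttpRequest workflowQuery = HttpRequest.newBuilder(
                URI.create("https://odprip.example.org/odata/v1/Workflows"))
            .header("Authorization", "Basic ...")   // credentials elided
            .GET().build();
        System.out.println(client.send(workflowQuery, HttpResponse.BodyHandlers.ofString()).body());

        // Step 2: submit a production order referencing a workflow and an input product
        String order = """
            { "WorkflowName": "l2-methane-v2",
              "InputProductReference": { "Reference": "L1B_RADIANCE_20240101T000000" } }""";
        HttpRequest orderRequest = HttpRequest.newBuilder(
                URI.create("https://odprip.example.org/odata/v1/ProductionOrder"))
            .header("Authorization", "Basic ...")
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(order)).build();
        client.send(orderRequest, HttpResponse.BodyHandlers.ofString());
    }
}
```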

Bulk Reprocessing

Bulk Reprocessing aims at “the bulk (re-)generation of past higher level data (typically Level-1/2) from archived lower level data (typically Level-0)”. A typical reason for bulk reprocessing is the availability of a new processing algorithm version, which makes it worthwhile to re-generate its output products for some part of or even the whole duration of the mission. In contrast to Systematic Production and On-demand Production, setting up orders for a reprocessing campaign is not an automated process, since it requires a significant amount of co-ordination and clarification between the parties involved. Typically, and in contrast especially to On-demand Production, the input data will be retrieved in bulk requests from the Long-term Archives and/or Auxiliary Data providers before the actual processing starts, to keep the duration of the processing (and thereby the cloud service costs) as low as possible. The process is therefore much more elaborate, especially in its early stages:

  1. The customer presents their reprocessing requirements to the Production Service provider, who creates an initial version of the prosEO processing order(s) required to perform the reprocessing.
  2. The parameters of the reprocessing campaign (including the sizing of the processing facility) are iterated and refined between customer and Production Service provider, until a final reprocessing plan is agreed. The processing orders for the reprocessing campaign are then approved in the prosEO system. For the sizing of the processing facility the intrinsic limitations of the degree of parallelisation due to dependencies between the input, auxiliary and output products need to be taken into account, especially where feedback loops are involved.
  3. The required input products and auxiliary products are retrieved in batches from the respective interfaces of the Long-term Archive and Auxiliary Data providers. This step may already be started in parallel to the preceding steps, and it is ideally completed at the same time as the approval of the reprocessing plan.
  4. The Production Service provider performs a planning run on the processing orders and discusses the results with the customer. This may lead to further changes to the reprocessing parameters, an update of the reprocessing plan and renewed approval and planning actions.
  5. Once the parties agree on the outcome of the planning run, the processing orders are released.
  6. The prosEO Production Planner starts the individual jobs and job steps according to the availability of their input products and to the maximum degree of parallelisation allowed by the size of the processing facility.
  7. Upon completion each job step delivers its output product(s) to the Storage Manager and notifies the Production Planner of its completion and (indirectly via the Ingestor) of the availability of the new product, thereby potentially in turn triggering the start of other job steps.
  8. Delivered products are visible on the PRIP and can be retrieved by the authorized consumers.

Steps 6 to 8 essentially correspond to steps 4 to 6 of the Systematic Production scenario.

Special Scenarios

TODO: Add PRIP-only scenario

System Components

The prosEO system consists of two main parts, the control instance (“brain”) and one or more processing facilities (“hands”). The control instance performs the processing orchestration and holds metadata information on all processing orders and products currently in the system as well as configuration information on missions, their product classes, processors, configurations etc. as presented above. The processing facilities perform the data processing workload and store the input, auxiliary and output data. To maximise re-use of cached product data in the processing facilities it is recommended (but not required) that a single processing facility is used per mission. Where requirements dictate otherwise, a different layout can be configured (e. g. a dedicated processing facility with a PRIP of its own).

With respect to the PRIP it has to be noted that the PRIP is merely an API definition (implemented as just another microservice, which can run as part of the “brain” or as a standalone application); the PRIP itself as implemented in prosEO does not hold any product data. All product data is managed by the prosEO Storage Manager, of which one instance resides in each processing facility. It is physically stored either on a POSIX file system within the processing facility or (more frequently) in an object storage service provided by the external cloud service provider.

Control Instance

The control instance (the “brain” of prosEO) consists of the following components (see also the figure above):

  • Metadata database (currently PostgreSQL) holding configuration information for missions, product classes and their selection rules, processor classes, versions and configurations as well as metadata on processing orders with jobs and job steps and on input, auxiliary and output products including their storage location in one or more processing facilities.
  • User interface for monitoring and manual operation of prosEO (a graphical user interface [GUI] deployed together with the other brain components and a command-line interface [CLI] running locally on a user workstation as a stand-alone Java application; the CLI can be configured to connect to different control instances).
  • Production Planner: The central component for the processing orchestration, which analyses processing orders and generates jobs (e. g. one job per orbit for processing orders spanning multiple orbits, or one job per calendar day etc.) and job steps for all output and intermediate products which need to be produced and for which a configured processor is defined (a sketch of the time slicing follows this list). The Production Planner does not create an explicit network of job steps, but such a network can be deduced from the product dependencies within a job and can be visualized on the GUI.
  • Ingestor: Creates metadata information for input, auxiliary and output products from various sources and notifies the Production Planner of any newly ingested product, thereby allowing for an event-based starting of job steps as soon as the required input and/or auxiliary products are available. The Ingestor also interfaces with the prosEO Storage Manager for the storage of the actual product data, which is then fetched directly from the source by the Storage Manager for maximum efficiency (i. e. the high-volume product data is never routed through the Ingestor).
  • Other microservices dealing with user management, order management, configuration of missions and mission elements, and of processing facilities and external data providers;
  • The PRIP (described in detail in the “PRIP Interface” section below), which may or may not be considered a part of the control instance, depending on mission-specific deployment layout considerations.
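
As announced in the Production Planner bullet above, the time slicing can be illustrated with a small sketch; this shows the slicing concept only and is not prosEO's actual planner code:

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;

// Illustrative slicing of a processing order time frame into one job per calendar day.
public class OrderSlicer {
    record Slice(Instant start, Instant stop) {}

    static List<Slice> sliceByCalendarDay(Instant orderStart, Instant orderStop) {
        List<Slice> slices = new ArrayList<>();
        ZonedDateTime cursor = orderStart.atZone(ZoneOffset.UTC).truncatedTo(ChronoUnit.DAYS);
        while (cursor.toInstant().isBefore(orderStop)) {
            ZonedDateTime next = cursor.plusDays(1);
            // Clip the first and last slices to the order's time frame
            Instant sliceStart = cursor.toInstant().isBefore(orderStart) ? orderStart : cursor.toInstant();
            Instant sliceStop = next.toInstant().isAfter(orderStop) ? orderStop : next.toInstant();
            slices.add(new Slice(sliceStart, sliceStop));
            cursor = next;
        }
        return slices;   // the Planner would then create one job (with job steps) per slice
    }
}
```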

All sub-components of the control instance (except for the CLI) are deployed as Docker containers and co-located on one (virtual) machine, which may reside in a public cloud environment or in any other environment, where unimpeded access to and from the Internet is ensured (for communication between the components of the Production Service as well as for communication with the external entities via the interfaces listed below).

Processing Facility

The processing facility is the runtime environment for the data and quality control CFIs and for the prosEO Storage Manager, which controls the product data storage (POSIX file system or object storage). Multiple processing facilities can be used with any one control instance; however, one processing facility should not be tasked by more than one control instance (in theory this is possible, because there is no fixed relationship between these components, but it would violate the rule of separation of concerns). It is recommended to separate each processing facility from the Internet by a bastion host, and to control it via a single Kubernetes master host.

In order to make any specific processor CFI work with prosEO, it needs to fulfil the following criteria:

  1. Provided as a self-contained Docker image (except for static configuration data of significant size, which may be placed on a local file server in the processing facility and mounted into the container at a location pre-defined by prosEO),
  2. Fully controlled by a Job Order file (i. e. input files, output files and all processing parameters taken from the Job Order),
  3. Running as a one-off process with a single processing run, not as a background (daemon) service monitoring some pickup point or polling some external source.

Should one of the first two conditions not be met, a custom Docker container and/or a custom adapter script can be created to integrate the processor. The third condition must be met for the processor to run under full control of prosEO. Where the third condition is not met, the processor CFI can be run on a separate processing node outside of the control of prosEO and the output can be ingested into prosEO for further processing steps and for publishing on the PRIP.
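
For orientation, a Job Order file in the widely used ESA Generic IPF style might look like the following heavily simplified sketch; the exact schema is mission and processor specific, and all names and paths here are invented:

```xml
<!-- Simplified Job Order sketch (element names follow the common ESA Generic IPF
     convention; the exact schema is defined by the applicable mission/processor ICD). -->
<Ipf_Job_Order>
  <Ipf_Conf>
    <Processor_Name>L0Processor</Processor_Name>   <!-- hypothetical processor -->
    <Version>01.02.03</Version>
  </Ipf_Conf>
  <List_of_Ipf_Procs count="1">
    <Ipf_Proc>
      <Task_Name>L0Task</Task_Name>
      <Task_Version>01.02.03</Task_Version>
      <List_of_Inputs count="1">
        <Input>
          <File_Type>CADU</File_Type>
          <File_Name_Type>Physical</File_Name_Type>
          <List_of_File_Names count="1">
            <File_Name>/data/input/raw_channel_1.dat</File_Name>   <!-- illustrative path -->
          </List_of_File_Names>
        </Input>
      </List_of_Inputs>
      <List_of_Outputs count="1">
        <Output>
          <File_Type>L0_PRODUCT</File_Type>
          <File_Name_Type>Directory</File_Name_Type>
          <File_Name>/data/output</File_Name>
        </Output>
      </List_of_Outputs>
    </Ipf_Proc>
  </List_of_Ipf_Procs>
</Ipf_Job_Order>
```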

Once the above criteria are fulfilled, the Docker image is wrapped in a prosEO Docker image with a Java process (the Wrapper) acting as the interface between the Production Planner and the CFI Docker image. The Wrapper may be customized at two extension hooks, so that it may modify the Job Order file created by the Production Planner to suit the needs of the processor CFI (e. g. replacing explicitly named output files by output directories to cater for multiple output files), and may handle processor CFI output not in the main line of processing (e. g. engineering data, HKTM data or reports), create packed archives of the output data and the like.
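
The two extension hooks might be used as in the following sketch; all type and method names here are hypothetical stand-ins, not the actual prosEO wrapper API:

```java
import java.nio.file.Path;
import java.util.List;

// Illustrative sketch of the two wrapper extension hooks (hypothetical API).
interface JobOrder {
    JobOrder withOutputDirectory(Path dir);   // assumed convenience method
}

abstract class AbstractWrapper {
    // Hook 1: called after the Job Order has been received from the Production Planner
    protected JobOrder modifyJobOrder(JobOrder jobOrder) { return jobOrder; }
    // Hook 2: called after the processor CFI has terminated
    protected void handleAdditionalOutput(List<Path> producedFiles) {}
}

class MyProcessorWrapper extends AbstractWrapper {
    @Override
    protected JobOrder modifyJobOrder(JobOrder jobOrder) {
        // Replace explicitly named output files by a single output directory
        return jobOrder.withOutputDirectory(Path.of("/data/output"));
    }

    @Override
    protected void handleAdditionalOutput(List<Path> producedFiles) {
        // Pick up engineering/HKTM files not in the main line of processing
        producedFiles.stream()
            .filter(p -> p.getFileName().toString().endsWith(".eng"))
            .forEach(p -> System.out.println("Registering engineering product: " + p));
    }
}
```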

Interfaces

The Production Service will provide/use the following interfaces (derived from ESA's Copernicus Ground Segment Architecture):

  • Input:
    • Archive Interface Delivery Point (AIP): Retrieval of lower user-level input products for On-demand Production and Bulk Reprocessing (via the Query Product, Retrieve Product and Bulk Retrieve Product methods),
    • Auxiliary Interface Delivery Point (AUXIP): Retrieval of auxiliary data for all production modes,
    • POD Interface Point (PODIP): Retrieval of Precise Orbit Determination (POD) elements for Systematic Production and Reprocessing (stored as spacecraft-dependent orbit information in the prosEO Metadata Database),
    • X-Band Interface Delivery Point (XBIP) and EDRS Interface Delivery Point (EIP/EDIP): Retrieval of satellite raw data for Systematic Production,
    • Reference System: Retrieval of current versions for data and quality control processor CFIs,
    • Docker Repository: Retrieval of current versions for wrapped processor CFIs (see above) and for prosEO microservices.
  • Input/Output:
    • On-demand Production Interface Delivery Point (ODPRIP): Acceptance of workflow query requests and production orders, information about production order status.
  • Output:
    • Production Interface Delivery Point (PRIP): Querying of products available for download and download of product data.

For details on the AIP, AUXIP, XBIP, EIP/EDIP and (OD)PRIP interface definitions please refer to ESA's Copernicus Ground Segment Architecture and their respective ICDs, which for participants of Copernicus missions can be obtained from ESA. The Docker Repository may be a private Docker repository deployed in some DMZ, but it must be accessible to all control instances and processing facilities operated by the user.

PRIP Interface

The working method of the PRIP warrants a more detailed explanation, because its structure differs from the assumptions underlying the PRIP specification. As noted in the introductory paragraphs, the prosEO PRIP implementation is a shallow interface layer on top of the prosEO Ingestor and Storage Manager services. As such, it has no persistent storage of its own, but relies on the Ingestor service for the metadata storage and on the Storage Manager services at one or more processing facilities for the persistent storage of product data.

The figure below illustrates the communication sequence when downloading product data: After sending the download request to the PRIP API, the PRIP API client receives a redirect response (HTTP status 302) indicating the actual location of the resource. Since multiple processing facilities and hence multiple storage locations may be in use at any point in time, the PRIP API client cannot guess the correct location of the data. This information is first gathered from the metadata database and then converted into a redirection URL, which can be used by the client (usually automatically) to retrieve the data stream. The client must ensure that it uses the same authentication info for the data download as for the initial request, otherwise the download attempt will fail with an HTTP status of “401 UNAUTHORIZED”.

PRIP communication sequence
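
A client might implement this sequence as in the following sketch, taking care to re-send the same credentials on the redirected request (host name and product UUID are made up):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

// Illustrative PRIP download, handling the 302 redirect manually so that the
// Authorization header can be re-sent to the redirect target.
public class PripDownload {
    public static void main(String[] args) throws Exception {
        String auth = "Basic ...";   // same credentials for both requests (elided)
        HttpClient client = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NEVER)   // handle the redirect ourselves
            .build();

        HttpRequest initial = HttpRequest.newBuilder(URI.create(
                "https://prip.example.org/odata/v1/Products(00000000-0000-0000-0000-000000000000)/$value"))
            .header("Authorization", auth).GET().build();
        HttpResponse<Void> redirect = client.send(initial, HttpResponse.BodyHandlers.discarding());

        if (redirect.statusCode() == 302) {
            String location = redirect.headers().firstValue("Location").orElseThrow();
            // Re-send the SAME authentication info, otherwise the download fails with 401
            HttpRequest download = HttpRequest.newBuilder(URI.create(location))
                .header("Authorization", auth).GET().build();
            client.send(download, HttpResponse.BodyHandlers.ofFile(Path.of("product.zip")));
        }
    }
}
```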

Users, Roles and Authentication

Independent of any deployment considerations of the prosEO operator, the following security mechanisms are provided in the prosEO code:

  • All requests to the functionality of the control instance require authentication, where usernames and BCrypt password hashes are stored in the metadata database.
  • User privileges are granted per mission, making prosEO a secure multi-tenant system.
  • User privileges follow a role-based access model with users and user groups, and fine-grained individual privileges assignable to user groups or to users individually (although the latter is discouraged to ensure maintainability) – example user groups are:
    • Production Operator: Monitor real-time production status, create, plan and release orders (but not “approve” - see “Moderator”), analyse job step logs, resume failed job steps and so on;
    • Archive Operator: Monitor archive usage, ingest products, delete product files from processing facility, delete product metadata;
    • Engineer: Configure missions, product classes, selection rules, processor classes, versions and configurations, processing facilities etc., low-level failure analysis (e. g. access to system logs);
    • Service Manager: Monitor real-time production status, query event history, monitor KPIs;
    • Moderator: Approve orders (during Reprocessing campaign; this could be an external entity, e. g. the customer);
    • External Entity: Product query, product download (only via PRIP, no GUI or CLI access);
    • User Manager: Manage users, groups and privileges;
    • “Root” account: Create and delete missions, create a user manager account for a mission (no other privileges should be given to this account).

Available authentication methods are HTTP Basic Authentication (implemented) and OAuth2 token authentication (planned).
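
As the prosEO microservices are Java-based, such BCrypt hashes can be produced e. g. with Spring Security's BCryptPasswordEncoder (requires the spring-security-crypto library); the following sketch illustrates the principle, not necessarily prosEO's actual user management code path:

```java
import org.springframework.security.crypto.bcrypt.BCryptPasswordEncoder;

// Illustrative BCrypt handling for stored password hashes.
public class PasswordHashDemo {
    public static void main(String[] args) {
        BCryptPasswordEncoder encoder = new BCryptPasswordEncoder();

        // Store only the hash in the metadata database, never the plain password
        String storedHash = encoder.encode("s3cret-password");
        System.out.println("Stored hash: " + storedHash);

        // On login, compare the presented password against the stored hash
        System.out.println("Matches: " + encoder.matches("s3cret-password", storedHash));
    }
}
```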

Monitoring and Reporting

Systems monitoring for prosEO can be performed on three levels:

  • Application level: A dashboard in the prosEO GUI provides information on the real-time production status for orders, jobs and job steps with the possibility to drill down to a detailed view of each order, job and job step including the logs of running and completed job steps (logs are refreshed in a configurable time interval).
  • Kubernetes/Docker level: Due to the runtime environments of Docker and Kubernetes, the monitoring tools available for these environments can be used to monitor the current status of the control instance (Docker only) and the processing facilities (Kubernetes). Selected information from Kubernetes is also displayed on the prosEO application dashboard (planned).
  • System level: Apart from the regular system logging, which should be enabled on all nodes per default, an external supervision of all nodes through the management environments of their respective cloud service providers and real-time monitoring of the service availability using a systems management tool are highly recommended.

Additionally, execution statistics can be gathered from the prosEO Metadata Database, which collects runtime statistics for all job steps and information on all products (input, auxiliary and output) known to the prosEO system. Selected performance statistics are presented on the prosEO application dashboard (planned).
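
Such statistics could be extracted directly from the PostgreSQL metadata database, e. g. as in the following sketch; the table and column names are hypothetical, so the query must be adapted to the actual prosEO schema:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Illustrative extraction of job step runtime statistics (hypothetical schema).
public class JobStepStatistics {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:postgresql://proseo-db:5432/proseo";   // assumed connection URL
        try (Connection conn = DriverManager.getConnection(url, "proseo", "secret");
             PreparedStatement stmt = conn.prepareStatement(
                 "SELECT output_product_class, "
                 + "AVG(EXTRACT(EPOCH FROM (completion_time - start_time))) AS avg_runtime_s, "
                 + "COUNT(*) AS job_steps "
                 + "FROM job_step WHERE job_step_state = 'COMPLETED' "
                 + "GROUP BY output_product_class ORDER BY avg_runtime_s DESC");
             ResultSet rs = stmt.executeQuery()) {
            while (rs.next()) {
                System.out.printf("%-30s %10.1f s %8d steps%n",
                    rs.getString("output_product_class"),
                    rs.getDouble("avg_runtime_s"), rs.getLong("job_steps"));
            }
        }
    }
}
```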