ESO Data Processing System Notes - casangi/RADPS GitHub Wiki
The ESO Data Processing System (EDPS) is a Python framework for running ESO's data processing pipelines.
- Freudling, Zampieri, Coccato, et al., “Adaptive data reduction workflows for astronomy: The ESO Data Processing System (EDPS)”, 2024, A&A, 681, A93
- EDPS workflow design tutorial (a more practical guide to creating an EDPS workflow)
EDPS recipe - A description of a processing step (including algorithms, parameters, and methods) that can be executed independently. Each recipe specifies its required (main and associated) inputs and its outputs.
EDPS task - A specific instance of an executing recipe.
EDPS job = a Prefect task plus its inputs - roughly analogous to a Prefect “run”, but without the execution and associated information (?)
Automatically adapts to different use cases for data reduction (QA, production of science products) and automatically derives workflows for them.
Automatically derives processing workflows for different use cases from a single specification of a cascade of processing steps (advantage: no need to write and maintain a set of static workflows that must be modified whenever observing strategies, pipelines, or calibration plans change). Which steps are run and what is processed depends on the target selection.
Running different ‘recipes’ based on different circumstances is something the PL team has expressed would be useful.
‘Smart re-runs’: if the same task is executed with the same set of parameters and input files, EDPS skips the processing and reuses the previously saved result. Prefect can handle this with result caching.
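The smart re-run idea can be sketched in plain Python: hash the recipe name, parameters, and input-file identities into a cache key, and skip execution on a hit. (Prefect supports this pattern natively via a task's `cache_key_fn`.) The function and variable names below are illustrative, not the EDPS or Prefect implementation.

```python
import hashlib
import json

_result_cache = {}  # cache key -> previously saved result

def cache_key(recipe, params, input_files):
    """Build a deterministic key from the recipe name, parameters,
    and input files (here a dict of filename -> checksum)."""
    payload = json.dumps(
        {"recipe": recipe, "params": params, "inputs": input_files},
        sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def run_task(recipe, params, input_files, execute):
    """Run `execute` only if this exact (recipe, params, inputs)
    combination has not been processed before."""
    key = cache_key(recipe, params, input_files)
    if key in _result_cache:
        return _result_cache[key]  # smart re-run: reuse saved result
    result = execute(recipe, params, input_files)
    _result_cache[key] = result
    return result
```

Any change to the parameters or to an input file's checksum produces a new key and forces a genuine re-run.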
Automatically wait until all needed inputs for downstream tasks are present before executing them.
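Waiting for inputs amounts to a readiness check: a task fires only once every required input is present, while optional inputs (minimum cardinality zero) never block. A minimal sketch with hypothetical names:

```python
def ready(task_inputs, available_files):
    """A task is ready when every input has at least its minimum
    required number of matching files; optional inputs (min 0)
    never block execution."""
    return all(
        len(available_files.get(name, [])) >= minimum
        for name, minimum in task_inputs.items()
    )

# Hypothetical science task: sky frames are optional (minimum 0).
science_inputs = {"OBJECT": 1, "BIAS": 1, "FLAT": 1, "SKY": 0}
```

With these inputs, the task becomes ready as soon as one OBJECT, one BIAS, and one FLAT file exist, with or without SKY frames.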
Their tests cover:
- The grouping and data association
- Structure of the processing cascades

The goal is to verify that the generated workflow, the tasks triggered, and their inputs are as expected. No actual recipes are run.
In interactive mode, the tasks are executed in an order that is easily understood by the user and optimized for interaction at the necessary steps. The order that is most easily understood by the user (to understand the consequences of their interactions) is often not the most efficient for computing resources.
The EDPS server gets requests via a REST API. Request info includes: data location, processing cascade spec, a target, and workflow parameters if needed. EDPS derives and executes the data processing workflow, in a sense that seems broader than the way Prefect does.
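The request body can be pictured as JSON carrying the four pieces of information listed. The field names and values below are guesses for illustration, not the documented EDPS API:

```python
import json

# Hypothetical request body; field names are illustrative only.
request = {
    "data_location": "/data/raw/2024-01-15",   # where the input files live
    "workflow": "demo0",                        # processing cascade specification
    "target": "science",                        # e.g. reduce up to science products
    "parameters": {"bias.method": "median"},    # optional workflow parameters
}
body = json.dumps(request)
```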
It consists of:
- A main workflow file with the list of tasks (e.g. `instrumentname_wkf.py`)
- A datasource file containing the list of the inputs of the various tasks in the workflow (e.g. `instrumentname_datasources.py`)
- A file with the classification statements (e.g. `instrumentname_classification.py`), containing `classification_rule` objects; the classification rules may be in a separate file
- A file with rules: functions that allow classification and association of files (`instrumentname_rules.py`)
- A file with the definitions of (FITS?) header keywords (`instrumentname_keywords.py`)
- A YAML file with workflow and task parameters (`instrumentname_parameters.yaml`)
- A file with subworkflows
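Classification assigns each file a type by matching header keywords against rules. A plain-Python sketch of the idea (the real EDPS `classification_rule` objects have their own API; the rule names and keywords here are hypothetical):

```python
def classify(header, rules):
    """Return the name of the first rule whose keyword
    constraints all match the file's header, else None."""
    for name, constraints in rules.items():
        if all(header.get(key) == value for key, value in constraints.items()):
            return name
    return None

# Hypothetical rules keyed on FITS-style header keywords.
rules = {
    "BIAS": {"dpr.type": "BIAS"},
    "FLAT": {"dpr.type": "FLAT"},
    "OBJECT": {"dpr.type": "OBJECT"},
}
```

The classified type is what the datasources below (`BIAS`, `FLAT`, `OBJECT`, `SKY`) select on.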
Main workflow (`demo0_wkf.py`):

```python
from edps import task
from .demo0_datasources import *

# --- Processing tasks --------------------------------------------------------

# - Task for processing raw biases
bias_task = (task('bias')
             .with_recipe('run_bias')
             .with_main_input(raw_bias)
             .build())

# - Task for processing raw flats
flat_task = (task('flat')
             .with_recipe('run_flat')
             .with_main_input(raw_flat)
             .with_associated_input(bias_task)
             .build())

# - Task for processing science exposures
science_task = (task('object')
                .with_recipe('run_science')
                .with_main_input(raw_science)
                .with_associated_input(raw_sky, min_ret=0)  # sky is an optional input
                .with_associated_input(bias_task)
                .with_associated_input(flat_task)
                .with_associated_input(static_catalog)
                .build())
```
Datasources (`demo0_datasources.py`):

```python
from edps import data_source

# --- Raw types datasources ---------------------------------------------------
raw_bias = (data_source('BIAS')
            .build())
raw_flat = (data_source('FLAT')
            .build())
raw_science = (data_source('OBJECT')
               .build())
raw_sky = (data_source('SKY')
           .build())

# Catalogue of standard stars
static_catalog = (data_source('catalog')
                  .build())
```
The paper raised the following questions:
- What are the requirements for managing workflows? What is the workflow lifecycle for our use cases and for “custom” workflows? How do we build, refactor, and reuse workflows? (There are currently 33 PL recipes.)
- For each of the use cases^*, what information is needed for each stage, where does the information come from, and when is it available? How do the answers to these questions affect workflow, stage, and domain library design (including interfaces/contracts between elements)? What does this information tell us about stage sequencing?
- Context design is an open question but using a service to make required information available is attractive.
The paper reinforces elements of the current design:
- Ensure domain functions are not coupled to infrastructure.
- We want a library of domain functions.
- We want a web api to launch processing.
- Cyclic graphs for processing and Algorithm Architecture are sufficient fundamental concepts.
Other issues raised:
- Brian Kirk is investigating how to apply ML for optimal task (stage?) invocation sequencing.
- If we were able to auto-sequence stages, we would not need a large set of recipes.
- The more stage information available up front, the fewer conditionals (lower complexity) in the codebase.
- All workflows will need to be defined to the level of detail provided in RADPS Memo 6 “Example RADPS Workflow Decomposition”.
^*Background: use cases currently being developed: Standard Mode Data Reduction (automated), Interactive Workflow (human-assisted), Calibration, Commissioning and New Modes, Operations from Array Operator Perspective, Operations from Software Operations Perspective, Triggered Observations (target of opportunity).