Interfaces - multiply-org/multiply-core GitHub Wiki

Within the MULTIPLY platform, there are several types of interfaces. Here, we focus on two of them: Command Line Interfaces (CLIs) and Application Programming Interfaces (APIs). First, both types are explained in general; afterwards, the interfaces of the individual components are listed.

Command Line Interfaces

Command Line Interfaces (CLIs) shall be provided by all components that execute processes. When called, a component executes its specific process. The components that shall be callable from a command line are (for now):

Data Access Component
Coarse Res Pre-processing
High Res Pre-processing
SAR Pre-processing
Emulation framework (when creating an emulator)
Inference Engine

The results of command line calls are files written to a hard drive. Allowed file formats are NetCDF for EO data, YAML for configuration data, or the native format of EO data retrieved from the Data Access Component. Components that are called from a CLI operate independently, as they are completely separated from other parts of the platform. Taken to its extreme, this means that such components can be implemented in whatever language their developers choose, without regard to other modules.

Command line interfaces are the only interfaces that the orchestrator will use. An example of how to create a command line interface from Python is given at https://github.com/multiply-org/multiply-core/tree/master/cli_example .
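
Since the orchestrator only talks to components through their CLIs, a call boils down to assembling an argument list and handing it to the operating system. The following is a minimal sketch of that idea, not the orchestrator's actual code: the helper `build_command` and the option values are hypothetical, and only string and list-valued options are handled.

```python
import shlex
from typing import Dict, List, Union

OptionValue = Union[str, List[str]]


def build_command(subcommand: str, options: Dict[str, OptionValue]) -> List[str]:
    """Build the argument list for a call of the form `multiply <subcommand> --key value`."""
    command = ["multiply", subcommand]
    for name, value in options.items():
        command.append(f"--{name}")
        if isinstance(value, list):
            # Assumption: list options are passed as one flag followed by all values.
            command.extend(value)
        else:
            command.append(value)
    return command


# Hypothetical option values, for illustration only.
command = build_command(
    "get_data_urls",
    {"config": "config.yaml", "working_dir": "work"},
)
print(shlex.join(command))
# An orchestrator could then execute it with subprocess.run(command, check=True).
```

Passing the command as a list rather than a single string avoids shell quoting problems, for example when a WKT string containing spaces is given as `roi`.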

Application Programming Interfaces

For some components it does not make sense to call them (solely) via a CLI.

These components will offer programming interfaces that can be called from other components. Objects of classes from one component may also be used within another component (I expect this to happen for priors, which will be used within the inference engine).

As these components communicate with each other directly, they must be implemented in the same programming language. This language will be Python, as agreed on by the members of the MULTIPLY team. Components that communicate with each other through programming interfaces are:

Prior Engine
Emulation Framework
Inference Engine

Component Interfaces

In this section we list CLI and API definitions. These are opened here for discussion. They shall ultimately be implemented and used by other components.

Data Access Component

Command Line Interface

multiply get_data_urls --config CONFIG --roi ROI --start_time START_TIME --end_time END_TIME 
    --data_types DATA_TYPES --download --working_dir WORKING_DIR 
    --download_dir DOWNLOAD_DIR
    Retrieves a list of urls for the data that intersect with the roi, lie between 
    start_time and end_time and are of any of the specified data_types.
    - config: A config file in YAML format. This file will expect the parameters 
    also listed below:
    - roi: The spatial extent of the region of interest, given as wkt string. EO products 
    that intersect this zone will be included. If not given, it is assumed that the whole 
    globe is the region of interest.
    - start_time: The earliest time for which EO data is requested. Given in ISO format. 
    Must not be later than end_time.
    - end_time: The latest time for which EO data is requested. Given in ISO format, 
    must not be earlier than start_time. Optional.
    - data_types: List of string values. The types of EO data that shall actually be retrieved. 
    If not given, no data will be retrieved.
    - download: Whether the data shall be downloaded immediately when it is not already 
    provided. True by default.
    - working_dir: Directory to which the list with the urls shall be written.
    - download_dir: Directory to which data shall be downloaded. Mandatory when download 
    is true. NOTE: This might later be replaced, so users might enter 
    a specific data store here.
    Returns: A yaml file consisting of a list of urls. These urls will either point to 
    local or remote data. The file will also contain additional information about the 
    temporal and spatial extent of the EO data.
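
The option list above maps fairly directly onto Python's `argparse`. The following is a sketch under stated assumptions rather than the component's real parser: it treats `download` as an on/off switch (`--download` / `--no-download`, which needs Python 3.9 or later), reads `data_types` as a space-separated list, and uses placeholder data type names `TYPE_A` and `TYPE_B`.

```python
import argparse
from typing import List, Optional


def create_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the get_data_urls options listed above."""
    parser = argparse.ArgumentParser(
        prog="multiply get_data_urls",
        description="Retrieve urls of EO data matching a region, period and data types.",
    )
    parser.add_argument("--config", help="Config file in YAML format.")
    parser.add_argument("--roi", help="Region of interest as WKT string.")
    parser.add_argument("--start_time", help="Earliest requested time, ISO format.")
    parser.add_argument("--end_time", help="Latest requested time, ISO format.")
    parser.add_argument("--data_types", nargs="*", default=[],
                        help="EO data types to retrieve.")
    # Assumption: download is a switch that is on by default.
    parser.add_argument("--download", action=argparse.BooleanOptionalAction,
                        default=True, help="Download data not available locally.")
    parser.add_argument("--working_dir", help="Directory for the list of urls.")
    parser.add_argument("--download_dir", help="Directory for downloaded data.")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv and enforce that download_dir is given when download is on."""
    parser = create_parser()
    args = parser.parse_args(argv)
    if args.download and args.download_dir is None:
        parser.error("--download_dir is mandatory when download is enabled")
    return args


args = parse_args([
    "--roi", "POINT (10 50)",
    "--data_types", "TYPE_A", "TYPE_B",
    "--no-download",
    "--working_dir", "work",
])
print(args.data_types, args.download)
```

How values from the YAML config file and values given on the command line are combined is not specified here, so the sketch leaves `--config` unread.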

Python Interface

@classmethod
def get_data_urls(cls, site: str, start_time: str, end_time: str, data_types: list, 
    download: bool, working_dir: str, download_dir: str) -> list:
    """
    Retrieves a list of urls for the data that intersect with the roi, lie between 
    start_time and end_time and are of any of the specified data_types.
    :param site: The spatial extent of the region of interest, given as wkt string. 
    EO products that intersect this zone will be included. If not given, it is assumed that 
    the whole globe is the region of interest.
    :param start_time: The earliest time for which EO data is requested. Given in ISO format. 
    Must not be later than end_time.
    :param end_time: The latest time for which EO data is requested. Given in ISO format, 
    must not be earlier than start_time. Optional.
    :param data_types: List of string values. The types of EO data that shall actually 
    be retrieved. If not given, no data will be retrieved.
    :param download: If true, data that is not locally available will be downloaded to 
    download_dir. True by default.
    :param working_dir: Directory to which the list with the urls shall be written.
    :param download_dir: Directory to which data shall be downloaded. Mandatory when download 
    is true. NOTE: This might later be replaced, so users might enter a specific data store here. 
    :return: a list of urls to files of the requested data types.
    """

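Both the CLI and the Python interface require that start_time is not later than end_time, with end_time being optional. A check along these lines could run before any data is requested. This is only a sketch: the helper name `is_valid_time_range` is hypothetical, and it assumes both values use the same ISO form (for example both plain dates), since `datetime.fromisoformat` cannot compare time-zone-aware with naive values.

```python
from datetime import datetime
from typing import Optional


def is_valid_time_range(start_time: str, end_time: Optional[str] = None) -> bool:
    """Return True if start_time is not later than end_time (or end_time is absent)."""
    start = datetime.fromisoformat(start_time)
    if end_time is None:
        # end_time is optional, so an open-ended range is accepted.
        return True
    end = datetime.fromisoformat(end_time)
    return start <= end


print(is_valid_time_range("2017-06-01", "2017-06-30"))
print(is_valid_time_range("2017-07-01", "2017-06-30"))
```
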
Prior Engine

Python Interface

def get_mean_state_vector(parameters: List[str], state_mask: List[bool], time: str, 
    logger: ?, file_lcc_biome=?, file_prior=?) -> Tuple[List[float], List[List[float]]]:
    """Retrieves a state vector and an inverse covariance matrix
    :param parameters: A list of parameters for which priors need to be available (those 
    will be inferred).
    :param state_mask: A georeferenced array that represents the space where solutions 
    will be calculated. Its spatial resolution should equal that of the highest-resolution 
    observation. True values in this array represent pixels where the inference will be 
    carried out, False values represent pixels for which no priors need to be defined 
    (as those will not be used in the inference).
    :param time: The time for which the prior needs to be derived
    :param logger: A logger or "traceability database"
    :param file_lcc_biome: ?
    :param file_prior: ?
    :return: The updated mean state vector and the associated inverse covariance matrix.
    """

Edit:

The Prior Engine is called by the orchestrator with the method get_mean_state_vector.

def get_mean_state_vector(date: str, variables: list[str]) -> dict:
    """
    :param date: The date (time?) for which the prior needs to be derived
    :param variables: A list of variables (sm, lai, roughness, ..) for which priors need to be available
    :return: dictionary with keys being the variables and values being tuples of filenames and bands
    """

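To make the shape of the returned dictionary concrete, here is a small sketch of how a caller might inspect it. It assumes that each value is a single `(filename, band)` tuple with an integer band index, which is one possible reading of the docstring above; the file names and the helper `missing_variables` are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumption: one (filename, band index) tuple per variable.
PriorFiles = Dict[str, Tuple[str, int]]


def missing_variables(requested: List[str], priors: PriorFiles) -> List[str]:
    """Return the requested variables for which no prior was provided."""
    return [variable for variable in requested if variable not in priors]


# Hypothetical return value of get_mean_state_vector, for illustration only.
priors: PriorFiles = {
    "sm": ("prior_sm.vrt", 1),
    "lai": ("prior_lai.vrt", 1),
}

print(missing_variables(["sm", "lai", "roughness"], priors))
```
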
Inference Engine

Command Line Interface

multiply infer --config CONFIG --roi ROI --start_time START_TIME --end_time END_TIME 
    --spatial_res SPATIAL_RES --time_interval TIME_INTERVAL --observations OBSERVATIONS 
    --a A --inflation INFLATION --output_dir OUTPUT_DIR --parameters PARAMETERS 
    Performs inference on a set of observations, taking into account the spatial and
    temporal constraints.
    - config: A config file in YAML format. This file will expect the parameters 
    also listed below:
    - roi: The spatial extent of the region of interest, given as wkt string. EO products 
    that intersect this zone will be included. If not given, it is assumed that the whole 
    globe is the region of interest.
    - start_time: The earliest time for which EO data is requested. Given in ISO format. 
    Must not be later than end_time.
    - end_time: The latest time for which EO data is requested. Given in ISO format, 
    must not be earlier than start_time. Optional.
    - spatial_res: The spatial resolution of the output grid. Specified in m. Mandatory.
    - time_interval: The interval between start_time and end_time for which inferences
    shall be made. Mandatory.
    - observations: The observations on which to operate. These are outputs of the high res
    and SAR pre-processing branches. Mandatory.
    - a: (what is this?). Optional, default is identity matrix.
    - inflation: (What is this?) Optional, default is 1e-3.
    - output_dir: The directory to which the inference result shall be written. Mandatory.
    - parameters: The names of the biophysical parameters that shall be derived. Mandatory.
    Returns: One (or more?) NetCDF file(s) that contain(s) the requested biophysical 
    parameters for the stated roi and temporal selection

SAR Pre-Processing

Command Line Interface

multiply sar_pre_processing --config CONFIG --input_folder INPUT_FOLDER --output_folder OUTPUT_FOLDER 
    --gpt_path GPT_PATH --xml_graph_path XML_GRAPH_PATH --file_list FILE_LIST
    Processes all files listed in file_list. If no file_list is specified, all files in input_folder will be processed.
    - config: A config file in YAML format. This file will expect the parameters also listed below:
    - input_folder: Path to the folder where downloaded files are stored (the downloaded files should be in .zip format, 
    the standard format when downloading from Sentinel Data Hub). Mandatory.
    - output_folder: Path to the folder where preprocessed files are saved (three directories will be created within the 
    output_folder). Mandatory.
    - gpt_path: Path to the location of SNAP's graph-processing-tool. If not given, no preprocessing is carried out at all.
    - xml_graph_path: path to location with xml-graphs for preprocessing. Mandatory.
    - normalisation_angle. Optional.
    - area_of_interest: For subsetting and therefore reduction of processing time. Optional.
    Returns: output_folder with all processed SAR files in NetCDF4 format.
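
The rule for choosing input files (use file_list if given, otherwise everything in input_folder) can be sketched as follows. This is an illustration, not the component's implementation: the helper `select_input_files` is hypothetical, file_list entries are assumed to be file names relative to input_folder, and the fallback only picks up .zip files because that is the expected download format.

```python
import tempfile
from pathlib import Path
from typing import List, Optional


def select_input_files(input_folder: str, file_list: Optional[List[str]] = None) -> List[Path]:
    """Return the files to process: those in file_list, else all .zip files in input_folder."""
    folder = Path(input_folder)
    if file_list:
        # Assumption: entries are file names relative to input_folder.
        return [folder / name for name in file_list]
    return sorted(folder.glob("*.zip"))


# Demonstration with empty placeholder files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.zip", "a.zip", "notes.txt"):
        (Path(tmp) / name).touch()
    all_names = [path.name for path in select_input_files(tmp)]
    listed_names = [path.name for path in select_input_files(tmp, ["b.zip"])]

print(all_names)
print(listed_names)
```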