Redesign of Observations - multiply-org/multiply-core GitHub Wiki

Observations is a collective term for several auxiliary classes that, at the time of writing, are used by the inference engine. All relevant code is located in https://github.com/multiply-org/KaFKA-InferenceEngine . These classes might also be used by the coarse and/or high resolution pre-processing, so it makes sense to relocate their functionality to multiply-core (and refactor while we're at it).

The purpose of the Observations classes is to encapsulate access to the required bands of the products for the time steps within a given time period. The data can be actual observation data or uncertainty data; the latter might be given in the form of an uncertainty matrix. A description of the current state is given here: https://github.com/multiply-org/KaFKA-InferenceEngine/wiki#the-observations-class . The class returns values resampled/reprojected to the user-defined output grid, omitting pixels that have been masked out by a state mask. In some versions, this class also holds an emulator.

The Observations class is expected to provide the following interface:

def get_band_data(self, the_date: int, band_no: int)

Here, the_date is not an actual date given as a datetime object or as a UTC string, but the index of one of the products represented by the observations file. These products therefore need to be sorted by time. band_no refers to the band index of the product at hand. The number of bands is given by bands_per_observation, an attribute of the Observations object that holds an array with one entry per input product. That attribute therefore needs to be set, too.
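The contract described above can be illustrated with a minimal sketch. Everything except get_band_data, the_date, band_no and bands_per_observation is an assumption made for illustration: the class name InMemoryObservations, the nested-list input, and the plain list of floats as return type are hypothetical and do not reflect what the KaFKA classes actually accept or return.

```python
from typing import List, Sequence


class InMemoryObservations:
    """Hypothetical stand-in that mimics the interface described above."""

    def __init__(self, products: Sequence[Sequence[List[float]]]):
        # products[i][j] holds the pixel values of band j of product i.
        # Assumption: the caller has already sorted the products by time.
        self._products = products
        # One entry per input product: the number of bands it offers.
        self.bands_per_observation = [len(bands) for bands in products]

    def get_band_data(self, the_date: int, band_no: int) -> List[float]:
        # the_date is an index into the time-sorted products, not a date.
        if not 0 <= the_date < len(self._products):
            raise IndexError(f"no product at index {the_date}")
        if not 0 <= band_no < self.bands_per_observation[the_date]:
            raise IndexError(f"product {the_date} has no band {band_no}")
        return self._products[the_date][band_no]


observations = InMemoryObservations([
    [[0.1, 0.2], [0.3, 0.4]],              # product 0: two bands
    [[0.5, 0.6], [0.7, 0.8], [0.9, 1.0]],  # product 1: three bands
])
third_band = observations.get_band_data(the_date=1, band_no=2)
```

A caller can consult bands_per_observation before asking for a band, which is why that attribute has to be populated when the object is created.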

Note: The input to the Observations class is not defined. It is therefore possible to hand in EO data in the native grid and perform the reprojection on the fly. We could, however, also reproject the data beforehand. This choice currently does not seem to be crucial.

Questions:

  • What to do when data is coming from more than one sensor? Should an Observations object grant access to data from more than one sensor?
  • Why is the emulator often part of the Observations class?
  • Where is it decided which bands Kafka requires? <- I guess this should be decided by the forward operator / emulator.

Suggested improvement:

  • Keep the emulator out of this. Find out why it is in here in the first place.
  • The Observations object shall provide a list of the bands it can grant access to (this can be taken directly from the input products). Kafka should then ask for names rather than integer indices. As a compromise, the indices might point to a string array containing the names.
  • Get rid of the dependency on the brdf-descriptors repo.
  • Define an interface for Observations and make all observation classes adhere to it. Do not allow direct access to variables; all communication shall be handled through the interface.
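One way the suggestions above could fit together is sketched below. It is not a settled design. The names ObservationsInterface, DictObservations, get_band_names and get_band_data_by_name are hypothetical, as are the band names "B02" and "B03" and the list-of-floats return type. The sketch shows an abstract interface that exposes band names, keeps its state private, and supports the index-to-name compromise.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ObservationsInterface(ABC):
    """Hypothetical interface; all method names here are assumptions."""

    @abstractmethod
    def get_band_names(self, date_index: int) -> List[str]:
        """Return the names of the bands available for one product."""

    @abstractmethod
    def get_band_data_by_name(self, date_index: int, band_name: str) -> List[float]:
        """Return the data of one named band of one product."""

    def get_band_data(self, date_index: int, band_no: int) -> List[float]:
        # Compromise: an integer index is resolved via the list of names.
        band_name = self.get_band_names(date_index)[band_no]
        return self.get_band_data_by_name(date_index, band_name)


class DictObservations(ObservationsInterface):
    """Toy implementation backed by one dict per time-sorted product."""

    def __init__(self, products: List[Dict[str, List[float]]]):
        # Kept private: callers are meant to go through the methods only.
        self._products = products

    def get_band_names(self, date_index: int) -> List[str]:
        return list(self._products[date_index])

    def get_band_data_by_name(self, date_index: int, band_name: str) -> List[float]:
        return self._products[date_index][band_name]


demo = DictObservations([{"B02": [0.1, 0.2], "B03": [0.3, 0.4]}])
by_name = demo.get_band_data_by_name(0, "B03")
by_index = demo.get_band_data(0, 1)
```

Because the base class is abstract, a class that omits one of the two abstract methods cannot be instantiated, which is one way of making all observation classes adhere to the same interface.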