Dev Tutorial : Extending distpy with proprietary commands - Schlumberger/distpy GitHub Wiki

Writing extensions to distpy whilst protecting intellectual property

In creating distpy as a permissive Open Source project, we also wanted a way to continue extending it with proprietary algorithms and to allow the development of proprietary signal processing flows.

The second of these is enabled by the capture of the signal processing flows in JSON format externally to the licensed code.

This tutorial covers the first aspect, the extension of the distpy command set with custom commands. We will cover the mechanism, and then provide a preferred approach which allows proprietary commands to later become submissions to the Open Source base.

Using extended command sets in distpy

When calling the core command controller, you can optionally pass a list of Python module names:

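import distpy.controllers.parallel_strainrate_processing

# Module names of the extra command sets; later entries take precedence
extended_command_sets = []
extended_command_sets.append('distpy_3rd.calc.thirdparty_command_set')
extended_command_sets.append('distpy_mine.calc.my_command_set')
# configStrainrate2Summary is your processing configuration, defined elsewhere
distpy.controllers.parallel_strainrate_processing.main(configStrainrate2Summary, extended_list=extended_command_sets)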

This shows an example where a set of commands from a third party and some extra commands I have coded myself are appended to the command set.

The first point to note is that each supplied string is the module name of a command set. The template for all command set modules is pub_command_set.py, which houses the publicly available commands.

Writing extended command sets

An extended command set module contains the following aspects:

  1. A class for each command derived from the distpy BasicCommand
  2. A function KnownCommands that extends a python dictionary of BasicCommand objects with the classes in the module

The pattern for extending commands is shown here, with an example that implements our own version of AbsCommand:

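import numpy
from distpy.calc.pub_command_set import BasicCommand

class AbsCommand(BasicCommand):
    def __init__(self, command, jsonArgs):
        # jsonArgs holds this command's parameters; abs needs none
        super().__init__(command)

    def execute(self):
        # take the absolute value of the preceding command's output
        self._result = numpy.abs(self._previous.result())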

This is the minimum required implementation; there is no requirement to implement the automatic documentation. Any parameter configuration your command has is provided in jsonArgs. The execute(self) method is where your calculation algorithm resides. The self._previous.result() call returns a numpy array containing the results of the preceding command. Placing your results in self._result makes them available to the next layer of commands.

The associated 'KnownCommands()' function is:

def KnownCommands(knownList):
    knownList['abs'] = AbsCommand
    return knownList
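The same pattern carries over to a command that reads its configuration from jsonArgs. The following is a minimal sketch, not distpy code: so that it runs without distpy installed, it defines a small stand-in for BasicCommand that models only the members used in this tutorial. The ScaleByCommand class, the 'scale_by' command name, the 'factor' key and the DataCommand helper are all hypothetical names chosen for illustration.

```python
import numpy


# Stand-in for distpy.calc.pub_command_set.BasicCommand (an assumption: only
# the members used in this tutorial are modelled; the real class has more).
class BasicCommand:
    def __init__(self, command, jsonArgs=None):
        self._previous = command
        self._result = None

    def result(self):
        return self._result

    def execute(self):
        pass


class DataCommand(BasicCommand):
    """Hypothetical source command that simply holds an array."""
    def __init__(self, data):
        super().__init__(None)
        self._result = numpy.asarray(data)


class ScaleByCommand(BasicCommand):
    """Hypothetical command: multiply the previous result by a factor."""
    def __init__(self, command, jsonArgs):
        super().__init__(command)
        # 'factor' is an illustrative key name, not a documented distpy key
        self._factor = jsonArgs.get('factor', 1.0)

    def execute(self):
        self._result = self._previous.result() * self._factor


def KnownCommands(knownList):
    knownList['scale_by'] = ScaleByCommand
    return knownList


# Build a two-step chain by hand: source data, then the scaling command.
source = DataCommand([1.0, 2.0, 3.0])
known = KnownCommands({})
scaled = known['scale_by'](source, {'factor': 2.0})
scaled.execute()
print(scaled.result())
```

In a real extension module you would import BasicCommand from distpy.calc.pub_command_set instead of defining the stand-in, and distpy would build the chain from the JSON flow rather than by hand.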

The preferred approach

In the examples above the module for my commands was distpy_mine.calc.my_command_set, which mirrors the location of the command set module in the distpy package, which is distpy.calc.pub_command_set. We encourage you to create extensions to the ingesters and command sets by mirroring the structure of the distpy package, so that you can then defer deciding which parts of your code are shareable with the wider community. You can then develop in a "proprietary mode" where all your code is in-house, and later decide which commands are sufficiently generic and useful to contribute to the main distpy project.

An explanation of the design choice

The list of string-based names of command set modules, which appears as an optional entry in the top-level distpy.controllers.parallel_strainrate_processing.main() function, coupled with the KnownCommands() function in each command set module, was chosen so that the massively scalable asynchronous processing feature of distpy is preserved. The loading of the module is deferred to the creation of the asynchronous calculation on a single chunk of data. You can see how this is done in the CommandFactory():

def CommandFactory(commandList, commandJson, extended_list=[]):
    # currently we are just extending our own list...
    knownList = {}
    # Add your own command sets below. The order indicates
    # precedence. For example my_command_set is overwriting
    # commands  in the public command set.
    knownList = pub_command_set.KnownCommands(knownList)
    # plotting subset
    knownList = plt_command_set.KnownCommands(knownList)
    # - load any supplied extensions
    for module_name in extended_list:
        exec('import '+module_name)
        exec('knownList = '+module_name+'.KnownCommands(knownList)')

Each supplied command set is parsed in order, starting from distpy.calc.pub_command_set. Each successive KnownCommands() call can overwrite earlier definitions of commands with the same name. This means that if we have implemented a far superior version of abs() in our numerical library, our command is used in place of the original distpy command.
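The override order can be checked in isolation. The sketch below is not the distpy implementation: it swaps the exec() calls for importlib.import_module(), which performs the same deferred, name-based import, and it registers two throwaway in-memory modules in place of real command sets. The names load_known_commands, fake_pub_command_set and fake_my_command_set are hypothetical, and the dictionary values are plain strings rather than command classes so that the example stays self-contained.

```python
import importlib
import sys
import types


def load_known_commands(module_names, knownList=None):
    """Merge KnownCommands() from each named module in order; later modules win."""
    knownList = {} if knownList is None else knownList
    for module_name in module_names:
        # The import happens here, at call time, not when the list is built
        module = importlib.import_module(module_name)
        knownList = module.KnownCommands(knownList)
    return knownList


def _fake_command_set(name, commands):
    """Register an in-memory module exposing KnownCommands() (illustration only)."""
    module = types.ModuleType(name)

    def KnownCommands(knownList):
        knownList.update(commands)
        return knownList

    module.KnownCommands = KnownCommands
    sys.modules[name] = module


_fake_command_set('fake_pub_command_set', {'abs': 'public abs', 'sum': 'public sum'})
_fake_command_set('fake_my_command_set', {'abs': 'my abs'})

known = load_known_commands(['fake_pub_command_set', 'fake_my_command_set'])
print(known['abs'])  # my abs
print(known['sum'])  # public sum
```

Because only module name strings are passed around, nothing proprietary has to be imported until a chunk of data is actually processed.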