BEP4: Remote Step Submission - Vipyr/BazaarCI GitHub Wiki

30 October 2019 - BEP Created
7 December 2019 - Minor P1 Mockup Updates

Abstract

A natural next step for any process executor is to allow steps to execute in multiple independent processes. The machine hosting these processes shouldn't matter; the only requirement should be that each machine runs the same version of Python and the same version of BazaarCI. Offloading work to local OS processes, as well as to cloud-provided containers, VMs, or functions, is potentially very valuable. All of these submission execution strategies look very similar from the front-end, differing significantly only in the back-end. Each type should be a standalone plugin that can be installed independently, with its own set of dependencies. User code should be able to declare and describe the submission type and location, then simply convert existing Step instances to submit remotely to their described targets.

Proposal 1 (Tom Manner)

Implement a Remote adapter class which can be specialized with different remote types:

  • MultiprocessingRemote
  • DockerRemote
  • AWSRemote
  • AWSLambdaRemote
  • AzureRemote
  • AzureFunctionsRemote

These Remote objects would implement a __call__(self, step) method that adds their setup and teardown routines to the Step instance via set_run_behavior.

Frontend Mockup:

from bazaarci.runner import Graph, Product, MultiprocessingRemote, Step

g = Graph("g")
p = Product("p")
s1 = Step("Step1", g, target=lambda: print("Step1"))
s2 = Step("Step2", g, target=lambda: print("Step2"))

s1.produces(p)  # Step 1 Produces p
s2.consumes(p)  # Step 2 Consumes p

# Create a MultiprocessingRemote instance
mp_remote = MultiprocessingRemote()
# Inject setup and teardown run_strategy functions for s2
mp_remote(s2)

g.start()  # Begin Graph Execution

Backend Mockup:

import multiprocessing
from functools import wraps


class Remote:
    def __call__(self, step):
        raise NotImplementedError("Remote subclasses must implement __call__(self, step)!")


class MultiprocessingRemote(Remote):
    @staticmethod
    def execute_in_process(func):
        @wraps(func)
        def wrapped(*args):
            # *args absorbs the bound instance when installed as a method;
            # the target itself runs in a separate OS process.
            process = multiprocessing.Process(target=func)
            process.start()
            process.join()
        return wrapped

    def __call__(self, step):
        step.set_run_behavior(skip_if_redundant, wait_for_producers, MultiprocessingRemote.execute_in_process)
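The other Remote specializations would follow the same adapter shape, differing only in how they ship the step's target to the execution back-end. As one illustrative sketch (not proposed BazaarCI API — the class, image name, and command layout below are assumptions), a DockerRemote might serialize a step's target into a `docker run` invocation:

```python
class DockerRemote:
    """Illustrative sketch of a container-backed Remote specialization."""

    def __init__(self, image="python:3.8-slim"):
        # Assumed default image; a real adapter would likely require an
        # image carrying the matching Python and BazaarCI versions.
        self.image = image

    def build_command(self, module, func):
        # Build (but do not execute) a command that runs
        # `python -c "from <module> import <func>; <func>()"` in a
        # fresh, auto-removed container.
        payload = "from {m} import {f}; {f}()".format(m=module, f=func)
        return ["docker", "run", "--rm", self.image, "python", "-c", payload]
```

Separating command construction from execution keeps the adapter testable without a Docker daemon; the __call__(self, step) hook would then wrap command execution the same way MultiprocessingRemote wraps a child process.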

For the sake of predictability, this BEP should land only after a way to retrieve a Step's run behaviors, in order, has been implemented. Ideally, step.set_run_behavior would then pass the existing behaviors in first. In the short term, this can be approximated with setattr(step, "run", MultiprocessingRemote.execute_in_process(step.run)).
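The short-term setattr approach can be sketched end-to-end with a minimal stand-in for Step (FakeStep, demo_target, and the marker-file mechanics below are illustrative scaffolding, not BazaarCI API):

```python
import multiprocessing
import os
import tempfile
from functools import wraps


def execute_in_process(func):
    # Run func in a child OS process and block until it finishes.
    @wraps(func)
    def wrapped(*args):
        process = multiprocessing.Process(target=func)
        process.start()
        process.join()
    return wrapped


class FakeStep:
    # Minimal stand-in for bazaarci.runner.Step (illustrative only).
    def __init__(self, target):
        self.run = target


# Marker file lets the parent observe that the child process really ran.
marker = os.path.join(tempfile.gettempdir(), "bep4_demo_marker")


def demo_target():
    open(marker, "w").close()


step = FakeStep(demo_target)
# Short-term workaround: replace the step's run behavior wholesale.
setattr(step, "run", execute_in_process(step.run))

if __name__ == "__main__":
    if os.path.exists(marker):
        os.remove(marker)
    step.run()                     # demo_target executes in a child process
    assert os.path.exists(marker)  # the child left its marker behind
```

Replacing run wholesale discards any behaviors already attached to the step (such as skip_if_redundant or wait_for_producers), which is exactly why retrieving the existing behaviors in order should land first.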
