ExecutionPlanner - Zarquan/lucia GitHub Wiki

Introduction

The Execution Planner (EP) interface aims to provide a simple way to discover and access computing services.

The design of the EP interface is based around two key classes of objects, computing Tasks and Services.

The primary question the EP interface is designed to answer is "where can I run this Task?" More specifically, "which computing Services are able to run this Task for me?"

The simplest solution would be for a central agency to maintain enough metadata about all the available computing Services to be able to answer that question using a simple database query.

In the simplest case, this just matches the type of Task with the type of Service, using a string match.

SELECT * FROM services WHERE servicetype = 'binder'

This works for a small fixed set of Task types with a simple set of criteria for accepting a Task. However, as the range and complexity of Task types begins to grow, this centralised solution becomes harder to maintain.

Different types of Tasks will have different metadata to describe them and different Service instances will have different criteria for accepting or rejecting Tasks. Each time a new type of Service or Task becomes available, even if it is just a version increment of an existing type, the software for evaluating an execution request will need to be updated.

As the system evolves, the criteria or rules for accepting a Task will grow in complexity over time. If we imagine a system capable of deploying and executing a complex chain of interconnected software components, the criteria for accepting or rejecting such a Task will also grow in complexity.

The design of the EP aims to address this complexity using the Separation of Concerns pattern to delegate as much as possible to the Service instances.

Basic /accepts query

The EP interface defines a simple stateless HTTP interface for checking if a Service will accept a Task, delegating all of the complexity of handling the type-specific criteria to the Services themselves.

The specification for the interface is based on a simple HTTP endpoint that supports either GET or POST requests.

Jupyter notebook

Replicating the simplest use case outlined above, where the name of the Task type matches the type that the Service implements, the client can call the EP interface with one parameter, the Task type.

HTTP GET /accepts?tasktype=jupyter-notebook

If the service does not accept jupyter-notebook Tasks, then it can simply reply with a JSON response containing a reponseword of NO.

{
"reponseword": "NO"
}

If the service does accept jupyter-notebook Tasks then it replies with a simple JSON response containing a reponseword of YES along with details of how to execute the Task.

{
"reponseword": "YES",
"servicetype": "jupyter-hub",
"serviceinfo": {
    "endpoint": "http://jupyter.example.org/"
    }
}

The servicetype value tells the client what kind of service is available, and the serviceinfo element provides details of how to connect to it.

In this example, the tasktype=jupyter-notebook term in the request applies to a generic Jupyter notebook, as defined by the Jupyter project. In the response, the term servicetype=jupyter-hub refers to a JupyterHub service, as defined by the Jupyter project.

In order to run a notebook in a JupyterHub Service, a client would need to know the endpoint URL of the Service, which is provided in the serviceinfo.endpoint element of the response. The client can use this endpoint URL to pass the notebook to the JupyterHub Service and launch the Task.
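As a rough illustration of the client side of this exchange, the following Python sketch builds the /accepts query and interprets the JSON reply. The function names and the base URL http://ep.example.org/ are hypothetical and not part of the EP interface; only the tasktype parameter and the reponseword, servicetype and serviceinfo.endpoint fields are taken from the examples above.

```python
import json
from urllib.parse import urlencode


def build_accepts_url(base_url: str, tasktype: str) -> str:
    """Build the GET URL for an /accepts query.

    base_url is the (hypothetical) location of an EP service.
    """
    return base_url.rstrip("/") + "/accepts?" + urlencode({"tasktype": tasktype})


def parse_accepts_response(body: str):
    """Return (accepted, servicetype, endpoint) from a JSON response body."""
    data = json.loads(body)
    # The field name "reponseword" is spelled as in the examples above.
    if data.get("reponseword") != "YES":
        return (False, None, None)
    info = data.get("serviceinfo", {})
    return (True, data.get("servicetype"), info.get("endpoint"))


if __name__ == "__main__":
    print(build_accepts_url("http://ep.example.org/", "jupyter-notebook"))
    print(parse_accepts_response('{"reponseword": "NO"}'))
```

A real client would fetch the URL with an HTTP library and pass the response body to parse_accepts_response; the network call is left out here to keep the sketch self-contained.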

Binder notebook

It is also possible to run a generic Jupyter notebook in a Binder service, as defined by the Binder project. In this case, a Binder service would reply with the following response.

{
"reponseword": "YES",
"servicetype": "binder-hub",
"serviceinfo": {
    "endpoint": "http://binder.example.org/"
    }
}

The response from the BinderHub service is similar to the response from the JupyterHub service, but the meaning is slightly different. Declaring the servicetype as jupyter-hub or binder-hub in the response tells the client what kind of service to expect at the endpoint URL. It is then up to the client to decide how to send the details of the notebook to the service based on the service type.

A BinderHub service can handle a more complex Task than just a generic Jupyter notebook. If the notebook comes as part of a git repository that contains a dependency file, such as requirements.txt or environment.yml then a BinderHub service can use the additional information to build a Docker container based on the requirements and deploy it in the BinderHub service.

If a Task requires these additional dependencies, then the client can use a different Task type in the request:

HTTP GET /accepts?tasktype=binder-notebook

A generic JupyterHub service would not accept a binder-notebook Task, so it simply replies with a JSON response containing a reponseword of NO.

{
"reponseword": "NO"
}

A BinderHub service would reply with a positive response, with the servicetype set to binder-hub and the serviceinfo.endpoint providing the endpoint URL to send the Task to.

{
"reponseword": "YES",
"servicetype": "binder-hub",
"serviceinfo": {
    "endpoint": "http://binder.example.org/"
    }
}

In this case, the client would pass a URL pointing to the Git repository containing the Jupyter notebook and the requirements.txt or environment.yml dependency files needed to build and run the Docker container.
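The way a client might branch on the returned servicetype can be sketched as follows. This is only an illustration under stated assumptions: the function plan_submission, the keys of the dictionary it returns, and the choice to always send a repository URL to a binder-hub Service are hypothetical, and the actual submission protocols are defined by the JupyterHub and BinderHub projects rather than by the EP interface.

```python
def plan_submission(response: dict, notebook_path: str, repository_url: str) -> dict:
    """Decide what to send to a Service, based on the servicetype in a YES response.

    The returned dictionary is a hypothetical "plan"; nothing is sent here.
    """
    if response.get("reponseword") != "YES":
        raise ValueError("Service did not accept the Task")

    servicetype = response["servicetype"]
    endpoint = response["serviceinfo"]["endpoint"]

    if servicetype == "jupyter-hub":
        # Assumption: a JupyterHub Service is given the notebook itself.
        return {"endpoint": endpoint, "send": "notebook", "value": notebook_path}
    if servicetype == "binder-hub":
        # Assumption: a BinderHub Service is given the Git repository URL.
        return {"endpoint": endpoint, "send": "repository-url", "value": repository_url}
    raise ValueError(f"Unknown servicetype: {servicetype}")


if __name__ == "__main__":
    reply = {
        "reponseword": "YES",
        "servicetype": "binder-hub",
        "serviceinfo": {"endpoint": "http://binder.example.org/"},
    }
    print(plan_submission(reply, "analysis.ipynb", "https://git.example.org/user/repo"))
```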

ESAP notebook

Within the ESCAPE project, a Task may require additional components beyond software dependencies. If a notebook requires access to data in the ESCAPE DataLake, then the notebook needs to be run on a platform that is co-located with a Rucio service and that is able to pass the appropriate authentication tokens into the notebook environment to enable it to access data in the DataLake.

In order to specify a Task that requires these additional components to be in place, we can define a new Task type, esap-notebook, which refers to a notebook Task defined by the ESCAPE ESAP project.

If a notebook requires access to data in the ESCAPE DataLake, then the client can use this new Task type to check if a Service supports this environment.

HTTP GET /accepts?tasktype=esap-notebook

In this case, the generic JupyterHub and BinderHub services would not understand the new Task type, and so would reply with a negative response.

{
"reponseword": "NO"
}

A service that can provide access to data in the ESCAPE DataLake and understands what this new Task type means would reply with a positive response.

{
"reponseword": "YES",
"servicetype": "binder-hub",
"serviceinfo": {
    "endpoint": "http://binder.example.eu/"
    }
}

Note that the Service type in the response is still binder-hub. This is because the available Service is a standard deployment of a BinderHub service, so the interface for using the Service is the same as the generic BinderHub.

The difference with this Service instance is that it is deployed within the ESCAPE network and co-located with a Rucio endpoint capable of providing access to the ESCAPE DataLake. This means that in addition to being able to run generic jupyter-notebook and binder-notebook Tasks, it is also capable of understanding and executing an esap-notebook Task.
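From the Service side, the three cases described so far could be modelled as each Service instance holding its own set of accepted Task types. The sketch below is a minimal illustration, assuming a simple set-membership test; the SERVICES table, its keys, and the accepts function are hypothetical names, and a real Service is free to apply far more complex criteria.

```python
# Hypothetical configuration for three Service instances.
SERVICES = {
    "generic-jupyterhub": {
        "servicetype": "jupyter-hub",
        "endpoint": "http://jupyter.example.org/",
        "tasktypes": {"jupyter-notebook"},
    },
    "generic-binderhub": {
        "servicetype": "binder-hub",
        "endpoint": "http://binder.example.org/",
        "tasktypes": {"jupyter-notebook", "binder-notebook"},
    },
    "escape-binderhub": {
        "servicetype": "binder-hub",
        "endpoint": "http://binder.example.eu/",
        # Co-located with Rucio, so it also understands esap-notebook.
        "tasktypes": {"jupyter-notebook", "binder-notebook", "esap-notebook"},
    },
}


def accepts(service: dict, tasktype: str) -> dict:
    """Build the response body for an /accepts query against one Service."""
    if tasktype in service["tasktypes"]:
        return {
            "reponseword": "YES",
            "servicetype": service["servicetype"],
            "serviceinfo": {"endpoint": service["endpoint"]},
        }
    # Unknown or unsupported Task types are simply refused.
    return {"reponseword": "NO"}


if __name__ == "__main__":
    for name, service in SERVICES.items():
        print(name, accepts(service, "esap-notebook")["reponseword"])
```

Because each Service answers for itself, adding the esap-notebook type only requires changing the ESCAPE deployment; the generic Services continue to answer NO without modification.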

Authentication

Different Service instances may have different criteria for who they will allow to execute Tasks on their Service.

In the case of a BinderHub Service, the BinderHub Federation provides a free service that is open to the public (https://mybinder.readthedocs.io/en/latest/about/about.html).

The EP service for a public access Service like this may accept any HTTP request without authentication, and evaluate the Task criteria without reference to the user identity.

However, Services provided as part of the ESCAPE project may need to restrict their use to members of the ESCAPE project.

The EP service for a protected Service may use a variety of authentication methods to determine the identity of the client making the request. In this situation, the content and meaning of the responses from the EP service are as follows:

If the client is not authenticated, and the EP service allows anonymous requests, then it evaluates the Task criteria without reference to the client identity.
If the client is not authenticated, and the EP service requires authentication, then it may follow the OIDC sequence, redirecting the client to an appropriate OIDC authentication service, such as the ESCAPE IAM service.
If the client is authenticated, but the identity supplied is not authorized to access the EP service, then the EP service may reply with an HTTP 403 Forbidden response.
If the client is authenticated, and the identity supplied is authorized to access the EP service, then the EP service evaluates the Task criteria using the authenticated identity.
If the client is authenticated, and the identity supplied is authorized to execute the Task on the target Service, then the EP service replies with a positive response based on the Task criteria.
If the client is authenticated, but the identity supplied is not authorized to execute the Task on the target Service, then the EP service replies with a negative response.
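The decision sequence above can be sketched as a single function. This is an illustrative reading of the rules rather than a definitive implementation: the function name, its boolean parameters, and the outcome labels OIDC-REDIRECT and HTTP-403 are hypothetical, and the rules above say only that the EP service "may" redirect or reply with a 403 in those cases.

```python
def evaluate_request(
    authenticated: bool,
    allows_anonymous: bool,
    authorized_for_ep: bool,
    authorized_for_task: bool,
    task_criteria_met: bool,
) -> str:
    """Return a label for the outcome of an /accepts request.

    "YES" and "NO" are EP responses; "OIDC-REDIRECT" and "HTTP-403"
    are hypothetical labels for the HTTP-level outcomes.
    """
    if not authenticated:
        if allows_anonymous:
            # Task criteria are evaluated without reference to identity.
            return "YES" if task_criteria_met else "NO"
        # The EP service may start the OIDC sequence.
        return "OIDC-REDIRECT"

    if not authorized_for_ep:
        # The identity is not allowed to ask the question at all.
        return "HTTP-403"

    # The identity may ask; the answer depends on the Task and the identity.
    if authorized_for_task and task_criteria_met:
        return "YES"
    return "NO"


if __name__ == "__main__":
    # Authenticated and allowed to ask, but not allowed to run the Task.
    print(evaluate_request(True, False, True, False, True))
```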

The difference between the two "not authorized" responses is subtle but important.

An HTTP 403 response means the requesting identity is not allowed to use the EP service. They are not allowed to ask the question.
The EP service NO response means the requesting identity is allowed to ask, but the reply is NO, they are not able to run the Task.

In the simplest case, the EP service may just reply with a NO. This simply says that the requesting identity can't execute the Task on the target Service; it doesn't need to say why.

{
"reponseword": "NO"
}

To be more informative, the EP service may supply additional detail in the optional reponseinfo element.

{
"reponseword": "NO",
"reponseinfo": {
    "httpcode": 403,
    "reason": "Not authorised"
    }
}

An EP NO response with reponseinfo.httpcode set to 403 means that the reason for the NO is that the identity supplied with the EP request is not authorised to run the Task on the target Service, and they would probably receive an HTTP 403 response from the target Service if they tried.
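A client might use the optional reponseinfo element to give the user a more helpful message. The following sketch assumes the response body is valid JSON with the fields shown above; the function name describe_response and the wording of the returned messages are hypothetical.

```python
import json


def describe_response(body: str) -> str:
    """Summarise an EP response body as a short human-readable message."""
    data = json.loads(body)
    if data.get("reponseword") == "YES":
        return "accepted"

    # reponseinfo is optional, so fall back to an empty dictionary.
    info = data.get("reponseinfo") or {}
    if info.get("httpcode") == 403:
        return "refused: not authorised to run the Task on the target Service"
    if "reason" in info:
        return "refused: " + info["reason"]
    return "refused: no reason given"


if __name__ == "__main__":
    body = '{"reponseword": "NO", "reponseinfo": {"httpcode": 403, "reason": "Not authorised"}}'
    print(describe_response(body))
```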