Integrating a service into the SeaDataCloud Virtual Research Environment - SeaDataCloud/Documentation GitHub Wiki

WIP!!!

Guidelines for service development

There are two types of services:

(A) Services that consist of one container per user and interact with the user through an HTML-based GUI served by a web server in the container. These services can be deployed and managed using JupyterHub, which takes care of the proxying, the SSL termination, the login, the session management, and so on, so that service developers only need to develop their service.

So far, these are:

  • DIVA
  • ERDDAP

(B) Other services, for example services that are developed as a full web application, that have their own user and session management, that serve many users per instance, ...

So far, these are:

  • ODV
  • NextCloud
  • Deltares-Visualisation (I think)
  • VLIZ-BioQC (I think)

If your service is a one-instance-per-user service and runs a GUI over HTTP, it is probably simplest to rely on JupyterHub for the management, as it takes care of so many things.

In both cases, certain best practices and rules have to be followed to ensure the service can interact with the Virtual Research Environment and is manageable by the platform operators.

General (A+B)

  • All services must be delivered as Docker images.

Docker Setup (A+B)

Please build, tag and push your Docker images as follows:

docker build -t <yourservice>:2019xxyy .
docker build -t registry-sdc.argo.grnet.gr/<yourservice>:2019xxyy .
docker login registry-sdc.argo.grnet.gr
docker push registry-sdc.argo.grnet.gr/<yourservice>:2019xxyy
# Pushing requires docker login. Credentials are available from Themis.

Data access (A+B)

  • You can access the user's data at /home/jovyan/work/nextcloud/. Other paths (ending in /nextcloud) can be used too, by passing them into JupyterHub's environment as the value of USERDIR_INSIDE_CONTAINER. This has to be done by the admin, so just tell them which path you need. Your process can also write data there. Any data written anywhere else will be lost.
  • Note that, by default, the user's data is accessible to the user with uid=1000, so ideally your service should also run as uid 1000 (otherwise, reading/writing may be a problem!). In some cases, this uid might be different, e.g. for ODV. If in doubt, ask the admins about the current status.
  • (MAYBE IN THE FUTURE: You can access the static SeaDataNet products (read-only!) at "/some/path" [TODO])
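Because anything written outside the user data directory is lost, it can help to route all file output through one helper. The following is a minimal sketch, assuming the default path from above; the function names `user_data_path` and `save_result` are hypothetical and not part of the VRE.

```python
from pathlib import Path

# Default mount point named in the guidelines above.
USER_DATA_DIR = Path("/home/jovyan/work/nextcloud")


def user_data_path(relative_name, base=USER_DATA_DIR):
    """Return an absolute path under the user data directory.

    Raises ValueError if the result would fall outside `base`, because
    data written elsewhere does not persist.
    """
    base = Path(base).resolve()
    candidate = (base / relative_name).resolve()
    if candidate != base and base not in candidate.parents:
        raise ValueError(f"{relative_name!r} is outside {base}")
    return candidate


def save_result(relative_name, text, base=USER_DATA_DIR):
    """Write `text` to a file inside the user data directory."""
    target = user_data_path(relative_name, base)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text, encoding="utf-8")
    return target
```

If the admin configured a different path for your service, pass it as `base` instead of relying on the default.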

URL of your service (A)

Environment variables (A)

JupyterHub passes these variables for you to use freely:

  • VRE_USERNAME (the user's username, a unique string, e.g. "alice")
  • VRE_DISPLAYNAME (the user's display name, e.g. "Alice Doe", as obtained from Marine ID service).
  • NB_UID and NB_GID: These are the uid and gid that own the user data. Their values are 1000 and 100, so your service should run as these. (They are passed as environment variables because, ideally, your service would change its own uid/gid at runtime based on them, as Jupyter Notebook images can. If you cannot change them at runtime, make sure your image runs as uid 1000, and that the uid is easy to change in your Dockerfile.)
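As an illustration, a service written in Python could read these variables at startup roughly as follows. This is a sketch only: the names `VreEnvironment`, `read_vre_environment` and `drop_privileges` are hypothetical, falling back to the username when no display name is set is an assumption, and switching uid/gid only works if the container process starts as root.

```python
import os
from dataclasses import dataclass


@dataclass
class VreEnvironment:
    username: str
    display_name: str
    uid: int
    gid: int


def read_vre_environment(environ=None):
    """Collect the variables passed by JupyterHub, with the documented defaults."""
    env = os.environ if environ is None else environ
    username = env.get("VRE_USERNAME", "")
    return VreEnvironment(
        username=username,
        # Assumption: show the username if no display name was passed.
        display_name=env.get("VRE_DISPLAYNAME", username),
        uid=int(env.get("NB_UID", "1000")),
        gid=int(env.get("NB_GID", "100")),
    )


def drop_privileges(vre):
    """Switch to the uid/gid that own the user data, if running as root."""
    if os.getuid() != 0:
        return False  # already unprivileged; nothing can be changed
    os.setgroups([])
    os.setgid(vre.gid)  # group first: after setuid it can no longer be changed
    os.setuid(vre.uid)
    return True
```

Calling `drop_privileges(read_vre_environment())` before starting the web server would make files written to the user data directory carry the expected ownership.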

Web Server / HTTP vs HTTPS / reverse proxying (A+B)

  • Please use plain HTTP, as we will add a reverse proxy with SSL termination.
  • Make sure your service is able to run behind a reverse proxy: for any links, redirects, forms etc., the application must pick up the correct hostname and protocol. This requires some configuration. Please test this using an nginx reverse proxy. Contact: Merret, Sebastian.
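One common way to pick up the external hostname and protocol is to read the forwarding headers set by the proxy. The sketch below assumes the proxy is configured to send `X-Forwarded-Proto` and `X-Forwarded-Host`; whether the VRE proxy does so is not stated here, so check with the operators. The function name `external_base_url` is hypothetical.

```python
def external_base_url(headers, default_scheme="http"):
    """Build the externally visible base URL from request headers.

    `headers` is a mapping of header names to values. The X-Forwarded-*
    names are an assumption about how the reverse proxy is configured.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    # A chain of proxies may append values; the first entry is the outermost.
    scheme = lowered.get("x-forwarded-proto", default_scheme).split(",")[0].strip()
    host = lowered.get("x-forwarded-host") or lowered.get("host", "localhost")
    host = host.split(",")[0].strip()
    return f"{scheme}://{host}"
```

Links and redirects would then be built as `external_base_url(request_headers) + "/some/page"` instead of using the container's own hostname and port. Many web frameworks ship an equivalent setting, which is usually preferable to hand-written code.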

Monitoring and health (A+B)

GUI (A+B)

  • Please include the common header in your GUI's HTML (contact: Leo from IFREMER).
  • If you provide a logout button, design it so that it is clear to users that they log out ONLY of this service, NOT of the entire VRE.
  • FUTURE: Include a notification telling the user how many days their container will remain available. For this, the dashboard will make an endpoint available that provides this information.

Login/Authentication (only services not based on JupyterHub) (B only):

  • Please add a login mechanism to your service, and make sure that no user that is not logged in can access the service, especially if user data is being used by your service! (Most web frameworks offer this functionality, and you only have to specify how to check whether the credentials that a user presents are valid. In our case you need to verify a token. For this, see below).
  • You must accept POST requests to log in a user. Users reach your service from the dashboard through an (invisible) form that POSTs the user's username and token to your endpoint.
  • You must check the user's token by issuing a POST request to the dashboard endpoint <dashboard_url>/service_auth with the body data service_auth_token=<value>. The response is a simple JSON that contains a string true or false. (Contact for help: Sebastian). If the dashboard responds with false or with a non-200 HTTP code, the user must not be able to use the service; return a 403 FORBIDDEN status code instead.
  • You receive the token in the POST request that leads the user to your service, as the variable service_auth_token.
  • If a user was not logged in correctly, do not forward/redirect them to your login page, as users should not be able to log in to separate services. Hide your login page behind some URL for devs/admins/testing.

Example in PHP, using Guzzle

This code is part of the application's "Login" function. Note that dashboard_url is an environment variable that is set for the container (in the docker-compose.yml).

        try {
            // Ask the dashboard whether the token presented by the user is valid.
            $client = new Client(); // GuzzleHttp\Client
            $dashboard_url = getenv('dashboard_url');
            $url = $dashboard_url . '/service_auth';
            //Log::info($url);
            $auth_request = $client->request('POST', $url, [
                'form_params' => [
                    'service_auth_token' => $request->service_auth_token
                ]
            ]);
            // Cast the response stream to a string before comparing.
            $response = trim((string) $auth_request->getBody());
            //Log::info("service_auth = " . $response);
            if ($response !== 'true') {
                abort(403);
            }
        } catch (\GuzzleHttp\Exception\TransferException $e) {
            // Any failed request (4xx, 5xx, connection error) means "not authenticated".
            //Log::info($e);
            abort(403);
        }
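
For services not written in PHP, the same check can be sketched in Python using only the standard library. This is a hedged illustration, not the dashboard's reference client: the function names are hypothetical, and, like the PHP example, it assumes the response body is the bare text true or false. The `post` parameter exists only so the logic can be exercised without a running dashboard.

```python
import urllib.error
import urllib.parse
import urllib.request


def post_form(url, fields, timeout=10):
    """POST form-encoded fields; return (status_code, body_text)."""
    data = urllib.parse.urlencode(fields).encode("ascii")
    request = urllib.request.Request(url, data=data, method="POST")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8")
    except urllib.error.HTTPError as error:
        return error.code, ""
    except (urllib.error.URLError, OSError):
        return 0, ""  # 0 marks "dashboard unreachable", not a real status


def token_is_valid(dashboard_url, token, post=post_form):
    """Return True only if the dashboard answers 200 with the body 'true'."""
    if not token:
        return False
    url = dashboard_url.rstrip("/") + "/service_auth"
    status, body = post(url, {"service_auth_token": token})
    return status == 200 and body.strip() == "true"


def login_status(form_body, dashboard_url, post=post_form):
    """Return the HTTP status for an incoming login POST: 200 or 403."""
    fields = urllib.parse.parse_qs(form_body)
    token = fields.get("service_auth_token", [""])[0]
    return 200 if token_is_valid(dashboard_url, token, post) else 403
```

A request handler would call `login_status(raw_post_body, os.environ["dashboard_url"])` and, on 403, answer with that status directly rather than redirecting to a login page.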
 

Health check (only services not based on JupyterHub) (B only):

What you need to report

In order to have your service integrated in the VRE, please report the following items to the VRE operators:

All services (A + B)

  • The image name of your service
  • Whether your service needs the user's data mounted, and where (admin: May have to set USERDIR_INSIDE_CONTAINER inside JupyterHub's docker-compose.yml)
  • Whether your service needs any static data mounted, and where
  • How to set the UID under which your application will run. Ideally, your container reads the environment variables NB_UID and NB_GID to set the uid and gid of the user running the service. If not, the safest option is to run it as uid=1000 and gid=100 and to document how to change these.
  • Is there anything that needs to have a backup? Any data that is not easy to recreate? Containers are ephemeral, so anything needing persistence needs to be explicitly configured.
  • Which port your service listens on, inside your container (admin: Must configure c.DockerSpawner.port=xxx OR c.DockerSpawner.container_port=xxx in jupyterhub_config.py)

Non-JupyterHub-Services (B only):

  • The URL where your service accepts POST requests for logging in users
  • Any required env vars and configuration