# Integrating a service into the SeaDataCloud Virtual Research Environment
WIP!!!
Basically, there are two types of services:
(A) Services that consist of one container per user and interact with the user through an HTML-based GUI served by a web server inside the container. These services can be deployed and managed using JupyterHub, which takes care of the proxying, the SSL termination, the login, the session management, ... - so that the service developer only needs to develop their service.
So far, these are:
- DIVA
- ERDDAP
(B) Other services, for example services developed as full web applications that have their own user and session management, serve many users per instance, and so on.
So far, these are:
- ODV
- NextCloud
- Deltares-Visualisation (I think)
- VLIZ-BioQC (I think)
If your service is a one-instance-per-user service and runs a GUI over HTTP, it is probably simplest to rely on JupyterHub for the management, as it takes care of so many things.
In both cases, certain best practices and rules have to be followed to ensure the service can interact with the Virtual Research Environment and is manageable by the platform operators.
- All services must be delivered as docker images.
Please follow these guidelines when developing your Docker images.
- Make sure you follow Docker's best practices for writing Dockerfiles: https://docs.docker.com/develop/develop-images/dockerfile_best-practices/
- Please make sure logs are written to stdout and stderr, so they are picked up by the Docker daemon. Logs that are written to files inside the container will be lost and cannot be used later for debugging/problem analysis!
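If your application can only log to files, one common workaround is to symlink those files to the container's stdout/stderr in the Dockerfile. A minimal sketch (the log paths are examples, adjust them to your application):

```dockerfile
# Sketch only: redirect application log files to the container's
# stdout/stderr so the Docker daemon picks them up.
RUN mkdir -p /var/log/myservice \
 && ln -sf /dev/stdout /var/log/myservice/access.log \
 && ln -sf /dev/stderr /var/log/myservice/error.log
```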
- Please make sure your service does not run as root (!!!), but ideally as an easily configurable uid (default: 1000). See https://medium.com/@mccode/processes-in-containers-should-not-run-as-root-2feae3f0df3b . (TODO: Need to take care of permissions on written data.)
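A minimal Debian-based Dockerfile sketch (the user name, default uid/gid and start command are placeholders; gid 100 is the pre-existing `users` group in most Debian-based images):

```dockerfile
# Sketch only: create an unprivileged user with a configurable uid
# and run the service as that user.
ARG SERVICE_UID=1000
ARG SERVICE_GID=100
RUN useradd --create-home --uid ${SERVICE_UID} --gid ${SERVICE_GID} service
USER ${SERVICE_UID}:${SERVICE_GID}
CMD ["/usr/local/bin/myservice"]
```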
- Please make sure that your containers gracefully handle SIGTERM (`docker stop` issues a SIGTERM, and if that is not handled gracefully, Docker will assume the application was unhealthy, issue a SIGKILL and notify your monitoring system). Please read https://success.docker.com/article/what-causes-a-container-to-exit-with-code-137 and https://www.ctl.io/developers/blog/post/gracefully-stopping-docker-containers/ . A minimal entrypoint sketch is shown after the next item.
- If you use large static datasets, please do not include them in the image, but only include dummy data in the image and let us bind-mount the data from a file system location. Contact: Merret.
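Regarding SIGTERM: a common pitfall is starting the service from a shell script without `exec`, so the shell stays PID 1 and the SIGTERM never reaches the service. A minimal sketch (the service path is a placeholder), combined with an exec-form `ENTRYPOINT ["/usr/local/bin/start.sh"]` in the Dockerfile:

```sh
#!/bin/sh
# start.sh - sketch only. `exec` replaces the shell with the service
# process, so the SIGTERM sent by `docker stop` reaches it directly.
set -e
# ... any setup steps here ...
exec /usr/local/bin/myservice "$@"
```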
- Please use the SeaDataCloud Docker Registry at GRNET and push any images there, with a proper tag (not `latest`, but ideally consisting of / containing the date). Contact: Themis.
```sh
docker build -t <yourservice>:2019xxyy .
docker build -t registry-sdc.argo.grnet.gr/<yourservice>:2019xxyy .
docker login registry-sdc.argo.grnet.gr
docker push registry-sdc.argo.grnet.gr/<yourservice>:2019xxyy
# this requires docker login! Creds available with Themis
```
- You can access the user's data at `/home/jovyan/work/nextcloud/`. Other paths (ending in `/nextcloud`) can be used too, by passing them into JupyterHub's environment as the value of `USERDIR_INSIDE_CONTAINER` - this has to be done by the admin, just tell them which path you need. Your process can also write data there. Any data written anywhere else will be lost.
- Note that the user's data is owned by uid=1000 by default, so ideally your service should also run as uid 1000 (otherwise, reading/writing may be a problem!). In some cases, this uid might be different, e.g. for ODV. If in doubt, chat with the admins to find out the current status.
- (MAYBE IN THE FUTURE: You can access the static SeaDataNet products (read-only!) at "/some/path" [TODO])
- If your service is deployed/spawned/managed by JupyterHub, it must listen inside its container on the path `0.0.0.0:<yourport>/user/<username>`, where you can read the username from the environment variable `VRE_USERNAME` (same value as `JUPYTERHUB_USERNAME`). You can choose your port; make sure you communicate it to the admins! A small sketch is shown after the next item.
- You can test the integration of your service yourself, see here: https://github.com/SeaDataCloud/Documentation/wiki/Integrating-a-service-into-the-SeaDataCloud-Virtual-Research-Environment
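As an illustration of the path requirement (in PHP, matching the example further below; how you apply the prefix depends on your framework's base-path or route-prefix configuration):

```php
<?php
// Sketch only: build the base path the GUI must be served under
// when the container is spawned by JupyterHub.
$username = getenv('VRE_USERNAME');   // e.g. "alice"
$basePath = '/user/' . $username;     // e.g. "/user/alice"
// The service must then answer on 0.0.0.0:<yourport> under $basePath.
```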
JupyterHub passes these variables for you to use freely:
- VRE_USERNAME (the user's username, a unique string, e.g. "alice")
- VRE_DISPLAYNAME (the user's display name, e.g. "Alice Doe", as obtained from Marine ID service).
- NB_UID and NB_GID: These are the uid and gid that own the user data. Their values are 1000 and 100, so your service should run as these. (They are passed as environment variables because, in a perfect world, your service would be able to change its own uid/gid based on them. Jupyter Notebooks can. If you cannot change them at runtime, make sure your image runs as uid 1000, and that the uid is easily changeable in your Dockerfile.)
- Please use plain HTTP, as we will add a reverse proxy with SSL termination.
- Make sure your service is able to run behind a reverse proxy, so that for any links, redirects, forms etc. the correct hostname and protocol are picked up by the application. This requires some configuration. Please test this using an nginx reverse proxy. Contact: Merret, Sebastian.
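For local testing, an nginx configuration along these lines can be used (the hostname and ports are placeholders, not the VRE's actual setup):

```nginx
# Sketch only: forward requests to the service and pass the headers
# the application should use to build correct links and redirects.
server {
    listen 8080;
    location / {
        proxy_pass         http://yourservice:3000;
        proxy_set_header   Host              $host;
        proxy_set_header   X-Real-IP         $remote_addr;
        proxy_set_header   X-Forwarded-For   $proxy_add_x_forwarded_for;
        proxy_set_header   X-Forwarded-Proto $scheme;
    }
}
```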
- Please provide a proper monitoring probe (for the services to be monitored by GRNET's monitoring system). Contact: Themis.
- Please provide a proper docker-compose healthcheck, which can be run inside the container and uses only packages available inside the container. Please read https://howchoo.com/g/zwjhogrkywe/how-to-add-a-health-check-to-your-docker-container and https://blog.sixeyed.com/docker-healthchecks-why-not-to-use-curl-or-iwr/ .
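For example, a docker-compose sketch (service name, image tag and script path are placeholders; the script must exist inside the image and use only tools available there):

```yaml
# Sketch only: docker-compose healthcheck running a script inside the container.
services:
  yourservice:
    image: registry-sdc.argo.grnet.gr/<yourservice>:2019xxyy
    healthcheck:
      test: ["CMD", "/usr/local/bin/healthcheck.sh"]
      interval: 30s
      timeout: 10s
      retries: 3
```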
- Please include the common header in your GUI's HTML (contact: Leo from IFREMER).
- If you provide a logout button, design it in a way that makes clear to the user that they log out ONLY of this service, NOT of the entire VRE.
- FUTURE: Include a notification for the user about how many days their container will still be available. For this, an endpoint will be made available by the dashboard to provide this info.
- Please add a login mechanism to your service, and make sure that no user who is not logged in can access the service, especially if user data is used by your service! (Most web frameworks offer this functionality, and you only have to specify how to check whether the credentials that a user presents are valid. In our case you need to verify a token. For this, see below.)
- You must accept POST requests to log in a user. The dashboard reaches your service via an (invisible) form whose action POSTs the user's username and token to your endpoint.
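Roughly, the dashboard submits something like the following to your login endpoint (`service_auth_token` is the field name documented in the items below; the username field name and the endpoint URL are assumptions for illustration - check the details with the dashboard developers):

```html
<!-- Sketch only: illustrative login form as submitted by the dashboard. -->
<form action="https://<yourservice>/login" method="POST">
  <input type="hidden" name="username" value="alice">
  <input type="hidden" name="service_auth_token" value="<token>">
</form>
```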
- You must check the user's token by issuing a POST request to the dashboard endpoint `<dashboard_url>/service_auth` with the body data `service_auth_token=<value>`. The response is a simple JSON containing the string `true` or `false`. (Contact for help: Sebastian.) You must make sure that users who do not present a valid token cannot use the service (either because the response from the dashboard was `false`, or because it was a non-200 HTTP code); in that case, return a 403 FORBIDDEN status code.
- You receive the token in the POST request (variable name `service_auth_token`) that leads the user to your service.
- If a user was not logged in correctly, do not forward/redirect them to your login page, as users should not be able to log in to separate services. Hide your login page behind some URL for devs/admins/testing.
### Example in PHP, using Guzzle
This code is present in the application's "Login" function. Note that `dashboard_url` is an environment variable that you have in the container (set in the docker-compose.yml).
```php
try {
    $client = new Client(); // GuzzleHttp\Client
    $dashboard_url = getenv('dashboard_url');
    $url = $dashboard_url . '/service_auth';
    //Log::info($url);
    $auth_request = $client->request('POST', $url, [
        'form_params' => [
            'service_auth_token' => $request->service_auth_token
        ]
    ]);
    $response = (string) $auth_request->getBody();
    //Log::info("service_auth = " . $response);
    if ($response != "true") {
        abort(401);
    }
}
catch (\GuzzleHttp\Exception\ClientException $e) {
    abort(401);
    //Log::info("Guzzle Exception");
    //Log::info($e);
}
```
- Please provide a healthcheck (a one-line command or bash script that can run inside the container and exits with code 0 or 1) that we can use with docker-compose (Contact: Merret). Read this first: https://blog.sixeyed.com/docker-healthchecks-why-not-to-use-curl-or-iwr/
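As the linked article explains, prefer a check that uses your application's own runtime rather than assuming curl is installed. If your image does ship a small HTTP client such as wget, a minimal sketch could look like this (URL and port are placeholders):

```sh
#!/bin/sh
# healthcheck.sh - sketch only: must exit 0 (healthy) or non-zero (unhealthy).
wget -q -O /dev/null http://localhost:3000/ || exit 1
```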
In order to have your service integrated in the VRE, please report the following items to the VRE operators:
### All services (A + B)
- The image name of your service
- Whether it needs the user's data mounted, and where (admin: may have to set `USERDIR_INSIDE_CONTAINER` inside JupyterHub's docker-compose.yml)
- Whether it needs any static data mounted, and where to
- How to set the UID under which your application will run. Ideally, your container reads the environment variables `NB_UID` and `NB_GID` to set the uid and gid of the user running the service. If not, the safest is to run it as uid=1000 and gid=100 and remember how to change these.
- Is there anything that needs a backup? Any data that is not easy to recreate? Containers are ephemeral, so anything needing persistence needs to be explicitly configured.
- Which port your service listens on inside your container (admin: must configure `c.DockerSpawner.port=xxx` or `c.DockerSpawner.container_port=xxx` in jupyterhub_config.py)
### Non-JupyterHub-Services (B only)
- The URL where your service accepts POST requests for logging in users
- Any required env vars and configuration