3b. Developing KnetMiner with Docker

This page is for KnetMiner developers who want to use Docker to deploy and test the application.

It might also be useful when addressing KnetMiner runtime issues (see the troubleshooting section).

You can find more documentation on the internal software organisation of KnetMiner here.


The KnetMiner Docker architecture

Activity behind the scenes

KnetMiner, and consequently its Docker image, is based on two WAR applications: the web service (WS), which provides an HTTP/JSON API, and the client, which is the user interface, based on technologies such as JSP, JavaScript and NPM packages.

Both the WS and the client WAR are built by the Aratiny project/modules. This is a KnetMiner reference/test application, which is usually run against a small default dataset. In turn, Aratiny depends on other, more generic Maven modules and files (e.g., client-base, ws-base).
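
As a quick sanity check that both WARs are deployed, you can list the Tomcat webapps directory inside a running container. This is only a sketch: the container name 'arabidopsis' matches the troubleshooting examples below, and the /opt/tomcat location is assumed from the log paths used there.

docker exec -it arabidopsis ls /opt/tomcat/webapps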

The Docker container reads and writes dataset-specific files at a fixed location inside its file system, which, in the default case, is /root/knetminer-dataset/data. As explained above, this maps to the Docker host file system by means of Docker volumes.
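
For instance, a minimal sketch of mapping a host directory onto that container location, using the same DOCKER_OPTS/docker-run.sh convention shown in the troubleshooting section below (the host path is just an example, use your own):

export DOCKER_OPTS='-it --volume /home/me/knetminer-dataset:/root/knetminer-dataset'
./docker-run.sh ...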

One picture...

The figure below (right-click => Open Image to view it at full size) summarises how the KnetMiner Docker container works, as explained above: which directories are used, on the container file system and on the host, and how they map onto each other.

Docker Architecture Diagram

Docker images

For practical reasons, we use two different Docker images to instantiate the final KnetMiner container.

As usual, the main image is defined by the Dockerfile file. This builds what is needed to deploy our web applications (the two .war files) on our Java web server (Tomcat).

This application image extends the Dockerfile-bare image, which pulls in the Tomcat image, essentially a Debian Linux environment with Java and Tomcat installed. The bare image also installs third-party dependencies needed by KnetMiner, e.g., Python, used for maintenance scripts. Some of those dependencies are only used for testing and development, not in production, yet we prefer to use the same image for both types of containers, for the sake of simplicity.

The motivation for splitting the build of the final container into two different images is that this makes the development process more efficient. For instance, if you only need to change a web application page, pushing that change forward requires rebuilding only the main image.

Building Docker images

We recommend that you use the docker-build-image.sh script to build and test a new Docker image for KnetMiner. This requires that you first download (git clone) our codebase. The script runs docker build commands, after preparing the local environment (e.g., a Maven build with the right profile).

So, to build from scratch and from the master branch in our repository you should perform the following:

git clone https://github.com/Rothamsted/knetminer # Or another fork/branch
cd knetminer/docker
# Most of the time, you don't need to rebuild the bare image, so you can omit --bare
./docker-build-image.sh [--bare]
# The script doesn't push to GitHub

This requires Bash and a *NIX environment (tested on Linux and macOS). Windows WSL should also work.

If you rebuild the main image without touching the bare one, the latter is taken from our GitHub packages. As mentioned above, our GH packages images are updated automatically upon changes to our GitHub code base, although the corresponding rebuild might be delayed by up to 24 hours.

Note that our image build script only builds images locally; it doesn't push them to GH packages. See our continuous integration scripts for an example of how to do that (you'll need proper GitHub credentials and authorisations).

Building with different image tags/versions

You can interact with the Docker tagging (i.e., versioning) mechanism to build different versions of the KnetMiner images. For instance, the following builds the image with a 'mytag' version (instead of 'latest', which is the implied tag when nothing is specified), and also builds and uses the bare image with the 'mytag-bare' tag (if not specified, the bare tag is the same as the image tag, or 'latest' when none is given):

# Build the base image and assign it to the 'mytag' tag/version
./docker-build-image.sh --bare --tag mytag --tag-bare mytag-bare
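
After the build, you can verify that the new tags exist locally with plain Docker commands (nothing KnetMiner-specific here):

docker images | grep mytag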

Build and test KnetMiner out of Docker

As long as you install the necessary requirements (TODO), KnetMiner can be built on your working computer, without involving Docker at all.

A simple way to do so is to build and play with the already-mentioned Aratiny application. This is mainly useful to develop new application features (which apply to any dataset), test them and even add unit tests that verify them (aratiny-ws already contains a few). Changes to this existing application can be made directly in its Maven modules, or in its dependencies (e.g., client-base), depending on the type of change you're introducing.

WARNING: always remember that Aratiny is a reference application; anything you change here is reflected in the dataset-specific instances that Docker builds on top of Aratiny.

Another useful resource is the manual-test scripts within the Aratiny application, which can be used to quickly launch a working server running the Aratiny applications, accessible from your browser. See the files on GitHub for details.
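
For instance, a minimal sketch, assuming the scripts live in aratiny/manual-test inside your checkout (adjust the paths to your layout):

# Terminal 1: start the test web service
cd knetminer/aratiny/manual-test
./run-ws.sh

# Terminal 2: start the client
cd knetminer/aratiny/manual-test
./run-client.sh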

A third option is to build and run the WAR files on your system, using the same scripts that build the Docker image and run the corresponding container: build-helper.sh and runtime-helper.sh. Indeed, these scripts are designed to work both with Docker and outside of it. See their content for comments on how to use them. Moreover, have a look at this example in the codebase.

Warning: if you are a KnetMiner developer working on those scripts, always ensure the above portability isn't broken.

Quickly running the test KnetMiner

As said above, the Aratiny application is a reference/test application, which can be run by triggering two different Maven builds. The scripts mentioned above are shorthands to launch such builds.

If you need to speed up the build and launch time (eg, when running multiple cycles of code change + launch and manual test), you can trigger those builds manually, skipping the parts that you don't need.

For instance, suppose you're making changes to the UI by modifying the JavaScript code in the client-base module. A quicker-than-usual way to redeploy the test applications after such changes is:

  1. Initially, run ./run-ws.sh and ./run-client.sh as above (in two different terminal windows) and test the UI against the aratiny test dataset.
  2. After you have modified a file in client-base, stop the client, cd to client-base, run 'mvn install', cd back to aratiny/manual-test and issue mvn jetty:run. This is the same command used in run-client.sh, but without the Maven clean step (see the sketch below).
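
A sketch of step 2 as shell commands; the relative paths are assumptions about where these modules sit in your checkout:

# Starting from aratiny/manual-test, with the client stopped
cd ../../client-base        # the module you just changed
mvn install                 # rebuild and install client-base only
cd ../aratiny/manual-test   # back to the manual-test module
mvn jetty:run               # same as run-client.sh, minus the 'clean' step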

This way, you've rebuilt only client-base and Maven skips operations it doesn't need to repeat. Namely, it doesn't rebuild and re-run the test API service (which takes a long time); it only copies files over from client-base.

On Linux, switching back and forth between directories can be done quickly with pushd/popd.
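
For example, step 2 above could be shortened as follows (same assumed paths as in the previous sketch):

pushd ../../client-base   # jump to the modified module
mvn install
popd                      # return to aratiny/manual-test
mvn jetty:run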

Which parts of a Maven build you can skip during development depends on which component you're modifying and how it fits into the KnetMiner architecture; see details about the latter above and here.

Troubleshooting

In this section, we show a few details that can be useful to debug and troubleshoot the Docker container for KnetMiner.

Cleaning

If you don't see recent changes to settings or data files, you might need to clean your host dataset directory, as explained above.

Access to the logs

There are two log outputs from the container that you might want to check. One is the standard output from Tomcat, which can be accessed from the host via a command like:

docker logs -f arabidopsis

where arabidopsis is the name you assigned to the container upon run.

The other interesting logs are in the Tomcat home. In particular, you might want to take a look at the KnetMiner WS log. This host command will show it live, until you stop it with Control-C:

docker exec -it arabidopsis tail -f /opt/tomcat/logs/ws.log

The main log file doesn't contain certain details (eg, exception stack traces). These are available in /opt/tomcat/logs/ws-details.log.
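
For instance, analogously to the command above:

docker exec -it arabidopsis tail -f /opt/tomcat/logs/ws-details.log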

Another thing that might be useful is mapping the Tomcat logs directory to some location on the host, for example:

export DOCKER_OPTS='-it --volume /home/knetminer-logs:/opt/tomcat/logs'
./docker-run.sh ...

In particular, this can ease access for tools like web log analysers. Log files are under a rolling policy, so you don't need to configure that yourself.

Access to the Tomcat manager

The Docker build command for the KnetMiner image can be invoked with an additional parameter: a password for the Tomcat Manager web application. The manager can be useful to check whether our WARs were started, as well as for operations like quickly deploying a new WAR into a running Tomcat/container.

This can be done with a command like:

export DOCKER_OPTS="--build-arg TOMCAT_PASSWORD='foo123'"
./docker-build-image.sh

Remember NOT to use weak passwords for production images! The manager is accessible via the web. Ideally, don't enable it at all in production.
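
As a hedged example, once a container built with that password is running, you can check that the manager answers. The 'admin' user name below is an assumption, since the actual user depends on how the image configures Tomcat:

curl -u 'admin:foo123' http://localhost:8080/manager/html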

Invoking the API

If the client shows messages like "server is offline", it might be useful to check whether the WS application is responding. The WS has its own URL. For instance, if the container is mapped to port 8080 and you started the default dataset (i.e., aratiny), you can type this URL in your browser:

http://localhost:8080/ws/aratiny/countHits?keyword=seed

And you should get back some JSON showing how many results correspond to that keyword.

When using other datasets, you have to put the internal dataset identifier in place of /aratiny/ (eg, http://localhost:8080/ws/araknet/countHits?keyword=seed). This ID is defined in the config.yml file discussed above.
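
The same check can also be done from the command line, e.g., with curl:

curl "http://localhost:8080/ws/aratiny/countHits?keyword=seed"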

Enabling debugging and monitoring options

A KnetMiner container can be debugged using the standard Java Virtual Machine debugging server. Our launch examples file has entries showing what to do on the container side, that is, setting up the JVM debugging options:

export JAVA_TOOL_OPTIONS="$JAVA_TOOL_OPTIONS -Xdebug -Xnoagent
  -Djava.compiler=NONE
  -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5005"

These enable the debugger in the JVM that runs KnetMiner inside the container (ie, its Tomcat), which will be reachable on TCP port 5005. Since you're inside Docker, you also need to open that port to the outside:

# 'docker run' defaults (-it) need to be mentioned explicitly if you redefine this variable
export DOCKER_OPTS="-it -p 5005:5005"

Now you can use a debugger client, i.e., your IDE, and connect it to localhost:5005. This is how to do it in Eclipse; other IDEs should work pretty much the same way.

Of course, 'localhost' assumes your container is running locally; if not, you need to either specify your Docker server address or take extra steps like SSH tunnelling.

In order for this to be useful, your IDE will need to see the KnetMiner source code. To do so, import our Maven multi-module project from GitHub. You might also need other dependencies, in particular Ondex, the software we use to manage KnetMiner knowledge graphs and related operations.

In addition to the debugger, you can profile the behaviour and performance of KnetMiner (eg, to find out why a process is slow or whether there are memory leaks) by using the JMX interfaces. For instance, jvisualvm is a client application that works with this standard. Enabling JMX in KnetMiner works pretty much like the debugger:

# Of course, you can pass both the debugger and JMX options to enable both, and the same goes 
# for DOCKER_OPTS
#
export JAVA_TOOL_OPTIONS="-Dcom.sun.management.jmxremote.ssl=false
	-Dcom.sun.management.jmxremote.authenticate=false
 	-Dcom.sun.management.jmxremote.port=9010
 	-Dcom.sun.management.jmxremote.rmi.port=9011
 	-Djava.rmi.server.hostname=localhost
 	-Dcom.sun.management.jmxremote.local.only=false"

export DOCKER_OPTS="-it -p 9010:9010 -p 9011:9011"

As you can see, the JMX ports are enabled in the container's JVM and then exposed to the outside. As said above, you might need further layers, like SSH tunnels, to connect to a remote container.
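
For example, assuming the container runs locally, you can point jvisualvm at the exposed JMX port from the command line (VisualVM also lets you add a JMX connection from its GUI):

jvisualvm --openjmx localhost:9010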
