docker compose deployment - 52North/ecmwf-dataset-crawl GitHub Wiki
Using docker-compose, the application can be set up in a straightforward way on any docker-enabled host.
System requirements:

- Docker engine `v17.06` or later
- 6 GB RAM (2 GB for elasticsearch, 2 GB for the crawler, 1 GB for the rest; the first two would happily eat more, though configuration may be required)
We successfully deployed the application on a Debian 9 VM with 20 GB of RAM, running docker 18.06-ce and docker-compose 1.19.0.
Technically, a distributed deployment (with elasticsearch or the crawler on another host) could be realized, but we did not try that.
## Example Setup
```sh
target=/var/lib/docker-compose/ecmwfcrawler
git clone https://github.com/52north/ecmwf-dataset-crawl $target
cd $target
vi .env # set up WEB_DOMAIN and API keys
docker-compose build
docker-compose up -d
```
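The `.env` file edited above needs at least the public hostname; `WEB_DOMAIN` is the variable named on this page, while the API-key names below are placeholders, so check the `.env` template in the repository for the real ones:

```sh
# public domain (and optional port) under which the application is served
WEB_DOMAIN=crawler.example.com

# API keys (names here are placeholders; see the .env template in the repo)
SEARCH_API_KEY=...
```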
Kibana does not start correctly on first launch with the required elasticsearch configuration. To set it up, use this workaround:

- comment out `action.auto_create_index: false` in `elasticsearch/config/elasticsearch.yml`
- (re-)start the application and run a test crawl to populate the elasticsearch indexes
- visit `$WEB_DOMAIN/kibana` and follow the kibana setup procedure described in `README.md` while the crawl is running
- reset the elasticsearch config and restart the elasticsearch container
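The workaround above can be sketched as shell commands (the config path and service name are taken from the steps above; the `sed` invocations are an assumption, editing the file by hand works just as well):

```shell
CONF=elasticsearch/config/elasticsearch.yml

# 1. comment out the auto_create_index setting
sed -i 's/^action\.auto_create_index: false/# action.auto_create_index: false/' "$CONF"

# 2. (re-)start the application; then run a test crawl and complete the
#    kibana setup under $WEB_DOMAIN/kibana as described in README.md
docker-compose up -d --force-recreate

# 3. restore the original config and restart only elasticsearch
sed -i 's/^# action\.auto_create_index: false/action.auto_create_index: false/' "$CONF"
docker-compose restart elasticsearch
```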
## Pitfalls
- After making changes to `.env`, always restart the setup with `--force-recreate`
- Don't run docker-compose with `--abort-on-container-exit`: the frontend container will exit immediately, shutting the whole setup down
- When pulling a new version from the repo with changes to the frontend, remove the existing `frontend` volume first (otherwise the previous build will continue to be used):

  ```sh
  docker rm ecmwf-dataset-crawl_frontend_1 ecmwf-dataset-crawl_proxy_1
  docker volume rm ecmwf-dataset-crawl_frontend
  ```

- When deploying the setup behind another reverse proxy:
  - configure `WEB_DOMAIN` to be the internal hostname with which the reverse proxy refers to the service
  - if you don't specify a port, `2015` will be selected
  - you should be able to use a non-root base URL (e.g. `/crawler`)
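For the reverse-proxy case, a minimal nginx sketch (assuming nginx is the outer proxy; the internal hostname is a placeholder, while the port `2015` and base URL `/crawler` follow the notes above):

```nginx
# forward the public /crawler/ base URL to the internal compose host;
# WEB_DOMAIN in .env must match the internal hostname used here
location /crawler/ {
    proxy_pass http://crawler-internal:2015/;
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
```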
TODO: RAM optimizations for the elasticsearch & crawler containers, proxy setup
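For the RAM part of this TODO, one possible sketch is a compose override file (the service name and 2 GB figure are assumptions based on the requirements above; `ES_JAVA_OPTS` is the standard elasticsearch JVM heap setting):

```yaml
# docker-compose.override.yml (sketch; adjust service names to the actual compose file)
version: '2'
services:
  elasticsearch:
    environment:
      # fixed 2 GB JVM heap for elasticsearch
      - "ES_JAVA_OPTS=-Xms2g -Xmx2g"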