Docker

This page gives an overview of the container infrastructure provided by the Analytics Hub via Docker (a specific containerization platform).

Docker 101

What is Docker?

Docker is a containerization technology that bundles an operating system and other software into a portable unit that can run on a variety of machines. For example, you can bundle the Ubuntu operating system with Python, Jupyter, and other Python libraries, and run that identical environment on a local computer or in the cloud.

Docker images are templates that specify the instructions needed to build a container, which is the actual environment running on a computer. This is analogous to the distinction between a class definition and an object (an instance of the class) in object-oriented programming.
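To make that distinction concrete, here is a minimal sketch using a stock Ubuntu image from Docker Hub (the image tag and container name are just examples):

# Pull an image (the template) from Docker Hub
docker pull ubuntu:22.04

# Start a container (a running instance of that image) with an interactive shell
docker run -it --name my-ubuntu ubuntu:22.04 bash

# Images and containers are listed separately
docker image ls
docker container ls -a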

Why use Docker?

We use Docker to get the same computational environment running in multiple places. This avoids the "it works on my computer" issues that arise when collaborating with others, and it is also useful when working alone: you can use the same set of packages locally or in the cloud if you need to scale up to larger compute infrastructure.

Using Docker

For installation instructions see: https://docs.docker.com/install/

For a high-level overview and getting-started material, see: https://docs.docker.com/get-started/
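Once Docker is installed, a quick way to check that it works is to run the standard hello-world test image:

# Downloads a tiny test image and runs it; prints a confirmation message on success
docker run hello-world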

Running RStudio

Earth Lab maintains RStudio Docker images based on the excellent Rocker project. For example, to launch our r-spatial-aws image that has RStudio, a bunch of spatial packages, and the AWS command line interface, follow the instructions on this page: https://hub.docker.com/r/earthlab/r-spatial-aws

For example, if you are connected to an EC2 instance via SSH, run the following command from your terminal:

docker run -e PASSWORD=<<insert your password here>> -d -p 8787:8787 earthlab/r-spatial-aws

Then copy your EC2 instance's public DNS address, append :8787, and navigate to it in your web browser, e.g., ec2-34-217-71-152.us-west-2.compute.amazonaws.com:8787, replacing ec2-34-217-71-152.us-west-2.compute.amazonaws.com with your instance's address. Log in with the username rstudio (the Rocker default) and the password you set above.
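If you want your files to persist on the EC2 instance outside the container, you can also mount a host directory when launching it. A minimal sketch, assuming the image follows the Rocker convention of a /home/rstudio home directory (the host path here is just an example):

# Mount ~/projects from the host into the container's home directory
docker run -e PASSWORD=<<insert your password here>> -d -p 8787:8787 \
  -v ~/projects:/home/rstudio/projects \
  earthlab/r-spatial-aws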

Running a Jupyter notebook server

To run a Jupyter notebook server with conda and a bunch of spatial packages, you can check out the earth-analytics-python-env Docker image: https://cloud.docker.com/u/earthlab/repository/docker/earthlab/earth-analytics-python-env
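A minimal sketch of launching it, assuming the image follows the jupyter/docker-stacks convention of serving notebooks on port 8888 from the /home/jovyan directory (check the image's documentation for its exact entrypoint):

# Start the notebook server, publish port 8888, and mount the current directory for your notebooks
docker run -it -p 8888:8888 -v "$PWD":/home/jovyan/work earthlab/earth-analytics-python-env

# Then open the URL (including the login token) printed in the terminal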

Integrating Docker into your Workflows and Publishing to Docker Hub using GitHub Actions

You can find out how to set up secrets to authenticate with Docker Hub here: https://medium.com/platformer-blog/lets-publish-a-docker-image-to-docker-hub-using-a-github-action-f0b17e5cceb3

and here is how to push an image to Docker Hub from the main branch: https://github.com/earthlab/r-python-eds-lessons-env/blob/main/.github/workflows/build-push-image.yml
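For reference, that workflow automates roughly the following manual steps (the image name and tag here are placeholders):

# Log in to Docker Hub (the workflow does this with repository secrets)
docker login -u <docker-hub-username>

# Build the image from the Dockerfile in the current directory and tag it
docker build -t earthlab/example-image:latest .

# Push the tagged image to Docker Hub
docker push earthlab/example-image:latest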

The way it is set up:

  • The secrets live in our repository rather than at the organization level. These secrets are the Docker Hub login credentials used to push the image.
  • Right now there are two actions. build-push-image.yml pushes to Docker Hub anytime someone commits to the main branch. build-image.yml just builds the image, so if someone opens a pull request you can see whether the image will break.
  • Best practice would be an action that pushes only when you create a tagged release, so a new image is published only when you intend to (see the sketch after this list).
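A minimal sketch of such a tag-triggered workflow, assuming Docker Hub credentials are stored as repository secrets named DOCKERHUB_USERNAME and DOCKERHUB_PASSWORD, and that the image name and tag pattern below are placeholders:

# .github/workflows/build-push-release.yml (sketch)
on:
  push:
    tags:
      - 'v*'    # run only when a version tag like v1.2.3 is pushed

jobs:
  build-and-push:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build and push to Docker Hub
        run: |
          echo "${{ secrets.DOCKERHUB_PASSWORD }}" | docker login -u "${{ secrets.DOCKERHUB_USERNAME }}" --password-stdin
          docker build -t earthlab/example-image:"${GITHUB_REF_NAME}" .
          docker push earthlab/example-image:"${GITHUB_REF_NAME}"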

Trainings

Research Computing holds trainings on Docker and Singularity/Apptainer. Find them here.

Earth Lab Docker images that need maintenance