Introduction to Docker - clizarraga-UAD7/Workshops GitHub Wiki

A brief introduction to Docker Containers

🚧



What is Docker? 🐳

Docker is a set of platform-as-a-service (PaaS) products that use OS-level virtualization to deliver software in packages called containers.

In other words, Docker is a platform for containerizing software. With it, you can build your application and package it, together with all of its dependencies, into a container. These containers can then be easily shipped to run on other machines.

Representation of Docker Architecture

The Docker software-as-a-service offering consists of three components:

Software: The Docker Engine includes:

  • The Docker daemon, called dockerd, which is a process that manages Docker containers and handles container objects. The daemon listens for requests sent via the Docker Engine API.
  • The Docker client program, called docker, provides a command-line interface (CLI) that allows users to interact with Docker daemons.
📝 Note:

Docker Desktop is an application, available for macOS, Windows, and Linux, that bundles the Docker daemon, the Docker client (CLI), and other tools to run locally on your machine.

Objects:

Docker objects are the entities used to assemble an application in Docker. The main classes of objects are containers, images, and services:

  • A Docker container is a standardized, encapsulated environment that runs applications, and is managed using the Docker API or CLI.
  • A Docker image is a read-only template used to build containers. Images are used to store and ship applications.
  • A Docker service allows containers to be scaled across multiple Docker daemons, resulting in what is known as a Docker swarm, a set of cooperating daemons that communicate through the Docker API.

An important distinction is between base and child images.

  • A base image is an image that has no parent image. Base images usually contain just a minimal OS distribution (busybox, alpine, ubuntu, centos, amazonlinux, debian, etc.).
  • A child image is built on a base image, with some extra functionality integrated.

We can also distinguish between official and user images.

  • Official images are maintained and supported by the staff at Docker.
  • User images are images built on base images with extra functionality, created and shared by general users. These images are identified as user/image-name. Users can be certified users or general users.

Registries: A Docker registry is a repository for Docker images.

  • Docker clients connect to registries to download ("pull") images for use or upload ("push") images that they have built.
  • Container registries can be public or private. Two well-known registries are Docker Hub and the GitLab Container Registry. Docker Hub is the default registry where Docker looks for images.

Docker images at Docker Hub 🐳

Docker Hub (https://hub.docker.com) is Docker's official registry for images.

Some popular images are:

  • Hello World. Used for testing your Docker Engine installation. (To download: docker pull hello-world)
  • Alpine. A minimal Linux image less than 5 MB in size. (To download: docker pull alpine)
  • Ubuntu. An image of the Ubuntu Linux distribution. (To download: docker pull ubuntu)
  • rocker/rstudio. RStudio image. (To download: docker pull rocker/rstudio)
  • jupyter/datascience-notebook. Jupyter Notebook Data Science Stack. (To download: docker pull jupyter/datascience-notebook)
  • pangeo/pangeo-notebook. Pangeo big data geosciences. (To download: docker pull pangeo/pangeo-notebook)

Getting Started with Docker 🐳

ℹ️ First, you need to open a user account on Docker.com.

Next, to use Docker, you need either Docker Desktop installed on your machine or access to a GitHub Codespaces development environment in a GitHub organization.

Docker Desktop installs all the Docker tools for container development and deployment. It provides all the software needed to run containers on your local machine.

It is also good practice to have an integrated development environment such as the VS Code editor installed on your computer, which allows you to easily synchronize files with GitHub repositories. Please install it on your machine.

Besides developing code in VS Code, you can add extensions for container development and deployment (Docker, Kubernetes, Google Cloud, Azure, and others) and for data science work (Python, Jupyter Notebooks, PyTorch, Azure ML, and more).

We will assume that you have your environment ready to start working with Docker.

Exercise 1. Testing your Docker environment with the hello-world docker image.

Open VS Code and start a Terminal.

You can do a simple test by running the hello-world Docker application.

docker run hello-world

The Docker daemon will download the latest hello-world image (if it is not already available locally) and run it as a container. You should get back a message like the following:

Hello from Docker!
This message shows that your installation appears to be working correctly. ...

The rest of the message explains the steps that were involved in printing the Hello World message to your Terminal.

Next, we can enter the command docker ps -a, and Docker will list all containers, including those that have already exited. Of the returned information, we need to note the CONTAINER ID. We can remove a stopped container by executing docker rm CONTAINER_ID. We do not need to enter the full ID: its first few characters are sufficient, as long as they are unique among our containers.
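The cleanup step can also be scripted. The following is a minimal sketch, not part of the original exercise: the function name remove_exited_containers is hypothetical, while -a, -q, and --filter status=exited are standard docker ps options.

```shell
# Hypothetical helper: remove every container whose status is "exited".
remove_exited_containers() {
    # -a: include stopped containers; -q: print only the IDs;
    # --filter status=exited: keep only containers that have finished running.
    ids=$(docker ps -a -q --filter status=exited)
    if [ -z "$ids" ]; then
        echo "No exited containers to remove."
        return 0
    fi
    # Intentionally unquoted so that each ID becomes a separate argument.
    docker rm $ids
}
```

Calling remove_exited_containers after Exercise 1 should remove the leftover hello-world container. The built-in docker container prune command does a similar job, after asking for confirmation.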

Basic Docker commands 💻

The main docker command option is --help: docker --help

Initial commands:

| Command | Description |
| --- | --- |
| docker --help | Lists all Docker command options. |
| docker create IMAGE_NAME | Creates a stopped container from the image, pulling the image from the registry (Docker Hub by default) if it is not available locally. |
| docker run [Options] IMAGE_NAME | Creates and starts a container from the image, pulling it first if it is not available locally. |
| docker rename CONTAINER NEW_NAME | Renames a container. |
| docker search TERM | Searches Docker Hub for images. |

Container and Image manipulation:

| Command | Description |
| --- | --- |
| docker container --help | Lists Docker container options. |
| docker container ls | Lists the running containers. |
| docker ps | Lists the running containers (same as docker container ls). |
| docker ps -a | Lists all containers, running and stopped, with their status. |
| docker container rm CONTAINER_ID or docker rm CONTAINER_ID | Removes a container by ID. |
| docker image --help | Lists Docker image options. |
| docker image ls | Lists the Docker images available locally. |
| docker image rm IMAGE_ID or docker rmi IMAGE_ID | Removes a specific local Docker image. |

From the terminal you can list the running containers by typing docker ps or docker container ls; add -a to include stopped containers. These commands return a table with the columns CONTAINER ID, IMAGE, COMMAND, CREATED, STATUS, PORTS, and NAMES. The CONTAINER ID and IMAGE values are used as arguments for other docker commands. Each new container gets a new CONTAINER ID and, unless you set one with --name, a new randomly generated name.

📝 Note: You can replace the full `CONTAINER_ID` or `IMAGE_ID` string with its first few characters, as long as that prefix is unique.
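When only a few of those columns matter, the output can be narrowed with the --format option of docker ps, which takes a Go template. A minimal sketch follows; the function name list_containers_brief is hypothetical.

```shell
# Hypothetical helper: show only the ID, name, and status of all containers.
list_containers_brief() {
    # The "table" prefix keeps the column headers; \t separates the columns.
    docker ps -a --format 'table {{.ID}}\t{{.Names}}\t{{.Status}}'
}
```

Running list_containers_brief prints one row per container, which makes it easier to pick out the ID or name to pass to other commands.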

Docker Start/Stop/Restart/Pause/Unpause CONTAINER_ID:

| Command | Description |
| --- | --- |
| docker start CONTAINER_ID | Starts a stopped container. |
| docker stop CONTAINER_ID | Stops a running container. |
| docker restart CONTAINER_ID | Restarts a container (stopping it first if it is running). |
| docker pause CONTAINER_ID | Pauses a running container. |
| docker unpause CONTAINER_ID | Resumes a paused container. |
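To see how these commands fit together, the sketch below walks one container through its whole lifecycle. It is only an illustration: the function name and the container name lifecycle-demo are hypothetical, and it assumes the alpine image with a sleep command as a long-running placeholder process.

```shell
# Hypothetical walkthrough of a container's lifecycle.
container_lifecycle_demo() {
    name="lifecycle-demo"   # arbitrary container name chosen for this sketch
    # Create a stopped container that sleeps for 5 minutes once started.
    docker create --name "$name" alpine sleep 300 &&
    docker start "$name" &&      # stopped -> running
    docker pause "$name" &&      # running -> paused (processes frozen)
    docker unpause "$name" &&    # paused  -> running
    docker restart "$name" &&    # stop, then start again
    docker stop "$name" &&       # running -> stopped
    docker rm "$name"            # delete the stopped container
}
```

Each step runs only if the previous one succeeded. The restart and stop steps may each take several seconds, because Docker waits for a grace period before killing a process that does not exit on its own.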

Docker Volumes:

| Command | Description |
| --- | --- |
| docker volume --help | Lists Docker volume options. |
| docker volume ls | Lists the available volumes. |
| docker volume create myvol | Creates a local volume named myvol. |
| docker volume inspect myvol | Displays detailed information about the volume. |
| docker volume rm myvol | Removes the specified volume. |

List of Dockerfile instructions

If we start with a base Docker image and would like to customize it by adding packages to fit our needs, then we need to write a Dockerfile to build a new Docker image. A Dockerfile is a plain-text file of line-by-line instructions; by convention it is named Dockerfile and has no file extension.

| Instruction | Description |
| --- | --- |
| FROM | Initializes a new build stage and sets the base image. |
| ARG | Defines a variable that users can pass at build time with docker build --build-arg. ARG can carry a default value, and it is the only instruction that may appear before FROM, for example `ARG VERSION=latest` followed by `FROM base:${VERSION}`. |
| RUN | Executes a command in a new layer on top of the current image and commits the result. Has 2 forms: `RUN <command>` (shell form; the command runs in a shell) and `RUN ["executable", "param1", "param2"]` (exec form). |
| CMD | Sets the default command for a container. Has 3 forms: `CMD ["executable", "param1", "param2"]` (exec form, preferable), `CMD ["param1", "param2"]` (default parameters to ENTRYPOINT), and `CMD command param1 param2` (shell form). |
| LABEL | Adds metadata to an image. |
| EXPOSE | Informs Docker that the container listens on the specified network ports at runtime. EXPOSE does not publish the port by itself; to reach it from the host, publish it at run time, for example docker run -p 80:80. |
| ENV | Sets an environment variable value. |
| ADD | Copies new files, directories, or remote files and adds them to the filesystem of the image at the given path. |
| COPY | Copies new files or directories from `<src>` and adds them to the filesystem of the image at the path `<dest>`. Has 2 forms: `COPY <src> ... <dest>` and `COPY ["<src>",...,"<dest>"]`. |
| ENTRYPOINT | Sets the executable that runs when the container starts. Has 2 forms: `ENTRYPOINT ["executable", "param1", "param2"]` (exec form, preferable) and `ENTRYPOINT command param1 param2` (shell form). You can override it at run time with docker run --entrypoint and an executable command. |
| VOLUME | Creates a mount point for externally mounted volumes. The format can be `VOLUME ["/home/user"]` or `VOLUME /home/user`. |
| USER | Sets the user name (or UID) to use when running the image and for any RUN, CMD, and ENTRYPOINT instructions that follow it in the Dockerfile. |
| WORKDIR | Sets the working directory for any RUN, CMD, ENTRYPOINT, COPY, and ADD instructions that follow it in the Dockerfile. |
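As an illustration of how several of these instructions combine, the sketch below writes a small Dockerfile and a helper script into a scratch directory. Everything named here (the demo-dockerfile directory, hello.sh, APP_HOME) is hypothetical and chosen only for this example.

```shell
# Write an illustrative build context into a scratch directory.
mkdir -p demo-dockerfile

# A tiny script that greets whatever name it receives as its first argument.
printf '#!/bin/sh\necho "Hello, $1!"\n' > demo-dockerfile/hello.sh

# The quoted 'EOF' keeps ${VERSION} and ${APP_HOME} literal in the file.
cat > demo-dockerfile/Dockerfile <<'EOF'
# ARG before FROM parameterizes the base image tag
ARG VERSION=latest
FROM alpine:${VERSION}

LABEL description="Illustrative Dockerfile"
ENV APP_HOME=/opt/app
WORKDIR ${APP_HOME}

# Copy the script from the build context and make it executable
COPY hello.sh .
RUN chmod +x hello.sh

# Exec forms: ENTRYPOINT is the executable, CMD its default parameter
ENTRYPOINT ["/opt/app/hello.sh"]
CMD ["World"]
EOF
```

Under these assumptions, docker build -t demo-hello demo-dockerfile builds the image, docker run --rm demo-hello should greet World, and docker run --rm demo-hello Docker replaces the CMD default with Docker.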

Exercise 2. Running an Alpine Linux docker image.

Next we will run a minimal Docker image based on Alpine Linux named alpine, which is only 5 MB in size. 🐧

Let's create a local volume called myvol.

docker volume create myvol

Then type docker volume ls to list the available volumes, and find out what docker volume inspect myvol reports.

A local volume can be attached to a container with the -v myvol:/tmp option, which mounts myvol at the container's /tmp directory, so that files written to /tmp are stored in the volume.

Run

docker run -it --rm -v myvol:/tmp --name AlpineLinux alpine

This command will download the latest Alpine Linux image (if it is not already available locally) and run a container from it, using the following options:

  • -it runs the container in interactive mode inside the terminal.
  • --rm tells Docker to remove the container (not the image) when it exits.
  • -v myvol:/tmp mounts the volume myvol at the /tmp directory in the container.
  • --name assigns the fixed name AlpineLinux to the running container. If we don't specify a name, Docker assigns a random one on every run.
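To see what the -v myvol:/tmp option does in practice, the following sketch writes a file from one short-lived container and reads it back from another. The function name volume_persistence_demo is hypothetical; the commands themselves use only the options described above.

```shell
# Hypothetical check that data in a named volume outlives the container.
volume_persistence_demo() {
    # Creating a volume that already exists is harmless.
    docker volume create myvol &&
    # First container: write a file into the volume mounted at /tmp, then exit.
    docker run --rm -v myvol:/tmp alpine \
        sh -c 'echo "saved in myvol" > /tmp/sample.txt' &&
    # Second, brand-new container: read the file back from the same volume.
    docker run --rm -v myvol:/tmp alpine cat /tmp/sample.txt
}
```

The last line printed by volume_persistence_demo should be the text written by the first container, even though that container was already removed by --rm.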

Next, explore the Alpine container doing the following:

  • Use the apk update command, to update the available packages list.
  • If you want to use the nano editor, you will find that it is not installed. Use apk add nano to install it.
  • Change directory to /tmp and edit a sample.txt text file and save it.

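When the Alpine container stops, everything outside the mounted volume is lost, including the packages we installed, because the --rm option removes the container. Only files under /tmp persist, since they are stored in the volume myvol. We therefore need a way to save our work in an external directory that is available from the container.

To save a copy of the file we created inside the Alpine container, we can use the docker container cp command, which copies files and folders between a container and the local filesystem. Run it from a second terminal while the container is still running: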

docker container cp AlpineLinux:/tmp/sample.txt .

This copies the file sample.txt from the /tmp directory of the running container into the present working directory (.) of your terminal. The copy command works in both directions, to get files into the container or out of it.

Once you finish using this container, enter docker ps -a from a terminal to find the CONTAINER_ID, then enter docker stop CONTAINER_ID. Because this container was started with --rm, stopping it also removes it; a container started without --rm can be stopped, paused/unpaused, or restarted later.

Customizing a Docker image 🐳

Say we want to enhance our Alpine Linux base image by adding an editor and the ability to compile C code. To do so, we add the nano editor and the alpine-sdk development package.

Create/Edit a file named Dockerfile in one of your directories.

# The base image
FROM alpine:latest

LABEL author="your-name" 
LABEL email="your@email-address"
LABEL version="v1.0"
LABEL description="This is your first Dockerfile"
LABEL date_created="2022-05-10"

# Install dev environment (editors & gcc compilers)
RUN apk update && \
    apk add nano && \
    apk add alpine-sdk

Then we can build a new customized Alpine Linux image for software development, using the following command:

docker build -t linux/alpine-sdk:latest .

The -t option sets the name and tag, linux/alpine-sdk:latest, of the customized Docker image. The final . in the above command is the build context; by default Docker looks for the Dockerfile there, which in this case is the present working directory.

After running the above command, we can see that now we have a new docker image in our list.

docker image list

Now we can run the new docker image and test it.

docker run -it --rm \
  --name alpine-sdk -v myvol:/home/src \
  linux/alpine-sdk:latest

In your Docker container change directory to /home/src. Edit/copy the usual Hello World! in C (hello.c), using the nano editor:

// Simple C program to display "Hello World"
  
// Header file for input output functions
#include <stdio.h>
  
// main function -
// where the execution of program begins
int main()
{
  
    // prints hello world
    printf("Hello World! \n");
  
    return 0;
}

Then compile and run it.

gcc -o hello hello.c
./hello

and see if your program worked.
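The same compile-and-run step can be done without opening an interactive session. The sketch below assumes that hello.c is in the current directory on the host and that the linux/alpine-sdk:latest image from this section has been built; the function name is hypothetical, and -w is the docker run option that sets the working directory.

```shell
# Hypothetical helper: compile and run hello.c inside the custom image.
compile_hello_in_container() {
    # -v mounts the current directory at /home/src;
    # -w makes /home/src the working directory inside the container.
    docker run --rm \
        -v "$(pwd)":/home/src \
        -w /home/src \
        linux/alpine-sdk:latest \
        sh -c 'gcc -o hello hello.c && ./hello'
}
```

Because the current directory is mounted directly, the compiled hello binary is left next to hello.c on the host once the container exits.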

Exercise 3. Running an RStudio Server.

If we use RStudio for data analysis, then the Docker image rocker/rstudio can be used.

To run it, from a terminal we enter the following command:

docker run -it --rm \
   -v "$(pwd)":/home/rstudio -e PASSWORD=rs_rocks \
   -p 8787:8787 rocker/rstudio:latest

Where the options we have used are:

  • -it runs the container interactively, attached to the current terminal, where the container's log output is shown.
  • --rm deletes the container (not the image) after it stops.
  • -v "$(pwd)":/home/rstudio mounts the present working directory of the terminal running the process at the container directory /home/rstudio.
  • -e PASSWORD=rs_rocks sets the login password for the default user rstudio.
  • -p 8787:8787 publishes container port 8787 on host port 8787 (the format is -p HOST:CONTAINER), so that we can connect via the browser.
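If host port 8787 is already in use, only the host side of the mapping needs to change. The sketch below wraps the command in a hypothetical run_rstudio function that takes the host port as an optional argument; it also uses -d (detached mode) instead of -it and gives the container the arbitrary name rstudio so it can be stopped by name.

```shell
# Hypothetical helper: start RStudio Server in the background on a chosen host port.
run_rstudio() {
    host_port="${1:-8787}"   # host side of the mapping; defaults to 8787
    # -d detaches from the terminal; the container side of -p stays at 8787.
    docker run -d --rm --name rstudio \
        -v "$(pwd)":/home/rstudio \
        -e PASSWORD=rs_rocks \
        -p "${host_port}:8787" \
        rocker/rstudio:latest
}
```

For example, run_rstudio 8080 would make the server reachable at http://localhost:8080, and docker stop rstudio stops it.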

We then connect via browser to the RStudio Docker container landing page: http://localhost:8787

To log in, use the username rstudio and the password we set, rs_rocks.

Since we mapped our local present working directory to the /home/rstudio directory, all our R scripts can be accessed from there. Any saved edit will be saved in our local directory.

We are all set.

To stop the RStudio session, we need to save all our work from inside the RStudio container and then quit our session. Then use the command docker stop CONTAINER_ID to stop it.

See more information and R Docker Images options in The Rocker Project. 🚀

Exercise 4. Running a Jupyter Notebook.

Next, we show how to start a personal Jupyter Lab Notebook server in a local Docker container running the jupyter/datascience-notebook Docker image.

docker run -it --rm \
   -v "${PWD}":/home/jovyan/work \
   -p 8888:8888 jupyter/datascience-notebook

Where the following options are used:

  • -it runs the container interactively, attached to the current terminal, where the container's log output is shown.
  • --rm deletes the container (not the image) after it stops.
  • -v "${PWD}":/home/jovyan/work mounts the present working directory of the terminal running the process at the container directory /home/jovyan/work, which will appear in the Files section of the Jupyter Notebook.
  • -p 8888:8888 is the numeric port mapping, in the format -p External:Internal. The external port number is used to connect to the local machine running the container via http://127.0.0.1:8888/lab?token=TOKEN_ID.

After running the above command, the terminal will receive all messages from the running container. Look for lines similar to the following, which give instructions on how to access your container.

To access the server, open this file in a browser:
        file:///home/jovyan/.local/share/jupyter/runtime/jpserver-7-open.html
    Or copy and paste one of these URLs:
        http://43ec85338263:8888/lab?token=e53fd4dfeaca90cc78df58d1bde9bfb941f94041052eb5f6
     or http://127.0.0.1:8888/lab?token=e53fd4dfeaca90cc78df58d1bde9bfb941f94041052eb5f6

Copy the last URL, paste it into a web browser tab, and you are ready to start working.
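If the startup messages have scrolled away, the URL can be recovered from the container's log with docker logs. The sketch below is a hypothetical helper that filters log text for the 127.0.0.1 URL; it assumes the port and URL pattern shown in the sample output above.

```shell
# Hypothetical helper: pull the 127.0.0.1 access URL out of Jupyter's startup log.
extract_jupyter_url() {
    # Reads log text on stdin and prints the first matching URL.
    grep -o 'http://127\.0\.0\.1:8888/lab?token=[0-9a-f]*' | head -n 1
}
```

A possible use is docker logs CONTAINER_ID 2>&1 | extract_jupyter_url, where 2>&1 is included because the server may write its startup messages to standard error.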

You can copy a Jupyter Notebook into the working directory where your terminal is running, and it will show inside the Jupyter container, since the local directory is mapped to the working directory in the Jupyter Notebook. All changes made will be saved to the local machine.

To stop the Jupyter Notebook container, you can exit in the usual way by pressing Ctrl-C twice to shut down the server, or you can use the standard docker stop CONTAINER_ID.

You can read more about how to use this Docker image at Jupyter Docker Stacks.


