Docker

Docker is a standard for Linux containers.

  • own process space
  • own network interface
  • run processes as root (inside the container)
  • own disk space (can share with host too)
  • run in isolation (using the Linux kernel namespaces feature)

Docker Image - The representation of a Docker Container. Kind of like a JAR or WAR file in Java.

Docker Container - The standard runtime of Docker. Effectively a deployed and running Docker Image. Like a Spring Boot Executable JAR.

Docker Engine - The code which manages Docker stuff. Creates and runs Docker Containers.

Image Layer - Images are built in layers.

  • Layers receive an ID, calculated via a SHA256 hash of the layer contents.
  • Tags are references to IDs.
  • The format of the full tag name is: [REGISTRYHOST/][USERNAME/]NAME[:TAG]

Dockerfile - a recipe for an image.

A Docker Image is to a Container what a Java class is to an instance. A Dockerfile is like a .java source file.
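
A minimal sketch of the analogy (assuming a trivial alpine-based image):

# Dockerfile (the 'source file')
FROM alpine
CMD ["echo", "Hello"]

docker build -t hello .   # 'compile' the Dockerfile into an image
docker run hello          # 'instantiate' the image as a container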

Docker Compose coordinates multiple Docker containers.

Docker Swarm orchestrates Docker containers over multiple servers (cloud). An alternative to Kubernetes, OpenShift, Mesos.

Docker CLI

The CLI communicates with the Docker Engine daemon via REST.

  • DOCKER_HIDE_LEGACY_COMMANDS=true to hide legacy commands from cli help.

The docker command-line tool and the dockerd daemon talk to each other over network sockets. Docker has registered two ports with IANA: TCP port 2375 for unencrypted traffic and 2376 for encrypted SSL connections.

On a Mac or Windows a Linux virtual machine is needed in order to run the dockerd server. On Linux dockerd can run natively.

Execute a command inside a running container

docker container ls    # show running containers
docker container exec c1 echo Hello   # executes 'echo Hello' inside c1
docker container exec -it c1 sh       # executes an interactive command
^D

Detach from a container and attach

You can only detach from a container if you started it with the -it flags.

docker run -it --rm --name c1 alpine sh
^P^Q    # detach sequence (Ctrl+P, Ctrl+Q)
docker container attach c1

Log from a container

Containers use stdout/stderr (/dev/stdout and /dev/stderr) for their logging.

docker container logs c1
docker container logs -f c1    # follow the output

Redirect a log file into stdout by linking it in the Dockerfile:

RUN rm /var/log/myapp.log && ln -s /dev/stdout /var/log/myapp.log

Stop a running container

Sending ^C (SIGINT) to the Docker client forwards it to the process with PID 1 inside the container. The container terminates if and only if the process with PID 1 terminates.

docker container ls    # show running containers
docker container stop c1  # stops gracefully (SIGTERM first and then a SIGKILL)
docker container kill c1  # sends SIGKILL or specified signal

When running a custom script as CMD, make the shell run the command not as a new process (via exec), otherwise the container does not stop, as the shell does not forward signals to child processes:

#!/bin/sh
exec myapp   # replaces the current shell process with myapp

Generally speaking, make sure that the right application becomes the process with PID 1 and will receive signals.
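
For example, the shell form of CMD makes a shell PID 1, while the exec form makes the application itself PID 1 (a sketch; myapp is a hypothetical binary):

CMD myapp        # shell form: /bin/sh -c is PID 1, signals may not reach myapp
CMD ["myapp"]    # exec form: myapp is PID 1 and receives signals directly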

In a Dockerfile, STOPSIGNAL specifies the signal that will be sent to PID 1 when the container is stopped:

STOPSIGNAL SIGQUIT

Bind mount to a container

Paths must be absolute.

docker run -v /home/ttulka/html:/var/www/html:ro ...   # local:container read-only (ro)
docker run -v $(pwd)/html:/var/www/html:ro ...   # within the working directory
docker run --mount type=bind,src="$(pwd)/html",dst=/var/www/html,readonly ... 

Volumes

Bind mounts depend on the filesystem of the host; volumes are completely managed by Docker.

docker run -it --volume /data alpine
$ df -h
/
/data
...
^P^Q
docker volume ls   # show volumes, including the just created anonymous volume

  • Docker will automatically delete anonymous volumes that were created for containers started with the --rm flag.
  • When mounting a new volume to a path that already contains files and directories, those will be copied into the volume.
  • When mounting an existing volume to a path that already contains files and directories, the volume will contain only the data in the volume, not the pre-existing data.

Create a named volume inside a container mapped to /data:

docker run -it -v my-volume:/data alpine
docker run -it --mount type=volume,src=my-volume,dst=/data alpine

A new container started with an existing volume can access (and share) the data persisted in the volume.

docker volume prune    # delete all unused volumes
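
Volumes can also be managed explicitly (a sketch with a hypothetical volume name):

docker volume create my-volume
docker volume inspect my-volume   # shows e.g. the mountpoint on the host
docker volume rm my-volume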

A container can use volumes from another container and access the data simultaneously:

docker run --volumes-from c1 ...

An anonymous volume is created when a container based on such a Dockerfile is started without a volume being mounted:

# create a volume in the Dockerfile
VOLUME /var/lib/myapp/data

  • Use volumes for sensitive data.
  • Use volumes for persisting the data.
  • Use volumes for sharing data between containers (backups, configurations); see the backup sketch below.
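
A backup sketch using a shared volume, assuming a running container c1 with a /data volume:

docker run --rm --volumes-from c1 -v $(pwd):/backup alpine tar czf /backup/data.tar.gz /data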

Network between containers

When containers are connected to a user-defined network, they resolve each other's IPs automatically using an embedded DNS server (on 127.0.0.11).

  • You can use legacy links to add an alias of a container (--link name:alias).

docker network ls               # show available networks, 'bridge' is default
docker network create testnet   # creates a 'testnet' network
docker network inspect testnet  # shows info about the network
# runs 'c1' container as a daemon
docker run -d --rm --name c1 --network testnet alpine tail -f /dev/null    
# runs shell of a 'c2' container
docker run --rm -it --name c2 --network testnet alpine sh    
$ ping c1    # c1 is a known network node now
^C^D    # stop the ping, exit the shell
docker container stop c1

A container can be run with the host's network stack instead of the default networking configuration, on a per-container basis, with the --net=host command-line switch.

  • Might be needed for high-throughput applications.
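
For example (the container then shares the host's network stack):

docker run --rm --net=host alpine ip addr   # lists the host's network interfaces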

Pushing an image into a repository

docker login
docker image tag myimg:latest ttulka/myimg:latest    # tag the image
docker image push ttulka/myimg:latest

Setting environment variables

# changes the prompt text to host and current working path
docker run -it -e "PS1=\h:\w# " alpine sh
b305c9a948f5:/# cd /etc/opt/
b305c9a948f5:/etc/opt#

# setting an env var in the Dockerfile, can be overridden with the -e flag
ENV PS1 "\h:\w# "
ENV PS2 ">> "
# alternatively:
ENV PS1="\h:\w# " PS2=">> "

# shows the current env vars
docker run alpine env
# sets env vars from a local file
docker run --env-file app.conf alpine env

By setting an env var via the -e flag without a value, Docker will look for an environment variable with the same name locally and use its value if it exists; otherwise the variable will not be set:

docker run -e MYVAR alpine 
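
A quick check of this pass-through behavior (MYVAR is a hypothetical variable):

export MYVAR=hello
docker run --rm -e MYVAR alpine env | grep MYVAR   # prints MYVAR=hello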

Container resources

docker container stats    # shows statistics about running containers
docker container run --memory 256M --cpus 0.25 ...   # limits resources
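
Related commands, as a sketch (c1 is a hypothetical running container):

docker container stats --no-stream   # one-shot snapshot instead of a live stream
docker container update --memory 512M --memory-swap 512M c1   # change limits of a running container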

Dockerfile

Layers represent filesystem changes. Each COPY, ADD, and RUN instruction results in a new layer. CMD and other instructions only add metadata.

docker image inspect myimage   # shows image's layers

Image size can be optimized by squashing all newly built layers into one (an experimental feature):

docker build --squash .
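
As --squash is experimental, a common alternative is chaining commands in a single RUN instruction so that temporary files never end up in a layer:

RUN apk update && apk add curl && rm -rf /var/cache/apk/*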

Image arguments

ARG variables are only available at build time. Use ENV to make values available at run time.

ARG MYARG1=Docker    # 'Docker' is the default value
ARG MYARG2           # no default value

ENV MYVAR1=$MYARG1      # resolves MYARG1 as the env var value
ENV MYVAR1=${MYARG1}_x  # explicit boundaries for the variable name
ENV MYVAR2=${MYARG2:-Mydef}  # explicit default value if MYARG2 not set
ENV MYVAR3=${MYARG2:+true}   # explicit value if MYARG2 is set

docker build -t myimage --build-arg MYARG1=Abc .   # passes a build-time argument

Copying files into the image

Go-style wildcards can be used.

  • COPY just copies files. Use this as default.
  • ADD behaves as COPY with additional functionality:
    • Extracts .tar and tar.gz archives
    • Downloads files if defined as a URL

The source can be relative to the build path of the docker image build command.
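
A sketch of the variants (paths and URL are hypothetical):

COPY src/ /app/src/                       # plain copy
COPY *.json /app/                         # Go-style wildcard
ADD app.tar.gz /app/                      # the archive is extracted automatically
ADD https://example.com/file /app/file    # the file is downloaded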

Execute a command

  • CMD defines the default command for the container run. It can be overridden at execution.
  • ENTRYPOINT executes always, even when a custom execution is set.

The CMD instruction will be passed to the ENTRYPOINT as arguments if the ENTRYPOINT uses the exec form.

FROM alpine

COPY setup.sh /
COPY start.sh /

RUN chmod +x /setup.sh && chmod +x /start.sh

ENTRYPOINT ["./setup.sh"]   # executes always
CMD ["./start.sh"]          # default, used if no custom command provided

#!/bin/sh
# setup.sh
echo "setting up..."
exec "$@"    # run a custom command as the main process

#!/bin/sh
# start.sh
echo "starting..."

docker run -t myimage
> setting up...
> starting...

docker run -t myimage echo Hello    # with a custom command
> setting up...
> Hello

You can use ENTRYPOINT to simulate a command:

FROM alpine
RUN apk update && apk add curl
ENTRYPOINT ["/usr/bin/curl"]
CMD ["--help"]

docker run mycurl https://www.docker.com   # use a container as a command

.dockerignore

A newline-separated list of files and folders to exclude from the build context (and so from the image).
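
A typical example (entries are assumptions for a Node.js project):

.git
node_modules
*.log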

Publishing ports

EXPOSE instruction documents ports the service offers.

# publishes container ports 80 and 81 to local ports 0.0.0.0:8080 and 127.0.0.1:8081
docker run -p 8080:80 -p 127.0.0.1:8081:81 ...  
# publishes ranges 80-81 
docker container run -p 80-81:80-81 ...
# publishes the port 80 and assigns a random local port
docker run -p 80 ...
# publishes all exposed ports to random local ports
docker run -P ...
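
To see the actual mappings of a running container (c1 is hypothetical):

docker container port c1   # e.g. 80/tcp -> 0.0.0.0:8080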

Running as a specific user

RUN adduser -u 1000 -D myuser    # add a new user
USER myuser

The user is used only for executing instructions like RUN or CMD; the copied files don't change their owner, this must be done with COPY --chown or at the application level.
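
Ownership can be set at copy time, assuming the user already exists in the image:

COPY --chown=myuser:myuser app/ /app/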

With user namespaces enabled, a new user inside a container is not a new user in the Docker host's main namespace (and vice versa), and UID 0 (root) in a user namespace is not the same as UID 0 on the host. Still, running as root inside the container increases the risk of potential security exploits.

By default (without user namespaces), the root user in the container is actually the root user on the system, with some extra constraints.

Unless otherwise configured, Docker starts all services in containers as root.

A container is not a substitute for good security practices: if the application would traditionally run as a nonprivileged user on a server, then it should run in the same manner inside the container.

Multi-Step Builds

We can use multiple base images in a Dockerfile when we need a different image for a preparation phase of the build and don't need all the dependencies from those images in the final image.

Example: The final image drops everything from the builder image except the copied build artifacts:

FROM node:alpine as builder
WORKDIR /app
COPY package.json .
RUN npm install
COPY . .
RUN npm run build

FROM nginx
COPY --from=builder /app/build /usr/share/nginx/html

Docker Compose

For testing and development purposes.

# docker-compose.yml
version: '3.8'

services:
  webserver:
    image: nginx:latest
    ports:
      - 80:80
    volumes:
      - ./html:/usr/share/nginx/html
  database:
    image: postgres:9.6-alpine
    volumes:
      - pg-data:/var/lib/postgresql/data
  app:
    image: ttulka/myapp1
    build: 
      context: ./myapp1 
    ports:
      - 8080:8080
    depends_on:
      - webserver
      - database

volumes:    # just for named volumes
  pg-data:

docker-compose up -d --build --scale app=3
docker-compose ps
docker-compose down
docker volume rm composeexample_pg-data

  • Docker Compose will automatically tag the image with the name specified in the Compose file.
  • Docker Compose will take care of creating a network alias app for every container of the app service.

docker-compose run --rm app mvn test    # run a command inside the container

Kubernetes

System for running many different containers over multiple different machines.

  • Expects all images to already be built.
  • We have to set up all networking manually.

Everything in Kubernetes is a declarative configuration object that represents the desired state of the system. The service takes actions to ensure the desired state becomes the actual state.

Kinds of objects:

  • Pod: A group of one or more closely related containers that must run together; the basic kind of object, the smallest deployable unit (see the manifest sketch after this list).
  • Service: Sets up networking in a Kubernetes cluster, load-balancing, naming, discovery.
    • ClusterIP
    • NodePort: Exposes a container to the outside world (DEV purposes only!).
    • LoadBalancer
    • Ingress
  • Namespace: Provides isolation and access control, so that each microservice can control the degree to which other services interact with it.
  • Ingress: Provides an easy-to-use frontend that can combine multiple microservices into a single externalized API; an HTTP-based load-balancing system implementing the "virtual hosting" pattern.
  • Deployment: Manages the release of new versions, enables to easily move from one version of code to the next.
  • Job: Runs short-lived, one-off tasks until successful termination (i.e. exit with 0), such as database migrations or batch jobs. Automatically picks a unique label to identify the Pods it creates.
    • CronJob is responsible for creating new Job objects at a particular interval.
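
A minimal Pod manifest as a sketch (names are hypothetical):

# pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp
  labels:
    app: myapp
spec:
  containers:
    - name: myapp
      image: ttulka/myapp1
      ports:
        - containerPort: 8080

kubectl apply -f pod.yaml   # reconciles the desired state with the actual state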

An application liveness health check runs application-specific logic (e.g. loading a web page) to verify that the application is not just still running, but functioning properly.
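
A sketch of an HTTP liveness probe in a Pod spec (/healthz is a hypothetical endpoint):

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10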

Labels are used to identify and optionally group objects in a cluster. Also used in selector queries to provide flexible runtime grouping of objects such as Pods.
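
For example, selecting Pods by label (app=myapp is hypothetical):

kubectl get pods --show-labels    # list Pods with their labels
kubectl get pods -l app=myapp     # selector query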

Annotations provide object-scoped key/value storage of metadata that can be used by automation tooling and client libraries.

DaemonSet ensures a copy of a Pod is running across a set of nodes in a cluster. Used to deploy system daemons such as log collectors and monitoring agents, which typically must run on every node.

  • Not a traditional serving application, but rather additional capabilities and features of the cluster itself.

A ReplicaSet should be used when the application is completely decoupled from the node and multiple copies can be run on a given node without special consideration; a DaemonSet should be used when a single copy of the application must run on all or a subset of the nodes in the cluster.

ConfigMaps and secrets are a great way to provide dynamic configuration in the application.
