Cheatsheets - ovokpus/MLOps-Learn GitHub Wiki

Docker

Terminology

  • Container: an environment that runs an application independently of the host OS, somewhat like a lightweight VM. Containers are stateless; if you need to update the components inside, create a new container instead.
  • Image: a template used to create containers. Its contents are defined by a Dockerfile.
  • Volume: a storage area detached from the container, used for maintaining state (see the example after this list).
  • Foreground/interactive vs background/detached: a detached container runs in the background, whereas an interactive container usually provides a terminal of some sort to interact with.
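For example, a named volume can persist a database's data across container restarts. A minimal sketch (the volume name, container name, and postgres image choice here are hypothetical):

# create a named volume, then mount it into a container
docker volume create my-data
docker run -d --name db -e POSTGRES_PASSWORD=secret -v my-data:/var/lib/postgresql/data postgres:15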

Commands

The commands are organized into common commands and a more exhaustive list grouped around the management of a specific Docker component. Each group uses a different syntax. For a common command and a management command respectively, the usage is (an example of each follows the list):

* docker command-name [options] 

* docker management-group command-name [options]
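For instance, listing running containers works with either form; these two commands are equivalent:

docker ps
docker container ls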

Try running a container with some options specified:

docker run --name web-server -d -p 8080:80 nginx:1.12

This runs the nginx web server in a container using the official nginx image. The meanings of the command options are:

--name container_name: Label the container container_name. In the command above, the container is labeled web-server. This is much more manageable than referring to the container by its auto-generated ID (e.g., 31f2b6715...).

-d: Detach the container by running it in the background and print its container id. Without this, the shell would be attached to the running container command and you wouldn't have the shell returned to you to enter more commands.

-p host_port:container_port: Publish the container's port container_port on the host's port host_port. In the nginx command, this connects the host's port 8080 to the container's port 80 (http). Since no command was specified, the container runs the image's default command, which starts the web server.
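You can then verify that the server is reachable through the published port:

curl -i http://localhost:8080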

List your local images

* docker images

Clean up images (several ways; see the combined example below)
* docker images -q -f dangling=true (list the IDs of dangling images)
* docker image rm image_name_or_id (remove a specific image)
* docker image prune (remove all dangling images)
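These can be combined to remove all dangling images in one shot, roughly equivalent to docker image prune (note that docker rmi errors out if the list is empty):

docker rmi $(docker images -q -f dangling=true)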

List your running containers
* docker ps

Run a Docker image inside a container

* docker run -it --rm image_name:tag
    * `-it` is a combination of `-i` (interactive mode) and `-t` (allocate a terminal).
    * `--rm` means that the container will be removed when exited.
    * You may find Docker images at the [Docker Hub](https://hub.docker.com/).
    * This command will use the entrypoint defined by the image. It won't necessarily open a terminal inside the container.

Run a Docker image inside a container and override the entrypoint
* docker run -it --rm --entrypoint=bash image_name:tag
    * This will override the entrypoint of your image and open a bash terminal inside the container instead.

Run a Docker image inside a container and map a port in the container to a port in the host machine
* `docker run -it --rm -p 9696:9696 image_name:tag`

Create a Dockerfile with instructions to create a basic custom Docker image.

# set base image
FROM python:3.9

# set the working directory in the container
WORKDIR /app

# copy dependencies to the working directory
COPY requirements.txt .

# Install dependencies
RUN pip install -r requirements.txt

# Copy code to the working directory
COPY . /app

# command to run on container start
CMD ["python", "./main.py"]
  • Docker processes each line as a layer. Layers are cached, so to speed up build times, first copy and install the parts that rarely change (such as dependency files) and only then copy your own code, as shown in this example.
  • Base images are useful because they save a lot of work and build time. Choose a lean base image and avoid unnecessary packages (a .dockerignore file helps; see the sketch after this list).
  • Each container should only have one concern. Decouple applications into multiple containers.
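A .dockerignore file at the root of the build context keeps unneeded files out of the image and speeds up builds. A minimal sketch (the entries are examples; adjust them to your project):

.git
__pycache__/
*.pyc
.venv/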

Create a slightly more complex Dockerfile with pipenv dependencies and specific entrypoints.

# set base image
FROM python:3.9

# (pipenv) install pipenv
RUN pip install pipenv

# set the working directory in the container
WORKDIR /app

# (pipenv) copy dependencies to the working directory
COPY ["Pipfile", "Pipfile.lock", "./"]

# (pipenv) Install dependencies
# (pipenv) We don't need a virtualenv in Docker, so we can install dependencies to the system
RUN pipenv install --system --deploy

# Copy the model
COPY ["predict.py", "model.bin", "./"]

# Expose a port on the container
# Remember to map the port to a port in the host when running the container!
EXPOSE 9696

# Specify entrypoint
ENTRYPOINT ["gunicorn", "--bind=0.0.0.0:9696", "predict:app"]
  • The COPY instruction has two forms, both shown in these examples. The second (exec/JSON) form, used for pipenv here, must be used if any path may contain whitespace. The last parameter is always the destination directory, which may be . or ./ to copy into the directory specified by WORKDIR. The two forms are contrasted in the snippet below.
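For illustration, the two COPY forms side by side (the file names are hypothetical):

# shell form
COPY requirements.txt .
# exec/JSON form, required if a path contains whitespace
COPY ["my config.yaml", "./"]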

Build an image based on a Dockerfile

  • docker build -f Dockerfile -t my_image .
    • The final argument (. here) is the build context. By default the command looks for a file named Dockerfile at the root of the build context, so if your Dockerfile sits in the directory where you run the command and you haven't renamed it, -f Dockerfile can be dropped from the command.
    • my_image will be the name of your image. You may optionally tag it like so: my_image:my_tag (see the example below).
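For example, building a tagged image from a Dockerfile kept under a non-default name and path (all names here are hypothetical):

docker build -f docker/Dockerfile.dev -t my_image:my_tag .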

Stop a running container

  • docker stop container_id (the container name works too, e.g. docker stop web-server)

Docker compose

Example docker-compose.yaml file.

version: "3.9"
services:
  model-server:
    image: zoomcamp-10-model:v1
  gateway:
    image: zoomcamp-10-gateway:v2
    environment:
      - TF_SERVING_HOST=model-server:8500
    ports:
      - "9696:9696"
  • version is required by `docker-compose`
  • The app has 2 components: model-server and gateway
  • Each component must have a Docker image.
  • You may specify environment variables with environment and port mappings with ports
    • The dash (-) means that the entry is a list. In this example there are 2 lists with a single element each.

Run the app.

docker-compose up

Run the app in detached mode.

docker-compose up -d

Shut down the app

docker-compose down

Docker 2.0 (Advanced, for sample flask application)

The Dockerfile

#CODE1.0:
#use FROM to configure the base container image to build on
#see - https://docs.docker.com/engine/reference/builder/#from
FROM python:3

#CODE1.1:
#use RUN to install the flask python library using the pip command
#see - https://docs.docker.com/engine/reference/builder/#run
RUN pip install flask

#CODE1.2:
#use RUN to create an empty directory to host the python flask main.py application file
#see - https://docs.docker.com/engine/reference/builder/#run
RUN mkdir -p /corp/app

#CODE1.3:
#use WORKDIR to change the current working directory
#see - https://docs.docker.com/engine/reference/builder/#workdir
WORKDIR /corp/app

#CODE1.4:
#use COPY to copy across the main.py file into the current working directory
#see - https://docs.docker.com/engine/reference/builder/#copy
COPY main.py .

#CODE1.5:
#use ENV to set the FLASK_APP environment variable - tells the flask runtime where to start
#see - https://docs.docker.com/engine/reference/builder/#env
ENV FLASK_APP=/corp/app/main.py

#CODE1.6:
#use ENV to set the APP_NAME environment variable - ref and used in the main.py file 
#see - https://docs.docker.com/engine/reference/builder/#env
ENV APP_NAME=CloudAcademy.DevOps.Dockerfile

#CODE1.7:
#use the CMD to set the default execution for the container when launched
#see - https://docs.docker.com/engine/reference/builder/#cmd
CMD ["flask", "run", "--host=0.0.0.0"]

Use the docker build command to build and create a new custom Docker image. Navigate to the directory containing the updated Dockerfile for the Flask web application. In the terminal enter the following command:

cd lab/code/App/lab-code/flaskapp/

# Build a new docker image and tag it. In the terminal enter the following command
docker build -t cloudacademydevops/flaskapp .

# Query the local docker images to see the newly built image. In the terminal enter the following command: 
docker images

# Launch a new container instance off the newly built image. In the terminal enter the following command: 
docker run --name lab1 --rm -d -p 3000:5000 cloudacademydevops/flaskapp:latest

# Run a curl command against the newly launched container instance. In the terminal enter the following command: 
curl -i http://localhost:3000

# Query, store, and echo out the PUBLIC IP address of the workstation which is hosting the launched container instance. In the terminal enter the following commands: 
PUBLIC_IP=`curl -s ifconfig.co`
echo $PUBLIC_IP

# Run another curl command against the launched container instance using the PUBLIC IP address of the workstation. In the terminal enter the following command: 
curl -i http://$PUBLIC_IP:3000

# Query, store, and echo out the PRIVATE IP address of the launched container instance. In the terminal enter the following commands: 
PRIVATE_CONTAINER_IP=`docker inspect -f '{{ .NetworkSettings.IPAddress }}' lab1`
echo $PRIVATE_CONTAINER_IP

# Run another curl command against the launched container instance using the PRIVATE IP assigned to the container. In the terminal enter the following command: 
curl -i http://$PRIVATE_CONTAINER_IP:5000

# Examine the docker logs collected for the launched container instance. In the terminal enter the following command:
docker logs lab1
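When you are done, stop the container. Because it was launched with --rm, stopping it also removes it:

docker stop lab1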

docker-compose.yaml

version: '3.1'

services:

  #CODE2.0:
  #configure the upstream NGINX reverse proxy container and mount the nginx.conf file
  #see - https://docs.docker.com/compose/compose-file/
  nginx:
    image: nginx:1.13.7
    container_name: nginx
    depends_on:
        - flask
    volumes:
        - ./nginx.conf:/etc/nginx/conf.d/default.conf
    networks:
        - cloudacademy
    ports:
        - 80:80

  #CODE2.1:
  #configure the downstream FLASK application container and configure environment vars
  #see - https://docs.docker.com/compose/compose-file/
  flask:
    image: cloudacademydevops/flaskapp:latest
    container_name: flask
    environment:
        - FLASK_APP=/corp/app/main.py
        - APP_NAME=CloudAcademy.DevOps.DockerCompose
    command: flask run --host=0.0.0.0
    networks:
        cloudacademy:
        aliases:
            - flask-app
    ports:
        - 5000:5000

networks:
  cloudacademy:
    driver: bridge

You are now ready to use the docker-compose command to validate and build the dual container setup. Navigate to the directory containing the updated docker-compose.yaml file. In the terminal enter the following command:

cd /cloudacademy/lab/code/App/lab-code/dockercompose

# Validate the docker-compose.yaml file. In the terminal enter the following command:
docker-compose config

# Use the docker-compose up --detach command to launch the dual container setup. In the terminal enter the following command:
docker-compose up --detach

# Use the curl command to test the dual container setup by sending an HTTP request to the Nginx container port 80 - which will in turn proxy the HTTP request downstream to the Flask container. In the terminal enter the following command 
curl -i http://localhost

Kubernetes

Kind

Create local cluster

kind create cluster

Delete local cluster

kind delete cluster

Load an image to the local cluster

kind load docker-image docker-image:tag
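A typical local workflow is to build an image, load it into the kind cluster, and then deploy it with kubectl (the image and manifest names here are hypothetical):

docker build -t my-model:v1 .
kind load docker-image my-model:v1
kubectl apply -f deployment.yaml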

eksctl

Create a default cluster on EKS.

eksctl create cluster

Create a cluster with a config YAML file

eksctl create cluster -f eks-config.yaml

Example eks-config.yaml

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: mlzoomcamp-eks
  region: eu-west-1

nodeGroups:
  - name: ng-m5-xlarge
    instanceType: m5.xlarge
    desiredCapacity: 1
  • metadata contains both the name of the cluster and the AWS region.
  • nodeGroups contains a list of node groups. In this example the list has a single entry.
    • desiredCapacity is the number of nodes in the node group.
    • instanceType is the desired AWS EC2 instance type for the node group. All nodes in the group will be of that instance type.

Delete a cluster

eksctl delete cluster -f eks-config.yaml

kubectl

kubectl command cheatsheet
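A few of the most commonly used kubectl commands, for quick reference (resource names are placeholders):

kubectl apply -f manifest.yaml                       # create/update resources from a manifest
kubectl get pods                                     # list pods in the current namespace
kubectl get deployments                              # list deployments
kubectl get services                                 # list services
kubectl describe pod pod-name                        # detailed info on a pod
kubectl logs pod-name                                # print a pod's container logs
kubectl exec -it pod-name -- bash                    # open a shell inside a pod
kubectl port-forward service/service-name 8080:80    # forward a local port to a service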

Example deployment.yaml file

apiVersion: apps/v1
kind: Deployment
metadata:
  name: <deployment-name>
spec:
  replicas: 1
  selector:
    matchLabels:
      app: <app-name>
  template:
    metadata:
      labels:
        app: <app-name>
    spec:
      containers:
      - name: <my-container>
        image: my-component-image:some-tag
        resources:
          limits:
            memory: "128Mi"
            cpu: "100m"
        ports:
        - containerPort: 9696
        env:
          - name: TF_SERVING_HOST
            value: <service-name>.<namespace>.svc.cluster.local:8500
  • kind must be Deployment
  • metadata.name contains the name of the deployment
  • spec.replicas states how many pods should be replicated in the deployment. This example file only states 1 replica.
  • spec.selector defines how the deployment finds which pods to manage. spec.selector.matchLabels is a rule that will match a label in the pod template (the label in this case is app:<app-name>)
  • spec.template contains the blueprint for the pods:
    • metadata in this example contains the labels we use for the pods so that the deployment can find and manage them.
    • spec.template.spec.containers contains a plethora of info:
      • name is the name of the containers inside the pod.
      • image is the Docker image to be used by the containers.
      • resources states the physical resource limits
        • For CPU, 100m means 100 millicores, i.e. 10% of one CPU core's time.
      • ports contains the ports to use by the containers.
      • env contains names and values for environment variables, useful for apps to be able to find other containers by their internal cluster URL.
        • When defining a service, Kubernetes publishes a DNS entry inside the Cluster to make it possible for pods to find other pods. These DNS entries follow the <service-name>.<namespace>.svc.cluster.local:<port> format.
        • The default namespace is default.
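To deploy this manifest and test it quickly without a Service, you can apply it and port-forward straight to the deployment (the names are the placeholders from the manifest above):

kubectl apply -f deployment.yaml
kubectl port-forward deployment/<deployment-name> 9696:9696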

Example service.yaml file.

apiVersion: v1
kind: Service
metadata:
  name: <service-name>
spec:
  type: LoadBalancer
  selector:
    app: <app-name>
  ports:
  - port: 80
    targetPort: 9696
  • kind must be Service
  • metadata.name contains the name of the service
  • spec.type specifies the type of Service.
    • Internal services are of type ClusterIP. This is the default service type if this field is not stated in the file.
    • External services are of type LoadBalancer and are assigned an external IP.
  • spec.selector contains the label used to match the pods of the deployment that the service belongs to.
  • spec.ports contains both the port of the service (port) as well as the port of the deployment (targetPort).
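Apply the service and check the external IP assigned to it (on a local kind cluster a LoadBalancer service will stay in pending unless you install something like MetalLB; on EKS an external address is provisioned automatically):

kubectl apply -f service.yaml
kubectl get service <service-name>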

Basic git

  1. Make sure your local copy of the selected branch is updated.
    • git pull
  2. Check your repo branches
    1. Local branches
      • git branch
    2. All branches on remote repo
      • git branch -r
  3. Create a branch and access it
    1. Normal way
      1. git branch new_branch
      2. (2 ways)
        • git checkout new_branch
        • git switch new_branch
    2. Shortcut (2 ways)
      • git checkout -b new_branch
      • git switch -c new_branch
  4. Get some work done lol
  5. Check the status of your work
    • git status
  6. Add changes to staging in order to prepare your commit
    1. Add a single file
      • git add new_file.txt
    2. Add all changed files
      • git add .
    3. Review each change interactively before staging it
      • git add -p
  7. Did you screw up? Reset the staging
    • git reset
  8. Commit
    • git commit -m "This is a commit message"
  9. Check the commit history of the branch you're in
    • git log
  10. Make sure you upload your commits to the remote repo! If your local branch is brand new, you must add it to the remote repo.
    1. New branch
      • git push -u origin new_branch
    2. Previously existing branch
      • git push
  11. Move to another branch
    • git checkout another_branch
  12. Merge some branch into your current branch
    • git merge branch_that_will_be_merged_into_current_branch

For more info check the GitHub Git Cheat Sheet

Advanced git

The following are some best practices that may be useful, taken from this blog post

  1. While working on a branch, if you need to pull commits from the remote repo to your local repo, use rebase instead of merge to reduce the amount of commits
    • git pull --rebase
    • If you want to make rebasing the default behavior when doing git pull, do so with git config --global --bool pull.rebase true
  2. Before pushing your changes to the remote repo, perform basic housekeeping (squash related commits together, reword messages, etc.)
    • git rebase -i @{u}
  3. Merge (do not rebase) changes from master/main into your branch, in order to update the branch with the latest features and solve any compatibility issues and/or conflicts
    1. git merge main
    2. git pull --no-rebase origin main
  4. Enforce merge commit when merging feature branch into main, even if a merge commit isn't necessary (check next point for exception), in order to make it easier to see the where and when of changes. Assuming you're in main:
    • git merge --no-ff branch_that_will_be_merged_into_main
  5. Exception to point 4: if you only need to merge a single commit (typical for stuff such as bugfixes). Assuming you're in main:
    • git cherry-pick branch_that_only_has_a_single_commit
  6. Delete merged branch:
    1. Delete locally
      • git branch -d branch_that_has_been_merged
    2. Delete on remote repo (two equivalent ways)
      • git push origin --delete branch_that_has_been_merged
      • git push origin :branch_that_has_been_merged

Create a remote repo (local folder as remote repo)

Official method

Source

  1. Make sure you've got a local commit. You may initialize a local repo with git init in any project folder, making sure it has at least one commit, or you may use an already existing local repo.
  2. On a separate folder, run:
    git clone --bare path/to/local/project project.git
    • This will create a folder named project.git in the folder where you run the command.
    • By convention, remote repo folders use the .git extension.
    • This folder is a bare repository: it contains no working tree, only the git files.
  3. Move the project.git folder to the final destination. Ideally, a shared folder such as a networked drive that everyone has access to "locally".
    • You may combine steps 2 and 3 by creating the bare repo directly on the final folder.
  4. You should now be able to clone the repo:
    git clone path/to/remote/repo/project.git
  5. The original repo that we bare-cloned does not have an origin repo to push to. If you want to keep using it, set up a remote like this:
    git remote add origin path/to/remote/repo/project.git

Alternative method

Source

  1. On remote folder:
    mkdir my_repo
    cd my_repo
    git init --bare
  2. On local folder:
    cd my_repo
    git init
    git remote add origin ssh://myserver/my_repo
    git add .
    git commit -m "Initial commit"
    git push -u origin master

Multipass

Tool to easily run Ubuntu VMs from the command line.

List available instances

  • multipass list

Create and launch a new instance using the latest LTS release

  • multipass launch --name my_instance
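You can also size the instance at launch. A hedged example (recent Multipass versions support the flags below; older releases use --mem instead of --memory):

  • multipass launch --name my_instance --cpus 2 --memory 2G --disk 10G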

Access the instance shell

  • multipass shell my_instance

Mount a shared folder in the instance

  • multipass mount path/to/local/folder my_instance:path/to/instance/folder

Unmount all mounted folders of instance

  • multipass umount my_instance

Stop an instance

  • multipass stop my_instance

Start a previously created instance

  • multipass start my_instance

Get info on a specific instance

  • multipass info my_instance

Delete an instance (send it to the recycle bin)

  • multipass delete my_instance

Recover a deleted instance

  • multipass recover my_instance

Permanently delete all deleted instances

  • multipass purge