17 ‐ Docker Swarm
Docker Swarm is Docker's native orchestration tool for managing a cluster of Docker nodes (machines) in a distributed environment. It turns a group of Docker Engines into a single cluster and automates container deployment, scaling, load balancing, and high availability. As a DevOps engineer, understanding Docker Swarm is essential for orchestrating containerized workloads and deploying applications in production environments. Here's a summary of key concepts and practical notes for working with Docker Swarm.
- Cluster: A group of machines (physical or virtual) running Docker Swarm.
- Node: A single Docker engine that is part of a Swarm cluster.
- Manager Node: Controls the swarm, handles the scheduling of tasks, and manages the Swarm state.
- Worker Node: Runs container tasks as assigned by the manager node.
- Swarm Mode: Docker's native clustering and orchestration feature that can be activated with a single command.
- Service: A containerized application that you want to run in the Swarm cluster. It defines how containers should run, including replicas, update policies, and scaling.
- Task: A running instance of a service. It represents a container managed by Swarm.
- Stack: A group of services defined in a docker-compose.yml file that can be deployed together as a unit in the swarm.
To start using Docker Swarm, you need at least one manager node and one or more worker nodes.
docker swarm init
This command initializes the Swarm cluster and designates the current machine as the manager node. After running it, Docker prints a docker swarm join command containing a token that other nodes can use to join the cluster.
On each worker node, run the command provided after initializing the swarm on the manager node:
docker swarm join --token <WORKER-TOKEN> <MANAGER-IP>:2377
This command will join the worker node to the Swarm cluster.
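Put together, the bootstrap steps above look like the sketch below. It is a dry run that only prints the commands it would issue, since it assumes a hypothetical manager address (10.0.0.10) and no running Docker daemon; on real machines, run the printed commands directly instead.

```shell
#!/bin/sh
# Dry-run sketch of bootstrapping a two-node swarm.
# MANAGER_IP is a hypothetical address; substitute your manager's IP.
MANAGER_IP="10.0.0.10"

# On the manager: initialize the swarm, advertising a fixed address.
echo "manager\$ docker swarm init --advertise-addr ${MANAGER_IP}"

# If the worker token is lost later, re-print it on the manager with:
#   docker swarm join-token worker

# On each worker: join using the token printed by `docker swarm init`.
JOIN_CMD="docker swarm join --token <WORKER-TOKEN> ${MANAGER_IP}:2377"
echo "worker\$  ${JOIN_CMD}"

# Back on the manager: confirm all nodes show up as Ready.
echo "manager\$ docker node ls"
```

Port 2377 is the cluster-management port Swarm listens on by default; it must be reachable from the workers.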
docker node ls
This command shows the status of all nodes in the Swarm cluster, including manager and worker nodes.
Docker Swarm uses services to define containers that should be deployed and managed. A service can be created directly with docker service create or defined in a docker-compose.yml file, and can scale horizontally across nodes.
docker service create --name <service-name> --replicas <number-of-replicas> <image-name>
- Example: Running an Nginx service with 3 replicas:
docker service create --name nginx --replicas 3 -p 80:80 nginx
You can scale the number of containers for a service:
docker service scale <service-name>=<new-replica-count>
Example:
docker service scale nginx=5
To see the active services in the Swarm:
docker service ls
docker service inspect <service-name>
You can update a service with a new image or configuration:
docker service update --image <new-image> <service-name>
docker service rm <service-name>
Docker Swarm provides built-in networking features that make it easier to connect containers running on different nodes.
An overlay network allows containers on different nodes to communicate securely. Swarm creates a default overlay network (ingress) when the swarm is initialized; you can also create your own:
docker network create --driver overlay <network-name>
Services can be exposed using published ports. When you publish a port on a service, Swarm's routing mesh makes that port reachable on every node in the cluster, and Docker Swarm automatically load-balances incoming traffic across the replicas.
docker service create --name <service-name> -p <host-port>:<container-port> <image-name>
For example, to expose a web service:
docker service create --name webapp -p 80:80 myapp-image
By default, Swarm enables internal DNS so containers can reach each other by service name, for example http://<service-name>:<port>.
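As a sketch combining the overlay-network and service-discovery ideas above, a hypothetical stack file could attach two services to a user-defined overlay network; app can then reach web at http://web:80 through Swarm's internal DNS (the service names web and app, the image myapp, and the network name backend are illustrative):

```yaml
version: "3.8"
services:
  web:
    image: nginx
    networks:
      - backend
  app:
    image: myapp          # hypothetical application image
    networks:
      - backend
    # Inside this container, `curl http://web:80` resolves via Swarm DNS.
networks:
  backend:
    driver: overlay       # spans all nodes in the swarm
```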
You can define and deploy a complete multi-service application in Docker Swarm using Docker Compose files.
Example of a stack definition in docker-compose.yml:
```yaml
version: "3.8"
services:
  web:
    image: nginx
    deploy:
      replicas: 3
    ports:
      - "80:80"
  app:
    image: myapp
    deploy:
      replicas: 2
```
docker stack deploy -c docker-compose.yml <stack-name>
This command deploys all services defined in the compose file as a stack to the swarm cluster.
docker stack ls
docker stack rm <stack-name>
Docker Swarm supports rolling updates for services, minimizing downtime during deployments.
docker service update --image <new-image> <service-name>
Swarm will update the service gradually by stopping old containers and starting new ones to avoid downtime.
If there are issues with the update, you can roll back to the previous version:
docker service rollback <service-name>
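The pace of a rolling update can also be declared in the stack file using Compose's deploy.update_config keys; a sketch with illustrative values:

```yaml
services:
  web:
    image: nginx
    deploy:
      replicas: 3
      update_config:
        parallelism: 1            # update one task at a time
        delay: 10s                # wait 10s between batches
        failure_action: rollback  # roll back automatically if the update fails
```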
Docker Swarm supports the management of sensitive data (e.g., API keys, passwords) using Docker Secrets.
To create a secret:
echo "my_secret_password" | docker secret create my_secret_password -
To use a secret in a service:
```yaml
services:
  myapp:
    image: myapp
    secrets:
      - my_secret_password
      # Mounted inside the container at /run/secrets/my_secret_password
secrets:
  my_secret_password:
    external: true   # created beforehand with `docker secret create`
```
To list secrets:
docker secret ls
To ensure the health and performance of your Swarm cluster, monitoring and logging are critical:
- Logs: Use docker service logs <service-name> to view the logs for a service.
- Health Checks: Define health checks in your service definition to ensure that the containers are running properly.
```yaml
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost"]
  interval: 30s
  retries: 3
```
For advanced monitoring, integrate with tools like Prometheus, Grafana, or ELK stack.
To remove a node from a Docker Swarm cluster, the steps differ depending on whether the node is a manager node or a worker node.
Here’s how you can remove a node from Docker Swarm:
Before removing a worker node, you may want to "drain" it, which means Docker will stop scheduling new tasks on this node and will reschedule any existing tasks to other nodes in the cluster.
To drain a node, run the following command on the manager node:
docker node update --availability drain <node-name>
This prevents new tasks from being assigned to the node and attempts to move running tasks to other nodes in the Swarm.
Now, remove the worker node from the swarm by running the following command on the manager node:
docker node rm <node-name>
This will remove the worker node from the Swarm cluster.
On the worker node itself, you can run the following command to leave the Swarm cluster:
docker swarm leave
This command will remove the worker node from the swarm and effectively take it out of the cluster.
Note: If the node is a manager, demote it to a worker with docker node demote before removing it. A swarm must always retain at least one manager, and removing managers shrinks the quorum, so check how many managers remain before removing one.
If you are removing a manager node, you need to ensure that there are other manager nodes available in the cluster to maintain quorum.
It’s a good idea to drain the manager node before removing it, so no new tasks are assigned to it:
docker node update --availability drain <manager-node-name>
If you just want to demote the manager node (i.e., make it a worker node), you can do this:
docker node demote <manager-node-name>
This will convert the manager node into a worker node, and it will stop participating in manager operations. If you want to keep the node in the Swarm but as a worker, this is the step to use.
If you want to completely remove the manager node from the Swarm, you can run the following on the manager node:
docker node rm <manager-node-name>
This removes the node from the cluster. Note that docker node rm refuses to remove a node that is still an active manager: demote it first, and Docker will not let you remove the last remaining manager at all.
If you are removing the manager node entirely (not demoting it), run the following command on the manager node itself:
docker swarm leave --force
The --force flag is required for a manager to leave the Swarm, because losing a manager can break quorum; the flag acknowledges that risk and makes the node leave the cluster anyway.
Important: You cannot remove the last manager node unless you have more than one manager node in the Swarm. Swarm requires a quorum of manager nodes to maintain the cluster.
In rare cases where a node is unresponsive or you are unable to remove it using the standard steps, you can force remove a node from the Swarm using the --force
flag on the manager node.
docker node rm <node-name> --force
This will forcibly remove the node from the Swarm, but use this carefully as it can lead to data inconsistencies if the node was still part of active services.
After removing the node, you can verify that it has been removed by listing the nodes in the cluster:
docker node ls
This will show the current status of all nodes in the Swarm. The removed node should no longer appear in the list.
- Drain Worker Node (optional but recommended): docker node update --availability drain <node-name>
- Remove Worker Node (on the manager node): docker node rm <node-name>
- Leave Swarm (on the worker node): docker swarm leave
- Demote Manager Node (if needed): docker node demote <manager-node-name>
- Remove Manager Node (on the manager node): docker node rm <manager-node-name>
- Leave Swarm (on the manager node): docker swarm leave --force
- Force Remove Node (in case of failure): docker node rm <node-name> --force
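The worker-removal steps above can be strung together as a script. This is a dry-run sketch that only prints the planned commands for a hypothetical worker named node-2; drop the echo wrappers and run each command on the indicated machine to do it for real.

```shell
#!/bin/sh
# Dry-run sketch: safely removing a worker node from the swarm.
# NODE is a hypothetical node name; substitute your own.
NODE="node-2"

# 1. On the manager: drain the node so its tasks are rescheduled elsewhere.
echo "manager\$ docker node update --availability drain ${NODE}"

# 2. On the worker itself: leave the swarm.
echo "worker\$  docker swarm leave"

# 3. Back on the manager: remove the now-Down node from the node list.
echo "manager\$ docker node rm ${NODE}"

# 4. Verify the node no longer appears.
echo "manager\$ docker node ls"
```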
- High Availability: Ensure you have at least three manager nodes to maintain high availability and fault tolerance.
- Resource Constraints: Use resource limits (CPU, memory) for containers to avoid overconsumption.
- Backup and Restore: Regularly back up the Swarm manager state and services configuration.
- Use Secrets and Configs: Always store sensitive information like credentials and configuration files using Docker Secrets and Configs for enhanced security.
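For the resource-constraints point above, Compose's deploy.resources keys let you cap and reserve CPU and memory per task; a sketch with illustrative values:

```yaml
services:
  web:
    image: nginx
    deploy:
      resources:
        limits:
          cpus: "0.50"     # hard cap: half a CPU core per task
          memory: 256M
        reservations:
          cpus: "0.25"     # scheduler places the task only where this is available
          memory: 128M
```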