General steps to shut down an OpenShift cluster - unix1998/technical_notes GitHub Wiki

Shutting down an entire OpenShift cluster means stopping every node (masters and workers) in the cluster. The exact procedure depends on the underlying infrastructure (bare metal, virtual machines, or a cloud provider). OpenShift does not provide a single oc command for this, because the cluster is an orchestrated set of components running across multiple nodes. Instead, you stop the nodes that host those components.

General Steps to Shut Down an OpenShift Cluster

1. Drain the Nodes:

Before shutting down the nodes, you should drain them to gracefully handle the workloads. This ensures that any running pods are properly terminated or rescheduled.

Drain the Nodes:

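# Drain all worker nodes (--no-headers drops the column header row)
# On older oc clients, use --delete-local-data instead of --delete-emptydir-data
oc get nodes -l node-role.kubernetes.io/worker= --no-headers | awk '{print $1}' | xargs -I {} oc adm drain {} --ignore-daemonsets --delete-emptydir-data --force

# Drain all master nodes
# Evictions honor PodDisruptionBudgets, so a drain can block if pods cannot be rescheduled
oc get nodes -l node-role.kubernetes.io/master= --no-headers | awk '{print $1}' | xargs -I {} oc adm drain {} --ignore-daemonsets --delete-emptydir-data --force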

Note: Draining a node cordons it, so no new pods are scheduled on it, and evicts the pods already running there. Pods managed by DaemonSets stay in place because of --ignore-daemonsets.
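The drain step can also be written as a small loop that handles one node at a time and reports which node failed, which is easier to troubleshoot than a single xargs pipeline. The following is a minimal sketch rather than a vetted procedure: the function name drain_nodes_by_role and the 120-second timeout are illustrative assumptions, and --delete-emptydir-data assumes a recent oc client.

```shell
# Sketch: drain every node carrying a given role label, one node at a time.
# drain_nodes_by_role is a hypothetical helper name, not part of oc.
drain_nodes_by_role() {
  local role="$1" node nodes failed=0
  # Collect node names only; --no-headers drops the column header row.
  nodes=$(oc get nodes -l "node-role.kubernetes.io/${role}=" --no-headers | awk '{print $1}')
  for node in $nodes; do
    echo "Draining ${node}"
    # The timeout (an assumed value) keeps a blocked eviction from hanging forever.
    if ! oc adm drain "$node" --ignore-daemonsets --delete-emptydir-data \
         --force --timeout=120s; then
      echo "Drain failed for ${node}" >&2
      failed=1
    fi
  done
  # Non-zero if any node could not be drained.
  return "$failed"
}
```

Calling drain_nodes_by_role worker would drain the workers. The function keeps going after a failure so that one stuck node does not hide problems on the others.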

2. Stop the Nodes:

You will need to stop the nodes at the infrastructure level. This can involve:

  • Bare Metal: Shutting down the physical servers.
  • Virtual Machines: Stopping the VMs via the hypervisor or cloud provider interface.
  • Cloud Instances: Stopping the instances via the cloud provider console or CLI.

Example using AWS CLI:

# Stop all instances tagged for the cluster (in the AWS CLI's configured region)
# The IDs come back whitespace-separated, so pass them all to one stop-instances call
aws ec2 describe-instances --filters "Name=tag:Cluster,Values=my-openshift-cluster" --query "Reservations[].Instances[].InstanceId" --output text | xargs aws ec2 stop-instances --instance-ids

Example using Azure CLI:

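# Stop all VMs in a resource group
# az vm stop powers the VMs off but keeps them allocated (and billed); use az vm deallocate to release the compute
az vm stop --ids $(az vm list -g myResourceGroup --query "[].id" -o tsv)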

3. Verify the Shutdown:

Ensure that all nodes are stopped and the cluster is no longer running. You can check this from your infrastructure management console or CLI.
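For the AWS case, the check can be scripted. The sketch below assumes the same hypothetical Cluster tag used in the AWS example above. The helper name all_instances_stopped is made up for illustration and only inspects the state names piped into it.

```shell
# Sketch: succeed only if every whitespace-separated state read from stdin
# is "stopped". all_instances_stopped is a hypothetical helper, not an AWS CLI command.
all_instances_stopped() {
  local state seen=0
  for state in $(cat); do
    seen=1
    [ "$state" = "stopped" ] || return 1
  done
  # Treat empty input as a failure: it usually means the filter matched nothing.
  [ "$seen" -eq 1 ]
}

# Example use (requires the AWS CLI and credentials, so it is left commented out):
# aws ec2 describe-instances \
#   --filters "Name=tag:Cluster,Values=my-openshift-cluster" \
#   --query "Reservations[].Instances[].State.Name" --output text \
#   | all_instances_stopped && echo "All instances are stopped"
```

Instances that are still in the stopping state make the check fail, so it may need to be repeated until it succeeds.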

Notes:

  • Graceful Shutdown: Always aim for a graceful shutdown by draining nodes first to avoid data loss and ensure that workloads are handled properly.
  • Persistent Storage: Ensure that any persistent storage used by the cluster is handled appropriately, as shutting down the cluster might affect access to persistent volumes.
  • Configuration Management: Keep a backup of your cluster configuration and state before shutting down, to facilitate easier restarts.
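One way to back up cluster state on OpenShift 4 is to take an etcd backup from a control plane node before shutting down. The sketch below is an assumption-laden example: backup_etcd is a hypothetical helper name, and the script path and backup directory follow the OpenShift 4 documentation, so check them against your version before relying on them.

```shell
# Sketch: run the etcd backup script on one control plane node.
# backup_etcd is a hypothetical helper; the script path and target directory
# are taken from the OpenShift 4 docs and may differ on other versions.
backup_etcd() {
  local node="$1"
  oc debug "node/${node}" -- chroot /host \
    /usr/local/bin/cluster-backup.sh /home/core/assets/backup
}
```

Calling backup_etcd with the name of one master node (as listed by oc get nodes) is enough; the backup does not need to be repeated on every master.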

Summary

There isn't a single oc command to shut down an entire OpenShift cluster. Instead, you must:

  1. Drain the nodes using oc adm drain.
  2. Stop the nodes at the infrastructure level (bare metal, VM, cloud provider).
  3. Verify that all components are stopped.

The exact commands and steps will depend on your specific infrastructure setup.