Taints & Tolerations

  • Taints and tolerations are usually managed by cluster admins.

Node affinity is a property of Pods that attracts them to a set of nodes (either as a preference or a hard requirement). Taints are the opposite -- they allow a node to repel a set of pods.

Tolerations are applied to pods, and allow (but do not require) the pods to schedule onto nodes with matching taints.

Taints and tolerations can be seen as an alternative to using PodAffinity/PodAntiAffinity. When you want to reserve certain nodes in a cluster for specific needs, rather than asking the developer teams to include PodAffinity/PodAntiAffinity or nodeSelectors in every Pod specification, you can taint those nodes, so that only Pods carrying a matching toleration get scheduled onto them.
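For comparison, this is the kind of per-Pod nodeSelector every team would otherwise have to carry (a minimal sketch; the dedicated=special label is purely illustrative and not defined elsewhere in these notes):

apiVersion: v1
kind: Pod
metadata:
  name: nginx-selector
spec:
  containers:
  - name: nginx
    image: nginx
  nodeSelector:
    dedicated: special   # hypothetical node label that every Pod spec must repeat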

Taints and tolerations work together to ensure that pods are not scheduled onto inappropriate nodes. One or more taints are applied to a node; this marks that the node should not accept any pods that do not tolerate the taints.

  • To place a taint on node node1:
kubectl taint nodes node1 nodeflag=donotuse:NoSchedule

The taint above has the key "nodeflag", the value "donotuse", and the taint effect "NoSchedule". This means that no pod will be able to schedule onto node1 unless it has a matching toleration.
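To confirm the taint is in place, describe the node and check its Taints field:

kubectl describe node node1 | grep -i taints

The output should include nodeflag=donotuse:NoSchedule.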

  • To place a different taint on node node2:
kubectl taint node node2 someflag=somevalue:NoSchedule

To get the list of taints on all nodes, use the command below:


kubectl get nodes -o go-template='{{printf "%-50s %-12s\n" "Node" "Taint"}}{{- range .items}}{{- if $taint := (index .spec "taints") }}{{- .metadata.name }}{{ "\t" }}{{- range $taint }}{{- .key }}={{ .value }}:{{ .effect }}{{ "\t" }}{{- end }}{{- "\n" }}{{- end}}{{- end}}'

The output of the above command:

Node         Taint
k8smaster    node-role.kubernetes.io/master=:NoSchedule
node1        nodeflag=donotuse:NoSchedule
node2        someflag=somevalue:NoSchedule

To remove the taint added by the command above, you can run:

kubectl taint nodes node1 nodeflag=donotuse:NoSchedule-

Example1:

A pod is created with a toleration matching the taint applied to node1.

apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    env: test
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  tolerations:
  - key: "nodeflag"
    operator: "Exists"
    effect: "NoSchedule"

The given toleration matches the taint applied to node1 above, so the pod is able to schedule there. The pod would never be scheduled onto node2, as the toleration in the pod's spec does not match that node's taint.
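One way to create the pod and see which node it landed on (assuming the manifest above is saved as nginx-toleration.yaml):

kubectl apply -f nginx-toleration.yaml
kubectl get pod nginx -o wide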

NAME    READY   STATUS    RESTARTS   AGE   IP            NODE    NOMINATED NODE   READINESS GATES
nginx   1/1     Running   0          3s    10.244.3.66   node1   <none>           <none>

Even if node1 is unavailable and node2 is the only node left in the cluster, the pod still will not be scheduled there.
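In that situation the pod just stays in Pending; the scheduling failure reason is visible in its Events:

kubectl get pod nginx          # STATUS remains Pending
kubectl describe pod nginx     # the Events section shows which taints blocked scheduling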

The same example works with a different toleration specification:

  tolerations:
  - key: "nodeflag"
    operator: "Equal"
    value: "donotuse"
    effect: "NoSchedule"

The default value for operator is Equal.
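Since Equal is the default, the same toleration can also be written without the operator field:

  tolerations:
  - key: "nodeflag"
    value: "donotuse"
    effect: "NoSchedule"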

A toleration "matches" a taint if the keys are the same and the effects are the same, and:

  • the operator is Exists (in which case no value should be specified), or
  • the operator is Equal and the values are equal.

The above example used effect of "NoSchedule". Alternatively, you can use effect of "PreferNoSchedule".

You can put multiple taints on the same node and multiple tolerations on the same pod.

The way Kubernetes processes multiple taints and tolerations is like a filter: start with all of a node's taints, then ignore the ones for which the pod has a matching toleration; the remaining un-ignored taints have the indicated effects on the pod. In particular,

  • if there is at least one un-ignored taint with effect "NoSchedule" then Kubernetes will not schedule the pod onto that node.
  • if there is no un-ignored taint with effect NoSchedule but there is at least one un-ignored taint with effect PreferNoSchedule, then Kubernetes will try not to schedule the pod onto the node.
  • if there is at least one un-ignored taint with effect NoExecute then the pod will be evicted from the node (if it is already running on the node), and will not be scheduled onto the node (if it is not yet running on the node).
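For example (a sketch with a hypothetical node nodeX and keys key1/key2, not the nodes used elsewhere in these notes), suppose the node carries the three taints below and the pod has only the two tolerations shown. The pod cannot be scheduled onto nodeX, because the key2 taint with effect NoSchedule is not tolerated; but if the pod was already running on nodeX when the taints were added, it keeps running, since the un-tolerated taint is not NoExecute.

kubectl taint nodes nodeX key1=value1:NoSchedule
kubectl taint nodes nodeX key1=value1:NoExecute
kubectl taint nodes nodeX key2=value2:NoSchedule

  tolerations:
  - key: "key1"
    operator: "Equal"
    value: "value1"
    effect: "NoSchedule"
  - key: "key1"
    operator: "Equal"
    value: "value1"
    effect: "NoExecute"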

=======

Example2 (with Multiple Taints):

kubectl taint node node1 nodeflag2=prefernoschedule:PreferNoSchedule     nodeflag=donotuse:NoSchedule

Now, if you create the same example pod again, it will still be scheduled onto node1. The reason is:

The pod is not scheduled onto node2, because it does not tolerate that node's taint. It can still go to node1, because the only taint on node1 that the pod does not tolerate ("nodeflag2") has the effect PreferNoSchedule, which is a preference rather than a hard requirement, so the scheduler is allowed to place the pod there.

=======

Example3 (adding a taint with "NoExecute" effect on the node which has a running pod):

kubectl taint node node1 nodeflag3=noexecution:NoExecute

Currently, the node has the below taints:

Node                                               Taint
node1   nodeflag3=noexecution:NoExecute nodeflag2=prefernoschedule:PreferNoSchedule     nodeflag=donotuse:NoSchedule

The running pod on node1 gets evicted immediately; it simply disappears, since a bare Pod has no controller to recreate it. The only way to get it back onto the node is to apply the manifest again, and the pod would also need to tolerate the new NoExecute taint.
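A sketch of the extra toleration the pod spec would need (in addition to the earlier nodeflag toleration) to be scheduled onto node1 again, or to survive the taint in the first place:

  tolerations:
  - key: "nodeflag3"
    operator: "Exists"
    effect: "NoExecute"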

======

The node controller automatically taints a Node when certain conditions are true. The following taints are built in:

  • node.kubernetes.io/not-ready: Node is not ready. This corresponds to the NodeCondition Ready being "False".
  • node.kubernetes.io/unreachable: Node is unreachable from the node controller. This corresponds to the NodeCondition Ready being "Unknown".
  • node.kubernetes.io/out-of-disk: Node becomes out of disk.
  • node.kubernetes.io/memory-pressure: Node has memory pressure.
  • node.kubernetes.io/disk-pressure: Node has disk pressure.
  • node.kubernetes.io/network-unavailable: Node's network is unavailable.
  • node.kubernetes.io/unschedulable: Node is unschedulable.
  • node.cloudprovider.kubernetes.io/uninitialized: When the kubelet is started with "external" cloud provider, this taint is set on a node to mark it as unusable. After a controller from the cloud-controller-manager initializes this node, the kubelet removes this taint.
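The not-ready and unreachable taints carry the NoExecute effect, and pods commonly pair them with tolerationSeconds to bound how long they stay on a failing node before eviction. For example, the toleration below keeps a pod bound to an unreachable node for 6000 seconds (the value is only illustrative):

tolerations:
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 6000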