Node Affinity - q-uest/notes-doc-k8s-docker-jenkins-all-else GitHub Wiki
Node affinity is conceptually similar to nodeSelector -- it allows you to constrain which nodes your pod is eligible to be scheduled on, based on labels of the node.
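For comparison, the same kind of constraint expressed with a plain nodeSelector might look like this (the `disktype=ssd` label is illustrative, not from the cluster used below):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: with-node-selector
spec:
  # nodeSelector supports only exact key=value matches (implicit AND)
  nodeSelector:
    disktype: ssd          # illustrative label; the node must carry disktype=ssd
  containers:
  - name: nginx
    image: nginx
```

Node affinity generalises this: it adds operators such as In, NotIn and Exists, and a "preferred" (soft) variant.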
There are currently two types of node affinity:

- requiredDuringSchedulingIgnoredDuringExecution ( Hard ):
specifies rules that must be met for a pod to be scheduled onto a node, similar to nodeSelector; however, if the node's label is changed or deleted later, already-running pods are not affected.
- preferredDuringSchedulingIgnoredDuringExecution ( Soft ):
the scheduler will try to enforce this but does not guarantee it. It looks for nodes carrying the given label and schedules the pod there if it finds one; if no such node exists, the pod is scheduled on any available node in the cluster.
You may specify both of the above types in the same Pod: the node must first satisfy the required rules, and the preferred rules are then used to rank the nodes that qualify.
Here's an example of a pod that uses node affinity:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 10
        preference:
          matchExpressions:
          - key: most-pref-node
            operator: In
            values:
            - "yes"
      - weight: 1
        preference:
          matchExpressions:
          - key: least-pref-node
            operator: In
            values:
            - "yes"
  containers:
  - name: with-node-affinity
    image: k8s.gcr.io/pause:2.0
```
The weight field in "preferredDuringSchedulingIgnoredDuringExecution" takes a value in the range 1-100. When ranking nodes, the scheduler adds a score of 10 for each node labelled most-pref-node=yes and 1 for each node labelled least-pref-node=yes.
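One way to put those labels in place (the node names node1/node2 match the cluster listing below, but are assumptions about your environment):

```shell
# label node1 as both the most- and least-preferred target
kubectl label nodes node1 most-pref-node=yes least-pref-node=yes
# label node2 as least-preferred only
kubectl label nodes node2 least-pref-node=yes
# confirm the labels
kubectl get nodes --show-labels
```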
```
NAME        STATUS   ROLES                  AGE   VERSION   LABELS
k8smaster   Ready    control-plane,master   49d   v1.21.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8smaster,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=,node.kubernetes.io/exclude-from-external-load-balancers=
node1       Ready    <none>                 45d   v1.23.1   another-node-label-key=another-node-label-value,app=blue,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/e2e-az-name=e2e-az1,kubernetes.io/hostname=node1,kubernetes.io/os=linux,least-pref-node=yes,most-pref-node=yes
node2       Ready    <none>                 45d   v1.23.1   app=blue,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node2,kubernetes.io/os=linux,least-pref-node=yes
```
The pod gets scheduled onto node1. If node1 is unavailable at scheduling time, the pod will be scheduled onto node2 instead, even though node2's weight is lower than node1's.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/e2e-az-name
            operator: In
            values:
            - e2e-az1
            - e2e-az2
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: another-node-label-key
            operator: In
            values:
            - another-node-label-value
  containers:
  - name: with-node-affinity
    image: k8s.gcr.io/pause:2.0
```
This node affinity rule says the pod can only be placed on a node with a label whose key is kubernetes.io/e2e-az-name and whose value is either e2e-az1 or e2e-az2. In addition, among nodes that meet that criteria, nodes with a label whose key is another-node-label-key and whose value is another-node-label-value should be preferred.
You can see the operator "In" being used in the example. The new node affinity syntax supports the following operators: In, NotIn, Exists, DoesNotExist, Gt, Lt.
You can use NotIn and DoesNotExist to achieve node anti-affinity behavior, or use node taints to repel pods from specific nodes.
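As a sketch, a single term combining several of these operators might look like this (the label keys disktype, gpu and cpu-cores are illustrative):

```yaml
# illustrative matchExpressions showing several operators
- matchExpressions:
  - key: disktype            # hypothetical label
    operator: NotIn          # anti-affinity: avoid nodes with these values
    values:
    - hdd
  - key: gpu                 # match only nodes where this label key exists
    operator: Exists
  - key: cpu-cores           # Gt/Lt compare the label value as an integer
    operator: Gt
    values:
    - "4"
```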
If you specify both nodeSelector and nodeAffinity, both must be satisfied for the pod to be scheduled onto a candidate node.
If you specify multiple nodeSelectorTerms associated with nodeAffinity types, then the pod can be scheduled onto a node if one of the nodeSelectorTerms can be satisfied.
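To illustrate: multiple nodeSelectorTerms are ORed, while multiple matchExpressions inside one term are ANDed (the zone/disktype labels here are illustrative):

```yaml
requiredDuringSchedulingIgnoredDuringExecution:
  nodeSelectorTerms:
  # Term 1: zone-a AND disktype=ssd ...
  - matchExpressions:
    - key: topology.kubernetes.io/zone
      operator: In
      values:
      - zone-a
    - key: disktype
      operator: In
      values:
      - ssd
  # ... OR Term 2: zone-b (regardless of disktype)
  - matchExpressions:
    - key: topology.kubernetes.io/zone
      operator: In
      values:
      - zone-b
```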
- topologyKey is the key of a node label. If two Nodes are labelled with this key and have identical values for that label, the scheduler treats both Nodes as being in the same topology domain. The scheduler tries to place a balanced number of Pods into each topology domain.
- The topologyKey in podAffinity determines the scope of where the pod should be scheduled to.
- Consider a cluster whose Nodes are labelled with their hostname, zone name, and region name. You can then set a Service's topologyKeys to direct traffic as follows:
  - Only to endpoints on the same node, failing if no endpoint exists on the node: ["kubernetes.io/hostname"].
  - Preferentially to endpoints on the same node, falling back to endpoints in the same zone, followed by the same region, and failing otherwise: ["kubernetes.io/hostname", "topology.kubernetes.io/zone", "topology.kubernetes.io/region"]. This may be useful, for example, in cases where data locality is critical.
  - Preferentially to the same zone, but fall back to any available endpoint if none are available within this zone: ["topology.kubernetes.io/zone", "*"].
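A sketch of topologyKey in pod affinity: this pod asks to run in the same zone as pods labelled app=blue (the app=blue label matches the node listing above; the manifest itself is an assumption):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: with-pod-affinity
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - blue
        # the "scope": a candidate node must share this label value with
        # a node already running a pod that matches the labelSelector
        topologyKey: topology.kubernetes.io/zone
  containers:
  - name: with-pod-affinity
    image: k8s.gcr.io/pause:2.0
```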