Node Affinity

Node affinity is conceptually similar to nodeSelector -- it allows you to constrain which nodes your pod is eligible to be scheduled on, based on labels of the node.

There are currently two types of node affinity:

  1. requiredDuringSchedulingIgnoredDuringExecution ( Hard )

    Specifies rules that must be met for a pod to be scheduled onto a node, similar to nodeSelector. If the node's label is changed or deleted later, however, it does not affect the already-running pod (hence "IgnoredDuringExecution").

  2. preferredDuringSchedulingIgnoredDuringExecution ( Soft )

    The scheduler will try to enforce these rules but does not guarantee them. It looks for nodes carrying the given label and schedules the pod onto one if found; if no such node exists, the pod is scheduled onto any available node in the cluster.

You may provide both of the above types when creating a Pod. When combined, the required rules must be satisfied first, and the preferred rules are then used to rank the qualifying nodes (the second example below shows this combination).

Here's an example of a pod that uses node affinity:

apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 10
        preference:
          matchExpressions:
          - key: most-pref-node
            operator: In
            values:
            - "yes"
      - weight: 1
        preference:
          matchExpressions:
          - key: least-pref-node
            operator: In
            values:
            - "yes"
  containers:
  - name: with-node-affinity
    image: k8s.gcr.io/pause:2.0

The weight field in "preferredDuringSchedulingIgnoredDuringExecution" can take a value in the range 1-100. When ranking candidate nodes, the scheduler adds a score of 10 to nodes carrying the label most-pref-node=yes and a score of 1 to nodes carrying least-pref-node=yes, as specified above.
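For reference, the node labels shown below can be listed with:

kubectl get nodes --show-labels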

NAME        STATUS   ROLES                  AGE   VERSION   LABELS
k8smaster   Ready    control-plane,master   49d   v1.21.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8smaster,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=,node.kubernetes.io/exclude-from-external-load-balancers=
node1       Ready    <none>                 45d   v1.23.1   another-node-label-key=another-node-label-value,app=blue,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/e2e-az-name=e2e-az1,kubernetes.io/hostname=node1,kubernetes.io/os=linux,least-pref-node=yes,most-pref-node=yes
node2       Ready    <none>                 45d   v1.23.1   app=blue,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node2,kubernetes.io/os=linux,least-pref-node=yes

The pod has been scheduled onto node1, which carries both labels and therefore scores highest. If node1 were unavailable at scheduling time, the pod would be scheduled onto node2 instead, even though node2 matches only the lower-weight label.
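You can confirm the placement with, for example:

kubectl get pod with-node-affinity -o wide

The NODE column in the output shows which node the pod landed on.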

apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/e2e-az-name
            operator: In
            values:
            - e2e-az1
            - e2e-az2
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: another-node-label-key
            operator: In
            values:
            - another-node-label-value
  containers:
  - name: with-node-affinity
    image: k8s.gcr.io/pause:2.0

This node affinity rule says the pod can only be placed on a node with a label whose key is kubernetes.io/e2e-az-name and whose value is either e2e-az1 or e2e-az2. In addition, among nodes that meet those criteria, nodes with a label whose key is another-node-label-key and whose value is another-node-label-value should be preferred.

You can see the operator "In" being used in the examples above. Node affinity supports the following operators: In, NotIn, Exists, DoesNotExist, Gt, Lt.

You can use NotIn and DoesNotExist to achieve node anti-affinity behavior, or use node taints to repel pods from specific nodes; a sketch of the NotIn/DoesNotExist style follows.
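As a minimal sketch of that anti-affinity style (the disk and maintenance label keys here are hypothetical), the following pod avoids nodes labelled disk=hdd and any node carrying a maintenance label at all:

apiVersion: v1
kind: Pod
metadata:
  name: with-node-anti-affinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:          # expressions within one term are ANDed
          - key: disk                # hypothetical label key
            operator: NotIn
            values:
            - hdd
          - key: maintenance         # hypothetical label key
            operator: DoesNotExist
  containers:
  - name: with-node-anti-affinity
    image: k8s.gcr.io/pause:2.0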

If you specify both nodeSelector and nodeAffinity, both must be satisfied for the pod to be scheduled onto a candidate node.

If you specify multiple nodeSelectorTerms associated with nodeAffinity types, then the pod can be scheduled onto a node if one of the nodeSelectorTerms is satisfied (the terms are ORed); see the sketch below.
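A minimal sketch of both behaviors, using labels from the node listing above: the pod must match the nodeSelector AND at least one of the two nodeSelectorTerms:

apiVersion: v1
kind: Pod
metadata:
  name: selector-and-affinity
spec:
  nodeSelector:
    app: blue                        # ANDed with the affinity rules below
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:           # the terms below are ORed; either suffices
        - matchExpressions:
          - key: most-pref-node
            operator: In
            values:
            - "yes"
        - matchExpressions:
          - key: least-pref-node
            operator: In
            values:
            - "yes"
  containers:
  - name: selector-and-affinity
    image: k8s.gcr.io/pause:2.0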

topologyKey

  • topologyKey is the key of a node label. If two Nodes are labelled with this key and have identical values for that label, the scheduler treats both Nodes as being in the same topology domain, and tries to place a balanced number of Pods into each topology domain.

  • The topologyKey in podAffinity determines the scope within which the pod should be co-scheduled, e.g. the same host or the same zone (a sketch follows this list).

  • Consider a cluster with Nodes that are labelled with their hostname, zone name, and region name. You can then set the topologyKeys value of a Service to direct traffic as follows (see also the Service sketch after this list).

     - Only to endpoints on the same node, failing if no endpoint exists on the node: ["kubernetes.io/hostname"].
     - Preferentially to endpoints on the same node, falling back to endpoints in the same zone, followed by the same region, and failing otherwise: ["kubernetes.io/hostname", "topology.kubernetes.io/zone", "topology.kubernetes.io/region"]. This may be useful, for example, in cases where data locality is critical.
     - Preferentially to the same zone, but falling back on any available endpoint if none are available within this zone: ["topology.kubernetes.io/zone", "*"].
    