AKS Upgrade Logic and Precautions
AKS Upgrade Details
https://docs.azure.cn/zh-cn/aks/upgrade-cluster
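For reference, a minimal sketch of driving the upgrade from the Azure CLI; the resource group, cluster name, and target version below are placeholders and should be adjusted (the target version must be one reported by get-upgrades):

```bash
# List the Kubernetes versions this cluster can be upgraded to
az aks get-upgrades --resource-group myResourceGroup --name myAKSCluster --output table

# Upgrade the control plane and node pools to the desired version
az aks upgrade --resource-group myResourceGroup --name myAKSCluster --kubernetes-version 1.20.9

# In a second terminal, watch nodes being added, cordoned, and reimaged
kubectl get nodes --output wide --watch
```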
For VMSS-based clusters, pods can be restarted twice during an upgrade; the details are as follows.
Scenario 1: AKS with VM Scale Set
- VMSS cluster created with 2 nodes {0,1}
- When the upgrade started, Node {2*} was added with the desired version
- Node {0} was cordoned, the running pods got scheduled on the new node {2*}
- Node {0} was reimaged - to serve as buffer for the next node {1}
- Node {1} was cordoned, the running pods were scheduled on node {0} - that is already running the new version
- Node {1} was reimaged
- Node {2*} was cordoned, the running pods were scheduled on node {1} - that is already running the new version (second restart for the same pods that were originally running on node {0})
- Node {2*} was removed, since the cluster now has the same original 2 nodes {0,1} running the new version
NOTE: In this scenario, the newly added buffer node was removed, maintaining the same previously existing instances.
o Before the upgrade:
NAME STATUS ROLES AGE VERSION
aks-agentpool-28016909-vmss000000 Ready agent 14m v1.19.11
aks-agentpool-28016909-vmss000001 Ready agent 14m v1.19.11
o After the upgrade:
NAME STATUS ROLES AGE VERSION
aks-agentpool-28016909-vmss000000 Ready agent 6m58s v1.20.9
aks-agentpool-28016909-vmss000001 Ready agent 3m15s v1.20.9
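To see the temporary buffer node {2*} appear and later disappear on a VMSS node pool, the node list can simply be polled while the upgrade runs, for example:

```bash
# Re-list the nodes every 10 seconds during the upgrade; the extra VMSS
# instance (the buffer node) shows up with the new version and is removed
# again once the original nodes have been reimaged.
watch -n 10 kubectl get nodes --output wide
```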
Scenario 2: AKS with VM Availability Set
- VMAS cluster created with 3 nodes {0,1,2}
- When the upgrade started, Node {3*} was added with the desired version
- Node {0} was cordoned, the running pods got scheduled on the new node {3*}
- Node {0} was reimaged - to serve as buffer for the next node {1}
- Node {1} was cordoned, the running pods got scheduled on node {0} - that is already running the new version
- Node {1} was reimaged - to serve as buffer for the next node {2}
- Node {2} was cordoned, the running pods were scheduled on node {1} - that is already running the new version
- Node {2} was removed, since the cluster now has 3 nodes {0,1,3} running the new version
NOTE: In this scenario, the latest cordoned/old node was removed, as described in our documents.
o Before the upgrade:
NAME STATUS ROLES AGE VERSION
aks-nodepool1-32040186-0 Ready agent 11m v1.19.11
aks-nodepool1-32040186-1 Ready agent 10m v1.19.11
aks-nodepool1-32040186-2 Ready agent 11m v1.19.11
o After the upgrade:
NAME STATUS ROLES AGE VERSION
aks-nodepool1-32040186-0 Ready agent 10m v1.20.9
aks-nodepool1-32040186-1 Ready agent 4m v1.20.9
aks-nodepool1-32040186-3 Ready agent 15m v1.20.9
Summary: From the above behavior we can conclude that pods are restarted only once on an AKS cluster with a VM Availability Set, whereas on an AKS cluster with a VM Scale Set they can be restarted multiple times during the upgrade.
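Since pods on a VMSS-based cluster can be rescheduled more than once, it is worth checking where workloads ended up after the upgrade; a quick check:

```bash
# AGE shows when each pod was last recreated, NODE shows where it was rescheduled
kubectl get pods --all-namespaces --output wide
```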
Precautions
For Azure CNI-based AKS clusters, pay attention to the subnet address space: the upgrade adds a buffer node, and with Azure CNI every node pre-allocates one IP address per pod (maxPods), so the subnet must have enough free addresses left.
https://docs.azure.cn/zh-cn/aks/configure-azure-cni#plan-ip-addressing-for-your-cluster
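As a rough pre-upgrade sanity check, the node subnet size can be compared against the node pools' maxPods settings; the resource group, VNet, subnet, and cluster names below are placeholders:

```bash
# Address space of the subnet used by the Azure CNI node pool
az network vnet subnet show --resource-group myResourceGroup \
  --vnet-name myVnet --name myAKSSubnet --query addressPrefix --output tsv

# Node count and maxPods per node pool; with Azure CNI each node reserves
# maxPods IPs up front, so the buffer node added during the upgrade needs
# roughly (maxPods + 1) additional free IPs in the subnet
az aks show --resource-group myResourceGroup --name myAKSCluster \
  --query "agentPoolProfiles[].{name:name, count:count, maxPods:maxPods}" --output table
```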
PDBs in the cluster: a PodDisruptionBudget that allows zero disruptions can block the node drain and stall the upgrade.
#kubectl get poddisruptionbudgets
https://kubernetes.io/docs/tasks/run-application/configure-pdb/
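Below is a minimal sketch of a PDB that keeps at least one replica available while still letting drains proceed; the name and app label are placeholders (use apiVersion policy/v1beta1 on clusters older than Kubernetes 1.21):

```bash
kubectl apply -f - <<EOF
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: myapp
EOF

# Before upgrading, verify that ALLOWED DISRUPTIONS is at least 1 for every PDB
kubectl get poddisruptionbudgets --all-namespaces
```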
Resource limitations: confirm that the remaining nodes (plus the single buffer node) have enough free CPU and memory to host the pods evicted from the node being drained.
#kubectl top nodes
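kubectl top only shows current usage; the requests already reserved on each node are also worth checking, since the scheduler places evicted pods based on requests, not live usage:

```bash
# Current usage per node (requires the metrics-server add-on)
kubectl top nodes

# Requests already reserved on each node vs. its allocatable capacity
kubectl describe nodes | grep -A 8 "Allocated resources"
```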