AKS Upgrade Logic and Precautions
AKS Upgrade Details
https://docs.azure.cn/zh-cn/aks/upgrade-cluster
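For reference, a minimal sketch of driving the upgrade from the Azure CLI; the resource group, cluster name, and target version below are placeholders and should be adjusted (the target version must be one reported by get-upgrades):

```bash
# List the Kubernetes versions this cluster can be upgraded to
az aks get-upgrades --resource-group myResourceGroup --name myAKSCluster --output table

# Upgrade the control plane and node pools to the desired version
az aks upgrade --resource-group myResourceGroup --name myAKSCluster --kubernetes-version 1.20.9

# In a second terminal, watch nodes being added, cordoned, and reimaged
kubectl get nodes --output wide --watch
```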
For VMSS-based clusters, pods can be restarted twice during an upgrade; the details are as follows.
Scenario 1: AKS with VM Scale Set
- VMSS cluster created with 2 nodes {0,1}
- When the upgrade started, Node {2*} was added with the desired version
- Node {0} was cordoned, the running pods got scheduled on the new node {2*}
- Node {0} was reimaged - to serve as buffer for the next node {1}
- Node {1} was cordoned, the running pods were scheduled on node {0} - that is already running the new version
- Node {1} was reimaged
- Node {2*} was cordoned, the running pods were scheduled on node {1} - that is already running the new version (second restart for the same pods that were originally running on node {0})
- Node {2*} was removed, since the cluster now has the same original 2 nodes {0,1} running the new version
NOTE: In this scenario, the newly added buffer node was removed, maintaining the same previously existing instances.
o Before the upgrade:
NAME STATUS ROLES AGE VERSION
aks-agentpool-28016909-vmss000000 Ready agent 14m v1.19.11
aks-agentpool-28016909-vmss000001 Ready agent 14m v1.19.11
o After the upgrade:
NAME STATUS ROLES AGE VERSION
aks-agentpool-28016909-vmss000000 Ready agent 6m58s v1.20.9
aks-agentpool-28016909-vmss000001 Ready agent 3m15s v1.20.9
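To see the temporary buffer node {2*} appear and later disappear on a VMSS node pool, the node list can simply be polled while the upgrade runs, for example:

```bash
# Re-list the nodes every 10 seconds during the upgrade; the extra VMSS
# instance (the buffer node) shows up with the new version and is removed
# again once the original nodes have been reimaged.
watch -n 10 kubectl get nodes --output wide
```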
Scenario 2: AKS with VM Availability Set
- VMAS cluster created with 3 nodes {0,1,2}
- When the upgrade started, Node {3*} was added with the desired version
- Node {0} was cordoned, the running pods got scheduled on the new node {3*}
- Node {0} was reimaged - to serve as buffer for the next node {1}
- Node {1} was cordoned, the running pods got scheduled on node {0} - that is already running the new version
- Node {1} was reimaged - to serve as buffer for the next node {2}
- Node {2} was cordoned, the running pods were scheduled on node {1} - that is already running the new version
- Node {2} was removed, since the cluster now has 3 nodes {0,1,3} running the new version
NOTE: In this scenario, the latest cordoned/old node was removed, as described in our documents.
o Before the upgrade:
NAME STATUS ROLES AGE VERSION
aks-nodepool1-32040186-0 Ready agent 11m v1.19.11
aks-nodepool1-32040186-1 Ready agent 10m v1.19.11
aks-nodepool1-32040186-2 Ready agent 11m v1.19.11
o After the upgrade:
NAME STATUS ROLES AGE VERSION
aks-nodepool1-32040186-0 Ready agent 10m v1.20.9
aks-nodepool1-32040186-1 Ready agent 4m v1.20.9
aks-nodepool1-32040186-3 Ready agent 15m v1.20.9
Summary: From the above behavior we can conclude that pods are restarted only once on an AKS cluster with a VM Availability Set, whereas on an AKS cluster with a VM Scale Set they can be restarted multiple times during the upgrade.
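Since pods on a VMSS-based cluster can be rescheduled more than once, it is worth checking where workloads ended up after the upgrade; a quick check:

```bash
# AGE shows when each pod was last recreated, NODE shows where it was rescheduled
kubectl get pods --all-namespaces --output wide
```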
Precautions
For Azure CNI-based AKS clusters, pay attention to the subnet address space: the upgrade adds a buffer node, and with Azure CNI every node pre-allocates one IP address per pod (maxPods), so the subnet must have enough free addresses left.
https://docs.azure.cn/zh-cn/aks/configure-azure-cni#plan-ip-addressing-for-your-cluster
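As a rough pre-upgrade sanity check, the node subnet size can be compared against the node pools' maxPods settings; the resource group, VNet, subnet, and cluster names below are placeholders:

```bash
# Address space of the subnet used by the Azure CNI node pool
az network vnet subnet show --resource-group myResourceGroup \
  --vnet-name myVnet --name myAKSSubnet --query addressPrefix --output tsv

# Node count and maxPods per node pool; with Azure CNI each node reserves
# maxPods IPs up front, so the buffer node added during the upgrade needs
# roughly (maxPods + 1) additional free IPs in the subnet
az aks show --resource-group myResourceGroup --name myAKSCluster \
  --query "agentPoolProfiles[].{name:name, count:count, maxPods:maxPods}" --output table
```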
PDBs in the cluster: a PodDisruptionBudget that allows zero disruptions can block the node drain and stall the upgrade.
#kubectl get poddisruptionbudgets
https://kubernetes.io/docs/tasks/run-application/configure-pdb/
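Below is a minimal sketch of a PDB that keeps at least one replica available while still letting drains proceed; the name and app label are placeholders (use apiVersion policy/v1beta1 on clusters older than Kubernetes 1.21):

```bash
kubectl apply -f - <<EOF
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: myapp
EOF

# Before upgrading, verify that ALLOWED DISRUPTIONS is at least 1 for every PDB
kubectl get poddisruptionbudgets --all-namespaces
```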
Resource limitations: confirm that the remaining nodes (plus the single buffer node) have enough free CPU and memory to host the pods evicted from the node being drained.
#kubectl top nodes
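kubectl top only shows current usage; the requests already reserved on each node are also worth checking, since the scheduler places evicted pods based on requests, not live usage:

```bash
# Current usage per node (requires the metrics-server add-on)
kubectl top nodes

# Requests already reserved on each node vs. its allocatable capacity
kubectl describe nodes | grep -A 8 "Allocated resources"
```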