EKS Auto Mode
In this guide, you will learn how to use EKS Auto Mode, customizing node types, networking, storage, and more.
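If you want to follow along from a terminal, first point kubectl at your cluster (a quick check; <CLUSTER_NAME> and <REGION> are placeholders for your own values).

# Update the local kubeconfig for the target cluster
aws eks update-kubeconfig --name <CLUSTER_NAME> --region <REGION>

# Confirm connectivity
kubectl get nodes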
First, we need to enable the feature. To do this, go to your EKS cluster and navigate to Overview > EKS Auto Mode > Manage.
You will see a warning indicating that it's necessary to upgrade the authentication mode. Click on Manage Access.
Choose the EKS API and ConfigMap option and click on save changes.
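If you prefer the CLI, the same authentication mode change can be made with update-cluster-config (a sketch; <CLUSTER_NAME> is a placeholder):

aws eks update-cluster-config \
  --name <CLUSTER_NAME> \
  --access-config authenticationMode=API_AND_CONFIG_MAP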
Next, create a role to manage EKS Auto Mode by going to IAM > Roles > Create role.
Attach the following policies to the role.
- AmazonEKSBlockStoragePolicy
- AmazonEKSComputePolicy
- AmazonEKSLoadBalancingPolicy
- AmazonEKSNetworkingPolicy
They should look like this.
Also, under Trust relationships, you must add the "sts:TagSession" action.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": [
        "sts:AssumeRole",
        "sts:TagSession"
      ]
    }
  ]
}
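The same role can also be created from the CLI, mirroring the console steps above (a sketch; it assumes the trust policy shown above is saved as trust-policy.json):

# Create the role with the trust policy above
aws iam create-role \
  --role-name AmazonEKSAutoModeTesting \
  --assume-role-policy-document file://trust-policy.json

# Attach the four EKS Auto Mode managed policies
for policy in AmazonEKSBlockStoragePolicy AmazonEKSComputePolicy AmazonEKSLoadBalancingPolicy AmazonEKSNetworkingPolicy; do
  aws iam attach-role-policy \
    --role-name AmazonEKSAutoModeTesting \
    --policy-arn "arn:aws:iam::aws:policy/${policy}"
done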
Return to Manage EKS Auto Mode. Choose the system and general-purpose node pools, select the role you just created, and click on save changes.
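For reference, the same step can be done from the CLI. This is a rough sketch based on the Auto Mode options of update-cluster-config, with <CLUSTER_NAME> and <ROLE_ARN> as placeholders:

aws eks update-cluster-config \
  --name <CLUSTER_NAME> \
  --compute-config '{"enabled": true, "nodeRoleArn": "<ROLE_ARN>", "nodePools": ["general-purpose", "system"]}' \
  --kubernetes-network-config '{"elasticLoadBalancing": {"enabled": true}}' \
  --storage-config '{"blockStorage": {"enabled": true}}'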
You can verify that EKS Auto Mode is enabled by going to Compute > Node configuration.
You can also verify this using the following commands.
kubectl get nodeclass
# NAME      ROLE                       READY   AGE
# default   AmazonEKSAutoModeTesting   True    21h

kubectl get nodepool
# NAME              NODECLASS   NODES   READY   AGE
# general-purpose   default     0       True    21h
# system            default     0       True    21h
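Before customizing anything, you can also inspect what the managed defaults look like:

kubectl get nodepool general-purpose -o yaml
kubectl get nodeclass default -o yaml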
Now that EKS Auto Mode is enabled, let’s test it by deploying a service with high resource requirements. We'll use the following deployment as an example.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-auto-mode
spec:
  replicas: 4
  selector:
    matchLabels:
      app: test-auto-mode
  template:
    metadata:
      labels:
        app: test-auto-mode
    spec:
      containers:
        - name: test
          image: alpine:latest
          # Keep the container running so the pods stay in Running state
          command: ["sleep", "infinity"]
          resources:
            requests:
              cpu: "2"
              memory: "2Gi"
Apply the deployment.
kubectl apply -f test-auto-mode.yaml
# deployment.apps/test-auto-mode created
Check the nodes.
kubectl get nodes
# NAME                                            STATUS   ROLES    AGE     VERSION
# i-0806e14f9a6e83457                             Ready    <none>   2m13s   v1.32.2-eks-677bac1
# ip-197-167-124-175.us-west-2.compute.internal   Ready    <none>   5d17h   v1.32.3-eks-473151a
# ip-197-167-125-65.us-west-2.compute.internal    Ready    <none>   4d9h    v1.32.3-eks-473151a
# ip-197-167-135-199.us-west-2.compute.internal   Ready    <none>   5d17h   v1.32.3-eks-473151a
# ip-197-167-143-164.us-west-2.compute.internal   Ready    <none>   5d17h   v1.32.3-eks-473151a
# ip-197-167-164-38.us-west-2.compute.internal    Ready    <none>   5d17h   v1.32.3-eks-473151a
# ip-197-167-183-91.us-west-2.compute.internal    Ready    <none>   5d17h   v1.32.3-eks-473151a
# ip-197-167-186-43.us-west-2.compute.internal    Ready    <none>   5d17h   v1.32.3-eks-473151a
# ip-197-167-97-168.us-west-2.compute.internal    Ready    <none>   5d17h   v1.32.3-eks-473151a
You should see a newly created node like i-0806e14f9a6e83457, automatically provisioned to meet the resource demand.
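You can also check which instance type and zone were picked for the new node through its labels:

kubectl get nodes -L node.kubernetes.io/instance-type -L topology.kubernetes.io/zone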
kubectl get pods
# NAME                              READY   STATUS    RESTARTS   AGE
# test-auto-mode-57979ff4fd-6pqpd   1/1     Running   0          5m22s
# test-auto-mode-57979ff4fd-n6htx   1/1     Running   0          5m22s
# test-auto-mode-57979ff4fd-s666r   1/1     Running   0          5m22s
# test-auto-mode-57979ff4fd-x9ptg   1/1     Running   0          5m22s
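To confirm the pods actually landed on the new node, include the node column:

kubectl get pods -o wide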
Now, let's delete our test deployment to verify if the EKS cluster scales down automatically.
kubectl delete deploy test-auto-mode
# deployment.apps "test-auto-mode" deleted
Watch the cluster events to confirm the nodes are being terminated.
kubectl get events --sort-by=.metadata.creationTimestamp -w
# .
# .
# .
# 0s Normal Killing pod/test-auto-mode-57979ff4fd-x9ptg Stopping container pause
# 0s Normal Killing pod/test-auto-mode-57979ff4fd-6pqpd Stopping container pause
# 0s Normal Killing pod/test-auto-mode-57979ff4fd-n6htx Stopping container pause
# 0s Normal Killing pod/test-auto-mode-57979ff4fd-s666r Stopping container pause
# 0s Normal DisruptionBlocked node/i-0806e14f9a6e83457 Node is deleting or marked for deletion
# 0s Normal DisruptionBlocked nodeclaim/general-purpose-q8g9n Node is deleting or marked for deletion
# 0s Normal DisruptionTerminating node/i-0806e14f9a6e83457 Disrupting Node: Empty/Delete
# 0s Normal DisruptionTerminating nodeclaim/general-purpose-q8g9n Disrupting NodeClaim: Empty/Delete
# 0s Warning TerminationGracePeriodExpiring nodeclaim/general-purpose-q8g9n All pods will be deleted by 2025-05-14T19:54:44Z
# 0s Warning TerminationGracePeriodExpiring node/i-0806e14f9a6e83457 All pods will be deleted by 2025-05-14T19:54:44Z
# 0s Warning FailedDraining node/i-0806e14f9a6e83457 Failed to drain node, 1 pods are waiting to be evicted
# 0s Warning InstanceTerminating nodeclaim/general-purpose-q8g9n Instance is terminating
# 0s Warning InstanceTerminating node/i-0806e14f9a6e83457 Instance is terminating
# 0s Normal Finalized node Finalized karpenter.sh/termination
# 0s Normal Finalized nodeclaim Finalized karpenter.sh/termination
# 0s Normal RemovingNode node/i-0806e14f9a6e83457 Node i-0806e14f9a6e83457 event: Removing Node i-0806e14f9a6e83457 from Controller
In the previous logs, we can see how the pods were stopped and how EKS Auto Mode decided to remove the node, ensuring a correct scale down when a node is unused or unnecessary.
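You can also watch the provisioned capacity shrink directly through the NodeClaim objects that appear in the events above:

kubectl get nodeclaims
kubectl get nodes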
We have seen how to configure and work with EKS Auto Mode. However, if you need more control, such as defining specific networks, storage configurations, or restricting instance types, you can customize the NodeClass and NodePool.
For this, we will use the NodeClass and NodePool shown below as a starting point.
To avoid public exposure, restrict the NodeClass to private subnets only. To identify them, run:
aws ec2 describe-subnets \
  --filters \
    "Name=tag:kubernetes.io/role/internal-elb,Values=1" \
    "Name=vpc-id,Values=<YOUR_EKS_VPC_ID>" \
  --query "Subnets[].{ID:SubnetId}" \
  --output table
# ------------------------------
# | DescribeSubnets |
# +----------------------------+
# | ID |
# +----------------------------+
# | subnet-0af0c0401d3d3d6cc |
# | subnet-0cbb4c97a0f303246 |
# | subnet-0bfe83853f0a0d238 |
# +----------------------------+
So, we will use these subnets in our NodeClass, avoiding public access and improving security, as follows.
Example nodeclass.yaml
apiVersion: eks.amazonaws.com/v1
kind: NodeClass
metadata:
  labels:
    app.kubernetes.io/managed-by: eks
  name: private-nodeclass
spec:
  ephemeralStorage:
    iops: 3000
    size: 80Gi
    throughput: 125
  networkPolicy: DefaultAllow
  networkPolicyEventLogs: Disabled
  role: AmazonEKSAutoModeTesting
  securityGroupSelectorTerms:
    - id: sg-05e79576e465a418a
  snatPolicy: Random
  subnetSelectorTerms:
    - id: subnet-0bfe83853f0a0d238
    - id: subnet-0af0c0401d3d3d6cc
    - id: subnet-0cbb4c97a0f303246
Key fields:
- ephemeralStorage: Configuration of the ephemeral storage volume for the instances.
  - iops: The number of input/output operations per second (IOPS) provisioned for the ephemeral storage.
  - size: The size of the ephemeral storage in gibibytes (Gi).
  - throughput: The throughput in MiB/s for the ephemeral storage.
- securityGroupSelectorTerms: Defines the selection terms for the security groups of the EC2 instances (see the lookup command after this list).
- subnetSelectorTerms: Defines the selection terms for the subnets in which Karpenter can launch the EC2 instances. Several private subnet IDs are listed.
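If you need to look up the cluster security group ID referenced in securityGroupSelectorTerms, it can be read from the cluster itself (<CLUSTER_NAME> is a placeholder):

aws eks describe-cluster \
  --name <CLUSTER_NAME> \
  --query "cluster.resourcesVpcConfig.clusterSecurityGroupId" \
  --output text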
Example nodepool.yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  labels:
    app.kubernetes.io/managed-by: eks
  name: general-purpose-customized
spec:
  template:
    metadata:
      labels:
        type: general-purpose
    spec:
      nodeClassRef:
        name: private-nodeclass   # reference the custom NodeClass defined above
        kind: NodeClass
        group: eks.amazonaws.com
      expireAfter: 336h
      requirements:
        - key: "kubernetes.io/arch"
          operator: In
          values: ["amd64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: "node.kubernetes.io/instance-type"
          operator: In
          values: ["r6i.xlarge", "r5a.xlarge", "c6a.4xlarge"]
        - key: "topology.kubernetes.io/zone"
          operator: In
          values: ["us-west-2a", "us-west-2c", "us-west-2d"]
  limits:
    cpu: 100
    memory: 500Gi
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s
Key fields:
- disruption: Configures how Karpenter should handle the disruption of nodes in this NodePool for consolidation or replacement.
  - consolidateAfter: How long a node must remain eligible for consolidation before it is acted on, 30 seconds in this case.
  - consolidationPolicy: The consolidation policy that Karpenter should follow. In this case, only nodes that are empty or underutilized will be considered for consolidation.
- expireAfter: Nodes provisioned by this NodePool will expire and be replaced after 336 hours (14 days). This is useful for node rotation.
- requirements: A list of requirements that the EC2 instances must meet to be provisioned by this NodePool. These requirements are used to select appropriate instance types.
  - kubernetes.io/arch: Requires the node architecture to be amd64.
  - karpenter.sh/capacity-type: Requires the instance capacity type; it can be on-demand or spot.
  - node.kubernetes.io/instance-type: Defines a list of specific instance types.
  - topology.kubernetes.io/zone: Defines a list of specific availability zones.
Above, we are only allowing new nodes to be provisioned with the "r6i.xlarge", "r5a.xlarge", and "c6a.4xlarge" instance types.
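Since the NodePool also pins availability zones, it's worth confirming that those instance types are actually offered in the selected zones (a quick check using describe-instance-type-offerings):

aws ec2 describe-instance-type-offerings \
  --location-type availability-zone \
  --filters "Name=instance-type,Values=r6i.xlarge,r5a.xlarge,c6a.4xlarge" \
  --query "InstanceTypeOfferings[].[InstanceType,Location]" \
  --output table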
Return to EKS cluster > Compute > Node configuration > Manage, and delete the general-purpose and system node pools.
IMPORTANT: It is necessary to delete the default node pools because, even if you apply the customized NodeClass and NodePool, EKS Auto Mode will re-establish the default values later. Removing them ensures that only our configuration is used.
Apply the changes.
kubectl apply -f nodeclass.yaml
kubectl apply -f nodepool.yaml
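Then verify that the custom resources are in place:

kubectl get nodeclass
kubectl get nodepool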
This way, you can customize the EKS Auto Mode behavior, being more restrictive according to your needs.
Pro Tip:
When you use the VPC CNI plugin, each node supports a limited number of pods depending on the instance type. You can check the limit list here. For example, for an r6i.xlarge the maximum is 58 pods, so even if your pods aren't consuming many resources, no additional pods will be scheduled on that node once the limit is reached.
To solve this, enable prefix delegation, which allows more pods per node.
kubectl set env daemonset aws-node -n kube-system ENABLE_PREFIX_DELEGATION=true
kubectl set env daemonset aws-node -n kube-system WARM_PREFIX_TARGET=1
This way, the number of nodes is reduced because more pods can run on each node.
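To confirm the change and see each node's pod capacity, you can check the aws-node environment and the allocatable pod count (note that nodes launched before the change keep their original max-pods until they are replaced):

kubectl get daemonset aws-node -n kube-system -o yaml | grep -A 1 ENABLE_PREFIX_DELEGATION
kubectl get nodes -o custom-columns=NAME:.metadata.name,MAX_PODS:.status.allocatable.pods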
EKS Auto Mode with Karpenter offers a powerful and flexible way to manage Kubernetes nodes automatically. In this guide, we explored how it scales resources efficiently and how you can customize NodeClass and NodePool to meet specific requirements like using private subnets or selecting instance types.