
Setting Up EKS Auto Mode

In this guide, you will learn how to use EKS Auto Mode, customizing node types, networking, storage, and more.

Enable EKS Auto Mode

First, we need to enable the feature. To do this, go to your EKS cluster and navigate to Overview > EKS Auto Mode > Manage.

Upgrade Auth Mode

You will see a warning indicating that it's necessary to upgrade the authentication mode. Click on Manage Access.

Enable API Access

Choose the EKS API and ConfigMap option and click on Save changes.
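
If you prefer the CLI, the same change can be made with a command like the one below (a minimal sketch; replace <YOUR_CLUSTER_NAME> with your cluster name).

aws eks update-cluster-config \
  --name <YOUR_CLUSTER_NAME> \
  --access-config authenticationMode=API_AND_CONFIG_MAP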

Create IAM Role for EKS Auto Mode

Next, create a role to manage EKS Auto Mode by going to IAM > Roles > Create role.

Create Role

Attach the following policies to the role.

  • AmazonEKSBlockStoragePolicy
  • AmazonEKSComputePolicy
  • AmazonEKSLoadBalancingPolicy
  • AmazonEKSNetworkingPolicy

They should look like this.

Policies for role EKS Auto Mode

Also, under Trust relationships, you must add the "sts:TagSession" action.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "ec2.amazonaws.com"
            },
            "Action": [
                "sts:AssumeRole",
                "sts:TagSession"
            ]
        }
    ]
}

Trust relationships
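
Alternatively, you can create the role from the CLI. The sketch below assumes the role name AmazonEKSAutoModeTesting (the name used later in this guide) and that the trust policy above is saved as trust-policy.json.

# Create the role with the trust policy shown above
aws iam create-role \
  --role-name AmazonEKSAutoModeTesting \
  --assume-role-policy-document file://trust-policy.json

# Attach the four managed policies listed earlier
for policy in AmazonEKSBlockStoragePolicy AmazonEKSComputePolicy AmazonEKSLoadBalancingPolicy AmazonEKSNetworkingPolicy; do
  aws iam attach-role-policy \
    --role-name AmazonEKSAutoModeTesting \
    --policy-arn arn:aws:iam::aws:policy/$policy
done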

Enable EKS Auto Mode in the Cluster

Return to Manage EKS Auto Mode. Choose the system and general-purpose node pools, select the role you just created, and click on Save changes.

Enable EKS Auto Mode
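
The console flow above can also be done from the AWS CLI. The following is a rough equivalent (it assumes a recent AWS CLI version; replace the cluster name and role ARN with your own).

aws eks update-cluster-config \
  --name <YOUR_CLUSTER_NAME> \
  --compute-config '{"enabled": true, "nodePools": ["general-purpose", "system"], "nodeRoleArn": "<YOUR_AUTO_MODE_ROLE_ARN>"}' \
  --kubernetes-network-config '{"elasticLoadBalancing": {"enabled": true}}' \
  --storage-config '{"blockStorage": {"enabled": true}}'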

You can verify that EKS Auto Mode is enabled by going to Compute > Node configuration.

Check EKS Auto Mode is enabled

You can also verify this using the following commands.

kubectl get nodeclass
# NAME      ROLE                       READY   AGE
# default   AmazonEKSAutoModeTesting   True    21h

kubectl get nodepool
# NAME              NODECLASS   NODES   READY   AGE
# general-purpose   default     0       True    21h
# system            default     0       True    21h

Testing EKS Auto Mode

Now that EKS Auto Mode is enabled, let’s test it by deploying a service with high resource requirements. We'll use the following deployment as an example.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-auto-mode
spec:
  replicas: 4
  selector:
    matchLabels:
      app: test-auto-mode
  template:
    metadata:
      labels:
        app: test-auto-mode
    spec:
      containers:
        - name: test
          image: alpine:latest
          resources:
            requests:
              cpu: "2"
              memory: "2Gi"

Apply the deployment.

kubectl apply -f test-auto-mode.yaml
# deployment.apps/test-auto-mode created
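
If you want to watch the scheduling happen, the pods will stay in Pending until EKS Auto Mode provisions capacity for them (the label below comes from the example deployment).

kubectl get pods -l app=test-auto-mode -w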

Check the nodes.

kubectl get nodes
# NAME                                            STATUS   ROLES    AGE     VERSION
# i-0806e14f9a6e83457                             Ready    <none>   2m13s   v1.32.2-eks-677bac1   
# ip-197-167-124-175.us-west-2.compute.internal   Ready    <none>   5d17h   v1.32.3-eks-473151a  
# ip-197-167-125-65.us-west-2.compute.internal    Ready    <none>   4d9h    v1.32.3-eks-473151a   
# ip-197-167-135-199.us-west-2.compute.internal   Ready    <none>   5d17h   v1.32.3-eks-473151a   
# ip-197-167-143-164.us-west-2.compute.internal   Ready    <none>   5d17h   v1.32.3-eks-473151a   
# ip-197-167-164-38.us-west-2.compute.internal    Ready    <none>   5d17h   v1.32.3-eks-473151a   
# ip-197-167-183-91.us-west-2.compute.internal    Ready    <none>   5d17h   v1.32.3-eks-473151a   
# ip-197-167-186-43.us-west-2.compute.internal    Ready    <none>   5d17h   v1.32.3-eks-473151a   
# ip-197-167-97-168.us-west-2.compute.internal    Ready    <none>   5d17h   v1.32.3-eks-473151a   

You should see a newly created node like i-0806e14f9a6e83457 automatically provisioned to meet the resource demand.

kubectl get pods
# NAME                              READY   STATUS    RESTARTS   AGE
# test-auto-mode-57979ff4fd-6pqpd   1/1     Running   0          5m22s
# test-auto-mode-57979ff4fd-n6htx   1/1     Running   0          5m22s
# test-auto-mode-57979ff4fd-s666r   1/1     Running   0          5m22s
# test-auto-mode-57979ff4fd-x9ptg   1/1     Running   0          5m22s
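
Under the hood, Auto Mode (Karpenter) created a NodeClaim for the new capacity. You can inspect it and check which instance type was chosen, for example:

kubectl get nodeclaims
kubectl get nodes -L node.kubernetes.io/instance-type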

Auto-Scaling Down

Now, let's delete our test deployment to verify if the EKS cluster scales down automatically.

kubectl delete deploy test-auto-mode
# deployment.apps "test-auto-mode" deleted

Watch the cluster events to confirm the nodes are being terminated.

kubectl get events --sort-by=.metadata.creationTimestamp -w
# .
# .
# .
# 0s          Normal    Killing                           pod/test-auto-mode-57979ff4fd-x9ptg    Stopping container pause
# 0s          Normal    Killing                           pod/test-auto-mode-57979ff4fd-6pqpd    Stopping container pause
# 0s          Normal    Killing                           pod/test-auto-mode-57979ff4fd-n6htx    Stopping container pause
# 0s          Normal    Killing                           pod/test-auto-mode-57979ff4fd-s666r    Stopping container pause
# 0s          Normal    DisruptionBlocked                 node/i-0806e14f9a6e83457               Node is deleting or marked for deletion
# 0s          Normal    DisruptionBlocked                 nodeclaim/general-purpose-q8g9n        Node is deleting or marked for deletion
# 0s          Normal    DisruptionTerminating             node/i-0806e14f9a6e83457               Disrupting Node: Empty/Delete
# 0s          Normal    DisruptionTerminating             nodeclaim/general-purpose-q8g9n        Disrupting NodeClaim: Empty/Delete
# 0s          Warning   TerminationGracePeriodExpiring    nodeclaim/general-purpose-q8g9n        All pods will be deleted by 2025-05-14T19:54:44Z
# 0s          Warning   TerminationGracePeriodExpiring    node/i-0806e14f9a6e83457               All pods will be deleted by 2025-05-14T19:54:44Z
# 0s          Warning   FailedDraining                    node/i-0806e14f9a6e83457               Failed to drain node, 1 pods are waiting to be evicted
# 0s          Warning   InstanceTerminating               nodeclaim/general-purpose-q8g9n        Instance is terminating
# 0s          Warning   InstanceTerminating               node/i-0806e14f9a6e83457               Instance is terminating
# 0s          Normal    Finalized                         node                                   Finalized karpenter.sh/termination
# 0s          Normal    Finalized                         nodeclaim                              Finalized karpenter.sh/termination
# 0s          Normal    RemovingNode                      node/i-0806e14f9a6e83457               Node i-0806e14f9a6e83457 event: Removing Node i-0806e14f9a6e83457 from Controller

In the logs above, we can see that the pods were stopped and EKS Auto Mode decided to remove the node, ensuring a correct scale down once the node is unused or unnecessary.

Customizing your NodePools and NodeClass

We have seen how to configure EKS Auto Mode and how it works. However, if you need more control, such as defining specific networks, storage configurations, or restricting instance types, you can customize the NodeClass and NodePool.

For this, we will base our setup on the following NodeClass and NodePool.

Restricting to Private Subnets Only

To avoid public exposure, restrict the NodeClass to only private subnets. To identify them:

aws ec2 describe-subnets \
  --filters \
    "Name=tag:kubernetes.io/role/internal-elb,Values=1" \
    "Name=vpc-id,Values=<YOUR_EKS_VPC_ID>" \
  --query "Subnets[].{ID:SubnetId}" \
  --output table

# ------------------------------
# |       DescribeSubnets      |
# +----------------------------+
# |             ID             |
# +----------------------------+
# |  subnet-0af0c0401d3d3d6cc  |
# |  subnet-0cbb4c97a0f303246  |
# |  subnet-0bfe83853f0a0d238  |
# +----------------------------+

We will reference these subnets in our NodeClass, avoiding public exposure and improving security, as follows.

Example nodeclass.yaml

apiVersion: eks.amazonaws.com/v1
kind: NodeClass
metadata:
  labels:
    app.kubernetes.io/managed-by: eks
  name: private-nodeclass
spec:
  ephemeralStorage:
    iops: 3000
    size: 80Gi
    throughput: 125
  networkPolicy: DefaultAllow
  networkPolicyEventLogs: Disabled
  role: AmazonEKSAutoModeTesting
  securityGroupSelectorTerms:
  - id: sg-05e79576e465a418a
  snatPolicy: Random
  subnetSelectorTerms:
  - id: subnet-0bfe83853f0a0d238   
  - id: subnet-0af0c0401d3d3d6cc
  - id: subnet-0cbb4c97a0f303246

Key fields:

  • ephemeralStorage: Configuration of the ephemeral storage volume attached to the instances.

    • iops: The number of input/output operations per second (IOPS) that will be provisioned for the ephemeral storage.
    • size: The size of the ephemeral storage in Gibibytes (Gi).
    • throughput: The throughput in MiB/s for the ephemeral storage.
  • securityGroupSelectorTerms: Defines the selection terms for the security groups of the EC2 instances.

  • subnetSelectorTerms: Defines the selection terms for the subnets in which Karpenter can launch the EC2 instances. Several subnet IDs are listed.

Example nodepool.yaml

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  labels:
      app.kubernetes.io/managed-by: eks
  name: general-purpose-customized
spec:
  template:
    metadata:
      labels:
        type: general-purpose
    spec:
      nodeClassRef:
        name: private-nodeclass
        kind: NodeClass
        group: eks.amazonaws.com
      expireAfter: 336h 
      requirements:
        - key: "kubernetes.io/arch"
          operator: In
          values: ["amd64"]

        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        
        - key: "node.kubernetes.io/instance-type"
          operator: In
          values: ["r6i.xlarge", "r5a.xlarge", "c6a.4xlarge"]
        
        - key: "topology.kubernetes.io/zone"
          operator: In
          values: ["us-west-2a", "us-west-2c", "us-west-2d"]
  limits:
    cpu: 100
    memory: 500Gi
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s

Key fields:

  • disruption: Configures how Karpenter should handle the disruption of nodes in this NodePool for consolidation or replacement.

    • consolidateAfter: How long Karpenter waits after a node becomes eligible for consolidation before acting on it (30 seconds here).
    • consolidationPolicy: The consolidation policy that Karpenter should follow. In this case, only nodes that are empty or underutilized will be considered for consolidation.
  • expireAfter: Nodes provisioned by this NodePool will expire and be replaced after 336 hours (14 days). This is useful for node rotation.

  • requirements: A list of requirements that the EC2 instances must meet to be provisioned by this NodePool. These requirements are used to select appropriate instance types.

    • key: kubernetes.io/arch: Requires the node architecture to be amd64.
    • key: karpenter.sh/capacity-type: Restricts the instance capacity type; it can be on-demand or spot. Here only on-demand is allowed.
    • key: node.kubernetes.io/instance-type: Defines a list of specific instance types.
    • key: topology.kubernetes.io/zone: Defines a list of specific zones.

The NodePool above only allows new nodes of the "r6i.xlarge", "r5a.xlarge", and "c6a.4xlarge" instance types.

Return to EKS cluster > Compute > Node configuration > Manage and delete the general-purpose and system node pools.

Disable Default EKS Auto Mode Configuration

IMPORTANT: It is necessary to delete the default configuration because, even if you apply the customized NodeClass and NodePool, EKS Auto Mode will re-establish the default values later. Deleting the built-in node pools ensures that our configuration is the one in use.

Apply the changes.

kubectl apply -f nodeclass.yaml
kubectl apply -f nodepool.yaml
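
After applying, you can confirm that only your customized resources remain, for example:

kubectl get nodeclass,nodepool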

This way, you can customize the behavior of EKS Auto Mode, making it as restrictive as your needs require.

Pro Tip: When you are using the VPC CNI plugin, the number of pods per node is limited by the instance type. Here you can check the limit list. For example, an r6i.xlarge supports a maximum of 58 pods; the limit comes from the networking hardware: max pods = ENIs × (IPv4 addresses per ENI − 1) + 2, which for an r6i.xlarge (4 ENIs with 15 addresses each) gives 4 × 14 + 2 = 58. So even if your pods aren't consuming many resources, the node cannot be assigned more pods than that.
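
As a quick check, you can read a node's allocatable pod count directly (the node name below is the one from the earlier scaling test; adjust it to one of your own nodes).

kubectl get node i-0806e14f9a6e83457 -o jsonpath='{.status.allocatable.pods}'
# e.g. 58 on an r6i.xlarge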

To solve this, enable prefix delegation, which allows more pods per node.

kubectl set env daemonset aws-node -n kube-system ENABLE_PREFIX_DELEGATION=true
kubectl set env daemonset aws-node -n kube-system WARM_PREFIX_TARGET=1
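
You can verify that the environment variables were set on the aws-node DaemonSet with something like:

kubectl describe daemonset aws-node -n kube-system | grep -E 'ENABLE_PREFIX_DELEGATION|WARM_PREFIX_TARGET'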

This way, the number of nodes is reduced because it is possible to run more pods per node.

Conclusions

EKS Auto Mode with Karpenter offers a powerful and flexible way to manage Kubernetes nodes automatically. In this guide, we explored how it scales resources efficiently and how you can customize NodeClass and NodePool to meet specific requirements like using private subnets or selecting instance types.
