EKS in A Cloud Guru playground

Deprecated

This page is obsolete and should only be consulted as a historical curiosity. See the current, up-to-date version instead.

Old stuff

Create a new playground on A Cloud Guru.

Create cluster using AWS CLI and Terraform on local Linux machine.

On local Linux machine:

aws configure

Then enter the access key information from the playground launch page when prompted.

Make sure to use only EC2 instance types that are valid in the playground, and do not create too many instances (9 or fewer). Instances need to be t3.large or similar (see this page in the EC2 section).

(this takes 12-17 minutes)

cd ~/chiller-iac/exp/aws
terraform init
terraform apply --auto-approve

A resource to refer to later: AWS EKS Terraform Workshop.

aws eks update-kubeconfig --region $(terraform output -raw region) --name $(terraform output -raw eks_cluster_name)

Here is a great writeup on how EKS networking really works, in terms of endpoints connecting the worker nodes and the control plane, and providing access to the control plane from outside via kubectl.

Ok, now we have the basic Kubernetes cluster running on AWS, and we can control it from our local Linux machine.
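
A quick sanity check that kubectl is pointed at the new cluster (a sketch; it assumes the kubeconfig update above succeeded):

# Nodes should show up as Ready within a few minutes of the apply finishing
kubectl get nodes -o wide
kubectl get pods -A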

To finish automating the environment instantiation process, the next steps are (using Terraform):

  • Install the AWS Load Balancer Controller to have AWS provision an ALB whenever you create a k8s ingress resource
  • Install the Prometheus Operator and Grafana (see the sketch after this list)
  • Create a configuration file as part of the Grafana install to create the chiller dashboard in Grafana
  • Create first draft of the IaC documentation in chiller-doc
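
For the Prometheus/Grafana item, the Terraform will most likely just wrap a Helm release; a rough CLI equivalent is sketched below. The kube-prometheus-stack chart bundles the Prometheus Operator and Grafana; the release and namespace names here are assumptions.

# Assumed release/namespace names; the Terraform helm_release would wrap the same chart
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update prometheus-community
helm install monitoring prometheus-community/kube-prometheus-stack \
    --namespace monitoring --create-namespace

The chiller dashboard config (the next item) could then be shipped as a ConfigMap that the chart's Grafana dashboard sidecar picks up, though that wiring is not worked out here.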

Then get to the point where we can install chiller in its own environment (the default namespace) using Terraform:

  • Modify the Chiller Helm files to create an Ingress upon installation (does the LBC support the Gateway API?)
  • Install chiller helm file using terraform
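
A rough CLI equivalent of what the Terraform helm_release for chiller would do. The chart path and the ingress.* value names are assumptions about the chiller chart; "alb" is the ingress class the AWS Load Balancer Controller watches.

# Assumed chart location and values; adjust to the actual chiller chart
helm upgrade --install chiller ./charts/chiller \
    --namespace default \
    --set ingress.enabled=true \
    --set ingress.className=alb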

Then set up the simulated load part:

  • Maybe this should come from another VPC to simulate the entire path, including IGW and ALB
  • Create locust container
  • Deploy to simulate load. This does not have to be part of a k8s cluster - it could be in ECS or maybe even a Lambda function if the tests are short enough
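
For the simulated load, a containerized Locust run could look roughly like this. The locustfile path, user counts, and target host are placeholders; the same container image would work from ECS or any host outside the cluster.

# Headless Locust run against the ALB hostname; all numbers are illustrative
docker run --rm -v "$PWD:/mnt/locust" locustio/locust \
    -f /mnt/locust/locustfile.py --headless \
    -u 50 -r 5 --run-time 5m \
    --host "http://<alb-hostname>"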

Then create a version that only does simulated load, in its own namespace, for testing. Perhaps each namespaced instance gets its own ALB (and corresponding IP).

  • Modify grafana, prometheus operator, and chiller helm chart to support independent testing of multiple instances of the application in different namespaces
  • Install Argo Rollouts and set up tests querying Prometheus (install sketch after this list)
  • GitHub Action to automate this process according to the CDel design
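
The Argo Rollouts controller itself installs with a couple of kubectl commands; this mirrors the upstream quick-start install, and the analysis templates that query Prometheus would come later.

# Install the Argo Rollouts controller into its own namespace
kubectl create namespace argo-rollouts
kubectl apply -n argo-rollouts -f https://github.com/argoproj/argo-rollouts/releases/latest/download/install.yaml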

Having difficulty...

At some point my IaC Terraform config for AWS EKS just stopped working.

Here are the diagnostics I run to check that pods can talk to each other (this works fine in Minikube):

kubectl run nginx --image=nginx
kubectl expose pod nginx --port=80
kubectl run -it --rm test --image=busybox -- /bin/sh

Then in the test pod:

wget -O - -q nginx

This should get the default index.html for nginx. For some reason, this doesn't work with my EKS Terraform configuration. Note that this DOES work if I first forward the nginx service to my local machine using kubectl port-forward, then run the request locally. Going to follow the instructions below to create an EKS cluster using eksctl and CloudFormation, and see if it works there. Then work my way backwards, I guess.
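
For reference, the port-forward variant that does work looks like this (8080 is an arbitrary local port):

# Forward the nginx service to the local machine, then fetch the page locally
kubectl port-forward svc/nginx 8080:80 &
curl -s http://localhost:8080 | head -n 5
kill %1   # stop the port-forward when done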

Note, I got the following message while installing:

2024-11-11 02:19:19 [!]  recommended policies were found for "vpc-cni" addon, but since OIDC is disabled on the cluster, eksctl cannot configure the requested permissions; the recommended way to provide IAM permissions for "vpc-cni" addon is via pod identity associations; after addon creation is completed, add all recommended policies to the config file, under `addon.PodIdentityAssociations`, and run `eksctl update addon`

Note that with this cluster the test procedure worked.

One difference I'm noting is that this cluster has only the public endpoint enabled, rather than both public and private. Even when changing this setting, the test routine still worked.
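
To check (or flip) the endpoint access settings without the console, something like this works, assuming the Terraform outputs from earlier:

# Show whether the public/private API endpoints are enabled
aws eks describe-cluster --name "$(terraform output -raw eks_cluster_name)" \
    --region "$(terraform output -raw region)" \
    --query "cluster.resourcesVpcConfig.{public:endpointPublicAccess,private:endpointPrivateAccess}"

# Example: enable both endpoints (the update takes several minutes)
aws eks update-cluster-config --name "$(terraform output -raw eks_cluster_name)" \
    --region "$(terraform output -raw region)" \
    --resources-vpc-config endpointPublicAccess=true,endpointPrivateAccess=true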

Try deleting this cluster, then creating one with Terraform. See if there are some leftover roles or something that made it work before. I also removed the .terraform directory. Maybe one of the providers is wonky? Commented out all the providers except for AWS. I may have restructured providers.tf since the last time it worked.

Test routine did not work on my mainline TF config, even with a bunch of stuff commented out.

Also, created another cluster using a tutorial AWS EKS configuration. Maybe one will work and the other won't, giving me a clue. It also does not work with the tutorial terraform config. WTF is happening.

I will bring up both clusters, the one with CloudFormation that seems to work, and the one with Terraform that seems to not work, and compare configs in side-by-side windows.

Aaaand... the problem is that the default node-to-node security group was only open on port 443 (among some others), but not port 80. Great. So I opened that port, and now the diagnostic routine works. Sigh.
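
The fix, roughly, as CLI commands; the placeholder must be the node-to-node security group ID, and in the Terraform config this becomes an additional node security group rule rather than a manual change.

# <node-sg-id> is the node-to-node security group; open port 80 between nodes
aws ec2 authorize-security-group-ingress \
    --group-id <node-sg-id> \
    --protocol tcp --port 80 \
    --source-group <node-sg-id>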

Create cluster using CloudShell and eksctl

Log into the account in a new incognito window and open a CloudShell. The AWS CLI is already configured properly there. If using a remote terminal instead, you will need to run `aws configure` and enter the credentials from the playground launch page.

Check the kubectl version (it should already be installed).

kubectl version --client

You should match the kubectl version to the Kubernetes version you will use for EKS. Optionally, update kubectl using these instructions.

Install eksctl

# for ARM systems, set ARCH to: `arm64`, `armv6` or `armv7`
ARCH=amd64
PLATFORM=$(uname -s)_$ARCH
curl -sLO "https://github.com/eksctl-io/eksctl/releases/latest/download/eksctl_$PLATFORM.tar.gz"
tar -xzf eksctl_$PLATFORM.tar.gz -C /tmp && rm eksctl_$PLATFORM.tar.gz
sudo mv /tmp/eksctl /usr/local/bin

Ideally we would also create a custom role for the cluster, but the quickstart doesn't have that step.

Install Helm

curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo add eks https://aws.github.io/eks-charts
helm repo update eks

Create a file called cluster-config.yaml with the following contents (note this will target us-east-1):

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: web-quickstart
  region: us-east-1

managedNodeGroups:
  - name: eks-mng
    instanceType: t3.medium
    desiredCapacity: 2

iam:
  withOIDC: true
  serviceAccounts:
  - metadata:
      name: aws-load-balancer-controller
      namespace: kube-system
    wellKnownPolicies:
      awsLoadBalancerController: true

addons:
  - name: aws-ebs-csi-driver
    wellKnownPolicies: # Adds an IAM service account
      ebsCSIController: true

cloudWatch:
  clusterLogging:
    enableTypes: ["*"]
    logRetentionInDays: 30

Then create the cluster (this takes about 20 minutes):

eksctl create cluster -f cluster-config.yaml
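
Once eksctl finishes, a quick check that the nodes are up and that the pre-created service account from the config above exists (the load balancer controller install below relies on it):

kubectl get nodes
kubectl get serviceaccount aws-load-balancer-controller -n kube-system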

Install the aws-load-balancer-controller

export CLUSTER_REGION=us-east-1
export CLUSTER_VPC=$(aws eks describe-cluster --name web-quickstart --region $CLUSTER_REGION --query "cluster.resourcesVpcConfig.vpcId" --output text)       
helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
    --namespace kube-system \
    --set clusterName=web-quickstart \
    --set serviceAccount.create=false \
    --set region=${CLUSTER_REGION} \
    --set vpcId=${CLUSTER_VPC} \
    --set serviceAccount.name=aws-load-balancer-controller

Install the 2048 game

kubectl create namespace game-2048 --save-config
kubectl apply -n game-2048 -f https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/v2.8.0/docs/examples/2048/2048_full.yaml
kubectl get ingress ingress-2048 -n game-2048 -o json | jq .status.loadBalancer.ingress[0].hostname | sed -e "s/\"//g"

It will take a few minutes to provision the load balancer. You can then open up the game using the ADDRESS field from the kubectl get ingress -n game-2048 command.
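
One way to grab the ALB hostname and confirm it is answering (a sketch; the first requests may fail until the targets register):

ALB_HOST=$(kubectl get ingress ingress-2048 -n game-2048 \
    -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
echo "http://${ALB_HOST}"
curl -sI "http://${ALB_HOST}" | head -n 1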

Continue here after interview https://docs.aws.amazon.com/eks/latest/userguide/quickstart.html#quickstart-persist-data

What this CloudFormation template creates:

  • VPC
  • 4 subnets in 2 AZs (public and private in each)
  • 2 route tables for the private subnets
  • 2 routes for the private subnets pointing at the NAT gateway
  • 2 route table associations for these routes for the private subnets
  • 1 public route pointing at the IGW
  • 1 public route table
  • 2 route table associations for the public routes for the public subnets
  • NAT gateway
  • Elastic IP for the NAT gateway
  • Internet gateway
  • IGW attachment to the VPC
  • IAM role for the EKS control plane; includes AWS policies AmazonEKSClusterPolicy and AmazonEKSVPCResourceController, and allows the EKS service to assume the role
  • EKS cluster
  • Security group for nodes
  • Security group for the cluster
  • SG ingress rule for nodes to talk to each other
  • SG ingress rule for nodes to talk to the control plane
  • SG ingress rule for the control plane to talk to the nodes
  • Stuff for the load balancer controller (not enumerated right now)
  • Role for the CNI - allows (I think) the service account to assume an AWS role via sts:AssumeRoleWithWebIdentity; the role includes policy AmazonEKS_CNI_Policy
  • Role for nodes with AWS managed policies AmazonEC2ContainerRegistryReadOnly, AmazonEKSWorkerNodePolicy, and AmazonSSMManagedInstanceCore
  • Managed node group for the nodes, placed in the two public subnets; the AMI is AL2023_x86_64_STANDARD and "ClusterName" is the name of the EKS cluster
  • Launch template for nodes with user data as follows:

#!/bin/bash

set -o errexit
set -o pipefail
set -o nounset

touch /run/xtables.lock