Kubernetes EKS

Intro and Core Concepts:

The word Kubernetes comes from the Greek word for helmsman or pilot. The Kubernetes icon shows a ship's helm, or steering wheel. The platform acts as the pilot, steering your workloads, while you set the destination.

Definition: Kubernetes is a portable, extensible, open-source platform for managing containerized workloads and services that facilitates both declarative configuration and automation.

Kubernetes (k8s) helps with the deployment, maintenance, and scaling of your applications, and you interact with it declaratively. A Kubernetes cluster runs on many nodes. To launch a container, you describe the desired state to Kubernetes. Controllers observe the current state of the cluster and make the changes needed to bring the system to the desired state. The desired state is described using Kubernetes objects, like Pods and Services. The state of these objects is written in YAML files, which you can check into source control. Storing objects in source control opens the door for a lot of good DevOps-type practices, including a full paper trail of the history of changes made to the desired state.

The Kubernetes platform has been designed to be extensible, with multiple extension points that can be configured. The kubectl binary is how you spend most of your time with the Kubernetes API, and kubectl itself can be extended with plugins, for example to insert your own form of authentication. The API can be extended with a custom controller. With a custom controller, you add a new endpoint to the Kubernetes API and work with your new resource declaratively, just as you would with built-in Kubernetes resources. In fact, some of the core features of Kubernetes are implemented as custom controllers.
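
As an illustration, a new resource type is usually registered with a CustomResourceDefinition like the hypothetical Backup resource sketched below; the group, kind, and schema are made-up examples, and a controller that reconciles these objects still has to be written and deployed separately:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: backups.example.com        # must be <plural>.<group>
spec:
  group: example.com               # API group for the new endpoint
  scope: Namespaced
  names:
    kind: Backup
    singular: backup
    plural: backups
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              schedule:
                type: string       # e.g. a cron expression

Once applied, kubectl get backups works like any built-in resource, and the custom controller works to bring each Backup toward its desired state.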

AWS Controllers for Kubernetes (ACK) is an open-source project from AWS, implemented as custom controllers for AWS service resources. With ACK installed in a cluster, you can define and use AWS resources, like an S3 bucket, directly from Kubernetes. There are many more forms of extension for scheduling, storage, and networking. Kubernetes comes with a lot of power and flexibility, but that also comes with a bit of a learning curve and some operational overhead. Amazon Elastic Kubernetes Service (EKS) is a fully managed, 100-percent-upstream implementation of Kubernetes from AWS.
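
For example, with the ACK S3 controller installed, an S3 bucket can be declared as a Kubernetes object along these lines (the bucket name is a placeholder, and the exact API group and version depend on the ACK controller release you install):

apiVersion: s3.services.k8s.aws/v1alpha1
kind: Bucket
metadata:
  name: example-bucket
spec:
  name: example-bucket   # name of the S3 bucket created in your AWS account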

Core Concepts

In a Kubernetes cluster, your containers run on worker nodes. A cluster may have one or many worker nodes. When you create a cluster in AWS, EC2 instances are used as the nodes. If you are self-managing Kubernetes, you are also running hosts for your control plane. The control plane hosts the Kubernetes API. You interact with the API, and in turn the control plane deploys and manages containers on the worker nodes. For a highly available Kubernetes cluster, you'll need multiple control plane nodes to protect against failure. There are utilities, like kops and kubeadm, you can use to build a self-managed cluster.

Namespace:

A namespace is used to isolate groups of resources in a cluster. You can think of it as a cluster within a cluster. If two applications never interact with each other, it can make sense to create each application's resources in different namespaces. This provides some isolation within a single cluster and prevents naming clashes. Some Kubernetes system components themselves run in a special namespace (kube-system).
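
A namespace is just another Kubernetes object. Here's a minimal sketch, using a made-up team name:

apiVersion: v1
kind: Namespace
metadata:
  name: team-a        # hypothetical namespace for one application or team

Resources created with metadata.namespace: team-a are grouped under that namespace.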

Pod:

A Pod is the smallest Kubernetes object. A Pod represents one or more containers running on your cluster. One container per Pod is a pretty common use case. If an application requires multiple tightly coupled containers, you can define a Pod with multiple containers. We scale applications by scaling the number of Pods for an application. Scaling a multi-container Pod means creating new Pods with all of the containers in the Pod. You may choose this for a sidecar pattern, which moves peripheral tasks, like proxying and logging, away from application code and into a container that is always deployed with your application (see the sketch below).
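
Here's a minimal sketch of a two-container Pod using the sidecar pattern; the image names and shared volume are illustrative only:

apiVersion: v1
kind: Pod
metadata:
  name: web-with-logging
spec:
  volumes:
  - name: logs                    # shared between the app and the sidecar
    emptyDir: {}
  containers:
  - name: app                     # main application container
    image: example/web-app:1.0    # placeholder image
    volumeMounts:
    - name: logs
      mountPath: /var/log/app
  - name: log-shipper             # sidecar that forwards the app's logs
    image: example/log-shipper:1.0
    volumeMounts:
    - name: logs
      mountPath: /var/log/app
      readOnly: true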

You probably aren't creating Pods directly in a cluster yourself. Pods are considered ephemeral. If you ask Kubernetes to create a Pod, the Pod will be scheduled to run on a node in your cluster. If the node fails or is removed to recover resources, the Pod won't be launched again. Working with a workload resource, like a Deployment or a Job, will probably be a better match for your requirements.

Managing Pods with the Deployment object:
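
A Deployment describes a Pod template and how many replicas of it to run; Kubernetes then keeps that many Pods running and rolls out changes for you. Here's a minimal sketch, with a placeholder image and labels:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-deployment
spec:
  replicas: 3                        # desired number of Pods
  selector:
    matchLabels:
      app: example
  template:                          # Pod template used for each replica
    metadata:
      labels:
        app: example
    spec:
      containers:
      - name: app
        image: example/web-app:1.0   # placeholder image
        ports:
        - containerPort: 8080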

Load balancing across Pods using the Service object:
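
A Service selects Pods by label and gives them one stable, load balanced cluster IP. Here's a minimal sketch that matches the Deployment above (names and ports are illustrative):

apiVersion: v1
kind: Service
metadata:
  name: example-service
spec:
  selector:
    app: example       # forwards traffic to Pods carrying this label
  ports:
  - port: 80           # port exposed on the Service's cluster IP
    targetPort: 8080   # port the container listens on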

Summary of core concepts:

  • A Kubernetes cluster is made up of worker nodes where your containers run.
  • You interact with the Kubernetes API, served by the control plane, to describe the desired state of the containers and services in your cluster.
  • The control plane coordinates the containers running on worker nodes.
  • A Pod is the smallest deployable unit in Kubernetes. A Pod is home for one or more containers running on your worker nodes. You probably aren't directly creating Pods.
  • We can use a workload resource, like a Deployment, to describe our requirements to Kubernetes.

From there, Kubernetes handles a lot of the operational work for you.

Horizontal Pod Autoscaler (HPA):

The HorizontalPodAutoscaler automatically scales the number of Pods in a Deployment or ReplicaSet based on observed CPU utilization or other selected metrics.

Here’s a sample YAML configuration for an HPA that scales a Deployment based on CPU usage:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-deployment  # Name of the Deployment to scale
  minReplicas: 1   # Minimum number of Pods
  maxReplicas: 10  # Maximum number of Pods
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50  # Target CPU utilization percentage

Breakdown of the Specification:

  • apiVersion: autoscaling/v2: Specifies the API version used for the HorizontalPodAutoscaler. The v2 API is the stable version and supports resource, custom, and external metrics.
  • kind: HorizontalPodAutoscaler: Indicates that this resource is an HPA.
  • metadata: Contains metadata about the HPA, including its name and namespace.
  • spec: Defines the desired behavior of the HPA.
    • scaleTargetRef: Specifies the resource to scale.
      • apiVersion: API version of the resource to scale.
      • kind: Type of resource to scale (e.g., Deployment, ReplicaSet).
      • name: Name of the resource to scale.
    • minReplicas: Minimum number of Pods to maintain.
    • maxReplicas: Maximum number of Pods to scale up to.
    • metrics: Defines the metrics to use for scaling.
      • type: Type of metric. Here, Resource indicates resource utilization metrics.
      • resource: Specifies the resource type.
        • name: Resource type (e.g., cpu, memory).
        • target: Defines the target value for the metric.
          • type: Type of target (e.g., Utilization for CPU or memory).
          • averageUtilization: The target average CPU utilization percentage across Pods.

Kubernetes scaling and service discovery:

Kubernetes is designed to scale with your workloads. You can scale in two different dimensions:

  • the number of Pods hosting your application
  • the number of nodes that form your cluster

Pod scaling options:

  • HorizontalPodAutoscaler can observe metrics to increase (or decrease) the number of Pods for a service.
  • VerticalPodAutoscaler can automatically adjust the resource requirements for your Pods to better inform the Kubernetes scheduler (see the sketch after this list).
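
The VerticalPodAutoscaler is installed separately from core Kubernetes, so the exact API group can vary by release. Here's a minimal sketch, assuming the commonly used autoscaling.k8s.io/v1 API and the example-deployment from earlier:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: example-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-deployment   # workload whose resource requests are adjusted
  updatePolicy:
    updateMode: "Auto"          # let the VPA apply its recommendations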

Worker node scaling options:

The two main options, the Kubernetes Cluster Autoscaler and Karpenter, are covered in detail further below.

Service discovery:

Imagine that you are hosting a service over multiple Pods, and the number of Pods might grow or shrink. How will dependent applications discover the location to communicate with your services?

Services inside a Kubernetes cluster are accessed through their clusterIP address. The clusterIP is a load balanced IP address. Traffic to this IP address will be forwarded to the matching Pods for the service. The clusterIP is proxied by the kube-proxy that runs on each of your Kubernetes nodes. kube-proxy contains rules that are updated as Pods and services are created and removed.

The clusterIP can be discovered by using environment variables set by Kubernetes or by Domain Name System (DNS) based service discovery. For more information, see Discovering services in the Kubernetes documentation.

An application runs on multiple Pods across multiple nodes. Kubernetes is built to scale with workloads in more than one dimension. We can scale:

  • Pods and
  • cluster nodes

As the demand for an application increases, the application may run out of memory (OOM) or hit high CPU utilization. In such cases, scale the application horizontally by increasing the replica count, that is, the number of Pods used by the application. The HorizontalPodAutoscaler is one way to tell Kubernetes the minimum and maximum number of replicas. The Horizontal Pod Autoscaler will grow or shrink the number of Pods to try and match your target metrics. It scales the number of Pods using the ratio of the current metric to the desired metric. For example, if four Pods are averaging 80 percent CPU utilization and the desired CPU utilization is 40 percent, 80 divided by 40 is two, so the algorithm will double the number of Pods to eight.

We also have the Vertical Pod Autoscaler, but before we talk about it, let's talk about resource management with requests and limits.

A PodSpec contains one or more container definitions. You can also include a resources object in each container definition. Resource requests and limits specify the CPU and memory a container asks for and may not exceed. The request setting is used by the Kubernetes scheduler to find a node to run your Pod.
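
The next paragraph walks through request and limit values like the ones in this minimal sketch (the Pod name and image are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: resource-demo
spec:
  containers:
  - name: app
    image: example/web-app:1.0   # placeholder image
    resources:
      requests:
        memory: "200Mi"   # scheduler places the Pod on a node with this much free memory
        cpu: "500m"       # 500 millicpu, i.e. half a CPU core
      limits:
        memory: "400Mi"   # container is terminated if it exceeds this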

In the example, the request property is telling Kubernetes my container wants to run on a node with 200 MB of memory and 500 millicpu available. The Kubernetes scheduler ensures my container will be placed on a node where the requested resources are available. A container can exceed its requested resources, but it cannot exceed its limits. For example, say your container requests 200 MB of memory and has a limit of 400 MB of memory. The Kubernetes scheduler will find a node with enough memory to satisfy the 200 MB request. If your container exceeds the 400 MB limit, the container runtime will enforce the limit, and your container processes will be terminated with an out-of-memory error.

You can set resource requests manually, or consider the Vertical Pod Autoscaler. The Vertical Pod Autoscaler observes usage for your containers and uses this to set the request values for your containers. A better-informed request setting allows Kubernetes to better place your Pods within the cluster.

We've said the Kubernetes scheduler looks for a node that has the capacity to meet your requested resources for a container. The resources in a cluster are not infinite. If you attempt to create a new Pod and there are insufficient resources, the Pod will fail to schedule. We can now think about scaling on another dimension: the number of worker nodes. Like a lot of things in Kubernetes, there are many ways to approach scaling your cluster resources. Here are two options.

The Kubernetes Cluster Autoscaler automatically adds nodes when the cluster is failing to launch Pods because the scheduler cannot find resources, and removes nodes when underutilization is detected. The Cluster Autoscaler is part of the core Kubernetes project and runs in the cluster control plane. It is cloud-provider aware; in AWS, EC2 Auto Scaling groups are used by the autoscaler to grow and shrink your worker nodes.

The second option is Karpenter, an open-source project in the AWS GitHub account. Like the Cluster Autoscaler, Karpenter detects Pods that fail to schedule and turns on nodes to host them. With Karpenter, you configure a provisioner that defines properties such as: ttlSecondsUntilExpired, the period after which a node is expired and terminated; ttlSecondsAfterEmpty, the number of seconds a node with no Pods will run before it is scaled down; and requirements used to determine the type of node to launch, for example a list of instance types. Karpenter looks at the scheduling constraints on your Pod definitions and the requirements in the provisioner to turn on an instance that matches. Karpenter schedules and binds Pods to a node as soon as the node is launched, which improves node startup latency compared to the Cluster Autoscaler (see the provisioner sketch below).
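
Here's a minimal sketch of a Karpenter provisioner, assuming the karpenter.sh/v1alpha5 Provisioner API (newer Karpenter releases replace this with a NodePool resource, and a providerRef to a node template is usually also required); all values are illustrative:

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  requirements:
  - key: node.kubernetes.io/instance-type   # restrict which instance types may be launched
    operator: In
    values: ["m5.large", "m5.xlarge"]
  ttlSecondsAfterEmpty: 30         # scale a node down 30 seconds after it has no Pods
  ttlSecondsUntilExpired: 604800   # expire and replace nodes after 7 days
  limits:
    resources:
      cpu: "100"                   # cap the total CPU this provisioner may launch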

Okay, we've gone over a few different options for scaling the Pods and the cluster nodes that host your applications. If you're hosting a service over multiple Pods that may grow and shrink, how will dependent applications discover the location to communicate with your service? This is done with service discovery.

A service that is only accessed from inside the cluster is created with a cluster IP. The cluster IP is a load balanced IP address: traffic to this IP address is forwarded to the matching Pods for the service. The cluster IP is proxied by the kube-proxy running on each of your Kubernetes nodes. kube-proxy contains rules that are updated as Pods and services are created and removed. It's important to note that we are relying on proxying to route communication to Pods, not round-robin DNS. Round-robin DNS has historically caused issues with clients ignoring time-to-live settings and caching DNS records.

How does a consuming service discover a service's cluster IP? Kubernetes offers two ways to do this: environment variables and DNS records. Pods in a Kubernetes cluster are created with environment variables containing information about every active service in the cluster. If you are running a DNS service in your cluster, like the CoreDNS add-on, you can query for service information via DNS-based service discovery. Each service has an A record that contains the cluster IP, and an SRV record, which contains information like priority, weight, and port number.

So your applications running in a cluster have two ways to find a cluster IP for a service: environment variables and DNS records. Remember, traffic to a cluster IP is proxied to multiple Pods on multiple nodes by kube-proxy running on your cluster nodes. When everything is configured to use service discovery, your applications don't need to worry about any of the details. Your applications can keep doing what they're doing, and Kubernetes will handle a lot of the networking for you.
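
Here's a sketch of what service discovery looks like from a consuming Pod, assuming the example-service defined earlier lives in the default namespace (the Pod name and image are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: client
spec:
  containers:
  - name: client
    image: example/client:1.0   # placeholder image
    env:
    - name: BACKEND_URL
      # DNS-based discovery: <service>.<namespace>.svc.cluster.local resolves to the cluster IP
      value: "http://example-service.default.svc.cluster.local:80"
    # Alternatively, Kubernetes injects environment variables such as
    # EXAMPLE_SERVICE_SERVICE_HOST and EXAMPLE_SERVICE_SERVICE_PORT
    # into Pods created after the service exists.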