Kubernetes (K8s) for DevOps Architects: Real-World Examples
This page pairs each Kubernetes component with a real-world example to show how it functions in production environments.
Control Plane Components in Action
API Server
Real-world example: When you run kubectl apply -f deployment.yaml, the kubectl command sends a REST request to the API Server, which validates your YAML manifest before storing it in etcd. During a Black Friday sale, when your team needs to scale up services quickly, all those scaling commands are processed through this component.
etcd
Real-world example: In a multinational e-commerce platform, etcd stores critical information like which Pods are running on which nodes, what Services exist, and their configurations. If your team updates a Deployment from 5 to 10 replicas, etcd persists this desired state, so it survives restarts of the API Server, Scheduler, and controllers.
Scheduler
Real-world example: For a machine learning platform, when data scientists submit new model training jobs as Pods with GPU requirements, the Scheduler identifies nodes with available GPUs and assigns the Pods accordingly, considering resource constraints and affinity rules.
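A minimal sketch of such a training Pod follows. The Pod name, image, and the gpu-type node label are hypothetical, and the nvidia.com/gpu resource is only available when the NVIDIA device plugin is installed on the GPU nodes.

```yaml
# Hypothetical training Pod; names, image, and node label are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: model-training
spec:
  restartPolicy: Never
  containers:
    - name: trainer
      image: registry.example.com/ml/trainer:1.0   # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 1   # extended resource; assumes the NVIDIA device plugin
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: gpu-type          # hypothetical node label
                operator: In
                values: ["a100"]
```

The Scheduler only considers nodes that both advertise a free GPU and match the affinity rule; if none exist, the Pod stays Pending.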
Controller Manager
Real-world example: In a banking application, if a node running payment processing Pods suddenly fails, the Node Controller (part of Controller Manager) detects the failure and marks the node as not ready. Once the affected Pods are evicted, the ReplicaSet controller behind the Deployment creates replacement Pods, and the Scheduler places them on healthy nodes to restore the desired state of the payment services.
Cloud Controller Manager
Real-world example: In an AWS-hosted Kubernetes cluster, when you create a LoadBalancer Service for your customer-facing API, the Cloud Controller Manager provisions an AWS Elastic Load Balancer automatically and configures it to route traffic to your Service.
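A Service of this kind might look like the sketch below. The Service name, selector label, and port numbers are assumptions made for illustration.

```yaml
# Hypothetical LoadBalancer Service for a customer-facing API.
apiVersion: v1
kind: Service
metadata:
  name: customer-api
spec:
  type: LoadBalancer        # asks the cloud provider integration for an external load balancer
  selector:
    app: customer-api       # assumed Pod label
  ports:
    - name: https
      protocol: TCP
      port: 443             # port exposed by the load balancer
      targetPort: 8443      # assumed container port
```

Once the load balancer is provisioned, its address appears in the Service's status and in the EXTERNAL-IP column of kubectl get service.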
Node Components in Practice
Kubelet
Real-world example: During a rolling update of a Netflix-like streaming service, the Kubelet on each node receives instructions to terminate old version Pods and start new ones. It runs any preStop hooks, sends SIGTERM to the containers, and waits up to the Pod's termination grace period so that in-flight requests can complete before the containers are killed.
Kube-proxy
Real-world example: In a microservices architecture, when your "user-profile" service needs to communicate with the "payment-history" service, kube-proxy maintains the network rules that allow this internal communication using either iptables or IPVS on each node.
Container Runtime
Real-world example: In a CI/CD pipeline environment, after developers push new code, containerd (a common container runtime) pulls the newly built container images from your private Docker registry and runs them within Pods according to defined resource limits.
Kubernetes Objects in Production
Pod
Real-world example: At Spotify, individual microservices like the "playlist-manager" might run as Pods, with the main application container paired with a sidecar container that handles metrics collection for observability.
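The main-container-plus-sidecar pattern can be sketched as a two-container Pod. The images and ports below are placeholders, not Spotify's actual configuration.

```yaml
# Illustrative Pod with an application container and a metrics sidecar.
apiVersion: v1
kind: Pod
metadata:
  name: playlist-manager
  labels:
    app: playlist-manager
spec:
  containers:
    - name: app
      image: registry.example.com/playlist-manager:1.0   # placeholder image
      ports:
        - containerPort: 8080
    - name: metrics-sidecar
      image: registry.example.com/metrics-exporter:1.0   # placeholder image
      ports:
        - containerPort: 9102   # assumed metrics port
```

Both containers share the Pod's network namespace, so the sidecar can reach the application on localhost.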
Service
Real-world example: In a SaaS application, the "authentication" Service maintains a stable endpoint (a cluster DNS name of the form authentication.<namespace>.svc.cluster.local) that other services can reliably call, even as the underlying authentication Pods scale up and down or get redeployed during updates.
Volume
Real-world example: For a media processing application, a temporary Volume might be mounted to multiple containers in a Pod - one container downloads media files, another processes them, and a third uploads the processed files to cloud storage.
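One way to sketch this is an emptyDir volume mounted into all three containers. The container names and images are hypothetical.

```yaml
# Illustrative Pod: three containers sharing one temporary volume.
apiVersion: v1
kind: Pod
metadata:
  name: media-pipeline
spec:
  volumes:
    - name: scratch
      emptyDir: {}            # deleted when the Pod is removed from the node
  containers:
    - name: downloader
      image: registry.example.com/media/downloader:1.0   # placeholder image
      volumeMounts:
        - name: scratch
          mountPath: /work
    - name: processor
      image: registry.example.com/media/processor:1.0    # placeholder image
      volumeMounts:
        - name: scratch
          mountPath: /work
    - name: uploader
      image: registry.example.com/media/uploader:1.0     # placeholder image
      volumeMounts:
        - name: scratch
          mountPath: /work
```

Because an emptyDir lives only as long as the Pod, it suits intermediate files but not data that must survive a reschedule.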
Namespace
Real-world example: A financial services company might create separate Namespaces for "trading", "reporting", and "customer-portal" teams, each with their own resource quotas and access permissions to maintain separation of concerns.
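As a rough sketch, one of these Namespaces with a resource quota could be declared as follows. The quota values are arbitrary placeholders.

```yaml
# Namespace for the "trading" team with an illustrative quota.
apiVersion: v1
kind: Namespace
metadata:
  name: trading
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: trading-quota
  namespace: trading
spec:
  hard:
    requests.cpu: "20"        # placeholder values; size to the team's needs
    requests.memory: 64Gi
    limits.cpu: "40"
    limits.memory: 128Gi
    pods: "100"
```

Access permissions would be added separately with Roles and RoleBindings scoped to the same Namespace.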
Deployment
Real-world example: For an e-commerce website, the frontend application runs as a Deployment with 10 replicas across multiple nodes. During a new feature release, DevOps engineers perform a rolling update with kubectl set image deployment/frontend frontend=<registry>/frontend:v2.1.3 (container name, then the new image reference), progressively replacing old Pods with new ones.
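A Deployment along these lines might be declared as in the sketch below. The image reference, port, probe path, and the surge settings are assumptions chosen for illustration.

```yaml
# Illustrative frontend Deployment with an explicit rolling update strategy.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
spec:
  replicas: 10
  selector:
    matchLabels:
      app: frontend
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2             # at most 2 extra Pods during the rollout
      maxUnavailable: 1       # at most 1 Pod below the desired count
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
        - name: frontend      # container name referenced by kubectl set image
          image: registry.example.com/frontend:v2.1.3   # placeholder image
          ports:
            - containerPort: 8080
          readinessProbe:     # new Pods receive traffic only once ready
            httpGet:
              path: /healthz  # assumed health endpoint
              port: 8080
```

Progress can be followed with kubectl rollout status deployment/frontend, and a bad release reverted with kubectl rollout undo deployment/frontend.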
StatefulSet
Real-world example: A MongoDB replica set in production would be deployed as a StatefulSet named "mongodb" with 3 replicas, ensuring each MongoDB instance gets a predictable name (mongodb-0, mongodb-1, mongodb-2) and persistent storage that follows the Pod if it's rescheduled to another node.
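A simplified sketch of that StatefulSet is shown below, together with the headless Service it needs for stable per-Pod DNS names. The image tag, replica set name rs0, and storage size are assumptions, and initializing the MongoDB replica set itself is not covered here.

```yaml
# Headless Service that gives each Pod a stable DNS name.
apiVersion: v1
kind: Service
metadata:
  name: mongodb
spec:
  clusterIP: None
  selector:
    app: mongodb
  ports:
    - port: 27017
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongodb
spec:
  serviceName: mongodb        # must match the headless Service
  replicas: 3                 # creates mongodb-0, mongodb-1, mongodb-2
  selector:
    matchLabels:
      app: mongodb
  template:
    metadata:
      labels:
        app: mongodb
    spec:
      containers:
        - name: mongod
          image: mongo:7.0    # assumed image tag
          args: ["--replSet", "rs0", "--bind_ip_all"]
          ports:
            - containerPort: 27017
          volumeMounts:
            - name: data
              mountPath: /data/db
  volumeClaimTemplates:       # one PVC per Pod: data-mongodb-0, data-mongodb-1, ...
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 100Gi    # placeholder size
```

Each Pod keeps its own PersistentVolumeClaim, which is how the storage follows mongodb-1 when it is rescheduled.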
DaemonSet
Real-world example: Datadog's monitoring agent runs as a DaemonSet, ensuring every node in your cluster has exactly one monitoring Pod that collects metrics, logs, and traces from all containers running on that node.
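The one-Pod-per-node behavior comes from the DaemonSet itself rather than from the agent. The sketch below uses a generic placeholder agent image and is not Datadog's published manifest.

```yaml
# Generic node agent DaemonSet; image and namespace are hypothetical.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-monitoring-agent
  namespace: monitoring       # assumed to exist
spec:
  selector:
    matchLabels:
      app: node-monitoring-agent
  template:
    metadata:
      labels:
        app: node-monitoring-agent
    spec:
      tolerations:            # also run on control plane nodes, if desired
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
      containers:
        - name: agent
          image: registry.example.com/monitoring/agent:1.0   # placeholder image
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              memory: 256Mi
```

When a node joins the cluster, the DaemonSet controller adds an agent Pod to it without any further action.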
Job/CronJob
Real-world example: A retail company might use a CronJob to run inventory reconciliation at midnight, while another Job might be triggered after a product import to regenerate search indexes.
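The midnight reconciliation could be sketched as the CronJob below. The job name and image are hypothetical.

```yaml
# Illustrative CronJob that runs every day at 00:00.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: inventory-reconciliation
spec:
  schedule: "0 0 * * *"       # minute hour day-of-month month day-of-week
  concurrencyPolicy: Forbid   # skip a run if the previous one is still going
  jobTemplate:
    spec:
      backoffLimit: 2         # retry a failed run at most twice
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: reconcile
              image: registry.example.com/inventory/reconcile:1.0   # placeholder image
```

The schedule is interpreted in the kube-controller-manager's time zone unless spec.timeZone is set.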
Ingress
Real-world example: A media company uses an NGINX Ingress Controller to route traffic based on path and hostname: requests to api.example.com go to the API service, while web.example.com routes to the frontend service, with TLS termination handled automatically.
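Host-based routing of this kind might be expressed as in the following sketch. The backend Service names, ports, and TLS Secret name are assumptions; the certificate in that Secret has to be created separately, for example by a certificate manager.

```yaml
# Illustrative Ingress routing two hostnames to two Services.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: media-routing
spec:
  ingressClassName: nginx     # assumes an NGINX Ingress Controller with this class
  tls:
    - hosts:
        - api.example.com
        - web.example.com
      secretName: example-com-tls   # hypothetical TLS Secret
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api           # assumed Service name
                port:
                  number: 80
    - host: web.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: frontend      # assumed Service name
                port:
                  number: 80
```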
Networking Solutions
Real-world example: A large financial institution might choose Calico as their CNI plugin because it supports network policies for security isolation between banking, investment, and insurance services running on the same cluster.
Storage in Action
Persistent Volumes (PV)
Real-world example: In a medical imaging application, a 500GB PV provisioned on high-performance AWS EBS volumes stores scan data that must persist even when the processing Pods are restarted or rescheduled.
Persistent Volume Claims (PVC)
Real-world example: A content management system's database might use a PVC requesting 100GB of storage with specific performance characteristics, which gets bound to an appropriately sized PV by the cluster.
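A claim like this could be written as follows. The claim name and the fast-ssd storage class are hypothetical and must match what the cluster actually offers.

```yaml
# Illustrative PVC requesting 100Gi from an assumed storage class.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cms-db-data
spec:
  accessModes:
    - ReadWriteOnce           # mounted read-write by a single node
  storageClassName: fast-ssd  # hypothetical class name
  resources:
    requests:
      storage: 100Gi
```

A Pod then refers to the claim by name under spec.volumes[].persistentVolumeClaim.claimName.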
Storage Classes
Real-world example: An enterprise might define Storage Classes like "fast-ssd" (using NVMe drives) for databases, "standard-hdd" for backups, and "replicated-storage" for critical data, allowing teams to choose the appropriate storage type for their workloads.
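As a sketch, a "fast-ssd" class on AWS might be defined like this. It assumes the AWS EBS CSI driver is installed; the gp3 volume type is a stand-in, since the right parameters depend on the storage backend.

```yaml
# Illustrative StorageClass; provisioner and parameters assume the AWS EBS CSI driver.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com
parameters:
  type: gp3                   # assumed volume type
reclaimPolicy: Delete         # volume is deleted with its claim
volumeBindingMode: WaitForFirstConsumer   # provision in the zone where the Pod lands
allowVolumeExpansion: true
```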
Security Implementations
Authentication
Real-world example: A healthcare organization integrates Kubernetes with their existing Active Directory using OIDC, allowing developers to authenticate to the cluster using their corporate credentials.
Authorization (RBAC)
Real-world example: In a multi-tenant platform, the Platform team creates specific Roles like "developer", "operator", and "auditor", each with a different set of permissions, then assigns these roles to users through RoleBindings.
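The "developer" role and its binding might look like the sketch below. The namespace, user name, and exact permission list are assumptions.

```yaml
# Illustrative namespaced Role and RoleBinding.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: developer
  namespace: team-a           # hypothetical tenant namespace
rules:
  - apiGroups: ["", "apps"]
    resources: ["pods", "pods/log", "services", "deployments"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: developer-binding
  namespace: team-a
subjects:
  - kind: User
    name: jane@example.com    # hypothetical user from the identity provider
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: developer
  apiGroup: rbac.authorization.k8s.io
```

A quick check is kubectl auth can-i delete deployments --as jane@example.com -n team-a, which should answer no for this Role.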
Admission Control
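Real-world example: A regulated industry uses an admission controller to enforce that all Pods must run as non-root users and cannot mount the host filesystem, preventing potential security breaches. Older clusters did this with PodSecurityPolicy, which was deprecated in Kubernetes 1.21 and removed in 1.25; current clusters use the built-in Pod Security Admission controller or a policy engine such as Gatekeeper.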
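On current Kubernetes versions, the non-root and no-host-filesystem requirements can be sketched with the built-in Pod Security Admission controller, which is configured through namespace labels. The namespace, Pod name, image, and user ID below are hypothetical.

```yaml
# Namespace enforcing the "restricted" Pod Security Standard.
apiVersion: v1
kind: Namespace
metadata:
  name: regulated-workloads
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted
---
# A Pod written to satisfy the restricted profile.
apiVersion: v1
kind: Pod
metadata:
  name: compliant-app
  namespace: regulated-workloads
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 10001          # arbitrary non-root UID
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: app
      image: registry.example.com/app:1.0   # placeholder image
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]
```

Pods in this namespace that request root or a hostPath volume are rejected at admission time.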
Network Policies
Real-world example: In a payment card processing environment, Network Policies ensure that only the authorized "payment-processor" Pods can communicate with the "card-vault" Pods, and only on specific ports.
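Such a rule can be sketched as the NetworkPolicy below. The namespace, labels, and port are assumptions, and enforcement requires a CNI plugin that implements network policies.

```yaml
# Illustrative policy: only payment-processor Pods may reach card-vault on TCP 8443.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: card-vault-ingress
  namespace: payments         # hypothetical namespace
spec:
  podSelector:
    matchLabels:
      app: card-vault         # Pods the policy protects
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: payment-processor
      ports:
        - protocol: TCP
          port: 8443          # assumed service port
```

Once any Ingress policy selects the card-vault Pods, all other inbound traffic to them is denied by default.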
Secret Management
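Real-world example: A B2B SaaS application stores API keys, database credentials, and encryption keys as Kubernetes Secrets, which are then exposed as environment variables or mounted as files in the appropriate Pods. Secret values are only base64-encoded by default, so encryption at rest and tight RBAC on Secrets should be configured as well.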
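Both consumption styles are shown in the sketch below: one key injected as an environment variable and the whole Secret mounted as files. All names and values are placeholders, and real credentials should not be committed to Git in plain text.

```yaml
# Illustrative Secret and a Pod that consumes it in two ways.
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
type: Opaque
stringData:                   # plain text here; stored base64-encoded by the API Server
  username: app_user
  password: change-me
---
apiVersion: v1
kind: Pod
metadata:
  name: api
spec:
  containers:
    - name: api
      image: registry.example.com/api:1.0   # placeholder image
      env:
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: db-credentials
              key: password
      volumeMounts:
        - name: creds
          mountPath: /etc/creds   # one file per key: username, password
          readOnly: true
  volumes:
    - name: creds
      secret:
        secretName: db-credentials
```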
Advanced Implementations
Service Mesh
Real-world example: Lyft uses Envoy (the basis for many service meshes) to handle inter-service communication, providing circuit breaking, rate limiting, and observability without changing application code.
GitOps
Real-world example: Weaveworks (creators of Flux) manage their own infrastructure using GitOps principles - infrastructure changes must be committed to Git, and automated controllers reconcile the cluster state with the Git repository state.
Operators
Real-world example: The Prometheus Operator automates the deployment and management of Prometheus monitoring instances, handling details like configuration, persistent storage, and high availability setups in a Kubernetes-native way.
flowchart TD
subgraph "Control Plane Components"
api[API Server\nProcesses kubectl commands] --> etcd[etcd\nStores cluster state]
api --> scheduler[Scheduler\nAssigns ML training jobs to GPU nodes]
api --> cm[Controller Manager\nRestores payment Pods after node failure]
api --> ccm[Cloud Controller Manager\nProvisions AWS ELB for Services]
end
subgraph "Worker Node 1"
kubelet1[Kubelet\nManages container lifecycle] --> container1[Container Runtime\nRuns containerd/Docker]
kp1[Kube Proxy\nManages iptables rules] --> container1
container1 --> pod11[Pod: Payment Service\nMain + Sidecar containers]
container1 --> pod12[Pod: User Profile API\nClaims 2 CPU, 4GB RAM]
end
subgraph "Worker Node 2"
kubelet2[Kubelet\nEnforces resource limits] --> container2[Container Runtime\nPulls images from registry]
kp2[Kube Proxy\nEnables Service discovery] --> container2
container2 --> pod21[Pod: MongoDB-0\nStatefulSet member]
container2 --> pod22[Pod: Datadog Agent\nFrom DaemonSet]
end
api <--> kubelet1
api <--> kubelet2
api <--> kp1
api <--> kp2
User[DevOps Engineer] --> api
subgraph "External Components"
dns[CoreDNS\nResolves service.namespace.svc.cluster.local]
ingress[NGINX Ingress Controller\nRoutes traffic by hostname]
lb[AWS Load Balancer\nProvided by Cloud Controller]
end
api --> dns
api --> ingress
External[External Traffic\nCustomer requests] --> lb
lb --> ingress
subgraph "Storage & Persistence"
sc[Storage Classes\nfast-ssd, standard-hdd]
pv[Persistent Volumes\n500GB EBS volume]
pvc[PVC\nRequested by MongoDB StatefulSet]
end
api --> sc
sc --> pv
pv --> pvc
pvc --> pod21
subgraph "Real Business Workloads"
dep[Deployment: E-commerce Frontend\n10 replicas with rolling updates]
ss[StatefulSet: MongoDB Cluster\n3 ordered replicas]
ds[DaemonSet: Logging Agent\nOne per node]
cj[CronJob: Nightly Backup\nRuns at 2 AM]
end
api --> dep
api --> ss
api --> ds
api --> cj
DevOps Best Practices with Examples
Infrastructure as Code
Real-world example: At Monzo Bank, the entire Kubernetes infrastructure is defined in Terraform modules and versioned in Git. When they need to create a new environment, they simply apply the same code with different variables.
CI/CD Integration
Real-world example: At Shopify, when developers merge code to the main branch, their CI pipeline automatically builds container images, runs security scans, updates Kubernetes manifests with the new image tag, and applies the changes to a staging cluster before promoting to production.
Monitoring and Observability
Real-world example: A ride-sharing company uses Prometheus to scrape metrics from all services, Grafana dashboards to visualize performance, and Jaeger to trace requests as they flow from the mobile app through the backend services to the driver matching algorithm.
Disaster Recovery
Real-world example: Netflix regularly tests their disaster recovery procedures by using tools like Velero to back up their entire Kubernetes cluster state and restore it to a different region, ensuring they can recover from region-wide outages.
Resource Management
Real-world example: An AI company sets memory requests and limits for their model training Pods based on profiling data, and implements horizontal pod autoscaling to handle variable loads, optimizing cluster resource utilization.
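Requests and limits derived from profiling might be declared as in this sketch. The numbers, Pod name, and image are placeholders, not recommendations.

```yaml
# Illustrative training Pod with explicit requests and limits.
apiVersion: v1
kind: Pod
metadata:
  name: model-training-profiled
spec:
  restartPolicy: Never
  containers:
    - name: trainer
      image: registry.example.com/ml/trainer:1.0   # placeholder image
      resources:
        requests:             # what the Scheduler reserves on the node
          cpu: "4"
          memory: 16Gi
        limits:               # exceeding the memory limit gets the container OOM-killed
          memory: 16Gi
```

Setting the memory request equal to the limit is one common choice for batch workloads, since it avoids overcommitting node memory.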
Multi-environment Strategy
Real-world example: Zalando maintains separate Kubernetes clusters for development, staging, and production, but uses the same Helm charts with environment-specific values to ensure consistency between environments.
Real-world Challenges and Solutions
Complexity
Real-world example: Airbnb initially struggled with Kubernetes complexity, so they started with Amazon EKS, a managed Kubernetes service, and gradually built expertise before adding custom components and optimizations.
Networking
Real-world example: A global gaming company with strict latency requirements chose Cilium as their CNI for its eBPF-based performance optimizations and integrated service mesh capabilities.
Stateful Applications
Real-world example: Shopify runs MySQL databases on Kubernetes using Vitess Operator, which handles sharding, connection pooling, and failover, demonstrating that even complex stateful workloads can thrive in Kubernetes with the right architecture.
Scalability
Real-world example: During Black Friday, an e-commerce platform uses Horizontal Pod Autoscalers based on custom metrics (order queue length) to scale checkout services independently from product catalog services, handling 20x normal traffic efficiently.
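Scaling on queue length might be sketched with an autoscaling/v2 HorizontalPodAutoscaler. The metric name order_queue_length, the target value, and the replica bounds are hypothetical, and the metric must be served by an external metrics adapter installed in the cluster.

```yaml
# Illustrative HPA scaling the checkout Deployment on an external queue metric.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: checkout
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: checkout            # assumed Deployment name
  minReplicas: 5
  maxReplicas: 100
  metrics:
    - type: External
      external:
        metric:
          name: order_queue_length   # hypothetical metric from a metrics adapter
        target:
          type: AverageValue
          averageValue: "30"         # aim for about 30 queued orders per Pod
```

The product catalog service would get its own HorizontalPodAutoscaler, which is what lets the two services scale independently.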
Security
Real-world example: A cryptocurrency exchange uses Gatekeeper (OPA) to enforce security policies across all deployments, automatically rejecting any Pod that tries to run as root or mount sensitive host paths, while scanning all images for vulnerabilities before deployment.