EN_Kubernetes - somaz94/DevOps-Engineer GitHub Wiki
A summary of Kubernetes concepts for DevOps engineers. Click each link for full details.
- kube-apiserver: Central entry point for all cluster operations — exposes the Kubernetes API
- etcd: Distributed key-value store for all cluster data (Raft consensus algorithm)
- kube-scheduler: Selects the optimal node to run each Pod
- kube-controller-manager: Runs Deployment, ReplicaSet, and other controllers
- kubelet: Manages Pod and container state on each node
- kube-proxy: Implements service abstraction and manages network rules
- Container Runtime: Software that runs containers (containerd, CRI-O)
- ClusterIP: Virtual IP accessible only within the cluster (not bound to any network interface; with kube-proxy it exists only as iptables/IPVS rules)
- NodePort: Exposes a service on a specific port of each node
- LoadBalancer: Provisions a cloud provider's external load balancer
- ExternalTrafficPolicy: Controls external traffic routing (Cluster: all nodes, Local: only nodes with Pods)
- CNI: Standard interface for container networking (Calico, Cilium, Flannel)
- NetworkPolicy: Kubernetes resource to control Ingress/Egress traffic between Pods
- Ingress Controller: Routes HTTP/HTTPS traffic to services (NGINX, Traefik, Istio)
- CoreDNS: Cluster DNS server — resolves Service and Pod DNS names
- PV (Persistent Volume): Cluster-level storage resource
- PVC (Persistent Volume Claim): User's storage request — binds to a PV
- StorageClass: Defines dynamic provisioning method
- CSI (Container Storage Interface): Standard interface between orchestrators and storage systems
- Retain / Delete: PV behavior after PVC deletion (Reclaim Policy)
- WaitForFirstConsumer: Binds PV at Pod scheduling time (considers topology constraints)
- Access Modes: RWO (single node R/W), ROX (multi-node read-only), RWX (multi-node R/W)
- Node Affinity: Pod placement rules based on node labels
- Taint / Toleration: Node restriction / Pod permission to schedule on tainted nodes
- nodeSelector: Simple label-based node selection
- Requests / Limits: Minimum guaranteed / maximum allowed resources
- QoS Classes: Guaranteed > Burstable > BestEffort (BestEffort Pods are evicted first under node resource pressure)
- HPA: Scales Pod count automatically / VPA: Adjusts resource requests / Cluster Autoscaler: Scales node count
- PriorityClass / Preemption: Define Pod priority and evict lower-priority Pods
- Deployment: Manages ReplicaSets and provides declarative updates
- StatefulSet: Controller for stateful apps — stable network IDs and persistent PVCs
- DaemonSet: Runs exactly one Pod on each (or selected) node
- CronJob: Runs Jobs on a Cron schedule
- RollingUpdate: Zero-downtime gradual update via maxUnavailable and maxSurge
- Blue-Green / Canary: Full switch / partial traffic new-version testing
- RBAC: Permission control via Role/ClusterRole + RoleBinding/ClusterRoleBinding
- ServiceAccount: Authentication credential for Pods to communicate with the API server
- PSS (Pod Security Standards): Privileged / Baseline / Restricted security levels
- PSA (Pod Security Admission): Enforces PSS with enforce / audit / warn modes
- Secret: Stores sensitive data as base64-encoded values (base64 is an encoding, not encryption)
- imagePullSecrets: Private registry authentication for pulling images
- Liveness Probe: Checks whether a container is still healthy (the kubelet restarts the container on failure)
- Readiness Probe: Checks whether a container is ready for traffic (the Pod is removed from Service endpoints on failure)
- Startup Probe: Checks whether an app has finished starting (Liveness and Readiness probes are held off until it succeeds)
- Metrics Server: Collects CPU/memory resource usage
- Prometheus: Full-stack monitoring system
- kubectl logs / describe / exec: Core debugging commands
- Graceful Shutdown: preStop Hook → SIGTERM → wait up to terminationGracePeriodSeconds (counted from the start of termination, so it includes the preStop hook) → SIGKILL
- imagePullPolicy: Always / IfNotPresent / Never
- Garbage Collection: Automatic cleanup of unused resources via Owner References and Finalizers
- CRD (Custom Resource Definition): Defines custom resource types
- Operator: Custom controller that automates application management
- ImagePullBackOff / ErrImagePull: Image pull failure states
- CrashLoopBackOff: Container repeatedly failing and restarting
- OOMKilled: Container forcibly terminated for exceeding its memory limit
- Pending: Pod accepted by the cluster but not yet running (not yet scheduled to a node, or still pulling images)
- Terminating: Pod in the process of being deleted
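
To make the Requests / Limits and QoS Classes entries above concrete, the following is a minimal sketch of how a Pod's QoS class follows from its containers' resource settings. It is a simplified model, not the kubelet's implementation: the `qos_class` function and the dictionary layout are illustrative assumptions, and quantities are compared as plain strings (so `"500m"` and `"0.5"` would be treated as different here).

```python
from typing import Dict, List

# Illustrative container shape: {"requests": {"cpu": "250m"}, "limits": {"cpu": "500m"}}
Resources = Dict[str, Dict[str, str]]


def qos_class(containers: List[Resources]) -> str:
    """Classify a Pod as Guaranteed, Burstable, or BestEffort (simplified)."""
    has_any_setting = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # A missing request defaults to the limit when a limit is set.
            request = requests.get(resource, limit)
            if request is not None or limit is not None:
                has_any_setting = True
            # Guaranteed needs a limit on every resource, equal to the request.
            if limit is None or request != limit:
                guaranteed = False
    if not has_any_setting:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"
```

A Pod with no requests or limits at all is BestEffort, a Pod where every container sets equal CPU and memory requests and limits is Guaranteed, and everything in between is Burstable.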
Covers why Kubernetes is used (orchestration, scaling, HA, self-healing), the roles of control plane and node components, and the full Pod creation flow from API Server → etcd → Scheduler → kubelet → Container Runtime.
→ Details
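
Because etcd relies on Raft, control plane availability depends on keeping a majority of etcd members healthy. The arithmetic can be sketched as follows; the function names are illustrative and not part of any etcd API.

```python
def quorum(members: int) -> int:
    """Smallest number of members that forms a Raft majority."""
    return members // 2 + 1


def failure_tolerance(members: int) -> int:
    """How many members can fail while a majority is still available."""
    return members - quorum(members)


# Compare common cluster sizes.
for size in (1, 3, 4, 5):
    print(size, quorum(size), failure_tolerance(size))
```

Under this arithmetic a 4-member cluster tolerates the same number of failures as a 3-member one, which is why odd member counts are the usual choice.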
Explains the API Group structure (core vs named groups), RBAC with Role/ClusterRole/RoleBinding, Secret management, Pod Security contexts and capabilities, and how ServiceAccounts authenticate with the API server using tokens.
→ Details
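
As a small illustration of why Secret values should not be treated as encrypted, the sketch below encodes and decodes a value the same way the `data` field of a Secret is represented. The helper names and the sample value are illustrative assumptions.

```python
import base64


def encode_secret_value(plaintext: str) -> str:
    """Encode a value the way it appears under a Secret's `data` field."""
    return base64.b64encode(plaintext.encode("utf-8")).decode("ascii")


def decode_secret_value(encoded: str) -> str:
    """Reverse the encoding; no key or password is involved."""
    return base64.b64decode(encoded).decode("utf-8")


encoded = encode_secret_value("admin")  # sample value, for illustration only
print(encoded)
print(decode_secret_value(encoded))
```

Anyone who can read the Secret object can recover the original value, so access control (RBAC) and, where available, encryption at rest are what actually protect it.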
Describes Service types (ClusterIP, NodePort, LoadBalancer, ExternalName), ExternalTrafficPolicy (Cluster vs Local), ARP Proxy in same-node Pod communication, VXLAN encapsulation, iptables vs IPVS performance, kube-proxy chain structure, CoreDNS caching and ndots, Calico BGP and Route Reflectors, mTLS without Service Mesh, Ingress Controller comparison (NGINX/Traefik/Istio), AWS LoadBalancer Controller, source IP preservation, full packet flow, CNI Plugin IPAM, sidecar proxy iptables, MTU mismatch, NodePort SNAT, and Dual-Stack (IPv4/IPv6) configuration.
→ Details
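
The effect of `ndots` on DNS lookups can be sketched as below. This is a simplified model of resolver search-list behavior, not the resolver's code; the `candidate_queries` function and the `default` namespace in the search list are assumptions for illustration.

```python
from typing import List


def candidate_queries(name: str, search: List[str], ndots: int = 5) -> List[str]:
    """Return the fully qualified names a resolver would try, in order."""
    if name.endswith("."):
        # A trailing dot marks the name as absolute: the search list is skipped.
        return [name]
    expanded = [f"{name}.{domain}." for domain in search]
    absolute = name + "."
    if name.count(".") >= ndots:
        # Enough dots: try the name as given first, then the search list.
        return [absolute] + expanded
    # Too few dots: walk the search list first, the literal name last.
    return expanded + [absolute]


# Typical Pod search list, assuming a Pod in the "default" namespace.
search = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]
for query in candidate_queries("api.example.com", search):
    print(query)
```

With `ndots=5`, an external name such as `api.example.com` is tried against every search domain before the literal name, which is where the extra lookups come from. Writing the name with a trailing dot avoids the search list entirely.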
Covers the PV/PVC binding lifecycle, StorageClass dynamic provisioning, CSI drivers, access modes (RWO/ROX/RWX), reclaim policies (Retain/Delete), and volume snapshots.
→ Details
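
The static binding step (a PVC finding an existing PV) can be sketched as a matching problem. The model below is deliberately simplified: the `PV` and `PVC` classes are illustrative, capacities are plain GiB integers, and selectors, volume modes, and dynamic provisioning are left out.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass
class PV:
    name: str
    capacity_gi: int
    access_modes: FrozenSet[str]
    storage_class: str
    phase: str = "Available"


@dataclass
class PVC:
    name: str
    request_gi: int
    access_modes: FrozenSet[str]
    storage_class: str


def find_match(claim: PVC, volumes: List[PV]) -> Optional[PV]:
    """Pick the smallest Available PV that satisfies the claim."""
    candidates = [
        pv for pv in volumes
        if pv.phase == "Available"
        and pv.storage_class == claim.storage_class
        and pv.capacity_gi >= claim.request_gi
        and claim.access_modes <= pv.access_modes  # requested modes must be offered
    ]
    return min(candidates, key=lambda pv: pv.capacity_gi, default=None)


def bind(claim: PVC, volumes: List[PV]) -> Optional[PV]:
    """Bind the claim to a matching PV, or return None if it stays Pending."""
    pv = find_match(claim, volumes)
    if pv is not None:
        pv.phase = "Bound"  # binding is exclusive: one PV serves one PVC
    return pv
```

A claim with no matching volume stays unbound in this sketch, which corresponds to a PVC remaining Pending until a suitable PV appears or is provisioned.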
Covers HPA/VPA/KEDA autoscaling, Liveness/Readiness/Startup Probe configuration, Affinity/Anti-Affinity/Taint/Toleration scheduling, the Operator pattern, CRD definitions, and Kubernetes Garbage Collection via Owner References and Finalizers.
→ Details
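
The core of HPA scaling is a ratio between the observed metric and its target. The sketch below follows the documented formula `desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)` with a tolerance band; the function name, parameter names, and default bounds are illustrative assumptions, and stabilization windows and scaling policies are not modeled.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Compute the replica count an HPA-style controller would aim for."""
    ratio = current_metric / target_metric
    # Within the tolerance band the controller leaves the count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(4, current_metric=200, target_metric=100))
print(desired_replicas(4, current_metric=50, target_metric=100))
```

For example, four replicas running at twice the target utilization scale toward eight, while four replicas at half the target scale toward two.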
Explains Graceful Shutdown (preStop hook → SIGTERM → wait up to terminationGracePeriodSeconds → SIGKILL), imagePullPolicy behavior (Always/IfNotPresent/Never), and deployment strategies including RollingUpdate, Recreate, Blue-Green, and Canary.
→ Details
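
For RollingUpdate, `maxSurge` and `maxUnavailable` bound how many Pods may exist and how many must stay available during a rollout. The sketch below shows the arithmetic, assuming percentages of `maxSurge` round up and percentages of `maxUnavailable` round down; the helper names are illustrative and edge cases (such as both values resolving to zero) are not handled.

```python
from typing import Dict, Union

IntOrPercent = Union[int, str]


def resolve(value: IntOrPercent, replicas: int, round_up: bool) -> int:
    """Turn an absolute number or a percentage string into a Pod count."""
    if isinstance(value, str) and value.endswith("%"):
        scaled = replicas * int(value[:-1])
        # Integer ceiling or floor division avoids floating-point rounding.
        return -(-scaled // 100) if round_up else scaled // 100
    return int(value)


def rollout_bounds(
    replicas: int,
    max_surge: IntOrPercent = "25%",
    max_unavailable: IntOrPercent = "25%",
) -> Dict[str, int]:
    """Return the Pod-count limits that hold during a rolling update."""
    surge = resolve(max_surge, replicas, round_up=True)
    unavailable = resolve(max_unavailable, replicas, round_up=False)
    return {
        "max_total_pods": replicas + surge,
        "min_available_pods": replicas - unavailable,
    }


print(rollout_bounds(10))
```

With 10 replicas and 25% for both settings, at most 13 Pods exist at once and at least 8 remain available. Setting `max_unavailable=0` keeps full capacity throughout, at the cost of needing room for the surge Pods.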
Covers CPU Throttling and CFS Quota optimization, DNS resolution delays from ndots=5, ImagePullBackOff vs ErrImagePull and Private Registry authentication, Node NotReady diagnosis and recovery procedures, and resolving Persistent Volumes stuck in Terminating state.
→ Details
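
CPU throttling is easier to reason about once the CPU limit is expressed as a CFS quota. The sketch below converts a limit into microseconds of CPU time per scheduling period, assuming the common default period of 100 ms, and estimates how long a container with several busy threads sits throttled in each period. The function names are illustrative, and the throttling estimate is a rough model that assumes all threads run continuously.

```python
def parse_cpu_millicores(quantity: str) -> int:
    """Parse a CPU quantity such as "500m" or "1.5" into millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def cfs_quota_us(cpu_limit: str, period_us: int = 100_000) -> int:
    """CPU time (in microseconds) the container may use per period."""
    return parse_cpu_millicores(cpu_limit) * period_us // 1000


def throttled_us_per_period(
    cpu_limit: str, busy_threads: int, period_us: int = 100_000
) -> int:
    """Rough wall-clock time per period spent throttled."""
    quota = cfs_quota_us(cpu_limit, period_us)
    # N busy threads consume the quota N times faster than wall-clock time.
    runtime_us = quota // busy_threads
    return max(0, period_us - runtime_us)


print(cfs_quota_us("500m"))
print(throttled_us_per_period("1", busy_threads=4))
```

In this model a container limited to one CPU but running four busy threads uses its quota in a quarter of the period and waits out the rest, which shows up as latency spikes even when average CPU usage looks moderate.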
Covers Prometheus Service Discovery and relabel_configs, OpenTelemetry Collector architecture and distributed tracing, Kubernetes Events long-term retention strategies, Metrics Server vs Prometheus vs Custom Metrics API comparison, and Grafana Dashboard as Code with Jsonnet.
→ Details
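
The `keep` action in `relabel_configs` can be modeled as a filter over discovered targets. The sketch below is a simplified stand-in, not Prometheus code: it uses Python's `re` module rather than RE2, the `apply_keep` function is illustrative, and the target dictionaries are made-up sample data. It does follow two documented behaviors: source label values are joined with `;` by default, and the regex must match the whole joined value.

```python
import re
from typing import Dict, List

Labels = Dict[str, str]


def apply_keep(
    targets: List[Labels],
    source_labels: List[str],
    regex: str,
    separator: str = ";",
) -> List[Labels]:
    """Keep only targets whose joined source label values fully match regex."""
    pattern = re.compile(regex)
    kept = []
    for labels in targets:
        # Missing labels are treated as empty strings.
        value = separator.join(labels.get(name, "") for name in source_labels)
        if pattern.fullmatch(value):
            kept.append(labels)
    return kept


# Sample targets as they might look after Pod service discovery.
SCRAPE = "__meta_kubernetes_pod_annotation_prometheus_io_scrape"
targets = [
    {SCRAPE: "true", "__meta_kubernetes_pod_name": "api-0"},
    {SCRAPE: "false", "__meta_kubernetes_pod_name": "batch-0"},
    {"__meta_kubernetes_pod_name": "legacy-0"},
]
print(apply_keep(targets, [SCRAPE], "true"))
```

Only the Pod annotated for scraping survives the filter; Pods without the annotation are dropped because an empty value does not match.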
Covers Pod Priority and Preemption strategies, Topology Spread Constraints vs Pod Anti-Affinity selection guide, Taint Effect types (NoSchedule/PreferNoSchedule/NoExecute), Scheduler Profiles and Multiple Schedulers use cases, and performance tuning for large-scale clusters (1000+ nodes).
→ Details
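
How Taint effects interact with Tolerations can be sketched as a matching rule. The model below follows the documented matching semantics (same key, same or unspecified effect, and either `Exists` or `Equal` with the same value), but the function names and dictionary layout are illustrative assumptions, and `NoExecute` eviction timing (`tolerationSeconds`) is not modeled.

```python
from typing import Dict, List

Taint = Dict[str, str]        # {"key": ..., "value": ..., "effect": ...}
Toleration = Dict[str, str]   # {"key": ..., "operator": ..., "value": ..., "effect": ...}


def tolerates(toleration: Toleration, taint: Taint) -> bool:
    """Return True if the toleration matches the taint."""
    effect = toleration.get("effect")
    if effect and effect != taint["effect"]:
        return False  # an empty effect matches every effect
    key = toleration.get("key")
    if toleration.get("operator", "Equal") == "Exists":
        # Exists with no key tolerates every taint.
        return not key or key == taint["key"]
    return key == taint["key"] and toleration.get("value", "") == taint.get("value", "")


def can_schedule(taints: List[Taint], tolerations: List[Toleration]) -> bool:
    """Hard check only: PreferNoSchedule is a soft preference and never blocks."""
    return all(
        taint["effect"] == "PreferNoSchedule"
        or any(tolerates(t, taint) for t in tolerations)
        for taint in taints
    )
```

For example, a node tainted `dedicated=gpu:NoSchedule` rejects Pods without a matching toleration, while the same taint with `PreferNoSchedule` only makes the scheduler try other nodes first.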