EN_Kubernetes - somaz94/DevOps-Engineer GitHub Wiki

Kubernetes Deep Dive: Interview Q&A

A summary of Kubernetes concepts for DevOps engineers. Click each link for full details.


Glossary

Architecture & Components

  • kube-apiserver: Central entry point for all cluster operations — exposes the Kubernetes API
  • etcd: Distributed key-value store for all cluster data (Raft consensus algorithm)
  • kube-scheduler: Selects the optimal node to run each Pod
  • kube-controller-manager: Runs Deployment, ReplicaSet, and other controllers
  • kubelet: Manages Pod and container state on each node
  • kube-proxy: Implements service abstraction and manages network rules
  • Container Runtime: Software that runs containers (containerd, CRI-O)

Networking

  • ClusterIP: Virtual IP accessible only within the cluster (exists only in iptables/IPVS rules)
  • NodePort: Exposes a service on a specific port of each node
  • LoadBalancer: Provisions a cloud provider's external load balancer
  • ExternalTrafficPolicy: Controls external traffic routing (Cluster: all nodes, Local: only nodes with Pods)
  • CNI: Standard interface for container networking (Calico, Cilium, Flannel)
  • NetworkPolicy: Kubernetes resource to control Ingress/Egress traffic between Pods
  • Ingress Controller: Routes HTTP/HTTPS traffic to services (NGINX, Traefik, Istio)
  • CoreDNS: Cluster DNS server — resolves Service and Pod DNS names

Storage

  • PV (Persistent Volume): Cluster-level storage resource
  • PVC (Persistent Volume Claim): User's storage request — binds to a PV
  • StorageClass: Defines dynamic provisioning method
  • CSI (Container Storage Interface): Standard interface between orchestrators and storage systems
  • Retain / Delete: PV behavior after PVC deletion (Reclaim Policy)
  • WaitForFirstConsumer: Volume binding mode that delays PV binding and provisioning until a Pod using the PVC is scheduled (so topology constraints are respected)
  • Access Modes: RWO (single node R/W), ROX (multi-node read-only), RWX (multi-node R/W)

Scheduling & Resource Management

  • Node Affinity: Pod placement rules based on node labels
  • Taint / Toleration: Node restriction / Pod permission to schedule on tainted nodes
  • nodeSelector: Simple label-based node selection
  • Requests / Limits: Minimum guaranteed / maximum allowed resources
  • QoS Classes: Guaranteed > Burstable > BestEffort (BestEffort Pods are evicted first under node resource pressure)
  • HPA: Scales Pod count automatically / VPA: Adjusts resource requests / Cluster Autoscaler: Scales node count
  • PriorityClass / Preemption: Define Pod priority and evict lower-priority Pods
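
As an illustration of how Requests / Limits determine the QoS class, here is a minimal sketch. The function name `qos_class` and the dict layout are assumptions made for this example, not a Kubernetes API, and quantities are compared as plain strings (so `"500m"` and `"0.5"` would not be treated as equal here, unlike in a real cluster).

```python
def qos_class(containers):
    """Return the QoS class for a Pod given as a list of container dicts.

    Each container dict may carry "requests" and "limits", for example:
        {"requests": {"cpu": "500m", "memory": "256Mi"},
         "limits":   {"cpu": "500m", "memory": "256Mi"}}
    """
    resources = ("cpu", "memory")
    any_set = False
    all_guaranteed = True
    for container in containers:
        requests = dict(container.get("requests", {}))
        limits = container.get("limits", {})
        # When only a limit is given, the request defaults to the limit.
        for resource in resources:
            if resource in limits and resource not in requests:
                requests[resource] = limits[resource]
        if requests or limits:
            any_set = True
        # Guaranteed needs cpu and memory limits equal to the requests
        # in every container of the Pod.
        for resource in resources:
            if resource not in limits or requests.get(resource) != limits[resource]:
                all_guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if all_guaranteed else "Burstable"
```

A Pod with one fully specified container and one container without any resources still ends up Burstable, because every container has to qualify for Guaranteed.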

Workload Resources

  • Deployment: Manages ReplicaSets and provides declarative updates
  • StatefulSet: Controller for stateful apps — stable network IDs and persistent PVCs
  • DaemonSet: Runs exactly one Pod on each (or selected) node
  • CronJob: Runs Jobs on a Cron schedule
  • RollingUpdate: Zero-downtime gradual update via maxUnavailable and maxSurge
  • Blue-Green / Canary: Full switch / partial traffic new-version testing
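
The effect of maxSurge and maxUnavailable on a RollingUpdate can be worked out with simple arithmetic. The sketch below is an illustration with hypothetical helper names (`resolve`, `rolling_update_bounds`); it assumes percentages are rounded up for maxSurge and down for maxUnavailable, and it skips the validation a real cluster performs.

```python
def resolve(value, replicas, round_up):
    """Turn an absolute number or a percentage string into a Pod count."""
    if isinstance(value, str) and value.endswith("%"):
        numerator = int(value[:-1]) * replicas
        # Integer arithmetic avoids floating-point rounding surprises.
        return -(-numerator // 100) if round_up else numerator // 100
    return int(value)


def rolling_update_bounds(replicas, max_surge="25%", max_unavailable="25%"):
    """Return the Pod-count window a rolling update has to stay within."""
    surge = resolve(max_surge, replicas, round_up=True)
    unavailable = resolve(max_unavailable, replicas, round_up=False)
    return {
        "max_total_pods": replicas + surge,
        "min_available_pods": replicas - unavailable,
    }
```

For 10 replicas with both values at 25%, the rollout may run up to 13 Pods in total while keeping at least 8 available.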

Security & Authentication

  • RBAC: Permission control via Role/ClusterRole + RoleBinding/ClusterRoleBinding
  • ServiceAccount: Authentication credential for Pods to communicate with the API server
  • PSS (Pod Security Standards): Privileged / Baseline / Restricted security levels
  • PSA (Pod Security Admission): Enforces PSS with enforce / audit / warn modes
  • Secret: Stores sensitive data as base64-encoded values (an encoding, not encryption; enable encryption at rest to protect the data in etcd)
  • imagePullSecrets: Private registry authentication for pulling images
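
To make the point about Secret values concrete, the following sketch shows that base64 is reversible by anyone who can read the value. The helper names are hypothetical and only wrap Python's standard `base64` module.

```python
import base64


def encode_secret_value(plaintext: str) -> str:
    """Encode a value the way it appears under `data:` in a Secret manifest."""
    return base64.b64encode(plaintext.encode("utf-8")).decode("ascii")


def decode_secret_value(encoded: str) -> str:
    """Recover the plaintext; no key is needed, so this is not encryption."""
    return base64.b64decode(encoded).decode("utf-8")
```

Calling `decode_secret_value(encode_secret_value("admin"))` returns the original string, which is why access to Secrets should be restricted through RBAC.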

Monitoring & Observability

  • Liveness Probe: Checks if a container is still healthy (the kubelet restarts it on failure)
  • Readiness Probe: Checks if a container is ready for traffic (the Pod is removed from Service endpoints on failure)
  • Startup Probe: Checks if an app has started (Liveness and Readiness probes are disabled until it succeeds)
  • Metrics Server: Collects CPU/memory resource usage
  • Prometheus: Metrics-based monitoring and alerting system (pull-based scraping, time-series storage, PromQL)
  • kubectl logs / describe / exec: Core debugging commands

Operations

  • Graceful Shutdown: preStop Hook → SIGTERM → wait until terminationGracePeriodSeconds (counted from the start of termination) expires → SIGKILL
  • imagePullPolicy: Always / IfNotPresent / Never
  • Garbage Collection: Automatic cleanup of unused resources via Owner References and Finalizers
  • CRD (Custom Resource Definition): Defines custom resource types
  • Operator: Custom controller that automates application management

Troubleshooting

  • ImagePullBackOff / ErrImagePull: Image pull failure states
  • CrashLoopBackOff: Container repeatedly failing and restarting
  • OOMKilled: Container forcibly terminated due to memory limit exceeded
  • Pending: Pod accepted by the cluster but not yet running (not scheduled to a node, or still pulling images)
  • Terminating: Pod in the process of being deleted

Q&A List

Q1-Q3: Kubernetes Basics

Covers why Kubernetes is used (orchestration, scaling, HA, self-healing), the roles of control plane and node components, and the full Pod creation flow from API Server → etcd → Scheduler → kubelet → Container Runtime.

Details


Q4-Q5, Q11, Q13: RBAC, Security & Service Account

Explains the API Group structure (core vs named groups), RBAC with Role/ClusterRole/RoleBinding, Secret management, Pod Security contexts and capabilities, and how ServiceAccounts authenticate with the API server using tokens.
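
The way an RBAC rule combines API groups, resources, and verbs can be sketched as below. This is a simplified illustration, not the API server's authorizer: the function names are hypothetical, and it ignores resourceNames, subresources, and nonResourceURLs.

```python
def rule_allows(rule, api_group, resource, verb):
    """Check one rule from a Role or ClusterRole against a request."""
    def matches(allowed, value):
        # "*" is the wildcard accepted in apiGroups, resources, and verbs.
        return "*" in allowed or value in allowed

    return (
        matches(rule.get("apiGroups", []), api_group)
        and matches(rule.get("resources", []), resource)
        and matches(rule.get("verbs", []), verb)
    )


def role_allows(rules, api_group, resource, verb):
    """RBAC is additive: the request is allowed if any rule matches."""
    return any(rule_allows(rule, api_group, resource, verb) for rule in rules)


# Example rules: the core group is the empty string, "apps" is a named group.
EXAMPLE_RULES = [
    {"apiGroups": [""], "resources": ["pods"], "verbs": ["get", "list"]},
    {"apiGroups": ["apps"], "resources": ["deployments"], "verbs": ["*"]},
]
```

With these example rules, `get` on Pods is allowed and `delete` on Pods is not, since RBAC has no deny rules and anything unmatched is refused.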

Details


Q6, Q11-Q21: Service Types & Advanced Networking

Describes Service types (ClusterIP, NodePort, LoadBalancer, ExternalName) and ExternalTrafficPolicy (Cluster vs Local), including source IP preservation and NodePort SNAT. On the data path, it covers proxy ARP in same-node Pod communication, VXLAN encapsulation, MTU mismatch, iptables vs IPVS performance, the kube-proxy chain structure, sidecar proxy iptables rules, and the full packet flow. It also covers CoreDNS caching and ndots, CNI plugin IPAM, Calico BGP and Route Reflectors, Dual-Stack (IPv4/IPv6) configuration, mTLS without a Service Mesh, an Ingress Controller comparison (NGINX/Traefik/Istio), and the AWS Load Balancer Controller.

Details


Q7: Persistent Volumes & Storage

Covers the PV/PVC binding lifecycle, StorageClass dynamic provisioning, CSI drivers, access modes (RWO/ROX/RWX), reclaim policies (Retain/Delete), and volume snapshots.
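
The static binding check between a PVC and candidate PVs can be approximated as follows. This is a sketch under stated assumptions: the dict fields (`storage_class`, `capacity_gi`, `request_gi`, `access_modes`) are invented for the example, capacities are plain GiB integers, and picking the smallest matching volume is a simplification of what the real binder does (selectors, volume modes, and node affinity are ignored).

```python
def pv_satisfies(pv, pvc):
    """Return True if the PV could back the claim."""
    return (
        pv["storage_class"] == pvc["storage_class"]
        and pv["capacity_gi"] >= pvc["request_gi"]
        # Every access mode requested by the claim must be offered by the PV.
        and set(pvc["access_modes"]) <= set(pv["access_modes"])
    )


def pick_pv(pvs, pvc):
    """Pick the smallest PV that satisfies the claim, or None if none does."""
    candidates = [pv for pv in pvs if pv_satisfies(pv, pvc)]
    return min(candidates, key=lambda pv: pv["capacity_gi"], default=None)
```

When `pick_pv` returns None, the claim in this model stays unbound, which corresponds to a PVC waiting in Pending until a matching PV appears or a StorageClass provisions one dynamically.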

Details


Q8-Q10, Q12, Q14-Q15: Workloads & Scheduling

Covers HPA/VPA/KEDA autoscaling, Liveness/Readiness/Startup Probe configuration, Affinity/Anti-Affinity/Taint/Toleration scheduling, the Operator pattern, CRD definitions, and Kubernetes Garbage Collection via Owner References and Finalizers.
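
The core HPA calculation is documented as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The sketch below applies that formula with a tolerance band and min/max clamping; the function name and the defaults for `min_replicas` and `max_replicas` are assumptions made for illustration, and stabilization windows and scaling policies are left out.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Compute the replica count an HPA-style controller would aim for."""
    ratio = current_metric / target_metric
    # Inside the tolerance band the controller leaves the count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp the result to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))
```

For example, 4 replicas averaging 90% CPU against a 60% target scale to 6, while 62% against the same target stays at 4 because it falls inside the tolerance.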

Details


Q16-Q18: Operations & Deployment

Explains Graceful Shutdown (preStop hook → SIGTERM → SIGKILL once terminationGracePeriodSeconds expires), imagePullPolicy behavior (Always/IfNotPresent/Never), and deployment strategies including RollingUpdate, Recreate, Blue-Green, and Canary.
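
On the application side, graceful shutdown means reacting to SIGTERM before the grace period runs out. Here is a minimal sketch of such a handler in Python; the names `handle_sigterm`, `install_handlers`, and `run` are hypothetical, and a real service would also drain in-flight requests and close connections.

```python
import signal
import threading

# Set once SIGTERM arrives; the main loop checks it between work items.
shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Signal handler: only record the request, do the cleanup elsewhere."""
    shutdown_requested.set()


def install_handlers():
    """Register the handler; must be called from the main thread."""
    signal.signal(signal.SIGTERM, handle_sigterm)


def run(work_step, poll_seconds=0.1):
    """Run work_step repeatedly until shutdown is requested.

    Returns the number of completed steps so callers can log progress.
    """
    completed = 0
    while not shutdown_requested.is_set():
        work_step()
        completed += 1
        # Wake up early if SIGTERM arrives while idle.
        shutdown_requested.wait(poll_seconds)
    return completed
```

The loop finishes the current step and then exits, so the process can stop well before the kubelet falls back to SIGKILL.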

Details


Q31-Q35: Troubleshooting in Practice

Covers CPU Throttling and CFS Quota optimization, DNS resolution delays from ndots=5, ImagePullBackOff vs ErrImagePull and Private Registry authentication, Node NotReady diagnosis and recovery procedures, and resolving Persistent Volumes stuck in Terminating state.
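
The DNS delay caused by ndots=5 comes from search-list expansion: a name with fewer dots than ndots is tried against every search domain before it is tried as written. The sketch below models that ordering; `dns_query_candidates` is a hypothetical helper, and actual resolver behavior differs between libc implementations.

```python
def dns_query_candidates(name, search_domains, ndots=5):
    """Return the fully qualified names a resolver would try, in order."""
    # A trailing dot marks the name as fully qualified: no expansion.
    if name.endswith("."):
        return [name]
    expanded = [f"{name}.{domain}." for domain in search_domains]
    absolute = name + "."
    if name.count(".") >= ndots:
        # Enough dots: try the name as written first.
        return [absolute] + expanded
    # Too few dots: walk the search list first, then the name as written.
    return expanded + [absolute]


# Search list typical of a Pod in the "default" namespace (example values).
POD_SEARCH = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]
```

With this search list, `api.example.com` produces three lookups that are expected to fail before the real one, while `api.example.com.` (with a trailing dot) is resolved in a single query.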

Details


Q41-Q45: Monitoring & Observability

Covers Prometheus Service Discovery and relabel_configs, OpenTelemetry Collector architecture and distributed tracing, Kubernetes Events long-term retention strategies, Metrics Server vs Prometheus vs Custom Metrics API comparison, and Grafana Dashboard as Code with Jsonnet.

Details


Q46-Q50: Advanced Scheduling

Covers Pod Priority and Preemption strategies, Topology Spread Constraints vs Pod Anti-Affinity selection guide, Taint Effect types (NoSchedule/PreferNoSchedule/NoExecute), Scheduler Profiles and Multiple Schedulers use cases, and performance tuning for large-scale clusters (1000+ nodes).
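
How a toleration is matched against a taint can be sketched as below. This is a simplified model with hypothetical function names: it treats PreferNoSchedule as a soft preference that never blocks scheduling and does not model the eviction of running Pods under NoExecute.

```python
def tolerates(toleration, taint):
    """Return True if a single toleration matches a single taint."""
    # An empty or missing effect on the toleration matches every effect.
    effect = toleration.get("effect")
    if effect and effect != taint["effect"]:
        return False
    key = toleration.get("key")
    if toleration.get("operator", "Equal") == "Exists":
        # Exists with an empty key tolerates every taint.
        return not key or key == taint["key"]
    # Equal (the default operator) compares both key and value.
    return key == taint["key"] and toleration.get("value", "") == taint.get("value", "")


def schedulable(taints, tolerations):
    """Return True if no hard taint on the node is left untolerated."""
    for taint in taints:
        if taint["effect"] == "PreferNoSchedule":
            continue  # soft preference, only influences scoring
        if not any(tolerates(toleration, taint) for toleration in tolerations):
            return False
    return True
```

A node tainted `dedicated=gpu:NoSchedule` therefore rejects Pods without a matching toleration, but accepts one that tolerates the key `dedicated` with operator `Exists`.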

Details


Reference
