Kubernetes Deep Dive: Interview Q&A

A summary of Kubernetes concepts for DevOps engineers. Click each link for full details.

Glossary

Architecture & Components

kube-apiserver: Central entry point for all cluster operations — exposes the Kubernetes API
etcd: Distributed key-value store for all cluster data (Raft consensus algorithm)
kube-scheduler: Selects the optimal node to run each Pod
kube-controller-manager: Runs Deployment, ReplicaSet, and other controllers
kubelet: Manages Pod and container state on each node
kube-proxy: Implements service abstraction and manages network rules
Container Runtime: Software that runs containers (containerd, CRI-O)

Networking

ClusterIP: Virtual IP accessible only within the cluster (exists only in iptables/IPVS rules)
NodePort: Exposes a service on a specific port of each node
LoadBalancer: Provisions a cloud provider's external load balancer
ExternalTrafficPolicy: Controls external traffic routing (Cluster: all nodes, Local: only nodes with Pods)
CNI: Standard interface for container networking (Calico, Cilium, Flannel)
NetworkPolicy: Kubernetes resource to control Ingress/Egress traffic between Pods
Ingress Controller: Routes HTTP/HTTPS traffic to services (NGINX, Traefik, Istio)
CoreDNS: Cluster DNS server — resolves Service and Pod DNS names

Storage

PV (Persistent Volume): Cluster-level storage resource
PVC (Persistent Volume Claim): User's storage request — binds to a PV
StorageClass: Defines dynamic provisioning method
CSI (Container Storage Interface): Standard interface between orchestrators and storage systems
Retain / Delete: PV behavior after PVC deletion (Reclaim Policy)
WaitForFirstConsumer: Binds PV at Pod scheduling time (considers topology constraints)
Access Modes: RWO (single node R/W), ROX (multi-node read-only), RWX (multi-node R/W)

Scheduling & Resource Management

Node Affinity: Pod placement rules based on node labels
Taint / Toleration: Node restriction / Pod permission to schedule on tainted nodes
nodeSelector: Simple label-based node selection
Requests / Limits: Minimum guaranteed / maximum allowed resources
QoS Classes: Guaranteed > Burstable > BestEffort (BestEffort Pods are evicted first under node resource pressure)
HPA: Scales Pod count automatically based on observed metrics
VPA: Adjusts container resource requests
Cluster Autoscaler: Scales node count
PriorityClass / Preemption: Define Pod priority and evict lower-priority Pods
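
As a rough illustration of how Requests / Limits map to the QoS classes above, the sketch below classifies a Pod from its containers' CPU and memory settings. The function name `qos_class` and the dictionary layout are assumptions made for this example, and quantities are compared as plain strings (so `"500m"` and `"0.5"` would not be treated as equal here, although Kubernetes treats them as the same quantity).

```python
from typing import Dict, List

# Hypothetical shape for one container: {"requests": {...}, "limits": {...}}
Resources = Dict[str, Dict[str, str]]


def qos_class(containers: List[Resources]) -> str:
    """Classify a Pod as Guaranteed, Burstable, or BestEffort."""
    any_set = False         # does any container set any CPU/memory value?
    all_guaranteed = True   # does every container have limit == request for both?
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        for resource in ("cpu", "memory"):
            req = requests.get(resource)
            lim = limits.get(resource)
            if req is not None or lim is not None:
                any_set = True
            # A missing request defaults to the limit, so a limit on its own
            # still counts as "request equals limit".
            effective_req = req if req is not None else lim
            if lim is None or effective_req != lim:
                all_guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if all_guaranteed else "Burstable"
```

For example, a Pod whose only container sets `requests.cpu` but nothing else comes out as Burstable, while a Pod with no requests or limits at all is BestEffort.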

Workload Resources

Deployment: Manages ReplicaSets and provides declarative updates
StatefulSet: Controller for stateful apps — stable network IDs and persistent PVCs
DaemonSet: Runs exactly one Pod on each (or selected) node
CronJob: Runs Jobs on a Cron schedule
RollingUpdate: Zero-downtime gradual update via maxUnavailable and maxSurge
Blue-Green / Canary: Full switch / partial traffic new-version testing
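
To make the maxUnavailable and maxSurge settings of RollingUpdate more concrete, here is a small sketch that computes the Pod-count window a rollout stays within. The helper names `resolve` and `rollout_bounds` are hypothetical; the rounding (percentages round up for maxSurge and down for maxUnavailable) and the 25% defaults follow the documented Deployment behavior.

```python
from typing import Tuple, Union

IntOrPercent = Union[int, str]


def resolve(value: IntOrPercent, replicas: int, round_up: bool) -> int:
    """Turn an absolute number or a 'NN%' string into a Pod count."""
    if isinstance(value, str) and value.endswith("%"):
        scaled = replicas * int(value[:-1])
        # Integer arithmetic avoids floating-point rounding surprises.
        return (scaled + 99) // 100 if round_up else scaled // 100
    return int(value)


def rollout_bounds(
    replicas: int,
    max_surge: IntOrPercent = "25%",
    max_unavailable: IntOrPercent = "25%",
) -> Tuple[int, int]:
    """Return (minimum available Pods, maximum total Pods) during a rollout."""
    surge = resolve(max_surge, replicas, round_up=True)
    unavailable = resolve(max_unavailable, replicas, round_up=False)
    return replicas - unavailable, replicas + surge


print(rollout_bounds(10))  # (8, 13)
```

With 10 replicas and the defaults, at least 8 Pods stay available and at most 13 exist at once. Setting `max_unavailable=0` with `max_surge=1` trades rollout speed for never dropping below the desired count.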

Security & Authentication

RBAC: Permission control via Role/ClusterRole + RoleBinding/ClusterRoleBinding
ServiceAccount: Identity that Pods use to authenticate to the API server (via a mounted token)
PSS (Pod Security Standards): Privileged / Baseline / Restricted security levels
PSA (Pod Security Admission): Enforces PSS with enforce / audit / warn modes
Secret: Stores sensitive data; values are base64-encoded (an encoding, not encryption), so restrict access with RBAC and enable encryption at rest
imagePullSecrets: Private registry authentication for pulling images
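
The base64 encoding used for Secret values is easy to reverse, which is worth seeing once. The sketch below uses only Python's standard `base64` module; the function names and the sample value `admin` are illustrative assumptions, not anything from a real cluster.

```python
import base64


def encode_secret_value(plaintext: str) -> str:
    """Encode a value the way it appears under `data:` in a Secret manifest."""
    return base64.b64encode(plaintext.encode("utf-8")).decode("ascii")


def decode_secret_value(encoded: str) -> str:
    """Recover the original value; no key or password is needed."""
    return base64.b64decode(encoded).decode("utf-8")


encoded = encode_secret_value("admin")
print(encoded)                       # YWRtaW4=
print(decode_secret_value(encoded))  # admin
```

Anyone who can read the Secret object can recover the plaintext this way, so confidentiality has to come from access control and encryption at rest rather than from the encoding.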

Monitoring & Observability

Liveness Probe: Checks if a container is running (restarts on failure)
Readiness Probe: Checks if a container is ready for traffic (Pod is removed from Service endpoints on failure)
Startup Probe: Checks if an app has started (Liveness and Readiness probes are disabled until it succeeds)
Metrics Server: Collects CPU/memory resource usage
Prometheus: Open-source monitoring system that scrapes and stores time-series metrics and supports alerting
kubectl logs / describe / exec: Core debugging commands

Operations

Graceful Shutdown: preStop Hook → SIGTERM → wait up to terminationGracePeriodSeconds → SIGKILL (the grace period starts when termination begins, so it includes preStop time)
imagePullPolicy: Always / IfNotPresent / Never
Garbage Collection: Automatic cleanup of dependent objects via Owner References; Finalizers block deletion until cleanup logic completes
CRD (Custom Resource Definition): Defines custom resource types
Operator: Custom controller that automates application management

Troubleshooting

ImagePullBackOff / ErrImagePull: Image pull failure states
CrashLoopBackOff: Container repeatedly failing and restarting
OOMKilled: Container forcibly terminated due to memory limit exceeded
Pending: Pod accepted by the cluster but not yet running (not yet scheduled to a node, or still pulling images)
Terminating: Pod in the process of being deleted
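
For CrashLoopBackOff, the "BackOff" part refers to the growing delay the kubelet applies between restarts. A minimal sketch of that schedule follows, assuming the documented behavior of a 10-second base that doubles on each restart and is capped at 300 seconds; the function name `crashloop_delays` is made up for this example.

```python
from typing import List


def crashloop_delays(restarts: int, base: int = 10, cap: int = 300) -> List[int]:
    """Return the restart delays in seconds for the first `restarts` restarts."""
    return [min(base * 2 ** attempt, cap) for attempt in range(restarts)]


print(crashloop_delays(7))  # [10, 20, 40, 80, 160, 300, 300]
```

This is why a crashing container can appear idle for up to five minutes between attempts: the Pod is waiting out the back-off, not hung.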

Q&A List

Q1-Q3: Kubernetes Basics

Covers why Kubernetes is used (orchestration, scaling, HA, self-healing), the roles of control plane and node components, and the full Pod creation flow from API Server → etcd → Scheduler → kubelet → Container Runtime.

→ Details

Q4-Q5, Q11, Q13: RBAC, Security & Service Account

Explains the API Group structure (core vs named groups), RBAC with Role/ClusterRole/RoleBinding, Secret management, Pod Security contexts and capabilities, and how ServiceAccounts authenticate with the API server using tokens.

→ Details

Q6, Q11-Q21: Service Types & Advanced Networking

Describes Service types (ClusterIP, NodePort, LoadBalancer, ExternalName), ExternalTrafficPolicy (Cluster vs Local), ARP Proxy in same-node Pod communication, VXLAN encapsulation, iptables vs IPVS performance, kube-proxy chain structure, CoreDNS caching and ndots, Calico BGP and Route Reflectors, mTLS without Service Mesh, Ingress Controller comparison (NGINX/Traefik/Istio), AWS LoadBalancer Controller, source IP preservation, full packet flow, CNI Plugin IPAM, sidecar proxy iptables, MTU mismatch, NodePort SNAT, and Dual-Stack (IPv4/IPv6) configuration.

→ Details

Q7: Persistent Volumes & Storage

Covers the PV/PVC binding lifecycle, StorageClass dynamic provisioning, CSI drivers, access modes (RWO/ROX/RWX), reclaim policies (Retain/Delete), and volume snapshots.

→ Details

Q8-Q10, Q12, Q14-Q15: Workloads & Scheduling

Covers HPA/VPA/KEDA autoscaling, Liveness/Readiness/Startup Probe configuration, Affinity/Anti-Affinity/Taint/Toleration scheduling, the Operator pattern, CRD definitions, and Kubernetes Garbage Collection via Owner References and Finalizers.
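
As a worked example of the HPA part, the sketch below applies the scaling formula from the Kubernetes documentation, `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, including the default 10% tolerance band within which no scaling happens. The function and parameter names are assumptions for illustration, and real HPA behavior also involves stabilization windows and min/max replica bounds that are omitted here.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    tolerance: float = 0.1,
) -> int:
    """Compute the replica count the HPA formula would ask for."""
    ratio = current_metric / target_metric
    # Inside the tolerance band the HPA leaves the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


print(desired_replicas(4, 200, 100))  # 8
print(desired_replicas(4, 50, 100))   # 2
print(desired_replicas(4, 105, 100))  # 4
```

The first call doubles the replicas because the observed metric is twice the target; the last one stays at 4 because a 5% deviation is inside the tolerance.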

→ Details

Q16-Q18: Operations & Deployment

Explains Graceful Shutdown (preStop hook → SIGTERM → terminationGracePeriodSeconds → SIGKILL), imagePullPolicy behavior (Always/IfNotPresent/Never), and deployment strategies including RollingUpdate, Recreate, Blue-Green, and Canary.
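
From the application's side, the part of the shutdown sequence it controls is how it reacts to SIGTERM before the grace period runs out. The following is a minimal sketch of that reaction for a Python worker, not a prescribed implementation; `handle_sigterm`, `install_handler`, `serve`, and the `process_one` callback are hypothetical names.

```python
import signal
import threading
from typing import Callable

# Set once SIGTERM arrives; the main loop checks it between units of work.
shutdown_requested = threading.Event()


def handle_sigterm(signum, frame) -> None:
    """Signal handler: only record the request, do the cleanup elsewhere."""
    shutdown_requested.set()


def install_handler() -> None:
    """Register the handler; must be called from the main thread."""
    signal.signal(signal.SIGTERM, handle_sigterm)


def serve(process_one: Callable[[], None], poll_interval: float = 0.1) -> None:
    """Run work items until shutdown is requested, then return for cleanup."""
    while not shutdown_requested.is_set():
        process_one()
        # Sleep in a way that wakes up immediately when SIGTERM arrives.
        shutdown_requested.wait(poll_interval)
```

A process would call `install_handler()` at startup and then `serve(...)`; once `serve` returns it can close connections and exit. If that cleanup takes longer than `terminationGracePeriodSeconds`, the container receives SIGKILL regardless.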

→ Details

Q31-Q35: Troubleshooting in Practice

Covers CPU Throttling and CFS Quota optimization, DNS resolution delays from ndots=5, ImagePullBackOff vs ErrImagePull and Private Registry authentication, Node NotReady diagnosis and recovery procedures, and resolving Persistent Volumes stuck in Terminating state.
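
The ndots=5 delay is easier to reason about with the query order written out. The sketch below is a simplified model of how a stub resolver expands a name against the search list. The function name is hypothetical, and the search domains in the usage example assume a Pod in the `default` namespace with the usual `cluster.local` cluster domain.

```python
from typing import List


def candidate_queries(name: str, search_domains: List[str], ndots: int = 5) -> List[str]:
    """Return the fully qualified names a resolver would try, in order."""
    if name.endswith("."):
        # A trailing dot marks the name as already fully qualified.
        return [name]
    absolute = name + "."
    searched = [f"{name}.{domain}." for domain in search_domains]
    if name.count(".") >= ndots:
        # Enough dots: try the name as-is first, then the search list.
        return [absolute] + searched
    # Too few dots: walk the whole search list before trying the name as-is.
    return searched + [absolute]


search = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]
for query in candidate_queries("api.example.com", search):
    print(query)
```

Because `api.example.com` has only two dots, three cluster-internal lookups fail before the real name is tried. Writing the name with a trailing dot, or lowering ndots in the Pod's `dnsConfig`, avoids the extra queries.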

→ Details

Q41-Q45: Monitoring & Observability

Covers Prometheus Service Discovery and relabel_configs, OpenTelemetry Collector architecture and distributed tracing, Kubernetes Events long-term retention strategies, Metrics Server vs Prometheus vs Custom Metrics API comparison, and Grafana Dashboard as Code with Jsonnet.

→ Details

Q46-Q50: Advanced Scheduling

Covers Pod Priority and Preemption strategies, Topology Spread Constraints vs Pod Anti-Affinity selection guide, Taint Effect types (NoSchedule/PreferNoSchedule/NoExecute), Scheduler Profiles and Multiple Schedulers use cases, and performance tuning for large-scale clusters (1000+ nodes).
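
To illustrate how the Taint Effect types interact with tolerations, here is a sketch of the matching rules as described in the Kubernetes documentation: a toleration with an empty effect matches any effect, `Exists` ignores the value, and an empty key with `Exists` tolerates everything. The function names and dictionary shapes are assumptions for this example; the real scheduler also handles `tolerationSeconds` and scoring for PreferNoSchedule, which are left out.

```python
from typing import Dict, List

Taint = Dict[str, str]        # {"key": ..., "value": ..., "effect": ...}
Toleration = Dict[str, str]   # {"key": ..., "operator": ..., "value": ..., "effect": ...}


def tolerates(toleration: Toleration, taint: Taint) -> bool:
    """Return True if this toleration matches this taint."""
    effect = toleration.get("effect")
    if effect and effect != taint["effect"]:
        return False
    operator = toleration.get("operator", "Equal")
    key = toleration.get("key", "")
    if not key:
        # An empty key is only meaningful with Exists and matches every key.
        return operator == "Exists"
    if key != taint["key"]:
        return False
    if operator == "Exists":
        return True
    return toleration.get("value", "") == taint.get("value", "")


def is_schedulable(taints: List[Taint], tolerations: List[Toleration]) -> bool:
    """Check the hard taints; PreferNoSchedule is only a soft preference."""
    hard = [t for t in taints if t["effect"] in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, taint) for tol in tolerations) for taint in hard)
```

A node tainted `dedicated=gpu:NoSchedule` rejects a Pod with no tolerations, accepts one tolerating `dedicated=gpu`, and also accepts one carrying a bare `operator: Exists` toleration.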

→ Details

EN_Kubernetes - somaz94/DevOps-Engineer GitHub Wiki
