kubernetes autoscaling vpe hpa - ghdrako/doc_snipets GitHub Wiki

Vertical Pod Autoscaler (VPA)

For StatefulSet applications. Note that it involves a pod restart.

The VPA controller observes the resource usage of an application. Then, using that usage information as a baseline, VPA recommends a lower bound, an upper bound, and target values for resource requests for those application pods.

Depending on how you configure VPA, it can either:

  • Apply the recommendations directly by evicting and recreating the pods (updateMode = Auto).
  • Store the recommended values for reference only (updateMode = Off).
  • Apply the recommended values to newly created pods only (updateMode = Initial).

Keep in mind that updateMode = Auto is acceptable in testing or staging environments but not in production. The reason is that the pod restarts when VPA applies the change, which disrupts the workload.

In production, set updateMode = Off, feed the recommendations to a capacity monitoring dashboard such as Grafana, and apply them in the next deployment cycle.


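  • Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.7.8
          ports:
            - containerPort: 80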
  • VPA resources
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: nginx-deployment-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind:       Deployment
    name:       nginx-deployment
  updatePolicy:
    updateMode: "Off"

Note that the update mode is set to Off. Once the configuration is applied, get the VPA recommendations with the kubectl describe vpa nginx-deployment-vpa command.

The recommended resource requests will look like this:

  - containerName: nginx
    lowerBound:
      cpu: 40m
      memory: 3100k
    target:
      cpu: 60m
      memory: 3500k
    upperBound:
      cpu: 831m
      memory: 8000k
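
To feed the recommendations to a dashboard or a script, the same values can be read in a machine-friendly form. The sketch below is a hypothetical helper (the function name `vpa_targets` is an assumption, not part of kubectl); it assumes the VPA CRD is installed and the recommender has already populated the object's status.

```shell
# Print one line per container: "<container> <target cpu> <target memory>".
# Assumes the VPA object exists and its status already holds a recommendation.
vpa_targets() {
  local vpa_name="$1" namespace="${2:-default}"
  kubectl get vpa "$vpa_name" -n "$namespace" \
    -o jsonpath='{range .status.recommendation.containerRecommendations[*]}{.containerName}{" "}{.target.cpu}{" "}{.target.memory}{"\n"}{end}'
}
```

Usage: `vpa_targets nginx-deployment-vpa` prints the target values per container, which can then be copied into the Deployment's resource requests.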

VPA Limitations:

  1. VPA is not aware of Kubernetes cluster infrastructure variables such as node size in terms of memory and CPU. Therefore, it doesn't know whether a recommended pod size will fit your node. This means that the recommended resource requests may be too large to fit any node, and pods may go to a pending state because the request can’t be met. Some cloud providers such as GKE provide a cluster autoscaler that spins up more worker nodes to resolve pending pods, but if the Kubernetes environment has no cluster autoscaler, the pods will remain pending, causing downtime.

  2. VPA is difficult to use with StatefulSets. Scaling pods in a StatefulSet is not simple: neither starting nor restarting can be done the way it’s done for a Deployment or ReplicaSet. Instead, the pods in a StatefulSet are managed in a well-defined order. For example, a Postgres DB StatefulSet first deploys the primary pod and then the replica pods, and the primary pod can’t simply be replaced with any other pod.

  3. In Kubernetes, a pod's resource requests have traditionally been immutable, so they can't be updated in place. To change the resource requests, VPA needs to evict the pod and re-create it. This will disrupt your workload. As a result, running VPA in Auto mode isn’t a viable option for many use cases. Instead, it is used for recommendations that can be applied manually during a maintenance window.

  4. VPA won't work with HPA on the same CPU and memory metrics because the two would race. Suppose HPA and VPA both use CPU and memory metrics for scaling decisions. HPA will try to scale out (horizontally) based on CPU and memory, while at the same time VPA will try to scale the pods up (vertically). Therefore, if you need to use HPA and VPA together, you must configure HPA to use a custom metric such as web requests.

  5. VPA is not well suited to JVM-based workloads. This shortcoming is due to its limited visibility into the actual memory usage of Java virtual machine workloads.

  6. The performance of VPA is untested on large-scale clusters. Therefore, performance issues may occur when using VPA at scale. This is another reason why it’s not recommended to use VPA within large production environments.

  7. VPA doesn’t consider network and I/O. This is an important issue, since ignoring I/O throughput (for writing to disk) and network bandwidth usage can cause application slowdowns and outages.

  8. VPA uses limited historical data. By default its recommendations are based on roughly eight days of usage history. Such a short window misses monthly, quarterly, annual, and seasonal fluctuations that could cause bottlenecks during peak usage.

  9. VPA requires configuration for each cluster. If you manage a dozen or more clusters, you have to manage separate configurations for each one. More sophisticated optimization tools provide a governance workflow for approving and unifying configurations across multiple clusters.

  10. VPA policies lack flexibility. VPA uses a resource policy to control resource computations and an update policy to control how changes are applied to pods. The policy functionality is limited, however. For example, the resource policy sets an upper and a lower value calculated from historical CPU and memory measurements aggregated into percentiles (e.g., the 95th percentile), and you can’t choose a more sophisticated machine learning algorithm to predict usage.
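
Because VPA does not know node sizes, a rough pre-check before applying a recommendation can help. The sketch below compares a recommended CPU request with the allocatable CPU of the largest node. Both function names are hypothetical helpers, and the check is deliberately simplistic: it ignores memory and any pods already running on the node, so passing it does not guarantee the pod will be scheduled.

```shell
# Convert a Kubernetes CPU quantity ("831m", "2", "0.5") to millicores.
cpu_to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *)  awk -v v="$1" 'BEGIN { printf "%d\n", v * 1000 + 0.5 }' ;;
  esac
}

# Succeed if the given CPU request is no larger than the allocatable CPU
# of the largest node in the cluster.
fits_largest_node() {
  local want max=0 quantity millis
  want="$(cpu_to_millicores "$1")"
  for quantity in $(kubectl get nodes -o jsonpath='{.items[*].status.allocatable.cpu}'); do
    millis="$(cpu_to_millicores "$quantity")"
    if [ "$millis" -gt "$max" ]; then max="$millis"; fi
  done
  [ "$want" -le "$max" ]
}
```

Usage: `fits_largest_node 831m || echo "upper bound exceeds every node"`.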

Horizontal Pod Autoscaler (HPA):

To use HPAs, the Kubernetes Metrics API must be available.

To deploy the latest release of the Kubernetes Metrics Server, do this:

$ kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

Then verify that the deployment is ready:

$ kubectl get deployment metrics-server -n kube-system
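
A ready Deployment does not by itself prove that the Metrics API is being served. As a sketch, the hypothetical helper below checks the `Available` condition of the `v1beta1.metrics.k8s.io` APIService, which is the API that metrics-server registers; the function name is an assumption.

```shell
# Succeed only if the Metrics API is registered and reports Available=True.
metrics_api_ready() {
  local status
  status="$(kubectl get apiservice v1beta1.metrics.k8s.io \
    -o jsonpath='{.status.conditions[?(@.type=="Available")].status}' 2>/dev/null)"
  [ "$status" = "True" ]
}
```

Usage: `metrics_api_ready && kubectl top pods` shows per-pod usage once metrics are flowing.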

In the Deployment, a new ReplicaSet is created and the workload is switched over to it.

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  # name: (name of the HPA)
  namespace: dev
spec:
  maxReplicas: 3
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    # name: (name of the target Deployment)
  targetCPUUtilizationPercentage: 70

An HPA can also be created imperatively:

$ kubectl -n hpa-test autoscale deployment php-apache --cpu-percent=50 --min=1 --max=5
horizontalpodautoscaler.autoscaling/php-apache autoscaled

A declarative manifest for the same Deployment, here allowing up to 10 replicas:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
  namespace: hpa-test
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50


First, create an app—a PHP environment and server—that you can use as the target of the HPA:

$ kubectl create deployment appserver --image=registry.k8s.io/hpa-example --port 80
$ kubectl expose deployment appserver --port=80 --target-port=80
$ kubectl set resources deployment appserver -c=hpa-example --requests=cpu=200m

Next, create an HPA with the trigger parameter --cpu-percent=40, which sets the target average CPU utilization to 40% of the requested CPU:

$ kubectl autoscale deployment appserver --cpu-percent=40 --min=1 --max=5
$ kubectl get hpa --watch
NAME        REFERENCE              TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
appserver   Deployment/appserver   1%/40%    1         5         1          2m29s
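
To get a feel for how the TARGETS column drives the replica count, the sketch below applies the documented HPA rule, desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), and clamps the result to the min/max bounds. The function name is hypothetical, and the real controller additionally applies a tolerance band and stabilization windows that are left out here.

```shell
# Usage: hpa_desired_replicas <currentReplicas> <currentUtil%> <targetUtil%> <min> <max>
hpa_desired_replicas() {
  local current="$1" utilization="$2" target="$3" min="$4" max="$5"
  # Integer ceiling of (current * utilization) / target.
  local desired=$(( (current * utilization + target - 1) / target ))
  if [ "$desired" -lt "$min" ]; then desired="$min"; fi
  if [ "$desired" -gt "$max" ]; then desired="$max"; fi
  echo "$desired"
}
```

For example, `hpa_desired_replicas 2 60 40 1 5` prints 3: two pods at 60% against a 40% target call for a third pod.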

In a second terminal session, keep an eye on the deployment:

$ kubectl get deploy appserver --watch

Finally, in a third terminal session, launch the load generator:

$ kubectl run -i -t loadgen --rm --image=busybox:1.36 --restart=Never -- \
/bin/sh -c "while sleep 0.01; do wget -q -O- http://appserver; done"

Kubernetes HPA Limitations

  • HPA works for applications that support running multiple instances in parallel. These are typically stateless, although HPA can also be used with StatefulSets that rely on replica pods. For applications that can’t be run as multiple pods, HPA cannot be used.
  • HPA (and VPA) don’t consider IOPS, network, and storage in their calculations, exposing applications to the risk of slowdowns and outages.
  • HPA still leaves administrators with the burden of identifying waste in the Kubernetes cluster, that is, resources that containers request and reserve but never use. Detecting this inefficiency is not addressed by Kubernetes itself and typically requires third-party tooling.

Cluster Autoscaler (CA):

It can hang on applications that are badly designed and refuse to stop.

Cluster Autoscaler is supported by the major cloud platforms. Cluster autoscaling is a “cloud-only” feature because on-prem deployments lack the APIs for automatic virtual machine creation and deletion that the autoscaling process requires. Each cloud provider has its own implementation of Cluster Autoscaler with different limitations.

For example, on GKE the command below creates a multi-zone cluster with Cluster Autoscaler enabled, with a minimum of one node and a maximum of four nodes per zone:

$ gcloud container clusters create example-cluster \
    --num-nodes 2 \
    --zone us-central1-a \
    --node-locations us-central1-a,us-central1-b,us-central1-f \
    --enable-autoscaling --min-nodes 1 --max-nodes 4

A single-zone cluster that autoscales between one and three nodes:

$ gcloud container clusters create supersizeme --zone=us-west1-a \
    --machine-type=e2-small --num-nodes=1 \
    --min-nodes=1 --max-nodes=3 --enable-autoscaling
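
Cluster Autoscaler adds nodes in response to pods that cannot be scheduled, so the number of Pending pods is a useful signal to watch while it works. A minimal sketch, with a hypothetical function name; note that a pod can also be Pending for reasons unrelated to capacity.

```shell
# Print the number of pods currently in the Pending phase, across all namespaces.
pending_pod_count() {
  kubectl get pods --all-namespaces --field-selector=status.phase=Pending \
    --no-headers 2>/dev/null | awk 'END { print NR }'
}
```

Usage: run `pending_pod_count` alongside `kubectl get nodes --watch`; the count should fall as new nodes join.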
