VPA Medium Article

https://medium.com/infrastructure-adventures/vertical-pod-autoscaler-deep-dive-limitations-and-real-world-examples-9195f8422724

Vertical Pod Autoscaler deep dive, limitations and real-world examples

Unfortunately, there is a lack of good, useful examples on the Internet about how the Kubernetes Vertical Pod Autoscaler (VPA) actually works. Since I already had to spend a great deal of time figuring things out, I hope this write-up will prove useful to others as well.


I’m assuming you already have the VPA software installed and running on your cluster. I’m also assuming you’re familiar with horizontal vs. vertical scaling.

Architecture

The VPA adds a custom resource called VerticalPodAutoscaler and registers a mutating admission webhook with the Kubernetes API. If a VPA object is defined for a pod, the webhook changes the pod's requests and limits at admission time, according to the VPA recommender.

There are 3 Deployments running:

VPA admission hook

Every pod submitted to the cluster automatically goes through this webhook, which checks whether a VerticalPodAutoscaler object references the pod or one of its parents (a ReplicaSet, a Deployment, etc.).

VPA recommender

Connects to the metrics-server application in the cluster, fetches historical and current usage data (CPU and memory) for each VPA-enabled pod and generates recommendations for scaling up or down the requests and limits of these pods.

VPA updater

Runs every minute. If a pod is not running within the calculated recommendation range, the updater evicts the currently running version of the pod so that it can restart and go through the VPA admission webhook, which changes its CPU and memory settings before it starts. Only the pod's runtime settings are changed.

The original Deployment specification is left untouched, so in practice there will be a divergence between what the Deployment defines and what is actually running.

But this has a nice side-effect: for example, if you’re using Argo CD, it won’t detect any diff in the Deployment specification.
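To make the divergence concrete, here is a hypothetical before/after comparison (the container name and resource values are illustrative, reusing the numbers from the example in the next section):

```yaml
# Deployment pod template (unchanged by the VPA)
spec:
  containers:
  - name: app                # illustrative container name
    resources:
      requests:
        cpu: 50m
        memory: 100Mi
---
# The pod that is actually running, after the admission webhook
# patched it with the recommender's target values
spec:
  containers:
  - name: app
    resources:
      requests:
        cpu: 120m
        memory: 300Mi
```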

Requests vs. limits

In Kubernetes, every scheduling decision is made based on the resource requests: whatever number you put there, the scheduler uses it to find room for your pod on a node. Optionally, you can also configure resource limits, but the scheduler never uses those. A limit is just a hard cap for the kubelet, telling it when to throttle or kill your pod if it exceeds its cgroup limits. Since the only truly important parameter is the request, the Vertical Pod Autoscaler always works with that: whenever you define vertical autoscaling for your app, you are defining what the requests should be. You can also set a range for the autoscaling, i.e. minimum and maximum values for the requests.

So what happens to the limits of your pod? They are adapted as well whenever the requests change: the VPA scales limits proportionally. Here is an example. Your pod's default settings:

```yaml
requests:
  cpu: 50m
  memory: 100Mi
limits:
  cpu: 200m
  memory: 250Mi
```

The recommendation engine determines that you need 120m CPU and 300Mi memory for your pod to work correctly, so it comes up with the following new settings:

```yaml
requests:
  cpu: 120m
  memory: 300Mi
limits:
  cpu: 480m
  memory: 750Mi
```

As mentioned above, this is proportional scaling. In your default manifest you had the following request-to-limit ratios:

- CPU: 50m → 200m, a 1:4 ratio
- memory: 100Mi → 250Mi, a 1:2.5 ratio

When you get a scaling recommendation, the VPA keeps the ratio you originally configured and sets the new limits proportionally, based on the new requests. So if you want to ensure your memory limit never goes above 250Mi on your pod, here are some ideas:

- (always) configure minimum and maximum values for the request recommendation
- use a 1:1 request-to-limit ratio (see the sketch at the end of this section), so even if the VPA sets the maximum request, the limit won't go above it either
- any combination of the above; play with the ratios

But don't forget: your limits are almost irrelevant, as scheduling (and therefore resource contention) is always decided based on the requests. Limits are only useful when there is resource contention or when you want to avoid uncontrollable memory leaks.
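Here is a minimal sketch of the 1:1 idea mentioned above (container name and values are illustrative): if requests and limits start out equal, proportional scaling keeps them equal, and a maxAllowed policy then effectively caps the limit as well.

```yaml
# Container resources with a 1:1 request-to-limit ratio (illustrative values)
resources:
  requests:
    cpu: 100m
    memory: 250Mi
  limits:
    cpu: 100m         # equal to the request, so the ratio is 1:1
    memory: 250Mi     # proportional scaling keeps limit == request

# Matching VPA resource policy fragment: capping the request at 250Mi
# also caps the proportionally scaled limit at 250Mi
resourcePolicy:
  containerPolicies:
  - containerName: "*"
    maxAllowed:
      memory: 250Mi
```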
How to write VPA manifests

Never define more than one VPA object for the same Pod / ReplicaSet / Deployment / StatefulSet; the behaviour is unpredictable in such cases. Don't use the Horizontal Pod Autoscaler and the Vertical Pod Autoscaler on the same pods either.

Recommendation mode (dry run)

You need to define a new VPA object with updateMode: "Off", targeting your application:

```yaml
apiVersion: autoscaling.k8s.io/v1beta2
kind: VerticalPodAutoscaler
metadata:
  name: kube-resource-report-recommender
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: kube-resource-report
  updatePolicy:
    updateMode: "Off"
```

After 2–3 minutes, you will be able to query the data:

```
$ kubectl describe vpa kube-resource-report-recommender
Name:         kube-resource-report-recommender
Namespace:    medium
API Version:  autoscaling.k8s.io/v1beta2
Kind:         VerticalPodAutoscaler
[...]
Spec:
  Target Ref:
    API Version:  apps/v1
    Kind:         Deployment
    Name:         kube-resource-report
  Update Policy:
    Update Mode:  Off
Status:
  Conditions:
    Last Transition Time:  2020-01-28T16:07:38Z
    Status:                True
    Type:                  RecommendationProvided
  Recommendation:
    Container Recommendations:
      Container Name:  kube-resource-report
      Lower Bound:
        Cpu:     12m
        Memory:  247917233
      Target:
        Cpu:     63m
        Memory:  380258472
      Uncapped Target:
        Cpu:     63m
        Memory:  380258472
      Upper Bound:
        Cpu:     137m
        Memory:  561393834
      Container Name:  nginx
      Lower Bound:
        Cpu:     12m
        Memory:  131072k
      Target:
        Cpu:     12m
        Memory:  131072k
      Uncapped Target:
        Cpu:     12m
        Memory:  131072k
      Upper Bound:
        Cpu:     16m
        Memory:  131072k
Events:
```

You will see the following blocks, container by container:

- Uncapped Target: what the resource request on your pod would be if you didn't configure upper limits in the VPA definition.
- Target: the amount that will actually be configured at the next execution of the admission webhook. (If the pod already has this configuration, nothing happens and your pod won't be in a restart/evict loop; otherwise, the pod is evicted and restarted with this target setting.)
- Lower Bound: when your pod goes below this usage, it is evicted and downscaled.
- Upper Bound: when your pod goes above this usage, it is evicted and upscaled.

TIP: Use recommendation mode even if you don't want autoscaling! It is useful to have automatic predictions derived from actual usage data, which you can then use to tune your hard-coded settings. You can keep collecting data and recommendations without ever enabling actual autoscaling.

Blacklisting containers from scaling

Let's say I have a Prometheus deployment with 2 containers: the Prometheus app itself and a config-reloader sidecar. I don't want autoscaling for the sidecar container:

```yaml
apiVersion: autoscaling.k8s.io/v1beta2
kind: VerticalPodAutoscaler
metadata:
  name: prometheus-recommender
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: prometheus-server
  updatePolicy:
    updateMode: "Off"
  resourcePolicy:
    containerPolicies:
    - containerName: "configmap-reload"
      mode: "Off"
```

You can also set containerName: "*" and use it as a catch-all: configure parameters for everything, then individually blacklist the containers you don't want to autoscale.

Configuring minimum and maximum values

Continuing with the previous example, now I want to define minimum and maximum limits for the autoscaler:

```yaml
apiVersion: autoscaling.k8s.io/v1beta2
kind: VerticalPodAutoscaler
metadata:
  name: prometheus-recommender
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: prometheus-server
  updatePolicy:
    updateMode: "Off"
  resourcePolicy:
    containerPolicies:
    - containerName: "prometheus"
      minAllowed:
        cpu: "300m"
        memory: "512Mi"
      maxAllowed:
        cpu: "1800m"
        memory: "3600Mi"
    - containerName: "configmap-reload"
      mode: "Off"
```

Reminder: the VPA minimum and maximum range settings always apply to the requests of your pod.

Enable autoscaling

Use updateMode: "Auto". All the other settings (minimum, maximum, container selector, etc.) can already be used in recommendation mode, so the only difference is dry run vs. actually applying the changes.
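For completeness, here is the prometheus-recommender object from above switched from dry run to live autoscaling; a minimal sketch that only changes updateMode and reuses the values already shown:

```yaml
apiVersion: autoscaling.k8s.io/v1beta2
kind: VerticalPodAutoscaler
metadata:
  name: prometheus-recommender
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: prometheus-server
  updatePolicy:
    updateMode: "Auto"        # the only change compared to the dry-run version
  resourcePolicy:
    containerPolicies:
    - containerName: "prometheus"
      minAllowed:
        cpu: "300m"
        memory: "512Mi"
      maxAllowed:
        cpu: "1800m"
        memory: "3600Mi"
    - containerName: "configmap-reload"
      mode: "Off"
```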
Limitations

- The minimum memory allocation the VPA can recommend is 250 MiB. If your request is smaller, it will automatically be increased to this minimum. The minimum can be configured globally, though; the default basically guides you not to use the vertical autoscaler for very small applications.
- The Vertical Pod Autoscaler cannot be used on individual pods which don't have an owner (i.e. pods that are not part of a Deployment, ReplicaSet, etc.).
- By default the Prometheus integration is not enabled. It allows the VPA to watch for OOM events and fetch more historical performance data. Following the example above, we want to use the VPA to scale Prometheus itself, so unless you have persistent storage configured for the Prometheus metrics store, all the historical data the VPA relies on for its scaling decisions is lost whenever Prometheus restarts.
- The Service name must be vpa-webhook (and all the components must be deployed in kube-system if you generate the certificates with the built-in scripts), due to some hardcoded values. Once this PR is released in a version newer than 0.6.3, we can use the registerByURL, webhookAddress and webhookPort flags in the admission-controller deployment. Make sure the SSL certificate accepts your URL as a CN!
- You can only enable vertical scaling on components which have at least 2 healthy replicas running. If you have apps which cannot run in parallel but you still want to autosize them, there is a nice trick you can apply in the vpa-updater Deployment (see the snippet below).

```yaml
[...]
spec:
  containers:
  - name: updater
    args:
    # These 2 lines are default: https://github.com/kubernetes/autoscaler/blob/vertical-pod-autoscaler-0.6.3/vertical-pod-autoscaler/pkg/updater/Dockerfile#L22
    - --v=4
    - --stderrthreshold=info
    # Allow Deployments with only 1 replica to be restarted with new settings
    - --min-replicas=1
[...]
```

Something we learnt the hard way: once you deploy and enable the Vertical Pod Autoscaler in your cluster, everything goes through its webhook. Every single pod creation/restart/… event! So guess what happens when your VPA admission webhook deployment is down or extremely slow: none of your pods will be able to start, even if they don't use vertical autoscaling at all! (This is not specific to the VPA; it is true for any admission webhook.) There was a bug in Kubernetes which prevented the controller-manager from starting your pods if one of these webhooks stopped responding, because it did not handle timeouts correctly. Luckily it is fixed now, but it is only backported to Kubernetes 1.15 and above. If you run something older than that (I'm looking at you, latest EKS 1.14!), you might have a problem.

Metrics, monitoring

If you enable scraping on the following ports (define them in the Deployment and add the Prometheus scrape annotation), you get some pretty nice internal metrics:

- admission-controller: tcp/8944
- vpa-updater: tcp/8943
- vpa-recommender: tcp/8942
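As a hedged sketch of what enabling scraping could look like on the recommender Deployment (the prometheus.io/* annotation keys are the common convention and depend on your scrape configuration; the container and port names are illustrative):

```yaml
# Excerpt from the vpa-recommender Deployment (illustrative)
spec:
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"   # depends on your Prometheus scrape config
        prometheus.io/port: "8942"     # recommender metrics port from the list above
    spec:
      containers:
      - name: recommender
        ports:
        - name: metrics
          containerPort: 8942
```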

These are the metrics I found useful as of writing this article.

Recommender

```
# HELP vpa_recommender_aggregate_container_states_count Number of aggregate container states being tracked by the recommender
# TYPE vpa_recommender_aggregate_container_states_count gauge
# HELP vpa_recommender_execution_latency_seconds Time spent in various parts of VPA Recommender main loop.
# TYPE vpa_recommender_execution_latency_seconds histogram
# HELP vpa_recommender_recommendation_latency_seconds Time elapsed from creating a valid VPA configuration to the first recommendation.
# TYPE vpa_recommender_recommendation_latency_seconds histogram
# HELP vpa_recommender_vpa_objects_count Number of VPA objects present in the cluster.
# TYPE vpa_recommender_vpa_objects_count gauge
```

Updater

```
# HELP vpa_updater_evicted_pods_total Number of Pods evicted by Updater to apply a new recommendation.
# TYPE vpa_updater_evicted_pods_total counter
# HELP vpa_updater_execution_latency_seconds Time spent in various parts of VPA Updater main loop.
# TYPE vpa_updater_execution_latency_seconds histogram
```

Admission controller

Nothing is implemented here so far; it just listens with an empty default Prometheus endpoint.

I hope the points above helped clarify how the Vertical Pod Autoscaler works in the background and that the hands-on examples proved useful. I personally really love the VPA and think it is a very powerful tool when used correctly. If you have ever had the chance to integrate it with Prometheus and have some first-hand experience, I would be happy to hear about your insights!
