Resource Allocation Tuning - bcgov/common-service-showcase GitHub Wiki

Resource Allocation Tuning and Management

One of the key aspects of managing containerized services and applications on a Kubernetes/OpenShift platform is understanding your application's resource footprint in terms of CPU and RAM usage. Container orchestration platforms tend to be multi-tenanted, with many other applications running alongside your services, so you will need a reasonable understanding of how your application or service will impact the node it runs on.

Key Concepts

Resources are bounded by Requests and Limits. A resource request is the guaranteed minimum quantity of the resource that is reserved for the pod; the container orchestrator uses requests to make scheduling decisions within the cluster. A resource limit is the upper bound on what a pod can consume during a burst period, provided the node has spare capacity.

The general rule of thumb is to set your requests as low as possible while still allowing your application to work well, and to set your limits at the upper bound of anticipated spikes in workload. By requesting little but allowing reasonable limits, your application keeps a small footprint on the cluster while still being able to draw on more resources during a spike in load.

It is important when setting resource requests and limits that you specify both of them explicitly. In the event you specify only a limit and no request, the platform will default your request to be equal to the limit. This will usually reserve an unreasonable amount of CPU and memory for your application and is not desirable. On the other hand, specifying a request without a limit will set the limit to be equal to the request, potentially starving and throttling your application's performance.
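As a sketch, a container spec with both values set explicitly might look like the following. The container name, image, and values are illustrative only, not prescriptive:

```yaml
# Illustrative resources stanza for a container in a Deployment/DeploymentConfig.
# Name, image, and values are hypothetical - tune them to your own workload.
spec:
  containers:
    - name: my-app            # hypothetical container name
      image: my-app:latest
      resources:
        requests:
          cpu: 50m            # guaranteed minimum, kept low per the rule of thumb
          memory: 128Mi
        limits:
          cpu: 500m           # upper bound for burst periods
          memory: 512Mi
```

Specifying both fields this way avoids the platform defaulting one from the other.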

Additional Material

For a more in-depth look at resource management concepts, take a look at the following resources:

Observations

While each application deployed on a containerized environment will have its own unique CPU and memory utilization footprint, most processes will fall into one of a few categories:

  • Low CPU, Low Memory: A good majority of microservices and appliances will fit into this category. While they may have an initial CPU and memory spike on startup, they normally settle down and consume very few resources afterwards. Nginx, Caddy, and small Node.js and Python applications will usually fit this profile.
  • Low CPU, High Memory: These applications tend to be written in Java. While their general behavior is similar to other microservices, they unfortunately use up a noticeable amount of memory due to the JVM. Applications such as Metabase and parts of the Elastic stack will fit under this category.
  • High CPU, Low Memory: Some microservices and applications may have a high computation load, but not use much memory. Generally these applications have low CPU utilization while idling, then spike when handling requests. If your application falls under this category, you will want to set a reasonably low request, but a generous enough limit that compute spikes can complete without being throttled.
  • High CPU, High Memory: Generally applications that fall under this category are very bursty in nature. A good example of this would be Jenkins jobs tasked with testing, building and deploying applications, potentially from a PR-based pipeline. In these scenarios, since we know the resource spikes are generally time-limited, the use of the time-bound quota may be more appropriate than the standard long-running one.
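For the bursty, time-limited case, a Kubernetes Job is one way to express the workload. A minimal sketch, with hypothetical names and values, assuming a CI-style build task:

```yaml
# Illustrative Job spec for a bursty, time-limited workload (e.g. a PR build).
# All names and values are hypothetical.
apiVersion: batch/v1
kind: Job
metadata:
  name: pr-build                  # hypothetical job name
spec:
  activeDeadlineSeconds: 1800     # bound the burst window to 30 minutes
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: build
          image: my-builder:latest
          resources:
            requests:
              cpu: 100m           # low steady-state reservation
              memory: 512Mi
            limits:
              cpu: "2"            # generous headroom for the compute burst
              memory: 2Gi
```

Because the Job terminates when the work is done, its resource spike only counts against the cluster for the duration of the run.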

Fundamentally, you will need a general grasp of how your application behaves in the short term, as well as its behavior over a longer period of time, in order to derive accurate requests and limits for your deployment.

Recommendations

Below are notes on some commonly used applications and their utilization patterns:

Databases

  • MongoDB: Relatively lightweight, but appears unable to fully idle on CPU.

    • Observation: 160-200 MB, 0.006-0.008 cores
    • Recommendation: Memory Request: 192 MB, Memory Limit: 512 MB, CPU Request: 50 millicores, CPU Limit: 500 millicores.
  • Patroni: All PSQL databases appear to have some degree of CPU burn, likely from the Patroni agent performing periodic synchronization checks.

    • Observation: 200-280 MB, 0.02-0.04 cores used per replica
    • Recommendation: Memory Request: 256 MB, Memory Limit: 512 MB, CPU Request: 50 millicores, CPU Limit: 500 millicores.
  • Redis: Lightweight utilization - memory usage scales directly with the number of keys stored.

    • Observation: 10-30 MB, 0.007-0.008 cores
    • Recommendation: Memory Request: 64 MB, Memory Limit: 256 MB, CPU Request: 25 millicores, CPU Limit: 500 millicores.
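As an example of translating one of these recommendations into a manifest, the Patroni values above could be expressed as the following resources stanza (assuming the mb figures map to Kubernetes Mi units):

```yaml
# Resources stanza matching the Patroni recommendation above.
# Assumes the recommended mb values are expressed as Mi in Kubernetes.
resources:
  requests:
    cpu: 50m          # 50 millicores
    memory: 256Mi
  limits:
    cpu: 500m         # 500 millicores
    memory: 512Mi
```

The same pattern applies to the MongoDB and Redis recommendations, swapping in their respective values.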

Utilities

  • Caddy: Go application - extremely lightweight.

    • Observation: 4-50 MB, 0.001-0.003 cores
    • Recommendation: Memory Request: 8 MB, Memory Limit: 256 MB, CPU Request: 50 millicores, CPU Limit: 500 millicores.
  • Jenkins: Java application - spiky behavior when working on jobs. Raise limits based on expected workload.

    • Observation: 300-900 MB steady, 0.004-0.008 cores
    • Recommendation: Memory Request: 512 MB, Memory Limit: 2048 MB, CPU Request: 100 millicores, CPU Limit: 2 cores.
  • Metabase: Java application - relatively steady utilization outside of pod startup.

    • Observation: 640-750 MB, 0.005-0.009 cores
    • Recommendation: Memory Request: 768 MB, Memory Limit: 1536 MB, CPU Request: 50 millicores, CPU Limit: 500 millicores.
  • SonarQube: Java application - high memory usage, minor spiky behavior with CPU.

    • Observation: 1700-2900 MB, 0.015-0.025 cores
    • Recommendation: Memory Request: 1536 MB, Memory Limit: 3072 MB, CPU Request: 50 millicores, CPU Limit: 500 millicores.
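For a bursty workload like Jenkins, the recommendation above might be written as the following resources stanza (assuming the mb figures map to Kubernetes Mi units, with 2048 MB expressed as 2Gi):

```yaml
# Resources stanza matching the Jenkins recommendation above.
# Assumes the recommended mb values are expressed as Mi in Kubernetes.
resources:
  requests:
    cpu: 100m         # 100 millicores steady-state
    memory: 512Mi
  limits:
    cpu: "2"          # 2 cores of headroom for build jobs
    memory: 2Gi       # 2048 MB
```

Note the wide gap between request and limit here: it reflects the spiky job-driven behavior noted above, keeping the idle footprint small while leaving room for builds.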