Power vs Performance - HewlettPackard/LinuxKI GitHub Wiki

LinuxKI Warning

Power Savings vs. Performance on Linux
Date: 07/15/2016

Saving power is often thought to lower performance, and some white papers recommend disabling all power savings features in order to maximize performance. But for certain workloads, disabling all power savings features can actually lower performance, at the same time increasing power and cooling costs. So it is important to understand the effect of the various power savings features before you decide whether or not to disable some or all of them.

This article describes many of the most common power savings features available on most modern x86 systems, in terms of their impact on power savings vs. performance. It does not discuss how to enable or disable the various power savings features in detail, as that topic is covered in numerous documents and may vary based on Linux version, processor version, and BIOS/UEFI version.

There are two important factors when discussing power savings and performance: CPU idle latencies, controlled by the processor's C-states, and CPU clock frequencies, controlled by P-states.

CPU Idle latencies (C-states)

C-states are HW sleep states that power down different portions of the CPU peripheral circuitry. They have multiple levels, each corresponding to greater power savings but also incurring a longer latency to wake up (return to C-state 0). A processor enters a C-state (cstate > 0) when it is idle through the MWAIT instruction. How do these C-states affect system performance if the processor is idle anyway? The key is the latency, or the amount of time, for the processor to wake up after it calls MWAIT. The latency for each C-state can be seen by examining the /sys/devices/system/cpu/cpu*/cpuidle/state*/latency file. It should be the same for each processor on a specific system. Below is an example that displays the logical cstate, the Intel HW cstate, and the cstate latency in microseconds (usecs):

$ cd /sys/devices/system/cpu/cpu0/cpuidle
$ for state in `ls -d state*` ; do echo c-$state `cat $state/name` `cat $state/latency` ; done
c-state0 C0 0
c-state1 C1-IVB 1
c-state2 C3-IVB 59
c-state3 C6-IVB 80 

Consider a program that performs an I/O to a low latency device, such as an SSD disk or PCIe flash drive. If an I/O completes on a CPU that just entered cstate3 above, then it could be another 80 usecs before the I/O completion is processed and the waiting thread is placed on a CPU RunQ for execution. If that target CPU has also just entered cstate3, it could be another 80 usecs before the CPU wakes up and starts to run the thread that was waiting on the I/O.

If the I/O typically takes 8 milliseconds, the extra 80 or 160 usecs may not be noticed. However, if the I/O is to a low latency device, such as an SSD or a PCIe flash drive, or is perhaps a fast network request, the extra C-state latency of 59 or 80 usecs can have a big impact on performance. By contrast, the 1 usec latency of cstate1 may not be noticeable.

The above is an example of why many technical resources recommend disabling C-states as the performance of low latency applications may be adversely affected.
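
The arithmetic behind this can be sketched in a few lines of shell. This is only an illustration: the 80 usec and 1 usec wake-up latencies and the 8 millisecond I/O come from the example above, while the 100 usec flash I/O time and the function name cstate_overhead_pct are assumptions made for the sketch. It assumes the worst case of two wake-ups per I/O (one on the CPU that handles the completion, one on the CPU that runs the waiting thread).

```shell
#!/bin/sh
# Relative cost of C-state wake-up latency for a single I/O.
# Usage: cstate_overhead_pct <io_latency_usec> <wakeup_latency_usec> [wakeups]
# The default of 2 wake-ups models the worst case described in the text.
cstate_overhead_pct() {
    awk -v io="$1" -v wake="$2" -v n="${3:-2}" \
        'BEGIN { printf "%.1f\n", (wake * n) / io * 100 }'
}

# 8 ms disk I/O with two 80-usec wake-ups
echo "8000 usec I/O, 80 usec wake-up: $(cstate_overhead_pct 8000 80)% extra"
# Hypothetical 100-usec flash I/O (assumed value, not from a measurement)
echo "100 usec I/O, 80 usec wake-up: $(cstate_overhead_pct 100 80)% extra"
echo "100 usec I/O, 1 usec wake-up: $(cstate_overhead_pct 100 1)% extra"
```

Under these assumptions the deep C-state adds about 2% to the slow I/O but more than doubles the time of the fast one, while cstate1 adds about 2% to the fast I/O.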

CPU clock frequencies (P-states)

There is another side effect of allowing CPUs to enter the deeper C-states: the CPUs consume less power. A processor core can also use less power when it runs at a lower clock frequency. However, if some processor cores use less power while they are idle or running at lower clock frequencies, other processor cores on the same processor can use more power to run at higher core clock frequencies, essentially running faster. This is a feature commonly referred to as Intel (R) Turbo Boost Technology. P-states control the CPU clock frequency within a range of settings supported by the CPU. The adjustments of speed are usually load-related and are not real-time, and may be slow to adapt depending on the processor and Linux OS version.

Consider the Intel E5-2699 v3 processor. It has a base CPU clock frequency of 2.3 GHz and a maximum Turbo Boost frequency of 3.6 GHz. If power savings are disabled, the CPU core frequency will be capped and in some cases remain static. For example, capping the maximum C-state to 1 sets the CPU core frequency to 2.6 GHz whether the system is idle or not. That's better than the base frequency of 2.3 GHz, but far from the CPU's maximum Turbo Boost frequency.

Consider a workload where there are one or two very CPU-intensive tasks executing on a processor. If the tasks can run at 3.6 GHz, they will run faster and complete sooner than if they ran at 2.6 GHz. However, a processor would not be able to run every core at 3.6 GHz due to the heat generated and the demand for power determined by its thermal design power (TDP) specification. This is the basic premise of Intel Turbo Boost technology - it allows some CPU cores to consume more power and run at higher clock speeds while other CPU cores are idle or running at lower clock speeds.
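
To put rough numbers on the difference, the sketch below estimates run time at a different clock frequency. It assumes an idealized, purely CPU-bound task whose run time scales inversely with clock frequency; real workloads that wait on memory or I/O will gain less. The 100 second task and the function name scaled_runtime are hypothetical.

```shell
#!/bin/sh
# Estimate run time at a new clock frequency, assuming the task is purely
# CPU-bound so that run time scales inversely with frequency (an idealization).
# Usage: scaled_runtime <seconds_at_old_freq> <old_ghz> <new_ghz>
scaled_runtime() {
    awk -v t="$1" -v f_old="$2" -v f_new="$3" \
        'BEGIN { printf "%.1f\n", t * f_old / f_new }'
}

# Hypothetical task that needs 100 seconds at the capped 2.6 GHz
echo "at 3.6 GHz (max Turbo Boost): $(scaled_runtime 100 2.6 3.6) s"
echo "at 2.3 GHz (base frequency):  $(scaled_runtime 100 2.6 2.3) s"
```

Under that assumption, the task would finish in roughly 72 seconds at 3.6 GHz and roughly 113 seconds at 2.3 GHz.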

What's best for performance?

The answer to this question is "It depends". Here are a few points to keep in mind:

  • To reach the maximum Turbo Boost frequency, you need to allow deep C-states (above C-state 1), and enable CPU core frequency scaling (P-states)

  • A processor cannot run all cores at the maximum Turbo Boost frequency. The CPU frequencies will be capped due to power and heat concerns.

  • For extremely low latency workloads, limiting the system to a maximum C-state of 0 is best. Using cstate1 is often sufficient.

  • CPU frequency scaling (P-states) can hurt performance if a core is non-idle but running at a low frequency.

  • Disabling CPU frequency scaling can limit the maximum Turbo Boost frequency.

  • If a core is not idle, C-states are not a factor, but frequency is.

Given the above, here are a few recommendations:

  • If the application workload causes only a few cores on a processor to remain busy, performance can be improved when using maximum power savings as busy CPUs can benefit from the higher Turbo Boost frequencies.

  • For extremely latency-sensitive applications, disabling C-states (max_cstate=0) works best, but remember that this can limit the CPU frequency.

  • For many workloads with CPU usage spread across CPUs that need a balance of low latency with high performance, disabling P-states and setting the maximum C-state to 1 (max_cstate=1) is a good compromise. This allows some power savings along with some amount of Turbo Boost frequency, although you cannot reach the maximum frequency.

  • For benchmarks that try to drive the CPUs to maximum capacity, disabling CPU frequency scaling (P-states) and using max_cstate=1 works well. The CPU cores would rarely enter an idle state, so the CPUs would not be able to run at the maximum Turbo Boost frequencies anyway.

  • For non-performance sensitive environments, full power savings is recommended to reduce power usage and heat in the IT datacenter.

How to set the C-states and P-states?

That's a good question and one that is a bit of a moving target. Sometimes, these can be controlled by BIOS settings, and sometimes by the OS. So the short answer is to check your BIOS (RBSU) or iLO settings for your server model as well as your Linux OS version for the settings specific to your system. A few general rules:

  • Setting the system to Static High Performance at the BIOS/RBSU or in the iLO will disable P-states, and CPUs will run at the base CPU frequency with the ability to run at higher Turbo Boost frequencies.
  • C-states can be disabled at the BIOS/RBSU. Be aware that the Linux OS can override the BIOS settings if the intel_idle driver is used.
  • C-states can be disabled or capped via the kernel boot string (grub).
  • C-states can also be controlled at the OS layer using utilities such as cpupower
  • P-states and C-states can be controlled at the OS layer as well via one of the following:
      • cpufreq driver using one of the governors - Performance, Powersave, On-Demand, Conservative. Be careful when using the On-Demand governor as it often negatively impacts performance. The Performance governor usually works best.
      • tuned profiles - "throughput-performance" can maximize the Turbo Boost frequency with P-states and C-states enabled. The "latency-performance" and "network-latency" profiles disable P-states and set the maximum C-state to 1.
      • cpupower command - can set CPU frequency and potentially the maximum C-state.
  • There are 2 idle drivers - acpi_idle and intel_idle. The intel_idle driver is the more common default on most modern Linux versions today. The intel_idle driver will ignore the C-state settings at the BIOS layer.
  • Note that adding the string "intel_idle.max_cstate=0" to the kernel boot parameters disables the intel_idle driver rather than limiting it to C-state 0.
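
Before changing anything, it can help to see which of these mechanisms is currently in effect. The following is a minimal read-only sketch, not a definitive tool: it assumes the usual sysfs and procfs locations for the idle driver, the cpufreq driver and governor, and the kernel command line, and prints "n/a" where a file is absent (for example, when no cpufreq driver is loaded). The function names and the optional root-directory arguments are illustrative only.

```shell
#!/bin/sh
# Report the idle driver, frequency driver and governor, and any C-state
# caps passed on the kernel command line.
# Usage: show_power_settings [sysfs_root] [proc_root]
# The optional arguments are an illustrative convenience for pointing the
# script at a saved copy of the files; they default to /sys and /proc.

# Print the first line of a file, or "n/a" if it is missing or unreadable.
read_or_na() {
    if [ -r "$1" ]; then head -n 1 "$1"; else echo "n/a"; fi
}

show_power_settings() {
    cpu="${1:-/sys}/devices/system/cpu"
    cmdline="${2:-/proc}/cmdline"

    echo "idle driver:  $(read_or_na "$cpu/cpuidle/current_driver")"
    echo "freq driver:  $(read_or_na "$cpu/cpu0/cpufreq/scaling_driver")"
    echo "governor:     $(read_or_na "$cpu/cpu0/cpufreq/scaling_governor")"

    # Pick out boot parameters such as intel_idle.max_cstate=1 or
    # processor.max_cstate=1, one parameter per line, then rejoin them.
    caps=""
    if [ -r "$cmdline" ]; then
        caps=$(tr ' ' '\n' < "$cmdline" | grep 'max_cstate=' | paste -s -d ' ' -)
    fi
    echo "boot caps:    ${caps:-none}"
}

show_power_settings
```

Changing the settings usually requires root. For example, "cpupower frequency-set -g performance" selects the Performance governor and "tuned-adm profile latency-performance" activates that tuned profile; whether these tools are installed depends on the distribution.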

How do I tell what my system is doing regarding idle latencies (C-states) and CPU frequencies?

There are several tools that can be used to find out. One popular tool is turbostat, which gives output similar to the following:

pk cor CPU    %c0  GHz  TSC SMI    %c1    %c3    %c6    %c7 CTMP PTMP   %pc2   %pc3   %pc6   %pc7
             0.91 3.05 2.60   0   2.96   0.06   0.00  96.08   48   48   0.00   0.00   0.00   0.00
 0   0   0   8.76 3.23 2.58  40  10.73   0.00   0.00  80.51   48   48   0.00   0.00   0.00   0.00
 0   0  16  11.48 3.18 2.60  41   8.61
 0   1   1   0.25 2.75 2.60  41   9.50   0.25   0.00  89.99   46
 0   1  17   0.51 2.17 2.60  41   9.24
 0   2   2   0.25 2.74 2.60  41  11.39   0.24   0.00  88.11   48
 0   2  18   0.45 2.28 2.60  41  11.20
 0   3   3   0.24 2.76 2.60  41   3.01   0.18   0.00  96.56   44
 0   3  19   0.41 2.24 2.60  41   2.84

In the example above, the base CPU frequency is 2.6 GHz (as shown in the TSC column), but you can see that some of the CPUs are executing at higher Turbo Boost frequencies and some CPUs are executing at lower CPU frequencies, as shown in the GHz column. Overall, CPUs are largely idle (spending the majority of time executing in C-states 1 through 7, as shown in the %c1 through %c7 columns).
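
When turbostat is not available, a rougher picture can be read from sysfs. The sketch below is an assumption-laden alternative, not a replacement for turbostat: it relies on the cpuidle "name", "usage", and "time" files and the cpufreq "scaling_cur_freq" file being present, reports cumulative counts since boot rather than percentages over an interval, and shows an instantaneous frequency snapshot rather than the average busy frequency. The function names are illustrative.

```shell
#!/bin/sh
# Summarize cumulative idle-state residency for one CPU from sysfs.
# Usage: idle_residency [cpuidle_dir]
# "time" is the total microseconds spent in the state since boot and
# "usage" is the number of times the state was entered.
idle_residency() {
    dir="${1:-/sys/devices/system/cpu/cpu0/cpuidle}"
    for state in "$dir"/state*; do
        [ -r "$state/name" ] || continue   # skip if cpuidle is not exposed
        name=$(cat "$state/name")
        usage=$(cat "$state/usage")
        usec=$(cat "$state/time")
        # Average residency per entry, guarding against division by zero
        if [ "$usage" -gt 0 ]; then avg=$((usec / usage)); else avg=0; fi
        echo "$name entries=$usage total_usec=$usec avg_usec=$avg"
    done
}

# Print a snapshot of each CPU's current frequency in MHz.
# Usage: cpu_freq_mhz [cpu_root_dir]
# scaling_cur_freq is reported by the kernel in kHz.
cpu_freq_mhz() {
    for f in "${1:-/sys/devices/system/cpu}"/cpu[0-9]*/cpufreq/scaling_cur_freq; do
        [ -r "$f" ] || continue            # skip if no cpufreq driver is loaded
        cpu=${f%/cpufreq/*}
        cpu=${cpu##*/}
        echo "$cpu $(( $(cat "$f") / 1000 )) MHz"
    done
}

idle_residency
cpu_freq_mhz
```

A state with many entries but a short average residency suggests the CPU is being woken frequently, which is where the wake-up latency of the deeper C-states matters most.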

For more information

For more information, please refer to the following:

RHEL 7 - Power Management Guide
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Power_Management_Guide/

SLES - Powersaving
https://en.opensuse.org/Powersaving

Configuring and tuning HPE Proliant Servers for low-latency applications
http://h20564.www2.hpe.com/hpsc/doc/public/display?docId=emr_na-c01804533&lang=en-us&cc=us