Runtime Power Management - linux-surface/surface-hotplug GitHub Wiki
The dGPU power is controlled by the kernel. To achieve the best results for your desired use-case, you may however need to configure a couple of things. This page describes how to configure basic runtime power-management for the dGPU.
For use with X11, the nvidia-settings application provides some options.
For some of the configuration below, you will need to access the dGPU in sysfs.
To find the correct sysfs path, you will need to know your bus, device, and function numbers.
You can get those via lspci | grep -i nvidia.
This should yield something like
02:00.0 3D controller: NVIDIA Corporation GP107M [GeForce GTX 1050 Mobile] (rev a1)
or, more generically,
<bus>:<device>.<function> <description>
where 02:00.0 holds the bus, device, and function numbers.
For this example, this yields the sysfs path
/sys/bus/pci/devices/0000:02:00.0/
or, again more generically,
/sys/bus/pci/devices/0000:<bus>:<device>.<function>/
We will refer to this as
/sys/bus/pci/devices/<dgpu>/
later on.
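The mapping from the lspci address to the sysfs path can be sketched as a small shell function. The name dgpu_sysfs_path is hypothetical, and the sketch assumes the device sits in PCI domain 0000 whenever lspci prints no domain (lspci -D always prints it).

```shell
# Hypothetical helper: turn a "<bus>:<device>.<function>" address, as printed
# by lspci, into the corresponding sysfs directory.
# Assumption: PCI domain 0000 unless the address already carries a domain.
dgpu_sysfs_path() {
    addr="$1"
    case "$addr" in
        ????:??:??.?) ;;                      # domain already present
        ??:??.?)      addr="0000:$addr" ;;    # prepend the assumed domain
        *) echo "unexpected PCI address: $addr" >&2; return 1 ;;
    esac
    printf '/sys/bus/pci/devices/%s/\n' "$addr"
}
```

For the example device above, dgpu_sysfs_path 02:00.0 prints /sys/bus/pci/devices/0000:02:00.0/.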
The kernel can automatically turn the dGPU off when unused. This is referred to as runtime-PM or runtime-suspend and needs to be enabled explicitly.
First, let's look at why you should care about runtime PM.
Runtime PM can significantly reduce power consumption.
This holds not only for the dGPU but also for other devices, so consider setting up something like powertop or custom udev rules for those.
With respect to the dGPU, the following difference in power consumption can be observed without runtime PM:
dGPU state | Power draw (full device) |
---|---|
dGPU in D3cold (off) | 5W |
dGPU on without driver | 7W |
dGPU on with driver | 10+W |
In the worst case, the dGPU may double your power draw (and also get annoyingly warm) without you even using it.
Note that this testing is somewhat informal, and you can get better baseline results with further tuning of other devices.
The "dGPU on with driver" value represents the nvidia driver loaded without power-management options enabled.
By setting appropriate power management options for the driver (discussed below), you can get values equivalent to "dGPU in D3cold".
For the device to actually enter runtime suspend, a couple of conditions need to be fulfilled:
- runtime suspend must be enabled for the device,
- the device must not be used, and
- if a driver is bound to the device, it must support runtime suspend, too.
Thus enabling runtime PM is not the only factor. Luckily for us, the nvidia driver has fairly decent runtime PM support, if properly configured.
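To see where a device stands with respect to these conditions, the following sketch prints the relevant sysfs attributes. The function name runtime_pm_report is hypothetical. It relies on the standard power/control and power/runtime_status attributes and on the driver symlink that sysfs creates when a driver is bound. It does not tell you whether the device is currently in use.

```shell
# Hypothetical helper: report the runtime-PM preconditions for one device.
# $1: device directory, e.g. /sys/bus/pci/devices/<dgpu>
runtime_pm_report() {
    dev="${1%/}"    # strip a trailing slash, if any
    # "auto" means runtime suspend is enabled, "on" means it is disabled.
    control=$(cat "$dev/power/control" 2>/dev/null || echo unknown)
    # Typically "active" or "suspended"; "unsupported" without runtime PM support.
    status=$(cat "$dev/power/runtime_status" 2>/dev/null || echo unknown)
    # The driver symlink only exists while a driver is bound to the device.
    if [ -L "$dev/driver" ]; then
        driver=$(basename "$(readlink "$dev/driver")")
    else
        driver=none
    fi
    echo "control=$control runtime_status=$status driver=$driver"
}
```

Call it as runtime_pm_report /sys/bus/pci/devices/<dgpu> with your own device address filled in.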
You can enable runtime suspend by writing auto to the device's power/control file, i.e.
echo auto | sudo tee /sys/bus/pci/devices/<dgpu>/power/control
where <dgpu> represents the bus, device, and function numbers of your dGPU (see above on how to find these).
Note that this has to be repeated each boot.
To automate this, you can rely on udev or powertop.
For powertop, see the respective documentation.
For udev, simply create a file /etc/udev/rules.d/30-pci_pm.rules with the contents
ACTION=="add|change", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{power/control}="auto"
This will enable runtime PM for all NVIDIA devices (after the next reboot).
You can drop the ATTR{vendor}=="0x10de" match to enable runtime suspend for all PCI devices. Note that this may cause issues, so before submitting a bug report, please check whether the problem also occurs without runtime suspend enabled, or state explicitly that it is caused by runtime suspend.
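If you would rather not wait for a reboot, the effect of the rule can be approximated by hand. The sketch below is a one-off equivalent, not what udev itself runs. The function name enable_nvidia_runtime_pm and its optional directory argument are hypothetical, and it has to run as root to write to the real sysfs.

```shell
# Hypothetical helper mirroring the udev rule above: set power/control to
# "auto" for every PCI device whose vendor ID is 0x10de (NVIDIA).
# $1: PCI devices directory (defaults to the real sysfs location).
enable_nvidia_runtime_pm() {
    root="${1:-/sys/bus/pci/devices}"
    for dev in "$root"/*; do
        [ -r "$dev/vendor" ] || continue                      # not a device directory
        [ "$(cat "$dev/vendor")" = "0x10de" ] || continue     # not an NVIDIA device
        echo auto > "$dev/power/control" && echo "enabled: ${dev##*/}"
    done
}
```

Like the manual echo auto command, this does not persist across reboots, so the udev rule is still needed.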
Runtime suspend will work with the changes above as long as you do not load any driver for the dGPU.
If you do not wish to use the dGPU, it is easiest to blacklist all drivers by creating /etc/modprobe.d/dgpu.conf with the contents
blacklist i2c_nvidia_gpu
blacklist nouveau
blacklist nvidia
blacklist nvidia-drm
blacklist nvidia-modeset
blacklist nvidia_uvm
alias i2c_nvidia_gpu off
alias nouveau off
alias nvidia off
alias nvidia-drm off
alias nvidia-modeset off
alias nvidia_uvm off
If you want to use the dGPU, you might want to consider using the system76-power utility, which does driver configuration for you (note: at the moment, this does not seem to blacklist nvidia_uvm, so you may want to add a new blacklist/alias-off entry for that).
If you insist on configuring the driver manually for hybrid use, create /etc/modprobe.d/dgpu.conf with the contents
blacklist i2c_nvidia_gpu
alias i2c_nvidia_gpu off
options nvidia NVreg_DynamicPowerManagement=0x02
options nvidia-drm modeset=1
Specifically, the option NVreg_DynamicPowerManagement=0x02 will enable proper runtime suspend support in the driver.
Note, however, that this will not let you detach the clipboard safely as long as the driver is loaded, and you may not be able to unload the driver if you've started a desktop environment with the drivers loaded (XWayland and X11 may prevent the driver from unloading). Detaching the clipboard with drivers loaded can lead to lock-ups and crashes.
To ensure that the dGPU is turned off when it's not in use, you can query the power_state attribute of the device, i.e. run
cat /sys/bus/pci/devices/<dgpu>/power_state
This will return the current power state of the device, which can be
- D0 if in use,
- D1, D2, or D3hot if in a low-power state, or
- D3cold if fully turned off.
Ideally, you want it to be in D3cold when it's not in use.
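As a small convenience, the states listed above can be turned into a readable message. This is a minimal sketch with a hypothetical function name, describe_power_state. It only interprets the string you pass in and does not query the device itself.

```shell
# Hypothetical helper: translate a power_state value into a description.
# $1: the value read from /sys/bus/pci/devices/<dgpu>/power_state
describe_power_state() {
    case "$1" in
        D0)          echo "in use" ;;
        D1|D2|D3hot) echo "low-power state" ;;
        D3cold)      echo "fully turned off" ;;
        *)           echo "unknown state: $1"; return 1 ;;
    esac
}
```

You would call it as describe_power_state "$(cat /sys/bus/pci/devices/<dgpu>/power_state)".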
Note, however, that if configured for hybrid mode as described above, this state may not be achievable due to XWayland/X11.
You can get around this by blacklisting all drivers at boot and then later removing that blacklist (e.g. via system76-power), which, however, has the drawback that you can't use PRIME offloading any more and need to rely on bumblebee again.
You can prevent Wayland from hanging onto the nvidia driver by adding the following environment variables to /etc/environment. This will prevent programs from using the NVIDIA GPU by default:
__EGL_VENDOR_LIBRARY_FILENAMES="/usr/share/glvnd/egl_vendor.d/50_mesa.json"
VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/intel_icd.x86_64.json:/usr/share/vulkan/icd.d/intel_icd.i686.json
DXVK_FILTER_DEVICE_NAME="Intel"
VKD3D_FILTER_DEVICE_NAME="Intel"
__GLX_VENDOR_LIBRARY_NAME="mesa"
VDPAU_DRIVER=va_gl
CUDA_VISIBLE_DEVICES=""
Now, if you unload the drivers as follows, your GPU should show as being in power state D3cold:
sudo rmmod nvidia_drm nvidia_modeset nvidia
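If rmmod refuses because a module is still in use, it can help to look at the module's use count first. The sketch below reads the third column of /proc/modules, which is the use count. The function name module_use_count and its optional file argument are hypothetical.

```shell
# Hypothetical helper: print the use count of a loaded module.
# Exits non-zero if the module is not loaded.
# $1: module name with underscores, as listed in /proc/modules (e.g. nvidia_drm)
# $2: modules table to read (defaults to /proc/modules)
module_use_count() {
    awk -v m="$1" '$1 == m { print $3; found = 1 } END { exit !found }' \
        "${2:-/proc/modules}"
}
```

A non-zero count for nvidia_drm usually means that something, for example a running X11 or XWayland session, still holds the driver.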
If you want to launch a game, simply reload the nvidia drivers and then use prime-run, which sets the right environment variables, to start the game:
sudo modprobe nvidia_drm
sudo modprobe nvidia_modeset
sudo modprobe nvidia
To use CUDA, unset CUDA_VISIBLE_DEVICES.
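To unset the variable for a single program rather than for the whole session, a wrapper along these lines may be convenient. The name with_cuda is hypothetical, and the sketch assumes your env supports the -u option (GNU coreutils and the BSDs do, but it is not required by POSIX).

```shell
# Hypothetical wrapper: run one command with CUDA_VISIBLE_DEVICES removed
# from its environment, leaving the rest of the session untouched.
with_cuda() {
    env -u CUDA_VISIBLE_DEVICES "$@"
}
```

For example, with_cuda ./my-cuda-program runs the (placeholder) program with the variable unset, while other programs still see the empty value from /etc/environment.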