Thermald setup and configuration - linux-surface/linux-surface GitHub Wiki

thermald is a Linux daemon used to prevent the overheating of platforms. This daemon monitors temperature and applies compensation using available cooling methods. (quote from the ArchWiki)

It is useful for managing the temperature of your device while keeping the best possible performance.

However, thermald only supports Intel CPUs; AMD CPUs are not supported yet.

Requirements:

  • thermald
  • Psensor (for debugging and config file development)

Installation:

Debian / Ubuntu: sudo apt install thermald

Arch / Manjaro: sudo pacman -S thermald

After installing thermald, enable and start it with systemd so that it also starts automatically at every boot: sudo systemctl enable --now thermald

You can check the thermald status with: systemctl status thermald

Thermald will now try to keep the CPU temperature from exceeding 40 °C, similar to the way Windows does.

This is the easiest way to prevent overheating, but it is not yet optimal.

A config file is needed to optimize thermal control.

You need to configure thermald and tell it which temperatures to watch, what their limits are, and how to cool the device.

This is done in thermal-conf.xml. Either download a pre-made file specific to your device, rename it to thermal-conf.xml, and copy it to /etc/thermald/thermal-conf.xml, or develop your own (see the guide below).

List of pre-made config files:

surface pro 4.txt

Developing a config file:

First, stop the thermald daemon: sudo systemctl stop thermald

Then start thermald in the foreground with info-level logging: sudo thermald --loglevel=info --dbus-enable --adaptive --no-daemon

You will see output similar to output.txt. The most important part is:

[1700057511][INFO]sensor index:2 acpitz /sys/class/thermal/thermal_zone2/ Async:0 
[1700057511][INFO]sensor index:0 acpitz /sys/class/thermal/thermal_zone0/ Async:0 
[1700057511][INFO]sensor index:5 x86_pkg_temp /sys/class/thermal/thermal_zone5/ Async:1 
[1700057511][INFO]sensor index:3 acpitz /sys/class/thermal/thermal_zone3/ Async:0 
[1700057511][INFO]sensor index:1 acpitz /sys/class/thermal/thermal_zone1/ Async:0 
[1700057511][INFO]sensor index:4 pch_skylake /sys/class/thermal/thermal_zone4/ Async:0 
[1700057511][INFO]sensor index:6 hwmon /sys/class/hwmon/hwmon5/temp3_input Async:0 
[1700057511][INFO]sensor index:7 hwmon /sys/class/hwmon/hwmon5/temp1_input Async:0 
[1700057511][INFO]sensor index:8 hwmon /sys/class/hwmon/hwmon5/temp2_input Async:0 
[1700057511][INFO]sensor index:9 hwmon2 /sys/class/thermal/thermal_zone3/temp Async:0

These are the available temperature sources. Note the Async value of each one and the sensor type (the word after the index number); you will need them later.
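If the log is long, these fields can be pulled out with a short Python script. This is a minimal sketch that assumes the exact line layout shown above (sensor index, type, path, Async flag); other thermald versions may log differently. The function name parse_sensor_lines is hypothetical.

```python
import re
from typing import Dict, List

# Matches lines like:
# [1700057511][INFO]sensor index:5 x86_pkg_temp /sys/class/thermal/thermal_zone5/ Async:1
# Assumes the layout shown in the log excerpt above.
SENSOR_RE = re.compile(
    r"sensor index:(?P<index>\d+)\s+(?P<type>\S+)\s+(?P<path>\S+)\s+Async:(?P<async_flag>[01])"
)


def parse_sensor_lines(log_text: str) -> List[Dict[str, object]]:
    """Return one dict per 'sensor index' line, sorted by index."""
    sensors = []
    for line in log_text.splitlines():
        match = SENSOR_RE.search(line)
        if match:
            sensors.append({
                "index": int(match.group("index")),
                "type": match.group("type"),
                "path": match.group("path"),
                "async_capable": int(match.group("async_flag")),
            })
    # Lines that are not sensor lines are simply ignored.
    return sorted(sensors, key=lambda s: s["index"])
```

For example, copy the terminal output into a file such as log.txt and call parse_sensor_lines(open("log.txt").read()); each dict then holds the type and Async values you were asked to note.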

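Now open Psensor, configure it to show all sensors, and compare its readings with the output of cat /path/to/sensor, replacing /path/to/sensor with the path of the zone you are checking.

Example: cat /sys/class/hwmon/hwmon5/temp2_input

You should get a number such as 32000, which means 32 °C (the value is in millidegrees Celsius).

If you get an error such as "Is a directory" instead, look inside that folder: for thermal zones, the reading is in the temp file, for example cat /sys/class/thermal/thermal_zone5/temp.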
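The same check can be scripted. The sketch below uses a hypothetical helper (not part of thermald or Psensor) that reads a sensor path, falls back to the temp file when given a thermal-zone directory, and converts millidegrees to °C.

```python
import os


def read_temp_celsius(path: str) -> float:
    """Read a sysfs temperature and convert millidegrees Celsius to degrees.

    If `path` is a thermal-zone directory (e.g. /sys/class/thermal/thermal_zone5/),
    the reading is taken from the `temp` file inside it.
    """
    if os.path.isdir(path):
        path = os.path.join(path, "temp")
    with open(path) as f:
        millidegrees = int(f.read().strip())
    return millidegrees / 1000.0


# Example usage (path taken from the log above; it may differ on your device):
# print(read_temp_celsius("/sys/class/hwmon/hwmon5/temp2_input"))
```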

Then match the zones with the Psensor values.
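To compare with Psensor, it can help to dump every thermal zone at once, then repeat under different loads. This sketch assumes the standard sysfs layout where each /sys/class/thermal/thermal_zoneN/ directory holds a type and a temp file; hwmon sensors are not included, and the function name thermal_zone_snapshot is hypothetical.

```python
import glob
import os


def thermal_zone_snapshot(root: str = "/sys/class/thermal") -> dict:
    """Map each thermal_zoneN directory name to (type, temperature in °C)."""
    snapshot = {}
    for zone in sorted(glob.glob(os.path.join(root, "thermal_zone*"))):
        try:
            with open(os.path.join(zone, "type")) as f:
                zone_type = f.read().strip()
            with open(os.path.join(zone, "temp")) as f:
                temp_c = int(f.read().strip()) / 1000.0
        except (OSError, ValueError):
            # Some zones may be unreadable or report no valid value; skip them.
            continue
        snapshot[os.path.basename(zone)] = (zone_type, temp_c)
    return snapshot


# Example usage: print one line per zone.
# for name, (zone_type, temp_c) in thermal_zone_snapshot().items():
#     print(f"{name:<15} {zone_type:<14} {temp_c:5.1f} °C")
```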

After that, try to figure out:

  • where each zone is located (screen, heat pipe, back plate, etc.)
  • which thermal zones affect the fan

Next, choose a temperature limit for each zone. Ideally, find a limit at which the fan barely spins (for the zone or zones related to the fan) while the outer body and screen stay comfortable to touch and hold; you will need to balance fan activity against surface temperature.

Now you have collected all the necessary information.

It's time to write it to the config file.

Rename the stock file, keeping it as a backup: sudo mv /etc/thermald/thermal-conf.xml /etc/thermald/thermal-conf-old.xml

example_thermal-conf.txt

Download example_thermal-conf.txt, rename it to thermal-conf.xml, copy it to /etc/thermald/, and open it: sudo nano /etc/thermald/thermal-conf.xml (you can replace nano with any other text editor).

Explanation:

<Preference>QUIET|PERFORMANCE</Preference>: use QUIET for passive cooling, or PERFORMANCE if you can freely control the cooling fan.

<Name>Example Platform Name</Name>: write any name you want inside.

Inside <ThermalSensors> you can list the thermal sensors that thermald should use:

<ThermalSensors>
    <ThermalSensor>
        <!-- information for sensor 1 -->
    </ThermalSensor>
    <ThermalSensor>
        <!-- information for sensor 2 -->
    </ThermalSensor>
</ThermalSensors>

This way you can list as many sensors as you need.

<Type>example_sensor_1</Type>: give the sensor a name.

<Path>/some_path</Path>: enter the path of the sensor, for example <Path>/sys/class/hwmon/hwmon5/temp3_input</Path>.

<AsyncCapable>0</AsyncCapable>: either 1 or 0; use the Async value from the log output shown earlier.
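If you prefer generating these entries instead of typing them, Python's standard xml.etree.ElementTree module can build them. This is a sketch: thermal_sensor_element is a hypothetical helper, and the values are the example ones from this page. ET.indent needs Python 3.9 or newer.

```python
import xml.etree.ElementTree as ET


def thermal_sensor_element(sensor_type: str, path: str, async_capable: int) -> ET.Element:
    """Build a <ThermalSensor> entry with the three fields described above."""
    sensor = ET.Element("ThermalSensor")
    ET.SubElement(sensor, "Type").text = sensor_type
    ET.SubElement(sensor, "Path").text = path
    ET.SubElement(sensor, "AsyncCapable").text = str(async_capable)
    return sensor


sensors = ET.Element("ThermalSensors")
# Hypothetical sensor name; the path is the hwmon example used above.
sensors.append(thermal_sensor_element("example_sensor_1", "/sys/class/hwmon/hwmon5/temp3_input", 0))
ET.indent(sensors)  # pretty-print with line breaks and indentation
print(ET.tostring(sensors, encoding="unicode"))
```

The printed fragment can be pasted into thermal-conf.xml in place of the <ThermalSensors> section.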

<ThermalZones> is structured like <ThermalSensors>:

<ThermalZones>
    <ThermalZone>
        <!-- info for zone 1 -->
    </ThermalZone>
    <ThermalZone>
        <!-- info for zone 2 -->
    </ThermalZone>
</ThermalZones>

<Type>Example Zone type</Type>: give the zone a name.

<TripPoints> is structured the same way; each <TripPoint> inside it specifies a sensor and the maximum temperature that triggers cooling.

<SensorType>example_sensor_1</SensorType>: fill it with the <Type> value of the sensor you want to watch (here, example_sensor_1).

<Temperature> 75000 </Temperature>: enter the temperature limit you want multiplied by 1000 (millidegrees Celsius), so 75000 means 75 °C.

<type>max</type>: has 3 values: passive (thermal throttling), active (fan), and max (both).

<ControlType>SEQUENTIAL</ControlType>: has 2 values, PARALLEL and SEQUENTIAL. PARALLEL triggers all the cooling devices you list next at once. SEQUENTIAL uses the first cooling device and, if it cannot lower the temperature, triggers the next one, and so on.
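To make the difference between the two control types concrete, here is a toy model of the behavior described above. It assumes that each "step" means the temperature is still above the trip point after the previously engaged devices acted; it is a simplified illustration, not thermald's real control loop, and the function name is hypothetical.

```python
from typing import List


def devices_to_engage(control_type: str, devices: List[str], steps_over_limit: int) -> List[str]:
    """Toy model of PARALLEL vs SEQUENTIAL.

    steps_over_limit: how many escalation steps the temperature has stayed
    above the trip point (0 means it is below the limit).
    """
    if steps_over_limit <= 0:
        return []
    if control_type == "PARALLEL":
        # All listed cooling devices are triggered together.
        return list(devices)
    if control_type == "SEQUENTIAL":
        # Start with the first device and add the next one each time
        # the devices already engaged were not enough.
        return devices[:steps_over_limit]
    raise ValueError(f"unknown ControlType: {control_type}")


devices = ["rapl_controller", "intel_pstate", "cpufreq"]
for step in range(4):
    print(step, devices_to_engage("SEQUENTIAL", devices, step))
```

With SEQUENTIAL, the least disruptive device is tried first, which is why the order in which you list cooling devices matters.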

<CoolingDevice>: this section lists the cooling devices to use.

<type>example_cooling_device</type>: the cooling device. Passive cooling devices are:

  • rapl_controller: Running Average Power Limit (RAPL) (more info)
  • intel_pstate: Performance Scaling Driver (more info)
  • cpufreq: CPU Performance Scaling (more info)
  • LCD: uses the back of the screen as a heat sink (missing source).

Each one reduces heat but also reduces performance, so choose wisely.

<influence> 100 </influence>: how effective the cooling device is, as a percentage: 0 means not effective at all and 100 means most effective (can be removed).

<SamplingPeriod> 12 </SamplingPeriod>: how long, in seconds, before the cooling device starts reducing the temperature (can be removed).
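Putting the trip point fields together, the sketch below builds a <ThermalZone> with one <TripPoint>. It is a hypothetical helper using the field names explained above; for simplicity it applies the same optional influence and SamplingPeriod to every listed cooling device, which a real config does not have to do. ET.indent needs Python 3.9 or newer.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional


def trip_point_element(sensor_type: str, limit_celsius: float, trip_type: str,
                       control_type: str, cooling_devices: List[str],
                       influence: Optional[int] = None,
                       sampling_period: Optional[int] = None) -> ET.Element:
    """Build a <TripPoint> from the fields explained above."""
    if trip_type not in ("passive", "active", "max"):
        raise ValueError("type must be passive, active or max")
    if control_type not in ("PARALLEL", "SEQUENTIAL"):
        raise ValueError("ControlType must be PARALLEL or SEQUENTIAL")
    trip = ET.Element("TripPoint")
    ET.SubElement(trip, "SensorType").text = sensor_type
    # Limits are written in millidegrees Celsius: 75 °C becomes 75000.
    ET.SubElement(trip, "Temperature").text = str(round(limit_celsius * 1000))
    ET.SubElement(trip, "type").text = trip_type
    ET.SubElement(trip, "ControlType").text = control_type
    for device in cooling_devices:
        cdev = ET.SubElement(trip, "CoolingDevice")
        ET.SubElement(cdev, "type").text = device
        # Optional fields are only written when a value is given.
        if influence is not None:
            ET.SubElement(cdev, "influence").text = str(influence)
        if sampling_period is not None:
            ET.SubElement(cdev, "SamplingPeriod").text = str(sampling_period)
    return trip


zone = ET.Element("ThermalZone")
ET.SubElement(zone, "Type").text = "Example Zone type"
trip_points = ET.SubElement(zone, "TripPoints")
trip_points.append(trip_point_element(
    "example_sensor_1", 75, "passive", "SEQUENTIAL",
    ["rapl_controller", "intel_pstate"], influence=100, sampling_period=12))
ET.indent(zone)
print(ET.tostring(zone, encoding="unicode"))
```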

Any other options listed in the example file can be removed.
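After editing, a quick parse can catch malformed XML (for example a closing tag missing its slash) and SensorType values that do not match any sensor <Type> in the file. This is a sketch with a hypothetical function name; it only checks well-formedness and names, not whether thermald accepts the values. thermald may also use sensors it discovered on its own, so an unmatched name is a hint to double-check rather than necessarily an error.

```python
import xml.etree.ElementTree as ET
from typing import List


def check_config(path: str = "/etc/thermald/thermal-conf.xml") -> List[str]:
    """Return SensorType values that match no <ThermalSensor><Type> in the file.

    Raises xml.etree.ElementTree.ParseError if the file is not well-formed XML.
    """
    root = ET.parse(path).getroot()
    defined = {s.findtext("Type", "").strip() for s in root.iter("ThermalSensor")}
    used = [t.text.strip() for t in root.iter("SensorType") if t.text]
    return [name for name in used if name not in defined]
```

An empty list means every SensorType refers to a sensor defined in the file.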

If you have any suggestions on this topic or want to share your config file, please send them here.
