Eaton UPS - zbrewer/homelab GitHub Wiki
I installed an Eaton 5PX-2200 rack-mount UPS in my homelab to provide power-loss protection and as a major upgrade over my previous APC consumer unit. It came with an Eaton Network Card-MS module installed, which lets me monitor and manage the UPS over the network without a separate Raspberry Pi running NUT (as I had used before). The network card can be installed and configured per the manufacturer's user guide and includes a web UI for monitoring and management. This works well and is accessible via a reverse proxy; the network card also supports SNMP, so other systems can monitor and manage the UPS as well.
Enable SNMP (ideally v3 since it supports authentication and privacy) from the Eaton web UI and set a username and password. Then test SNMP access from another machine with the following command (assuming authNoPriv mode):
```shell
$ snmpwalk -v3 -l authNoPriv -u <user> -A <password> <ip_address> 1.3.6.1.4.1.705.1
```
In this case, `1.3.6.1.4.1.705.1` is the base OID of the Eaton UPS MIB (found in the user guide linked above), and `snmpwalk` prints all of the data under that root. More options are described in the `snmpwalk` man page.
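If SNMPv3 privacy (authPriv) is enabled on the card, queries also need the privacy flags. The following is a minimal sketch assuming SHA authentication and AES privacy. The two OIDs (battery remaining time and battery level under the `1.3.6.1.4.1.705.1` tree) and the helper names `ups_get`, `ups_battery_level`, and `ups_battery_runtime` are my assumptions, so confirm the OIDs against your own `snmpwalk` output before relying on them.

```shell
#!/usr/bin/env bash
# Sketch: read single values from the UPS over SNMPv3 using net-snmp's snmpget.
# Assumptions (not from the original page): authPriv with SHA/AES, and these
# OIDs from the 1.3.6.1.4.1.705.1 tree; verify them with snmpwalk first.

OID_BATTERY_RUNTIME="1.3.6.1.4.1.705.1.5.1.0"  # assumed: remaining runtime
OID_BATTERY_LEVEL="1.3.6.1.4.1.705.1.5.2.0"    # assumed: battery level (%)

# ups_get HOST USER AUTH_PASS PRIV_PASS OID
# Prints just the value (-Oqv: quick output, value only) of one OID.
ups_get() {
  local host="$1" user="$2" auth_pass="$3" priv_pass="$4" oid="$5"
  snmpget -v3 -l authPriv -u "$user" \
    -a SHA -A "$auth_pass" \
    -x AES -X "$priv_pass" \
    -Oqv "$host" "$oid"
}

# ups_battery_level HOST USER AUTH_PASS PRIV_PASS
ups_battery_level() {
  ups_get "$1" "$2" "$3" "$4" "$OID_BATTERY_LEVEL"
}

# ups_battery_runtime HOST USER AUTH_PASS PRIV_PASS
ups_battery_runtime() {
  ups_get "$1" "$2" "$3" "$4" "$OID_BATTERY_RUNTIME"
}
```

For example, `ups_battery_level 192.0.2.10 nutuser 'auth-pass' 'priv-pass'` prints a bare number, which is convenient for scripts and quick checks.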
With SNMP set up, I installed Network UPS Tools (NUT) on the protected servers that I wanted to shut down automatically, following the instructions on the NUT page. Specifically, I set them all up in `standalone` mode using the `snmp-ups` driver. This means that each server communicates with the UPS directly over SNMP, while the UPS behavior itself is configured centrally through the network card's web interface. It also means that each protected server running NUT shuts down when the UPS reaches its configured low-battery threshold, but I can override this on a per-server basis to shut down earlier. For example, this lets me shed the NAS load sooner to save battery power and ensure a safe shutdown.
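A minimal sketch of what a standalone `snmp-ups` setup can look like, written as a shell function that generates `nut.conf` and `ups.conf` into a directory of your choice. The UPS name `eaton`, the SHA auth protocol, and `mibs = mge` (the NUT MIB mapping I assume matches the `705` OID tree) are assumptions, not the exact configuration used on my servers.

```shell
#!/usr/bin/env bash
# Sketch: generate a minimal NUT standalone config that talks to the UPS via snmp-ups.
# Assumptions: the UPS name "eaton", the SNMPv3 user/password, SHA auth, and
# "mibs = mge" are placeholders; NUT usually reads these files from /etc/nut.

# write_nut_standalone_config DIR UPS_IP SNMP_USER SNMP_PASS
write_nut_standalone_config() {
  local dir="$1" ups_ip="$2" snmp_user="$3" snmp_pass="$4"
  mkdir -p "$dir"

  # nut.conf: run the driver, upsd, and upsmon all on this host.
  printf 'MODE=standalone\n' > "$dir/nut.conf"

  # ups.conf: one UPS section using the snmp-ups driver over SNMPv3 (authNoPriv).
  cat > "$dir/ups.conf" <<EOF
[eaton]
  driver = snmp-ups
  port = ${ups_ip}
  mibs = mge
  snmp_version = v3
  secLevel = authNoPriv
  secName = ${snmp_user}
  authPassword = ${snmp_pass}
  authProtocol = SHA
EOF
  # The file contains the SNMP password, so keep it private.
  chmod 600 "$dir/ups.conf"
}
```

A complete setup still needs `upsd.users` and a `MONITOR` line in `upsmon.conf`; the NUT documentation covers those files.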
Telegraf can read UPS data either from NUT or directly over SNMP and write it to InfluxDB. I opted to read from SNMP directly since it gave me full control over the metrics being collected (including some that NUT doesn't report), although it did require specifying each metric manually, which was a bit more work.
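As a rough illustration of what specifying the metrics manually involves, the function below prints a Telegraf config with an `inputs.snmp` plugin and one `[[inputs.snmp.field]]` per metric. It assumes InfluxDB 2.x (`outputs.influxdb_v2`), placeholder environment variable names, and two battery OIDs that are my assumption and should be checked against `snmpwalk` output. It is a sketch, not my actual config.

```shell
#!/usr/bin/env bash
# Sketch: print a Telegraf config that polls the UPS over SNMPv3 and writes to InfluxDB v2.
# Assumptions: InfluxDB 2.x; the env var names, measurement name, field names,
# and OIDs are placeholders. Telegraf expands ${VAR} when it loads the file,
# so the heredoc is quoted ('EOF') to keep those references literal here.

telegraf_ups_config() {
  cat <<'EOF'
[agent]
  interval = "30s"

[[inputs.snmp]]
  agents = ["udp://${UPS_ADDRESS}:161"]
  version = 3
  sec_level = "authNoPriv"
  sec_name = "${SNMP_USER}"
  auth_protocol = "SHA"
  auth_password = "${SNMP_PASSWORD}"
  name = "eaton_ups"

  [[inputs.snmp.field]]
    name = "battery_level"
    oid = "1.3.6.1.4.1.705.1.5.2.0"

  [[inputs.snmp.field]]
    name = "battery_runtime"
    oid = "1.3.6.1.4.1.705.1.5.1.0"

[[outputs.influxdb_v2]]
  urls = ["${INFLUX_URL}"]
  token = "${INFLUX_TOKEN}"
  organization = "${INFLUX_ORG}"
  bucket = "${INFLUX_BUCKET}"
EOF
}
```

Running `telegraf_ups_config > eaton-server-room-telegraf.conf` writes the file; each additional metric is just another `[[inputs.snmp.field]]` block.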
Telegraf can be set up using this docker-compose file with this Telegraf config in the same directory (renamed to `eaton-server-room-telegraf.conf`). Set the InfluxDB key and the SNMP password in the `.env` file, set the bucket name, server addresses, etc. in the Telegraf config, and remove any unneeded container definitions from `docker-compose.yaml`. Then start the container by running `docker compose up -d` from the same directory. Data should then be populated in InfluxDB.
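Before starting the container, it can help to confirm that the `.env` file actually defines every variable the Telegraf config references. This is a small sketch; the variable names `INFLUX_TOKEN` and `SNMP_PASSWORD` are hypothetical and should match whatever your config uses.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check a .env file before running `docker compose up -d`.
# Assumption: these variable names match the ${...} references in your
# Telegraf config; adjust the list to what your config actually uses.

REQUIRED_VARS=(INFLUX_TOKEN SNMP_PASSWORD)

# check_env_file FILE
# Prints each required variable that is missing or empty;
# returns non-zero if any are missing.
check_env_file() {
  local file="$1" var missing=0
  for var in "${REQUIRED_VARS[@]}"; do
    if ! grep -Eq "^${var}=.+" "$file"; then
      echo "$var"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `check_env_file .env && docker compose up -d`. Afterwards, `docker compose logs -f telegraf` (assuming the service is named `telegraf`) shows whether SNMP polling or InfluxDB writes are failing.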
This dashboard can then be loaded in Grafana to display the data.
What happens when power is lost or restored can largely be configured on the Shutdown Parameters page of the Eaton web interface. In my case, it is configured as follows:
| Output | On battery | System shutdown | Restart |
|---|---|---|---|
| Master | If remaining time is under 300s or if capacity is under 50% (after checkbox unchecked) | Duration 180s, with "Turn off UPS after Sequential Shutdown" enabled | If capacity exceeds 70% |
| Group1 | Switch off after 1800s or if capacity is under 60% | 180s | Switch on after 300s |
| Group2 | Switch off after 600s or if capacity is under 0% | 5s | Switch on after 600s |
My firewall, primary switch, and secondary switch are all attached to the Master group, and my primary server is on Group1. This leaves my primary NAS, backup NAS, secondary server, and some IoT devices on Group2 (the non-critical group). The servers are deliberately on different outlet groups so that they can be force-rebooted or power-cycled independently if necessary.
The NASs are configured through their NUT configs (using `upssched.conf`) to shut down after 300s on battery. The secondary server is instead configured to shut down at 80% battery, which is easier to set up and close enough. Either way, they have ample time to shut down before Group2 is switched off 600s after power loss. Group2 also doesn't need a meaningful shutdown duration (it is set to 5s), since the servers should already be off when the outlet group cuts power.
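For the NASs, a 300s on-battery timer can be built with `upssched`. The sketch below writes the relevant `upsmon.conf` lines, an `upssched.conf`, and the command script into a directory. The file locations, the `/usr/sbin` binary paths, and the timer name `onbatt` are assumptions that vary by distro.

```shell
#!/usr/bin/env bash
# Sketch: make a NUT client shut down after DELAY seconds on battery via upssched.
# Assumptions: paths (/etc/nut, /run/nut/upssched, /usr/sbin) and the timer
# name "onbatt" are placeholders; adjust them for your distro.

# write_upssched_config DIR [DELAY_S]
write_upssched_config() {
  local dir="$1" delay="${2:-300}"
  mkdir -p "$dir"

  # Lines to add to upsmon.conf: pass ONBATT/ONLINE events to upssched.
  cat > "$dir/upsmon.conf.snippet" <<'EOF'
NOTIFYCMD /usr/sbin/upssched
NOTIFYFLAG ONBATT SYSLOG+WALL+EXEC
NOTIFYFLAG ONLINE SYSLOG+WALL+EXEC
EOF

  # upssched.conf: start a timer on battery, cancel it when mains returns.
  cat > "$dir/upssched.conf" <<EOF
CMDSCRIPT /etc/nut/upssched-cmd
PIPEFN /run/nut/upssched/upssched.pipe
LOCKFN /run/nut/upssched/upssched.lock
AT ONBATT * START-TIMER onbatt ${delay}
AT ONLINE * CANCEL-TIMER onbatt
EOF

  # upssched-cmd: when the timer expires, ask upsmon for a forced shutdown.
  cat > "$dir/upssched-cmd" <<'EOF'
#!/bin/sh
case "$1" in
  onbatt) /usr/sbin/upsmon -c fsd ;;
esac
EOF
  chmod +x "$dir/upssched-cmd"
}
```

For a percentage-based threshold like the secondary server's 80%, one possible alternative (an assumption, not necessarily how my server is configured) is adding `ignorelb` and `override.battery.charge.low = 80` to that server's `ups.conf` section.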
On the other hand, when the UPS capacity drops below 60% (or after 1800s on battery, whichever comes first), it triggers the primary server to shut down and gives it 180s to do so (the shutdown duration) before cutting power to Group1.
Finally, when the UPS capacity drops below 50% (or the remaining runtime drops below 300s), it starts a 180s timer (the shutdown duration) before turning off the Master outlet group, any other outlet groups that are somehow still on, and then itself. This turns off the firewall and switches. The firewall isn't configured to shut down gracefully: there is no data loss to be concerned about at that point, and keeping it running as long as possible ensures that other shutdown commands and packets can still be routed.
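The on-battery rules in the table boil down to "whichever limit is crossed first", followed by the shutdown duration before outlet power is cut. The sketch below models that logic to make the timings easier to reason about; it only illustrates the table and is not how the Eaton firmware implements it. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: model the "whichever limit comes first" rules from the Shutdown
# Parameters table. Illustration only; not the Eaton firmware's logic.

# group_triggered ELAPSED_S CAPACITY_PCT MAX_S MIN_PCT
# Succeeds once the group has been on battery for MAX_S seconds
# or the capacity has dropped below MIN_PCT.
group_triggered() {
  local elapsed="$1" capacity="$2" max_s="$3" min_pct="$4"
  (( elapsed >= max_s || capacity < min_pct ))
}

# master_triggered RUNTIME_S CAPACITY_PCT
# Master rule: remaining runtime under 300s or capacity under 50%.
master_triggered() {
  local runtime="$1" capacity="$2"
  (( runtime < 300 || capacity < 50 ))
}

# power_cut_time TRIGGER_S SHUTDOWN_DURATION_S
# Outlet power is cut once the shutdown duration has elapsed after the trigger.
power_cut_time() {
  echo $(( $1 + $2 ))
}
```

For example, Group2 is triggered at 600s on battery (`group_triggered 600 100 600 0` succeeds) and `power_cut_time 600 5` gives 605s, while Group1 at 59% capacity triggers immediately and loses power 180s later.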
When power is restored, the UPS turns the Master outlet group back on once the battery charge reaches 70%. The firewall is set to always power on when power is restored, so it boots along with the switches, bringing the primary network back up. After 5 minutes (300s), power to Group1 is restored, and the primary server boots automatically since its power-restore behavior is also set to `power on`.
Finally, power to Group2 is restored after another 5 minutes (300s), i.e. 600s after the Master group. This boots the secondary server (its power-restore behavior is also set to `power on`) and powers both NASs. That enables OOB management for both NASs but doesn't boot them automatically: they aren't needed for any remote management, so they can be booted manually (remotely, via OOB management) once the power is confirmed to be stable.
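How the NASs are booted remotely depends on their OOB controller. Assuming it supports IPMI over LAN, a sketch with `ipmitool` might look like the following; the BMC hostname and user are placeholders, and the password is read from the `IPMI_PASSWORD` environment variable via `-E` so it doesn't appear on the command line.

```shell
#!/usr/bin/env bash
# Sketch: power on a NAS through its BMC once mains power looks stable.
# Assumptions: the OOB controller supports IPMI over LAN (lanplus) and the
# password is exported as IPMI_PASSWORD, which ipmitool reads with -E.

# nas_power_status BMC_HOST USER -> prints "on" or "off"
nas_power_status() {
  # ipmitool prints e.g. "Chassis Power is off"; keep the last word.
  ipmitool -I lanplus -H "$1" -U "$2" -E chassis power status \
    | awk '{print $NF}'
}

# nas_power_on BMC_HOST USER
# Sends "chassis power on" only if the chassis currently reports "off".
nas_power_on() {
  if [ "$(nas_power_status "$1" "$2")" = "off" ]; then
    ipmitool -I lanplus -H "$1" -U "$2" -E chassis power on
  fi
}
```

Usage: `IPMI_PASSWORD='...' nas_power_on nas-bmc.lan admin`. Checking the status first makes the command safe to rerun.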
I should also note that the APC UPS powering my ISP's fiber ONT (and a few other things) also shuts itself off when its charge level reaches 50%. This prevents the battery from being damaged by over-discharge and leaves enough charge for another boot and shutdown if power is restored only briefly. This is accomplished by running a NUT server on a Raspberry Pi (as described here), overriding its low-battery threshold, and verifying that the UPS shutdown is configured correctly and will actually turn the UPS off. When power is restored, this UPS also turns back on automatically, bringing the ONT and the Raspberry Pi running NUT back up.
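For reference, overriding the low-battery threshold on the Raspberry Pi could look like the `ups.conf` section printed below. This is a sketch assuming the APC is USB-attached and handled by the `usbhid-ups` driver; the section name `apc` and the `offdelay`/`ondelay` values are illustrative, not my actual settings.

```shell
#!/usr/bin/env bash
# Sketch: print a ups.conf section for a USB-attached APC that declares
# low battery at LOW_PCT instead of relying on the UPS's own LB flag.
# Assumptions: usbhid-ups driver, section name "apc", illustrative delays.

# apc_ups_conf [LOW_PCT]
apc_ups_conf() {
  local low_pct="${1:-50}"
  cat <<EOF
[apc]
  driver = usbhid-ups
  port = auto
  # Ignore the device's LB flag; evaluate battery.charge.low instead.
  ignorelb
  override.battery.charge.low = ${low_pct}
  # Seconds the UPS waits before cutting output after the kill-power command.
  offdelay = 60
  # Seconds the UPS waits before restoring output after mains returns.
  ondelay = 120
EOF
}
```

Before trusting a real power cut, `upsdrvctl -t shutdown` prints what the shutdown step would do without actually doing it.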
For the above to work, a few more settings must be configured on the UPS control panel itself. These are the default values, but they are documented here for completeness:
- On/Off settings > Forced reboot -> Enable
- On/Off settings > Auto restart -> Enable
- On/Off settings > Sleep mode -> Disable
- On/Off settings > Remote command -> Enable
- Battery settings > Deep discharge protection -> Yes