Eaton UPS - zbrewer/homelab GitHub Wiki
I installed an Eaton 5PX-2200 rack-mount UPS in my homelab to provide power-loss protection and as a major upgrade over my previous APC consumer unit. This UPS also came with an Eaton Network Card-MS module installed, which allows me to monitor and manage the UPS over the network without adding a Raspberry Pi running NUT (as I had done before). This network card can be installed and configured per the manufacturer's user guide and includes a web UI for monitoring and management. This works great and is accessible via reverse proxy; however, the network card also supports SNMP for monitoring and management via other systems.
Enable SNMP (ideally v3 since it supports authentication and privacy) from the Eaton web UI and set a username and password. Then test SNMP access from another machine with the following command (assuming authNoPriv mode):
$ snmpwalk -v3 -l authNoPriv -u <user> -A <password> <ip_address> 1.3.6.1.4.1.705.1
In this case, `1.3.6.1.4.1.705.1` is the base MIB OID for Eaton UPSs (found in the user guide linked above), and the `snmpwalk` command will output all of the data under that root. More options can be found in the man pages here.
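To avoid retyping credentials while exploring the MIB, the call above can be wrapped in a small shell function. This is only a sketch: the function name `ups_walk` and the `UPS_SNMP_USER`, `UPS_SNMP_PASS`, and `UPS_HOST` environment variables are hypothetical names chosen for illustration, not something the Eaton card or net-snmp defines. The `-On` output option asks net-snmp to print numeric OIDs, which are easier to copy into other tools.

```shell
# Hypothetical helper: walk a subtree of the UPS MIB over SNMPv3 (authNoPriv).
# UPS_SNMP_USER, UPS_SNMP_PASS and UPS_HOST are assumed to be set by the caller.
ups_walk() {
  # Fall back to the Eaton base OID from the text when no subtree is given.
  snmpwalk -v3 -l authNoPriv \
    -u "$UPS_SNMP_USER" -A "$UPS_SNMP_PASS" \
    -On "$UPS_HOST" "${1:-1.3.6.1.4.1.705.1}"
}
```

Calling `ups_walk` with no arguments walks the whole Eaton tree; passing a narrower OID as the first argument limits the output to that subtree.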
With SNMP set up, I installed Network UPS Tools (NUT) on the protected servers that I wanted to shut down automatically, per the instructions on the NUT page. Specifically, I set them all up in `standalone` mode using the `snmp-ups` driver. This means that each server communicates with the UPS directly, which allows me to configure the UPS behavior directly through its web interface. It also means that each protected server running NUT will shut down when the configured low-power threshold is reached, but I can override this on a per-server basis to shut down earlier. This allows me to, for example, shed the NAS load more quickly in order to save battery power and ensure a safe shutdown.
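As a rough sketch of what a standalone `snmp-ups` setup can look like, the snippet below writes example `nut.conf` and `ups.conf` files into a scratch directory so they can be reviewed before being copied to the NUT config directory (commonly `/etc/nut`). The UPS name `eaton`, the address `192.0.2.10`, the scratch directory name, and the credentials are placeholders invented for this example. The option names are taken from the `snmp-ups` driver documentation, but this is not the actual configuration used here.

```shell
# Write example NUT config files to a scratch directory (not /etc/nut).
conf_dir="${CONF_DIR:-./nut-example}"
mkdir -p "$conf_dir"

# nut.conf: run the driver, upsd and upsmon on the same host.
cat > "$conf_dir/nut.conf" <<'EOF'
MODE=standalone
EOF

# ups.conf: talk to the UPS network card over SNMPv3 (authNoPriv).
# Name, address and credentials below are placeholders.
cat > "$conf_dir/ups.conf" <<'EOF'
[eaton]
    driver = snmp-ups
    port = 192.0.2.10
    snmp_version = v3
    secLevel = authNoPriv
    secName = CHANGE_ME_USER
    authPassword = CHANGE_ME_PASSWORD
EOF
```

Once the real files are in place and the driver and `upsd` are running, `upsc eaton@localhost` should list the variables the driver reads from the card.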
Telegraf can be configured to read UPS data from either NUT or SNMP directly and write it to InfluxDB. I opted to read from SNMP directly since it gave me full control over the metrics I was reading (including some that NUT didn't report), although this did require manually specifying the metrics, which was a bit more work.
Telegraf can be set up simply using this docker-compose file with this Telegraf config in the same directory (renamed to `eaton-server-room-telegraf.conf`). Configure the InfluxDB key and the SNMP password in the `.env` file (along with the bucket name, server addresses, etc. in the Telegraf config) and remove unneeded container definitions from `docker-compose.yaml`. Start the container with `docker compose up -d` from the same directory. Data should then be populated in InfluxDB.
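For illustration, a stripped-down Telegraf config that reads a single value from the UPS over SNMPv3 and writes it to InfluxDB might look like the following. This is a sketch, not the config linked above: the agent address, the measurement and field names, the `SNMP_USER`, `SNMP_PASSWORD`, and `INFLUX_TOKEN` variable names, the InfluxDB URL, organization, and bucket are all assumptions. The OID `1.3.6.1.4.1.705.1.5.2.0` is assumed here to be the battery level in the Eaton/MGE MIB and should be confirmed against the `snmpwalk` output before relying on it. The `auth_protocol` value must match whatever is configured on the card.

```shell
# Write a minimal example config to the current directory for review.
# The quoted heredoc keeps ${...} literal so Telegraf, not the shell, expands it.
cat > ./telegraf-snmp-example.conf <<'EOF'
[[inputs.snmp]]
  agents = ["udp://192.0.2.10:161"]
  version = 3
  sec_name = "${SNMP_USER}"
  sec_level = "authNoPriv"
  auth_protocol = "MD5"
  auth_password = "${SNMP_PASSWORD}"
  name = "eaton_ups"

  # One field per metric; the OID is an assumption, verify with snmpwalk.
  [[inputs.snmp.field]]
    name = "battery_level"
    oid = "1.3.6.1.4.1.705.1.5.2.0"

[[outputs.influxdb_v2]]
  urls = ["http://influxdb:8086"]
  token = "${INFLUX_TOKEN}"
  organization = "homelab"
  bucket = "ups"
EOF
```

Running `telegraf --config ./telegraf-snmp-example.conf --test` gathers the inputs once and prints the metrics without writing to InfluxDB, which is a quick way to check the SNMP settings.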
This dashboard can then be loaded in Grafana to display the data.
What happens when power is lost or restored can largely be configured on the `Shutdown Parameters` page in the Eaton web interface. In my case, I have this configured as follows:
Output | On battery | System Shutdown | Restart |
---|---|---|---|
Master | If remaining time under 300s or if CAPACITY under 50% (after checkbox unchecked) | Duration 300s and Turn off UPS after Sequential Shutdown enabled | If capacity exceeds 70% |
Group1 | Switch off after 900s or capacity under 0% | 5s | Switch on after 300s |
Group2 | Switch off after 600s or capacity under 0% | 5s | Switch on after 600s |
It is worth noting that my firewall, primary server, and primary switch are all attached to the `master` group, my primary NAS is on `Group1`, and my backup NAS is on `Group2`. The NASs are configured to power off after 120s on battery through their NUT configs, so there is ample time for them to shut down before their outlet groups are turned off. They also don't need a shutdown duration on the outlet groups since the servers should already be off when the outlet group cuts power.
On the other hand, when the UPS hits 50% charge (or 300s of remaining runtime), it will trigger the primary server to shut down and give it 300s to do so (the shutdown duration). After this, it will cut power to the master group, which turns off the firewall and switch as well. The firewall isn't configured to shut down gracefully: there is no data loss to be concerned about at that point, and keeping it running as long as possible means other shutdown commands/packets can still be routed.
When power is restored, the UPS will turn the master outlet group back on when the battery charge reaches 70%. The firewall and main server are set to always turn on when power is restored, so they will both boot, along with the switch. This will bring the primary network back up. After 5 minutes (300s), the primary NAS's power will be restored, but it won't boot automatically since it was shut down gracefully and its power-restored behavior is set to `last`. Finally, power to the backup NAS will be restored after another 5 minutes (300s). The backup NAS also won't boot automatically for the same reason.
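The timing margins in this sequence can be checked with a little shell arithmetic. The variable names are made up for this sketch; the numbers are the ones from the table and the NAS settings above.

```shell
# Seconds after the UPS goes on battery.
nas_shutdown=120    # NASs begin shutting down via their NUT configs
group2_off=600      # Group2 (backup NAS) outlet switches off
group1_off=900      # Group1 (primary NAS) outlet switches off

# Time each NAS has to finish shutting down before its outlet loses power.
echo "backup NAS margin:  $((group2_off - nas_shutdown))s"
echo "primary NAS margin: $((group1_off - nas_shutdown))s"

# Seconds after the master group switches back on (battery above 70%).
group1_on=300
group2_on=600
echo "Group2 powers on $((group2_on - group1_on))s after Group1"
```

With these values, the backup NAS has 480s and the primary NAS has 780s to shut down before their outlets are cut, and the two outlet groups come back 300s apart.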
I should also note that the APC UPS powering the fiber ONT from my ISP (and a few other things) will also shut itself off when its charge level reaches 50%. This prevents the battery from being damaged by over-discharge and leaves enough charge for another boot and shutdown if the power is restored briefly. This is accomplished by running a NUT server on a Raspberry Pi (as described here), overriding its low battery threshold, and ensuring that UPS shutdown is configured correctly so that it actually shuts down the UPS. When power is restored, this UPS will also turn back on automatically, thereby bringing the ONT and the Raspberry Pi running NUT back up.
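A minimal sketch of the kind of low-battery override described here, assuming the APC unit is attached over USB and handled by NUT's `usbhid-ups` driver (the text does not say which driver is used). The UPS name `apc` and the output file name are placeholders. The `ignorelb` flag tells the driver to ignore the low-battery flag reported by the UPS and derive it from the overridden threshold instead.

```shell
# Write an example ups.conf section to the current directory for review.
cat > ./apc-ups-example.conf <<'EOF'
[apc]
    driver = usbhid-ups
    port = auto
    # Treat 50% charge as "low battery" instead of the UPS's own threshold.
    ignorelb
    override.battery.charge.low = 50
EOF
```

Whether the UPS actually powers itself off afterwards also depends on the `POWERDOWNFLAG` setting in `upsmon.conf` and on the system's shutdown scripts calling `upsdrvctl shutdown`, so that part is worth testing separately.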
In order for the above to work, there are a couple more settings that must be configured via the UPS control panel itself. The values that should be set are the default values but are documented here as well for completeness.
- `On/Off settings > Forced reboot` -> `Enable`
- `On/Off settings > Auto restart` -> `Enable`
- `On/Off settings > Sleep mode` -> `Disable`
- `On/Off settings > Remote command` -> `Enable`
- `Battery settings > Deep discharge protection` -> `Yes`