LCR Fleet Alerts - directedmachines/customer-support GitHub Wiki
Email alerts are sent from: "[email protected]" with the subject "message from LCR24ZS0-xxxxxxxxxxxxxx"
This alert is sent when the robot is active. It may also be sent when a user connects to the robot UI over Wi-Fi or LTE.
The following charge-level alerts are sent when different thresholds are reached. If the charge level is dropping while the robot is plugged in or out in the sun, please diagnose whether there is a charging issue or whether something is draining the battery (e.g. a device plugged into the inverter). In low-power mode the robot can still drive, but it will be slower and AUX will be disabled.
- Charge level at 100%
- Voltage is 47.710000, estimated charge is 49.965795
- Voltage is 53.470000, estimated charge is 19.873410, setting power state to [LOW_POWER]
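When triaging many of these emails, it can help to pull the numbers out of the alert body. The following is a minimal sketch, assuming the body matches the sample strings above; `parse_charge_alert` and the regular expression are illustrative helpers, not part of CAP.

```python
import re

# Pattern derived from the sample alert bodies above (an assumption,
# not an official format specification).
CHARGE_RE = re.compile(
    r"Voltage is (?P<voltage>\d+(?:\.\d+)?), "
    r"estimated charge is (?P<charge>\d+(?:\.\d+)?)"
    r"(?:, setting power state to \[(?P<state>\w+)\])?"
)


def parse_charge_alert(body):
    """Return voltage, estimated charge, and power state from an alert body.

    Returns None when the body does not look like a voltage/charge alert.
    """
    match = CHARGE_RE.search(body)
    if match is None:
        return None
    return {
        "voltage": float(match["voltage"]),
        "charge": float(match["charge"]),
        "power_state": match["state"],  # None when no state change is reported
    }


alert = parse_charge_alert(
    "Voltage is 53.470000, estimated charge is 19.873410, "
    "setting power state to [LOW_POWER]"
)
```

Here `alert["power_state"]` is `"LOW_POWER"`; for an alert without a state change the same key is `None`.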
Code branch mismatch: GIT branch set to development, service state (JSON configuration) set to master
When the code branch is mismatched, the UI will periodically restart and rebuild to try to sync the latest code. To fix this error, either SSH into the robot and switch the git branch to the correct branch, or patch the "sourceBranch" for "cap" in the self-monitoring-tasks UI: http://<ROBOT_IP>:8000/dashboard/services/ui/?path=/mgmt/self-monitoring-tasks/default
If this happened suddenly without explanation it could be caused by data corruption on the SD card. When corruption is detected the configuration files are reloaded and changes may be lost.
Failure loading config mgmt-self-monitoring-tasks-default.json: java.lang.IllegalArgumentException: Unparseable JSON body: java.lang.IllegalStateException: Expected BEGIN_OBJECT but was STRING at line 1 column 1 path $
This type of alert indicates a corruption of the self monitoring state. Syslog may show abrupt reboots, such as starting runtime and abruptly running the reboot cycle.
The cause of this issue may be one of two things:
- The Pi is not powered properly. In this case, verify that the wiring is undamaged and routed properly before diagnosing further.
- The SD card is not seated properly or is otherwise corrupt. In this case, verify that the SD card is seated properly before diagnosing the SD card further.
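Once the hardware checks are done, a quick way to confirm whether a config file is damaged is to try parsing it and check that the top level is a JSON object, which is what the "Expected BEGIN_OBJECT" message complains about. This is a minimal sketch using Python's standard `json` module; `check_config_file` is a hypothetical helper, and the path you pass in depends on where the config lives on your robot.

```python
import json


def check_config_file(path):
    """Return (ok, detail) for a JSON config file that should hold a top-level object."""
    try:
        with open(path, "r", encoding="utf-8") as handle:
            data = json.load(handle)
    except (OSError, ValueError) as error:
        # OSError: file missing or unreadable; ValueError: invalid JSON or bad encoding.
        return False, f"unreadable or unparseable: {error}"
    if not isinstance(data, dict):
        # Valid JSON, but not an object (for example a bare string).
        return False, f"expected a JSON object, found {type(data).__name__}"
    return True, "ok"
```

A result of `(False, ...)` suggests the file content is damaged and is worth comparing against a known-good copy.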
Large delta (1800 s) since last sample, self monitoring possibly stuck
Some false positives are possible with this alert. Its general purpose is to prompt careful examination of the robot software runtime state. Some possible causes:
- The robot might have been disconnected from the internet / cell network and just synchronized time. The alert can be ignored after an operator verifies the robot state.
- The CAP runtime is no longer updating state in critical services such as the self-monitoring task service. A software team member should check the statistics (/stats) JSON of the self-monitoring service and see whether the maintenance stat version is incrementing. If not, they need to check the maintenance stat counter at other key services, such as pose-estimators/default and navigators/default, and determine whether periodic updates happened around the same time. The runtime logs should be captured along with syslog and kernel logs and added to an issue tracking "stuck" runtime occurrences.
- The host-specific (override) JSON file is corrupt. To recover:
  - SSH into the robot
  - delete file: dCentralizedSystems/cap-config/config/mgmt-self-monitoring-task-default..json
  - restart CAP runtime
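The "is the counter incrementing" check described above can be sketched as a comparison of two samples taken a few minutes apart. This is only an illustration: `find_stuck_services` is a hypothetical helper, the counter values are invented, and extracting the maintenance counter from each service's /stats JSON is left to the caller because the field layout is not documented here.

```python
def find_stuck_services(earlier, later):
    """Return services whose maintenance counter did not increase between two samples.

    Both arguments map a service path to the counter value read from that
    service's /stats JSON at sampling time.
    """
    stuck = []
    for service, before in earlier.items():
        after = later.get(service)
        # A missing or non-increasing counter is treated as "stuck".
        if after is None or after <= before:
            stuck.append(service)
    return sorted(stuck)


# Hypothetical counter values sampled a few minutes apart.
first = {
    "/mgmt/self-monitoring-tasks/default": 120,
    "pose-estimators/default": 4510,
    "navigators/default": 4498,
}
second = {
    "/mgmt/self-monitoring-tasks/default": 120,
    "pose-estimators/default": 4630,
    "navigators/default": 4617,
}
stuck = find_stuck_services(first, second)
```

With these sample values only the self-monitoring service is reported, which would point at that service rather than the whole runtime being stalled.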
A slow update rate will occur when a plan has too many items in it or its paths are too long. This means that the topological planner is taking too long to localize. To verify that the topological planner is the issue, search the logs for java.util.concurrent.CancellationException: queue limit exceeded (operationQueue for update on /navigation/topological-path-planners/default)]
- To keep planning smooth on the robot's end, try to keep the total path length under 20 kilometers. Every path in use is split into 1 m samples in both directions (i.e. a 20 meter path has 40 total samples). Once paths reach kilometers, the number of samples adds up very quickly.
- To alleviate the issue in larger plans, split the large plan into multiple smaller plans.
- Follow Plan Design: Best Practices to prevent this issue.
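To get a feel for how quickly samples accumulate, the arithmetic above can be sketched as follows. `total_samples` is a hypothetical helper that assumes exactly one sample per meter in each direction; the planner's actual sampling at path endpoints may differ slightly.

```python
def total_samples(path_lengths_m, spacing_m=1.0, directions=2):
    """Estimate the sample count for paths sampled every spacing_m meters per direction."""
    return sum(int(length / spacing_m) * directions for length in path_lengths_m)


short_path = total_samples([20])       # a single 20 m path -> 40
large_plan = total_samples([20_000])   # 20 km of paths in total -> 40000
```

A plan at the 20 km guideline therefore carries roughly 40,000 samples, a thousand times more than the 20 m example.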
AXLE_AUX_MOTOR HW fault: R: M:0 T:0 P:0 V:0 T1:076.70C T2:063.70C VB:52.33V M+:10.90V M-:10.55V IF:-000.68A IR:-000.49A W:1500 MF:1 FS:4719104(1544) ES:0 I:0
This email is sent when a motor controller (AUX1, LEFT_MOTOR, or RIGHT_MOTOR) experiences a fault and includes the associated "R: 0" report string. This is usually due to over-current on the motor driver and the fault should be auto-cleared by CAP.
If the alert is recurring, review the Mowing Deck Stalled Motor Guide.
LEFT_MOTOR HW fault: R: M:0 T:0 P:0 V:0 T1:026.75C T2:026.40C VB:56.16V M+:00.00V M-:28.05V IF:-000.29A IR:-000.19A W:900 MF:1 FS:4718608(1552) ES:0 I:1
A hardware fault is sent but the report contains "I:1". This alert is caused when current is detected but the robot is supposed to be idle. See details here: LCR Drivetrain Troubleshooting: Motor Controller Faults.
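To tell the two kinds of hardware-fault emails apart in bulk, the report string can be split into its key/value tokens. This is a minimal sketch based only on the sample subjects above; `parse_hw_fault` and `is_idle_current_fault` are hypothetical helpers, and the meaning of fields other than "I" is not interpreted here.

```python
def parse_hw_fault(line):
    """Split a '<MOTOR> HW fault: R: ...' alert into the motor name and raw report fields."""
    motor, sep, report = line.partition(" HW fault: ")
    if not sep:
        return None  # not a hardware-fault alert
    fields = {}
    for token in report.split():
        key, _, value = token.partition(":")
        fields[key] = value  # values are kept as raw strings, units included
    return motor, fields


def is_idle_current_fault(fields):
    """True when the report carries the I:1 flag (current detected while idle)."""
    return fields.get("I") == "1"


motor, fields = parse_hw_fault(
    "LEFT_MOTOR HW fault: R: M:0 T:0 P:0 V:0 T1:026.75C T2:026.40C VB:56.16V "
    "M+:00.00V M-:28.05V IF:-000.29A IR:-000.19A W:900 MF:1 FS:4718608(1552) ES:0 I:1"
)
```

For this sample, `motor` is `"LEFT_MOTOR"` and `is_idle_current_fault(fields)` is `True`, so the alert belongs to the idle-current case rather than the over-current case.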
Over current detected on AXLE_AUX_MOTOR: 200.945750 amps over 60 seconds
The motor is likely stalling due to a clog or a failed motor. Review the Mowing Deck Stalled Motor Guide
LEFT|RIGHT|AXLE_AUX_MOTOR disconnected (check panel switch, wiring, motor brushes)
This notification is sent when there is voltage across the motor controller (AUX, LEFT_MOTOR, or RIGHT_MOTOR) leads but no current. This is usually caused by an unplugged motor, the motor disconnect switch being turned off, or broken wiring inside the motor.
For a LEFT or RIGHT MOTOR disconnect, review the Drivetrain Disconnected Motor Guide
For an AXLE_AUX_MOTOR disconnect, review the Mowing Deck Disconnected Motor Guide
It can also be caused by the Pi receiving undervoltage. Details here: LCR Pi Troubleshooting
(1.000000) LEFT_MOTOR: No response for 5 seconds, toggling serial connection
This email is sent when the robot stops receiving the "R: 0" report strings from a motor controller (AUX1, LEFT_MOTOR, or RIGHT_MOTOR). The serial connection is automatically closed and re-opened after this alert is sent, so it should recover on its own. Note that this can be a symptom of a Realsense Camera causing serious USB faults. Please see Camera Troubleshooting for details and more detailed diagnostics.
LEFT_MOTOR: device address opened previously but no longer exists, check for USB disconnects or chassis shorts
Could be caused by:
- A motor controller being physically unplugged during repairs or a flaky USB cable
- A motor controller reset that is done every ~50 days (will be designed out in the future)
- Electronics instability or chassis short.
High temperature sensed on LEFT_MOTOR: 696.250000 C
- The high-temperature alert is only sent at very high temperatures, and firmware should stop the MC before it gets hot enough to trigger alerts
- This alert likely indicates a broken temperature sensor
- In rare cases this could be due to a serious problem such as a fire in the electronics tray
Restarting (100.000000) capture process for left, counter now: 2626, last sample at 0, process live: true
Note that this can be a symptom of a Realsense Camera causing serious USB faults. Please see Camera Troubleshooting for details and more detailed diagnostics.
Capture process returned nonzero exit code (<EXIT_CODE>)
This alert describes a camera-capture issue. It could be caused by an unplugged camera, corrosion, or a short to chassis.
See:
LCR Realsense Camera Troubleshooting
LCR USB Side Camera Troubleshooting
Failure initializing GPS device:java.lang.IllegalStateException: Device query timeout
The GPS can be reset by following the GPS Accuracy Troubleshooting guide.