LCR Fleet Alerts - directedmachines/customer-support GitHub Wiki

Overview

Email alerts are sent from "[email protected]" with the subject "message from LCR24ZS0-xxxxxxxxxxxxxx".
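If these emails are routed into a ticketing or chat tool, a filter on the subject line can pull out the robot identifier. The following is a minimal Python sketch. It assumes every alert subject follows the exact pattern above and that the text after "message from " identifies the robot. The function name and the sample identifier are hypothetical.

```python
import re
from typing import Optional

# ASSUMPTION: subjects look exactly like "message from LCR<identifier>".
SUBJECT_RE = re.compile(r"^message from (LCR\S+)$")


def robot_id_from_subject(subject: str) -> Optional[str]:
    """Return the LCR identifier from an alert subject, or None if it doesn't match."""
    match = SUBJECT_RE.match(subject.strip())
    return match.group(1) if match else None


# Made-up identifier, for illustration only.
print(robot_id_from_subject("message from LCR24ZS0-0123456789abcd"))
print(robot_id_from_subject("Weekly report"))  # None
```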

General Alerts

Active

This alert is sent when the robot is active. It may also be sent when a user connects to the robot UI over Wi-Fi or LTE.

Charge Level

The following charge-level alerts are sent when different thresholds are reached. If the charge level is dropping while the robot is plugged in or out in the sun, diagnose whether there is a charging issue or whether something is draining the battery (e.g., a device plugged into the inverter). In low-power mode the robot can still drive, but more slowly, and AUX is disabled.

  • Charge level at 100%
  • Voltage is 47.710000, estimated charge is 49.965795
  • Voltage is 53.470000, estimated charge is 19.873410, setting power state to [LOW_POWER]
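To track battery trends across a fleet, the charge alerts can be parsed into numbers. The following is a minimal Python sketch. It assumes the message text keeps the exact format of the samples above. The regex and the function name are illustrative and are not part of CAP.

```python
import re
from typing import Optional

# Built from the sample charge-level alerts above (format assumed stable).
CHARGE_RE = re.compile(
    r"Voltage is (?P<voltage>[\d.]+), estimated charge is (?P<charge>[\d.]+)"
    r"(?:, setting power state to \[(?P<state>\w+)\])?"
)


def parse_charge_alert(text: str) -> Optional[dict]:
    """Extract voltage, estimated charge and the optional power state."""
    m = CHARGE_RE.search(text)
    if m is None:
        return None  # e.g. "Charge level at 100%" carries no voltage reading
    return {
        "voltage": float(m.group("voltage")),
        "charge": float(m.group("charge")),
        "power_state": m.group("state"),  # None unless the alert sets a state
    }


alerts = [
    "Charge level at 100%",
    "Voltage is 47.710000, estimated charge is 49.965795",
    "Voltage is 53.470000, estimated charge is 19.873410, setting power state to [LOW_POWER]",
]
for alert in alerts:
    print(parse_charge_alert(alert))
```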

Code Branch Mismatch

Code branch mismatch: GIT branch set to development, service state (JSON configuration) set to master

When the code branch is mismatched, the UI periodically restarts and rebuilds to try to sync the latest code. To fix this error, either SSH into the robot and switch the git branch to the correct branch, or patch the "sourceBranch" for "cap" in the self-monitoring-tasks UI: http://<ROBOT_IP>:8000/dashboard/services/ui/?path=/mgmt/self-monitoring-tasks/default

If this happened suddenly and without explanation, it could be caused by data corruption on the SD card. When corruption is detected, the configuration files are reloaded and changes may be lost.
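Before patching anything, it can help to confirm which side of the branch mismatch is wrong. The following is a minimal Python sketch. It reads the checked-out branch with `git rev-parse --abbrev-ref HEAD` and compares it with the configured branch. The JSON layout assumed by `configured_branch` is hypothetical and is not the documented format of the self-monitoring-tasks state.

```python
import subprocess


def current_git_branch(repo_dir: str) -> str:
    """Return the checked-out branch name of the git repository at repo_dir."""
    result = subprocess.run(
        ["git", "rev-parse", "--abbrev-ref", "HEAD"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def configured_branch(config: dict) -> str:
    # HYPOTHETICAL layout: assumes the "sourceBranch" for "cap" is stored at
    # config["cap"]["sourceBranch"]; adjust to the real JSON structure.
    return config["cap"]["sourceBranch"]


def branch_mismatch(git_branch: str, service_branch: str) -> bool:
    """True when the checked-out branch differs from the configured branch."""
    return git_branch != service_branch


print(branch_mismatch("development", "master"))  # True, as in the sample alert
```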

Failure loading config mgmt-self-monitoring-tasks-default.json

Failure loading config mgmt-self-monitoring-tasks-default.json: java.lang.IllegalArgumentException: Unparseable JSON body: java.lang.IllegalStateException: Expected BEGIN_OBJECT but was STRING at line 1 column 1 path $

This alert indicates that the self-monitoring state is corrupt. Syslog may show abrupt reboots, for example the runtime starting and then abruptly running the reboot cycle.
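With SSH access, a quick way to confirm the corruption is to check whether the config file still parses as a JSON object. In the alert above, the parser expected an object (BEGIN_OBJECT) at line 1 column 1 but found a string instead. The following is a minimal Python sketch. It uses Python's `json` module rather than the Java parser that produced the alert, so its error messages will differ. The function names are illustrative.

```python
import json
from pathlib import Path


def check_state_file_text(text: str) -> str:
    """Return "ok" if text is a JSON object, otherwise a short reason."""
    try:
        value = json.loads(text)
    except json.JSONDecodeError as exc:
        return f"not valid JSON: {exc.msg} at line {exc.lineno} column {exc.colno}"
    if not isinstance(value, dict):
        # Same situation as the alert: valid token, but not an object.
        return f"top-level value is {type(value).__name__}, expected an object"
    return "ok"


def check_state_file(path: str) -> str:
    # Decode leniently: a corrupted file may not even be valid UTF-8.
    text = Path(path).read_bytes().decode("utf-8", errors="replace")
    return check_state_file_text(text)


print(check_state_file_text('{"cap": {}}'))   # ok
print(check_state_file_text('"corrupted"'))   # top-level value is str, expected an object
print(check_state_file_text("\x00\x00garbage"))
```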

This issue usually has one of two causes:

  1. The Pi is not powered properly. Verify that the wiring is undamaged and routed properly before diagnosing further.
  2. The SD card is not seated properly or is otherwise corrupt. Verify that the SD card is seated properly before diagnosing the SD card further.

Self Monitoring Possibly Stuck

Large delta (1800 s) since last sample, self monitoring possibly stuck

This alert can produce false positives. Its general purpose is to prompt a careful examination of the robot's software runtime state. Possible causes:

  • The robot may have been disconnected from the internet or cell network and has just synchronized its time. The alert can be ignored once an operator verifies the robot state.
  • The CAP runtime is no longer updating state in critical services such as the self-monitoring task service. A software team member should check the statistics (/stats) JSON of the self-monitoring service and see whether the maintenance stat version is incrementing. If it is not, they should check the maintenance stat counter in other key services, such as pose-estimators/default and navigators/default, and determine whether their periodic updates happened around the same time. Capture the runtime logs along with syslog and kernel logs, and add them to an issue tracking "stuck" runtime occurrences.
  • The host-specific (override) JSON file is corrupt. SSH into the robot, then:
      1. Delete the file dCentralizedSystems/cap-config/config/mgmt-self-monitoring-tasks-default.json
      2. Restart the CAP runtime.
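The maintenance-version check in the second cause above can be scripted. The following is a minimal Python sketch built on two assumptions. First, it assumes the self-monitoring service's statistics are reachable at the service path plus /stats on port 8000. That URL is inferred from the UI URL in Code Branch Mismatch and is not confirmed. Second, it assumes the maintenance stat version is a number stored under a key you supply. Both the URL and the key are hypothetical.

```python
import json
import time
import urllib.request

# HYPOTHETICAL endpoint, inferred from the dashboard URL; verify before use.
STATS_URL = "http://{ip}:8000/mgmt/self-monitoring-tasks/default/stats"


def fetch_stats(ip: str) -> dict:
    """Fetch the stats JSON of the self-monitoring service."""
    with urllib.request.urlopen(STATS_URL.format(ip=ip), timeout=10) as resp:
        return json.load(resp)


def version_advanced(before: dict, after: dict, key: str) -> bool:
    """True if the maintenance stat version under `key` increased.

    `key` is HYPOTHETICAL: use whichever field holds the maintenance version.
    """
    return after[key] > before[key]


def looks_stuck(ip: str, key: str, wait_s: float = 60.0) -> bool:
    """Sample the stats twice, wait_s apart, and report whether nothing moved."""
    first = fetch_stats(ip)
    time.sleep(wait_s)
    second = fetch_stats(ip)
    return not version_advanced(first, second, key)
```

The same comparison can then be repeated against pose-estimators/default and navigators/default to see whether those services stalled at the same time.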

Slow Update Rate

A slow update rate occurs when a plan has too many items or paths that are too long. In this case, the topological planner is taking too long to localize. To verify that the topological planner is the issue, search the logs for: java.util.concurrent.CancellationException: queue limit exceeded (operationQueue for update on /navigation/topological-path-planners/default)]
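For large log files, the search can be scripted. The following is a minimal Python sketch that matches the exception text quoted above as a plain substring, much like `grep -F`. The trailing bracket is left out, so the match does not depend on how the surrounding log line is formatted. The log file location is not assumed, and the function names are illustrative.

```python
from pathlib import Path
from typing import Iterable, List

# Exception text quoted above for a saturated topological planner queue.
PLANNER_QUEUE_SIGNATURE = (
    "java.util.concurrent.CancellationException: queue limit exceeded "
    "(operationQueue for update on /navigation/topological-path-planners/default)"
)


def find_planner_queue_errors(lines: Iterable[str]) -> List[str]:
    """Return the log lines that contain the planner queue-limit exception."""
    return [line.rstrip("\n") for line in lines if PLANNER_QUEUE_SIGNATURE in line]


def scan_log_file(path: str) -> List[str]:
    # errors="replace" keeps scanning even if the log contains invalid UTF-8.
    with Path(path).open(encoding="utf-8", errors="replace") as f:
        return find_planner_queue_errors(f)
```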

  • For smoother planning on the robot's end, keep the total path length under 20 km. Every path in use is split into 1 m samples in both directions (e.g., a 20 m path has 40 samples), so at kilometer scale the number of samples adds up very quickly: 20 km of paths already yields 40,000 samples.
  • To alleviate the issue in larger plans, split the large plan into multiple smaller plans.
  • Follow Plan Design: Best Practices to prevent this issue.
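The sample arithmetic in the first bullet can serve as a quick budget check for a plan. The following is a minimal Python sketch. How the planner handles partial meters is not stated above, so rounding down is an assumption. The function names are illustrative.

```python
from typing import Iterable

SAMPLE_SPACING_M = 1.0              # paths are split into 1 m samples...
DIRECTIONS = 2                      # ...in both directions
RECOMMENDED_MAX_TOTAL_M = 20_000.0  # keep total path length under 20 km


def sample_count(path_lengths_m: Iterable[float]) -> int:
    """Approximate planner sample count (partial meters rounded down, an assumption)."""
    return sum(int(length // SAMPLE_SPACING_M) * DIRECTIONS for length in path_lengths_m)


def within_recommendation(path_lengths_m: Iterable[float]) -> bool:
    """True if the combined path length stays under the 20 km guideline."""
    return sum(path_lengths_m) < RECOMMENDED_MAX_TOTAL_M


print(sample_count([20]))                    # 40
print(sample_count([20_000]))                # 40000
print(within_recommendation([500, 1500]))    # True
```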

Hardware Issues

300A Motor Controller (300AMC)

Hardware/Firmware Fault

AXLE_AUX_MOTOR HW fault: R: M:0 T:0 P:0 V:0 T1:076.70C T2:063.70C VB:52.33V M+:10.90V M-:10.55V IF:-000.68A IR:-000.49A W:1500 MF:1 FS:4719104(1544) ES:0 I:0

This email is sent when a motor controller (AUX1, LEFT_MOTOR, or RIGHT_MOTOR) experiences a fault, and it includes the associated "R: 0" report string. The fault is usually caused by over-current on the motor driver and should be cleared automatically by CAP.
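When triaging many fault emails, it can help to split the report into its fields. The following is a minimal Python sketch. It assumes every token in the report has the KEY:VALUE shape seen in the samples on this page. The field meanings are not documented here, so values are kept as raw strings. The function name is hypothetical.

```python
import re
from typing import Dict, Tuple

# KEY:VALUE tokens such as "T1:076.70C" or "M+:10.90V"; "R:" has an empty value.
FIELD_RE = re.compile(r"([A-Za-z0-9+\-]+):(\S*)")


def parse_hw_fault(alert: str) -> Tuple[str, Dict[str, str]]:
    """Split a 'HW fault' alert into the motor name and its raw report fields."""
    motor, _, report = alert.partition(" HW fault: ")
    return motor, dict(FIELD_RE.findall(report))


alert = ("AXLE_AUX_MOTOR HW fault: R: M:0 T:0 P:0 V:0 T1:076.70C T2:063.70C "
         "VB:52.33V M+:10.90V M-:10.55V IF:-000.68A IR:-000.49A W:1500 MF:1 "
         "FS:4719104(1544) ES:0 I:0")
motor, fields = parse_hw_fault(alert)
print(motor, fields["VB"], fields["I"])  # AXLE_AUX_MOTOR 52.33V 0
```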

If the alert recurs, review the Mowing Deck Stalled Motor Guide.

Over Current while Idle

LEFT_MOTOR HW fault: R: M:0 T:0 P:0 V:0 T1:026.75C T2:026.40C VB:56.16V M+:00.00V M-:28.05V IF:-000.29A IR:-000.19A W:900 MF:1 FS:4718608(1552) ES:0 I:1

This is a hardware fault alert whose report contains "I:1", which means current was detected while the robot was supposed to be idle. See details here: LCR Drivetrain Troubleshooting: Motor Controller Faults.
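To separate idle over-current faults from other hardware faults in an inbox, a small check for the "I:1" token is enough. The following is a minimal Python sketch. It assumes the flag always appears as a standalone "I:<digits>" token, as in the samples on this page. The function name is hypothetical.

```python
import re

# Standalone "I:<digits>" token; the lookarounds require whitespace or the
# string edges around it, so tokens like "IF:" or "MF:1" never match.
IDLE_FLAG_RE = re.compile(r"(?<!\S)I:(\d+)(?!\S)")


def is_idle_overcurrent(alert: str) -> bool:
    """True if a HW fault report carries I:1 (current while supposed to be idle)."""
    match = IDLE_FLAG_RE.search(alert)
    return match is not None and match.group(1) == "1"


idle_fault = ("LEFT_MOTOR HW fault: R: M:0 T:0 P:0 V:0 T1:026.75C T2:026.40C "
              "VB:56.16V M+:00.00V M-:28.05V IF:-000.29A IR:-000.19A W:900 MF:1 "
              "FS:4718608(1552) ES:0 I:1")
print(is_idle_overcurrent(idle_fault))  # True
```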

Over Current Detected

Over current detected on AXLE_AUX_MOTOR: 200.945750 amps over 60 seconds

The motor is likely stalling due to a clog or a failed motor. Review the Mowing Deck Stalled Motor Guide.

Motor Disconnected

LEFT|RIGHT|AXLE_AUX_MOTOR disconnected (check panel switch, wiring, motor brushes)

This notification is sent when there is voltage across the motor controller (AUX, LEFT_MOTOR, or RIGHT_MOTOR) leads but no current. This is usually caused by an unplugged motor, the motor disconnect switch being turned off, or broken wiring inside the motor.
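The condition described above, with voltage across the leads but no current, can be expressed as a simple check. The following is a minimal Python sketch of that logic. The thresholds are hypothetical placeholders, since the limits CAP actually uses are not documented on this page.

```python
from dataclasses import dataclass

# HYPOTHETICAL thresholds for illustration only.
MIN_LEAD_VOLTAGE_V = 5.0   # treat anything above this as "voltage applied"
MAX_NO_CURRENT_A = 0.5     # treat anything below this as "no current"


@dataclass
class LeadReading:
    voltage_v: float  # voltage across the motor controller leads
    current_a: float  # measured motor current


def looks_disconnected(reading: LeadReading) -> bool:
    """Voltage is applied across the leads but (almost) no current flows."""
    return (abs(reading.voltage_v) >= MIN_LEAD_VOLTAGE_V
            and abs(reading.current_a) <= MAX_NO_CURRENT_A)
```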

For a LEFT_MOTOR or RIGHT_MOTOR disconnect, review the Drivetrain Disconnected Motor Guide.

For an AXLE_AUX_MOTOR disconnect, review the Mowing Deck Disconnected Motor Guide.

This alert can also be caused by undervoltage on the Pi. Details here: LCR Pi Troubleshooting.

Toggling Serial Connection

(1.000000) LEFT_MOTOR: No response for 5 seconds, toggling serial connection

This email is sent when the robot stops receiving the "R: 0" report strings from a motor controller (AUX1, LEFT_MOTOR, or RIGHT_MOTOR). The serial connection is automatically closed and reopened after this alert is sent, so it should recover on its own. Note that this can be a symptom of a Realsense camera causing serious USB faults; see Camera Troubleshooting for details and more detailed diagnostics.
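The recovery behavior described above, reconnecting after 5 seconds without a report, follows a common watchdog pattern. The following is a minimal Python sketch of that pattern, not CAP's implementation. The class name is hypothetical, and the actual closing and reopening of the serial port is left to the caller.

```python
import time
from typing import Callable

NO_RESPONSE_TIMEOUT_S = 5.0  # from the alert text: "No response for 5 seconds"


class ReportWatchdog:
    """Tracks when the last report arrived and says when to reopen the port."""

    def __init__(self, timeout_s: float = NO_RESPONSE_TIMEOUT_S,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_report = clock()

    def on_report(self) -> None:
        """Call whenever an "R:" report string arrives."""
        self.last_report = self.clock()

    def check(self) -> bool:
        """Return True (and restart the timer) once no report arrived for timeout_s."""
        now = self.clock()
        if now - self.last_report >= self.timeout_s:
            self.last_report = now  # avoid toggling again on every poll
            return True
        return False
```

A monotonic clock is used so that a system time resync, such as the one mentioned under Self Monitoring Possibly Stuck, cannot trigger a spurious reconnect.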

Device No Longer Exists

LEFT_MOTOR: device address opened previously but no longer exists, check for USB disconnects or chassis shorts

Possible causes:

  1. A motor controller being physically unplugged during repairs, or a flaky USB cable.
  2. A motor controller reset that is done every ~50 days (this will be designed out in the future).
  3. Electronics instability or a chassis short.

High Temperature

High temperature sensed on LEFT_MOTOR: 696.250000 C

  • The high-temperature alert is only sent at very high temperatures, and firmware should stop the motor controller (MC) before it gets hot enough to trigger this alert.
  • This alert therefore most likely indicates a broken temperature sensor.
  • In rare cases this could be due to a serious problem such as a fire in the electronics tray

Realsense and Generic Cameras

Capture Process Restart

Restarting (100.000000) capture process for left, counter now: 2626, last sample at 0, process live: true

Note that this can be a symptom of a Realsense camera causing serious USB faults; see Camera Troubleshooting for details and more detailed diagnostics.

Nonzero Exit Code

Capture process returned nonzero exit code (<EXIT_CODE>)

This alert indicates a camera-capture issue, which could be caused by an unplugged camera, corrosion, or a short to chassis. See:
  • LCR Realsense Camera Troubleshooting
  • LCR USB Side Camera Troubleshooting

Software Issues

Failure initializing GPS

Failure initializing GPS device:java.lang.IllegalStateException: Device query timeout

The GPS can be reset by following the GPS Accuracy Troubleshooting guide.
