How to get FW syndrome when using DEVX - openucx/ucx GitHub Wiki

When you see a UCX error such as `UCX ERROR mlx5dv_xxx(...) failed`, in many cases it means the error was reported by the firmware (FW). Every such error carries a syndrome code that precisely identifies its cause.

Currently, the only way to retrieve this syndrome code is to enable dynamic debug in the mlx5 driver. Here is how to do this:

  1. Enable dynamic debug: `echo 'func mlx5_cmd_check +p' | sudo tee /sys/kernel/debug/dynamic_debug/control`
  2. Clear the kernel log: `sudo dmesg -C`
  3. Run the reproducer that triggers the error.
  4. Capture the kernel log: `dmesg > dmesg.log`. It should contain the syndrome code of the failed DEVX command.
  5. Disable dynamic debug: `echo 'func mlx5_cmd_check -p' | sudo tee /sys/kernel/debug/dynamic_debug/control`
  6. Upload dmesg.log to the GitHub issue for further analysis.
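The steps above can be wrapped in a small script so that dynamic debug is always turned off again, even if the reproducer crashes. This is a minimal sketch, not an official UCX tool. It assumes bash, sudo rights, debugfs mounted at /sys/kernel/debug, and that the relevant mlx5 kernel messages contain the word "syndrome". The script name `fw_syndrome.sh`, the `DMESG_LOG` variable and the helper function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (fw_syndrome.sh): automates the manual steps for
# capturing the FW syndrome of a failed DEVX command.
# Assumptions: bash, sudo rights, debugfs mounted at /sys/kernel/debug.
set -euo pipefail

DYNDBG_CONTROL=/sys/kernel/debug/dynamic_debug/control

# Toggle the dynamic debug print in mlx5_cmd_check ("+p" enables, "-p" disables).
set_mlx5_cmd_debug() {
    echo "func mlx5_cmd_check $1" | sudo tee "$DYNDBG_CONTROL" > /dev/null
}

# Filter stdin down to lines mentioning a syndrome code.
# Assumption: the mlx5 message contains the word "syndrome".
# "|| true" keeps set -e from aborting when nothing matches.
extract_syndrome_lines() {
    grep -i 'syndrome' || true
}

capture_fw_syndrome() {
    # DMESG_LOG is a hypothetical override for the output file name.
    local log=${DMESG_LOG:-dmesg.log}

    set_mlx5_cmd_debug '+p'
    # Disable dynamic debug again when the script exits, whatever happens.
    trap "set_mlx5_cmd_debug '-p'" EXIT

    sudo dmesg -C                      # clear the kernel log
    "$@" || echo "reproducer exited with status $?" >&2
    sudo dmesg > "$log"                # sudo in case dmesg is restricted

    echo "Full kernel log saved to $log; syndrome lines:"
    extract_syndrome_lines < "$log"
}

# Run only when a reproducer command is given.
if [[ $# -gt 0 ]]; then
    capture_fw_syndrome "$@"
fi
```

Usage: `./fw_syndrome.sh <reproducer command and its arguments>`. The script prints any matching lines for a quick look, but the complete dmesg.log is still what should be uploaded to the GitHub issue, since the filter is only a heuristic.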
