How to get FW syndrome when using DEVX - openucx/ucx GitHub Wiki
When you see a UCX error such as "UCX ERROR mlx5dv_xxx(...) failed",
in many cases it means that the error was reported by the firmware (FW).
Every such error carries a syndrome code that identifies the precise cause of the error.
Currently, the only way to retrieve this syndrome code is to enable dynamic debug in the mlx5 driver. Here is how to do this:
- Enable dynamic debug:
  echo 'func mlx5_cmd_check +p' | sudo tee /sys/kernel/debug/dynamic_debug/control
- Clear dmesg:
  sudo dmesg -C
- Run the reproducer that yields the error.
- Capture dmesg; it should contain the syndrome code of the failed DEVX command:
  dmesg > dmesg.log
- Disable dynamic debug:
  echo 'func mlx5_cmd_check -p' | sudo tee /sys/kernel/debug/dynamic_debug/control
- Upload dmesg.log to the GitHub issue for further analysis.
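The steps above can be wrapped in a few shell functions so that dynamic debug is always switched off again, even when the reproducer fails. This is a minimal sketch, assuming sudo access and debugfs mounted at /sys/kernel/debug. The function names, the DYNDBG_CONTROL and LOG_FILE variables, and the case-insensitive grep for "syndrome" are hypothetical helpers, not part of UCX or the mlx5 driver; the exact wording of the kernel message depends on the kernel and driver version.

```shell
#!/bin/sh
# Hypothetical helpers for collecting mlx5 FW syndromes of failed DEVX commands.
# Source this file, then call: collect_fw_syndrome <reproducer> [args...]

# Dynamic debug control file (hypothetical override variable, default path from the steps).
DYNDBG_CONTROL=${DYNDBG_CONTROL:-/sys/kernel/debug/dynamic_debug/control}

# Toggle debug prints of mlx5_cmd_check: "+p" enables them, "-p" disables them.
set_mlx5_cmd_check_debug() {
    echo "func mlx5_cmd_check $1" | sudo tee "$DYNDBG_CONTROL" > /dev/null
}

# Print log lines mentioning a syndrome (case-insensitive; assumed message wording).
extract_syndromes() {
    grep -i 'syndrome' "$1"
}

# Enable debug, clear dmesg, run the reproducer, capture dmesg, disable debug.
# Returns the reproducer's exit status (2 on usage error).
collect_fw_syndrome() {
    if [ "$#" -eq 0 ]; then
        echo "usage: collect_fw_syndrome REPRODUCER [ARGS...]" >&2
        return 2
    fi
    fw_log=${LOG_FILE:-dmesg.log}
    set_mlx5_cmd_check_debug +p || return 1
    sudo dmesg -C              # clear the kernel ring buffer
    "$@"                       # run the reproducer that yields the error
    fw_status=$?
    # sudo in case kernel.dmesg_restrict prevents unprivileged reads
    sudo dmesg > "$fw_log"
    set_mlx5_cmd_check_debug -p
    extract_syndromes "$fw_log"
    return "$fw_status"
}
```

Usage, with ./my_reproducer standing in for your own (hypothetical) reproducer: `. ./fw_syndrome.sh && collect_fw_syndrome ./my_reproducer`. The full log is still written to dmesg.log for attaching to the GitHub issue; the function only prints the matching lines as a quick check.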