This page lists the ML-related instructions in the Arm architecture.
ML-relevant instructions in the Arm architecture
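- Most of these instructions have two variants: (1) vector and (2) indexed (by-element). The matrix multiply-accumulate instructions (SMMLA, UMMLA, USMMLA, FMMLA, BFMMLA) have only the vector form, and SUDOT has only the indexed form.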
4-way INT8 dot product (NEON)
| Instruction | Description |
|---|---|
| sdot | [1x1 INT32] += [1x4 INT8] * [4x1 INT8] |
| udot | [1x1 INT32] += [1x4 UINT8] * [4x1 UINT8] |
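The sketch below is a minimal scalar model of what SDOT/UDOT compute on one 128-bit NEON register. It also shows the vector and indexed variants mentioned above. It assumes the accumulator is viewed as four INT32 lanes and each source as sixteen 8-bit lanes, so every INT32 lane is one [1x1 INT32] += [1x4] * [4x1] row from the table. The names `dot4_vector` and `dot4_indexed` are hypothetical, and INT32 wrap-around is not modeled.

```python
# Scalar reference model (not the instruction itself) of SDOT/UDOT on one
# 128-bit NEON register: 4 INT32 accumulator lanes, 16 byte lanes per source.
# Pass bytes as -128..127 for SDOT or 0..255 for UDOT; the arithmetic is the same.
# INT32 wrap-around of the accumulator is not modeled.

def dot4_vector(acc, a, b):
    """Vector form: lane i accumulates the dot product of a[4i:4i+4] and b[4i:4i+4]."""
    if len(acc) != 4 or len(a) != 16 or len(b) != 16:
        raise ValueError("expected 4 accumulator lanes and 16 bytes per source")
    return [acc[i] + sum(a[4 * i + k] * b[4 * i + k] for k in range(4))
            for i in range(4)]


def dot4_indexed(acc, a, b, index):
    """Indexed (by-element) form: every lane uses the same 4-byte group of b."""
    if not 0 <= index < 4:
        raise ValueError("index selects one of the four 32-bit groups of b")
    group = b[4 * index:4 * index + 4]
    return [acc[i] + sum(a[4 * i + k] * group[k] for k in range(4))
            for i in range(4)]


print(dot4_vector([0] * 4, list(range(16)), [1] * 16))                      # [6, 22, 38, 54]
print(dot4_indexed([0] * 4, list(range(16)), [1] * 4 + [2] * 4 + [0] * 8, 1))  # [12, 44, 76, 108]
```

In the indexed call, index 1 selects the group `[2, 2, 2, 2]`, so each lane gets twice the sum of its four bytes from `a`.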
8-way INT8 dot product (NEON)
| Instruction | Description |
|---|---|
| smmla | [2x2 INT32] += [2x8 INT8] * [8x2 INT8] |
| ummla | [2x2 INT32] += [2x8 UINT8] * [8x2 UINT8] |
| usmmla | [2x2 INT32] += [2x8 UINT8] * [8x2 INT8] |
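The sketch below is a minimal scalar model of the three 8-way instructions on one 128-bit register. It shows that each of the four INT32 results is an 8-way dot product. The byte layout is an assumption based on our reading of the Arm pseudocode: the left matrix is row-major, and the right 8x2 matrix is stored column by column. Please verify this layout against the Arm ARM before relying on it. `mmla_2x8x2` and `as_int8` are hypothetical names. The same raw bytes give different results depending on the signedness each instruction assumes.

```python
# Scalar reference model of SMMLA / UMMLA / USMMLA on one 128-bit register.
# Layout assumption (our reading of the Arm pseudocode, verify before relying on it):
#   a   : 16 bytes, the 2x8 left matrix, row i = a[8i:8i+8]
#   b   : 16 bytes, the 8x2 right matrix stored column by column, column j = b[8j:8j+8]
#   acc : 4 INT32 values, the 2x2 result, row-major
# The signedness variant is chosen by how the caller interprets the bytes.
# INT32 wrap-around is not modeled.

def mmla_2x8x2(acc, a, b):
    if len(acc) != 4 or len(a) != 16 or len(b) != 16:
        raise ValueError("expected 4 INT32 values and 16 bytes per source")
    out = list(acc)
    for i in range(2):        # result row
        for j in range(2):    # result column
            # Each result element is one 8-way dot product.
            out[2 * i + j] += sum(a[8 * i + k] * b[8 * j + k] for k in range(8))
    return out


def as_int8(byte):
    """Reinterpret an unsigned byte (0..255) as a signed INT8 value."""
    return byte - 256 if byte >= 128 else byte


raw = [0xFF] * 16
signed = [as_int8(x) for x in raw]
print(mmla_2x8x2([0] * 4, raw, raw))        # UMMLA view:  [520200, 520200, 520200, 520200]
print(mmla_2x8x2([0] * 4, signed, signed))  # SMMLA view:  [8, 8, 8, 8]
print(mmla_2x8x2([0] * 4, raw, signed))     # USMMLA view: [-2040, -2040, -2040, -2040]
```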
4-way mixed-sign INT8 dot product (NEON)
| Instruction | Description |
|---|---|
| usdot | [1x1 INT32] += [1x4 UINT8] * [4x1 INT8] |
| sudot | [1x1 INT32] += [1x4 INT8] * [4x1 UINT8] |
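The sketch below is a minimal scalar model of the mixed-sign dot products on one 128-bit NEON register, again assuming four INT32 lanes and sixteen byte lanes per source. Range checks stand in for the signed/unsigned interpretation of each operand. The function names are hypothetical, and INT32 wrap-around is not modeled.

```python
# Scalar reference model of the mixed-sign 4-way dot products on one 128-bit
# NEON register (4 INT32 lanes, 16 byte lanes per source).
# Range checks stand in for the signed/unsigned interpretation of each operand.
# INT32 wrap-around is not modeled.

def _check_range(values, lo, hi, name):
    if len(values) != 16 or not all(lo <= v <= hi for v in values):
        raise ValueError(f"{name}: expected 16 values in [{lo}, {hi}]")


def usdot_vector(acc, u, s):
    """USDOT (vector): lane i += dot(u[4i:4i+4], s[4i:4i+4]), u unsigned, s signed."""
    _check_range(u, 0, 255, "u")
    _check_range(s, -128, 127, "s")
    return [acc[i] + sum(u[4 * i + k] * s[4 * i + k] for k in range(4))
            for i in range(4)]


def sudot_indexed(acc, s, u, index):
    """SUDOT (by element): signed s times one broadcast 4-byte group of unsigned u."""
    _check_range(s, -128, 127, "s")
    _check_range(u, 0, 255, "u")
    if not 0 <= index < 4:
        raise ValueError("index selects one of the four 32-bit groups of u")
    group = u[4 * index:4 * index + 4]
    return [acc[i] + sum(s[4 * i + k] * group[k] for k in range(4))
            for i in range(4)]


print(usdot_vector([0] * 4, list(range(16)), [-2] * 16))      # [-12, -44, -76, -108]
print(sudot_indexed([1] * 4, [-2] * 16, list(range(16)), 3))  # [-107, -107, -107, -107]
```

This model suggests why SUDOT exists only in indexed form. A vector-form SUDOT would equal `usdot_vector` with its operands swapped. The by-element form is different: there the broadcast operand is the unsigned one, a case that USDOT (by element) does not cover.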
2-way FP32 dot product (SVE only)
| Instruction | Description |
|---|---|
| FMMLA | [2x2 FP32] += [2x2 FP32] * [2x2 FP32] |
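SVE vectors are longer than 128 bits in general. The sketch below therefore models FP32 FMMLA for a given vector length, with one 2x2 tile per 128-bit segment (4 x 32 bits). It assumes the same layout as the integer MMLA case: the first operand and the accumulator are row-major, and the second operand's tile is stored column by column. This is our reading of the Arm pseudocode. `fmmla_f32` and `vl_bits` are hypothetical names. Python floats are doubles, so FP32 rounding is not reproduced.

```python
# Scalar model of SVE FMMLA (FP32) for a vector length of vl_bits.
# Assumptions: one 2x2 FP32 tile per 128-bit segment; first operand and
# accumulator row-major, second operand's tile stored column by column.
# Python floats are doubles, so hardware FP32 rounding is not modeled.

def fmmla_f32(acc, a, b, vl_bits):
    if vl_bits < 128 or vl_bits % 128 != 0:
        raise ValueError("SVE vector length must be a multiple of 128 bits")
    n = vl_bits // 32                     # FP32 elements per vector
    if not len(acc) == len(a) == len(b) == n:
        raise ValueError(f"expected {n} FP32 elements per operand")
    out = list(acc)
    for seg in range(0, n, 4):            # one 2x2 tile per 128-bit segment
        for i in range(2):
            for j in range(2):
                out[seg + 2 * i + j] += sum(
                    a[seg + 2 * i + k] * b[seg + 2 * j + k] for k in range(2))
    return out


print(fmmla_f32([0.0] * 8,
                [1.0, 2.0, 3.0, 4.0, 1.0, 0.0, 0.0, 1.0],
                [5.0, 6.0, 7.0, 8.0, 2.0, 3.0, 4.0, 5.0],
                256))  # [17.0, 23.0, 39.0, 53.0, 2.0, 4.0, 3.0, 5.0]
```

The second segment multiplies by an identity tile. Its result is the stored tile of `b`, transposed. This gives a quick way to check the layout assumption against real hardware.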
2-way FP64 dot product (SVE only, minimum vector length 256 bits)
| Instruction | Description |
|---|---|
| FMMLA | [2x2 FP64] += [2x2 FP64] * [2x2 FP64] |
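The 256-bit minimum follows from the tile size. A 2x2 tile holds four elements, so an FP32 tile needs 4 x 32 = 128 bits and an FP64 tile needs 4 x 64 = 256 bits. The hypothetical helper below computes how many tiles fit in one vector. It deliberately rejects vector lengths that are not a whole number of tiles, because it does not try to model what hardware does in that case.

```python
# Hypothetical helper: how many 2x2 FMMLA tiles fit in one SVE vector.
# A 2x2 tile holds 4 elements, so it occupies 4 * element_bits:
#   FP32: 4 * 32 = 128 bits per tile, FP64: 4 * 64 = 256 bits per tile.
# This sketch only accepts vector lengths that are a whole number of tiles;
# it does not model what hardware does with other lengths.

def fmmla_tiles_per_vector(vl_bits, elem_bits):
    tile_bits = 4 * elem_bits
    if vl_bits < tile_bits:
        raise ValueError(f"VL of {vl_bits} bits cannot hold one {tile_bits}-bit tile")
    if vl_bits % tile_bits != 0:
        raise ValueError("sketch only handles VL that is a multiple of the tile size")
    return vl_bits // tile_bits


for vl in (128, 256, 512):
    for elem in (32, 64):
        try:
            print(vl, elem, fmmla_tiles_per_vector(vl, elem))
        except ValueError as err:
            print(vl, elem, "unsupported:", err)
```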
BF16 dot product and matrix multiply
| Instruction | Description |
|---|---|
| BFDOT | [1x1 FP32] += [1x2 BF16] * [2x1 BF16] |
| BFMMLA | [2x2 FP32] += [2x4 BF16] * [4x2 BF16] |
More details are in Bfloat16-Arm.
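The sketch below is a minimal scalar model of BFDOT and BFMMLA on one 128-bit register, with eight BF16 values per source and four FP32 accumulator lanes. It makes several assumptions. The helper `f32_to_bf16_bits` truncates, which is not how BFCVT rounds. Products and sums use Python doubles, so hardware rounding is not reproduced. BFMMLA uses the same column-by-column layout for the second operand as the integer MMLA case, which is our reading of the Arm pseudocode. All function names are hypothetical. The model illustrates operand shapes, not bit-exact results.

```python
import struct

# Scalar reference model of BFDOT and BFMMLA on one 128-bit register:
# 8 BF16 values per source, 4 FP32 values in the accumulator.
# Assumptions: FP32 -> BF16 here simply truncates (keeps the top 16 bits),
# which is not how BFCVT rounds; arithmetic is done in Python floats, so
# hardware rounding is not reproduced.


def f32_to_bf16_bits(x):
    """Truncate an FP32 value to BF16 by keeping its upper 16 bits."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16


def bf16_bits_to_f32(h):
    """Widen BF16 bits to FP32 exactly by appending 16 zero bits."""
    (x,) = struct.unpack("<f", struct.pack("<I", (h & 0xFFFF) << 16))
    return x


def bfdot(acc, a, b):
    """BFDOT: FP32 lane i += a[2i]*b[2i] + a[2i+1]*b[2i+1]."""
    fa = [bf16_bits_to_f32(h) for h in a]
    fb = [bf16_bits_to_f32(h) for h in b]
    return [acc[i] + sum(fa[2 * i + k] * fb[2 * i + k] for k in range(2))
            for i in range(4)]


def bfmmla(acc, a, b):
    """BFMMLA: [2x2 FP32] += [2x4 BF16] * [4x2 BF16], b stored column by column."""
    fa = [bf16_bits_to_f32(h) for h in a]
    fb = [bf16_bits_to_f32(h) for h in b]
    out = list(acc)
    for i in range(2):
        for j in range(2):
            out[2 * i + j] += sum(fa[4 * i + k] * fb[4 * j + k] for k in range(4))
    return out


a = [f32_to_bf16_bits(v) for v in [1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0]]
b = [f32_to_bf16_bits(v) for v in [1.0, 2.0, 3.0, 4.0, 0.5, 0.5, 0.5, 0.5]]
print(bfdot([0.0] * 4, a, b))   # [3.0, 7.0, 2.0, 2.0]
print(bfmmla([0.0] * 4, a, b))  # [10.0, 2.0, 20.0, 4.0]
```

The test values are exactly representable in BF16, so truncation does not affect them. A value such as 1.0 + 2**-10 falls back to 1.0, because BF16 keeps only 7 explicit mantissa bits.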