This page lists the ML-related instructions.
ML-relevant instructions in the Arm architecture
- Most instructions have two variants: (1) vector and (2) indexed (by-element).
| Instruction | Description |
|---|---|
| sdot | [1x1 INT32] += [1x4 INT8] * [4x1 INT8] |
| udot | [1x1 INT32] += [1x4 UINT8] * [4x1 UINT8] |
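
The table entries and the vector versus indexed (by-element) distinction can be illustrated with a scalar model. This is a minimal sketch in portable C, not the architectural pseudocode: the function names (`sdot_lane_ref`, `udot_lane_ref`, `sdot_vector_ref`, `sdot_indexed_ref`) are hypothetical, a 128-bit register with four 32-bit lanes is assumed, and the signed model assumes the running sum fits in `int32_t`.

```c
#include <assert.h>
#include <stdint.h>

/* One 32-bit lane of sdot: acc += a[0]*b[0] + ... + a[3]*b[3].
   Assumes the sum fits in int32_t (no wraparound modelling). */
int32_t sdot_lane_ref(int32_t acc, const int8_t a[4], const int8_t b[4]) {
    for (int k = 0; k < 4; ++k)
        acc += (int32_t)a[k] * (int32_t)b[k];
    return acc;
}

/* Unsigned counterpart (udot); unsigned arithmetic wraps modulo 2^32. */
uint32_t udot_lane_ref(uint32_t acc, const uint8_t a[4], const uint8_t b[4]) {
    for (int k = 0; k < 4; ++k)
        acc += (uint32_t)a[k] * (uint32_t)b[k];
    return acc;
}

/* Vector form: lane i uses the i-th 4-byte group of both a and b. */
void sdot_vector_ref(int32_t acc[4], const int8_t a[16], const int8_t b[16]) {
    for (int i = 0; i < 4; ++i)
        acc[i] = sdot_lane_ref(acc[i], &a[4 * i], &b[4 * i]);
}

/* Indexed (by-element) form: every lane reuses the same 4-byte group of b,
   selected by index. */
void sdot_indexed_ref(int32_t acc[4], const int8_t a[16], const int8_t b[16],
                      int index) {
    assert(index >= 0 && index < 4);
    for (int i = 0; i < 4; ++i)
        acc[i] = sdot_lane_ref(acc[i], &a[4 * i], &b[4 * index]);
}
```

In this model the indexed form differs from the vector form only in which group of `b` each lane reads, which is what makes it convenient when one operand is broadcast across a row of outputs.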
8-way INT8 dot product (NEON)
| Instruction | Description |
|---|---|
| smmla | [2x2 INT32] += [2x8 INT8] * [8x2 INT8] |
| ummla | [2x2 INT32] += [2x8 UINT8] * [8x2 UINT8] |
| usmmla | [2x2 INT32] += [2x8 UINT8] * [8x2 INT8] |
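
The matrix shapes above can be modelled in scalar C as follows. This is a sketch under stated assumptions: the names `smmla_ref` and `usmmla_ref` are hypothetical, the left matrix and the accumulator are row-major, and the 8x2 right-hand matrix is assumed to be supplied column by column (that is, as two rows of eight values, called `bt` here). Each output element is then an 8-way dot product.

```c
#include <stdint.h>

/* c (2x2, row-major) += a (2x8, row-major) * B (8x2),
   where bt holds B column by column: bt[8*j + k] == B[k][j]. */
void smmla_ref(int32_t c[4], const int8_t a[16], const int8_t bt[16]) {
    for (int i = 0; i < 2; ++i) {
        for (int j = 0; j < 2; ++j) {
            int32_t sum = 0;
            for (int k = 0; k < 8; ++k)
                sum += (int32_t)a[8 * i + k] * (int32_t)bt[8 * j + k];
            c[2 * i + j] += sum;
        }
    }
}

/* Mixed-sign variant: unsigned left matrix, signed right matrix. */
void usmmla_ref(int32_t c[4], const uint8_t a[16], const int8_t bt[16]) {
    for (int i = 0; i < 2; ++i) {
        for (int j = 0; j < 2; ++j) {
            int32_t sum = 0;
            for (int k = 0; k < 8; ++k)
                sum += (int32_t)a[8 * i + k] * (int32_t)bt[8 * j + k];
            c[2 * i + j] += sum;
        }
    }
}
```

An unsigned-by-unsigned model for ummla would follow the same pattern with `uint8_t` inputs on both sides.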
4-way mixed-sign INT8 dot product (NEON)
| Instruction | Description |
|---|---|
| usdot | [1x1 INT32] += [1x4 UINT8] * [4x1 INT8] |
| sudot | [1x1 INT32] += [1x4 INT8] * [4x1 UINT8] |
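
The difference between the two mixed-sign forms is only which operand is treated as unsigned. A minimal scalar sketch, with hypothetical function names and modelling a single 32-bit lane:

```c
#include <stdint.h>

/* One lane of usdot: unsigned bytes in u, signed bytes in s. */
int32_t usdot_lane_ref(int32_t acc, const uint8_t u[4], const int8_t s[4]) {
    for (int k = 0; k < 4; ++k)
        acc += (int32_t)u[k] * (int32_t)s[k];
    return acc;
}

/* One lane of sudot: same arithmetic with the operand roles swapped. */
int32_t sudot_lane_ref(int32_t acc, const int8_t s[4], const uint8_t u[4]) {
    return usdot_lane_ref(acc, u, s);
}
```

The signedness matters because the same byte pattern widens differently: `0xFF` is 255 as `uint8_t` but -1 as `int8_t`, so choosing the wrong variant changes the result rather than just its type.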
2-way FP32 dot product (SVE only)
| Instruction | Description |
|---|---|
| FMMLA | [2x2 FP32] += [2x2 FP32] * [2x2 FP32] |
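
The 2x2 update can be written out in scalar C. This sketch only models the arithmetic shape: `fmmla_f32_ref` is a hypothetical name, all three matrices are assumed to be row-major, and neither the register layout of the second operand nor the instruction's rounding behaviour is modelled.

```c
/* c (2x2) += a (2x2) * b (2x2), all row-major.
   Each output element is a 2-way dot product. */
void fmmla_f32_ref(float c[4], const float a[4], const float b[4]) {
    for (int i = 0; i < 2; ++i) {
        for (int j = 0; j < 2; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < 2; ++k)
                sum += a[2 * i + k] * b[2 * k + j];
            c[2 * i + j] += sum;
        }
    }
}
```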
2-way FP64 dot product (SVE only, min-width 256-bit)
| Instruction | Description |
|---|---|
| FMMLA | [2x2 FP64] += [2x2 FP64] * [2x2 FP64] |
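
One way to see where the 256-bit minimum width comes from is to count the bits in a single 2x2 operand. The helper below is a hypothetical illustration of that arithmetic, not part of any Arm API.

```c
#include <stddef.h>

/* Bits needed to hold one 2x2 tile whose elements are elem_bytes wide. */
size_t tile_bits_2x2(size_t elem_bytes) {
    return 2 * 2 * elem_bytes * 8;
}
```

With 4-byte FP32 elements a tile occupies 128 bits, while 8-byte FP64 elements need 256 bits, so a vector has to be at least that wide to hold one FP64 tile.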
| Instruction | Description |
|---|---|
| BFDOT | [1x1 FP32] += [1x2 BF16] * [2x1 BF16] |
| BFMMLA | [2x2 FP32] += [2x4 BF16] * [4x2 BF16] |
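
The BF16 shapes can be sketched in scalar C by treating a BF16 value as the upper 16 bits of an IEEE FP32 value. Everything named here (`bf16_t`, `bf16_to_f32`, `f32_to_bf16_trunc`, `bfdot_lane_ref`, `bfmmla_ref`) is hypothetical. The sketch assumes the 4x2 right-hand matrix is supplied column by column, uses plain FP32 arithmetic, and does not attempt to reproduce the instructions' exact rounding or denormal handling.

```c
#include <stdint.h>
#include <string.h>

typedef uint16_t bf16_t; /* raw BF16 bit pattern */

/* Widen BF16 to FP32 by placing its bits in the upper half of the word. */
float bf16_to_f32(bf16_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Narrow FP32 to BF16 by truncation (illustration only, no rounding). */
bf16_t f32_to_bf16_trunc(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return (bf16_t)(bits >> 16);
}

/* One FP32 lane of the BFDOT shape: acc += a[0]*b[0] + a[1]*b[1]. */
float bfdot_lane_ref(float acc, const bf16_t a[2], const bf16_t b[2]) {
    for (int k = 0; k < 2; ++k)
        acc += bf16_to_f32(a[k]) * bf16_to_f32(b[k]);
    return acc;
}

/* BFMMLA shape: c (2x2 FP32) += a (2x4 BF16) * B (4x2 BF16),
   where bt holds B column by column: bt[4*j + k] == B[k][j]. */
void bfmmla_ref(float c[4], const bf16_t a[8], const bf16_t bt[8]) {
    for (int i = 0; i < 2; ++i) {
        for (int j = 0; j < 2; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k)
                sum += bf16_to_f32(a[4 * i + k]) * bf16_to_f32(bt[4 * j + k]);
            c[2 * i + j] += sum;
        }
    }
}
```

Accumulating in FP32 while storing inputs in BF16 halves the input bandwidth per element compared with FP32 inputs, at the cost of input precision.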
See Bfloat16-Arm for more details.