Arm - AshokBhat/ml GitHub Wiki

This page lists the ML-related instructions in the Arm architecture.

ML-relevant instructions in the Arm architecture

  • Most instructions have two variants: (1) vector and (2) indexed (by-element)
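The difference between the two variants can be sketched with a scalar model of a 4-way INT8 dot product. This is a minimal illustration, not the architectural pseudocode: it assumes a 128-bit register holding 16 INT8 values and 4 INT32 accumulator lanes, and the function names (`dot4`, `sdot_vector`, `sdot_indexed`) are hypothetical.

```python
def dot4(a, b):
    """4-way dot product of two length-4 integer sequences."""
    return sum(x * y for x, y in zip(a, b))

def sdot_vector(acc, a, b):
    """Vector form: lane i uses elements 4i..4i+3 of both a and b."""
    return [acc[i] + dot4(a[4 * i:4 * i + 4], b[4 * i:4 * i + 4]) for i in range(4)]

def sdot_indexed(acc, a, b, index):
    """By-element form: every lane reuses the 4-element group `index` of b."""
    group = b[4 * index:4 * index + 4]
    return [acc[i] + dot4(a[4 * i:4 * i + 4], group) for i in range(4)]

a = list(range(1, 17))                      # 16 values, 1..16
b = [1] * 4 + [2] * 4 + [3] * 4 + [4] * 4   # four groups of four
acc = [0, 0, 0, 0]

vec = sdot_vector(acc, a, b)       # each lane pairs with its own group of b
idx = sdot_indexed(acc, a, b, 1)   # all lanes pair with group 1 of b (the 2s)
```

The by-element form is convenient in matrix kernels where one group of values from one operand is reused against several rows or columns of the other.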

Dot product extension

4-way INT8 dot product

Instruction  Description
SDOT         [1x1 INT32] += [1x4 INT8] * [4x1 INT8]
UDOT         [1x1 INT32] += [1x4 UINT8] * [4x1 UINT8]
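As a rough illustration of the table above, the per-lane behaviour of the signed and unsigned forms can be modelled in plain Python. The helper names are hypothetical, and the model assumes that the 32-bit accumulator wraps rather than saturates.

```python
def as_int8(byte):
    """Interpret a raw byte (0..255) as a signed 8-bit value."""
    return byte - 256 if byte >= 128 else byte

def wrap_int32(value):
    """Wrap a Python int to 32-bit two's complement."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value

def sdot_lane(acc, a_bytes, b_bytes):
    """Signed form: both inputs are read as INT8."""
    total = acc + sum(as_int8(x) * as_int8(y) for x, y in zip(a_bytes, b_bytes))
    return wrap_int32(total)

def udot_lane(acc, a_bytes, b_bytes):
    """Unsigned form: both inputs are read as UINT8."""
    total = acc + sum(x * y for x, y in zip(a_bytes, b_bytes))
    return total & 0xFFFFFFFF

# The same raw bytes give different results depending on signedness.
a = [0xFF, 0x02, 0x80, 0x7F]
b = [0x01, 0xFF, 0x01, 0x02]
signed_result = sdot_lane(0, a, b)     # (-1)(1) + (2)(-1) + (-128)(1) + (127)(2)
unsigned_result = udot_lane(0, a, b)   # (255)(1) + (2)(255) + (128)(1) + (127)(2)
```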

Matmul extension

8-way INT8 dot product (NEON)

Instruction  Description
SMMLA        [2x2 INT32] += [2x8 INT8] * [8x2 INT8]
UMMLA        [2x2 INT32] += [2x8 UINT8] * [8x2 UINT8]
USMMLA       [2x2 INT32] += [2x8 UINT8] * [8x2 INT8]
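The matrix shapes in the table can be checked with a small scalar model. This is a sketch under stated assumptions: the function name `mmla_2x8` is hypothetical, the inputs are already-interpreted integers (so the same code stands in for the signed, unsigned, and mixed forms), and the storage layout shown (two rows of A back to back, two columns of B back to back) reflects my understanding of how the operands sit in a 128-bit register and should be checked against the Arm reference manual.

```python
def mmla_2x8(acc, a, b):
    """Return acc (2x2) + A (2x8) * B (8x2).

    acc: 4 values, row-major 2x2.
    a:   16 values, row 0 of A followed by row 1 of A.
    b:   16 values, column 0 of B followed by column 1 of B.
    """
    out = list(acc)
    for i in range(2):          # row of A
        for j in range(2):      # column of B
            out[2 * i + j] += sum(a[8 * i + k] * b[8 * j + k] for k in range(8))
    return out

a = [1] * 8 + [2] * 8            # A: row 0 is all 1s, row 1 is all 2s
b = list(range(8)) + [1] * 8     # B: column 0 is 0..7, column 1 is all 1s
result = mmla_2x8([0, 0, 0, 0], a, b)
```

Each of the four outputs is an 8-way dot product, which is where the "8-way" in the heading comes from.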

4-way INT8 dot product (NEON)

Instruction  Description
USDOT        [1x1 INT32] += [1x4 UINT8] * [4x1 INT8]
SUDOT        [1x1 INT32] += [1x4 INT8] * [4x1 UINT8]
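The mixed-sign forms suit cases such as unsigned 8-bit activations multiplied by signed 8-bit weights. The following scalar sketch models one lane; the helper names and the sample data are hypothetical, and accumulator wraparound is not modelled.

```python
def as_int8(byte):
    """Interpret a raw byte (0..255) as a signed 8-bit value."""
    return byte - 256 if byte >= 128 else byte

def usdot_lane(acc, u_bytes, s_bytes):
    """First operand read as UINT8, second operand read as INT8."""
    return acc + sum(u * as_int8(s) for u, s in zip(u_bytes, s_bytes))

def sudot_lane(acc, s_bytes, u_bytes):
    """First operand read as INT8, second operand read as UINT8."""
    return acc + sum(as_int8(s) * u for s, u in zip(s_bytes, u_bytes))

activations = [200, 100, 0, 255]        # raw bytes, unsigned
weights = [0xFF, 0x02, 0x7F, 0x80]      # raw bytes for -1, 2, 127, -128

us = usdot_lane(0, activations, weights)
su = sudot_lane(0, weights, activations)   # same math, operands swapped
```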

2-way FP32 dot product (SVE only)

Instruction  Description
FMMLA        [2x2 FP32] += [2x2 FP32] * [2x2 FP32]
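A minimal sketch of the 2x2 multiply-accumulate on one tile follows. It works on abstract 2x2 matrices given as nested lists, so it does not model how the elements are laid out inside a vector segment, and Python floats are double precision, so FP32 rounding is not modelled either. The function name `fmmla_tile` is hypothetical.

```python
def fmmla_tile(acc, a, b):
    """Return acc + a @ b for 2x2 matrices given as nested lists."""
    return [
        [acc[i][j] + sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)
    ]

a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
acc = [[0.5, 0.0], [0.0, 0.5]]

tile = fmmla_tile(acc, a, b)
```

Each output element is a 2-way dot product of a row of `a` with a column of `b`, added to the existing accumulator value.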

2-way FP64 dot product (SVE only, minimum vector length 256 bits)

Instruction  Description
FMMLA        [2x2 FP64] += [2x2 FP64] * [2x2 FP64]
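The 256-bit minimum follows from simple arithmetic: a 2x2 tile holds four elements, so an FP64 tile needs 4 x 64 = 256 bits. The short sketch below counts how many whole tiles fit in a vector; the function name `tiles_per_vector` and the chosen vector lengths are illustrative only.

```python
def tiles_per_vector(vector_bits, element_bits):
    """Number of whole 2x2 tiles (4 elements each) that fit in one vector."""
    return vector_bits // (4 * element_bits)

fp32_tiles = {vl: tiles_per_vector(vl, 32) for vl in (128, 256, 512)}
fp64_tiles = {vl: tiles_per_vector(vl, 64) for vl in (128, 256, 512)}
```

A 128-bit vector holds one FP32 tile but no complete FP64 tile, which is consistent with the FP64 form requiring at least 256 bits.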

BF16 extension

Bfloat16 (NEON and SVE)

Instruction  Description
BFDOT        [1x1 FP32] += [1x2 BF16] * [2x1 BF16]
BFMMLA       [2x2 FP32] += [2x4 BF16] * [4x2 BF16]
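The shapes above can be illustrated with a simplified scalar model. It assumes BF16 is the top 16 bits of the FP32 encoding and converts by truncation, which is a simplification (hardware conversion instructions round). The arithmetic uses Python double-precision floats, so the instructions' own rounding behaviour is not modelled. All function names are hypothetical.

```python
import struct

def to_bf16(x):
    """Truncate a float to BF16 precision by keeping the top 16 bits of its FP32 encoding."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

def bfdot_lane(acc, a, b):
    """One FP32 lane: acc += 2-way dot product of BF16 pairs a and b."""
    return acc + a[0] * b[0] + a[1] * b[1]

def bfmmla(acc, a, b):
    """Return acc (2x2) + a (2x4) * b (4x2), all as nested lists."""
    return [
        [acc[i][j] + sum(a[i][k] * b[k][j] for k in range(4)) for j in range(2)]
        for i in range(2)
    ]

pi_bf16 = to_bf16(3.14159)   # loses the low 16 bits of the FP32 mantissa
dot = bfdot_lane(1.0, [1.5, 2.0], [2.0, 0.25])
mat = bfmmla(
    [[0.0, 0.0], [0.0, 0.0]],
    [[1.0, 0.0, 1.0, 0.0], [0.0, 2.0, 0.0, 2.0]],
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]],
)
```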

More details are in Bfloat16-Arm.
