Benchmarks 2024 11 21 TFLM LLVM O3 spike_rv32 - tum-ei-eda/muriscv-nn GitHub Wiki
Setup
Simulator
- Spike (riscv-isa-sim) (ISS, CPI=1)
  - Spike: 0bc176b3fca43560b9e8586cdbc41cfde073e17a
  - Spike PK: 7e9b671c0415dfd7b562ac934feb9380075d4aa2
Toolchains
- RISC-V GCC:
  - Scalar: riscv32-unknown-elf-gcc (gc891d8dc23e) 13.2.0
  - Vector: riscv32-unknown-elf-gcc (gc891d8dc23e) 13.2.0
  - Packed: Self compiled using patches found in https://github.com/riscv-collab/riscv-gcc/pull/258 and https://github.com/riscvarchive/riscv-binutils-gdb/pull/257
- LLVM/Clang: clang version 18.1.8 (https://github.com/llvm/llvm-project.git 3b5b5c1ec4a3095ab096dd780e84d7ab81f3d7ff)
  - Linker: lld (TODO)
Models
- MLPerfTiny Benchmark
- TODO: others!
Frameworks
- MLonMCU: develop
- TFLM: 8eb6b23de4470d6a8da3131650d6a67514dfa130
Miscellaneous
- Used the -Os flag for compilation.
- Benchmarks were generated using the MLonMCU deployment tool with minimal effort.
- Memory metrics are reported in bytes.
Results (Framework: tflm, Backend: tflmi, Toolchain: llvm, Flags: -O3, Target: spike_rv32)
Audio Wake Words (aww)

Cycles (Speedup) | Total ROM (rel.) | Total RAM (rel.) | VLEN | Kernels | Mode | Arch | Unroll | Auto-Vectorization |
---|---|---|---|---|---|---|---|---|
38583371 (0.4x) | 157028 (0.891) | 36084 (1.0) | 0 | TFLM | Reference | RV32GC | 0 | - |
28760918 (0.5x) | 172318 (0.978) | 36092 (1.0) | 128 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
27262138 (0.6x) | 172318 (0.978) | 36092 (1.0) | 256 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
26557874 (0.6x) | 172318 (0.978) | 36092 (1.0) | 512 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
26209131 (0.6x) | 172318 (0.978) | 36092 (1.0) | 1024 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
42055875 (0.4x) | 172318 (0.978) | 36092 (1.0) | 2048 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
42062653 (0.4x) | 172318 (0.978) | 36092 (1.0) | 4096 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
15103274 (Base) | 176242 (Base) | 36084 (Base) | 0 | muRISCV-NN | Scalar | RV32GC | 0 | - |
14906717 (1.0x) | 176030 (0.999) | 36084 (1.0) | 0 | muRISCV-NN | Vector (Portable) | RV32GC | 0 | - |
5348759 (2.8x) | 195010 (1.106) | 36092 (1.0) | 128 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
4923020 (3.1x) | 195010 (1.106) | 36092 (1.0) | 256 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
4605009 (3.3x) | 195010 (1.106) | 36092 (1.0) | 512 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
5352802 (2.8x) | 195010 (1.106) | 36092 (1.0) | 1024 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
13822118 (1.1x) | 195010 (1.106) | 36092 (1.0) | 2048 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
13825507 (1.1x) | 195010 (1.106) | 36092 (1.0) | 4096 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
4059318 (3.7x) | 178620 (1.013) | 36084 (1.0) | 128 | muRISCV-NN | Vector | RV32GCV | 0 | - |
2804356 (5.4x) | 178620 (1.013) | 36084 (1.0) | 256 | muRISCV-NN | Vector | RV32GCV | 0 | - |
2127046 (7.1x) | 178620 (1.013) | 36084 (1.0) | 512 | muRISCV-NN | Vector | RV32GCV | 0 | - |
2086062 (7.2x) | 178620 (1.013) | 36084 (1.0) | 1024 | muRISCV-NN | Vector | RV32GCV | 0 | - |
2089451 (7.2x) | 178620 (1.013) | 36084 (1.0) | 2048 | muRISCV-NN | Vector | RV32GCV | 0 | - |
2092840 (7.2x) | 178620 (1.013) | 36084 (1.0) | 4096 | muRISCV-NN | Vector | RV32GCV | 0 | - |
6401195 (2.4x) | 195704 (1.11) | 36092 (1.0) | 128 | muRISCV-NN | Vector (Portable) | RV32GCV | 0 | Loop+SLP |
5731398 (2.6x) | 195704 (1.11) | 36092 (1.0) | 256 | muRISCV-NN | Vector (Portable) | RV32GCV | 0 | Loop+SLP |
5396379 (2.8x) | 195704 (1.11) | 36092 (1.0) | 512 | muRISCV-NN | Vector (Portable) | RV32GCV | 0 | Loop+SLP |
6481028 (2.3x) | 195704 (1.11) | 36092 (1.0) | 1024 | muRISCV-NN | Vector (Portable) | RV32GCV | 0 | Loop+SLP |
14954076 (1.0x) | 195704 (1.11) | 36092 (1.0) | 2048 | muRISCV-NN | Vector (Portable) | RV32GCV | 0 | Loop+SLP |
14960854 (1.0x) | 195704 (1.11) | 36092 (1.0) | 4096 | muRISCV-NN | Vector (Portable) | RV32GCV | 0 | Loop+SLP |
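
The Speedup and rel. values in the Audio Wake Words results above appear to be plain ratios against the row marked Base (muRISCV-NN, Scalar, RV32GC). The following is a minimal sketch of that arithmetic, not the MLonMCU implementation. The function names and the rounding (one decimal for speedup, three for memory) are assumptions inferred from the printed numbers.

```python
# Base row of the aww results: muRISCV-NN, Scalar, RV32GC.
BASE_CYCLES = 15103274
BASE_ROM = 176242


def speedup(cycles, base_cycles=BASE_CYCLES):
    """Speedup over the Base row: base cycles divided by measured cycles.

    Rounding to one decimal is an assumption based on the printed values.
    """
    return round(base_cycles / cycles, 1)


def relative(value, base_value):
    """Memory footprint relative to the Base row.

    Rounding to three decimals is an assumption based on the printed values.
    """
    return round(value / base_value, 3)


# muRISCV-NN, Vector, VLEN=128 row of the aww results.
print(speedup(4059318))            # 3.7
print(relative(178620, BASE_ROM))  # 1.013
```

Values below 1.0x therefore mean the configuration needs more cycles than the scalar muRISCV-NN baseline.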
Image Classification (resnet)

Cycles (Speedup) | Total ROM (rel.) | Total RAM (rel.) | VLEN | Kernels | Mode | Arch | Unroll | Auto-Vectorization |
---|---|---|---|---|---|---|---|---|
120044787 (0.5x) | 197216 (0.94) | 68852 (1.0) | 0 | TFLM | Reference | RV32GC | 0 | - |
52316169 (1.1x) | 214280 (1.022) | 68860 (1.0) | 128 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
44654741 (1.3x) | 214280 (1.022) | 68860 (1.0) | 256 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
82951897 (0.7x) | 214280 (1.022) | 68860 (1.0) | 512 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
107868136 (0.5x) | 214280 (1.022) | 68860 (1.0) | 1024 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
123206876 (0.5x) | 214280 (1.022) | 68860 (1.0) | 2048 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
123213654 (0.5x) | 214280 (1.022) | 68860 (1.0) | 4096 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
56211829 (Base) | 209754 (Base) | 68848 (Base) | 0 | muRISCV-NN | Scalar | RV32GC | 0 | - |
72238619 (0.8x) | 209382 (0.998) | 68848 (1.0) | 0 | muRISCV-NN | Vector (Portable) | RV32GC | 0 | - |
11814783 (4.8x) | 231472 (1.104) | 68856 (1.0) | 128 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
11335729 (5.0x) | 231472 (1.104) | 68856 (1.0) | 256 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
14185510 (4.0x) | 231472 (1.104) | 68856 (1.0) | 512 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
15671021 (3.6x) | 231472 (1.104) | 68856 (1.0) | 1024 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
16348311 (3.4x) | 231472 (1.104) | 68856 (1.0) | 2048 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
35544301 (1.6x) | 231472 (1.104) | 68856 (1.0) | 4096 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
15148182 (3.7x) | 212424 (1.013) | 68848 (1.0) | 128 | muRISCV-NN | Vector | RV32GCV | 0 | - |
9564986 (5.9x) | 212424 (1.013) | 68848 (1.0) | 256 | muRISCV-NN | Vector | RV32GCV | 0 | - |
7019148 (8.0x) | 212424 (1.013) | 68848 (1.0) | 512 | muRISCV-NN | Vector | RV32GCV | 0 | - |
5777020 (9.7x) | 212424 (1.013) | 68848 (1.0) | 1024 | muRISCV-NN | Vector | RV32GCV | 0 | - |
4852145 (11.6x) | 212424 (1.013) | 68848 (1.0) | 2048 | muRISCV-NN | Vector | RV32GCV | 0 | - |
4606442 (12.2x) | 212424 (1.013) | 68848 (1.0) | 4096 | muRISCV-NN | Vector | RV32GCV | 0 | - |
17555544 (3.2x) | 230938 (1.101) | 68856 (1.0) | 128 | muRISCV-NN | Vector (Portable) | RV32GCV | 0 | Loop+SLP |
16522777 (3.4x) | 230938 (1.101) | 68856 (1.0) | 256 | muRISCV-NN | Vector (Portable) | RV32GCV | 0 | Loop+SLP |
17283410 (3.3x) | 230938 (1.101) | 68856 (1.0) | 512 | muRISCV-NN | Vector (Portable) | RV32GCV | 0 | Loop+SLP |
16996749 (3.3x) | 230938 (1.101) | 68856 (1.0) | 1024 | muRISCV-NN | Vector (Portable) | RV32GCV | 0 | Loop+SLP |
16532317 (3.4x) | 230938 (1.101) | 68856 (1.0) | 2048 | muRISCV-NN | Vector (Portable) | RV32GCV | 0 | Loop+SLP |
43318707 (1.3x) | 230938 (1.101) | 68856 (1.0) | 4096 | muRISCV-NN | Vector (Portable) | RV32GCV | 0 | Loop+SLP |
Anomaly Detection (toycar)

Cycles (Speedup) | Total ROM (rel.) | Total RAM (rel.) | VLEN | Kernels | Mode | Arch | Unroll | Auto-Vectorization |
---|---|---|---|---|---|---|---|---|
2798517 (0.6x) | 342978 (0.983) | 19372 (1.0) | 0 | TFLM | Reference | RV32GC | 0 | - |
794257 (2.1x) | 346458 (0.993) | 19372 (1.0) | 128 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
620753 (2.7x) | 346458 (0.993) | 19372 (1.0) | 256 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
530289 (3.2x) | 346458 (0.993) | 19372 (1.0) | 512 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
485057 (3.5x) | 346458 (0.993) | 19372 (1.0) | 1024 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
462441 (3.7x) | 346458 (0.993) | 19372 (1.0) | 2048 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
2217849 (0.8x) | 346458 (0.993) | 19372 (1.0) | 4096 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
1688876 (Base) | 348892 (Base) | 19372 (Base) | 0 | muRISCV-NN | Scalar | RV32GC | 0 | - |
3038249 (0.6x) | 348894 (1.0) | 19372 (1.0) | 0 | muRISCV-NN | Vector (Portable) | RV32GC | 0 | - |
552030 (3.1x) | 353948 (1.014) | 19372 (1.0) | 128 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
464821 (3.6x) | 353948 (1.014) | 19372 (1.0) | 256 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
419605 (4.0x) | 353948 (1.014) | 19372 (1.0) | 512 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
396997 (4.3x) | 353948 (1.014) | 19372 (1.0) | 1024 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
385693 (4.4x) | 353948 (1.014) | 19372 (1.0) | 2048 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
1322634 (1.3x) | 353948 (1.014) | 19372 (1.0) | 4096 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
1937117 (0.9x) | 349998 (1.003) | 19372 (1.0) | 128 | muRISCV-NN | Vector | RV32GCV | 0 | - |
1820437 (0.9x) | 349998 (1.003) | 19372 (1.0) | 256 | muRISCV-NN | Vector | RV32GCV | 0 | - |
1762097 (1.0x) | 349998 (1.003) | 19372 (1.0) | 512 | muRISCV-NN | Vector | RV32GCV | 0 | - |
1733251 (1.0x) | 349998 (1.003) | 19372 (1.0) | 1024 | muRISCV-NN | Vector | RV32GCV | 0 | - |
1729620 (1.0x) | 349998 (1.003) | 19372 (1.0) | 2048 | muRISCV-NN | Vector | RV32GCV | 0 | - |
1727764 (1.0x) | 349998 (1.003) | 19372 (1.0) | 4096 | muRISCV-NN | Vector | RV32GCV | 0 | - |
788694 (2.1x) | 353950 (1.014) | 19372 (1.0) | 128 | muRISCV-NN | Vector (Portable) | RV32GCV | 0 | Loop+SLP |
606765 (2.8x) | 353950 (1.014) | 19372 (1.0) | 256 | muRISCV-NN | Vector (Portable) | RV32GCV | 0 | Loop+SLP |
512205 (3.3x) | 353950 (1.014) | 19372 (1.0) | 512 | muRISCV-NN | Vector (Portable) | RV32GCV | 0 | Loop+SLP |
464925 (3.6x) | 353950 (1.014) | 19372 (1.0) | 1024 | muRISCV-NN | Vector (Portable) | RV32GCV | 0 | Loop+SLP |
441285 (3.8x) | 353950 (1.014) | 19372 (1.0) | 2048 | muRISCV-NN | Vector (Portable) | RV32GCV | 0 | Loop+SLP |
2351722 (0.7x) | 353950 (1.014) | 19372 (1.0) | 4096 | muRISCV-NN | Vector (Portable) | RV32GCV | 0 | Loop+SLP |
Visual Wake Words (vww)

Cycles (Speedup) | Total ROM (rel.) | Total RAM (rel.) | VLEN | Kernels | Mode | Arch | Unroll | Auto-Vectorization |
---|---|---|---|---|---|---|---|---|
101700273 (0.4x) | 430670 (0.957) | 134388 (1.0) | 0 | TFLM | Reference | RV32GC | 0 | - |
69585191 (0.6x) | 445960 (0.991) | 134396 (1.0) | 128 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
67014019 (0.7x) | 445960 (0.991) | 134396 (1.0) | 256 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
67133651 (0.7x) | 445960 (0.991) | 134396 (1.0) | 512 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
72933816 (0.6x) | 445960 (0.991) | 134396 (1.0) | 1024 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
79447180 (0.6x) | 445960 (0.991) | 134396 (1.0) | 2048 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
104950000 (0.4x) | 445960 (0.991) | 134396 (1.0) | 4096 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
45209862 (Base) | 449884 (Base) | 134388 (Base) | 0 | muRISCV-NN | Scalar | RV32GC | 0 | - |
44774239 (1.0x) | 449672 (1.0) | 134388 (1.0) | 0 | muRISCV-NN | Vector (Portable) | RV32GC | 0 | - |
17456297 (2.6x) | 468652 (1.042) | 134396 (1.0) | 128 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
17007755 (2.7x) | 468652 (1.042) | 134396 (1.0) | 256 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
18183880 (2.5x) | 468652 (1.042) | 134396 (1.0) | 512 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
21330205 (2.1x) | 468652 (1.042) | 134396 (1.0) | 1024 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
24856651 (1.8x) | 468652 (1.042) | 134396 (1.0) | 2048 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
38815794 (1.2x) | 468652 (1.042) | 134396 (1.0) | 4096 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
13248138 (3.4x) | 452262 (1.005) | 134388 (1.0) | 128 | muRISCV-NN | Vector | RV32GCV | 0 | - |
9927162 (4.6x) | 452262 (1.005) | 134388 (1.0) | 256 | muRISCV-NN | Vector | RV32GCV | 0 | - |
8635746 (5.2x) | 452262 (1.005) | 134388 (1.0) | 512 | muRISCV-NN | Vector | RV32GCV | 0 | - |
8131762 (5.6x) | 452262 (1.005) | 134388 (1.0) | 1024 | muRISCV-NN | Vector | RV32GCV | 0 | - |
8087385 (5.6x) | 452262 (1.005) | 134388 (1.0) | 2048 | muRISCV-NN | Vector | RV32GCV | 0 | - |
8090774 (5.6x) | 452262 (1.005) | 134388 (1.0) | 4096 | muRISCV-NN | Vector | RV32GCV | 0 | - |
20223002 (2.2x) | 469346 (1.043) | 134396 (1.0) | 128 | muRISCV-NN | Vector (Portable) | RV32GCV | 0 | Loop+SLP |
19937565 (2.3x) | 469346 (1.043) | 134396 (1.0) | 256 | muRISCV-NN | Vector (Portable) | RV32GCV | 0 | Loop+SLP |
21452282 (2.1x) | 469346 (1.043) | 134396 (1.0) | 512 | muRISCV-NN | Vector (Portable) | RV32GCV | 0 | Loop+SLP |
24598559 (1.8x) | 469346 (1.043) | 134396 (1.0) | 1024 | muRISCV-NN | Vector (Portable) | RV32GCV | 0 | Loop+SLP |
28124981 (1.6x) | 469346 (1.043) | 134396 (1.0) | 2048 | muRISCV-NN | Vector (Portable) | RV32GCV | 0 | Loop+SLP |
42087501 (1.1x) | 469346 (1.043) | 134396 (1.0) | 4096 | muRISCV-NN | Vector (Portable) | RV32GCV | 0 | Loop+SLP |
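
Cycle counts do not always fall as VLEN grows, so it can help to look up the best VLEN per configuration rather than assume the largest one wins. The sketch below does this for two Visual Wake Words configurations, using cycle counts copied from the results above. The dictionary and function names are illustrative only and are not part of MLonMCU or muRISCV-NN.

```python
# Cycle counts copied from the vww results; keys are VLEN in bits.
VWW_VECTOR = {
    128: 13248138, 256: 9927162, 512: 8635746,
    1024: 8131762, 2048: 8087385, 4096: 8090774,
}
VWW_SCALAR_AUTOVEC = {
    128: 17456297, 256: 17007755, 512: 18183880,
    1024: 21330205, 2048: 24856651, 4096: 38815794,
}


def best_vlen(cycles_by_vlen):
    """Return the VLEN whose run needed the fewest cycles."""
    return min(cycles_by_vlen, key=cycles_by_vlen.get)


print(best_vlen(VWW_VECTOR))          # 2048 (hand-written vector kernels)
print(best_vlen(VWW_SCALAR_AUTOVEC))  # 256 (scalar kernels, Loop+SLP)
```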
Original data
Click here to download the raw files for this benchmark.