Benchmarks 2024 11 26 TFLM GCC O3 spike_rv32 - tum-ei-eda/muriscv-nn GitHub Wiki
Setup
Simulator
- Spike (riscv-isa-sim) (ISS, CPI=1)
  - Spike: eb0a3e2b0a7c57522928be39de95cd9f8c6dc636
  - Spike PK: fix-gcc14-rvv
Toolchains
- RISC-V GCC:
  - Scalar: riscv32-unknown-elf-gcc (g8b4bb54e6c4) 14.2.1 20241118
  - Vector: riscv32-unknown-elf-gcc (g8b4bb54e6c4) 14.2.1 20241118
  - Packed: Self-compiled using patches found in https://github.com/riscv-collab/riscv-gcc/pull/258 and https://github.com/riscvarchive/riscv-binutils-gdb/pull/257
Models
- MLPerfTiny Benchmark
- TODO: others!
Frameworks
- MLonMCU: develop
- TFLM: 8eb6b23de4470d6a8da3131650d6a67514dfa130
Miscellaneous
- Used -O3 flag for compilation.
- Benchmarks generated using the MLonMCU deployment tool with minimal effort.
- Memory metrics are reported in Bytes
Results (Framework: tflm, Backend: tflmi, Toolchain: gcc, Flags: -O3, Target: spike_rv32)
Audio Wake Words (aww)

Cycles (Speedup) | Total ROM (rel.) | Total RAM (rel.) | VLEN | Kernels | Mode | Arch | Unroll | Auto-Vectorization |
---|---|---|---|---|---|---|---|---|
54363848.0 (0.3x) | 148424 (0.821) | 36144 (1.0) | 0 | TFLM | Reference | RV32GC | 0 | - |
34476456.0 (0.4x) | 156946 (0.868) | 36200 (1.001) | 128 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
31869936.0 (0.5x) | 157292 (0.87) | 36204 (1.001) | 256 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
31219427.0 (0.5x) | 158376 (0.876) | 36204 (1.001) | 512 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
30534285.0 (0.5x) | 159520 (0.882) | 36204 (1.001) | 1024 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
30144995.0 (0.5x) | 160716 (0.889) | 36200 (1.001) | 2048 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
29803625.0 (0.5x) | 162722 (0.9) | 36176 (1.001) | 4096 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
15060815.0 (Base) | 180806 (Base) | 36152 (Base) | 0 | muRISCV-NN | Scalar | RV32GC | 0 | - |
15088048.0 (1.0x) | 178130 (0.985) | 36152 (1.0) | 0 | muRISCV-NN | Vector (Portable) | RV32GC | 0 | - |
8093709.0 (1.9x) | 203012 (1.123) | 36216 (1.002) | 128 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
6685275.0 (2.3x) | 202958 (1.123) | 36220 (1.002) | 256 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
6004861.0 (2.5x) | 209620 (1.159) | 36220 (1.002) | 512 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
5682416.0 (2.7x) | 221040 (1.223) | 36220 (1.002) | 1024 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
5497499.0 (2.7x) | 247540 (1.369) | 36216 (1.002) | 2048 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
5494123.0 (2.7x) | 302598 (1.674) | 36192 (1.001) | 4096 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
7989645.0 (1.9x) | 183112 (1.013) | 36152 (1.0) | 128 | muRISCV-NN | Vector | RV32GCV | 0 | - |
5446551.0 (2.8x) | 183112 (1.013) | 36152 (1.0) | 256 | muRISCV-NN | Vector | RV32GCV | 0 | - |
3931921.0 (3.8x) | 183112 (1.013) | 36152 (1.0) | 512 | muRISCV-NN | Vector | RV32GCV | 0 | - |
3897348.0 (3.9x) | 183112 (1.013) | 36152 (1.0) | 1024 | muRISCV-NN | Vector | RV32GCV | 0 | - |
3897348.0 (3.9x) | 183112 (1.013) | 36152 (1.0) | 2048 | muRISCV-NN | Vector | RV32GCV | 0 | - |
3904130.0 (3.9x) | 183112 (1.013) | 36152 (1.0) | 4096 | muRISCV-NN | Vector | RV32GCV | 0 | - |
7134408.0 (2.1x) | 200698 (1.11) | 36216 (1.002) | 128 | muRISCV-NN | Vector (Portable) | RV32GCV | 0 | Loop+SLP |
5293706.0 (2.8x) | 200606 (1.11) | 36220 (1.002) | 256 | muRISCV-NN | Vector (Portable) | RV32GCV | 0 | Loop+SLP |
4261290.0 (3.5x) | 207268 (1.146) | 36220 (1.002) | 512 | muRISCV-NN | Vector (Portable) | RV32GCV | 0 | Loop+SLP |
3910675.0 (3.9x) | 218688 (1.21) | 36220 (1.002) | 1024 | muRISCV-NN | Vector (Portable) | RV32GCV | 0 | Loop+SLP |
3687391.0 (4.1x) | 245188 (1.356) | 36216 (1.002) | 2048 | muRISCV-NN | Vector (Portable) | RV32GCV | 0 | Loop+SLP |
3704454.0 (4.1x) | 300258 (1.661) | 36192 (1.001) | 4096 | muRISCV-NN | Vector (Portable) | RV32GCV | 0 | Loop+SLP |
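
The parenthesised annotations in the results appear to be ratios against the row marked Base (muRISCV-NN, Scalar, RV32GC). This is an assumption inferred from the reported numbers, not something the page states. Under that reading, Speedup is base cycles divided by the row's cycles, and the ROM/RAM "rel." values are the row's size divided by the base size. A minimal Python sketch, with hypothetical helper names and values copied from the Audio Wake Words results:

```python
def speedup(base_cycles: float, cycles: float) -> float:
    """Assumed definition: base cycles / row cycles (above 1 means faster than Base)."""
    return base_cycles / cycles


def relative(base_value: int, value: int) -> float:
    """Assumed definition: row size / base size (above 1 means larger than Base)."""
    return value / base_value


# Base row of the aww results: muRISCV-NN, Scalar, RV32GC
base_cycles, base_rom, base_ram = 15060815.0, 180806, 36152

# Row under test: muRISCV-NN, Vector, RV32GCV, VLEN=1024
cycles, rom, ram = 3897348.0, 183112, 36152

print(f"{speedup(base_cycles, cycles):.1f}x")   # 3.9x
print(round(relative(base_rom, rom), 3))        # 1.013
print(round(relative(base_ram, ram), 3))        # 1.0
```

Because Spike is used here as an instruction-set simulator with CPI=1, the cycle counts correspond to executed instruction counts, so these speedups describe instruction-count reductions rather than timing on real hardware.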
Image Classification (resnet)

Cycles (Speedup) | Total ROM (rel.) | Total RAM (rel.) | VLEN | Kernels | Mode | Arch | Unroll | Auto-Vectorization |
---|---|---|---|---|---|---|---|---|
172295593.0 (0.3x) | 196904 (0.905) | 68916 (1.0) | 0 | TFLM | Reference | RV32GC | 0 | - |
67905087.0 (0.8x) | 210674 (0.969) | 68980 (1.001) | 128 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
53031626.0 (1.0x) | 211398 (0.972) | 68984 (1.001) | 256 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
47125467.0 (1.2x) | 214156 (0.985) | 68984 (1.001) | 512 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
45091198.0 (1.2x) | 216022 (0.993) | 68984 (1.001) | 1024 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
44753023.0 (1.2x) | 218252 (1.004) | 68980 (1.001) | 2048 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
44036761.0 (1.2x) | 221856 (1.02) | 68956 (1.001) | 4096 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
54559765.0 (Base) | 217464 (Base) | 68908 (Base) | 0 | muRISCV-NN | Scalar | RV32GC | 0 | - |
72393197.0 (0.8x) | 216724 (0.997) | 68908 (1.0) | 0 | muRISCV-NN | Vector (Portable) | RV32GC | 0 | - |
18041661.0 (3.0x) | 249354 (1.147) | 68980 (1.001) | 128 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
11533301.0 (4.7x) | 242486 (1.115) | 68984 (1.001) | 256 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
8332992.0 (6.5x) | 245616 (1.129) | 68984 (1.001) | 512 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
6860235.0 (8.0x) | 247026 (1.136) | 68984 (1.001) | 1024 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
6163500.0 (8.9x) | 248778 (1.144) | 68980 (1.001) | 2048 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
5850418.0 (9.3x) | 251434 (1.156) | 68956 (1.001) | 4096 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
32167007.0 (1.7x) | 224760 (1.034) | 68908 (1.0) | 128 | muRISCV-NN | Vector | RV32GCV | 0 | - |
19763011.0 (2.8x) | 224760 (1.034) | 68908 (1.0) | 256 | muRISCV-NN | Vector | RV32GCV | 0 | - |
14136501.0 (3.9x) | 224760 (1.034) | 68908 (1.0) | 512 | muRISCV-NN | Vector | RV32GCV | 0 | - |
11394884.0 (4.8x) | 224760 (1.034) | 68908 (1.0) | 1024 | muRISCV-NN | Vector | RV32GCV | 0 | - |
9218540.0 (5.9x) | 224760 (1.034) | 68908 (1.0) | 2048 | muRISCV-NN | Vector | RV32GCV | 0 | - |
8638398.0 (6.3x) | 224760 (1.034) | 68908 (1.0) | 4096 | muRISCV-NN | Vector | RV32GCV | 0 | - |
29171367.0 (1.9x) | 248602 (1.143) | 68980 (1.001) | 128 | muRISCV-NN | Vector (Portable) | RV32GCV | 0 | Loop+SLP |
17067706.0 (3.2x) | 241734 (1.112) | 68984 (1.001) | 256 | muRISCV-NN | Vector (Portable) | RV32GCV | 0 | Loop+SLP |
10956464.0 (5.0x) | 244864 (1.126) | 68984 (1.001) | 512 | muRISCV-NN | Vector (Portable) | RV32GCV | 0 | Loop+SLP |
8093209.0 (6.7x) | 246274 (1.132) | 68984 (1.001) | 1024 | muRISCV-NN | Vector (Portable) | RV32GCV | 0 | Loop+SLP |
6789234.0 (8.0x) | 248026 (1.141) | 68980 (1.001) | 2048 | muRISCV-NN | Vector (Portable) | RV32GCV | 0 | Loop+SLP |
6151842.0 (8.9x) | 250682 (1.153) | 68956 (1.001) | 4096 | muRISCV-NN | Vector (Portable) | RV32GCV | 0 | Loop+SLP |
Anomaly Detection (toycar)

Cycles (Speedup) | Total ROM (rel.) | Total RAM (rel.) | VLEN | Kernels | Mode | Arch | Unroll | Auto-Vectorization |
---|---|---|---|---|---|---|---|---|
2784194.0 (0.6x) | 340620 (0.978) | 19424 (1.0) | 0 | TFLM | Reference | RV32GC | 0 | - |
1232013.0 (1.3x) | 343596 (0.987) | 19428 (1.0) | 128 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
835955.0 (2.0x) | 343614 (0.987) | 19428 (1.0) | 256 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
638570.0 (2.6x) | 344052 (0.988) | 19428 (1.0) | 512 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
534694.0 (3.1x) | 344468 (0.989) | 19428 (1.0) | 1024 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
485380.0 (3.4x) | 344926 (0.991) | 19428 (1.0) | 2048 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
460708.0 (3.6x) | 345668 (0.993) | 19424 (1.0) | 4096 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
1648267.0 (Base) | 348140 (Base) | 19424 (Base) | 0 | muRISCV-NN | Scalar | RV32GC | 0 | - |
2728325.0 (0.6x) | 348142 (1.0) | 19424 (1.0) | 0 | muRISCV-NN | Vector (Portable) | RV32GC | 0 | - |
726502.0 (2.3x) | 349172 (1.003) | 19428 (1.0) | 128 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
544912.0 (3.0x) | 348766 (1.002) | 19428 (1.0) | 256 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
454495.0 (3.6x) | 348998 (1.002) | 19428 (1.0) | 512 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
409282.0 (4.0x) | 349202 (1.003) | 19428 (1.0) | 1024 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
398968.0 (4.1x) | 349428 (1.004) | 19428 (1.0) | 2048 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
380515.0 (4.3x) | 349810 (1.005) | 19424 (1.0) | 4096 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
1993404.0 (0.8x) | 351934 (1.011) | 19424 (1.0) | 128 | muRISCV-NN | Vector | RV32GCV | 0 | - |
1745390.0 (0.9x) | 351934 (1.011) | 19424 (1.0) | 256 | muRISCV-NN | Vector | RV32GCV | 0 | - |
1621383.0 (1.0x) | 351934 (1.011) | 19424 (1.0) | 512 | muRISCV-NN | Vector | RV32GCV | 0 | - |
1560047.0 (1.1x) | 351934 (1.011) | 19424 (1.0) | 1024 | muRISCV-NN | Vector | RV32GCV | 0 | - |
1555722.0 (1.1x) | 351934 (1.011) | 19424 (1.0) | 2048 | muRISCV-NN | Vector | RV32GCV | 0 | - |
1555172.0 (1.1x) | 351934 (1.011) | 19424 (1.0) | 4096 | muRISCV-NN | Vector | RV32GCV | 0 | - |
1221483.0 (1.3x) | 349174 (1.003) | 19428 (1.0) | 128 | muRISCV-NN | Vector (Portable) | RV32GCV | 0 | Loop+SLP |
808554.0 (2.0x) | 348768 (1.002) | 19428 (1.0) | 256 | muRISCV-NN | Vector (Portable) | RV32GCV | 0 | Loop+SLP |
602899.0 (2.7x) | 349000 (1.002) | 19428 (1.0) | 512 | muRISCV-NN | Vector (Portable) | RV32GCV | 0 | Loop+SLP |
500104.0 (3.3x) | 349204 (1.003) | 19428 (1.0) | 1024 | muRISCV-NN | Vector (Portable) | RV32GCV | 0 | Loop+SLP |
460999.0 (3.6x) | 349430 (1.004) | 19428 (1.0) | 2048 | muRISCV-NN | Vector (Portable) | RV32GCV | 0 | Loop+SLP |
428147.0 (3.8x) | 349812 (1.005) | 19424 (1.0) | 4096 | muRISCV-NN | Vector (Portable) | RV32GCV | 0 | Loop+SLP |
Visual Wake Words (vww)

Cycles (Speedup) | Total ROM (rel.) | Total RAM (rel.) | VLEN | Kernels | Mode | Arch | Unroll | Auto-Vectorization |
---|---|---|---|---|---|---|---|---|
140036503.0 (0.3x) | 422066 (0.929) | 134448 (1.0) | 0 | TFLM | Reference | RV32GC | 0 | - |
77531620.0 (0.6x) | 430588 (0.947) | 134504 (1.0) | 128 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
69387094.0 (0.7x) | 430934 (0.948) | 134508 (1.0) | 256 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
66085799.0 (0.7x) | 432018 (0.951) | 134508 (1.0) | 512 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
64123882.0 (0.7x) | 433162 (0.953) | 134508 (1.0) | 1024 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
63218733.0 (0.7x) | 434358 (0.956) | 134504 (1.0) | 2048 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
62122887.0 (0.7x) | 436364 (0.96) | 134480 (1.0) | 4096 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
45259044.0 (Base) | 454448 (Base) | 134456 (Base) | 0 | muRISCV-NN | Scalar | RV32GC | 0 | - |
45599828.0 (1.0x) | 451772 (0.994) | 134456 (1.0) | 0 | muRISCV-NN | Vector (Portable) | RV32GC | 0 | - |
25091654.0 (1.8x) | 476662 (1.049) | 134520 (1.0) | 128 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
21026264.0 (2.2x) | 476608 (1.049) | 134524 (1.001) | 256 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
19035801.0 (2.4x) | 483270 (1.063) | 134524 (1.001) | 512 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
18108838.0 (2.5x) | 494674 (1.089) | 134524 (1.001) | 1024 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
17730388.0 (2.6x) | 521174 (1.147) | 134520 (1.0) | 2048 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
17581231.0 (2.6x) | 576248 (1.268) | 134496 (1.0) | 4096 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
25430449.0 (1.8x) | 456754 (1.005) | 134456 (1.0) | 128 | muRISCV-NN | Vector | RV32GCV | 0 | - |
18543026.0 (2.4x) | 456754 (1.005) | 134456 (1.0) | 256 | muRISCV-NN | Vector | RV32GCV | 0 | - |
15914790.0 (2.8x) | 456754 (1.005) | 134456 (1.0) | 512 | muRISCV-NN | Vector | RV32GCV | 0 | - |
14813451.0 (3.1x) | 456754 (1.005) | 134456 (1.0) | 1024 | muRISCV-NN | Vector | RV32GCV | 0 | - |
14723422.0 (3.1x) | 456754 (1.005) | 134456 (1.0) | 2048 | muRISCV-NN | Vector | RV32GCV | 0 | - |
14730204.0 (3.1x) | 456754 (1.005) | 134456 (1.0) | 4096 | muRISCV-NN | Vector | RV32GCV | 0 | - |
22137934.0 (2.0x) | 474348 (1.044) | 134520 (1.0) | 128 | muRISCV-NN | Vector (Portable) | RV32GCV | 0 | Loop+SLP |
17072508.0 (2.7x) | 474256 (1.044) | 134524 (1.001) | 256 | muRISCV-NN | Vector (Portable) | RV32GCV | 0 | Loop+SLP |
14168099.0 (3.2x) | 480918 (1.058) | 134524 (1.001) | 512 | muRISCV-NN | Vector (Portable) | RV32GCV | 0 | Loop+SLP |
13159922.0 (3.4x) | 492322 (1.083) | 134524 (1.001) | 1024 | muRISCV-NN | Vector (Portable) | RV32GCV | 0 | Loop+SLP |
12786564.0 (3.5x) | 518822 (1.142) | 134520 (1.0) | 2048 | muRISCV-NN | Vector (Portable) | RV32GCV | 0 | Loop+SLP |
12651382.0 (3.6x) | 573908 (1.263) | 134496 (1.0) | 4096 | muRISCV-NN | Vector (Portable) | RV32GCV | 0 | Loop+SLP |
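
For readers who want to compare configurations programmatically, the pipe-separated rows on this page can be parsed directly. The following is a minimal Python sketch, not part of MLonMCU or muRISCV-NN: the `Row` type and the `parse_row` and `leading_value` helpers are hypothetical names, and the column order is assumed to match the table header above. The sample rows are copied from the Anomaly Detection results.

```python
from typing import NamedTuple


class Row(NamedTuple):
    cycles: float
    rom: int      # bytes
    ram: int      # bytes
    vlen: int
    kernels: str
    mode: str
    arch: str


def leading_value(cell: str) -> str:
    """Drop the parenthesised annotation, e.g. '380515.0 (4.3x)' -> '380515.0'."""
    return cell.split("(", 1)[0].strip()


def parse_row(line: str) -> Row:
    """Parse one result row; only the first three cells carry annotations."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    return Row(
        cycles=float(leading_value(cells[0])),
        rom=int(leading_value(cells[1])),
        ram=int(leading_value(cells[2])),
        vlen=int(cells[3]),
        kernels=cells[4],
        mode=cells[5],   # kept verbatim, so 'Vector (Portable)' survives
        arch=cells[6],
    )


SAMPLE = """\
1648267.0 (Base) | 348140 (Base) | 19424 (Base) | 0 | muRISCV-NN | Scalar | RV32GC | 0 | - |
380515.0 (4.3x) | 349810 (1.005) | 19424 (1.0) | 4096 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
1555172.0 (1.1x) | 351934 (1.011) | 19424 (1.0) | 4096 | muRISCV-NN | Vector | RV32GCV | 0 | - |
"""

rows = [parse_row(line) for line in SAMPLE.splitlines() if line.strip()]
fastest = min(rows, key=lambda r: r.cycles)
print(fastest.kernels, fastest.mode, fastest.vlen, fastest.cycles)
```

The same `rows` list can be sorted by `rom` instead of `cycles` to examine the code-size cost of larger VLEN values with auto-vectorization enabled.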
Original data
Click here to download the raw files for this benchmark.