Benchmarks 2024 02 22 TVM GCC - tum-ei-eda/muriscv-nn GitHub Wiki
Setup
Simulator
- Spike (riscv-isa-sim) (ISS, CPI=1)
Toolchains
- RISC-V GCC:
- Scalar: TODO: version & url
- Vector: TODO: version & url
- Packed: Self-compiled using the patches from https://github.com/riscv-collab/riscv-gcc/pull/258 and https://github.com/riscvarchive/riscv-binutils-gdb/pull/257
Models
- MLPerfTiny Benchmark
- TODO: others!
Package Versions
- MLonMCU : main
- TVM : Nightly Pre-Build
- Spike : 0bc176b3fca43560b9e8586cdbc41cfde073e17a
- Spike PK : 7e9b671c0415dfd7b562ac934feb9380075d4aa2
Miscellaneous
- Used the `-Os` flag for compilation.
- Benchmarks were generated using the MLonMCU deployment tool with minimal effort.
- Memory metrics are reported in Bytes
Results (Framework: tvm, Backend: tvmaot, Toolchain: gcc)
Audio Wake Words (aww)
Cycles (Speedup) | Total ROM (rel.) | Total RAM (rel.) | VLEN | Layout | Kernels | Mode | Arch | Auto-Vectorization |
---|---|---|---|---|---|---|---|---|
31536718 ( 0.8x ) | 107754 ( 0.946 ) | 59536 ( 1.001 ) | 0 | NCHW | TVM | Fallback | RV32GC | - |
29578962 ( 0.9x ) | 101832 ( 0.894 ) | 59536 ( 1.001 ) | 0 | NHWC | TVM | Fallback | RV32GC | - |
15873108 ( 1.7x ) | 101288 ( 0.889 ) | 51364 ( 0.863 ) | 0 | NCHW | TVM | Autotuned | RV32GC | - |
29574349 ( 0.9x ) | 101974 ( 0.895 ) | 59536 ( 1.001 ) | 0 | NHWC | TVM | Autotuned | RV32GC | - |
31529514 ( 0.8x ) | 107730 ( 0.946 ) | 59536 ( 1.001 ) | 128 | NCHW | TVM | Fallback | RV32GCV | Loop+SLP |
31529514 ( 0.8x ) | 107730 ( 0.946 ) | 59536 ( 1.001 ) | 1024 | NCHW | TVM | Fallback | RV32GCV | Loop+SLP |
31529514 ( 0.8x ) | 107730 ( 0.946 ) | 59536 ( 1.001 ) | 4096 | NCHW | TVM | Fallback | RV32GCV | Loop+SLP |
29578958 ( 0.9x ) | 101832 ( 0.894 ) | 59536 ( 1.001 ) | 128 | NHWC | TVM | Fallback | RV32GCV | Loop+SLP |
29578958 ( 0.9x ) | 101832 ( 0.894 ) | 59536 ( 1.001 ) | 1024 | NHWC | TVM | Fallback | RV32GCV | Loop+SLP |
29578958 ( 0.9x ) | 101832 ( 0.894 ) | 59536 ( 1.001 ) | 4096 | NHWC | TVM | Fallback | RV32GCV | Loop+SLP |
15873104 ( 1.7x ) | 101286 ( 0.889 ) | 51364 ( 0.863 ) | 128 | NCHW | TVM | Autotuned | RV32GCV | Loop+SLP |
15873104 ( 1.7x ) | 101288 ( 0.889 ) | 51364 ( 0.863 ) | 1024 | NCHW | TVM | Autotuned | RV32GCV | Loop+SLP |
15873104 ( 1.7x ) | 101286 ( 0.889 ) | 51364 ( 0.863 ) | 4096 | NCHW | TVM | Autotuned | RV32GCV | Loop+SLP |
29574345 ( 0.9x ) | 101974 ( 0.895 ) | 59536 ( 1.001 ) | 128 | NHWC | TVM | Autotuned | RV32GCV | Loop+SLP |
29574345 ( 0.9x ) | 101974 ( 0.895 ) | 59536 ( 1.001 ) | 1024 | NHWC | TVM | Autotuned | RV32GCV | Loop+SLP |
29574345 ( 0.9x ) | 101974 ( 0.895 ) | 59536 ( 1.001 ) | 4096 | NHWC | TVM | Autotuned | RV32GCV | Loop+SLP |
28034895 ( 0.9x ) | 119608 ( 1.05 ) | 59484 ( 1.0 ) | 0 | NCHW | TVM | Fallback | RV32GCP | - |
26289773 ( Base ) | 113910 ( Base ) | 59484 ( Base ) | 0 | NHWC | TVM | Fallback | RV32GCP | - |
12897999 ( 2.0x ) | 113212 ( 0.994 ) | 51312 ( 0.863 ) | 0 | NCHW | TVM | Autotuned | RV32GCP | - |
26285154 ( 1.0x ) | 114032 ( 1.001 ) | 59484 ( 1.0 ) | 0 | NHWC | TVM | Autotuned | RV32GCP | - |
16859464 ( 1.6x ) | 85086 ( 0.747 ) | 19224 ( 0.323 ) | 0 | NHWC | muRISCV-NN | Scalar | RV32GC | - |
16859464 ( 1.6x ) | 85086 ( 0.747 ) | 19224 ( 0.323 ) | 128 | NHWC | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
16859464 ( 1.6x ) | 85086 ( 0.747 ) | 19224 ( 0.323 ) | 1024 | NHWC | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
16859464 ( 1.6x ) | 85086 ( 0.747 ) | 19224 ( 0.323 ) | 4096 | NHWC | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
6321997 ( 4.2x ) | 86102 ( 0.756 ) | 23688 ( 0.398 ) | 128 | NHWC | muRISCV-NN | Vector | RV32GCV | - |
4472277 ( 5.9x ) | 86102 ( 0.756 ) | 23688 ( 0.398 ) | 1024 | NHWC | muRISCV-NN | Vector | RV32GCV | - |
4475666 ( 5.9x ) | 86102 ( 0.756 ) | 23688 ( 0.398 ) | 4096 | NHWC | muRISCV-NN | Vector | RV32GCV | - |
13443817 ( 2.0x ) | 97020 ( 0.852 ) | 19172 ( 0.322 ) | 0 | NHWC | muRISCV-NN | Scalar | RV32GCP | - |
15707734 ( 1.7x ) | 99208 ( 0.871 ) | 20324 ( 0.342 ) | 0 | NHWC | muRISCV-NN | Packed | RV32GCP | - |
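The parenthesised values in the table are not defined on this page. The numbers are consistent with the following reading, which is an inference rather than something the wiki states: the speedup is the cycle count of the row marked "Base" divided by the row's cycle count, and the relative ROM/RAM figures are the row's value divided by the Base value. A minimal Python sketch of that arithmetic, with hypothetical helper names `speedup` and `relative`:

```python
# Assumption: "Base" is the row labelled Base in the table
# (NHWC | TVM | Fallback | RV32GCP for Audio Wake Words).
BASE = {"cycles": 26289773, "rom": 113910, "ram": 59484}


def speedup(base_cycles: int, cycles: int) -> float:
    """Speedup over the Base row: fewer cycles gives a value above 1."""
    return base_cycles / cycles


def relative(base_value: int, value: int) -> float:
    """Memory footprint relative to the Base row: smaller gives a value below 1."""
    return value / base_value


# First table row: NCHW | TVM | Fallback | RV32GC
row = {"cycles": 31536718, "rom": 107754, "ram": 59536}

print(f"{speedup(BASE['cycles'], row['cycles']):.1f}x")  # table shows 0.8x
print(round(relative(BASE["rom"], row["rom"]), 3))       # table shows 0.946
print(round(relative(BASE["ram"], row["ram"]), 3))       # table shows 1.001
```

Under this reading, a speedup below 1.0x means the configuration needs more cycles than the Base row, while a relative memory value below 1.0 means it needs less ROM or RAM.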
Image Classification (resnet)
Cycles (Speedup) | Total ROM (rel.) | Total RAM (rel.) | VLEN | Layout | Kernels | Mode | Arch | Auto-Vectorization |
---|---|---|---|---|---|---|---|---|
132014570 ( 0.8x ) | 217002 ( 0.987 ) | 108448 ( 1.0 ) | 0 | NCHW | TVM | Fallback | RV32GC | - |
115757507 ( 0.9x ) | 207910 ( 0.945 ) | 108448 ( 1.0 ) | 0 | NHWC | TVM | Fallback | RV32GC | - |
57193829 ( 1.8x ) | 211090 ( 0.96 ) | 92264 ( 0.851 ) | 0 | NCHW | TVM | Autotuned | RV32GC | - |
115753589 ( 0.9x ) | 208012 ( 0.946 ) | 108448 ( 1.0 ) | 0 | NHWC | TVM | Autotuned | RV32GC | - |
132147940 ( 0.8x ) | 216994 ( 0.987 ) | 108448 ( 1.0 ) | 128 | NCHW | TVM | Fallback | RV32GCV | Loop+SLP |
132147940 ( 0.8x ) | 216992 ( 0.987 ) | 108448 ( 1.0 ) | 1024 | NCHW | TVM | Fallback | RV32GCV | Loop+SLP |
132147940 ( 0.8x ) | 216994 ( 0.987 ) | 108448 ( 1.0 ) | 4096 | NCHW | TVM | Fallback | RV32GCV | Loop+SLP |
115757501 ( 0.9x ) | 207908 ( 0.945 ) | 108448 ( 1.0 ) | 128 | NHWC | TVM | Fallback | RV32GCV | Loop+SLP |
115757501 ( 0.9x ) | 207910 ( 0.945 ) | 108448 ( 1.0 ) | 1024 | NHWC | TVM | Fallback | RV32GCV | Loop+SLP |
115757501 ( 0.9x ) | 207910 ( 0.945 ) | 108448 ( 1.0 ) | 4096 | NHWC | TVM | Fallback | RV32GCV | Loop+SLP |
57193823 ( 1.8x ) | 211096 ( 0.96 ) | 92264 ( 0.851 ) | 128 | NCHW | TVM | Autotuned | RV32GCV | Loop+SLP |
57193823 ( 1.8x ) | 211098 ( 0.96 ) | 92264 ( 0.851 ) | 1024 | NCHW | TVM | Autotuned | RV32GCV | Loop+SLP |
57193823 ( 1.8x ) | 211098 ( 0.96 ) | 92264 ( 0.851 ) | 4096 | NCHW | TVM | Autotuned | RV32GCV | Loop+SLP |
115753583 ( 0.9x ) | 208012 ( 0.946 ) | 108448 ( 1.0 ) | 128 | NHWC | TVM | Autotuned | RV32GCV | Loop+SLP |
115753583 ( 0.9x ) | 208012 ( 0.946 ) | 108448 ( 1.0 ) | 1024 | NHWC | TVM | Autotuned | RV32GCV | Loop+SLP |
115753583 ( 0.9x ) | 208014 ( 0.946 ) | 108448 ( 1.0 ) | 4096 | NHWC | TVM | Autotuned | RV32GCV | Loop+SLP |
134393924 ( 0.8x ) | 229126 ( 1.042 ) | 108396 ( 1.0 ) | 0 | NCHW | TVM | Fallback | RV32GCP | - |
102349260 ( Base ) | 219946 ( Base ) | 108396 ( Base ) | 0 | NHWC | TVM | Fallback | RV32GCP | - |
44503696 ( 2.3x ) | 222790 ( 1.013 ) | 92212 ( 0.851 ) | 0 | NCHW | TVM | Autotuned | RV32GCP | - |
102345331 ( 1.0x ) | 220022 ( 1.0 ) | 108396 ( 1.0 ) | 0 | NHWC | TVM | Autotuned | RV32GCP | - |
81013965 ( 1.3x ) | 134332 ( 0.611 ) | 55528 ( 0.512 ) | 0 | NHWC | muRISCV-NN | Scalar | RV32GC | - |
81013965 ( 1.3x ) | 134332 ( 0.611 ) | 55528 ( 0.512 ) | 128 | NHWC | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
81013965 ( 1.3x ) | 134332 ( 0.611 ) | 55528 ( 0.512 ) | 1024 | NHWC | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
81013965 ( 1.3x ) | 134332 ( 0.611 ) | 55528 ( 0.512 ) | 4096 | NHWC | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
16854114 ( 6.1x ) | 135702 ( 0.617 ) | 55528 ( 0.512 ) | 128 | NHWC | muRISCV-NN | Vector | RV32GCV | - |
6689442 ( 15.3x ) | 135702 ( 0.617 ) | 55528 ( 0.512 ) | 1024 | NHWC | muRISCV-NN | Vector | RV32GCV | - |
5419400 ( 18.9x ) | 135702 ( 0.617 ) | 55528 ( 0.512 ) | 4096 | NHWC | muRISCV-NN | Vector | RV32GCV | - |
62991355 ( 1.6x ) | 146266 ( 0.665 ) | 55476 ( 0.512 ) | 0 | NHWC | muRISCV-NN | Scalar | RV32GCP | - |
68441624 ( 1.5x ) | 149084 ( 0.678 ) | 55476 ( 0.512 ) | 0 | NHWC | muRISCV-NN | Packed | RV32GCP | - |
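The muRISCV-NN Vector rows for this model are the only ones in the table whose cycle counts change with VLEN, and the gain flattens as the vector length grows. The short Python sketch below quantifies that using only the three cycle counts reported above; the helper name `scaling` is hypothetical and the choice of VLEN=128 as the reference point is an assumption made for illustration.

```python
# Cycle counts of the muRISCV-NN Vector rows (RV32GCV) for Image Classification,
# keyed by VLEN in bits, copied from the table above.
resnet_vector_cycles = {128: 16854114, 1024: 6689442, 4096: 5419400}


def scaling(cycles_by_vlen: dict[int, int]) -> dict[int, float]:
    """Return the cycle reduction of each VLEN relative to the smallest VLEN."""
    vlens = sorted(cycles_by_vlen)
    reference = cycles_by_vlen[vlens[0]]
    return {vlen: reference / cycles_by_vlen[vlen] for vlen in vlens}


for vlen, gain in scaling(resnet_vector_cycles).items():
    # Compare the measured gain with how much wider the vector registers are.
    print(f"VLEN={vlen}: {gain:.2f}x fewer cycles, registers {vlen // 128}x wider")
```

With these numbers, registers that are 8x wider reduce the cycle count by roughly 2.5x, and 32x wider registers by roughly 3.1x, so the benefit of a longer VLEN is clearly sub-linear for this model on the simulated target.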
Anomaly Detection (toycar)
Cycles (Speedup) | Total ROM (rel.) | Total RAM (rel.) | VLEN | Layout | Kernels | Mode | Arch | Auto-Vectorization |
---|---|---|---|---|---|---|---|---|
3230152 ( 0.9x ) | 580142 ( 0.979 ) | 5580 ( 1.009 ) | 0 | NCHW | TVM | Fallback | RV32GC | - |
3230152 ( 0.9x ) | 580142 ( 0.979 ) | 5580 ( 1.009 ) | 0 | NHWC | TVM | Fallback | RV32GC | - |
2147464 ( 1.3x ) | 609314 ( 1.028 ) | 6892 ( 1.247 ) | 0 | NCHW | TVM | Autotuned | RV32GC | - |
2147464 ( 1.3x ) | 609314 ( 1.028 ) | 6892 ( 1.247 ) | 0 | NHWC | TVM | Autotuned | RV32GC | - |
3230152 ( 0.9x ) | 580130 ( 0.979 ) | 5580 ( 1.009 ) | 128 | NCHW | TVM | Fallback | RV32GCV | Loop+SLP |
3230152 ( 0.9x ) | 580130 ( 0.979 ) | 5580 ( 1.009 ) | 1024 | NCHW | TVM | Fallback | RV32GCV | Loop+SLP |
3230152 ( 0.9x ) | 580130 ( 0.979 ) | 5580 ( 1.009 ) | 4096 | NCHW | TVM | Fallback | RV32GCV | Loop+SLP |
3230152 ( 0.9x ) | 580130 ( 0.979 ) | 5580 ( 1.009 ) | 128 | NHWC | TVM | Fallback | RV32GCV | Loop+SLP |
3230152 ( 0.9x ) | 580130 ( 0.979 ) | 5580 ( 1.009 ) | 1024 | NHWC | TVM | Fallback | RV32GCV | Loop+SLP |
3230152 ( 0.9x ) | 580130 ( 0.979 ) | 5580 ( 1.009 ) | 4096 | NHWC | TVM | Fallback | RV32GCV | Loop+SLP |
2147464 ( 1.3x ) | 609314 ( 1.028 ) | 6892 ( 1.247 ) | 128 | NCHW | TVM | Autotuned | RV32GCV | Loop+SLP |
2147464 ( 1.3x ) | 609314 ( 1.028 ) | 6892 ( 1.247 ) | 1024 | NCHW | TVM | Autotuned | RV32GCV | Loop+SLP |
2147464 ( 1.3x ) | 609314 ( 1.028 ) | 6892 ( 1.247 ) | 4096 | NCHW | TVM | Autotuned | RV32GCV | Loop+SLP |
2147464 ( 1.3x ) | 609314 ( 1.028 ) | 6892 ( 1.247 ) | 128 | NHWC | TVM | Autotuned | RV32GCV | Loop+SLP |
2147464 ( 1.3x ) | 609314 ( 1.028 ) | 6892 ( 1.247 ) | 1024 | NHWC | TVM | Autotuned | RV32GCV | Loop+SLP |
2147464 ( 1.3x ) | 609314 ( 1.028 ) | 6892 ( 1.247 ) | 4096 | NHWC | TVM | Autotuned | RV32GCV | Loop+SLP |
2867860 ( 1.0x ) | 592550 ( 1.0 ) | 5528 ( 1.0 ) | 0 | NCHW | TVM | Fallback | RV32GCP | - |
2867860 ( Base ) | 592550 ( Base ) | 5528 ( Base ) | 0 | NHWC | TVM | Fallback | RV32GCP | - |
1769706 ( 1.6x ) | 618794 ( 1.044 ) | 6840 ( 1.237 ) | 0 | NCHW | TVM | Autotuned | RV32GCP | - |
1769706 ( 1.6x ) | 618794 ( 1.044 ) | 6840 ( 1.237 ) | 0 | NHWC | TVM | Autotuned | RV32GCP | - |
1820385 ( 1.6x ) | 315402 ( 0.532 ) | 4780 ( 0.865 ) | 0 | NHWC | muRISCV-NN | Scalar | RV32GC | - |
1820385 ( 1.6x ) | 315402 ( 0.532 ) | 4780 ( 0.865 ) | 128 | NHWC | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
1820385 ( 1.6x ) | 315402 ( 0.532 ) | 4780 ( 0.865 ) | 1024 | NHWC | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
1820385 ( 1.6x ) | 315402 ( 0.532 ) | 4780 ( 0.865 ) | 4096 | NHWC | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
627187 ( 4.6x ) | 315802 ( 0.533 ) | 4780 ( 0.865 ) | 128 | NHWC | muRISCV-NN | Vector | RV32GCV | - |
420379 ( 6.8x ) | 315802 ( 0.533 ) | 4780 ( 0.865 ) | 1024 | NHWC | muRISCV-NN | Vector | RV32GCV | - |
414811 ( 6.9x ) | 315802 ( 0.533 ) | 4780 ( 0.865 ) | 4096 | NHWC | muRISCV-NN | Vector | RV32GCV | - |
1656904 ( 1.7x ) | 327620 ( 0.553 ) | 4728 ( 0.855 ) | 0 | NHWC | muRISCV-NN | Scalar | RV32GCP | - |
996956 ( 2.9x ) | 327838 ( 0.553 ) | 4728 ( 0.855 ) | 0 | NHWC | muRISCV-NN | Packed | RV32GCP | - |
Visual Wake Words (vww)
Cycles (Speedup) | Total ROM (rel.) | Total RAM (rel.) | VLEN | Layout | Kernels | Mode | Arch | Auto-Vectorization |
---|---|---|---|---|---|---|---|---|
91779188 ( 0.8x ) | 540914 ( 1.017 ) | 181056 ( 1.0 ) | 0 | NCHW | TVM | Fallback | RV32GC | - |
86076684 ( 0.9x ) | 520542 ( 0.979 ) | 181056 ( 1.0 ) | 0 | NHWC | TVM | Fallback | RV32GC | - |
46542367 ( 1.6x ) | 521918 ( 0.981 ) | 181056 ( 1.0 ) | 0 | NCHW | TVM | Autotuned | RV32GC | - |
86076687 ( 0.9x ) | 520544 ( 0.979 ) | 181056 ( 1.0 ) | 0 | NHWC | TVM | Autotuned | RV32GC | - |
91775924 ( 0.8x ) | 540820 ( 1.017 ) | 181056 ( 1.0 ) | 128 | NCHW | TVM | Fallback | RV32GCV | Loop+SLP |
91775924 ( 0.8x ) | 540820 ( 1.017 ) | 181056 ( 1.0 ) | 1024 | NCHW | TVM | Fallback | RV32GCV | Loop+SLP |
91775924 ( 0.8x ) | 540820 ( 1.017 ) | 181056 ( 1.0 ) | 4096 | NCHW | TVM | Fallback | RV32GCV | Loop+SLP |
86076684 ( 0.9x ) | 520542 ( 0.979 ) | 181056 ( 1.0 ) | 128 | NHWC | TVM | Fallback | RV32GCV | Loop+SLP |
86076684 ( 0.9x ) | 520542 ( 0.979 ) | 181056 ( 1.0 ) | 1024 | NHWC | TVM | Fallback | RV32GCV | Loop+SLP |
86076684 ( 0.9x ) | 520542 ( 0.979 ) | 181056 ( 1.0 ) | 4096 | NHWC | TVM | Fallback | RV32GCV | Loop+SLP |
46557547 ( 1.6x ) | 521910 ( 0.981 ) | 181056 ( 1.0 ) | 128 | NCHW | TVM | Autotuned | RV32GCV | Loop+SLP |
46557547 ( 1.6x ) | 521912 ( 0.981 ) | 181056 ( 1.0 ) | 1024 | NCHW | TVM | Autotuned | RV32GCV | Loop+SLP |
46557547 ( 1.6x ) | 521912 ( 0.981 ) | 181056 ( 1.0 ) | 4096 | NCHW | TVM | Autotuned | RV32GCV | Loop+SLP |
86076687 ( 0.9x ) | 520544 ( 0.979 ) | 181056 ( 1.0 ) | 128 | NHWC | TVM | Autotuned | RV32GCV | Loop+SLP |
86076687 ( 0.9x ) | 520544 ( 0.979 ) | 181056 ( 1.0 ) | 1024 | NHWC | TVM | Autotuned | RV32GCV | Loop+SLP |
86076687 ( 0.9x ) | 520544 ( 0.979 ) | 181056 ( 1.0 ) | 4096 | NHWC | TVM | Autotuned | RV32GCV | Loop+SLP |
81680879 ( 0.9x ) | 551532 ( 1.037 ) | 181004 ( 1.0 ) | 0 | NCHW | TVM | Fallback | RV32GCP | - |
76410709 ( Base ) | 531796 ( Base ) | 181004 ( Base ) | 0 | NHWC | TVM | Fallback | RV32GCP | - |
37728221 ( 2.0x ) | 533010 ( 1.002 ) | 181004 ( 1.0 ) | 0 | NCHW | TVM | Autotuned | RV32GCP | - |
76410712 ( 1.0x ) | 531800 ( 1.0 ) | 181004 ( 1.0 ) | 0 | NHWC | TVM | Autotuned | RV32GCP | - |
49801852 ( 1.5x ) | 318910 ( 0.6 ) | 85672 ( 0.473 ) | 0 | NHWC | muRISCV-NN | Scalar | RV32GC | - |
49801852 ( 1.5x ) | 318910 ( 0.6 ) | 85672 ( 0.473 ) | 128 | NHWC | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
49801852 ( 1.5x ) | 318910 ( 0.6 ) | 85672 ( 0.473 ) | 1024 | NHWC | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
49801852 ( 1.5x ) | 318910 ( 0.6 ) | 85672 ( 0.473 ) | 4096 | NHWC | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
14992375 ( 5.1x ) | 320222 ( 0.602 ) | 85672 ( 0.473 ) | 128 | NHWC | muRISCV-NN | Vector | RV32GCV | - |
9578286 ( 8.0x ) | 320222 ( 0.602 ) | 85672 ( 0.473 ) | 1024 | NHWC | muRISCV-NN | Vector | RV32GCV | - |
9531013 ( 8.0x ) | 320222 ( 0.602 ) | 85672 ( 0.473 ) | 4096 | NHWC | muRISCV-NN | Vector | RV32GCV | - |
40880383 ( 1.9x ) | 330644 ( 0.622 ) | 85620 ( 0.473 ) | 0 | NHWC | muRISCV-NN | Scalar | RV32GCP | - |
49304040 ( 1.5x ) | 333096 ( 0.626 ) | 85620 ( 0.473 ) | 0 | NHWC | muRISCV-NN | Packed | RV32GCP | - |
Original data
Click here to download the raw files for this benchmark.