Benchmarks 2024 02 23 TVM LLVM - tum-ei-eda/muriscv-nn GitHub Wiki
Setup
Simulator
- Spike (riscv-isa-sim) (ISS, CPI=1)
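Since Spike is an instruction set simulator and the setup fixes CPI=1, the cycle counts reported below should correspond one-to-one to executed instructions rather than to a timing-accurate hardware model. A minimal sketch of turning such a count into a rough latency estimate follows; the function name and the 100 MHz clock are purely illustrative assumptions and are not part of the benchmark setup.

```python
def estimated_latency_ms(instructions: int, clock_hz: float, cpi: float = 1.0) -> float:
    """Rough latency estimate from an instruction count.

    cycles = instructions * cpi; with CPI=1 (as in this setup) the two are equal.
    clock_hz is a hypothetical target frequency, not a measured value.
    """
    cycles = instructions * cpi
    return cycles / clock_hz * 1e3  # seconds -> milliseconds


# Example: the muRISCV-NN scalar baseline for aww (15615253 cycles)
# on an assumed 100 MHz core.
print(f"{estimated_latency_ms(15615253, 100e6):.2f} ms")
```

Real hardware will have a different (usually higher, and non-constant) CPI, so such numbers are only useful for relative comparisons.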
Toolchains
- LLVM/Clang:
- TODO: Version
- Linker: lld (TODO)
- RISC-V GCC for headers, libc, ...
Models
- MLPerfTiny Benchmark
- TODO: others!
Package Versions
- MLonMCU : main
- TVM : Nightly Pre-Build
- Spike : 0bc176b3fca43560b9e8586cdbc41cfde073e17a
- Spike PK : 7e9b671c0415dfd7b562ac934feb9380075d4aa2
Miscellaneous
- Used the `-Os` flag for compilation.
- Benchmarks were generated using the MLonMCU deployment tool with minimal effort.
- Memory metrics are reported in bytes.
Results (Framework: tvm, Backend: tvmaot, Toolchain: llvm)
Audio Wake Words (aww)
Cycles (Speedup) | Total ROM (rel.) | Total RAM (rel.) | VLEN | Layout | Kernels | Mode | Arch | Auto-Vectorization |
---|---|---|---|---|---|---|---|---|
33508602 ( 0.5x ) | 109102 ( 1.202 ) | 59508 ( 3.097 ) | 0 | NCHW | TVM | Fallback | RV32GC | - |
27511073 ( 0.6x ) | 102506 ( 1.13 ) | 59508 ( 3.097 ) | 0 | NHWC | TVM | Fallback | RV32GC | - |
13706216 ( 1.1x ) | 102504 ( 1.13 ) | 51336 ( 2.672 ) | 0 | NCHW | TVM | Autotuned | RV32GC | - |
27505623 ( 0.6x ) | 102618 ( 1.131 ) | 59508 ( 3.097 ) | 0 | NHWC | TVM | Autotuned | RV32GC | - |
3384606 ( 4.6x ) | 105404 ( 1.162 ) | 59508 ( 3.097 ) | 128 | NCHW | TVM | Fallback | RV32GCV | Loop+SLP |
3384606 ( 4.6x ) | 105404 ( 1.162 ) | 59508 ( 3.097 ) | 1024 | NCHW | TVM | Fallback | RV32GCV | Loop+SLP |
3384606 ( 4.6x ) | 105404 ( 1.162 ) | 59508 ( 3.097 ) | 4096 | NCHW | TVM | Fallback | RV32GCV | Loop+SLP |
9566669 ( 1.6x ) | 103606 ( 1.142 ) | 59508 ( 3.097 ) | 128 | NHWC | TVM | Fallback | RV32GCV | Loop+SLP |
6682669 ( 2.3x ) | 103606 ( 1.142 ) | 59508 ( 3.097 ) | 1024 | NHWC | TVM | Fallback | RV32GCV | Loop+SLP |
6682669 ( 2.3x ) | 103606 ( 1.142 ) | 59508 ( 3.097 ) | 4096 | NHWC | TVM | Fallback | RV32GCV | Loop+SLP |
5607715 ( 2.8x ) | 106776 ( 1.177 ) | 51336 ( 2.672 ) | 128 | NCHW | TVM | Autotuned | RV32GCV | Loop+SLP |
3881984 ( 4.0x ) | 106776 ( 1.177 ) | 51336 ( 2.672 ) | 1024 | NCHW | TVM | Autotuned | RV32GCV | Loop+SLP |
3888762 ( 4.0x ) | 106776 ( 1.177 ) | 51336 ( 2.672 ) | 4096 | NCHW | TVM | Autotuned | RV32GCV | Loop+SLP |
9565134 ( 1.6x ) | 104544 ( 1.152 ) | 59508 ( 3.097 ) | 128 | NHWC | TVM | Autotuned | RV32GCV | Loop+SLP |
6683403 ( 2.3x ) | 104544 ( 1.152 ) | 59508 ( 3.097 ) | 1024 | NHWC | TVM | Autotuned | RV32GCV | Loop+SLP |
6690181 ( 2.3x ) | 104544 ( 1.152 ) | 59508 ( 3.097 ) | 4096 | NHWC | TVM | Autotuned | RV32GCV | Loop+SLP |
15615253 ( Base ) | 90744 ( Base ) | 19212 ( Base ) | 0 | NHWC | muRISCV-NN | Scalar | RV32GC | - |
6838472 ( 2.3x ) | 93956 ( 1.035 ) | 19212 ( 1.0 ) | 128 | NHWC | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
5367945 ( 2.9x ) | 93956 ( 1.035 ) | 19212 ( 1.0 ) | 1024 | NHWC | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
5374723 ( 2.9x ) | 93956 ( 1.035 ) | 19212 ( 1.0 ) | 4096 | NHWC | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
5393590 ( 2.9x ) | 90936 ( 1.002 ) | 23676 ( 1.232 ) | 128 | NHWC | muRISCV-NN | Vector | RV32GCV | - |
3650062 ( 4.3x ) | 90936 ( 1.002 ) | 23676 ( 1.232 ) | 1024 | NHWC | muRISCV-NN | Vector | RV32GCV | - |
3656840 ( 4.3x ) | 90936 ( 1.002 ) | 23676 ( 1.232 ) | 4096 | NHWC | muRISCV-NN | Vector | RV32GCV | - |
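The parenthesised values in each results table appear to be derived from the row marked "Base" (muRISCV-NN, Scalar, RV32GC): speedup as base cycles divided by row cycles, and the ROM/RAM ratios as row bytes divided by base bytes. A minimal sketch of that arithmetic, assuming this reading of the columns and plain decimal rounding (the helper name and dictionary keys are illustrative):

```python
def relative_metrics(row: dict, base: dict) -> tuple:
    """Return (speedup, rom_rel, ram_rel) of `row` relative to `base`."""
    speedup = base["cycles"] / row["cycles"]  # > 1 means fewer cycles than the base
    rom_rel = row["rom"] / base["rom"]        # > 1 means more ROM than the base
    ram_rel = row["ram"] / base["ram"]        # > 1 means more RAM than the base
    return round(speedup, 1), round(rom_rel, 3), round(ram_rel, 3)


# Values taken from the Audio Wake Words (aww) table.
base = {"cycles": 15615253, "rom": 90744, "ram": 19212}
tvm_nchw_fallback = {"cycles": 33508602, "rom": 109102, "ram": 59508}

print(relative_metrics(tvm_nchw_fallback, base))  # (0.5, 1.202, 3.097)
```

Note that the baseline is a scalar muRISCV-NN build, so a speedup below 1.0x means the configuration is slower than hand-written scalar kernels.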
Image Classification (resnet)
Cycles (Speedup) | Total ROM (rel.) | Total RAM (rel.) | VLEN | Layout | Kernels | Mode | Arch | Auto-Vectorization |
---|---|---|---|---|---|---|---|---|
144789099 ( 0.4x ) | 218266 ( 1.579 ) | 108420 ( 1.953 ) | 0 | NCHW | TVM | Fallback | RV32GC | - |
112394402 ( 0.5x ) | 209182 ( 1.514 ) | 108420 ( 1.953 ) | 0 | NHWC | TVM | Fallback | RV32GC | - |
53701816 ( 1.1x ) | 212580 ( 1.538 ) | 92236 ( 1.661 ) | 0 | NCHW | TVM | Autotuned | RV32GC | - |
112389724 ( 0.5x ) | 209256 ( 1.514 ) | 108420 ( 1.953 ) | 0 | NHWC | TVM | Autotuned | RV32GC | - |
12825991 ( 4.6x ) | 213678 ( 1.546 ) | 108420 ( 1.953 ) | 128 | NCHW | TVM | Fallback | RV32GCV | Loop+SLP |
12825991 ( 4.6x ) | 213678 ( 1.546 ) | 108420 ( 1.953 ) | 1024 | NCHW | TVM | Fallback | RV32GCV | Loop+SLP |
12825993 ( 4.6x ) | 213686 ( 1.546 ) | 108420 ( 1.953 ) | 4096 | NCHW | TVM | Fallback | RV32GCV | Loop+SLP |
36071695 ( 1.6x ) | 210194 ( 1.521 ) | 108420 ( 1.953 ) | 128 | NHWC | TVM | Fallback | RV32GCV | Loop+SLP |
24311825 ( 2.4x ) | 210202 ( 1.521 ) | 108420 ( 1.953 ) | 1024 | NHWC | TVM | Fallback | RV32GCV | Loop+SLP |
24311825 ( 2.4x ) | 210202 ( 1.521 ) | 108420 ( 1.953 ) | 4096 | NHWC | TVM | Fallback | RV32GCV | Loop+SLP |
19686221 ( 3.0x ) | 226234 ( 1.637 ) | 92236 ( 1.661 ) | 128 | NCHW | TVM | Autotuned | RV32GCV | Loop+SLP |
13965591 ( 4.2x ) | 226232 ( 1.637 ) | 92236 ( 1.661 ) | 1024 | NCHW | TVM | Autotuned | RV32GCV | Loop+SLP |
14023204 ( 4.2x ) | 226232 ( 1.637 ) | 92236 ( 1.661 ) | 4096 | NCHW | TVM | Autotuned | RV32GCV | Loop+SLP |
36069954 ( 1.6x ) | 210930 ( 1.526 ) | 108420 ( 1.953 ) | 128 | NHWC | TVM | Autotuned | RV32GCV | Loop+SLP |
24309151 ( 2.4x ) | 210930 ( 1.526 ) | 108420 ( 1.953 ) | 1024 | NHWC | TVM | Autotuned | RV32GCV | Loop+SLP |
24315931 ( 2.4x ) | 210938 ( 1.526 ) | 108420 ( 1.953 ) | 4096 | NHWC | TVM | Autotuned | RV32GCV | Loop+SLP |
58407308 ( Base ) | 138192 ( Base ) | 55516 ( Base ) | 0 | NHWC | muRISCV-NN | Scalar | RV32GC | - |
28255208 ( 2.1x ) | 141914 ( 1.027 ) | 55516 ( 1.0 ) | 128 | NHWC | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
13704813 ( 4.3x ) | 141914 ( 1.027 ) | 55516 ( 1.0 ) | 1024 | NHWC | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
12134631 ( 4.8x ) | 141914 ( 1.027 ) | 55516 ( 1.0 ) | 4096 | NHWC | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
15905989 ( 3.7x ) | 139024 ( 1.006 ) | 55516 ( 1.0 ) | 128 | NHWC | muRISCV-NN | Vector | RV32GCV | - |
6531691 ( 8.9x ) | 139024 ( 1.006 ) | 55516 ( 1.0 ) | 1024 | NHWC | muRISCV-NN | Vector | RV32GCV | - |
5360777 ( 10.9x ) | 139024 ( 1.006 ) | 55516 ( 1.0 ) | 4096 | NHWC | muRISCV-NN | Vector | RV32GCV | - |
Anomaly Detection (toycar)
Cycles (Speedup) | Total ROM (rel.) | Total RAM (rel.) | VLEN | Layout | Kernels | Mode | Arch | Auto-Vectorization |
---|---|---|---|---|---|---|---|---|
3404880 ( 0.5x ) | 581362 ( 1.84 ) | 5572 ( 1.168 ) | 0 | NCHW | TVM | Fallback | RV32GC | - |
3404882 ( 0.5x ) | 581370 ( 1.84 ) | 5572 ( 1.168 ) | 0 | NHWC | TVM | Fallback | RV32GC | - |
2245737 ( 0.8x ) | 609080 ( 1.928 ) | 6884 ( 1.443 ) | 0 | NCHW | TVM | Autotuned | RV32GC | - |
2245737 ( 0.8x ) | 609080 ( 1.928 ) | 6884 ( 1.443 ) | 0 | NHWC | TVM | Autotuned | RV32GC | - |
984695 ( 1.8x ) | 581106 ( 1.839 ) | 5572 ( 1.168 ) | 128 | NCHW | TVM | Fallback | RV32GCV | Loop+SLP |
984695 ( 1.8x ) | 581106 ( 1.839 ) | 5572 ( 1.168 ) | 1024 | NCHW | TVM | Fallback | RV32GCV | Loop+SLP |
984695 ( 1.8x ) | 581106 ( 1.839 ) | 5572 ( 1.168 ) | 4096 | NCHW | TVM | Fallback | RV32GCV | Loop+SLP |
984693 ( 1.8x ) | 581098 ( 1.839 ) | 5572 ( 1.168 ) | 128 | NHWC | TVM | Fallback | RV32GCV | Loop+SLP |
984695 ( 1.8x ) | 581106 ( 1.839 ) | 5572 ( 1.168 ) | 1024 | NHWC | TVM | Fallback | RV32GCV | Loop+SLP |
984695 ( 1.8x ) | 581106 ( 1.839 ) | 5572 ( 1.168 ) | 4096 | NHWC | TVM | Fallback | RV32GCV | Loop+SLP |
1280619 ( 1.3x ) | 600432 ( 1.9 ) | 6884 ( 1.443 ) | 128 | NCHW | TVM | Autotuned | RV32GCV | Loop+SLP |
1148032 ( 1.5x ) | 600432 ( 1.9 ) | 6884 ( 1.443 ) | 1024 | NCHW | TVM | Autotuned | RV32GCV | Loop+SLP |
1151904 ( 1.5x ) | 600432 ( 1.9 ) | 6884 ( 1.443 ) | 4096 | NCHW | TVM | Autotuned | RV32GCV | Loop+SLP |
1280619 ( 1.3x ) | 600432 ( 1.9 ) | 6884 ( 1.443 ) | 128 | NHWC | TVM | Autotuned | RV32GCV | Loop+SLP |
1148032 ( 1.5x ) | 600432 ( 1.9 ) | 6884 ( 1.443 ) | 1024 | NHWC | TVM | Autotuned | RV32GCV | Loop+SLP |
1151904 ( 1.5x ) | 600432 ( 1.9 ) | 6884 ( 1.443 ) | 4096 | NHWC | TVM | Autotuned | RV32GCV | Loop+SLP |
1724220 ( Base ) | 315978 ( Base ) | 4772 ( Base ) | 0 | NHWC | muRISCV-NN | Scalar | RV32GC | - |
620488 ( 2.8x ) | 316614 ( 1.002 ) | 4772 ( 1.0 ) | 128 | NHWC | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
421126 ( 4.1x ) | 316610 ( 1.002 ) | 4772 ( 1.0 ) | 1024 | NHWC | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
405104 ( 4.3x ) | 316614 ( 1.002 ) | 4772 ( 1.0 ) | 4096 | NHWC | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
618527 ( 2.8x ) | 316520 ( 1.002 ) | 4772 ( 1.0 ) | 128 | NHWC | muRISCV-NN | Vector | RV32GCV | - |
414660 ( 4.2x ) | 316518 ( 1.002 ) | 4772 ( 1.0 ) | 1024 | NHWC | muRISCV-NN | Vector | RV32GCV | - |
409172 ( 4.2x ) | 316514 ( 1.002 ) | 4772 ( 1.0 ) | 4096 | NHWC | muRISCV-NN | Vector | RV32GCV | - |
Visual Wake Words (vww)
Cycles (Speedup) | Total ROM (rel.) | Total RAM (rel.) | VLEN | Layout | Kernels | Mode | Arch | Auto-Vectorization |
---|---|---|---|---|---|---|---|---|
96665970 ( 0.5x ) | 545172 ( 1.684 ) | 181032 ( 2.113 ) | 0 | NCHW | TVM | Fallback | RV32GC | - |
79940190 ( 0.6x ) | 521128 ( 1.61 ) | 181032 ( 2.113 ) | 0 | NHWC | TVM | Fallback | RV32GC | - |
42404606 ( 1.1x ) | 525210 ( 1.622 ) | 181032 ( 2.113 ) | 0 | NCHW | TVM | Autotuned | RV32GC | - |
79940190 ( 0.6x ) | 521128 ( 1.61 ) | 181032 ( 2.113 ) | 0 | NHWC | TVM | Autotuned | RV32GC | - |
11010119 ( 4.2x ) | 532508 ( 1.645 ) | 181032 ( 2.113 ) | 128 | NCHW | TVM | Fallback | RV32GCV | Loop+SLP |
11010121 ( 4.2x ) | 532510 ( 1.645 ) | 181032 ( 2.113 ) | 1024 | NCHW | TVM | Fallback | RV32GCV | Loop+SLP |
11010118 ( 4.2x ) | 532506 ( 1.645 ) | 181032 ( 2.113 ) | 4096 | NCHW | TVM | Fallback | RV32GCV | Loop+SLP |
30451802 ( 1.5x ) | 523638 ( 1.617 ) | 181032 ( 2.113 ) | 128 | NHWC | TVM | Fallback | RV32GCV | Loop+SLP |
22700929 ( 2.1x ) | 523636 ( 1.617 ) | 181032 ( 2.113 ) | 1024 | NHWC | TVM | Fallback | RV32GCV | Loop+SLP |
22307809 ( 2.1x ) | 523636 ( 1.617 ) | 181032 ( 2.113 ) | 4096 | NHWC | TVM | Fallback | RV32GCV | Loop+SLP |
24965514 ( 1.9x ) | 550736 ( 1.701 ) | 181032 ( 2.113 ) | 128 | NCHW | TVM | Autotuned | RV32GCV | Loop+SLP |
19204885 ( 2.4x ) | 550734 ( 1.701 ) | 181032 ( 2.113 ) | 1024 | NCHW | TVM | Autotuned | RV32GCV | Loop+SLP |
19039947 ( 2.5x ) | 550734 ( 1.701 ) | 181032 ( 2.113 ) | 4096 | NCHW | TVM | Autotuned | RV32GCV | Loop+SLP |
30450317 ( 1.5x ) | 523748 ( 1.618 ) | 181032 ( 2.113 ) | 128 | NHWC | TVM | Autotuned | RV32GCV | Loop+SLP |
22698969 ( 2.1x ) | 523748 ( 1.618 ) | 181032 ( 2.113 ) | 1024 | NHWC | TVM | Autotuned | RV32GCV | Loop+SLP |
22305797 ( 2.1x ) | 523746 ( 1.618 ) | 181032 ( 2.113 ) | 4096 | NHWC | TVM | Autotuned | RV32GCV | Loop+SLP |
46765906 ( Base ) | 323762 ( Base ) | 85664 ( Base ) | 0 | NHWC | muRISCV-NN | Scalar | RV32GC | - |
19684203 ( 2.4x ) | 327944 ( 1.013 ) | 85664 ( 1.0 ) | 128 | NHWC | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
14888473 ( 3.1x ) | 327948 ( 1.013 ) | 85664 ( 1.0 ) | 1024 | NHWC | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
14671295 ( 3.2x ) | 327946 ( 1.013 ) | 85664 ( 1.0 ) | 4096 | NHWC | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
14675146 ( 3.2x ) | 325036 ( 1.004 ) | 85664 ( 1.0 ) | 128 | NHWC | muRISCV-NN | Vector | RV32GCV | - |
9540606 ( 4.9x ) | 325036 ( 1.004 ) | 85664 ( 1.0 ) | 1024 | NHWC | muRISCV-NN | Vector | RV32GCV | - |
9498994 ( 4.9x ) | 325036 ( 1.004 ) | 85664 ( 1.0 ) | 4096 | NHWC | muRISCV-NN | Vector | RV32GCV | - |
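For further analysis it can help to turn a results row into a structured record. The sketch below assumes the nine-column layout named in the table header and a row whose cells have been joined onto a single pipe-separated line; the function and field names are illustrative and are not part of MLonMCU.

```python
import re

# Matches cells such as "9498994 ( 4.9x )" or "46765906 ( Base )".
METRIC_RE = re.compile(r"(\d+)\s*\(\s*(.+?)\s*\)")

FIELDS = ("vlen", "layout", "kernels", "mode", "arch", "auto_vectorization")


def parse_row(line: str) -> dict:
    """Parse one pipe-separated results row into a dictionary."""
    cells = [c.strip() for c in line.strip().rstrip("|").split("|")]
    if len(cells) != 9:
        raise ValueError(f"expected 9 cells, got {len(cells)}")
    record = {}
    for name, cell in zip(("cycles", "rom", "ram"), cells[:3]):
        match = METRIC_RE.fullmatch(cell)
        if match is None:
            raise ValueError(f"unexpected metric cell: {cell!r}")
        record[name] = int(match.group(1))        # absolute value
        record[name + "_rel"] = match.group(2)    # "4.9x", "1.004" or "Base"
    record.update(zip(FIELDS, cells[3:]))
    record["vlen"] = int(record["vlen"])          # 0 means no vector unit
    return record


row = ("9498994 ( 4.9x ) | 325036 ( 1.004 ) | 85664 ( 1.0 ) | "
       "4096 | NHWC | muRISCV-NN | Vector | RV32GCV | - |")
print(parse_row(row))
```

The relative values are kept as strings so that the "Base" marker survives parsing unchanged.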
Original data
Click here to download the raw files for this benchmark.