Benchmarks 2024 02 22 TVM LLVM - tum-ei-eda/muriscv-nn GitHub Wiki
Setup
Simulator
- Spike (riscv-isa-sim) (ISS, CPI=1)
Toolchains
- LLVM/Clang:
- TODO: Version
- Linker: lld (TODO)
- RISC-V GCC for Headers, libc,...
Models
- MLPerfTiny Benchmark
- TODO: others!
Package Versions
- MLonMCU: main
- TVM: Nightly Pre-Build
- Spike: 0bc176b3fca43560b9e8586cdbc41cfde073e17a
- Spike PK: 7e9b671c0415dfd7b562ac934feb9380075d4aa2
Miscellaneous
- Used the `-Os` flag for compilation.
- Benchmarks were generated using the MLonMCU deployment tool with minimal effort.
- Memory metrics are reported in Bytes
Results (Framework: tvm, Backend: tvmaot, Toolchain: llvm)
Audio Wake Words (aww)
Cycles (Speedup) | Total ROM (rel.) | Total RAM (rel.) | VLEN | Layout | Kernels | Mode | Arch | Auto-Vectorization |
---|---|---|---|---|---|---|---|---|
33508602 (0.3x) | 109102 (1.044) | 59508 (1.0) | 0 | NCHW | TVM | Fallback | RV32GC | - |
27511073 (0.3x) | 102506 (0.981) | 59508 (1.0) | 0 | NHWC | TVM | Fallback | RV32GC | - |
13706216 (0.7x) | 102504 (0.98) | 51336 (0.863) | 0 | NCHW | TVM | Autotuned | RV32GC | - |
27505623 (0.3x) | 102618 (0.982) | 59508 (1.0) | 0 | NHWC | TVM | Autotuned | RV32GC | - |
3384606 (2.8x) | 105404 (1.008) | 59508 (1.0) | 128 | NCHW | TVM | Fallback | RV32GCV | Loop+SLP |
3384606 (2.8x) | 105404 (1.008) | 59508 (1.0) | 1024 | NCHW | TVM | Fallback | RV32GCV | Loop+SLP |
3384606 (2.8x) | 105404 (1.008) | 59508 (1.0) | 4096 | NCHW | TVM | Fallback | RV32GCV | Loop+SLP |
9566669 (1.0x) | 103606 (0.991) | 59508 (1.0) | 128 | NHWC | TVM | Fallback | RV32GCV | Loop+SLP |
6682669 (1.4x) | 103606 (0.991) | 59508 (1.0) | 1024 | NHWC | TVM | Fallback | RV32GCV | Loop+SLP |
6682669 (1.4x) | 103606 (0.991) | 59508 (1.0) | 4096 | NHWC | TVM | Fallback | RV32GCV | Loop+SLP |
5607715 (1.7x) | 106776 (1.021) | 51336 (0.863) | 128 | NCHW | TVM | Autotuned | RV32GCV | Loop+SLP |
3881984 (2.5x) | 106776 (1.021) | 51336 (0.863) | 1024 | NCHW | TVM | Autotuned | RV32GCV | Loop+SLP |
3888762 (2.5x) | 106776 (1.021) | 51336 (0.863) | 4096 | NCHW | TVM | Autotuned | RV32GCV | Loop+SLP |
9565134 (Base) | 104544 (Base) | 59508 (Base) | 128 | NHWC | TVM | Autotuned | RV32GCV | Loop+SLP |
6683403 (1.4x) | 104544 (1.0) | 59508 (1.0) | 1024 | NHWC | TVM | Autotuned | RV32GCV | Loop+SLP |
6690181 (1.4x) | 104544 (1.0) | 59508 (1.0) | 4096 | NHWC | TVM | Autotuned | RV32GCV | Loop+SLP |
15615253 (0.6x) | 90744 (0.868) | 19212 (0.323) | 0 | NHWC | muRISCV-NN | Scalar | RV32GC | - |
6838472 (1.4x) | 93956 (0.899) | 19212 (0.323) | 128 | NHWC | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
5367945 (1.8x) | 93956 (0.899) | 19212 (0.323) | 1024 | NHWC | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
5374723 (1.8x) | 93956 (0.899) | 19212 (0.323) | 4096 | NHWC | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
5393590 (1.8x) | 90936 (0.87) | 23676 (0.398) | 128 | NHWC | muRISCV-NN | Vector | RV32GCV | - |
3650062 (2.6x) | 90936 (0.87) | 23676 (0.398) | 1024 | NHWC | muRISCV-NN | Vector | RV32GCV | - |
3656840 (2.6x) | 90936 (0.87) | 23676 (0.398) | 4096 | NHWC | muRISCV-NN | Vector | RV32GCV | - |
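The relative columns in these tables appear to be derived from the row marked "Base" (TVM, Autotuned, NHWC, VLEN=128). The page does not state the formulas, so the following is a minimal sketch under the assumption that speedup is base cycles divided by row cycles and that the ROM/RAM ratios are row value divided by base value. The function name `relative_metrics` and the dictionary layout are illustrative only and not part of MLonMCU.

```python
# Values copied from the Audio Wake Words (aww) table; the row marked "Base".
BASE = {"cycles": 9565134, "rom": 104544, "ram": 59508}


def relative_metrics(row, base=BASE):
    """Return (speedup, rel_rom, rel_ram) of a row against the base row.

    Assumed formulas (inferred from the table, not documented on the page):
      speedup = base cycles / row cycles   (> 1 means fewer cycles than base)
      rel_rom = row ROM / base ROM         (< 1 means smaller than base)
      rel_ram = row RAM / base RAM
    """
    speedup = base["cycles"] / row["cycles"]
    rel_rom = row["rom"] / base["rom"]
    rel_ram = row["ram"] / base["ram"]
    return round(speedup, 1), round(rel_rom, 3), round(rel_ram, 3)


# muRISCV-NN, Vector mode, VLEN=1024 row from the same table.
muriscvnn_vector_1024 = {"cycles": 3650062, "rom": 90936, "ram": 23676}
print(relative_metrics(muriscvnn_vector_1024))  # (2.6, 0.87, 0.398)
```

Under these assumptions the result matches the values printed in the table for that row (2.6x, 0.87, 0.398). Because Spike is used with CPI=1, the cycle counts are effectively instruction counts, so the speedups compare executed instructions rather than measured hardware time.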
Image Classification (resnet)
Cycles (Speedup) | Total ROM (rel.) | Total RAM (rel.) | VLEN | Layout | Kernels | Mode | Arch | Auto-Vectorization |
---|---|---|---|---|---|---|---|---|
144789098 (0.2x) | 218264 (1.035) | 108420 (1.0) | 0 | NCHW | TVM | Fallback | RV32GC | - |
112394400 (0.3x) | 209174 (0.992) | 108420 (1.0) | 0 | NHWC | TVM | Fallback | RV32GC | - |
53701816 (0.7x) | 212580 (1.008) | 92236 (0.851) | 0 | NCHW | TVM | Autotuned | RV32GC | - |
112389726 (0.3x) | 209264 (0.992) | 108420 (1.0) | 0 | NHWC | TVM | Autotuned | RV32GC | - |
12825990 (2.8x) | 213676 (1.013) | 108420 (1.0) | 128 | NCHW | TVM | Fallback | RV32GCV | Loop+SLP |
12825992 (2.8x) | 213684 (1.013) | 108420 (1.0) | 1024 | NCHW | TVM | Fallback | RV32GCV | Loop+SLP |
12825990 (2.8x) | 213676 (1.013) | 108420 (1.0) | 4096 | NCHW | TVM | Fallback | RV32GCV | Loop+SLP |
36071695 (1.0x) | 210194 (0.997) | 108420 (1.0) | 128 | NHWC | TVM | Fallback | RV32GCV | Loop+SLP |
24311825 (1.5x) | 210202 (0.997) | 108420 (1.0) | 1024 | NHWC | TVM | Fallback | RV32GCV | Loop+SLP |
24311825 (1.5x) | 210202 (0.997) | 108420 (1.0) | 4096 | NHWC | TVM | Fallback | RV32GCV | Loop+SLP |
19686221 (1.8x) | 226234 (1.073) | 92236 (0.851) | 128 | NCHW | TVM | Autotuned | RV32GCV | Loop+SLP |
13965592 (2.6x) | 226234 (1.073) | 92236 (0.851) | 1024 | NCHW | TVM | Autotuned | RV32GCV | Loop+SLP |
14023205 (2.6x) | 226234 (1.073) | 92236 (0.851) | 4096 | NCHW | TVM | Autotuned | RV32GCV | Loop+SLP |
36069954 (Base) | 210930 (Base) | 108420 (Base) | 128 | NHWC | TVM | Autotuned | RV32GCV | Loop+SLP |
24309153 (1.5x) | 210938 (1.0) | 108420 (1.0) | 1024 | NHWC | TVM | Autotuned | RV32GCV | Loop+SLP |
24315929 (1.5x) | 210930 (1.0) | 108420 (1.0) | 4096 | NHWC | TVM | Autotuned | RV32GCV | Loop+SLP |
58407308 (0.6x) | 138192 (0.655) | 55516 (0.512) | 0 | NHWC | muRISCV-NN | Scalar | RV32GC | - |
28255208 (1.3x) | 141914 (0.673) | 55516 (0.512) | 128 | NHWC | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
13704813 (2.6x) | 141914 (0.673) | 55516 (0.512) | 1024 | NHWC | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
12134631 (3.0x) | 141914 (0.673) | 55516 (0.512) | 4096 | NHWC | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
15905989 (2.3x) | 139024 (0.659) | 55516 (0.512) | 128 | NHWC | muRISCV-NN | Vector | RV32GCV | - |
6531691 (5.5x) | 139024 (0.659) | 55516 (0.512) | 1024 | NHWC | muRISCV-NN | Vector | RV32GCV | - |
5360777 (6.7x) | 139024 (0.659) | 55516 (0.512) | 4096 | NHWC | muRISCV-NN | Vector | RV32GCV | - |
Anomaly Detection (toycar)
Cycles (Speedup) | Total ROM (rel.) | Total RAM (rel.) | VLEN | Layout | Kernels | Mode | Arch | Auto-Vectorization |
---|---|---|---|---|---|---|---|---|
3404880 (0.4x) | 581362 (0.968) | 5572 (0.809) | 0 | NCHW | TVM | Fallback | RV32GC | - |
3404880 (0.4x) | 581362 (0.968) | 5572 (0.809) | 0 | NHWC | TVM | Fallback | RV32GC | - |
2245737 (0.6x) | 609080 (1.014) | 6884 (1.0) | 0 | NCHW | TVM | Autotuned | RV32GC | - |
2245737 (0.6x) | 609080 (1.014) | 6884 (1.0) | 0 | NHWC | TVM | Autotuned | RV32GC | - |
984695 (1.3x) | 581106 (0.968) | 5572 (0.809) | 128 | NCHW | TVM | Fallback | RV32GCV | Loop+SLP |
984695 (1.3x) | 581106 (0.968) | 5572 (0.809) | 1024 | NCHW | TVM | Fallback | RV32GCV | Loop+SLP |
984693 (1.3x) | 581098 (0.968) | 5572 (0.809) | 4096 | NCHW | TVM | Fallback | RV32GCV | Loop+SLP |
984695 (1.3x) | 581106 (0.968) | 5572 (0.809) | 128 | NHWC | TVM | Fallback | RV32GCV | Loop+SLP |
984693 (1.3x) | 581098 (0.968) | 5572 (0.809) | 1024 | NHWC | TVM | Fallback | RV32GCV | Loop+SLP |
984693 (1.3x) | 581098 (0.968) | 5572 (0.809) | 4096 | NHWC | TVM | Fallback | RV32GCV | Loop+SLP |
1280619 (1.0x) | 600432 (1.0) | 6884 (1.0) | 128 | NCHW | TVM | Autotuned | RV32GCV | Loop+SLP |
1148032 (1.1x) | 600432 (1.0) | 6884 (1.0) | 1024 | NCHW | TVM | Autotuned | RV32GCV | Loop+SLP |
1151904 (1.1x) | 600432 (1.0) | 6884 (1.0) | 4096 | NCHW | TVM | Autotuned | RV32GCV | Loop+SLP |
1280619 (Base) | 600432 (Base) | 6884 (Base) | 128 | NHWC | TVM | Autotuned | RV32GCV | Loop+SLP |
1148032 (1.1x) | 600432 (1.0) | 6884 (1.0) | 1024 | NHWC | TVM | Autotuned | RV32GCV | Loop+SLP |
1151904 (1.1x) | 600432 (1.0) | 6884 (1.0) | 4096 | NHWC | TVM | Autotuned | RV32GCV | Loop+SLP |
1724219 (0.7x) | 315976 (0.526) | 4772 (0.693) | 0 | NHWC | muRISCV-NN | Scalar | RV32GC | - |
620488 (2.1x) | 316614 (0.527) | 4772 (0.693) | 128 | NHWC | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
421128 (3.0x) | 316614 (0.527) | 4772 (0.693) | 1024 | NHWC | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
405102 (3.2x) | 316608 (0.527) | 4772 (0.693) | 4096 | NHWC | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
618525 (2.1x) | 316516 (0.527) | 4772 (0.693) | 128 | NHWC | muRISCV-NN | Vector | RV32GCV | - |
414660 (3.1x) | 316518 (0.527) | 4772 (0.693) | 1024 | NHWC | muRISCV-NN | Vector | RV32GCV | - |
409173 (3.1x) | 316518 (0.527) | 4772 (0.693) | 4096 | NHWC | muRISCV-NN | Vector | RV32GCV | - |
Visual Wake Words (vww)
Cycles (Speedup) | Total ROM (rel.) | Total RAM (rel.) | VLEN | Layout | Kernels | Mode | Arch | Auto-Vectorization |
---|---|---|---|---|---|---|---|---|
96665972 (0.3x) | 545176 (1.041) | 181032 (1.0) | 0 | NCHW | TVM | Fallback | RV32GC | - |
79940190 (0.4x) | 521128 (0.995) | 181032 (1.0) | 0 | NHWC | TVM | Fallback | RV32GC | - |
42404605 (0.7x) | 525206 (1.003) | 181032 (1.0) | 0 | NCHW | TVM | Autotuned | RV32GC | - |
79940189 (0.4x) | 521126 (0.995) | 181032 (1.0) | 0 | NHWC | TVM | Autotuned | RV32GC | - |
11010121 (2.8x) | 532512 (1.017) | 181032 (1.0) | 128 | NCHW | TVM | Fallback | RV32GCV | Loop+SLP |
11010119 (2.8x) | 532508 (1.017) | 181032 (1.0) | 1024 | NCHW | TVM | Fallback | RV32GCV | Loop+SLP |
11010120 (2.8x) | 532512 (1.017) | 181032 (1.0) | 4096 | NCHW | TVM | Fallback | RV32GCV | Loop+SLP |
30451802 (1.0x) | 523638 (1.0) | 181032 (1.0) | 128 | NHWC | TVM | Fallback | RV32GCV | Loop+SLP |
22700929 (1.3x) | 523638 (1.0) | 181032 (1.0) | 1024 | NHWC | TVM | Fallback | RV32GCV | Loop+SLP |
22307809 (1.4x) | 523638 (1.0) | 181032 (1.0) | 4096 | NHWC | TVM | Fallback | RV32GCV | Loop+SLP |
24965513 (1.2x) | 550722 (1.051) | 181032 (1.0) | 128 | NCHW | TVM | Autotuned | RV32GCV | Loop+SLP |
19204882 (1.6x) | 550722 (1.051) | 181032 (1.0) | 1024 | NCHW | TVM | Autotuned | RV32GCV | Loop+SLP |
19039942 (1.6x) | 550718 (1.051) | 181032 (1.0) | 4096 | NCHW | TVM | Autotuned | RV32GCV | Loop+SLP |
30450319 (Base) | 523750 (Base) | 181032 (Base) | 128 | NHWC | TVM | Autotuned | RV32GCV | Loop+SLP |
22698969 (1.3x) | 523748 (1.0) | 181032 (1.0) | 1024 | NHWC | TVM | Autotuned | RV32GCV | Loop+SLP |
22305797 (1.4x) | 523746 (1.0) | 181032 (1.0) | 4096 | NHWC | TVM | Autotuned | RV32GCV | Loop+SLP |
46765906 (0.7x) | 323762 (0.618) | 85664 (0.473) | 0 | NHWC | muRISCV-NN | Scalar | RV32GC | - |
19684204 (1.5x) | 327948 (0.626) | 85664 (0.473) | 128 | NHWC | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
14888472 (2.0x) | 327942 (0.626) | 85664 (0.473) | 1024 | NHWC | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
14671294 (2.1x) | 327944 (0.626) | 85664 (0.473) | 4096 | NHWC | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
14675146 (2.1x) | 325034 (0.621) | 85664 (0.473) | 128 | NHWC | muRISCV-NN | Vector | RV32GCV | - |
9540606 (3.2x) | 325034 (0.621) | 85664 (0.473) | 1024 | NHWC | muRISCV-NN | Vector | RV32GCV | - |
9498994 (3.2x) | 325036 (0.621) | 85664 (0.473) | 4096 | NHWC | muRISCV-NN | Vector | RV32GCV | - |
Original data
Click here to download the raw files for this benchmark.