Benchmarks 2024 02 11 TVM LLVM - tum-ei-eda/muriscv-nn GitHub Wiki

Setup

Simulator

Toolchains

  • LLVM/Clang:
    • TODO: Version
    • Linker: lld (TODO)
    • RISC-V GCC for headers, libc, ...

Models

Package Versions

  • MLonMCU : main

  • TVM : Nightly Pre-Build

  • Spike : 0bc176b3fca43560b9e8586cdbc41cfde073e17a

  • Spike PK : 7e9b671c0415dfd7b562ac934feb9380075d4aa2

Miscellaneous

  • Compiled with the -Os flag.
  • Benchmarks were generated using the MLonMCU deployment tool with minimal effort.
  • Memory metrics are reported in bytes.

Results (Framework: tvm, Backend: tvmaot, Toolchain: llvm)

Audio Wake Words (aww)

| Cycles (Speedup) | Total ROM (rel.) | Total RAM (rel.) | VLEN | Layout | Kernels | Mode | Arch | Auto-Vectorization |
|---|---|---|---|---|---|---|---|---|
| 33508602 (0.5x) | 109102 (1.205) | 59508 (3.097) | 0 | NCHW | TVM | Fallback | RV32GC | - |
| 27511073 (0.6x) | 102506 (1.133) | 59508 (3.097) | 0 | NHWC | TVM | Fallback | RV32GC | - |
| 13706216 (1.1x) | 102504 (1.133) | 51336 (2.672) | 0 | NCHW | TVM | Autotuned | RV32GC | - |
| 27505623 (0.6x) | 102618 (1.134) | 59508 (3.097) | 0 | NHWC | TVM | Autotuned | RV32GC | - |
| 3384606 (4.6x) | 105404 (1.165) | 59508 (3.097) | 128 | NCHW | TVM | Fallback | RV32GCV | Loop+SLP |
| 3384606 (4.6x) | 105404 (1.165) | 59508 (3.097) | 1024 | NCHW | TVM | Fallback | RV32GCV | Loop+SLP |
| 9566669 (1.6x) | 103606 (1.145) | 59508 (3.097) | 128 | NHWC | TVM | Fallback | RV32GCV | Loop+SLP |
| 6682669 (2.3x) | 103606 (1.145) | 59508 (3.097) | 1024 | NHWC | TVM | Fallback | RV32GCV | Loop+SLP |
| 5607715 (2.8x) | 106776 (1.18) | 51336 (2.672) | 128 | NCHW | TVM | Autotuned | RV32GCV | Loop+SLP |
| 3881984 (4.0x) | 106776 (1.18) | 51336 (2.672) | 1024 | NCHW | TVM | Autotuned | RV32GCV | Loop+SLP |
| 9565134 (1.6x) | 104544 (1.155) | 59508 (3.097) | 128 | NHWC | TVM | Autotuned | RV32GCV | Loop+SLP |
| 6683403 (2.3x) | 104544 (1.155) | 59508 (3.097) | 1024 | NHWC | TVM | Autotuned | RV32GCV | Loop+SLP |
| 15615223 (Base) | 90510 (Base) | 19212 (Base) | 0 | NHWC | muRISCV-NN | Scalar | RV32GC | - |
| 6838468 (2.3x) | 93734 (1.036) | 19212 (1.0) | 128 | NHWC | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
| 5367941 (2.9x) | 93734 (1.036) | 19212 (1.0) | 1024 | NHWC | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
| 7407276 (2.1x) | 90216 (0.997) | 23676 (1.232) | 128 | NHWC | muRISCV-NN | Vector | RV32GCV | - |
| 3759264 (4.2x) | 90216 (0.997) | 23676 (1.232) | 1024 | NHWC | muRISCV-NN | Vector | RV32GCV | - |
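
The parenthesized values in the table appear to be computed against the row marked "Base" (muRISCV-NN, Scalar, RV32GC). The page does not state the formula, so the following is a minimal sketch under that assumption: speedup is taken as baseline cycles divided by the row's cycles, and the relative ROM/RAM values as the row's size divided by the baseline size. The function names and the `BASE`/`ROW` dictionaries are hypothetical; the numbers are copied from the aww table.

```python
def speedup(base_cycles: int, cycles: int) -> float:
    """Assumed definition: baseline cycles / row cycles (>1 means faster than Base)."""
    return base_cycles / cycles


def relative(value: int, base_value: int) -> float:
    """Assumed definition: row size / baseline size (>1 means larger than Base)."""
    return value / base_value


# Baseline row of the aww table: muRISCV-NN, Scalar, RV32GC.
BASE = {"cycles": 15615223, "rom": 90510, "ram": 19212}
# First row of the aww table: TVM, Fallback, NCHW, RV32GC.
ROW = {"cycles": 33508602, "rom": 109102, "ram": 59508}

print(f"{round(speedup(BASE['cycles'], ROW['cycles']), 1)}x")  # 0.5x
print(round(relative(ROW["rom"], BASE["rom"]), 3))             # 1.205
print(round(relative(ROW["ram"], BASE["ram"]), 3))             # 3.097
```

Under this reading, a speedup below 1.0x (as in the scalar TVM Fallback rows) means the configuration needs more cycles than the muRISCV-NN scalar baseline.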

Notes

  • TVM Fallback kernels can be vectorized easily with LLVM.
  • Autotuning + NCHW layout + auto-vectorizer may outperform muRISCV-NN (especially for small VLENs).
  • Tuned results could be even more drastic: the tuning was done on an old TVM version and without the auto-vectorizer (TODO: replace).
  • Autotuned is sometimes worse than Fallback, e.g. NCHW at VLEN 128 needs 5607715 cycles with autotuned kernels vs. 3384606 with Fallback kernels.
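
To make the last note concrete, the sketch below lists the aww configurations in which the autotuned TVM kernels need more cycles than the Fallback kernels. The cycle counts are copied from the aww table; the dictionary layout and the helper name `autotuned_regressions` are illustrative assumptions, not part of MLonMCU or TVM.

```python
# (layout, VLEN) -> cycles, copied from the TVM rows of the aww table.
FALLBACK = {
    ("NCHW", 0): 33508602, ("NHWC", 0): 27511073,
    ("NCHW", 128): 3384606, ("NCHW", 1024): 3384606,
    ("NHWC", 128): 9566669, ("NHWC", 1024): 6682669,
}
AUTOTUNED = {
    ("NCHW", 0): 13706216, ("NHWC", 0): 27505623,
    ("NCHW", 128): 5607715, ("NCHW", 1024): 3881984,
    ("NHWC", 128): 9565134, ("NHWC", 1024): 6683403,
}


def autotuned_regressions(fallback, autotuned):
    """Return the configurations where autotuned kernels take more cycles."""
    return sorted(cfg for cfg in fallback if autotuned[cfg] > fallback[cfg])


for layout, vlen in autotuned_regressions(FALLBACK, AUTOTUNED):
    cfg = (layout, vlen)
    print(f"{layout} VLEN={vlen}: {AUTOTUNED[cfg]} vs. {FALLBACK[cfg]} cycles")
```

For these numbers, the regressions are the two vectorized NCHW configurations and, by a small margin, NHWC at VLEN 1024; the scalar (VLEN 0) runs are never slower with autotuning.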

Image Classification (resnet)

| Cycles (Speedup) | Total ROM (rel.) | Total RAM (rel.) | VLEN | Layout | Kernels | Mode | Arch | Auto-Vectorization |
|---|---|---|---|---|---|---|---|---|
| 144789099 (0.4x) | 218266 (1.582) | 108420 (1.953) | 0 | NCHW | TVM | Fallback | RV32GC | - |
| 112394400 (0.5x) | 209174 (1.516) | 108420 (1.953) | 0 | NHWC | TVM | Fallback | RV32GC | - |
| 53701817 (1.1x) | 212582 (1.541) | 92236 (1.661) | 0 | NCHW | TVM | Autotuned | RV32GC | - |
| 112389724 (0.5x) | 209256 (1.517) | 108420 (1.953) | 0 | NHWC | TVM | Autotuned | RV32GC | - |
| 12825990 (4.6x) | 213676 (1.549) | 108420 (1.953) | 128 | NCHW | TVM | Fallback | RV32GCV | Loop+SLP |
| 12825991 (4.6x) | 213678 (1.549) | 108420 (1.953) | 1024 | NCHW | TVM | Fallback | RV32GCV | Loop+SLP |
| 36071697 (1.6x) | 210202 (1.524) | 108420 (1.953) | 128 | NHWC | TVM | Fallback | RV32GCV | Loop+SLP |
| 24311825 (2.4x) | 210202 (1.524) | 108420 (1.953) | 1024 | NHWC | TVM | Fallback | RV32GCV | Loop+SLP |
| 19686220 (3.0x) | 226232 (1.64) | 92236 (1.661) | 128 | NCHW | TVM | Autotuned | RV32GCV | Loop+SLP |
| 13965591 (4.2x) | 226232 (1.64) | 92236 (1.661) | 1024 | NCHW | TVM | Autotuned | RV32GCV | Loop+SLP |
| 36069954 (1.6x) | 210930 (1.529) | 108420 (1.953) | 128 | NHWC | TVM | Autotuned | RV32GCV | Loop+SLP |
| 24309153 (2.4x) | 210938 (1.529) | 108420 (1.953) | 1024 | NHWC | TVM | Autotuned | RV32GCV | Loop+SLP |
| 58402822 (Base) | 137958 (Base) | 55516 (Base) | 0 | NHWC | muRISCV-NN | Scalar | RV32GC | - |
| 28255398 (2.1x) | 141694 (1.027) | 55516 (1.0) | 128 | NHWC | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
| 13704877 (4.3x) | 141694 (1.027) | 55516 (1.0) | 1024 | NHWC | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
| 27976928 (2.1x) | 138304 (1.003) | 55516 (1.0) | 128 | NHWC | muRISCV-NN | Vector | RV32GCV | - |
| 8035106 (7.3x) | 138304 (1.003) | 55516 (1.0) | 1024 | NHWC | muRISCV-NN | Vector | RV32GCV | - |

Anomaly Detection (toycar)

| Cycles (Speedup) | Total ROM (rel.) | Total RAM (rel.) | VLEN | Layout | Kernels | Mode | Arch | Auto-Vectorization |
|---|---|---|---|---|---|---|---|---|
| 3404880 (0.6x) | 581362 (1.841) | 5572 (1.168) | 0 | NCHW | TVM | Fallback | RV32GC | - |
| 3404880 (0.6x) | 581362 (1.841) | 5572 (1.168) | 0 | NHWC | TVM | Fallback | RV32GC | - |
| 2245737 (0.8x) | 609080 (1.929) | 6884 (1.443) | 0 | NCHW | TVM | Autotuned | RV32GC | - |
| 2245737 (0.8x) | 609080 (1.929) | 6884 (1.443) | 0 | NHWC | TVM | Autotuned | RV32GC | - |
| 984693 (1.9x) | 581098 (1.84) | 5572 (1.168) | 128 | NCHW | TVM | Fallback | RV32GCV | Loop+SLP |
| 984695 (1.9x) | 581106 (1.84) | 5572 (1.168) | 1024 | NCHW | TVM | Fallback | RV32GCV | Loop+SLP |
| 984693 (1.9x) | 581098 (1.84) | 5572 (1.168) | 128 | NHWC | TVM | Fallback | RV32GCV | Loop+SLP |
| 984693 (1.9x) | 581098 (1.84) | 5572 (1.168) | 1024 | NHWC | TVM | Fallback | RV32GCV | Loop+SLP |
| 1280619 (1.5x) | 600432 (1.902) | 6884 (1.443) | 128 | NCHW | TVM | Autotuned | RV32GCV | Loop+SLP |
| 1148032 (1.6x) | 600432 (1.902) | 6884 (1.443) | 1024 | NCHW | TVM | Autotuned | RV32GCV | Loop+SLP |
| 1280619 (1.5x) | 600432 (1.902) | 6884 (1.443) | 128 | NHWC | TVM | Autotuned | RV32GCV | Loop+SLP |
| 1148032 (1.6x) | 600432 (1.902) | 6884 (1.443) | 1024 | NHWC | TVM | Autotuned | RV32GCV | Loop+SLP |
| 1893647 (Base) | 315740 (Base) | 4772 (Base) | 0 | NHWC | muRISCV-NN | Scalar | RV32GC | - |
| 662593 (2.9x) | 316396 (1.002) | 4772 (1.0) | 128 | NHWC | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
| 430222 (4.4x) | 316394 (1.002) | 4772 (1.0) | 1024 | NHWC | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
| 639280 (3.0x) | 315740 (1.0) | 4772 (1.0) | 128 | NHWC | muRISCV-NN | Vector | RV32GCV | - |
| 465832 (4.1x) | 315740 (1.0) | 4772 (1.0) | 1024 | NHWC | muRISCV-NN | Vector | RV32GCV | - |

Notes

  • For DNNs, muRISCV-NN (Vector mode) can often outperform TVM (Fallback/Tuned) + auto-vectorizer.

Visual Wake Words (vww)

| Cycles (Speedup) | Total ROM (rel.) | Total RAM (rel.) | VLEN | Layout | Kernels | Mode | Arch | Auto-Vectorization |
|---|---|---|---|---|---|---|---|---|
| 96665970 (0.5x) | 545172 (1.685) | 181032 (2.113) | 0 | NCHW | TVM | Fallback | RV32GC | - |
| 79940191 (0.6x) | 521128 (1.611) | 181032 (2.113) | 0 | NHWC | TVM | Fallback | RV32GC | - |
| 42404608 (1.1x) | 525208 (1.623) | 181032 (2.113) | 0 | NCHW | TVM | Autotuned | RV32GC | - |
| 79940191 (0.6x) | 521130 (1.611) | 181032 (2.113) | 0 | NHWC | TVM | Autotuned | RV32GC | - |
| 11010120 (4.2x) | 532510 (1.646) | 181032 (2.113) | 128 | NCHW | TVM | Fallback | RV32GCV | Loop+SLP |
| 11010120 (4.2x) | 532510 (1.646) | 181032 (2.113) | 1024 | NCHW | TVM | Fallback | RV32GCV | Loop+SLP |
| 30451803 (1.5x) | 523638 (1.618) | 181032 (2.113) | 128 | NHWC | TVM | Fallback | RV32GCV | Loop+SLP |
| 22700929 (2.1x) | 523636 (1.618) | 181032 (2.113) | 1024 | NHWC | TVM | Fallback | RV32GCV | Loop+SLP |
| 24965516 (1.9x) | 550736 (1.702) | 181032 (2.113) | 128 | NCHW | TVM | Autotuned | RV32GCV | Loop+SLP |
| 19204882 (2.4x) | 550718 (1.702) | 181032 (2.113) | 1024 | NCHW | TVM | Autotuned | RV32GCV | Loop+SLP |
| 30450316 (1.5x) | 523746 (1.619) | 181032 (2.113) | 128 | NHWC | TVM | Autotuned | RV32GCV | Loop+SLP |
| 22698968 (2.1x) | 523746 (1.619) | 181032 (2.113) | 1024 | NHWC | TVM | Autotuned | RV32GCV | Loop+SLP |
| 46765906 (Base) | 323534 (Base) | 85664 (Base) | 0 | NHWC | muRISCV-NN | Scalar | RV32GC | - |
| 19684202 (2.4x) | 327720 (1.013) | 85664 (1.0) | 128 | NHWC | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
| 14888471 (3.1x) | 327722 (1.013) | 85664 (1.0) | 1024 | NHWC | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
| 21141100 (2.2x) | 324316 (1.002) | 85664 (1.0) | 128 | NHWC | muRISCV-NN | Vector | RV32GCV | - |
| 10453920 (4.5x) | 324314 (1.002) | 85664 (1.0) | 1024 | NHWC | muRISCV-NN | Vector | RV32GCV | - |

Original data

Click here to download the raw files for this benchmark.