Benchmarks 2024 03 02 TFLM LLVM O3 - tum-ei-eda/muriscv-nn GitHub Wiki

Setup

Simulator

Toolchains

  • LLVM/Clang:
    • TODO: Version
    • Linker: lld (TODO)
    • RISC-V GCC for headers, libc, etc.

Models

Package Versions

  • MLonMCU: main
  • TFLM: main
  • Spike: 0bc176b3fca43560b9e8586cdbc41cfde073e17a
  • Spike PK: 7e9b671c0415dfd7b562ac934feb9380075d4aa2

Miscellaneous

  • Compiled with the -O3 flag.
  • Benchmarks were generated with the MLonMCU deployment tool with minimal effort.
  • Memory metrics are reported in bytes.

Results (Framework: tflm, Backend: tflmi, Toolchain: llvm, Flags: -O3)

Audio Wake Words (aww)

| Cycles (Speedup) | Total ROM (rel.) | Total RAM (rel.) | VLEN | Kernels | Mode | Arch | Unroll | Auto-Vectorization |
|---|---|---|---|---|---|---|---|---|
| 42613536 (0.4x) | 156982 (0.892) | 36148 (1.0) | 0 | TFLM | Reference | RV32GC | 0 | - |
| 30264099 (0.5x) | 173052 (0.984) | 36156 (1.0) | 128 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
| 28765301 (0.5x) | 173052 (0.984) | 36156 (1.0) | 256 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
| 28061037 (0.6x) | 173052 (0.984) | 36156 (1.0) | 512 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
| 27712294 (0.6x) | 173052 (0.984) | 36156 (1.0) | 1024 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
| 45607770 (0.3x) | 173052 (0.984) | 36156 (1.0) | 2048 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
| 45617937 (0.3x) | 173052 (0.984) | 36156 (1.0) | 4096 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
| 15645936 (Base) | 175930 (Base) | 36148 (Base) | 0 | muRISCV-NN | Scalar | RV32GC | 0 | - |
| 5422098 (2.9x) | 194876 (1.108) | 36156 (1.0) | 128 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
| 4967759 (3.1x) | 194876 (1.108) | 36156 (1.0) | 256 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
| 4630487 (3.4x) | 194876 (1.108) | 36156 (1.0) | 512 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
| 5411336 (2.9x) | 194876 (1.108) | 36156 (1.0) | 1024 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
| 14394157 (1.1x) | 194876 (1.108) | 36156 (1.0) | 2048 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
| 14400935 (1.1x) | 194876 (1.108) | 36156 (1.0) | 4096 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
| 4139348 (3.8x) | 177552 (1.009) | 36148 (1.0) | 128 | muRISCV-NN | Vector | RV32GCV | 0 | - |
| 2878642 (5.4x) | 177552 (1.009) | 36148 (1.0) | 256 | muRISCV-NN | Vector | RV32GCV | 0 | - |
| 2198332 (7.1x) | 177552 (1.009) | 36148 (1.0) | 512 | muRISCV-NN | Vector | RV32GCV | 0 | - |
| 2155848 (7.3x) | 177552 (1.009) | 36148 (1.0) | 1024 | muRISCV-NN | Vector | RV32GCV | 0 | - |
| 2159237 (7.2x) | 177552 (1.009) | 36148 (1.0) | 2048 | muRISCV-NN | Vector | RV32GCV | 0 | - |
| 2162626 (7.2x) | 177552 (1.009) | 36148 (1.0) | 4096 | muRISCV-NN | Vector | RV32GCV | 0 | - |
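
The page does not spell out how the parenthesized values are derived. The numbers are consistent with speedup being the baseline cycle count divided by the row's cycle count, and the relative ROM/RAM being the row's value divided by the baseline value, with the muRISCV-NN Scalar RV32GC row as the baseline. The following is a minimal sketch under that assumption; the function names `speedup` and `relative` are illustrative and not part of MLonMCU.

```python
def speedup(base_cycles: int, cycles: int) -> float:
    """Assumed definition: baseline cycles / row cycles (above 1.0 means faster)."""
    return base_cycles / cycles


def relative(value: int, base_value: int) -> float:
    """Assumed definition: row value / baseline value (above 1.0 means larger)."""
    return value / base_value


# Baseline values copied from the aww table (muRISCV-NN, Scalar, RV32GC).
BASE_CYCLES = 15645936
BASE_ROM = 175930

# muRISCV-NN Vector at VLEN 1024: 2155848 cycles, 177552 bytes of ROM.
print(f"{speedup(BASE_CYCLES, 2155848):.1f}x")  # prints 7.3x
print(f"{relative(177552, BASE_ROM):.3f}")      # prints 1.009
```

Rounding to one decimal for the speedup and three decimals for the relative size reproduces the values shown in the table for these rows.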

Image Classification (resnet)

| Cycles (Speedup) | Total ROM (rel.) | Total RAM (rel.) | VLEN | Kernels | Mode | Arch | Unroll | Auto-Vectorization |
|---|---|---|---|---|---|---|---|---|
| 132471988 (0.4x) | 197152 (0.942) | 68916 (1.0) | 0 | TFLM | Reference | RV32GC | 0 | - |
| 52716169 (1.1x) | 215044 (1.027) | 68924 (1.0) | 128 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
| 45054716 (1.3x) | 215044 (1.027) | 68924 (1.0) | 256 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
| 89137984 (0.7x) | 215044 (1.027) | 68924 (1.0) | 512 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
| 117435471 (0.5x) | 215044 (1.027) | 68924 (1.0) | 1024 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
| 134757285 (0.4x) | 215044 (1.027) | 68924 (1.0) | 2048 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
| 134767452 (0.4x) | 215044 (1.027) | 68924 (1.0) | 4096 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
| 58314324 (Base) | 209358 (Base) | 68912 (Base) | 0 | muRISCV-NN | Scalar | RV32GC | 0 | - |
| 11909943 (4.9x) | 231122 (1.104) | 68920 (1.0) | 128 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
| 11569873 (5.0x) | 231122 (1.104) | 68920 (1.0) | 256 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
| 14592770 (4.0x) | 231122 (1.104) | 68920 (1.0) | 512 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
| 16161693 (3.6x) | 231122 (1.104) | 68920 (1.0) | 1024 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
| 16865394 (3.5x) | 231122 (1.104) | 68920 (1.0) | 2048 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
| 36759082 (1.6x) | 231122 (1.104) | 68920 (1.0) | 4096 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
| 15298076 (3.8x) | 211228 (1.009) | 68912 (1.0) | 128 | muRISCV-NN | Vector | RV32GCV | 0 | - |
| 9713088 (6.0x) | 211228 (1.009) | 68912 (1.0) | 256 | muRISCV-NN | Vector | RV32GCV | 0 | - |
| 7166354 (8.1x) | 211228 (1.009) | 68912 (1.0) | 512 | muRISCV-NN | Vector | RV32GCV | 0 | - |
| 5923778 (9.8x) | 211228 (1.009) | 68912 (1.0) | 1024 | muRISCV-NN | Vector | RV32GCV | 0 | - |
| 4998679 (11.7x) | 211228 (1.009) | 68912 (1.0) | 2048 | muRISCV-NN | Vector | RV32GCV | 0 | - |
| 4752864 (12.3x) | 211228 (1.009) | 68912 (1.0) | 4096 | muRISCV-NN | Vector | RV32GCV | 0 | - |
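
Cycle counts do not always fall as VLEN grows: in the resnet table above, the auto-vectorized configurations get slower past a certain VLEN, while the hand-vectorized kernels keep improving. One way to read this off the data is to pick the VLEN with the fewest cycles per configuration. The sketch below does that; the dictionary layout and the helper name `best_vlen` are illustrative choices, and the cycle counts are copied from the table.

```python
# (kernels, mode) -> {VLEN: cycles}, copied from the resnet table above.
RESNET_CYCLES = {
    ("TFLM", "Reference"): {
        128: 52716169, 256: 45054716, 512: 89137984,
        1024: 117435471, 2048: 134757285, 4096: 134767452,
    },
    ("muRISCV-NN", "Scalar"): {
        128: 11909943, 256: 11569873, 512: 14592770,
        1024: 16161693, 2048: 16865394, 4096: 36759082,
    },
    ("muRISCV-NN", "Vector"): {
        128: 15298076, 256: 9713088, 512: 7166354,
        1024: 5923778, 2048: 4998679, 4096: 4752864,
    },
}


def best_vlen(cycles_by_vlen: dict[int, int]) -> int:
    """Return the VLEN whose run needed the fewest cycles."""
    return min(cycles_by_vlen, key=cycles_by_vlen.get)


for (kernels, mode), cycles_by_vlen in RESNET_CYCLES.items():
    vlen = best_vlen(cycles_by_vlen)
    print(f"{kernels} {mode}: best VLEN {vlen} ({cycles_by_vlen[vlen]} cycles)")
```

For this model, the two auto-vectorized configurations are fastest at VLEN 256, whereas the muRISCV-NN Vector kernels are fastest at VLEN 4096.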

Anomaly Detection (toycar)

| Cycles (Speedup) | Total ROM (rel.) | Total RAM (rel.) | VLEN | Kernels | Mode | Arch | Unroll | Auto-Vectorization |
|---|---|---|---|---|---|---|---|---|
| 3071225 (0.6x) | 342862 (0.987) | 19384 (1.0) | 0 | TFLM | Reference | RV32GC | 0 | - |
| 816342 (2.1x) | 346914 (0.999) | 19384 (1.0) | 128 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
| 643478 (2.6x) | 346914 (0.999) | 19384 (1.0) | 256 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
| 553014 (3.1x) | 346914 (0.999) | 19384 (1.0) | 512 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
| 507782 (3.3x) | 346914 (0.999) | 19384 (1.0) | 1024 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
| 485166 (3.5x) | 346914 (0.999) | 19384 (1.0) | 2048 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
| 2433574 (0.7x) | 346914 (0.999) | 19384 (1.0) | 4096 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
| 1693239 (Base) | 347344 (Base) | 19384 (Base) | 0 | muRISCV-NN | Scalar | RV32GC | 0 | - |
| 542096 (3.1x) | 351986 (1.013) | 19384 (1.0) | 128 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
| 455420 (3.7x) | 351986 (1.013) | 19384 (1.0) | 256 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
| 410204 (4.1x) | 351986 (1.013) | 19384 (1.0) | 512 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
| 387596 (4.4x) | 351986 (1.013) | 19384 (1.0) | 1024 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
| 376292 (4.5x) | 351986 (1.013) | 19384 (1.0) | 2048 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
| 1358600 (1.2x) | 351986 (1.013) | 19384 (1.0) | 4096 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
| 586965 (2.9x) | 347688 (1.001) | 19384 (1.0) | 128 | muRISCV-NN | Vector | RV32GCV | 0 | - |
| 470285 (3.6x) | 347688 (1.001) | 19384 (1.0) | 256 | muRISCV-NN | Vector | RV32GCV | 0 | - |
| 411945 (4.1x) | 347688 (1.001) | 19384 (1.0) | 512 | muRISCV-NN | Vector | RV32GCV | 0 | - |
| 383099 (4.4x) | 347688 (1.001) | 19384 (1.0) | 1024 | muRISCV-NN | Vector | RV32GCV | 0 | - |
| 379468 (4.5x) | 347688 (1.001) | 19384 (1.0) | 2048 | muRISCV-NN | Vector | RV32GCV | 0 | - |
| 377612 (4.5x) | 347688 (1.001) | 19384 (1.0) | 4096 | muRISCV-NN | Vector | RV32GCV | 0 | - |

Visual Wake Words (vww)

| Cycles (Speedup) | Total ROM (rel.) | Total RAM (rel.) | VLEN | Kernels | Mode | Arch | Unroll | Auto-Vectorization |
|---|---|---|---|---|---|---|---|---|
| 112276178 (0.4x) | 430466 (0.958) | 134476 (1.0) | 0 | TFLM | Reference | RV32GC | 0 | - |
| 72688978 (0.6x) | 446742 (0.994) | 134484 (1.0) | 128 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
| 70412718 (0.7x) | 446742 (0.994) | 134484 (1.0) | 256 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
| 70827262 (0.7x) | 446742 (0.994) | 134484 (1.0) | 512 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
| 77512163 (0.6x) | 446742 (0.994) | 134484 (1.0) | 1024 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
| 84910263 (0.6x) | 446742 (0.994) | 134484 (1.0) | 2048 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
| 113660504 (0.4x) | 446742 (0.994) | 134484 (1.0) | 4096 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
| 46914513 (Base) | 449414 (Base) | 134476 (Base) | 0 | muRISCV-NN | Scalar | RV32GC | 0 | - |
| 17690680 (2.7x) | 468566 (1.043) | 134484 (1.0) | 128 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
| 17234514 (2.7x) | 468566 (1.043) | 134484 (1.0) | 256 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
| 18486594 (2.5x) | 468566 (1.043) | 134484 (1.0) | 512 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
| 21830631 (2.1x) | 468566 (1.043) | 134484 (1.0) | 1024 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
| 25570168 (1.8x) | 468566 (1.043) | 134484 (1.0) | 2048 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
| 40402396 (1.2x) | 468566 (1.043) | 134484 (1.0) | 4096 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
| 13541089 (3.5x) | 451036 (1.004) | 134476 (1.0) | 128 | muRISCV-NN | Vector | RV32GCV | 0 | - |
| 10208473 (4.6x) | 451036 (1.004) | 134476 (1.0) | 256 | muRISCV-NN | Vector | RV32GCV | 0 | - |
| 8912101 (5.3x) | 451036 (1.004) | 134476 (1.0) | 512 | muRISCV-NN | Vector | RV32GCV | 0 | - |
| 8406591 (5.6x) | 451036 (1.004) | 134476 (1.0) | 1024 | muRISCV-NN | Vector | RV32GCV | 0 | - |
| 8361593 (5.6x) | 451036 (1.004) | 134476 (1.0) | 2048 | muRISCV-NN | Vector | RV32GCV | 0 | - |
| 8364982 (5.6x) | 451036 (1.004) | 134476 (1.0) | 4096 | muRISCV-NN | Vector | RV32GCV | 0 | - |
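
To compare configurations across all four models with a single number, one common choice for ratios such as speedups is the geometric mean. The page itself does not report such a summary, so the sketch below is only an illustration: it takes the baseline and the muRISCV-NN Vector cycle counts at VLEN 1024 from the four tables above and assumes speedup is baseline cycles divided by row cycles. The names `CYCLES` and `geometric_mean` are hypothetical.

```python
import math

# model -> (baseline cycles, muRISCV-NN Vector cycles at VLEN 1024),
# copied from the four result tables above.
CYCLES = {
    "aww": (15645936, 2155848),
    "resnet": (58314324, 5923778),
    "toycar": (1693239, 383099),
    "vww": (46914513, 8406591),
}


def geometric_mean(values) -> float:
    """Geometric mean, computed via logarithms for numerical stability."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Assumed definition of speedup: baseline cycles / row cycles.
speedups = {model: base / vec for model, (base, vec) in CYCLES.items()}

for model, s in speedups.items():
    print(f"{model}: {s:.1f}x")
print(f"geometric mean: {geometric_mean(speedups.values()):.2f}x")
```

The geometric mean is used instead of the arithmetic mean so that a single model with a large speedup does not dominate the summary.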

Original data

Click here to download the raw files for this benchmark.