Benchmarks 2024 02 11 TFLM GCC - tum-ei-eda/muriscv-nn GitHub Wiki

Setup

Simulator

Toolchains

Models

Package Versions

  • MLonMCU : main

  • TFLM : a549448bb234cf3fed15ad5dabf83d06f82326ce

  • Spike : 0bc176b3fca43560b9e8586cdbc41cfde073e17a

  • Spike PK : 7e9b671c0415dfd7b562ac934feb9380075d4aa2

Miscellaneous

  • All builds were compiled with the -Os flag.
  • Benchmarks were generated using the MLonMCU deployment tool with minimal effort.
  • Memory metrics are reported in bytes.

Results (Framework: tflm, Backend: tflmi, Toolchain: gcc)

Audio Wake Words (aww)

| Cycles (Speedup) | Total ROM (rel.) | Total RAM (rel.) | VLEN | Kernels | Mode | Arch | Auto-Vectorization |
|---|---|---|---|---|---|---|---|
| 174698646 (0.1x) | 132407 (0.878) | 36204 (1.0) | 0 | TFLM | Reference | RV32GC | - |
| 174698646 (0.1x) | 132413 (0.878) | 36204 (1.0) | 128 | TFLM | Reference | RV32GCV | Loop+SLP |
| 174698646 (0.1x) | 132413 (0.878) | 36204 (1.0) | 1024 | TFLM | Reference | RV32GCV | Loop+SLP |
| 157549999 (0.1x) | 144774 (0.96) | 36148 (0.998) | 0 | TFLM | Reference | RV32GCP | - |
| 16644695 (Base) | 150798 (Base) | 36212 (Base) | 0 | muRISCV-NN | Scalar | RV32GC | - |
| 16644695 (1.0x) | 150804 (1.0) | 36212 (1.0) | 128 | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
| 16644695 (1.0x) | 150804 (1.0) | 36212 (1.0) | 1024 | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
| 7002129 (2.4x) | 151322 (1.003) | 36212 (1.0) | 128 | muRISCV-NN | Vector | RV32GCV | - |
| 2571441 (6.5x) | 151322 (1.003) | 36212 (1.0) | 1024 | muRISCV-NN | Vector | RV32GCV | - |
| 13498251 (1.2x) | 162366 (1.077) | 36156 (0.998) | 0 | muRISCV-NN | Scalar | RV32GCP | - |
| 15939667 (1.0x) | 164764 (1.093) | 36156 (0.998) | 0 | muRISCV-NN | Packed | RV32GCP | - |

Notes

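  • TFLM Reference kernels perform extremely poorly with GCC (LLVM builds are 2-4x faster here, or more).
  • Auto-vectorization appears to have no effect with GCC: the Loop+SLP rows report the same cycle counts as the non-vectorized RV32GC rows. Is it disabled? (Check the MLonMCU config!)
  • Packed muRISCV-NN kernels show little speedup (1.0-1.2x) for CNNs; even the Scalar kernels built with GCC for RV32GCP perform better. DNNs are fine.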
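The "(Speedup)" and "(rel.)" values are not defined on this page. A minimal sketch, assuming speedup is baseline cycles divided by measured cycles and the relative memory figures are measured bytes divided by baseline bytes, with the muRISCV-NN Scalar RV32GC row as the baseline, reproduces the aww columns from the raw numbers. The row labels and the `derive` helper are made up for this illustration and are not part of MLonMCU.

```python
# Raw aww measurements copied from the table above:
# (label, cycles, total ROM in bytes, total RAM in bytes).
AWW = [
    ("TFLM Reference RV32GC", 174698646, 132407, 36204),
    ("TFLM Reference RV32GCP", 157549999, 144774, 36148),
    ("muRISCV-NN Scalar RV32GC", 16644695, 150798, 36212),  # "Base" row
    ("muRISCV-NN Vector RV32GCV VLEN=128", 7002129, 151322, 36212),
    ("muRISCV-NN Vector RV32GCV VLEN=1024", 2571441, 151322, 36212),
    ("muRISCV-NN Scalar RV32GCP", 13498251, 162366, 36156),
    ("muRISCV-NN Packed RV32GCP", 15939667, 164764, 36156),
]
BASELINE = "muRISCV-NN Scalar RV32GC"


def derive(rows, baseline_label):
    """Return {label: (speedup, rom_rel, ram_rel)} relative to the baseline row.

    Assumed definitions (not stated on the page):
      speedup = baseline_cycles / cycles
      rel     = bytes / baseline_bytes
    """
    _, base_cycles, base_rom, base_ram = next(
        r for r in rows if r[0] == baseline_label
    )
    return {
        label: (
            round(base_cycles / cycles, 1),
            round(rom / base_rom, 3),
            round(ram / base_ram, 3),
        )
        for label, cycles, rom, ram in rows
    }


derived = derive(AWW, BASELINE)
for label, (speedup, rom_rel, ram_rel) in derived.items():
    print(f"{label:38s} {speedup:4.1f}x  ROM {rom_rel:.3f}  RAM {ram_rel:.3f}")
```

Under these assumptions the rounded results match the table entries for the rows listed, which suggests (but does not prove) that this is how the columns were generated.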

Image Classification (resnet)

| Cycles (Speedup) | Total ROM (rel.) | Total RAM (rel.) | VLEN | Kernels | Mode | Arch | Auto-Vectorization |
|---|---|---|---|---|---|---|---|
| 745801113 (0.1x) | 172997 (0.934) | 68968 (1.0) | 0 | TFLM | Reference | RV32GC | - |
| 745801113 (0.1x) | 173013 (0.934) | 68968 (1.0) | 128 | TFLM | Reference | RV32GCV | Loop+SLP |
| 745801113 (0.1x) | 173013 (0.934) | 68968 (1.0) | 1024 | TFLM | Reference | RV32GCV | Loop+SLP |
| 697912970 (0.1x) | 185266 (1.0) | 68912 (0.999) | 0 | TFLM | Reference | RV32GCP | - |
| 80995160 (Base) | 185298 (Base) | 68960 (Base) | 0 | muRISCV-NN | Scalar | RV32GC | - |
| 80995160 (1.0x) | 185330 (1.0) | 68960 (1.0) | 128 | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
| 80995160 (1.0x) | 185330 (1.0) | 68960 (1.0) | 1024 | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
| 29960342 (2.7x) | 186654 (1.007) | 68960 (1.0) | 128 | muRISCV-NN | Vector | RV32GCV | - |
| 8347502 (9.7x) | 186654 (1.007) | 68960 (1.0) | 1024 | muRISCV-NN | Vector | RV32GCV | - |
| 62976509 (1.3x) | 196964 (1.063) | 68904 (0.999) | 0 | muRISCV-NN | Scalar | RV32GCP | - |
| 68431268 (1.2x) | 199958 (1.079) | 68904 (0.999) | 0 | muRISCV-NN | Packed | RV32GCP | - |

Anomaly Detection (toycar)

| Cycles (Speedup) | Total ROM (rel.) | Total RAM (rel.) | VLEN | Kernels | Mode | Arch | Auto-Vectorization |
|---|---|---|---|---|---|---|---|
| 3094956 (0.6x) | 333908 (0.989) | 19432 (1.0) | 0 | TFLM | Reference | RV32GC | - |
| 3094956 (0.6x) | 333914 (0.989) | 19432 (1.0) | 128 | TFLM | Reference | RV32GCV | Loop+SLP |
| 3094956 (0.6x) | 333914 (0.989) | 19432 (1.0) | 1024 | TFLM | Reference | RV32GCV | Loop+SLP |
| 3097895 (0.6x) | 346264 (1.026) | 19380 (0.997) | 0 | TFLM | Reference | RV32GCP | - |
| 1969300 (Base) | 337546 (Base) | 19432 (Base) | 0 | muRISCV-NN | Scalar | RV32GC | - |
| 1969300 (1.0x) | 337552 (1.0) | 19432 (1.0) | 128 | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
| 1969300 (1.0x) | 337552 (1.0) | 19432 (1.0) | 1024 | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
| 608151 (3.2x) | 337800 (1.001) | 19432 (1.0) | 128 | muRISCV-NN | Vector | RV32GCV | - |
| 428279 (4.6x) | 337800 (1.001) | 19432 (1.0) | 1024 | muRISCV-NN | Vector | RV32GCV | - |
| 1873278 (1.1x) | 349708 (1.036) | 19380 (0.997) | 0 | muRISCV-NN | Scalar | RV32GCP | - |
| 942627 (2.1x) | 351426 (1.041) | 19380 (0.997) | 0 | muRISCV-NN | Packed | RV32GCP | - |

Visual Wake Words (vww)

| Cycles (Speedup) | Total ROM (rel.) | Total RAM (rel.) | VLEN | Kernels | Mode | Arch | Auto-Vectorization |
|---|---|---|---|---|---|---|---|
| 495266208 (0.1x) | 406111 (0.957) | 134520 (1.0) | 0 | TFLM | Reference | RV32GC | - |
| 495266207 (0.1x) | 406117 (0.957) | 134520 (1.0) | 128 | TFLM | Reference | RV32GCV | Loop+SLP |
| 495266207 (0.1x) | 406117 (0.957) | 134520 (1.0) | 1024 | TFLM | Reference | RV32GCV | Loop+SLP |
| 445892345 (0.1x) | 418478 (0.986) | 134464 (1.0) | 0 | TFLM | Reference | RV32GCP | - |
| 49676633 (Base) | 424502 (Base) | 134528 (Base) | 0 | muRISCV-NN | Scalar | RV32GC | - |
| 49676633 (1.0x) | 424508 (1.0) | 134528 (1.0) | 128 | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
| 49676633 (1.0x) | 424508 (1.0) | 134528 (1.0) | 1024 | muRISCV-NN | Scalar | RV32GCV | Loop+SLP |
| 21932927 (2.3x) | 425026 (1.001) | 134528 (1.0) | 128 | muRISCV-NN | Vector | RV32GCV | - |
| 10503478 (4.7x) | 425026 (1.001) | 134528 (1.0) | 1024 | muRISCV-NN | Vector | RV32GCV | - |
| 40750242 (1.2x) | 436070 (1.027) | 134472 (1.0) | 0 | muRISCV-NN | Scalar | RV32GCP | - |
| 49184061 (1.0x) | 438468 (1.033) | 134472 (1.0) | 0 | muRISCV-NN | Packed | RV32GCP | - |
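Two comparisons that the per-model tables only show indirectly can be computed from the muRISCV-NN cycle counts: how much the Vector kernels gain when going from VLEN 128 to VLEN 1024, and how the Packed kernels compare with the Scalar kernels built for the same RV32GCP target. The sketch below is only arithmetic on the numbers reported on this page. The dictionary keys and the `ratios` helper are names invented for this example.

```python
# muRISCV-NN cycle counts copied from the four result tables on this page.
# Keys are illustrative: v128/v1024 = Vector mode at that VLEN,
# scalar_p/packed = Scalar and Packed mode on RV32GCP.
CYCLES = {
    "aww":    {"v128": 7002129,  "v1024": 2571441,
               "scalar_p": 13498251, "packed": 15939667},
    "resnet": {"v128": 29960342, "v1024": 8347502,
               "scalar_p": 62976509, "packed": 68431268},
    "toycar": {"v128": 608151,   "v1024": 428279,
               "scalar_p": 1873278,  "packed": 942627},
    "vww":    {"v128": 21932927, "v1024": 10503478,
               "scalar_p": 40750242, "packed": 49184061},
}


def ratios(c):
    """Return cycle-count ratios; values above 1.0 mean the second variant is faster."""
    return {
        # Gain from widening the vector registers from 128 to 1024 bits.
        "vlen_gain": c["v128"] / c["v1024"],
        # Packed kernels relative to Scalar kernels on the same RV32GCP build.
        "packed_vs_scalar_p": c["scalar_p"] / c["packed"],
    }


summary = {model: ratios(c) for model, c in CYCLES.items()}
for model, r in summary.items():
    print(f"{model:7s} VLEN 128->1024: {r['vlen_gain']:.2f}x   "
          f"Packed vs Scalar (RV32GCP): {r['packed_vs_scalar_p']:.2f}x")
```

With these inputs the Packed-to-Scalar ratio is below 1.0 for aww, resnet and vww and close to 2 for toycar, presumably the DNN case the aww note refers to. The 8x wider VLEN yields well under 8x fewer cycles for every model.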

Original data

Click here to download the raw files for this benchmark.