Benchmarks CUSTOM TFLM LLVM Os - tum-ei-eda/muriscv-nn GitHub Wiki

Setup

Simulator

Toolchains

  • LLVM/Clang:
    • TODO: Version
    • Linker: lld (TODO)
    • RISC-V GCC toolchain (used for headers, libc, ...)

Models

Package Versions

  • MLonMCU : main

  • TFLM : main

  • Spike : 0bc176b3fca43560b9e8586cdbc41cfde073e17a

  • Spike PK : 7e9b671c0415dfd7b562ac934feb9380075d4aa2

Miscellaneous

  • All benchmarks were compiled with the -Os flag.
  • Benchmarks were generated with the MLonMCU deployment tool with minimal effort.
  • Memory metrics are reported in bytes.

Results (Framework: tflm, Backend: tflmi, Toolchain: llvm, Flags: -Os)

Audio Wake Words (aww)

| Cycles (Speedup) | Total ROM (rel.) | Total RAM (rel.) | VLEN | Kernels | Mode | Arch | Unroll | Auto-Vectorization |
|---|---|---|---|---|---|---|---|---|
| 39299313 (0.4x) | 147040 (0.869) | 36124 (1.0) | 128 | TFLM | Reference | RV32GC | 0 | - |
| 33496503 (0.5x) | 152732 (0.903) | 36132 (1.0) | 128 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
| 31703831 (0.5x) | 152732 (0.903) | 36132 (1.0) | 256 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
| 30807495 (0.5x) | 152732 (0.903) | 36132 (1.0) | 512 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
| 30359327 (0.5x) | 152732 (0.903) | 36132 (1.0) | 1024 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
| 30362716 (0.5x) | 152732 (0.903) | 36132 (1.0) | 2048 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
| 30362716 (0.5x) | 152732 (0.903) | 36132 (1.0) | 4096 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
| 15089987 (Base) | 169180 (Base) | 36124 (Base) | 128 | muRISCV-NN | Scalar | RV32GC | 0 | - |
| 6142871 (2.5x) | 179372 (1.06) | 36132 (1.0) | 128 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
| 5140180 (2.9x) | 179372 (1.06) | 36132 (1.0) | 256 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
| 4638932 (3.3x) | 179372 (1.06) | 36132 (1.0) | 512 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
| 4351601 (3.5x) | 179372 (1.06) | 36132 (1.0) | 1024 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
| 4351601 (3.5x) | 179372 (1.06) | 36132 (1.0) | 2048 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
| 4358379 (3.5x) | 179372 (1.06) | 36132 (1.0) | 4096 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
| 4078846 (3.7x) | 169536 (1.002) | 36124 (1.0) | 128 | muRISCV-NN | Vector | RV32GCV | 0 | - |
| 2823868 (5.3x) | 169536 (1.002) | 36124 (1.0) | 256 | muRISCV-NN | Vector | RV32GCV | 0 | - |
| 2146550 (7.0x) | 169536 (1.002) | 36124 (1.0) | 512 | muRISCV-NN | Vector | RV32GCV | 0 | - |
| 2105562 (7.2x) | 169536 (1.002) | 36124 (1.0) | 1024 | muRISCV-NN | Vector | RV32GCV | 0 | - |
| 2108951 (7.2x) | 169536 (1.002) | 36124 (1.0) | 2048 | muRISCV-NN | Vector | RV32GCV | 0 | - |
| 2112340 (7.1x) | 169536 (1.002) | 36124 (1.0) | 4096 | muRISCV-NN | Vector | RV32GCV | 0 | - |
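
The parenthesized values in each table appear to be computed against the row marked "Base" (muRISCV-NN, Scalar, RV32GC). The sketch below shows how they can be reproduced. It is only an illustration: the function name `relative_metrics`, the dictionary layout, and the rounding (one decimal for speedup, three for memory ratios) are assumptions inferred from the printed numbers, not taken from MLonMCU.

```python
def relative_metrics(row, base):
    """Return (speedup, rom_rel, ram_rel) of `row` relative to `base`.

    Speedup is base cycles divided by row cycles (higher is faster);
    the memory ratios are row size divided by base size (lower is smaller).
    The rounding is an assumption inferred from the tables.
    """
    speedup = base["cycles"] / row["cycles"]
    rom_rel = row["rom"] / base["rom"]
    ram_rel = row["ram"] / base["ram"]
    return round(speedup, 1), round(rom_rel, 3), round(ram_rel, 3)


# Values copied from the Audio Wake Words (aww) table.
base = {"cycles": 15089987, "rom": 169180, "ram": 36124}       # muRISCV-NN Scalar, RV32GC
vector_128 = {"cycles": 4078846, "rom": 169536, "ram": 36124}  # muRISCV-NN Vector, VLEN 128

print(relative_metrics(vector_128, base))  # (3.7, 1.002, 1.0)
```

The same calculation applies to the other models; each table has its own base row.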

Image Classification (resnet)

| Cycles (Speedup) | Total ROM (rel.) | Total RAM (rel.) | VLEN | Kernels | Mode | Arch | Unroll | Auto-Vectorization |
|---|---|---|---|---|---|---|---|---|
| 121053737 (0.5x) | 188840 (0.929) | 68892 (1.0) | 128 | TFLM | Reference | RV32GC | 0 | - |
| 56889926 (1.0x) | 194476 (0.957) | 68900 (1.0) | 128 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
| 47133270 (1.2x) | 194476 (0.957) | 68900 (1.0) | 256 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
| 44786366 (1.3x) | 194476 (0.957) | 68900 (1.0) | 512 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
| 44352562 (1.3x) | 194476 (0.957) | 68900 (1.0) | 1024 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
| 44355951 (1.3x) | 194476 (0.957) | 68900 (1.0) | 2048 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
| 44355951 (1.3x) | 194476 (0.957) | 68900 (1.0) | 4096 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
| 56368693 (Base) | 203198 (Base) | 68888 (Base) | 128 | muRISCV-NN | Scalar | RV32GC | 0 | - |
| 26157077 (2.2x) | 213618 (1.051) | 68896 (1.0) | 128 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
| 18292290 (3.1x) | 213618 (1.051) | 68896 (1.0) | 256 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
| 14605754 (3.9x) | 213618 (1.051) | 68896 (1.0) | 512 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
| 12929715 (4.4x) | 213618 (1.051) | 68896 (1.0) | 1024 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
| 12110515 (4.7x) | 213618 (1.051) | 68896 (1.0) | 2048 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
| 11502893 (4.9x) | 213618 (1.051) | 68896 (1.0) | 4096 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
| 15254126 (3.7x) | 204270 (1.005) | 68888 (1.0) | 128 | muRISCV-NN | Vector | RV32GCV | 0 | - |
| 9670930 (5.8x) | 204270 (1.005) | 68888 (1.0) | 256 | muRISCV-NN | Vector | RV32GCV | 0 | - |
| 7125092 (7.9x) | 204270 (1.005) | 68888 (1.0) | 512 | muRISCV-NN | Vector | RV32GCV | 0 | - |
| 5882964 (9.6x) | 204270 (1.005) | 68888 (1.0) | 1024 | muRISCV-NN | Vector | RV32GCV | 0 | - |
| 4958089 (11.4x) | 204270 (1.005) | 68888 (1.0) | 2048 | muRISCV-NN | Vector | RV32GCV | 0 | - |
| 4712386 (12.0x) | 204270 (1.005) | 68888 (1.0) | 4096 | muRISCV-NN | Vector | RV32GCV | 0 | - |

Anomaly Detection (toycar)

| Cycles (Speedup) | Total ROM (rel.) | Total RAM (rel.) | VLEN | Kernels | Mode | Arch | Unroll | Auto-Vectorization |
|---|---|---|---|---|---|---|---|---|
| 2804288 (0.6x) | 342396 (0.988) | 19376 (1.0) | 128 | TFLM | Reference | RV32GC | 0 | - |
| 898428 (1.9x) | 344096 (0.993) | 19376 (1.0) | 128 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
| 668156 (2.5x) | 344096 (0.993) | 19376 (1.0) | 256 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
| 553020 (3.1x) | 344096 (0.993) | 19376 (1.0) | 512 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
| 495452 (3.4x) | 344096 (0.993) | 19376 (1.0) | 1024 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
| 466668 (3.6x) | 344096 (0.993) | 19376 (1.0) | 2048 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
| 463084 (3.7x) | 344096 (0.993) | 19376 (1.0) | 4096 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
| 1693692 (Base) | 346606 (Base) | 19376 (Base) | 128 | muRISCV-NN | Scalar | RV32GC | 0 | - |
| 598024 (2.8x) | 349842 (1.009) | 19376 (1.0) | 128 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
| 484104 (3.5x) | 349842 (1.009) | 19376 (1.0) | 256 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
| 427144 (4.0x) | 349842 (1.009) | 19376 (1.0) | 512 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
| 398664 (4.2x) | 349842 (1.009) | 19376 (1.0) | 1024 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
| 384424 (4.4x) | 349842 (1.009) | 19376 (1.0) | 2048 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
| 382640 (4.4x) | 349842 (1.009) | 19376 (1.0) | 4096 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
| 595027 (2.8x) | 347268 (1.002) | 19376 (1.0) | 128 | muRISCV-NN | Vector | RV32GCV | 0 | - |
| 478347 (3.5x) | 347268 (1.002) | 19376 (1.0) | 256 | muRISCV-NN | Vector | RV32GCV | 0 | - |
| 420007 (4.0x) | 347268 (1.002) | 19376 (1.0) | 512 | muRISCV-NN | Vector | RV32GCV | 0 | - |
| 391161 (4.3x) | 347268 (1.002) | 19376 (1.0) | 1024 | muRISCV-NN | Vector | RV32GCV | 0 | - |
| 387530 (4.4x) | 347268 (1.002) | 19376 (1.0) | 2048 | muRISCV-NN | Vector | RV32GCV | 0 | - |
| 385674 (4.4x) | 347268 (1.002) | 19376 (1.0) | 4096 | muRISCV-NN | Vector | RV32GCV | 0 | - |

Visual Wake Words (vww)

| Cycles (Speedup) | Total ROM (rel.) | Total RAM (rel.) | VLEN | Kernels | Mode | Arch | Unroll | Auto-Vectorization |
|---|---|---|---|---|---|---|---|---|
| 103598496 (0.4x) | 420802 (0.95) | 134452 (1.0) | 128 | TFLM | Reference | RV32GC | 0 | - |
| 72373406 (0.6x) | 426442 (0.963) | 134460 (1.0) | 128 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
| 67211998 (0.7x) | 426442 (0.963) | 134460 (1.0) | 256 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
| 64760318 (0.7x) | 426442 (0.963) | 134460 (1.0) | 512 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
| 63728014 (0.7x) | 426442 (0.963) | 134460 (1.0) | 1024 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
| 63312019 (0.7x) | 426442 (0.963) | 134460 (1.0) | 2048 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
| 63279735 (0.7x) | 426442 (0.963) | 134460 (1.0) | 4096 | TFLM | Reference | RV32GCV | 0 | Loop+SLP |
| 45201423 (Base) | 442944 (Base) | 134452 (Base) | 128 | muRISCV-NN | Scalar | RV32GC | 0 | - |
| 19579538 (2.3x) | 453082 (1.023) | 134460 (1.0) | 128 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
| 16713138 (2.7x) | 453082 (1.023) | 134460 (1.0) | 256 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
| 15339954 (2.9x) | 453082 (1.023) | 134460 (1.0) | 512 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
| 14839103 (3.0x) | 453082 (1.023) | 134460 (1.0) | 1024 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
| 14632583 (3.1x) | 453082 (1.023) | 134460 (1.0) | 2048 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
| 14621925 (3.1x) | 453082 (1.023) | 134460 (1.0) | 4096 | muRISCV-NN | Scalar | RV32GCV | 0 | Loop+SLP |
| 13374433 (3.4x) | 443298 (1.001) | 134452 (1.0) | 128 | muRISCV-NN | Vector | RV32GCV | 0 | - |
| 10053433 (4.5x) | 443298 (1.001) | 134452 (1.0) | 256 | muRISCV-NN | Vector | RV32GCV | 0 | - |
| 8762005 (5.2x) | 443298 (1.001) | 134452 (1.0) | 512 | muRISCV-NN | Vector | RV32GCV | 0 | - |
| 8258015 (5.5x) | 443298 (1.001) | 134452 (1.0) | 1024 | muRISCV-NN | Vector | RV32GCV | 0 | - |
| 8213635 (5.5x) | 443298 (1.001) | 134452 (1.0) | 2048 | muRISCV-NN | Vector | RV32GCV | 0 | - |
| 8217024 (5.5x) | 443298 (1.001) | 134452 (1.0) | 4096 | muRISCV-NN | Vector | RV32GCV | 0 | - |

Original data

Click here to download the raw files for this benchmark.