Embedded Machine Learning Frameworks - aifoundry-org/erbium GitHub Wiki

This page captures thoughts on various frameworks and their applicability (or not) to Erbium in a low-memory/low-core embedded environment.

For a discussion of RTOSes (such as Zephyr, Apache NuttX, FreeRTOS) suitable for Erbium, see Embedded RTOS.

Other readings:

  • EdgeAI Foundation (formerly known as tinyML Foundation)
  • AI at the edge - a curated list of hardware, software, frameworks and other resources for Artificial Intelligence at the edge.

Framework stack layer comparison

| Stack Layer | emlearn | LiteRT for Microcontrollers | ExecuTorch |
|---|---|---|---|
| HTTP API Server | - | - | - |
| High-Level LLM Interface | - | - | - |
| Batching & Scheduling | - | - | - |
| Model Definitions | - | (from TensorFlow) | (from Torch) |
| Python Tensor API | YES (to C99) | (from TensorFlow) | (from Torch) |
| Execution Engine | C99 | C++17 | C++ |
| Compute Kernels | Portable C99 | C++17 | C++ |
| Hardware Targets | Microcontrollers, embedded | ESP32, Coral NPU, etc. | Microcontrollers, ARM Ethos-U |

emlearn

At the extreme low end is emlearn, a framework that lets you "train in Python - inference in Pure C". It has an interesting integration with Zephyr RTOS and another with MicroPython. At FOSDEM 2025, we had a great talk about it from the lead author, Jon Nordby.

According to the FOSDEM 2025 talk, emlearn supports:

  • Decision Trees (DT)
  • Random Forest (RF)
  • K Nearest Neighbors (KNN)
  • Gaussian Mixture Models (GMM)
  • Multi-Layer-Perceptron (MLP)
  • Convolutional Neural Network (CNN)

emlearn-micropython runs on MicroPython, which:

  • Implements a subset of Python 3.x
  • Targets devices with 16 kB+ RAM
  • Supports 8+ microcontroller families

Example run with Zephyr on arty_a7

Running on a litex_vexriscv / Arty a7-100t:

$ litex_term /dev/ttyUSB1 --speed 115200 --kernel build/zephyr/zephyr.bin
...
*** Booting Zephyr OS build v4.3.0-rc2-127-gf2428c68ff2f ***
xor(0,0) = 0
xor(255,0) = 1
xor(0,255) = 1
xor(255,255) = 0
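
The log above is what a small tree-style classifier over 8-bit inputs would print. As a rough sketch of the idea behind "train in Python - inference in Pure C", the Python below walks a decision tree stored as a flat array, which is the kind of structure that maps easily onto a C array. The node layout, the threshold of 127, and the names `NODES` and `predict` are assumptions made for illustration; this is not emlearn's generated code.

```python
# Hypothetical flattened decision tree for XOR over 8-bit inputs.
# Each node is (feature_index, threshold, left_child, right_child).
# A negative feature_index marks a leaf; its class label sits in `threshold`.
NODES = [
    (0, 127, 1, 2),   # root: split on input 0
    (1, 127, 3, 4),   # input 0 is low: split on input 1
    (1, 127, 5, 6),   # input 0 is high: split on input 1
    (-1, 0, 0, 0),    # leaf: low, low   -> class 0
    (-1, 1, 0, 0),    # leaf: low, high  -> class 1
    (-1, 1, 0, 0),    # leaf: high, low  -> class 1
    (-1, 0, 0, 0),    # leaf: high, high -> class 0
]


def predict(features, nodes=NODES):
    """Walk the tree from the root until a leaf is reached."""
    i = 0
    while nodes[i][0] >= 0:
        feature, threshold, left, right = nodes[i]
        i = left if features[feature] <= threshold else right
    return nodes[i][1]


for a, b in [(0, 0), (255, 0), (0, 255), (255, 255)]:
    print(f"xor({a},{b}) = {predict((a, b))}")
```

The loop needs no heap allocation or floating point, which is why this style of model suits very small microcontrollers.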

Framework capabilities and limitations

  • CNN support is implemented through the TinyMaix library and is very limited.
  • Even the existing CNN implementations appear to show some level of degradation on the CIFAR-10 dataset, which raises the question of how well the framework would handle more advanced tasks.
  • Neural nets have no quantization support (see ISSUE), which is technically essential.
  • Basic areas of application
    • Classic ML approaches for regression/classification
    • Tiny MLP inference
    • Limited CV-related classification
    • Signal processing

LiteRT for Microcontrollers

TensorFlow / LiteRT framing:

  • TensorFlow - runs on GPUs/devices - full featured
  • LiteRT (TensorFlow Lite) - runs on Android / iOS / etc. - devices capable of running a base OS but with constrained memory, CPU, GPU, etc.
  • LiteRT for Microcontrollers (TensorFlow Lite Micro) - runs on embedded devices / RTOS (e.g., Zephyr)
  • LiteRT-LM - C++ library to efficiently run language models across edge platforms.

From LiteRT Overview:

LiteRT (short for Lite Runtime), formerly known as TensorFlow Lite, is Google's high-performance runtime for on-device AI. You can find ready-to-run LiteRT models for a wide range of ML/AI tasks, or convert and run TensorFlow, PyTorch, and JAX models to the TFLite format using the AI Edge conversion and optimization tools.

LiteRT for Microcontrollers produces a native C++ binary that can be loaded on a microcontroller. The model is converted via TensorFlow. Below is an example of running such a model on Zephyr RTOS.

As of this writing (Nov 2025), LiteRT-LM supports mobile devices and Linux systems (similar to base LiteRT), but it does not support microcontroller-class devices (e.g., Zephyr).

Example run with Zephyr on arty_a7

Running on a litex_vexriscv / Arty a7-100t:

$ litex_term /dev/ttyUSB1 --speed 115200 --kernel build/zephyr/zephyr.bin
...
--============= Liftoff! ===============--
*** Booting Zephyr OS build v4.3.0-rc2-127-gf2428c68ff2f ***
x_value: 0.000000, y_value: 0.000000

x_value: 0.314159, y_value: 0.372770

x_value: 0.628319, y_value: 0.559154

x_value: 0.942478, y_value: 0.847203

x_value: 1.256637, y_value: 0.982756

x_value: 1.570796, y_value: 1.042060

x_value: 1.884956, y_value: 0.957340

x_value: 2.199115, y_value: 0.864148

x_value: 2.513274, y_value: 0.609986

x_value: 2.827433, y_value: 0.313465
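
The logged pairs look like a small regression model approximating sin(x), although the page does not say which sample produced them. Under that assumption, the following sketch compares each logged `y_value` with `math.sin(x_value)` to gauge how closely the on-device model tracks the reference. The names `LOG` and `abs_errors` are illustrative only.

```python
import math

# (x_value, y_value) pairs copied from the console log above.
LOG = [
    (0.000000, 0.000000),
    (0.314159, 0.372770),
    (0.628319, 0.559154),
    (0.942478, 0.847203),
    (1.256637, 0.982756),
    (1.570796, 1.042060),
    (1.884956, 0.957340),
    (2.199115, 0.864148),
    (2.513274, 0.609986),
    (2.827433, 0.313465),
]


def abs_errors(pairs):
    """Absolute difference between each logged output and sin(x)."""
    return [abs(y - math.sin(x)) for x, y in pairs]


errors = abs_errors(LOG)
max_error = max(errors)
mean_error = sum(errors) / len(errors)
print(f"max |error| = {max_error:.4f}, mean |error| = {mean_error:.4f}")
```

A check like this is a cheap sanity test after porting a model to a new board: large deviations usually point to a conversion or kernel problem rather than to the model itself.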

Framework capabilities and limitations

  • Proper quantization support, not just value scaling.
  • A limited but decent set of supported ops.
  • Scalability: 'LiteRT for Microcontrollers' uses the same .tflite format as the more capable LiteRT.
  • Custom operator support, a very useful feature.
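
To make the difference between quantization and plain value scaling concrete, here is a minimal sketch of affine int8 quantization, where a real value is represented as `scale * (q - zero_point)`. This is the general scheme used for int8 models in the .tflite format, but the helper names and the rounding and clamping choices below are assumptions for illustration, not the framework's implementation.

```python
def quant_params(x_min, x_max, q_min=-128, q_max=127):
    """Return (scale, zero_point) mapping [x_min, x_max] onto [q_min, q_max]."""
    # Widen the range to include 0.0 so that zero is exactly representable.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (q_max - q_min)
    if scale == 0.0:
        scale = 1.0  # degenerate all-zero range
    zero_point = round(q_min - x_min / scale)
    return scale, max(q_min, min(q_max, zero_point))


def quantize(x, scale, zero_point, q_min=-128, q_max=127):
    """Map a real value to an integer code, clamping to the int8 range."""
    q = round(x / scale) + zero_point
    return max(q_min, min(q_max, q))


def dequantize(q, scale, zero_point):
    """Recover the approximate real value from an integer code."""
    return scale * (q - zero_point)


# Example: an activation range of [0, 6], as after a ReLU6.
scale, zp = quant_params(0.0, 6.0)
for x in (0.0, 1.5, 6.0):
    q = quantize(x, scale, zp)
    print(x, q, dequantize(q, scale, zp))
```

With plain scaling (no zero point), an all-positive range like this one would waste half of the int8 codes; the zero point shifts the range so all 256 codes are used.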

ExecuTorch

ExecuTorch is PyTorch’s solution for efficient AI inference on edge devices — from mobile phones to embedded systems.

Key Value Propositions

  • Portability: Run on diverse platforms, from high-end mobile to constrained microcontrollers
  • Performance: Lightweight runtime with full hardware acceleration (CPU, GPU, NPU, DSP)
  • Productivity: Use familiar PyTorch tools from authoring to deployment

There is a proposed WIP PR for bringing ExecuTorch into Zephyr:

(As of Nov 2025, feedback from upstream Zephyr maintainers is steering towards an out-of-tree Zephyr module rather than in-tree; N.B. emlearn Zephyr examples are out-of-tree.)

Other readings:

Framework capabilities and limitations

  • Proper quantization support, with 4-bit and 8-bit quantization of weights and activations.
  • Multiple backends; XNNPACK is one that supports the RISC-V architecture, so the main limitation there is the XNNPACK implementation itself.
  • Supported ops come from XNNPACK, which covers basic needs for CV and signal processing.
  • Supported by Hugging Face through optimum-executorch.
  • 50 KB base footprint.
  • Selective build for specific kernels/ops.
  • No native integration with Zephyr.
  • Scales from simple MLPs up to LLMs.
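
As an illustration of why 4-bit weights matter on memory-constrained targets, the sketch below packs signed 4-bit values two per byte and unpacks them again. The nibble order (low nibble first) and the function names are assumptions chosen for the example; this is not ExecuTorch's storage layout or API.

```python
def pack_int4(values):
    """Pack signed 4-bit integers (-8..7) two per byte, low nibble first."""
    if any(not -8 <= v <= 7 for v in values):
        raise ValueError("int4 values must be in [-8, 7]")
    # Pad with one zero if the count is odd so values pair up evenly.
    padded = list(values) + [0] * (len(values) % 2)
    out = bytearray()
    for lo, hi in zip(padded[0::2], padded[1::2]):
        out.append((lo & 0x0F) | ((hi & 0x0F) << 4))
    return bytes(out)


def unpack_int4(data, count):
    """Inverse of pack_int4; `count` drops the padding nibble, if any."""
    values = []
    for byte in data:
        for nibble in (byte & 0x0F, byte >> 4):
            # Nibbles 8..15 encode the negative values -8..-1.
            values.append(nibble - 16 if nibble >= 8 else nibble)
    return values[:count]


weights = [-8, -1, 0, 3, 7]
packed = pack_int4(weights)
print(len(weights), "weights ->", len(packed), "bytes")
```

Compared with 8-bit storage this halves the weight footprint, at the cost of an unpack step in the kernel.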

Apache TVM

microTVM is deprecated/removed as of TVM 0.19 (as of Nov 2025, 0.22.x is the current release) - see discussion:

Of note:

As a consolation that VTA & micro is gone, the mentioned tutorial’s last part/goal can include a small showcase how to construct a small custom “vector instruction/block” (i.e. a instantaneous HW dot-product) as a hypotetic ISA extension (i.e. it can be a futuristic RISC-V extension/block) and how to declare the TIR search template for it with it’s real or a virtual (in our case, to run on a local PC for simulation, a C equivalent or a verilated call/implementation function for it).

Open question: could we follow the breadcrumb above to support ET ISA extensions?
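
To make the quoted idea more tangible, here is a sketch of a hypothetical fixed-width dot-product instruction with a software stand-in for local simulation, plus a matrix-vector product whose inner loop is written in terms of it. The 4-lane width and the names `hw_dot4` and `matvec_tiled` are invented for illustration, and nothing here uses the TVM or TIR APIs.

```python
LANES = 4  # assumed vector width of the hypothetical instruction


def hw_dot4(a, b):
    """Software stand-in for a hypothetical 4-lane dot-product instruction."""
    assert len(a) == LANES and len(b) == LANES
    return sum(x * y for x, y in zip(a, b))


def matvec_tiled(matrix, vector):
    """Matrix-vector product whose inner loop is expressed as hw_dot4 calls."""
    n = len(vector)
    if n % LANES:
        raise ValueError("vector length must be a multiple of LANES")
    result = []
    for row in matrix:
        acc = 0
        for k in range(0, n, LANES):
            acc += hw_dot4(row[k:k + LANES], vector[k:k + LANES])
        result.append(acc)
    return result


def matvec_reference(matrix, vector):
    """Plain reference implementation used to check the tiled version."""
    return [sum(x * y for x, y in zip(row, vector)) for row in matrix]


matrix = [[1, 2, 3, 4, 5, 6, 7, 8], [8, 7, 6, 5, 4, 3, 2, 1]]
vector = [1, 0, -1, 2, 0, 1, 3, -2]
print(matvec_tiled(matrix, vector), matvec_reference(matrix, vector))
```

On real hardware the body of `hw_dot4` would be replaced by the instruction itself, while the simulated version stays available for testing on a PC.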

Links:

IREE

IREE consists of an MLIR-based compiler and a runtime. It follows the usual pattern of MLIR-based frameworks: the model is parsed and lowered to an intermediate representation, the target platform compiler generates device code, and the runtime executes that device code.
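
The three stages (lower to IR, compile for a target, execute in a runtime) can be sketched with a toy pipeline. Everything below is a simplified illustration in plain Python with made-up names (`lower`, `compile_for_target`, `run`); it does not use MLIR or any IREE API.

```python
def lower(model):
    """Lower high-level layers to a flat list of primitive IR ops."""
    ir = []
    for layer in model:
        if layer["kind"] == "dense":
            ir.append(("matvec", layer["weights"]))
            ir.append(("add", layer["bias"]))
        elif layer["kind"] == "relu":
            ir.append(("max0", None))
        else:
            raise ValueError(f"unsupported layer: {layer['kind']}")
    return ir


def compile_for_target(ir):
    """Turn each IR op into a callable; a stand-in for device code generation."""
    kernels = {
        "matvec": lambda w: lambda x: [sum(a * b for a, b in zip(r, x)) for r in w],
        "add": lambda b: lambda x: [a + c for a, c in zip(x, b)],
        "max0": lambda _: lambda x: [max(0, a) for a in x],
    }
    return [kernels[op](arg) for op, arg in ir]


def run(program, inputs):
    """Minimal 'runtime': execute the compiled kernels in order."""
    for kernel in program:
        inputs = kernel(inputs)
    return inputs


model = [
    {"kind": "dense", "weights": [[1, -1], [2, 0]], "bias": [0, -5]},
    {"kind": "relu"},
]
ir = lower(model)
result = run(compile_for_target(ir), [3, 1])
print(result)
```

The point of the split is that only `compile_for_target` and `run` depend on the device, so the same lowered IR can be retargeted.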

From docs:

IREE adopts a holistic approach towards ML model compilation: the IR produced contains both the scheduling logic, required to communicate data dependencies to low-level parallel pipelined hardware/API like Vulkan, and the execution logic, encoding dense computation on the hardware in the form of hardware/API-specific binaries like SPIR-V.

Because IREE is MLIR-based and TOSA-compliant, it appears to support:

  • For PyTorch - a set of operations similar to ExecuTorch's, derived from the Core ATen IR.
  • For TensorFlow - operations based on the TOSA lowerings provided by TensorFlow itself.
  • For ONNX - operations lowered via torch-mlir, so the supported set is similar to PyTorch's.

Some interesting links:

Framework capabilities and limitations

  • Supports import from JAX, ONNX, PyTorch, TensorFlow and TensorFlow Lite (LiteRT).
  • Architecture support: ARM, x86, RISC-V.
  • As far as I understand, as long as the frontend framework supports the quantized model, it should be possible to lower it (based on that discussion).
  • Not directly supported by Zephyr, but some developers have reported running IREE on Zephyr (SLIDES, VIDEO).

Kenning

Kenning can run embedded models on Zephyr via LiteRT and microTVM (the latter deprecated as of TVM 0.19 - see the Apache TVM section).

From Kenning README:

Kenning addresses this issue by providing a unified API that focuses on deployment tasks rather than their implementation - the developer decides which implementation should be used for each task, and with Kenning, it is possible to do in a seamless way. This way, switching to another target platform results, in most cases, in a very small change in the code, instead of reimplementing larger parts of a project. This is how Kenning can get the most out of the existing Deep Neural Network training and compilation frameworks.

MicroBlocks

MicroBlocks is a blocks programming language for physical computing inspired by Scratch.

MicroBlocks is a free, live, blocks programming system for educators and makers that aims to be "the Scratch of physical computing." It runs on the micro:bit, Raspberry Pi Pico (RP2040), Calliope mini, Adafruit CircuitPlayground Express and Bluefruit, ESP8266, ESP32, many other microcontrollers. The MicroBlock firmware (or virtual machine) runs on 32-bit embedded processors with as little as 16k of RAM. It is intended to be simple, easily ported and extended, and offer decent performance. It includes a low-latency task scheduler that works at timescales down to ~50 microseconds and a garbage collected memory that allows working with dynamic lists and strings – within the limits of the available RAM, of course!

The MicroBlocks runtime is dependent upon Arduino primitives:

The MicroBlocks firmware, or virtual machine, is written in C and C++. It is built on the Arduino platform and uses additional Arduino libraries for features such as graphics and WiFi on boards that support those features.

There are efforts to support the Arduino framework on top of Zephyr: