LLM on Raspberry Pi - cu-ecen-aeld/buildroot-assignments-base GitHub Wiki

Details

This guide will help integrate llama.cpp (LLM inference engine) into your Buildroot-based embedded Linux system for Raspberry Pi.


Requirements

  1. Raspberry Pi 4 (4GB+ RAM recommended)

Step 1: Validate on Raspbian OS First

Always test on full Linux before Buildroot integration!

# On Raspbian/Raspberry Pi OS
sudo apt update
sudo apt install -y build-essential git cmake

# Clone and build llama.cpp
cd ~
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
git checkout b4315
mkdir build && cd build
cmake .. -DGGML_CUDA=OFF
make -j4

# Binary location
~/llama.cpp/build/bin/llama-cli

# Create models directory
mkdir -p ~/models

# Download TinyLlama model
cd ~/models
wget https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf

# Simple test
~/llama.cpp/build/bin/llama-cli \
  --model ~/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
  --prompt "Hello, how are you?" \
  -n 20

If this doesn't work, fix it before proceeding to Buildroot!


Step 2: Create Buildroot Package

2.1 Create Directory Structure

In your project repo, add Buildroot as a git submodule (as done in Assignment 5 and onwards).

cd ~/buildroot-assignments-base
mkdir -p base_external/package/llama-cpp

2.2 Create Package Files

Config.in:
See: https://github.com/cu-ecen-aeld/final-project-hawa7555/blob/main/base_external/package/llama-cpp/Config.in

llama-cpp.mk:
See: https://github.com/cu-ecen-aeld/final-project-hawa7555/blob/main/base_external/package/llama-cpp/llama-cpp.mk
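For orientation, the linked files boil down to a package definition along the lines of the sketch below. This follows standard Buildroot cmake-package conventions; the symbol names, dependencies, and options shown are assumptions, not copies of the referenced files.

```
# base_external/package/llama-cpp/Config.in (sketch)
config BR2_PACKAGE_LLAMA_CPP
	bool "llama-cpp"
	depends on BR2_INSTALL_LIBSTDCPP
	help
	  llama.cpp LLM inference engine.

	  https://github.com/ggerganov/llama.cpp

# base_external/package/llama-cpp/llama-cpp.mk (sketch)
LLAMA_CPP_VERSION = b4315
LLAMA_CPP_SITE = $(call github,ggerganov,llama.cpp,$(LLAMA_CPP_VERSION))
LLAMA_CPP_LICENSE = MIT
LLAMA_CPP_LICENSE_FILES = LICENSE
LLAMA_CPP_CONF_OPTS = -DGGML_CUDA=OFF

$(eval $(cmake-package))
```

For the package to show up in menuconfig, the top-level base_external/Config.in must source this Config.in, and the base_external tree must be passed to Buildroot via BR2_EXTERNAL, as in the earlier assignments' setup.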

The files above are the minimal set needed to integrate llama.cpp into Buildroot. For a more systematic structure that can be used in application development, refer to this base_external folder: https://github.com/cu-ecen-aeld/final-project-hawa7555/tree/main/base_external

(Refer to the .mk and Config.in files in the base_external folder above, under the package folder you want to integrate. This document covers llama.cpp, but the steps are similar for whisper and piper, which provide speech-to-text and text-to-speech respectively.)


Step 3: Configure Buildroot

cd ~/buildroot
make menuconfig

Use menuconfig to enable the llama-cpp package in your target packages menu.

Step 4: Deploy Model File

Include model in image:

# Create overlay directory
mkdir -p base_external/rootfs_overlay/root/models

# Download model (TinyLlama-1.1B is smallest usable model)
wget -O base_external/rootfs_overlay/root/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
  https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf

# In menuconfig, set:
# System configuration → Root filesystem overlay directories
# to your rootfs_overlay path
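Before kicking off a long build, it can help to sanity-check the generated .config. The helper below is hypothetical; it assumes the package's Kconfig symbol is named BR2_PACKAGE_LLAMA_CPP and that the overlay path contains rootfs_overlay.

```shell
#!/bin/sh
# Hypothetical pre-build sanity check for the Buildroot .config.
# The Kconfig symbol name and overlay path below are assumptions.
check_config() {
    config="$1"
    grep -q '^BR2_PACKAGE_LLAMA_CPP=y' "$config" \
        || { echo "llama-cpp not enabled in $config" >&2; return 1; }
    grep -q 'rootfs_overlay' "$config" \
        || { echo "rootfs overlay not set in $config" >&2; return 1; }
    echo "config OK"
}

# Typical use: check_config ~/buildroot/.config
```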

Alternative: Create a separate package for model downloads. See example:
https://github.com/cu-ecen-aeld/final-project-hawa7555/blob/main/base_external/package/ai-models/ai-models.mk
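The package-based alternative can be sketched as a Buildroot generic-package that downloads the model at build time. Everything below (package name, variables, install path) is an illustrative assumption, not the referenced recipe.

```
# Hypothetical model-download recipe (sketch of the generic-package approach)
AI_MODELS_VERSION = 1.0
AI_MODELS_SOURCE = tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf
AI_MODELS_SITE = https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main

# The .gguf file is not an archive, so override extraction with a plain copy.
define AI_MODELS_EXTRACT_CMDS
	cp $(AI_MODELS_DL_DIR)/$(AI_MODELS_SOURCE) $(@D)/
endef

define AI_MODELS_INSTALL_TARGET_CMDS
	$(INSTALL) -D -m 0644 $(@D)/$(AI_MODELS_SOURCE) \
		$(TARGET_DIR)/root/models/$(AI_MODELS_SOURCE)
endef

$(eval $(generic-package))
```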


Step 5: Build

cd ~/buildroot
make clean
make -j$(nproc)

Build time: 1-3 hours on first build.

Validation (on Raspberry Pi)

1. Boot and Check Installation

Boot Raspberry Pi and login as root.

# Verify llama-cli is installed
which llama-cli
# Expected: /usr/bin/llama-cli

# Check help
llama-cli --help

2. Test Basic Inference

# Run inference test
llama-cli \
    -m /root/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
    -p "What is embedded Linux?" \
    -n 50

Expected behavior:

  • Model loads (5-10 seconds)
  • Text generation starts
  • No crash/segmentation faults
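The expected behavior above can be wrapped in a small smoke test so validation is repeatable. This is a hypothetical helper; the binary name and model path are assumptions matching the paths used elsewhere in this guide, overridable via the environment.

```shell
#!/bin/sh
# Hypothetical smoke test; the LLAMA and MODEL defaults are assumptions.
smoke_test() {
    llama="${LLAMA:-llama-cli}"
    model="${MODEL:-/root/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf}"
    # A crash or missing model makes llama-cli exit non-zero.
    out=$("$llama" -m "$model" -p "Say hello." -n 8 2>/dev/null) \
        || { echo "inference failed (crash or missing model?)" >&2; return 1; }
    [ -n "$out" ] || { echo "inference produced no output" >&2; return 2; }
    echo "smoke test passed"
}

# On the Pi: smoke_test
```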

3. Verify Architecture

file /usr/bin/llama-cli
# Must show: ARM aarch64 (not x86-64!)

Integration with Your Application

Working examples from the reference implementation:

  1. Loading llama in background:
    https://github.com/hawa7555/final-project-assignment-hawa7555/blob/main/scripts/start_llama.sh

  2. Response parser:
    https://github.com/hawa7555/final-project-assignment-hawa7555/blob/main/app/llm_interface.c

  3. Testing LLM:
    https://github.com/hawa7555/final-project-assignment-hawa7555/blob/main/test/llm_test.c

  4. Model download recipe:
    https://github.com/cu-ecen-aeld/final-project-hawa7555/blob/main/base_external/package/ai-models/ai-models.mk
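As a minimal illustration of the first item, a background launcher can be as simple as the sketch below. This is not the referenced start_llama.sh; the binary, model, prompt, log, and pidfile locations are assumptions.

```shell
#!/bin/sh
# Hypothetical background launcher (not the referenced start_llama.sh).
# All paths are illustrative and overridable via the environment.
LLAMA="${LLAMA:-llama-cli}"
MODEL="${MODEL:-/root/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf}"
LOG="${LOG:-/tmp/llama.log}"
PIDFILE="${PIDFILE:-/tmp/llama.pid}"

# Start inference in the background; the application can tail $LOG for output.
"$LLAMA" -m "$MODEL" -p "System ready." -n 32 > "$LOG" 2>&1 &
echo $! > "$PIDFILE"
echo "llama started (pid $(cat "$PIDFILE")), logging to $LOG"
```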


Model Selection Guide

Why TinyLlama-1.1B-Q4_K_M?

Tested models on Raspberry Pi 4 (4GB):

| Model | Size | Load Time | Speed | Quality | Recommended for |
|---|---|---|---|---|---|
| TinyLlama-Q2_K | 450 MB | 3-4 s | 8-10 tok/s | Good | Fast inference |
| TinyLlama-Q4_K_M | 637 MB | 5-7 s | 6-8 tok/s | Better | Best balance |
| TinyLlama-Q8_0 | 1.1 GB | 8-10 s | 4-6 tok/s | Best | 8 GB Pi only |

Selection criteria:

  1. Small model size
  2. Q4_K_M quantization - best speed & quality tradeoff (balanced)
  3. Context length 512-1024 - sufficient for conversational AI
  4. TinyLlama architecture - optimized for low-resource inference

Debugging common issues

| Issue | Cause | Solution |
|---|---|---|
| "cannot execute binary file" | Binary is x86_64 | Verify cross-compilation |
| Very slow (<3 tok/s) | CPU throttling | Check that scaling_governor is set to performance |
| Model not found | Wrong path | Verify /root/models/*.gguf exists |
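The governor check from the table can be scripted. The sysfs path below is the standard Linux cpufreq location; the SYSFS override exists only so the logic can be exercised off-target.

```shell
#!/bin/sh
# Check the CPU frequency governor via the standard cpufreq sysfs interface.
# SYSFS is overridable so the function can be tested off-target.
check_governor() {
    gov_file="${SYSFS:-/sys}/devices/system/cpu/cpu0/cpufreq/scaling_governor"
    if [ -f "$gov_file" ]; then
        current=$(cat "$gov_file")
        echo "current governor: $current"
        [ "$current" = "performance" ] \
            || echo "hint: echo performance > $gov_file"
    else
        echo "cpufreq interface not found at $gov_file" >&2
        return 1
    fi
}

# On the Pi: check_governor
```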

Future Work & Improvements

  1. Prompt templates - Pre-defined templates for common queries
  2. GPU Support - Future Pi models with better GPU support
  3. Test on an 8GB Pi for better quality and speed
  4. Fine-tune the model for the target application

Note: This guide reflects a working configuration as of December 2025. Future versions of llama.cpp or Buildroot may require adjustments.
