# How to Build
This guide describes how to build Android and Windows versions of the QNN backend for llama.cpp, enabling efficient inference on Qualcomm hardware.
## Table of Contents

- [Android Build](#android-build)
- [Hexagon SDK Setup](#hexagon-sdk-setup)
- [Windows Build](#windows-build)
- [Troubleshooting](#troubleshooting)
## Android Build

### Android Prerequisites
1. **Docker Engine**
   - Install following the official Docker guide
   - Ensure Docker Compose is included with your installation

2. **Source Code**
   - Clone the repository:

     ```bash
     git clone https://github.com/chraac/llama-cpp-qnn-builder.git
     cd llama-cpp-qnn-builder
     ```

   > **Note:** Use the latest `main` branch, as it targets NDK r27c and enables important optimization flags for Release builds.
### Android Build Process

1. **Basic Build**
   - From the project root directory, run:

     ```bash
     ./docker/docker_compose_compile.sh
     ```

2. **Build Output**
   - Executables will be in `build_qnn_arm64-v8a/bin/`
   - The console will show build progress and completion status
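After the build finishes, it can be handy to confirm that the expected binaries actually landed in the output directory. The helper below is a sketch: the executable names mirror the ones listed later under Windows Build Output, so adjust the list if your build produces a different set.

```shell
# Sketch: verify the expected llama.cpp binaries exist in the Android
# build output directory (directory name from the build step above).
check_build_output() {
    dir="$1"
    missing=0
    for exe in llama-cli llama-bench test-backend-ops; do
        if [ -x "$dir/$exe" ]; then
            echo "found: $exe"
        else
            echo "missing: $exe"
            missing=$((missing + 1))
        fi
    done
    return "$missing"   # 0 means all binaries were found
}

check_build_output build_qnn_arm64-v8a/bin || echo "some binaries are missing"
```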
### Build Options

| Parameter | Short | Description | Default |
|---|---|---|---|
| `--rebuild` | `-r` | Force rebuild of the project | `false` |
| `--repo-dir` | | Specify llama.cpp repository directory | `../llama.cpp` |
| `--debug` | `-d` | Build in Debug mode | Release |
| `--asan` | | Enable AddressSanitizer | `false` |
| `--build-linux-x64` | | Build for Linux x86_64 platform | android arm64-v8a |
| `--perf-log` | | Enable Hexagon performance tracking | `false` |
| `--enable-hexagon-backend` | | Enable Hexagon backend support | `false` |
| `--hexagon-npu-only` | | Build Hexagon NPU backend only | `false` |
| `--disable-hexagon-and-qnn` | | Disable both Hexagon and QNN backends | `false` |
| `--qnn-only` | | Build QNN backend only | `false` |
| `--enable-dequant` | | Enable quantized tensor support in Hexagon | `false` |
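Note that `-r` and `-d` are the only short forms, and `-d` flips the build type from the default Release to Debug. Purely as an illustrative sketch (this is not the wrapper script's actual code), flag parsing along these lines is what selects the mode:

```shell
# Illustrative flag-parsing sketch; mirrors the table's default
# (Release unless -d/--debug is passed). Not the real script's code.
parse_build_mode() {
    mode="Release"
    for arg in "$@"; do
        case "$arg" in
            -d|--debug) mode="Debug" ;;
        esac
    done
    echo "$mode"
}

parse_build_mode --qnn-only              # prints: Release
parse_build_mode -d --hexagon-npu-only   # prints: Debug
```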
### Build Examples

```bash
# Basic build (default: Release mode, QNN + Hexagon backends)
./docker/docker_compose_compile.sh

# Debug build with Hexagon NPU backend
./docker/docker_compose_compile.sh -d --enable-hexagon-backend

# Debug build with Hexagon NPU backend only
./docker/docker_compose_compile.sh -d --hexagon-npu-only

# Debug build with Hexagon NPU backend and quantized tensor support
./docker/docker_compose_compile.sh -d --hexagon-npu-only --enable-dequant

# QNN-only build with performance logging
./docker/docker_compose_compile.sh --qnn-only --perf-log

# Force rebuild with debug symbols
./docker/docker_compose_compile.sh -r -d
```
## Hexagon SDK Setup
To build with Hexagon NPU backend support, you need to create a Docker image that includes the Hexagon SDK.
### Prerequisites

1. **Hexagon SDK**
   - Option 1: Download the SDK from Hexagon NPU SDK - Getting started (version 6.3.0.0 for Linux)
   - Option 2: Use an existing SDK installation

2. **Base Docker Image**
   - Required image: `chraac/llama-cpp-qnn-builder:2.36.0.250627-ndk-r27`
   - Contains Android NDK r27c and build tools
### Building the Hexagon SDK Image with Local SDK Folder
If you already have the Hexagon SDK extracted on your machine:
1. **Create Dockerfile** (save as `Dockerfile.hexagon_sdk.local`):

   ```dockerfile
   FROM chraac/llama-cpp-qnn-builder:2.36.0.250627-ndk-r27

   ENV HEXAGON_SDK_VERSION='6.3.0.0'
   ENV HEXAGON_SDK_BASE=/local/mnt/workspace/Qualcomm/Hexagon_SDK
   ENV HEXAGON_SDK_PATH=${HEXAGON_SDK_BASE}/${HEXAGON_SDK_VERSION}
   ENV ANDROID_NDK_HOME=/android-ndk/android-ndk-r27c
   ENV ANDROID_ROOT_DIR=${ANDROID_NDK_HOME}/

   RUN mkdir -p ${HEXAGON_SDK_PATH}

   # Copy the local SDK into the image. HEXAGON_SDK_PATH already ends in
   # the version number, so copy directly into it rather than appending
   # 6.3.0.0 again.
   ARG LOCAL_SDK_PATH
   ADD ${LOCAL_SDK_PATH} ${HEXAGON_SDK_PATH}

   # Install required dependencies
   RUN apt update && apt install -y \
       python-is-python3 \
       libncurses5 \
       lsb-base \
       lsb-release \
       sqlite3 \
       rsync \
       git \
       build-essential \
       libc++-dev \
       clang \
       cmake

   # Dummy version info for hexagon-sdk
   RUN echo 'VERSION_ID="20.04"' > /etc/os-release
   ```
2. **Create Setup Script** (save as `docker_compose_hexagon_local.sh`):

   ```bash
   #!/bin/bash

   # Check if SDK path is provided
   if [ -z "$1" ]; then
       echo "Usage: $0 /path/to/hexagon/sdk/6.3.0.0"
       exit 1
   fi

   SDK_PATH="$1"

   # Check if SDK path exists
   if [ ! -d "$SDK_PATH" ]; then
       echo "Error: SDK path does not exist: $SDK_PATH"
       exit 1
   fi

   # Build the Docker image with the SDK embedded
   docker build -f Dockerfile.hexagon_sdk.local \
       --build-arg LOCAL_SDK_PATH="$SDK_PATH" \
       -t llama-cpp-qnn-hexagon:embedded .

   # Create a Docker Compose configuration file
   cat > docker-compose.hexagon.yml << EOF
   version: '3'
   services:
     hexagon-builder:
       image: llama-cpp-qnn-hexagon:embedded
       volumes:
         - ./:/workspace
       working_dir: /workspace
   EOF

   echo "Setup complete! Use the following command to compile with Hexagon support:"
   echo "./docker/docker_compose_compile.sh --enable-hexagon-backend"
   ```

   > **Note:** Docker's `ADD` can only read paths inside the build context, so the SDK folder passed to this script must sit under the directory where `docker build` runs (copy or symlink it there first if necessary).
3. **Run Setup:**

   ```bash
   chmod +x docker_compose_hexagon_local.sh
   ./docker_compose_hexagon_local.sh /path/to/your/Hexagon_SDK/6.3.0.0
   ```
4. **Build with Hexagon Support:**

   ```bash
   # Enable Hexagon NPU backend
   ./docker/docker_compose_compile.sh --enable-hexagon-backend

   # Or build with Hexagon NPU backend only
   ./docker/docker_compose_compile.sh --hexagon-npu-only

   # Access container shell for manual builds
   docker-compose -f docker-compose.hexagon.yml run --rm hexagon-builder bash
   ```
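Once the Android binaries are built, a common way to try them is to push them to an attached device over `adb`. The snippet below is only a deployment sketch: `/data/local/tmp` is the usual writable scratch directory, but the exact set of QNN/Hexagon runtime libraries you must push alongside the binary depends on your SDK installation.

```shell
# Deployment sketch: push the freshly built binary to an Android device.
# Assumes adb (Android platform-tools) is installed and a device is attached.
BIN=build_qnn_arm64-v8a/bin/llama-cli
DEST=/data/local/tmp/llama-qnn

deployed=no
if command -v adb >/dev/null 2>&1; then
    { adb shell mkdir -p "$DEST" &&
      adb push "$BIN" "$DEST/" &&
      adb shell chmod +x "$DEST/llama-cli" &&
      deployed=yes; } || echo "adb deployment failed"
else
    echo "adb not found; install Android platform-tools first"
fi
echo "deployed: $deployed"
```

On the device you would then run the binary from an `adb shell`, pointing `LD_LIBRARY_PATH` at wherever you pushed the runtime libraries.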
## Windows Build

### Windows Prerequisites
1. **Qualcomm AI Engine Direct SDK**
   - Download from the Qualcomm Developer Portal
   - Extract to a folder (example: `C:/ml/qnn_sdk/qairt/2.31.0.250130/`)

2. **Visual Studio 2022**
   - Required components:
     - Clang toolchain for ARM64 compilation
     - CMake tools for Visual Studio

3. **Hexagon SDK** (optional, only for the Hexagon NPU backend)
   - Follow Hexagon NPU SDK - Getting started
   - Install Qualcomm Package Manager (QPM) first
   - Use QPM to install the Hexagon SDK
   - Set the environment variable `HEXAGON_SDK_ROOT` to your installation directory
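For step 3, the environment variable can be set persistently from a Windows command prompt. The path below is a hypothetical example; substitute the directory QPM actually installed the SDK into:

```powershell
# Hypothetical SDK path - replace with your QPM installation directory
setx HEXAGON_SDK_ROOT "C:\Qualcomm\Hexagon_SDK\6.3.0.0"
```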
### Windows Build Process
1. **Open Project**
   - Launch Visual Studio 2022
   - Click **Continue without code**
   - Navigate to **File** → **Open** → **CMake**
   - Select `CMakeLists.txt` in the llama.cpp root directory
2. **Configure CMake**

   Edit `llama.cpp/CMakePresets.json` to modify the `arm64-windows-llvm` configuration:

   ```diff
     {
       "name": "arm64-windows-llvm",
       "hidden": true,
       "architecture": { "value": "arm64", "strategy": "external" },
       "toolset": { "value": "host=x64", "strategy": "external" },
       "cacheVariables": {
   -     "CMAKE_TOOLCHAIN_FILE": "${sourceDir}/cmake/arm64-windows-llvm.cmake"
   +     "CMAKE_TOOLCHAIN_FILE": "${sourceDir}/cmake/arm64-windows-llvm.cmake",
   +     "GGML_QNN": "ON",
   +     "GGML_QNN_SDK_PATH": "C:/ml/qnn_sdk/qairt/2.31.0.250130/",
   +     "BUILD_SHARED_LIBS": "OFF"
       }
     },
   ```

   > **Important:** Replace the QNN SDK path with your actual installation path.
3. **Select Configuration**
   - Choose the `arm64-windows-llvm-debug` configuration from the dropdown menu

4. **Build**
   - Select **Build** → **Build All**
   - Output will be in `build-arm64-windows-llvm-debug/bin/`
### Windows Build Output

After successful compilation, you'll have these executables:

- `llama-cli.exe` - Main inference executable
- `llama-bench.exe` - Benchmarking tool
- `test-backend-ops.exe` - Backend operation tests
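To smoke-test the build, you can run the backend-op tests and a short generation from the output directory. These invocations are a sketch: `test-backend-ops`'s `test` mode and `llama-cli`'s `-m`/`-p`/`-n` flags are standard llama.cpp options, and the model path is a placeholder you must replace with a real GGUF file.

```powershell
# Run from build-arm64-windows-llvm-debug/bin/
.\test-backend-ops.exe test

# Placeholder model path - point this at a real GGUF file
.\llama-cli.exe -m C:\models\model.gguf -p "Hello" -n 32
```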
## Troubleshooting

### Common Issues
1. **Docker Permission Issues**
   - Add your user to the docker group:

     ```bash
     sudo usermod -aG docker $USER
     # Log out and back in for changes to take effect
     ```
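To confirm the group change took effect (only after logging out and back in), a quick membership check looks like this:

```shell
# Check whether the current user is already in the docker group.
# (usermod changes only apply to new login sessions.)
if id -nG | grep -qw docker; then
    in_group=yes
else
    in_group=no
fi
echo "in docker group: $in_group"
```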
2. **Hexagon SDK Compatibility**
   - Verify you're using exactly version 6.3.0.0 of the SDK
   - Ensure SDK directory permissions allow Docker container access
3. **Build Failures**
   - Check Docker logs for detailed error messages:

     ```bash
     docker-compose -f docker-compose.hexagon.yml logs
     ```