# How to Build
This guide describes how to build Android and Windows versions of the QNN backend for llama.cpp, enabling efficient inference on Qualcomm hardware.
## Table of Contents

- [Android Build](#android-build)
- [Hexagon SDK Setup](#hexagon-sdk-setup)
- [Windows Build](#windows-build)
- [Troubleshooting](#troubleshooting)
## Android Build

### Android Prerequisites
1. **Docker Engine**
   - Install following the official Docker guide
   - Ensure Docker Compose is included with your installation

2. **Source Code**
   - Clone the repository:

     ```bash
     git clone https://github.com/chraac/llama-cpp-qnn-builder.git
     cd llama-cpp-qnn-builder
     ```

   > **Note:** Use the latest `main` branch, as it targets NDK r27c and enables important optimization flags for Release builds.
### Android Build Process

1. **Basic Build**
   - From the project root directory, run:

     ```bash
     ./docker/docker_compose_compile.sh
     ```

2. **Build Output**
   - Executables will be in `build_qnn_arm64-v8a/bin/`
   - The console will show build progress and completion status
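After the build finishes, it can be handy to confirm that the expected binaries actually landed in the output directory. The helper below is a sketch: the executable names mirror the ones listed later under Windows Build Output, so adjust the list if your build produces a different set.

```shell
# Sketch: verify the expected llama.cpp binaries exist in the Android
# build output directory (directory name from the build step above).
check_build_output() {
    dir="$1"
    missing=0
    for exe in llama-cli llama-bench test-backend-ops; do
        if [ -x "$dir/$exe" ]; then
            echo "found: $exe"
        else
            echo "missing: $exe"
            missing=$((missing + 1))
        fi
    done
    return "$missing"   # 0 means all binaries were found
}

check_build_output build_qnn_arm64-v8a/bin || echo "some binaries are missing"
```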
### Build Options

| Parameter | Short | Description | Default |
|---|---|---|---|
| `--rebuild` | `-r` | Force rebuild of the project | `false` |
| `--repo-dir` | | Specify llama.cpp repository directory | `../llama.cpp` |
| `--debug` | `-d` | Build in Debug mode | Release |
| `--asan` | | Enable AddressSanitizer | `false` |
| `--build-linux-x64` | | Build for Linux x86_64 platform | android arm64-v8a |
| `--perf-log` | | Enable Hexagon performance tracking | `false` |
| `--enable-hexagon-backend` | | Enable Hexagon backend support | `false` |
| `--hexagon-npu-only` | | Build Hexagon NPU backend only | `false` |
| `--disable-hexagon-and-qnn` | | Disable both Hexagon and QNN backends | `false` |
| `--qnn-only` | | Build QNN backend only | `false` |
| `--enable-dequant` | | Enable quantized tensor support in Hexagon | `false` |
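Note that `-r` and `-d` are the only short forms, and `-d` flips the build type from the default Release to Debug. Purely as an illustrative sketch (this is not the wrapper script's actual code), flag parsing along these lines is what selects the mode:

```shell
# Illustrative flag-parsing sketch; mirrors the table's default
# (Release unless -d/--debug is passed). Not the real script's code.
parse_build_mode() {
    mode="Release"
    for arg in "$@"; do
        case "$arg" in
            -d|--debug) mode="Debug" ;;
        esac
    done
    echo "$mode"
}

parse_build_mode --qnn-only              # prints: Release
parse_build_mode -d --hexagon-npu-only   # prints: Debug
```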
### Build Examples

```bash
# Basic build (default: Release mode, QNN + Hexagon backends)
./docker/docker_compose_compile.sh

# Debug build with Hexagon NPU backend
./docker/docker_compose_compile.sh -d --enable-hexagon-backend

# Debug build with Hexagon NPU backend only
./docker/docker_compose_compile.sh -d --hexagon-npu-only

# Debug build with Hexagon NPU backend and quantized tensor support
./docker/docker_compose_compile.sh -d --hexagon-npu-only --enable-dequant

# QNN-only build with performance logging
./docker/docker_compose_compile.sh --qnn-only --perf-log

# Force rebuild with debug symbols
./docker/docker_compose_compile.sh -r -d
```
## Hexagon SDK Setup
To build with Hexagon NPU backend support, you need to create a Docker image that includes the Hexagon SDK.
### Prerequisites

1. **Hexagon SDK**
   - Option 1: Download the SDK from Hexagon NPU SDK - Getting started (version 6.3.0.0 for Linux)
   - Option 2: Use an existing SDK installation

2. **Base Docker Image**
   - Required image: `chraac/llama-cpp-qnn-builder:2.36.0.250627-ndk-r27`
   - Contains Android NDK r27c and build tools
### Building the Hexagon SDK Image with Local SDK Folder
If you already have the Hexagon SDK extracted on your machine:
1. **Create Dockerfile** (save as `Dockerfile.hexagon_sdk.local`):

   ```dockerfile
   FROM chraac/llama-cpp-qnn-builder:2.36.0.250627-ndk-r27

   ENV HEXAGON_SDK_VERSION='6.3.0.0'
   ENV HEXAGON_SDK_BASE=/local/mnt/workspace/Qualcomm/Hexagon_SDK
   ENV HEXAGON_SDK_PATH=${HEXAGON_SDK_BASE}/${HEXAGON_SDK_VERSION}
   ENV ANDROID_NDK_HOME=/android-ndk/android-ndk-r27c
   ENV ANDROID_ROOT_DIR=${ANDROID_NDK_HOME}/

   RUN mkdir -p ${HEXAGON_SDK_PATH}

   # Copy the local SDK into the image. HEXAGON_SDK_PATH already ends in
   # the version number, so copy directly into it rather than appending
   # 6.3.0.0 again.
   ARG LOCAL_SDK_PATH
   ADD ${LOCAL_SDK_PATH} ${HEXAGON_SDK_PATH}

   # Install required dependencies
   RUN apt update && apt install -y \
       python-is-python3 \
       libncurses5 \
       lsb-base \
       lsb-release \
       sqlite3 \
       rsync \
       git \
       build-essential \
       libc++-dev \
       clang \
       cmake

   # Dummy version info for hexagon-sdk
   RUN echo 'VERSION_ID="20.04"' > /etc/os-release
   ```
2. **Create Setup Script** (save as `docker_compose_hexagon_local.sh`):

   ```bash
   #!/bin/bash

   # Check if SDK path is provided
   if [ -z "$1" ]; then
       echo "Usage: $0 /path/to/hexagon/sdk/6.3.0.0"
       exit 1
   fi

   SDK_PATH="$1"

   # Check if SDK path exists
   if [ ! -d "$SDK_PATH" ]; then
       echo "Error: SDK path does not exist: $SDK_PATH"
       exit 1
   fi

   # Build the Docker image with the SDK embedded
   docker build -f Dockerfile.hexagon_sdk.local \
       --build-arg LOCAL_SDK_PATH="$SDK_PATH" \
       -t llama-cpp-qnn-hexagon:embedded .

   # Create a Docker Compose configuration file
   cat > docker-compose.hexagon.yml << EOF
   version: '3'
   services:
     hexagon-builder:
       image: llama-cpp-qnn-hexagon:embedded
       volumes:
         - ./:/workspace
       working_dir: /workspace
   EOF

   echo "Setup complete! Use the following command to compile with Hexagon support:"
   echo "./docker/docker_compose_compile.sh --enable-hexagon-backend"
   ```

   > **Note:** Docker's `ADD` can only read paths inside the build context, so the SDK folder passed to this script must sit under the directory where `docker build` runs (copy or symlink it there first if necessary).
3. **Run Setup:**

   ```bash
   chmod +x docker_compose_hexagon_local.sh
   ./docker_compose_hexagon_local.sh /path/to/your/Hexagon_SDK/6.3.0.0
   ```
4. **Build with Hexagon Support:**

   ```bash
   # Enable Hexagon NPU backend
   ./docker/docker_compose_compile.sh --enable-hexagon-backend

   # Or build with Hexagon NPU backend only
   ./docker/docker_compose_compile.sh --hexagon-npu-only

   # Access container shell for manual builds
   docker-compose -f docker-compose.hexagon.yml run --rm hexagon-builder bash
   ```
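Once the Android binaries are built, a common way to try them is to push them to an attached device over `adb`. The snippet below is only a deployment sketch: `/data/local/tmp` is the usual writable scratch directory, but the exact set of QNN/Hexagon runtime libraries you must push alongside the binary depends on your SDK installation.

```shell
# Deployment sketch: push the freshly built binary to an Android device.
# Assumes adb (Android platform-tools) is installed and a device is attached.
BIN=build_qnn_arm64-v8a/bin/llama-cli
DEST=/data/local/tmp/llama-qnn

deployed=no
if command -v adb >/dev/null 2>&1; then
    { adb shell mkdir -p "$DEST" &&
      adb push "$BIN" "$DEST/" &&
      adb shell chmod +x "$DEST/llama-cli" &&
      deployed=yes; } || echo "adb deployment failed"
else
    echo "adb not found; install Android platform-tools first"
fi
echo "deployed: $deployed"
```

On the device you would then run the binary from an `adb shell`, pointing `LD_LIBRARY_PATH` at wherever you pushed the runtime libraries.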
## Windows Build

### Windows Prerequisites
1. **Qualcomm AI Engine Direct SDK**
   - Download from the Qualcomm Developer Portal
   - Extract to a folder (example: `C:/ml/qnn_sdk/qairt/2.31.0.250130/`)

2. **Visual Studio 2022**
   - Required components:
     - Clang toolchain for ARM64 compilation
     - CMake tools for Visual Studio

3. **Hexagon SDK** (optional, only for the Hexagon NPU backend)
   - Follow Hexagon NPU SDK - Getting started
   - Install Qualcomm Package Manager (QPM) first
   - Use QPM to install the Hexagon SDK
   - Set the environment variable `HEXAGON_SDK_ROOT` to your installation directory
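For step 3, the environment variable can be set persistently from a Windows command prompt. The path below is a hypothetical example; substitute the directory QPM actually installed the SDK into:

```powershell
# Hypothetical SDK path - replace with your QPM installation directory
setx HEXAGON_SDK_ROOT "C:\Qualcomm\Hexagon_SDK\6.3.0.0"
```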
### Windows Build Process
1. **Open Project**
   - Launch Visual Studio 2022
   - Click **Continue without code**
   - Navigate to **File** → **Open** → **CMake**
   - Select `CMakeLists.txt` in the llama.cpp root directory
2. **Configure CMake**

   Edit `llama.cpp/CMakePresets.json` to modify the `arm64-windows-llvm` configuration:

   ```diff
     {
       "name": "arm64-windows-llvm",
       "hidden": true,
       "architecture": { "value": "arm64", "strategy": "external" },
       "toolset": { "value": "host=x64", "strategy": "external" },
       "cacheVariables": {
   -     "CMAKE_TOOLCHAIN_FILE": "${sourceDir}/cmake/arm64-windows-llvm.cmake"
   +     "CMAKE_TOOLCHAIN_FILE": "${sourceDir}/cmake/arm64-windows-llvm.cmake",
   +     "GGML_QNN": "ON",
   +     "GGML_QNN_SDK_PATH": "C:/ml/qnn_sdk/qairt/2.31.0.250130/",
   +     "BUILD_SHARED_LIBS": "OFF"
       }
     },
   ```

   > **Important:** Replace the QNN SDK path with your actual installation path.
3. **Select Configuration**
   - Choose the `arm64-windows-llvm-debug` configuration from the dropdown menu

4. **Build**
   - Select **Build** → **Build All**
   - Output will be in `build-arm64-windows-llvm-debug/bin/`
### Windows Build Output

After successful compilation, you'll have these executables:

- `llama-cli.exe` - Main inference executable
- `llama-bench.exe` - Benchmarking tool
- `test-backend-ops.exe` - Backend operation tests
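To smoke-test the build, you can run the backend-op tests and a short generation from the output directory. These invocations are a sketch: `test-backend-ops`'s `test` mode and `llama-cli`'s `-m`/`-p`/`-n` flags are standard llama.cpp options, and the model path is a placeholder you must replace with a real GGUF file.

```powershell
# Run from build-arm64-windows-llvm-debug/bin/
.\test-backend-ops.exe test

# Placeholder model path - point this at a real GGUF file
.\llama-cli.exe -m C:\models\model.gguf -p "Hello" -n 32
```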
## Troubleshooting

### Common Issues
1. **Docker Permission Issues**
   - Add your user to the docker group:

     ```bash
     sudo usermod -aG docker $USER
     # Log out and back in for changes to take effect
     ```
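To confirm the group change took effect (only after logging out and back in), a quick membership check looks like this:

```shell
# Check whether the current user is already in the docker group.
# (usermod changes only apply to new login sessions.)
if id -nG | grep -qw docker; then
    in_group=yes
else
    in_group=no
fi
echo "in docker group: $in_group"
```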
2. **Hexagon SDK Compatibility**
   - Verify you're using exactly version 6.3.0.0 of the SDK
   - Ensure SDK directory permissions allow Docker container access
3. **Build Failures**
   - Check Docker logs for detailed error messages:

     ```bash
     docker-compose -f docker-compose.hexagon.yml logs
     ```