Cpp interface of LLMs - chunhualiao/public-docs GitHub Wiki
| Category | Representative libraries / approaches | Main Characteristics |
|---|---|---|
| Thin REST wrappers | • libcpr (C++ Requests) • cpp-httplib + nlohmann/json • Boost.Beast (+ your JSON lib) • Community OpenAI wrappers (liboai, openai-cpp) | Minimal abstractions: you build HTTP requests and JSON payloads yourself. Fast to integrate and easy to vendor (often header-only), but you own retries, streaming chunk handling, auth flows, and error policy. ([GitHub][1], [Boost][2]) |
| Full SDKs / service clients (C++) | • AWS SDK for C++ (Amazon Bedrock Runtime) • Google Cloud Vertex AI C++ Client (AI Platform) • NVIDIA Triton Inference Server C++ client | Higher‑level clients with built‑in auth, retries, pagination/streaming (often gRPC) and telemetry hooks; larger dependency trees. Note: OpenAI does not ship an official C++ SDK; use REST or community libs. ([sdk.amazonaws.com][3], [Google Cloud][4], [NVIDIA Docs][5], [OpenAI Platform][6], [GitHub][7]) |
| Embedded / on-device | • llama.cpp (C/C++ API; optional REST server) • TensorRT‑LLM C++ runtime (NVIDIA GPUs) • ONNX Runtime C++ API • whisper.cpp (local ASR) • GGUF/GGML executors | Run models locally (no network). Great for edge/HPC; requires model conversion/quantization and careful memory/VRAM budgeting. Footprint varies from tiny CPU‑only builds to GPU‑accelerated runtimes. ([GitHub][8], [NVIDIA Docs][9], [ONNX Runtime][10]) |
| Framework-style orchestration libs | • llama.cpp server + C++/HTTP clients • instinct.cpp (agent/RAG in C++) • LLMKit++ (LangChain‑inspired, Ollama‑backed) | Opinionated chains/agents, tools, and memory abstractions. Smaller/younger ecosystems in C++ compared to Python; useful when you want built‑ins beyond raw API calls. ([GitHub][8]) |
Notes & sources (updated Aug 2025):

- Thin REST wrappers: libcpr releases and repo; cpp-httplib releases; nlohmann/json; Boost.Beast docs. ([GitHub][11], [Boost][2])
- Community OpenAI C++ libraries: liboai, openai-cpp. OpenAI’s official SDKs focus on Python/JS/.NET, not C++. ([GitHub][12], [OpenAI Platform][6])
- Full SDKs / clients: AWS Bedrock Runtime (C++), Google Cloud Vertex AI C++ client, NVIDIA Triton C++ client libs. ([sdk.amazonaws.com][3], [Google Cloud][4], [NVIDIA Docs][5])
- Embedded / on‑device: llama.cpp, GGUF format, TensorRT‑LLM C++ runtime, ONNX Runtime C++ API, whisper.cpp. ([GitHub][8], [NVIDIA Docs][9], [ONNX Runtime][10])