dyn_feature - OpenNebula/one-apps GitHub Wiki
Features and Usage
The appliance deploys the selected LLM using NVIDIA Dynamo to distribute and optimize the inference workload. Several inference engines are available (see the official Dynamo documentation).
This appliance integrates seamlessly with models available on Hugging Face (note: a Hugging Face account and token may be required to access some models). The appliance runs the simplest configuration for testing Dynamo with a variety of models.
Contextualization
The appliance's behavior and configuration are controlled by contextualization parameters specified in the VM template's Context Section. Below are the primary configurable aspects:
Inference engine parameters
You can select which Dynamo inference engine to use.
Parameter | Default | Description |
---|---|---|
`ONEAPP_DYNAMO_ENGINE_NAME` | `vllm` | Name of the Dynamo engine to use. Available engines: `mistralrs`, `sglang`, `llamacpp`, `vllm`, `trtllm`, `echo_full`, `echo_core`. |
`ONEAPP_DYNAMO_ENGINE_EXTRA_ARGS_JSON` | - | Extra engine arguments, in JSON format. |
`ONEAPP_DYNAMO_ENGINE_EXTRA_ARGS_JSON_BASE64` | - | Extra engine arguments, in JSON format and encoded in Base64. |
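As a sketch of how the Base64 variant can be produced from the plain JSON one (the specific argument key shown is hypothetical; valid keys depend on the selected engine):

```python
import base64
import json

# Hypothetical extra arguments; valid keys depend on the selected engine.
extra_args = {"max-batch-size": 8}

# Value suitable for ONEAPP_DYNAMO_ENGINE_EXTRA_ARGS_JSON:
as_json = json.dumps(extra_args)

# Value suitable for ONEAPP_DYNAMO_ENGINE_EXTRA_ARGS_JSON_BASE64:
as_b64 = base64.b64encode(as_json.encode()).decode()
print(as_b64)
```

Decoding `as_b64` with any standard Base64 tool yields the original JSON document again.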
Inference API
The model is exposed through an API that your application can consume; the following parameters control it. You can also deploy a web application to interact with the model.
Parameter | Default | Description |
---|---|---|
`ONEAPP_DYNAMO_API_PORT` | `8000` | Port number for the API endpoint. |
`ONEAPP_DYNAMO_API_WEB` | `YES` | Deploy a web application to interact with the model. |
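A minimal client sketch for talking to the exposed API, assuming Dynamo's HTTP frontend serves an OpenAI-compatible `/v1/chat/completions` endpoint on `ONEAPP_DYNAMO_API_PORT` (the endpoint path and payload shape are assumptions based on that convention):

```python
import json
from urllib import request

def build_chat_request(base_url: str, model: str, prompt: str) -> request.Request:
    """Build a chat-completion request for the appliance's HTTP API.

    Assumes an OpenAI-compatible /v1/chat/completions endpoint listening
    on ONEAPP_DYNAMO_API_PORT (8000 by default).
    """
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )

# Sending the request requires a running appliance, e.g.:
# with request.urlopen(build_chat_request("http://<vm-ip>:8000",
#                                         "Qwen/Qwen2.5-1.5B-Instruct",
#                                         "Hello!")) as resp:
#     print(json.load(resp))
```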
Inference Model
Parameter | Default | Description |
---|---|---|
`ONEAPP_DYNAMO_MODEL_ID` | `Qwen/Qwen2.5-1.5B-Instruct` | Determines the LLM model to use for inference. |
`ONEAPP_DYNAMO_MODEL_TOKEN` | - | Hugging Face token to access the specified LLM model. |
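Putting the parameters together, the Context Section of a VM template might look like the following sketch (the token value is a placeholder; your template will typically contain additional context attributes):

```
CONTEXT = [
  ONEAPP_DYNAMO_ENGINE_NAME = "vllm",
  ONEAPP_DYNAMO_API_PORT = "8000",
  ONEAPP_DYNAMO_API_WEB = "YES",
  ONEAPP_DYNAMO_MODEL_ID = "Qwen/Qwen2.5-1.5B-Instruct",
  ONEAPP_DYNAMO_MODEL_TOKEN = "<your-hugging-face-token>" ]
```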
Using the Appliance from the CLI
Dynamo is installed in a virtual environment, which you need to activate before using the `dynamo run` command:

```shell
root@dynamo-chatbot:~# . ./dynamo_env/bin/activate
(dynamo_env) root@dynamo-chatbot:~# dynamo run in=http out=mistralrs --http-port 8000 Qwen/Qwen2.5-0.5B-Instruct
```
Using GPUs
The appliance is designed to utilize all available CPU and GPU resources in the VM by default. However, GPU drivers are not pre-installed; to use GPUs, the appropriate drivers must be installed. You can install them during the appliance build process by setting the `NVIDIA_DRIVER_PATH` environment variable when executing the appliance build Makefile recipe. This variable can contain a URL or a local path to the driver package, e.g.:

```shell
sudo NVIDIA_DRIVER_PATH=/path/to/drivers make packer-service_Dynamo
```
GPUs can be added to the VM using:
- PCI Passthrough
- SR-IOV vGPUs
Some configurations may require downloading proprietary drivers and configuring associated licenses. Note: When using NVIDIA cards, select a profile that supports OpenCL and CUDA applications (e.g., Q-series vGPU types).
After deployment, the application should utilize the GPU resources, as can be verified with `nvidia-smi`:

```
root@dynamo-chatbot:~# nvidia-smi
Tue Dec 31 15:28:25 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.216.01             Driver Version: 535.216.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A10-24Q                 On  | 00000000:01:01.0 Off |                    0 |
| N/A  N/A    P8             N/A /  N/A   |   6259MiB / 24576MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0  N/A  N/A       2286      C   Dynamo::ChatBot                            6257MiB |
+---------------------------------------------------------------------------------------+
```