dyn_feature - OpenNebula/one-apps GitHub Wiki
Features and Usage
The appliance deploys the selected LLM using NVIDIA Dynamo to distribute and optimize the inference workload. Several inference engines are available (see the official Dynamo documentation).
This appliance integrates seamlessly with models available on Hugging Face (note: a Hugging Face account and token may be required to access some models). The appliance runs the simplest configuration for testing Dynamo with a variety of models.
Contextualization
The appliance's behavior and configuration are controlled by contextualization parameters specified in the VM template's Context Section. Below are the primary configurable aspects:
Inference engine parameters
You can select which Dynamo inference engine to use.
Parameter | Default | Description |
---|---|---|
`ONEAPP_DYNAMO_ENGINE_NAME` | `vllm` | Name of the Dynamo engine to use. Available engines: `mistralrs`, `sglang`, `llamacpp`, `vllm`, `trtllm`, `echo_full`, `echo_core`. |
`ONEAPP_DYNAMO_ENGINE_EXTRA_ARGS_JSON` | - | Extra engine arguments, in JSON format. |
`ONEAPP_DYNAMO_ENGINE_EXTRA_ARGS_JSON_BASE64` | - | Extra engine arguments, in JSON format and encoded in Base64. |
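As a sketch of how the Base64 variant can be produced from the plain JSON one (the specific argument key shown is hypothetical; valid keys depend on the selected engine):

```python
import base64
import json

# Hypothetical extra arguments; valid keys depend on the selected engine.
extra_args = {"max-batch-size": 8}

# Value suitable for ONEAPP_DYNAMO_ENGINE_EXTRA_ARGS_JSON:
as_json = json.dumps(extra_args)

# Value suitable for ONEAPP_DYNAMO_ENGINE_EXTRA_ARGS_JSON_BASE64:
as_b64 = base64.b64encode(as_json.encode()).decode()
print(as_b64)
```

Decoding `as_b64` with any standard Base64 tool yields the original JSON document again.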
Inference API
The model is exposed through an API that your application can consume; the following parameters control it. You can also deploy a web application to interact with the model.
Parameter | Default | Description |
---|---|---|
`ONEAPP_DYNAMO_API_PORT` | `8000` | Port number for the API endpoint. |
`ONEAPP_DYNAMO_API_WEB` | `YES` | Deploy a web application to interact with the model. |
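A minimal client sketch for talking to the exposed API, assuming Dynamo's HTTP frontend serves an OpenAI-compatible `/v1/chat/completions` endpoint on `ONEAPP_DYNAMO_API_PORT` (the endpoint path and payload shape are assumptions based on that convention):

```python
import json
from urllib import request

def build_chat_request(base_url: str, model: str, prompt: str) -> request.Request:
    """Build a chat-completion request for the appliance's HTTP API.

    Assumes an OpenAI-compatible /v1/chat/completions endpoint listening
    on ONEAPP_DYNAMO_API_PORT (8000 by default).
    """
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )

# Sending the request requires a running appliance, e.g.:
# with request.urlopen(build_chat_request("http://<vm-ip>:8000",
#                                         "Qwen/Qwen2.5-1.5B-Instruct",
#                                         "Hello!")) as resp:
#     print(json.load(resp))
```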
Inference Model
Parameter | Default | Description |
---|---|---|
`ONEAPP_DYNAMO_MODEL_ID` | `Qwen/Qwen2.5-1.5B-Instruct` | Determines the LLM model to use for inference. |
`ONEAPP_DYNAMO_MODEL_TOKEN` | - | Hugging Face token to access the specified LLM model. |
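Putting the parameters together, the Context Section of a VM template might look like the following sketch (the token value is a placeholder; your template will typically contain additional context attributes):

```
CONTEXT = [
  ONEAPP_DYNAMO_ENGINE_NAME = "vllm",
  ONEAPP_DYNAMO_API_PORT = "8000",
  ONEAPP_DYNAMO_API_WEB = "YES",
  ONEAPP_DYNAMO_MODEL_ID = "Qwen/Qwen2.5-1.5B-Instruct",
  ONEAPP_DYNAMO_MODEL_TOKEN = "<your-hugging-face-token>" ]
```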
Using the Appliance from the CLI
Dynamo is installed in a virtual environment, which you need to activate before using the `dynamo run` command:

```shell
root@dynamo-chatbot:~# . ./dynamo_env/bin/activate
(dynamo_env) root@dynamo-chatbot:~# dynamo run in=http out=mistralrs --http-port 8000 Qwen/Qwen2.5-0.5B-Instruct
```
Using GPUs
The appliance is designed to utilize all available CPU and GPU resources in the VM by default. However, GPU drivers are not pre-installed; to use GPUs, the appropriate drivers must be installed. You can install them during the appliance build process by setting the `NVIDIA_DRIVER_PATH` environment variable when executing the appliance build Makefile recipe. This variable can contain a URL or a local path to the driver package, e.g.:

```shell
sudo NVIDIA_DRIVER_PATH=/path/to/drivers make packer-service_Dynamo
```
GPUs can be added to the VM using:
- PCI Passthrough
- SR-IOV vGPUs
Some configurations may require downloading proprietary drivers and configuring associated licenses. Note: When using NVIDIA cards, select a profile that supports OpenCL and CUDA applications (e.g., Q-series vGPU types).
After deployment, the application should utilize the GPU resources, as can be verified with `nvidia-smi`:

```
root@dynamo-chatbot:~# nvidia-smi
Tue Dec 31 15:28:25 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.216.01             Driver Version: 535.216.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A10-24Q                 On  | 00000000:01:01.0 Off |                    0 |
| N/A  N/A    P8             N/A /  N/A   |   6259MiB / 24576MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0  N/A  N/A       2286      C   Dynamo::ChatBot                            6257MiB |
+---------------------------------------------------------------------------------------+
```