Features and Usage
The Ray appliance comes pre-installed with the Ray framework and its Ray Serve library, enabling the deployment of inference APIs. This appliance simplifies the deployment of model-serving applications and integrates seamlessly with models available on Hugging Face (a Hugging Face account and token may be required for certain models).
Contextualization
The appliance's behavior and configuration are controlled by contextualization parameters specified in the VM template's Context Section. Below are the primary configurable aspects:
Ray Application
A simple model-serving application is included with the Ray appliance for testing purposes. See the config.rb file for details. The application deployment can be controlled using the following parameters:
| Parameter | Default | Description |
|---|---|---|
| ONEAPP_RAY_APPLICATION_URL | - | URL to download the Python application. |
| ONEAPP_RAY_APPLICATION_FILE64 | - | Python application to be deployed in the Ray framework (base64 encoded). |
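For illustration, a minimal Ray Serve application that could be supplied through either parameter might look like the sketch below (the file name and deployment class are hypothetical; the test application bundled with the appliance may differ):

```python
# hello_app.py -- hypothetical minimal Ray Serve application
from ray import serve
from starlette.requests import Request

@serve.deployment
class Echo:
    async def __call__(self, request: Request) -> dict:
        # Echo the JSON payload back to the caller
        body = await request.json()
        return {"echo": body.get("message", "")}

# Ray Serve deploys the application bound to this top-level variable
app = Echo.bind()
```

To pass such a file through ONEAPP_RAY_APPLICATION_FILE64, encode it first, for example with `base64 -w0 hello_app.py`.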
API Endpoint
| Parameter | Default | Description |
|---|---|---|
| ONEAPP_RAY_API_PORT | 8000 | Port number for the API endpoint. |
| ONEAPP_RAY_API_ROUTE | "/chat" | Route path for the REST API exposed by the Ray application. |
Application Model
| Parameter | Default | Description |
|---|---|---|
| ONEAPP_RAY_MODEL_ID | meta-llama/Llama-3.2-1B-Instruct | Specifies the AI model(s) used for inference. |
| ONEAPP_RAY_MODEL_TEMPERATURE | 0.1 | Controls the randomness of generated text by adjusting the temperature setting. |
| ONEAPP_RAY_MODEL_TOKEN | - | Provides the authentication token required to access the specified AI model. |
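As a rough illustration of how these parameters typically map onto Hugging Face model-loading code, consider the sketch below; this shows the general pattern, not the bundled application's actual implementation:

```python
from transformers import pipeline

# ONEAPP_RAY_MODEL_ID selects the model; ONEAPP_RAY_MODEL_TOKEN authenticates
# against Hugging Face for gated models such as the Llama family.
generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-1B-Instruct",
    token="hf_...",  # placeholder for a real Hugging Face token
)

# ONEAPP_RAY_MODEL_TEMPERATURE maps onto the sampling temperature.
out = generator("Hello!", do_sample=True, temperature=0.1, max_new_tokens=64)
print(out[0]["generated_text"])
```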
Configuration Files
To achieve full control over the application setup, you can provide a configuration file for the Ray Serve application. Refer to the Ray Serve documentation for a detailed description. Use the following parameter to configure this:
| Parameter | Default | Description |
|---|---|---|
| ONEAPP_RAY_CONFIG_FILE64 | - | Base64-encoded configuration file for the Serve application. |
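For reference, a minimal Serve config file might look like the sketch below (the application name and import_path are hypothetical; see the Ray Serve documentation for the full schema). The file is then base64-encoded and passed through ONEAPP_RAY_CONFIG_FILE64:

```yaml
# Illustrative Ray Serve config; app name and import_path are hypothetical
http_options:
  host: 0.0.0.0
  port: 8000
applications:
  - name: app1
    route_prefix: /chat
    import_path: hello_app:app
```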
Using GPUs
The appliance is designed to utilize all available CPU and GPU resources in the VM by default. However, GPU drivers are not pre-installed and must be installed before the GPUs can be used. GPUs can be added to the VM using:
- PCI Passthrough
- SR-IOV vGPUs
Some configurations may require downloading proprietary drivers and configuring associated licenses. Note: When using NVIDIA cards, select a profile that supports OpenCL and CUDA applications (e.g., Q-series vGPU types).
After deployment, the application should utilize the GPU resources, which can be verified using nvidia-smi:
```
root@ray-app-28245:~# nvidia-smi
Tue Dec 31 15:28:25 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.216.01             Driver Version: 535.216.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A10-24Q                 On  | 00000000:01:01.0 Off |                    0 |
| N/A   N/A    P8              N/A /  N/A |   6259MiB / 24576MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=========================================================================================|
|    0   N/A  N/A      2286      C   ray::ServeReplica:app1:ChatBot             6257MiB |
+---------------------------------------------------------------------------------------+
```
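Ray also reports the GPU as a schedulable resource. A quick check from a Python shell inside the VM (assuming a Ray instance is already running, as on the appliance):

```python
import ray

# Attach to the already-running Ray instance rather than starting a new one
ray.init(address="auto")

# The output should include a "GPU" entry, e.g. {'GPU': 1.0, 'CPU': ..., ...}
print(ray.cluster_resources())
```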