fabric_intro - OpenNebula/one-apps GitHub Wiki

🚀 Overview

The NVIDIA Fabric Manager Service VM appliance is a specialized OpenNebula appliance that implements the NVIDIA NVSwitch Virtualization Model. This model is essential for virtualizing systems with multiple GPUs interconnected by NVSwitches (such as NVIDIA HGX or DGX platforms), because it allows the GPUs to be grouped into hardware partitions that serve different workloads.

One instance of the appliance runs as the Service VM on each compute node. It takes control of the node's NVSwitch devices through PCI Passthrough and runs the NVIDIA management software that partitions the high-speed NVLink fabric interconnect.
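To make the partitioning idea concrete, the Python sketch below models fabric partitions as sets of physical GPU IDs and checks whether a group of partitions can be active at the same time. It assumes an 8-GPU node and a hypothetical layout (one 8-GPU partition, two 4-GPU halves, four 2-GPU pairs); the real partition IDs and GPU groupings are defined by Fabric Manager and depend on the platform. `Partition`, `LAYOUT`, and `can_activate_together` are illustrative names, not part of the Fabric Manager SDK or nv-partitioner.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Partition:
    """Hypothetical fabric partition: an ID plus the physical GPU IDs it spans."""
    partition_id: int
    gpus: frozenset[int]


def can_activate_together(partitions: list[Partition]) -> bool:
    """Return True if no GPU belongs to more than one of the given partitions."""
    for a, b in combinations(partitions, 2):
        if a.gpus & b.gpus:  # shared GPU -> these two cannot both be active
            return False
    return True


# Hypothetical layout for an 8-GPU node (IDs are illustrative only):
# ID 0 = all 8 GPUs, IDs 1-2 = 4-GPU halves, IDs 3-6 = 2-GPU pairs.
LAYOUT = [Partition(0, frozenset(range(8)))]
LAYOUT += [Partition(1 + i, frozenset(range(4 * i, 4 * i + 4))) for i in range(2)]
LAYOUT += [Partition(3 + i, frozenset(range(2 * i, 2 * i + 2))) for i in range(4)]

if __name__ == "__main__":
    by_id = {p.partition_id: p for p in LAYOUT}
    print(can_activate_together([by_id[1], by_id[2]]))  # True: the two halves are disjoint
    print(can_activate_together([by_id[0], by_id[3]]))  # False: pair 3 overlaps the full partition
```

The check encodes the assumption that a GPU can belong to only one active partition at a time, which is why the full 8-GPU partition cannot run alongside any of the smaller ones.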


📦 Appliance Components

The appliance is pre-configured with all components required to deploy the NVSwitch virtualization model:

| Component | Description |
|---|---|
| NVIDIA Drivers | Proprietary drivers for hardware detection and management. |
| Fabric Manager Service | The core NVIDIA service for managing the NVSwitch fabric. |
| Fabric Manager SDK & Dev | Libraries for custom tool development. |
| nv-partitioner | A custom C++ tool built on the Fabric Manager SDK for logical NVSwitch partitioning. |

⬇️ Download and Requirements

Download

The appliance is available in the OpenNebula Marketplace:

Minimum Requirements

| Requirement | Description |
|---|---|
| Physical Host | Server with NVIDIA GPUs and NVSwitches (e.g., NVIDIA HGX). |
| VM Resources | 2 vCPUs, 4 GB RAM. |
| PCI Assignment | CRITICAL: All of the server's NVSwitch devices must be assigned to the VM using PCI Passthrough. |
| Host Driver | The NVSwitches on the host must be bound to the vfio-pci driver before the VM is instantiated. |
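Before instantiating the Service VM, it can help to confirm on the host that every NVSwitch is bound to vfio-pci. The Python sketch below walks `/sys/bus/pci/devices`, selects NVIDIA devices (vendor `0x10de`) whose PCI class starts with `0x0680` (bridge, other), reports the driver each one is bound to, and prints candidate `PCI` lines for the VM template. The class-code filter and the `PCI = [ SHORT_ADDRESS = "..." ]` template syntax are assumptions to verify against `lspci -nn` output and the documentation for your OpenNebula version; `find_nvswitches`, `not_on_vfio`, and `pci_template_lines` are hypothetical helper names.

```python
from __future__ import annotations

import os
from pathlib import Path

NVIDIA_VENDOR = "0x10de"
# Assumption: NVSwitch devices report PCI class 0x0680xx ("Bridge: other").
NVSWITCH_CLASS_PREFIX = "0x0680"


def find_nvswitches(sysfs_root: str = "/sys/bus/pci/devices") -> dict[str, str | None]:
    """Map each matching PCI address to its bound driver name, or None if unbound."""
    found = {}
    for dev in sorted(Path(sysfs_root).iterdir()):
        vendor = (dev / "vendor").read_text().strip()
        pci_class = (dev / "class").read_text().strip()
        if vendor != NVIDIA_VENDOR or not pci_class.startswith(NVSWITCH_CLASS_PREFIX):
            continue
        # When a driver is bound, "driver" is a symlink to .../drivers/<name>.
        driver_link = dev / "driver"
        if driver_link.is_symlink():
            found[dev.name] = os.path.basename(os.readlink(driver_link))
        else:
            found[dev.name] = None
    return found


def not_on_vfio(devices: dict[str, str | None]) -> list[str]:
    """Return the addresses that are not bound to vfio-pci."""
    return [addr for addr, drv in devices.items() if drv != "vfio-pci"]


def pci_template_lines(devices: dict[str, str | None]) -> list[str]:
    """Build one PCI attribute per device (assumed SHORT_ADDRESS syntax)."""
    lines = []
    for addr in devices:
        short = addr.split(":", 1)[1]  # "0000:07:00.0" -> "07:00.0"
        lines.append(f'PCI = [ SHORT_ADDRESS = "{short}" ]')
    return lines


if __name__ == "__main__":
    root = Path("/sys/bus/pci/devices")
    switches = find_nvswitches(str(root)) if root.is_dir() else {}
    if not switches:
        print("No matching NVIDIA bridge-class devices found")
    for addr in not_on_vfio(switches):
        print(f"{addr}: bound to {switches[addr]}, expected vfio-pci")
    print("\n".join(pci_template_lines(switches)))
```

Run it on each compute node rather than inside the appliance; any address it reports as not bound to vfio-pci must be rebound before the VM is deployed.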

📝 Release Notes

The appliance is built on a stable Linux distribution (Ubuntu 22.04 LTS) and ships the following component versions:

| Component | Version |
|---|---|
| Base OS | Ubuntu 22.04 LTS (x86-64) |
| NVIDIA Driver | 570 |
| Fabric Manager | 570 |
| nv-partitioner | 1.0.0 (custom partitioning tool) |

Next: Quick Start