# fabric_feature
The main purpose of the NVIDIA Fabric Manager Service VM appliance is to expose the NVSwitch partition management functionality through the nv-partitioner tool. Once partitions are configured, Guest VMs can be deployed on the same host to utilize the defined GPU topologies.
The nv-partitioner tool is the key component for defining the virtual fabric that the NVSwitches present to the Guest VMs. Note: This tool operates on pre-existing partitions defined by a configuration file external to this utility. Its primary function is to list, activate, and deactivate these configured partitions.
All management is performed by SSHing into the Fabric Manager Service VM:
```
$ onevm ssh service_FabricManager_host1
```

## Key Management Commands (nv-partitioner)

The nv-partitioner utility can be run in interactive mode (running it without options) or via command-line flags, following this structure:
```
Usage: nv-partitioner [-i <IP>] -o <OP> [-p <ID>] [-f <FORMAT>]
```

| Flag | Full Name | Description | Example Value(s) |
|---|---|---|---|
| `-i` | `--ip <IP>` | IP address of the Fabric Manager. | Default: `127.0.0.1` |
| `-o` | `--operation <N>` | The operation to perform. Required. | `0` (List), `1` (Activate), `2` (Deactivate) |
| `-p` | `--partition <ID>` | Partition ID. Required for Activate (`1`) or Deactivate (`2`). | Integer ID of the partition |
| `-f` | `--format <FORMAT>` | Output format for the List operation (`0`). | `csv` or `table` (default: `table`) |
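A typical session follows the sketch below; partition ID `1` is only an illustrative value, and real IDs should be taken from the output of the list operation:

```
# List the partitions defined in the Fabric Manager configuration (operation 0)
nv-partitioner -o 0 -f table

# Activate partition 1 (operation 1), e.g. a 4-GPU group
nv-partitioner -o 1 -p 1

# Deactivate partition 1 (operation 2) when it is no longer needed
nv-partitioner -o 2 -p 1
```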
The Service VM is part of a three-step process for virtualizing NVSwitch systems:
- Fabric Setup (Service VM):
  - Deploy this Service VM appliance on the target host with PCI Passthrough of the NVSwitches.
  - Access the VM and use nv-partitioner to Activate the required GPU partitions (e.g., Partition ID 1 for a 4-GPU group).
- Host Reporting (Feedback to OpenNebula):
  - Once a partition is Active, the host will begin reporting the new hardware topology to OpenNebula (see the check sketched below).
  - Only the GPUs assigned to the Active partition will be visible and reported by the host, effectively virtualizing the NVSwitch fabric into usable, isolated blocks.
- Workload Deployment (Guest VM):
  - Instantiate the Guest VM (where the actual workload runs) on the same host.
  - Configure the Guest VM template with PCI Passthrough for the specific GPUs (e.g., GPU 0, 1, 4, 5) that belong to the desired Active partition (see the template sketch at the end of this page).
The NVSwitch fabric, managed by the Service VM, ensures that the Guest VM's assigned GPUs communicate with each other using the high-speed topology defined by the active partition.
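To verify the Host Reporting step, the host's monitoring information can be inspected; the host ID `1` below is only an example, and the exact output layout depends on the OpenNebula version:

```
# Show the host's reported PCI devices after activating a partition; only the
# GPUs belonging to the Active partition should appear in the PCI DEVICES section.
onehost show 1
```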
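As a rough illustration of the Workload Deployment step, the Guest VM template can request the partition's GPUs via PCI Passthrough. The fragment below is only a sketch: the SHORT_ADDRESS values are placeholders and must be replaced with the addresses actually reported by the host for the Active partition.

```
# Illustrative Guest VM template fragment (placeholder values)
CPU    = "8"
MEMORY = "65536"

# One PCI attribute per GPU of the Active partition
PCI = [ SHORT_ADDRESS = "07:00.0" ]  # example address for GPU 0
PCI = [ SHORT_ADDRESS = "08:00.0" ]  # example address for GPU 1
```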