fabric_quick - OpenNebula/one-apps GitHub Wiki

🚀 Quick Start

This guide outlines the essential steps for deploying the Fabric Manager Service VM appliance and ensuring the correct assignment of NVSwitch devices.

  1. Download the Template

    Retrieve the appliance template from the OpenNebula marketplace:

    $ onemarketapp export 'Service Fabric Manager' service_FabricManager_ --datastore default

    ⚠️ VM/Template Name requirement: Ensure that VMs instantiated for this service have names matching "service_FabricManager_*". Otherwise, the monitoring probes may fail and not correctly report the status of the NVSwitch partitions.
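The naming requirement can be checked with a small shell sketch before relying on the probes (the VM name below is a hypothetical example):

```shell
# Sketch: confirm a VM name carries the prefix the monitoring probes expect.
# "service_FabricManager_01" is a hypothetical example name; substitute your own.
name="service_FabricManager_01"
case "$name" in
  service_FabricManager_*) echo "name OK: probes will match" ;;
  *) echo "rename needed: probes expect the service_FabricManager_ prefix" ;;
esac
```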

  2. Critical Template Configuration (PCI Passthrough)

    You must modify the VM template to ensure the Service VM is deployed on the correct physical host and receives the NVSwitch devices via PCI Passthrough.

    • Host Affinity: Set an affinity rule to ensure deployment on the intended Host:
      SCHED_REQUIREMENTS="ID = <Host_ID_with_NVSwitches>"
      
    • PCI Devices: Add all server NVSwitch devices to the template. Replace the addresses with your NVSwitch PCI addresses.
      PCI=[
        SHORT_ADDRESS="07:00.0" ]
      PCI=[
        SHORT_ADDRESS="08:00.0" ]
      PCI=[
        SHORT_ADDRESS="09:00.0" ]
      PCI=[
        SHORT_ADDRESS="0a:00.0" ]
      

    ⚠️ Host Pre-requisite: Ensure the NVSwitch devices on the physical host are configured for passthrough, typically by being bound to the vfio-pci driver.
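The host-side prerequisite can be verified with a short loop over the device addresses (a sketch; the addresses are the example values from above, so substitute your own):

```shell
# Sketch: report which driver each NVSwitch PCI address is bound to on the host.
# Addresses are the example values from this guide; replace with your own.
for addr in 07:00.0 08:00.0 09:00.0 0a:00.0; do
  link="/sys/bus/pci/devices/0000:$addr/driver"
  if [ -e "$link" ]; then
    # The driver symlink's target name is the bound driver; expect vfio-pci.
    echo "$addr -> $(basename "$(readlink -f "$link")")"
  else
    echo "$addr -> not bound to any driver"
  fi
done
```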

  3. Instantiate the Template

    Instantiate the configured template. The VM will boot and attempt to start the Fabric Manager service.

  4. Initial Boot Consideration (Important)

    The appliance may fail on first boot if it cannot detect the required NVSwitch PCI devices. This happens when the PCI Passthrough setup is not complete.

    • If the VM fails: Verify that the NVSwitches were correctly passed through from the host. If the passthrough is correct, a reboot of the VM (onevm reboot <VM_ID>) should allow the appliance to initialize the drivers and services successfully.
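The decision in step 4 can be sketched as a tiny helper that inspects `lspci` output (fed here from a canned example line, since the real input comes from inside the VM):

```shell
# Sketch: a reboot is only worth trying when the NVSwitch devices are already
# visible to the VM; otherwise the passthrough configuration must be fixed first.
devices_present() {
  grep -qi 'nvswitch'
}

# Canned example standing in for real `lspci` output inside the VM:
sample="07:00.0 Bridge: NVIDIA Corporation NVSwitch"
if printf '%s\n' "$sample" | devices_present; then
  echo "devices visible: try 'onevm reboot <VM_ID>'"
else
  echo "devices missing: fix PCI passthrough on the host first"
fi
```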
  5. Access and Verification

    Once the VM is in the RUNNING state, access it via SSH:

    $ onevm ssh <VM_ID>

    Inside the VM, verify:

    # 1. NVSwitch devices are present as PCI devices
    $ lspci | grep -i nvswitch 
    
    # 2. The Fabric Manager service is running
    $ systemctl status nvidia-fabricmanager
    
    # 3. List NVSwitch available partitions
    $ nv-partitioner -o 0 

    If checks are successful, the appliance is ready for partitioning management.
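The three checks can also be combined into a single report (a sketch; the commands are the ones listed above, and the `check` helper simply wraps whatever command it is given):

```shell
# Sketch: run each verification command and report OK/MISSING without aborting early.
check() {
  desc=$1; shift
  if "$@" >/dev/null 2>&1; then
    echo "OK: $desc"
  else
    echo "MISSING: $desc"
  fi
}

check "NVSwitch PCI devices visible"    sh -c 'lspci | grep -qi nvswitch'
check "nvidia-fabricmanager service up" systemctl is-active nvidia-fabricmanager
check "NVSwitch partitions listable"    nv-partitioner -o 0
```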


Next: Features and Usage
