# Ubuntu 18.04 - One Computer Setup
### Note: Ubuntu 18.04 should not be used without discussion with Konstantin. The default Ubuntu version approved for AD-EYE is 16.04.
This guide provides the steps to establish a One Computer Setup, which consists of doing a GPU passthrough (or PCI passthrough) and setting up a Virtual Machine that runs on the isolated GPU.
Essentially, with the PCI passthrough, one of the GPUs is isolated from the NVIDIA driver and a dummy driver is loaded for it instead.
The VM then allows two operating systems to run on the same computer at the same time, with good graphics performance (which is not always the case with standard virtual machines without GPU passthrough).
WARNING: Before attempting anything further, we highly recommend reading this guide in its entirety, as well as the links in the references section.
This guide has been tested with the following machine:
- AMD Ryzen Threadripper 2950X 16-Core
- 64 GB of RAM
- 2x NVidia RTX 2080Ti
- Ubuntu 16.04
- Windows 10 for the virtual machine
Before doing anything, update the BIOS to the latest available version.
Then, in the BIOS:
- Disable all RAID configuration (in Advanced -> AMD PBS)
- Enable Enumerate all IOMMU in IVRS (in Advanced -> AMD PBS)
- Turn on VT-d / SVM Mode (in Advanced -> CPU Configuration)
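Once back in Ubuntu, a quick sanity check confirms that the virtualization extensions took effect (a sketch; `kvm-ok` comes from the `cpu-checker` package, which may need to be installed first):

```bash
# Count the CPU flags for hardware virtualization
# (vmx = Intel VT-x, svm = AMD-V). A result of 0 means the BIOS setting did not stick.
egrep -c '(vmx|svm)' /proc/cpuinfo

# Optional: kvm-ok gives a clearer verdict.
sudo apt-get install cpu-checker
kvm-ok
```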
First of all, attach one monitor to a GPU and another monitor to the other GPU.
Then open the NVIDIA settings by typing `sudo gksu nvidia-settings` (the `sudo` is important, as we'll change the configuration of the X server). If the above command does not work, install gksu with `sudo apt-get install gksu`.
Go to the X Server Display Configuration. Enable the screen connected to the second GPU and activate the Xinerama setting.
Then click on `Save to X Configuration File`, which opens a save dialog.
Save and reboot. Before continuing, Ubuntu should show the display on both monitors.
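As an extra check (assuming the NVIDIA proprietary driver is installed, which ships `nvidia-smi`), confirm that both GPUs are visible to the driver at this point:

```bash
# Both cards should be listed here before the isolation,
# and only one of them after it.
nvidia-smi -L
```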
First, enable IOMMU by modifying the GRUB config: `sudo nano /etc/default/grub` and edit it to match:

- if you run on an AMD CPU:

```
GRUB_CMDLINE_LINUX_DEFAULT="amd_iommu=on iommu=pt kvm_amd.npt=1"
```

- if you run on an Intel CPU:

```
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash intel_iommu=on vfio_iommu_type1.allow_unsafe_interrupts=1 pcie_acs_override=downstream"
```
Save (Ctrl+X -> Y -> Enter), then run `sudo update-grub` and reboot your system.
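Once rebooted, you can confirm that the kernel actually booted with the new parameters:

```bash
# The options added to GRUB_CMDLINE_LINUX_DEFAULT should appear in this line.
cat /proc/cmdline
```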
Afterwards, one can verify that IOMMU is enabled:

```
dmesg | grep AMD-Vi    # for an AMD CPU
dmesg | grep -i iommu  # for an Intel CPU
```
You should get this output for an AMD CPU:

```
adeye@adeye:~$ dmesg | grep AMD-Vi
[0.885677] AMD-Vi: IOMMU performance counters supported
[0.885727] AMD-Vi: IOMMU performance counters supported
[0.903346] AMD-Vi: Found IOMMU at 0000:00:00.2 cap 0x40
[0.903347] AMD-Vi: Extended features (0xf77ef22294ada):
[0.903352] AMD-Vi: Found IOMMU at 0000:40:00.2 cap 0x40
[0.903353] AMD-Vi: Extended features (0xf77ef22294ada):
[0.903356] AMD-Vi: Interrupt remapping enabled
[0.903357] AMD-Vi: virtual APIC enabled
[0.903695] AMD-Vi: Lazy IO/TLB flushing enabled
```
To list all the IOMMU groups and devices, run the following command:

```
find /sys/kernel/iommu_groups -type l
```
You should get an output of this type:

```
/sys/kernel/iommu_groups/7/devices/0000:00:15.1
/sys/kernel/iommu_groups/7/devices/0000:00:15.0
/sys/kernel/iommu_groups/15/devices/0000:03:00.0
/sys/kernel/iommu_groups/5/devices/0000:00:14.2
/sys/kernel/iommu_groups/5/devices/0000:00:14.0
/sys/kernel/iommu_groups/13/devices/0000:01:00.2
/sys/kernel/iommu_groups/13/devices/0000:01:00.0
/sys/kernel/iommu_groups/13/devices/0000:01:00.3
/sys/kernel/iommu_groups/13/devices/0000:01:00.1
/sys/kernel/iommu_groups/3/devices/0000:00:08.0
/sys/kernel/iommu_groups/11/devices/0000:00:1c.0
/sys/kernel/iommu_groups/1/devices/0000:00:01.0
/sys/kernel/iommu_groups/8/devices/0000:00:16.0
/sys/kernel/iommu_groups/16/devices/0000:04:00.0
/sys/kernel/iommu_groups/6/devices/0000:00:14.3
/sys/kernel/iommu_groups/14/devices/0000:02:00.2
/sys/kernel/iommu_groups/14/devices/0000:02:00.0
/sys/kernel/iommu_groups/14/devices/0000:02:00.3
/sys/kernel/iommu_groups/14/devices/0000:02:00.1
/sys/kernel/iommu_groups/4/devices/0000:00:12.0
/sys/kernel/iommu_groups/12/devices/0000:00:1f.0
/sys/kernel/iommu_groups/12/devices/0000:00:1f.5
/sys/kernel/iommu_groups/12/devices/0000:00:1f.3
/sys/kernel/iommu_groups/12/devices/0000:00:1f.4
/sys/kernel/iommu_groups/2/devices/0000:00:01.1
/sys/kernel/iommu_groups/10/devices/0000:00:1b.0
/sys/kernel/iommu_groups/0/devices/0000:00:00.0
/sys/kernel/iommu_groups/9/devices/0000:00:17.0
```
Alternatively, run the following command to get information on the NVIDIA devices only:

```
(for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU Group %s ' "$n"; lspci -nns "${d##*/}"; done;) | grep NVIDIA
```
You should get an output of this type:

```
IOMMU Group 16 0a:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:1e04] (rev a1)
IOMMU Group 16 0a:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:10f7] (rev a1)
IOMMU Group 16 0a:00.2 USB controller [0c03]: NVIDIA Corporation Device [10de:1ad6] (rev a1)
IOMMU Group 16 0a:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device [10de:1ad7] (rev a1)
IOMMU Group 34 42:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:1e04] (rev a1)
IOMMU Group 34 42:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:10f7] (rev a1)
IOMMU Group 34 42:00.2 USB controller [0c03]: NVIDIA Corporation Device [10de:1ad6] (rev a1)
IOMMU Group 34 42:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device [10de:1ad7] (rev a1)
```
We recommend storing this output in a text file, so you won't have to run this command multiple times.
Every device in a group must be passed through together; a passthrough of only one of the devices will not work. Each GPU typically also has an audio device associated with it that must also be passed through. On the latest NVIDIA GPUs, like the RTX 2000 series, you will also see a USB controller and a serial bus controller; these come from the USB-C port on the GPU.
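To double-check that nothing unexpected shares the group, you can list every device in the GPU's IOMMU group (a sketch; group 16 is taken from the example output above, substitute your own group number):

```bash
# Show everything in IOMMU group 16, not just the NVIDIA entries.
# All of these devices must be passed through together.
for d in /sys/kernel/iommu_groups/16/devices/*; do
    lspci -nns "${d##*/}"
done
```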
If the IOMMU grouping is not successful, you need to apply the ACS patch; click here for instructions.
- Create a file `/etc/initramfs-tools/scripts/init-top/vfio.sh` with the following content (replace the `XX` placeholders with the actual PCI addresses of the devices to isolate):

```sh
#!/bin/sh
# Bind the devices of the GPU to isolate to the vfio-pci dummy driver
# before the NVIDIA driver can claim them.
for dev in 0000:XX:XX.X 0000:XX:XX.X
do
    echo "vfio-pci" > /sys/bus/pci/devices/$dev/driver_override
    echo "$dev" > /sys/bus/pci/drivers/vfio-pci/bind
done
```
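For example, with the IOMMU group 16 devices listed earlier in this guide, the loop line would become `for dev in 0000:0a:00.0 0000:0a:00.1 0000:0a:00.2 0000:0a:00.3`.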
- Make the `vfio.sh` file executable by running this command:

```
sudo chmod +x /etc/initramfs-tools/scripts/init-top/vfio.sh
```
- Create the file `/etc/modprobe.d/vfio.conf` with the following content:

```
options kvm_amd avic=1
```
- Create the file `/etc/modprobe.d/nvidia.conf` with the following content:

```
blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm_nouveau off
softdep nvidia-* pre: vfio-pci
softdep nvidia_* pre: vfio-pci
softdep nvidia pre: vfio-pci
```
- Run `sudo update-initramfs -k all -u` to update your boot image, then reboot (an optional sanity check is shown below).
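To verify that the script made it into the new boot image before rebooting, you can inspect the initramfs (a sketch using the standard Ubuntu `lsinitramfs` tool):

```bash
# vfio.sh should be listed among the init-top scripts.
lsinitramfs /boot/initrd.img-$(uname -r) | grep vfio
```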
In case of success, only one of your GPUs will be capable of showing the login screen: the one that you have not overridden. Otherwise, both screens display information as before.
- Run `lspci -nk` to confirm the isolation. You should get an output like this one:
```
0a:00.0 0300: 10de:1e04 (rev a1)
	Subsystem: 1462:3711
	Kernel driver in use: vfio-pci
	Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
0a:00.1 0403: 10de:10f7 (rev a1)
	Subsystem: 1462:3711
	Kernel driver in use: vfio-pci
	Kernel modules: snd_hda_intel
0a:00.2 0c03: 10de:1ad6 (rev a1)
	Subsystem: 1462:3711
	Kernel driver in use: vfio-pci
0a:00.3 0c80: 10de:1ad7 (rev a1)
	Subsystem: 1462:3711
	Kernel driver in use: vfio-pci
```
If the isolation is successful, the kernel driver in use for the isolated GPU is vfio-pci. A failure will show the NVIDIA/nouveau module in use, which means you have to debug what went wrong.
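To narrow the output down to the NVIDIA devices only, you can filter by NVIDIA's PCI vendor ID (`10de`):

```bash
# Show kernel driver information for NVIDIA devices only.
lspci -nnk -d 10de:
```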
Reboot until the command `lspci -nnk` gives you this result (pay attention to the Kernel driver lines):
```
0a:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:1e04] (rev a1)
	Subsystem: Micro-Star International Co., Ltd. [MSI] Device [1462:3711]
	Kernel driver in use: vfio-pci
	Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
0a:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:10f7] (rev a1)
	Subsystem: Micro-Star International Co., Ltd. [MSI] Device [1462:3711]
	Kernel driver in use: vfio-pci
	Kernel modules: snd_hda_intel
0a:00.2 USB controller [0c03]: NVIDIA Corporation Device [10de:1ad6] (rev a1)
	Subsystem: Micro-Star International Co., Ltd. [MSI] Device [1462:3711]
	Kernel driver in use: xhci_hcd
0a:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device [10de:1ad7] (rev a1)
	Subsystem: Micro-Star International Co., Ltd. [MSI] Device [1462:3711]
	Kernel driver in use: vfio-pci
```
Before starting, install the virtualization manager and related software via:

```
sudo apt-get install qemu-kvm libvirt-bin libvirt-daemon-system bridge-utils virt-manager ovmf hugepages
```
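You can then check that the libvirt daemon is running and usable (a sketch; the group is called `libvirt` in recent Ubuntu packaging, `libvirtd` in some older releases):

```bash
# libvirtd should report "active (running)".
sudo systemctl status libvirtd

# Add your user to the libvirt group so Virt-Manager can connect without root.
sudo adduser $USER libvirt

# An empty table here means libvirt answers correctly.
virsh list --all
```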
First, if you do not have an administrator who has a licence for Matlab and PreScan, you can just follow the Cloning-the-VM tutorial and skip the next steps.
Otherwise, download the Windows ISO file that will be used later on during the setup: ISO files.
Open Virt-Manager. We'll first create a temporary VM.
- Create a new VM
- Select Local install media, then `Forward`
- Browse the Windows ISO file, select `Choose Volume`, then `Forward`
- Enter Memory (RAM): `32768` MiB and CPUs: `8`, then `Forward`
- Select `Create a disk image` with 120 GiB, then `Forward`
- Name the VM as you want, enable `Customize configuration before install`, then `Finish`
- In Overview, keep `BIOS` instead of `UEFI`
- In CPUs, under Configuration, select `core2duo`, then `Apply`
- In Memory, edit `Current allocation` to `32136` MiB (which in our case is half the RAM of the computer), then `Apply`
- In IDE Disk 1, change the Disk bus to `VirtIO`
- In IDE CDROM 1, browse the Windows ISO file (downloaded from OneDrive), then click on `Choose Volume`
- Select Add Hardware and go to Storage:
  - Click on `Select or create custom storage`, then click on `Manage`, browse the stable VirtIO driver ISO file, then click on `Choose Volume`
  - Device type: select `CDROM device`, then click on `Finish`
Remark: leave all network settings as they are (no need to set up a bridge network in our case); choose `e1000` as the device model.
After finishing the previous steps, go to Boot Options and tick `VirtIO Disk 1` (named `IDE Disk 1` before the Disk bus change above), `IDE CDROM 1` and `IDE CDROM 2` to enable them. Then put them in the following order: `IDE CDROM 1` > `IDE Disk 1` > `IDE CDROM 2` and click on `Apply`.
Select Add Hardware, go to `PCI Host Device` and add the NVIDIA PCI devices one by one (you can find them with the IDs used during the isolation of the selected GPU).
Make sure to remove the internet connection (by deleting the NIC hardware) so that the VM cannot connect to the internet; this avoids the Windows installer asking for an online account login.
Click on `Begin Installation`, then follow the steps until Windows boots to the desktop screen:
- Click on `Load Driver`, then `Browse`. Open CD Drive (E:) virtio -> viostor -> w10.
- Click on `amd64`, then `OK` and then `Next`. From this moment onwards, Windows recognizes the partition/drive allocated in the settings.
- Click on `Next` and follow the steps for the Windows 10 Home installation. Select `No` for the tools offered during the installation and make sure to select the Basic installation.
Once you've booted into Windows, make sure to re-add the NIC hardware; this should bring the internet connection back.
Then, go to the Device Manager and install the missing drivers from the VirtIO CDROM.
You will get error code 43 for your GPU, but this is normal: the error occurs when the NVIDIA driver detects that it is running in a virtual environment. Shut down the Windows VM.
Execute the following command to copy the config file of the VM you just set up, replacing `temp_VM` with the name of the temporary VM and `New_VM` with the name you want for the new VM:

```
virsh dumpxml temp_VM > New_VM
```
Modify this new XML file with gedit to replace the first 3 lines with these, replacing NAME_OF_THE_NEW_VM with the name given to the new VM:

```xml
<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
<name>NAME_OF_THE_NEW_VM</name>
<title>NAME_OF_THE_NEW_VM</title>
```

This way, the line defining the UUID of the VM is deleted (libvirt will generate a new one when the VM is defined).
Then copy the following lines between `</vcpu>` and `<os>`:

```xml
<qemu:commandline>
  <qemu:arg value='-cpu'/>
  <qemu:arg value='host,kvm=off,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv_vendor_id=whatever'/>
</qemu:commandline>
```
To get better performance, we'll use hugepages. This feature is enabled by adding the following lines just after the previous `qemu:commandline` ones:

```xml
<memoryBacking>
  <hugepages/>
</memoryBacking>
```
So the beginning of the XML file should look like the sketch below.
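A minimal sketch of that beginning, assembled from the snippets above (the memory and vcpu values are illustrative and will reflect your own configuration; the `...` stands for the rest of the file generated during the installation):

```xml
<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <name>NAME_OF_THE_NEW_VM</name>
  <title>NAME_OF_THE_NEW_VM</title>
  <memory unit='KiB'>32907264</memory>
  <currentMemory unit='KiB'>32907264</currentMemory>
  <vcpu placement='static'>8</vcpu>
  <qemu:commandline>
    <qemu:arg value='-cpu'/>
    <qemu:arg value='host,kvm=off,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv_vendor_id=whatever'/>
  </qemu:commandline>
  <memoryBacking>
    <hugepages/>
  </memoryBacking>
  <os>
    ...
  </os>
```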
Important: if you see a path between `<nvram>` and `</nvram>`, then something went wrong during the installation (make sure you chose BIOS instead of UEFI as the firmware setting in Overview). If you do not find this tag, you can save your changes and carry on.
Execute `virsh define New_VM` (we define a new VM from the file we just modified). The output is:

```
adeye@adeye06u:~$ virsh define New_VM
Domain New_VM defined from New_VM
```
To ensure that the changes are taken into account, close Virt-Manager and run `sudo systemctl restart libvirtd`.
Restart Virt-Manager and launch your VM. It should boot on the secondary screen. Make sure Windows detects and uses the assigned GPU (the one you isolated). If that is the case, you are almost done with the One Computer Setup!
Once it has booted on the secondary screen, right-click on the Desktop and select Display Settings. Click Identify to see the number of each screen, and under Multiple Displays select Show only on [the number of the screen on the Ubuntu screen].
Shut down the VM and assign half the RAM to the VM. For the CPU, check Copy host CPU configuration and manually set the CPU topology. For both the RAM and the CPU, set the current allocation to the maximum allocation.
In Windows, make sure it uses the hardware you gave to the VM.
You are done !! Well played !! 🥇
If you have a blue screen at startup of the VM, execute the following command line (root access required):

```
echo 1 > /sys/module/kvm/parameters/ignore_msrs
```

Then create a .conf file in `/etc/modprobe.d/` (for example `kvm.conf`) that includes the line `options kvm ignore_msrs=1` to make the setting persistent, as shown below.
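For example, as a one-liner (using the `kvm.conf` example name from above):

```bash
# Persist the ignore_msrs setting across reboots.
echo "options kvm ignore_msrs=1" | sudo tee /etc/modprobe.d/kvm.conf
```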
Set the CPU configuration to `host-passthrough` (enter it by hand, as it doesn't exist in the list) in Virt-Manager.
Also in Virt-Manager, make sure that the display is configured as Display VNC (Type: VNC server, Address: Localhost only).
Once it has booted on the secondary screen, shut down the VM and assign half the RAM to the VM. For the CPU, check Copy host CPU configuration and manually set the CPU topology; in our case, with a 32-thread CPU, we gave 1 socket, 8 cores and 2 threads. For both the RAM and the CPU, set the current allocation to the maximum allocation (16 cores and 32 GB of RAM in our case). In Windows, make sure it uses the hardware you gave to the VM.
🎊 Congratulations, the setup is finished 🎊
### References
- http://mathiashueber.com/amd-ryzen-based-passthrough-setup-between-xubuntu-16-04-and-windows-10/ (maybe the most useful if the setup is done with a Ryzen CPU)
- http://mathiashueber.com/ryzen-based-virtual-machine-passthrough-setup-ubuntu-18-04/
- https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF
- https://bufferoverflow.io/gpu-passthrough/
- https://blog.zerosector.io/2018/07/28/kvm-qemu-windows-10-gpu-passthrough/
- https://heiko-sieger.info/running-windows-10-on-linux-using-kvm-with-vga-passthrough/#The_Need