One Computer Setup
This guide provides the steps to establish a One Computer Setup, which consists of doing a GPU passthrough (or PCI passthrough) and setting up a Virtual Machine that runs on the isolated GPU.
Essentially, with the PCI passthrough, one of the GPUs is hidden from the NVIDIA driver and a dummy driver is bound to it instead.
The VM, in turn, allows two operating systems to run on the same computer at the same time with good graphics performance, which is not the case with standard virtual machines that lack GPU passthrough.
WARNING: Before attempting anything further, we highly recommend reading this guide in its entirety, along with the links in the references section.
This guide has been tested with the following machine:
- AMD Ryzen Threadripper 2950X 16-Core
- 64 GB of RAM
- 2x NVidia RTX 2080Ti
- Ubuntu 16.04
- Windows 10 for the virtual Machine
Before doing anything, update the BIOS to the latest available version.
Then, in the BIOS:
- Disable all RAID configuration (in Advanced -> AMD PBS)
- Enable Enumerate all IOMMU in IVRS (in Advanced -> AMD PBS)
- Turn on VT-d / SVM Mode (in Advanced -> CPU Configuration)
First, enable IOMMU by modifying the GRUB config: sudo nano /etc/default/grub
and edit it to match:
- GRUB_CMDLINE_LINUX_DEFAULT="amd_iommu=on iommu=pt kvm_amd.npt=1" if you run on an AMD CPU
- GRUB_CMDLINE_LINUX_DEFAULT="quiet splash intel_iommu=on vfio_iommu_type1.allow_unsafe_interrupts=1 pcie_acs_override=downstream" if you run on an Intel CPU
Save (Ctrl+x -> Y -> Enter)
Afterwards use: sudo update-grub
and reboot your system.
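After the reboot, you can quickly check that the new kernel parameters were actually applied (a minimal check; the exact flags depend on which line you added):
cat /proc/cmdline
The output should contain amd_iommu=on (AMD) or intel_iommu=on (Intel).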
Afterwards, you can verify that IOMMU is enabled:
dmesg | grep AMD-Vi
(for an AMD CPU)
dmesg | grep -i iommu
(for an Intel CPU)
You should get this output for an AMD CPU:
adeye@adeye:~$ dmesg | grep AMD-Vi
[0.885677] AMD-Vi: IOMMU performance counters supported
[0.885727] AMD-Vi: IOMMU performance counters supported
[0.903346] AMD-Vi: Found IOMMU at 0000:00:00.2 cap 0x40
[0.903347] AMD-Vi: Extended features (0xf77ef22294ada):
[0.903352] AMD-Vi: Found IOMMU at 0000:40:00.2 cap 0x40
[0.903353] AMD-Vi: Extended features (0xf77ef22294ada):
[0.903356] AMD-Vi: Interrupt remapping enabled
[0.903357] AMD-Vi: virtual APIC enabled
[0.903695] AMD-Vi: Lazy IO/TLB flushing enabled
To list all the IOMMU Groups and devices, run the following command:
find /sys/kernel/iommu_groups -type l
You should get an output of this type:
/sys/kernel/iommu_groups/7/devices/0000:00:15.1
/sys/kernel/iommu_groups/7/devices/0000:00:15.0
/sys/kernel/iommu_groups/15/devices/0000:03:00.0
/sys/kernel/iommu_groups/5/devices/0000:00:14.2
/sys/kernel/iommu_groups/5/devices/0000:00:14.0
/sys/kernel/iommu_groups/13/devices/0000:01:00.2
/sys/kernel/iommu_groups/13/devices/0000:01:00.0
/sys/kernel/iommu_groups/13/devices/0000:01:00.3
/sys/kernel/iommu_groups/13/devices/0000:01:00.1
/sys/kernel/iommu_groups/3/devices/0000:00:08.0
/sys/kernel/iommu_groups/11/devices/0000:00:1c.0
/sys/kernel/iommu_groups/1/devices/0000:00:01.0
/sys/kernel/iommu_groups/8/devices/0000:00:16.0
/sys/kernel/iommu_groups/16/devices/0000:04:00.0
/sys/kernel/iommu_groups/6/devices/0000:00:14.3
/sys/kernel/iommu_groups/14/devices/0000:02:00.2
/sys/kernel/iommu_groups/14/devices/0000:02:00.0
/sys/kernel/iommu_groups/14/devices/0000:02:00.3
/sys/kernel/iommu_groups/14/devices/0000:02:00.1
Or run the following command to get information on the NVIDIA devices only:
(for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU Group %s ' "$n"; lspci -nns "${d##*/}"; done;) | grep NVIDIA
You should get an output of this type:
IOMMU Group 16 0a:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:1e04] (rev a1)
IOMMU Group 16 0a:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:10f7] (rev a1)
IOMMU Group 16 0a:00.2 USB controller [0c03]: NVIDIA Corporation Device [10de:1ad6] (rev a1)
IOMMU Group 16 0a:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device [10de:1ad7] (rev a1)
IOMMU Group 34 42:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:1e04] (rev a1)
IOMMU Group 34 42:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:10f7] (rev a1)
IOMMU Group 34 42:00.2 USB controller [0c03]: NVIDIA Corporation Device [10de:1ad6] (rev a1)
IOMMU Group 34 42:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device [10de:1ad7] (rev a1)
We recommend storing this output in a text file, so you won't have to run this command multiple times.
Every device in a group must be passed through together; passing through only one of the devices will not work. Each GPU typically has an audio device associated with it that must also be passed through. On the latest NVIDIA GPUs, like the RTX 2000 series, you will also see a USB controller and a serial bus controller; these come from the USB-C port on the GPU.
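Since the whole group moves together, it can help to list every device that shares an IOMMU group with a given GPU. A minimal sketch, assuming the GPU sits at the hypothetical address 0000:0a:00.0 (replace it with one from your own output):
#!/bin/sh
# List all devices in the same IOMMU group as the given PCI device
GPU=0000:0a:00.0   # hypothetical address of the isolated GPU
for d in /sys/bus/pci/devices/$GPU/iommu_group/devices/*; do
    lspci -nns "${d##*/}"
done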
If the IOMMU grouping is not successful, you need to apply the ACS override patch (see the instructions linked in the references).
- Create the file vfio-pci-override-vga.sh in the /sbin folder:
cd /sbin/
sudo gedit vfio-pci-override-vga.sh
- Add these lines to the file (they bind the vfio-pci kernel module to the isolated GPU):
#!/bin/sh
# Unload the display drivers so the devices can be rebound
modprobe -r -v nouveau
modprobe -r -v nvidia
# Force vfio-pci as the driver for each function of the isolated GPU
echo "vfio-pci" > /sys/bus/pci/devices/0000:0a:00.0/driver_override
echo "vfio-pci" > /sys/bus/pci/devices/0000:0a:00.1/driver_override
echo "vfio-pci" > /sys/bus/pci/devices/0000:0a:00.2/driver_override
echo "vfio-pci" > /sys/bus/pci/devices/0000:0a:00.3/driver_override
# Load vfio-pci (which now claims the overridden devices), then reload
# nvidia for the remaining GPU
modprobe -i -v vfio-pci
modprobe -i -v nvidia
Remark: the first GPU in the IOMMU groups is not necessarily the first on the motherboard. BE CAREFUL TO CHANGE THE IDS TO MATCH THE ISOLATED GPU.
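If you are unsure which addresses to use, the following one-liner lists every NVIDIA PCI function with its full domain:bus:slot.function address (a sketch; the grep simply filters for NVIDIA):
lspci -Dnn | grep -i nvidia
Use the four functions of the GPU you want to isolate in the script above.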
- Create file /etc/modprobe.d/vfio.conf
sudo gedit /etc/modprobe.d/vfio.conf
- Add the following:
##merged blacklists. current config chooses nvidia drivers.
##rmmod and modprobe -i do not work here. move kernel module commands to the installed .sh file
blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm_nouveau off
#softdep nouveau pre: vfio-pci
#softdep nvidia pre: nouveau ##check if this line or softdeps are required at all.
install vfio-pci /sbin/vfio-pci-override-vga.sh
- Make the .sh file executable by running this command:
sudo chmod u+x /sbin/vfio-pci-override-vga.sh
- Separately, create the file /etc/modprobe.d/nvidia.conf, which contains:
softdep nouveau pre: vfio-pci
softdep nvidia pre: vfio-pci
softdep nvidia-* pre: vfio-pci
softdep nvidia_* pre: vfio-pci
- Run sudo update-initramfs -u to update your boot image. Reboot.
If it succeeds, only the GPU that you have not overridden will be able to show the login screen. Otherwise, both screens will display output as before.
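You can also confirm from a terminal that vfio-pci claimed the devices at boot. A small sketch, again using the hypothetical 0a:00.x addresses from the script above:
dmesg | grep -i vfio
# Each function should report vfio-pci as its bound driver
for f in 0000:0a:00.0 0000:0a:00.1 0000:0a:00.2 0000:0a:00.3; do
    printf '%s -> %s\n' "$f" "$(basename "$(readlink "/sys/bus/pci/devices/$f/driver")")"
done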
- Run lspci -nk to confirm the isolation. You might get an output like this one:
0a:00.0 0300: 10de:1e04 (rev a1)
Subsystem: 1462:3711
Kernel driver in use: vfio-pci
Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
0a:00.1 0403: 10de:10f7 (rev a1)
Subsystem: 1462:3711
Kernel driver in use: vfio-pci
Kernel modules: snd_hda_intel
0a:00.2 0c03: 10de:1ad6 (rev a1)
Subsystem: 1462:3711
Kernel driver in use: vfio-pci
0a:00.3 0c80: 10de:1ad7 (rev a1)
Subsystem: 1462:3711
Kernel driver in use: vfio-pci
If the isolation is successful, the kernel driver in use for the isolated GPU is vfio-pci. A failure will show the NVIDIA/nouveau module in use, which means you have to debug what went wrong.
Reboot until the command lspci -nnk gives you this result (pay attention to the kernel driver in use):
0a:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:1e04] (rev a1)
Subsystem: Micro-Star International Co., Ltd. [MSI] Device [1462:3711]
Kernel driver in use: vfio-pci
Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
0a:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:10f7] (rev a1)
Subsystem: Micro-Star International Co., Ltd. [MSI] Device [1462:3711]
Kernel driver in use: vfio-pci
Kernel modules: snd_hda_intel
0a:00.2 USB controller [0c03]: NVIDIA Corporation Device [10de:1ad6] (rev a1)
Subsystem: Micro-Star International Co., Ltd. [MSI] Device [1462:3711]
Kernel driver in use: xhci_hcd
0a:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device [10de:1ad7] (rev a1)
Subsystem: Micro-Star International Co., Ltd. [MSI] Device [1462:3711]
Kernel driver in use: vfio-pci
Before starting, install the virtualization manager and related software via: sudo apt-get install qemu-kvm libvirt-bin libvirt-daemon-system bridge-utils virt-manager ovmf hugepages
Reboot computer after installation: reboot
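Before creating the VM, it is worth checking that KVM and libvirt are working and that your user may manage VMs. A sketch; on Ubuntu 16.04 the management group is typically called libvirtd:
ls -l /dev/kvm                        # the KVM device should exist
systemctl status libvirtd --no-pager  # the libvirt daemon should be active
sudo adduser "$USER" libvirtd         # log out and back in afterwards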
First, if you do not have an administrator with a licence for Matlab and Prescan, you can just follow the Cloning-the-VM tutorial and skip the next steps.
You must download 2 ISO files that will be used later during the setup:
- Download the stable VirtIO ISO: https://docs.fedoraproject.org/en-US/quick-docs/creating-windows-virtual-machines-using-virtio-drivers/index.html
- Download the Windows ISO file from OneDrive.
Open Virt Manager.
We'll first create a temporary VM.
- Create a new VM with the name Win10
- Select Local install media, then Forward
- Browse to the Windows ISO file, select Choose Volume, then Forward
- Enter Memory (RAM): 4096 MiB and CPUs: 2, then Forward
- Select Create a disk image with 150 GiB, then Forward (you can name the VM as you want)
- Enable Customize configuration before you install, then Finish
- In Overview, keep BIOS instead of UEFI
- In CPUs, under Configuration, select core2duo, then Apply
- In Memory, edit Current allocation to 32136 MiB (which in our case is half the RAM of the computer), then Apply
- In IDE Disk 1, change Disk bus to VirtIO
- In IDE CDROM 1, browse to the Windows ISO file (downloaded from OneDrive), then click on Choose Volume
- Select Add Hardware and go to Storage: click on Select or create custom storage, then click on Manage and browse to the stable VirtIO file, then click on Choose Volume. For Device type, select CDROM device and then click on Finish.
Remark: Leave all network settings as they are (no need to set up a bridge network in our case).
After finishing the previous steps, go to Boot Options and enable VirtIO Disk 1 (IDE Disk 1 before the Disk bus change above), IDE CDROM 1, and IDE CDROM 2. Afterwards, put them in the following order: IDE CDROM 1 > IDE Disk 1 > IDE CDROM 2. Then click on Apply.
Select Add Hardware: go to PCI Host Device and add the NVIDIA PCI devices one by one (you can find them with the IDs used during the isolation of the selected GPU).
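For reference, each PCI host device added this way ends up as a <hostdev> entry in the domain XML. A sketch of what Virt Manager generates for the first GPU function, assuming the hypothetical address 0000:0a:00.0:
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x0a' slot='0x00' function='0x0'/>
  </source>
</hostdev>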
Make sure to remove the internet connection (by deleting the NIC hardware) so the VM cannot connect to the internet. This helps during the Windows installation, so that the installer does not get stuck in a sign-in loop.
Click on Begin Installation, then follow the steps until Windows boots to the desktop screen:
Click on Load Driver, then Browse. Open CD Drive (E:) virtio -> viostor -> w10. Click amd64, then OK, and then Next. From this moment onwards, Windows recognizes the partition/drive allocated in the settings.
Click on Next. Follow the steps for the Windows 10 Home installation. Select No for the tools offered during installation and make sure to select the Basic installation.
Once you've booted into Windows, make sure to re-add the NIC hardware; this should restore the internet connection. Choose e1000 as the Device model.
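In the domain XML, the re-added NIC with the e1000 model looks roughly like this (a sketch of what Virt Manager generates, assuming the default NAT network):
<interface type='network'>
  <source network='default'/>
  <model type='e1000'/>
</interface>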
Then, go to Device Manager and install the missing drivers from the VirtIO CDROM.
You will get error code 43 for your GPU; this is expected. The error occurs when the NVIDIA driver detects that it is running in a virtual environment. Shut down the Windows VM.
Open the VM configuration with
virsh edit Win10
NOTE: You need to make all changes in the file before saving it (otherwise, some changes may not be stored properly).
Modify the xml file to replace the first 3 lines with these:
<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
<name>NAME_OF_THE_NEW_VM</name>
<title>NAME_OF_THE_NEW_VM</title>
Then copy the following lines between </vcpu> and <os>:
<qemu:commandline>
<qemu:arg value='-cpu'/>
<qemu:arg value='host,kvm=off,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv_vendor_id=whatever'/>
</qemu:commandline>
To have better performance, we'll use hugepages. This feature is enabled by adding the following lines just after the previous qemu:commandline ones:
<memoryBacking>
<hugepages/>
</memoryBacking>
So the beginning of the xml file should look roughly like the following sketch (reconstructed from the edits above; the elided lines are your VM's own settings):
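<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <name>NAME_OF_THE_NEW_VM</name>
  <title>NAME_OF_THE_NEW_VM</title>
  ...
  <vcpu placement='static'>2</vcpu>
  <qemu:commandline>
    <qemu:arg value='-cpu'/>
    <qemu:arg value='host,kvm=off,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv_vendor_id=whatever'/>
  </qemu:commandline>
  <memoryBacking>
    <hugepages/>
  </memoryBacking>
  <os>
  ...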
Important: if you see a path between <nvram> and </nvram>, then something went wrong during the installation (make sure you chose BIOS instead of UEFI as the firmware setting in Overview). If you do not find this tag, you can save your changes and carry on.
Open the file /etc/libvirt/qemu/Win10.xml in a text editor and make sure the lines you've just added are there.
To ensure that the changes are taken into account, close Virt Manager and run sudo systemctl restart libvirtd.
Restart Virt Manager and launch your VM. It should boot on the secondary screen. Make sure Windows detects and uses the assigned GPU (the one you isolated). If that's the case, then you are almost done with the One Computer Setup!
Once it has booted on the secondary screen, right-click on the Desktop and select Display Settings. Click Identify to see the number of each screen, and under Multiple Displays select Show only on [the number of the screen on the Ubuntu side].
Shut down the VM and assign half of the RAM to the VM. For the CPU, check Copy host CPU configuration and manually set the CPU topology. For both the RAM and the CPU, set the current allocation to the maximum allocation.
In Windows, make sure it uses the hardware you gave to the VM.
You are done! Well played!
If you get a blue screen at startup of the VM, execute the following command lines:
echo 1 > /sys/module/kvm/parameters/ignore_msrs
(root access required)
Then create a .conf file in /etc/modprobe.d/ (for example kvm.conf) that includes the line:
options kvm ignore_msrs=1
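Both steps can be done from a terminal; note that a plain sudo does not apply to the output redirection, which is why tee is used in this sketch:
echo 1 | sudo tee /sys/module/kvm/parameters/ignore_msrs      # apply immediately
echo 'options kvm ignore_msrs=1' | sudo tee /etc/modprobe.d/kvm.conf   # persist across reboots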
Set the CPU configuration to "host-passthrough" (enter it by hand as it doesn't exist in the list) in virt Manager.
Also in virt-manager, make sure that the display is configured as "Display VNC" (type: VNC server and Adress: Localhost only).
Once it has booted on the secondary screen, shut down the VM and assign half of the RAM to the VM. For the CPU, check Copy host CPU configuration and manually set the CPU topology. In our case, with a 32-thread CPU, we gave 1 socket, 8 cores, and 2 threads. For both the RAM and the CPU, set the current allocation to the maximum allocation (16 cores and 32 GB RAM in our case). In Windows, make sure it uses the hardware you gave to the VM.
Congratulations, the setup is finished!
References:
- http://mathiashueber.com/amd-ryzen-based-passthrough-setup-between-xubuntu-16-04-and-windows-10/ (maybe the most useful if the setup is done with a Ryzen CPU)
- http://mathiashueber.com/ryzen-based-virtual-machine-passthrough-setup-ubuntu-18-04/
- https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF
- https://bufferoverflow.io/gpu-passthrough/
- https://blog.zerosector.io/2018/07/28/kvm-qemu-windows-10-gpu-passthrough/
- https://heiko-sieger.info/running-windows-10-on-linux-using-kvm-with-vga-passthrough/#The_Need