Cloud hypervisor with spdk

Ref: vfio-user

While testing CH (Cloud Hypervisor) vfio-user with the SPDK virtual NVMe controller (nvmf/vfio-user, see https://github.com/nutanix/libvfio-user/blob/master/docs/spdk.md), we hit two issues:

Background

Why does CH try to open 512 file descriptors for a vfio-user NVMe device? Refer to https://github.com/spdk/spdk/commit/a9ff16810793db8ebf235966770a50825bbec6a1: CH creates an eventfd for each NVMe MSI-X IRQ, and SPDK sets NVME_IRQ_MSIX_NUM to 512 (0x1000/8).

We proposed a PR that set the ulimit in CH code (https://github.com/cloud-hypervisor/cloud-hypervisor/pull/5545), but it was rejected: CH prefers that the ulimit be set in the shell before starting CH.
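A minimal sketch of the recommended approach, run in the shell that will launch CH (the value 2048 is the one we used; pick it to cover the 512 eventfds plus CH's other descriptors):

ulimit -n 2048

For a persistent limit, edit /etc/security/limits.conf instead, e.g. (example values):

root  soft  nofile  2048
root  hard  nofile  4096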

Set up SPDK:

Source

git clone https://github.com/spdk/spdk
cd spdk
git submodule update --init
./configure --with-vfio-user
make -j

Create the NVMe device

scripts/setup.sh
rm ~/images/test-disk.raw
truncate ~/images/test-disk.raw -s 128M
mkfs.ext4  ~/images/test-disk.raw
killall nvmf_tgt
ulimit -n 2048
./build/bin/nvmf_tgt -i 0 -e 0xFFFF -m 0x1 &
sleep 2
./scripts/rpc.py nvmf_create_transport -t VFIOUSER
rm -rf /var/run        # caution: on most distros /var/run is a symlink to /run; this recipe replaces it with a plain directory
mkdir -p /var/run
./scripts/rpc.py bdev_aio_create ~/images/test-disk.raw test 4096                #or 512
./scripts/rpc.py nvmf_create_subsystem nqn.2019-07.io.spdk:cnode -a -s test
./scripts/rpc.py nvmf_subsystem_add_ns nqn.2019-07.io.spdk:cnode test
./scripts/rpc.py nvmf_subsystem_add_listener nqn.2019-07.io.spdk:cnode -t VFIOUSER -a /var/run -s 0
chown -R "$USER:$USER" /var/run
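If the listener was created successfully, the vfio-user control socket should now exist under /var/run (the bar0 file also shows up in our setup; treat the exact file list as setup-dependent):

ls -l /var/run/cntrl /var/run/bar0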

Error

EAL: No free 2048 kB hugepages reported on node 1

Solution:

echo 6144 > /proc/sys/vm/nr_hugepages
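To confirm the reservation took effect:

grep -i huge /proc/meminfo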

Start CH with an api-socket and hot-add the SPDK device

cp /root/workloads/jammy-server-cloudimg-arm64-custom-20220329-0.raw /root/workloads/osdisk.img
rm -rf /root/workloads/cloud-hypervisor.sock

"target/aarch64-unknown-linux-gnu/release/cloud-hypervisor" \
    "--api-socket" "/root/workloads/cloud-hypervisor.sock" \
    "--cpus" "boot=1" "--memory" "size=512M,shared=on,hugepages=on"  \
    "--kernel" "/root/workloads/CLOUDHV_EFI.fd" \
    "--serial" "tty" "--console" "off" \
    "--disk" "path=/root/workloads/osdisk.img" \
    "--disk" "path=/root/workloads/cloudinit" \
    "--net" "tap=,mac=12:34:56:78:90:01,ip=192.168.1.1,mask=255.255.255.0" \
    "-v" --log-file log.txt
target/aarch64-unknown-linux-gnu/release/ch-remote --api-socket /root/workloads/cloud-hypervisor.sock add-user-device socket=/var/run/cntrl,id=vfio_user0

VM crashes:

cloud-hypervisor: 40.332077s: <vmm> INFO:vmm/src/lib.rs:1858 -- API request event: VmAddUserDevice(UserDeviceConfig { socket: "/var/run/cntrl", id: Some("vfio_user0"),            pci_segment: 0 }, Sender { .. })
cloud-hypervisor: 40.353919s: <vmm> WARN:/root/.cargo/git/checkouts/vfio-user-1ee9f6371fec66a1/eef6bec/src/lib.rs:645 -- Ignoring unsupported vfio region capability (id = '2')
cloud-hypervisor: 40.481922s: <vmm> INFO:pci/src/vfio_user.rs:329 -- Enabling IRQ 0 number of fds = 1
cloud-hypervisor: 40.611353s: <vcpu0> INFO:pci/src/configuration.rs:984 -- Detected BAR reprogramming: (BAR 4) 0x2fffe000->0x10100000
cloud-hypervisor: 40.611490s: <vcpu0> INFO:pci/src/vfio_user.rs:451 -- Moving BAR 0x2fffe000 -> 0x10100000
cloud-hypervisor: 40.611756s: <vcpu0> INFO:pci/src/vfio_user.rs:492 -- Moved bar 0x2fffe000 -> 0x10100000
cloud-hypervisor: 40.611852s: <vcpu0> INFO:pci/src/configuration.rs:984 -- Detected BAR reprogramming: (BAR 8) 0x2fffc000->0x10102000
cloud-hypervisor: 40.611928s: <vcpu0> INFO:pci/src/vfio_user.rs:451 -- Moving BAR 0x2fffc000 -> 0x10102000
cloud-hypervisor: 40.611988s: <vcpu0> INFO:pci/src/vfio_user.rs:492 -- Moved bar 0x2fffc000 -> 0x10102000
cloud-hypervisor: 40.612072s: <vcpu0> INFO:pci/src/configuration.rs:984 -- Detected BAR reprogramming: (BAR 9) 0x2fffb000->0x10104000
cloud-hypervisor: 40.612141s: <vcpu0> INFO:pci/src/vfio_user.rs:451 -- Moving BAR 0x2fffb000 -> 0x10104000
cloud-hypervisor: 40.612200s: <vcpu0> INFO:pci/src/vfio_user.rs:492 -- Moved bar 0x2fffb000 -> 0x10104000
cloud-hypervisor: 40.690917s: <vcpu0> INFO:pci/src/vfio_user.rs:381 -- Unmasking IRQ 0
cloud-hypervisor: 40.693925s: <vcpu0> INFO:pci/src/vfio_user.rs:366 -- Disabling IRQ 0
cloud-hypervisor: 40.696282s: <vcpu0> ERROR:vmm/src/cpu.rs:1071 -- vCPU thread panicked
cloud-hypervisor: 40.696407s: <vmm> INFO:vmm/src/lib.rs:1828 -- VM exit event

Solution: run ulimit -n 2048 in the shell before starting CH.

VM works well:

cloud-hypervisor: 38.183822s: <vmm> INFO:vmm/src/lib.rs:1858 -- API request event: VmAddUserDevice(UserDeviceConfig { socket: "/var/run/cntrl", id: Some("vfio_user0"),            pci_segment: 0 }, Sender { .. })
cloud-hypervisor: 38.199257s: <vmm> WARN:/root/.cargo/git/checkouts/vfio-user-1ee9f6371fec66a1/eef6bec/src/lib.rs:645 -- Ignoring unsupported vfio region capability (id = '2')
cloud-hypervisor: 38.323265s: <vmm> INFO:pci/src/vfio_user.rs:329 -- Enabling IRQ 0 number of fds = 1
cloud-hypervisor: 38.457331s: <vcpu0> INFO:pci/src/configuration.rs:984 -- Detected BAR reprogramming: (BAR 4) 0x2fffe000->0x10100000
cloud-hypervisor: 38.457450s: <vcpu0> INFO:pci/src/vfio_user.rs:451 -- Moving BAR 0x2fffe000 -> 0x10100000
cloud-hypervisor: 38.457688s: <vcpu0> INFO:pci/src/vfio_user.rs:492 -- Moved bar 0x2fffe000 -> 0x10100000
cloud-hypervisor: 38.457782s: <vcpu0> INFO:pci/src/configuration.rs:984 -- Detected BAR reprogramming: (BAR 8) 0x2fffc000->0x10102000
cloud-hypervisor: 38.457857s: <vcpu0> INFO:pci/src/vfio_user.rs:451 -- Moving BAR 0x2fffc000 -> 0x10102000
cloud-hypervisor: 38.457917s: <vcpu0> INFO:pci/src/vfio_user.rs:492 -- Moved bar 0x2fffc000 -> 0x10102000
cloud-hypervisor: 38.458001s: <vcpu0> INFO:pci/src/configuration.rs:984 -- Detected BAR reprogramming: (BAR 9) 0x2fffb000->0x10104000
cloud-hypervisor: 38.458071s: <vcpu0> INFO:pci/src/vfio_user.rs:451 -- Moving BAR 0x2fffb000 -> 0x10104000
cloud-hypervisor: 38.458131s: <vcpu0> INFO:pci/src/vfio_user.rs:492 -- Moved bar 0x2fffb000 -> 0x10104000
cloud-hypervisor: 38.513253s: <vcpu0> INFO:pci/src/vfio_user.rs:381 -- Unmasking IRQ 0
cloud-hypervisor: 38.516263s: <vcpu0> INFO:pci/src/vfio_user.rs:366 -- Disabling IRQ 0
cloud-hypervisor: 38.560353s: <vcpu0> INFO:pci/src/vfio_user.rs:329 -- Enabling IRQ 2 number of fds = 512
cloud-hypervisor: 38.691981s: <vcpu0> INFO:pci/src/vfio_user.rs:366 -- Disabling IRQ 2
cloud-hypervisor: 38.692697s: <vcpu0> INFO:pci/src/vfio_user.rs:329 -- Enabling IRQ 0 number of fds = 1
cloud-hypervisor: 38.698262s: <vcpu0> INFO:pci/src/vfio_user.rs:366 -- Disabling IRQ 0
cloud-hypervisor: 38.699755s: <vcpu0> INFO:pci/src/vfio_user.rs:329 -- Enabling IRQ 2 number of fds = 512
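Once the hot-add succeeds, ch-remote can confirm the device and hot-unplug it again (the id vfio_user0 matches the add-user-device call above):

target/aarch64-unknown-linux-gnu/release/ch-remote --api-socket /root/workloads/cloud-hypervisor.sock info
target/aarch64-unknown-linux-gnu/release/ch-remote --api-socket /root/workloads/cloud-hypervisor.sock remove-device vfio_user0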

Start the VM with SPDK

"target/aarch64-unknown-linux-gnu/release/cloud-hypervisor" \
    "--api-socket" "$path/cloud-hypervisor.sock" \
    "--cpus" "boot=1" \
    "--memory" "size=1G,shared=on" \
    "--kernel" "$path/Image" --cmdline "root=/dev/vda1 console=hvc0" \
    "--disk" "path=$path/osdisk.img" \
    "--disk" "path=$path/cloudinit" \
    "--net" "tap=,mac=12:34:56:78:90:01,ip=192.168.1.1,mask=255.255.255.0" \
    --user-device socket=/var/run/cntrl

Starting the VM fails:

thread 'vcpu0' panicked at 'Failed cloning interrupt's EventFd: Os { code: 24, kind: Uncategorized, message: "Too many open files" }', vmm/src/interrupt.rs:82:18

Solution: ulimit -n 2048 (again, in the shell that starts CH).

VM works well:

[    0.813704] nvme nvme0: pci function 0000:00:06.0
[    1.180360] input: gpio-keys as /devices/platform/gpio-keys/input/input0
[    1.307008] nvme nvme0: 1/0/0 default/read/poll queues
[    1.309669] EXT4-fs (vda1): mounted filesystem 24394335-eeb5-4731-9e62-6255dd5a3712 with ordered data mode. Quota mode: disabled.
[    1.309805] VFS: Mounted root (ext4 filesystem) readonly on device 254:1.
[    1.309953] devtmpfs: mounted
[    1.310284] Freeing unused kernel memory: 1728K
cloud-hypervisor: 1.646859s: <vcpu0> WARN:devices/src/legacy/uart_pl011.rs:358 -- [Debug I/O port: Kernel code: 0x41] 1.1643821 seconds
[    1.319101] Run /sbin/init as init process
[    1.346630] systemd[1]: systemd 249.11-0ubuntu2 running in system mode (+PAM +AUDIT +SELINUX +APPARMOR +IMA +SMACK +SECCOMP +GCRYPT +GNUTLS -OPENSSL +ACL +BLKID +CURL +ELFUTILS -FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP -LIBFDISK +PCRE2 -PWQUALITY -P11KIT -QRENCODE +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
[    1.346972] systemd[1]: Detected architecture arm64.

How to verify vfio_user

ls /dev/nvme0n1
mkdir nvme
mount /dev/nvme0n1 nvme
cd nvme
echo "hello world" > test.txt
cat test.txt
cd ..
umount nvme
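For a raw-device sanity check beyond the file round-trip, a read-only dd against the namespace is safe once it is unmounted:

dd if=/dev/nvme0n1 of=/dev/null bs=1M iflag=direct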

A possible error:

umount: /home/cloud/nvme: target is busy.

Solution:

lsof nvme
kill the PID(s) shown, or cd out of the mounted directory, then retry the umount
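If lsof is unavailable, fuser can list and, with care, kill the holders of the mount point:

fuser -vm nvme     # list processes using the mount
fuser -km nvme     # kill them, then retry umount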

More info:

Issue: aarch64: Booting with NVMe vfio-user device fails

Patch: vfio_user: Fix the issue of 'Too many open files' when start VM with spdk

The patch was rejected with the review comment: "This is something that the sysadmin should change e.g. by editing /etc/security/limits.conf - not something that the process should change."

With a 64K-page kernel (CONFIG_ARM64_64K_PAGES=y):

SPDK setup fails:

sudo ./scripts/rpc.py nvmf_subsystem_add_listener nqn.2019-07.io.spdk:cnode -t VFIOUSER -a /var/run -s 0
[2023-06-30 14:15:05.548885] vfio_user.c:4454:nvmf_vfio_user_listen: *ERROR*: /var/run: error to mmap file /var/run/bar0: Invalid argument.
[2023-06-30 14:15:05.548966] nvmf.c: 685:spdk_nvmf_tgt_listen_ext: *ERROR*: Unable to listen on address '/var/run'

Solution: https://review.spdk.io/gerrit/c/spdk/spdk/+/18627

Starting the VM fails:

[2023-06-30 14:26:08.000027] vfio_user.c:3091:vfio_user_log: *ERROR*: /var/run: refusing client page size of 4096
[2023-06-30 14:26:08.000103] vfio_user.c:3091:vfio_user_log: *ERROR*: /var/run: failed to recv version: Invalid argument
Error booting VM: VmBoot(DeviceManager(VfioUserCreateClient(StreamRead(Error { kind: UnexpectedEof, message: "failed to fill whole buffer" }))))

Solution:

https://github.com/rust-vmm/vfio-user/pull/15

Cargo.lock
-source = "git+https://github.com/rust-vmm/vfio-user?branch=main#eef6bec4d421f08ed1688fe67c5ea33aabbf5069"
+source = "git+https://github.com/rust-vmm/vfio-user?branch=main#08a42bfc1539ab1315590c9c64b2c417c6d21270"
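Instead of editing Cargo.lock by hand, the pinned git revision can be moved with cargo (assuming the package is named vfio-user in Cargo.toml, as the lock entry suggests):

cargo update -p vfio-user --precise 08a42bfc1539ab1315590c9c64b2c417c6d21270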

512 fds for vfio-user

Related code

Getting the IRQ info in Cloud Hypervisor:

pci/src/vfio.rs
parse_capabilities
  if let Some(irq_info) = self.vfio_wrapper.get_irq_info(VFIO_PCI_MSIX_IRQ_INDEX)    //irq_info.count is 512
  {
      let msix_cap = self.parse_msix_capabilities(cap_next);                         //create 512 fds.
      self.initialize_msix(msix_cap, cap_next as u32, bdf, None);                    
  }

SPDK (via libvfio-user) sends the IRQ info; the count is 512:

libvfio-user/lib/irq.c
handle_device_get_irq_info
    out_info = msg->out.iov.iov_base;
    out_info->argsz = sizeof(*out_info);
    out_info->flags = VFIO_IRQ_INFO_EVENTFD;
    out_info->index = in_info->index;
    out_info->count = vfu_ctx->irq_count[in_info->index];
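Since each MSI-X vector gets its own eventfd, the 512 figure can be observed directly on the host once the device is added (a rough check that assumes a single cloud-hypervisor process):

ls /proc/$(pidof cloud-hypervisor)/fd | wc -l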

vhost-user

Ref: vhost-user-blk-testing

             +----+----------+          +-------------+-----------+
             |    |          |          |             |           |
             |    |vhost-user|----------| vhost-user  |    dpdk   |
             |    |blk device|          | port 1      |           |
             |    |          |          |             |           |
             |    +----------+          +-------------+-----------+
             |               |          |                         |
             |      vm       |          |           spdk          |
             |               |          |                         |
          +--+----------------------------------------------------+--+
          |  |                  hugepages                         |  |
          |  +----------------------------------------------------+  |
          |                                                          |
          |                       host                               |
          |                                                          |
          +----------------------------------------------------------+

When the base page size is 64K, the default hugepage size is 512M, so echo 8 > /proc/sys/vm/nr_hugepages reserves 8 × 512M = 4GB of memory.

cat /proc/meminfo
HugePages_Total:       8
HugePages_Free:        8
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:     524288 kB
Hugetlb:         4194304 kB

./build/bin/vhost -S /var/tmp -s 512 -m 0x3 &
HugePages_Free:        7

scripts/rpc.py bdev_malloc_create 1024 4096 -b Malloc0
HugePages_Free:        4

kill vhost
./build/bin/vhost -S /var/tmp -s 1024 -m 0x3 &
HugePages_Free:        6

scripts/rpc.py bdev_malloc_create 1024 4096 -b Malloc0
HugePages_Free:        3
scripts/rpc.py vhost_create_blk_controller --cpumask 0x1 vhost.1 Malloc0
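At this point the vhost-user socket that CH will connect to should exist:

ls -l /var/tmp/vhost.1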

Start the VM. Its memory must be equal to or less than the free hugepages times 512M (here 4 × 512M = 2048M); prefault=on means all memory is allocated at startup.

./cloud-hypervisor-static-aarch64 \
        --cpus boot=4,max=8 \
        --memory size=2048M,hugepages=on,hugepage_size=512M,prefault=on \
        --kernel /root/workloads/CLOUDHV_EFI.fd \
        --disk path=/home/dom/images/ubuntu22.04.raw \
        --net fd=3,mac=$mac 3<>"$tapdevice" \
        --api-socket /tmp/cloud-hypervisor.sock \
        --disk vhost_user=true,socket=/var/tmp/vhost.1,num_queues=4,queue_size=128

In the guest, exercise the vhost-user disk (/dev/vdb):

dd if=/dev/vdb of=/dev/null bs=2M iflag=direct
dd of=/dev/vdb if=/dev/zero bs=2M oflag=direct count=256

On Host:
HugePages_Free:        0

After shutting down the VM:

scripts/rpc.py vhost_delete_controller vhost.1
scripts/rpc.py bdev_malloc_delete Malloc0
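Once both RPCs complete, the malloc bdev's hugepages are returned and HugePages_Free rises again (the vhost process keeps its own reservation until killed):

grep HugePages_Free /proc/meminfo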