Cloud hypervisor with spdk
Ref: vfio-user

When testing CH vfio-user against the SPDK virtual NVMe controller (nvmf/vfio-user, https://github.com/nutanix/libvfio-user/blob/master/docs/spdk.md), we hit two issues:
- CH failed to initialize the NVMe device because it exceeded the per-process file descriptor limit: https://github.com/cloud-hypervisor/cloud-hypervisor/issues/5426. Solution:
ulimit -n 2048
- CH failed to initialize the NVMe device when the host kernel is built with CONFIG_ARM64_64K_PAGES=y. Two patches are required to fix this:
  - SPDK: https://review.spdk.io/gerrit/c/spdk/spdk/+/18627 (under review)
  - CH: https://github.com/rust-vmm/vfio-user/pull/15 (merged)

Why does CH try to open 512 file descriptors for a vfio-user NVMe device? See https://github.com/spdk/spdk/commit/a9ff16810793db8ebf235966770a50825bbec6a1: CH creates an eventfd for each NVMe MSI-X interrupt, and SPDK sets NVME_IRQ_MSIX_NUM to 512 (0x1000/8).

We proposed a PR to raise the limit inside CH (https://github.com/cloud-hypervisor/cloud-hypervisor/pull/5545), but it was rejected; CH prefers that ulimit be set in the shell before CH is started.
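A minimal sketch of checking and raising the limit before launching CH; the limits.conf entry is the persistent variant the CH maintainers suggest, and the user name in it is a placeholder:

ulimit -n          # show the current per-process open-file limit (often 1024)
ulimit -n 2048     # raise it for this shell before starting cloud-hypervisor
# Persistent alternative: add a line to /etc/security/limits.conf (requires pam_limits),
# replacing <user> with the account that runs CH:
#   <user>  soft  nofile  2048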
git clone https://github.com/spdk/spdk
cd spdk
git submodule update --init
./configure --with-vfio-user
make -j
scripts/setup.sh
rm ~/images/test-disk.raw
truncate ~/images/test-disk.raw -s 128M
mkfs.ext4 ~/images/test-disk.raw
killall nvmf_tgt
ulimit -n 2048
./build/bin/nvmf_tgt -i 0 -e 0xFFFF -m 0x1 &
sleep 2
./scripts/rpc.py nvmf_create_transport -t VFIOUSER
rm -rf /var/run
mkdir -p /var/run
./scripts/rpc.py bdev_aio_create ~/images/test-disk.raw test 4096 #or 512
./scripts/rpc.py nvmf_create_subsystem nqn.2019-07.io.spdk:cnode -a -s test
./scripts/rpc.py nvmf_subsystem_add_ns nqn.2019-07.io.spdk:cnode test
./scripts/rpc.py nvmf_subsystem_add_listener nqn.2019-07.io.spdk:cnode -t VFIOUSER -a /var/run -s 0
chown $USER.$USER -R /var/run
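After the listener is added, the vfio-user controller files should appear under the listen path; a quick sanity check (nvmf_get_subsystems is a standard SPDK RPC):

ls -l /var/run/cntrl /var/run/bar0    # control socket and BAR0 file created by the listener
./scripts/rpc.py nvmf_get_subsystems  # the subsystem, namespace and VFIOUSER listener should be listed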
If nvmf_tgt fails with:
EAL: No free 2048 kB hugepages reported on node 1
Solution:
echo 6144 > /proc/sys/vm/nr_hugepages
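To confirm the reservation before restarting nvmf_tgt, and optionally make it persistent via sysctl:

grep -i huge /proc/meminfo
echo "vm.nr_hugepages = 6144" >> /etc/sysctl.conf   # optional: keep the reservation across reboots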
cp /root/workloads/jammy-server-cloudimg-arm64-custom-20220329-0.raw /root/workloads/osdisk.img
rm -rf /root/workloads/cloud-hypervisor.sock
"target/aarch64-unknown-linux-gnu/release/cloud-hypervisor" \
"--api-socket" "/root/workloads/cloud-hypervisor.sock" \
"--cpus" "boot=1" "--memory" "size=512M,shared=on,hugepages=on" \
"--kernel" "/root/workloads/CLOUDHV_EFI.fd" \
"--serial" "tty" "--console" "off" \
"--disk" "path=/root/workloads/osdisk.img" \
"--disk" "path=/root/workloads/cloudinit" \
"--net" "tap=,mac=12:34:56:78:90:01,ip=192.168.1.1,mask=255.255.255.0" \
"-v" --log-file log.txt
target/aarch64-unknown-linux-gnu/release/ch-remote --api-socket /root/workloads/cloud-hypervisor.sock add-user-device socket=/var/run/cntrl,id=vfio_user0
VM crash:
cloud-hypervisor: 40.332077s: <vmm> INFO:vmm/src/lib.rs:1858 -- API request event: VmAddUserDevice(UserDeviceConfig { socket: "/var/run/cntrl", id: Some("vfio_user0"), pci_segment: 0 }, Sender { .. })
cloud-hypervisor: 40.353919s: <vmm> WARN:/root/.cargo/git/checkouts/vfio-user-1ee9f6371fec66a1/eef6bec/src/lib.rs:645 -- Ignoring unsupported vfio region capability (id = '2')
cloud-hypervisor: 40.481922s: <vmm> INFO:pci/src/vfio_user.rs:329 -- Enabling IRQ 0 number of fds = 1
cloud-hypervisor: 40.611353s: <vcpu0> INFO:pci/src/configuration.rs:984 -- Detected BAR reprogramming: (BAR 4) 0x2fffe000->0x10100000
cloud-hypervisor: 40.611490s: <vcpu0> INFO:pci/src/vfio_user.rs:451 -- Moving BAR 0x2fffe000 -> 0x10100000
cloud-hypervisor: 40.611756s: <vcpu0> INFO:pci/src/vfio_user.rs:492 -- Moved bar 0x2fffe000 -> 0x10100000
cloud-hypervisor: 40.611852s: <vcpu0> INFO:pci/src/configuration.rs:984 -- Detected BAR reprogramming: (BAR 8) 0x2fffc000->0x10102000
cloud-hypervisor: 40.611928s: <vcpu0> INFO:pci/src/vfio_user.rs:451 -- Moving BAR 0x2fffc000 -> 0x10102000
cloud-hypervisor: 40.611988s: <vcpu0> INFO:pci/src/vfio_user.rs:492 -- Moved bar 0x2fffc000 -> 0x10102000
cloud-hypervisor: 40.612072s: <vcpu0> INFO:pci/src/configuration.rs:984 -- Detected BAR reprogramming: (BAR 9) 0x2fffb000->0x10104000
cloud-hypervisor: 40.612141s: <vcpu0> INFO:pci/src/vfio_user.rs:451 -- Moving BAR 0x2fffb000 -> 0x10104000
cloud-hypervisor: 40.612200s: <vcpu0> INFO:pci/src/vfio_user.rs:492 -- Moved bar 0x2fffb000 -> 0x10104000
cloud-hypervisor: 40.690917s: <vcpu0> INFO:pci/src/vfio_user.rs:381 -- Unmasking IRQ 0
cloud-hypervisor: 40.693925s: <vcpu0> INFO:pci/src/vfio_user.rs:366 -- Disabling IRQ 0
cloud-hypervisor: 40.696282s: <vcpu0> ERROR:vmm/src/cpu.rs:1071 -- vCPU thread panicked
cloud-hypervisor: 40.696407s: <vmm> INFO:vmm/src/lib.rs:1828 -- VM exit event
Solution: ulimit -n 2048
VM works well:
cloud-hypervisor: 38.183822s: <vmm> INFO:vmm/src/lib.rs:1858 -- API request event: VmAddUserDevice(UserDeviceConfig { socket: "/var/run/cntrl", id: Some("vfio_user0"), pci_segment: 0 }, Sender { .. })
cloud-hypervisor: 38.199257s: <vmm> WARN:/root/.cargo/git/checkouts/vfio-user-1ee9f6371fec66a1/eef6bec/src/lib.rs:645 -- Ignoring unsupported vfio region capability (id = '2')
cloud-hypervisor: 38.323265s: <vmm> INFO:pci/src/vfio_user.rs:329 -- Enabling IRQ 0 number of fds = 1
cloud-hypervisor: 38.457331s: <vcpu0> INFO:pci/src/configuration.rs:984 -- Detected BAR reprogramming: (BAR 4) 0x2fffe000->0x10100000
cloud-hypervisor: 38.457450s: <vcpu0> INFO:pci/src/vfio_user.rs:451 -- Moving BAR 0x2fffe000 -> 0x10100000
cloud-hypervisor: 38.457688s: <vcpu0> INFO:pci/src/vfio_user.rs:492 -- Moved bar 0x2fffe000 -> 0x10100000
cloud-hypervisor: 38.457782s: <vcpu0> INFO:pci/src/configuration.rs:984 -- Detected BAR reprogramming: (BAR 8) 0x2fffc000->0x10102000
cloud-hypervisor: 38.457857s: <vcpu0> INFO:pci/src/vfio_user.rs:451 -- Moving BAR 0x2fffc000 -> 0x10102000
cloud-hypervisor: 38.457917s: <vcpu0> INFO:pci/src/vfio_user.rs:492 -- Moved bar 0x2fffc000 -> 0x10102000
cloud-hypervisor: 38.458001s: <vcpu0> INFO:pci/src/configuration.rs:984 -- Detected BAR reprogramming: (BAR 9) 0x2fffb000->0x10104000
cloud-hypervisor: 38.458071s: <vcpu0> INFO:pci/src/vfio_user.rs:451 -- Moving BAR 0x2fffb000 -> 0x10104000
cloud-hypervisor: 38.458131s: <vcpu0> INFO:pci/src/vfio_user.rs:492 -- Moved bar 0x2fffb000 -> 0x10104000
cloud-hypervisor: 38.513253s: <vcpu0> INFO:pci/src/vfio_user.rs:381 -- Unmasking IRQ 0
cloud-hypervisor: 38.516263s: <vcpu0> INFO:pci/src/vfio_user.rs:366 -- Disabling IRQ 0
cloud-hypervisor: 38.560353s: <vcpu0> INFO:pci/src/vfio_user.rs:329 -- Enabling IRQ 2 number of fds = 512
cloud-hypervisor: 38.691981s: <vcpu0> INFO:pci/src/vfio_user.rs:366 -- Disabling IRQ 2
cloud-hypervisor: 38.692697s: <vcpu0> INFO:pci/src/vfio_user.rs:329 -- Enabling IRQ 0 number of fds = 1
cloud-hypervisor: 38.698262s: <vcpu0> INFO:pci/src/vfio_user.rs:366 -- Disabling IRQ 0
cloud-hypervisor: 38.699755s: <vcpu0> INFO:pci/src/vfio_user.rs:329 -- Enabling IRQ 2 number of fds = 512
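Once the hot-plug succeeds, the device can be confirmed from both sides; ch-remote's info command and the guest-side commands below are a quick sanity check (lspci assumes pciutils is installed in the guest):

# Host: the vfio-user device should appear in the VM configuration
target/aarch64-unknown-linux-gnu/release/ch-remote --api-socket /root/workloads/cloud-hypervisor.sock info
# Guest: the controller enumerates as an NVMe PCI function
lspci | grep -i nvme
ls /dev/nvme*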
"target/aarch64-unknown-linux-gnu/release/cloud-hypervisor" \
"--api-socket" "$path/cloud-hypervisor.sock" \
"--cpus" "boot=1" \
"--memory" "size=1G,shared=on" \
"--kernel" "$path/Image" --cmdline "root=/dev/vda1 console=hvc0" \
"--disk" "path=$path/osdisk.img" \
"--disk" "path=$path/cloudinit" \
"--net" "tap=,mac=12:34:56:78:90:01,ip=192.168.1.1,mask=255.255.255.0" \
--user-device socket=/var/run/cntrl
Starting the VM fails:
thread 'vcpu0' panicked at 'Failed cloning interrupt's EventFd: Os { code: 24, kind: Uncategorized, message: "Too many open files" }', vmm/src/interrupt.rs:82:18
Solution: ulimit -n 2048
VM works well:
[ 0.813704] nvme nvme0: pci function 0000:00:06.0
[ 1.180360] input: gpio-keys as /devices/platform/gpio-keys/input/input0
[ 1.307008] nvme nvme0: 1/0/0 default/read/poll queues
[ 1.309669] EXT4-fs (vda1): mounted filesystem 24394335-eeb5-4731-9e62-6255dd5a3712 with ordered data mode. Quota mode: disabled.
[ 1.309805] VFS: Mounted root (ext4 filesystem) readonly on device 254:1.
[ 1.309953] devtmpfs: mounted
[ 1.310284] Freeing unused kernel memory: 1728K
cloud-hypervisor: 1.646859s: <vcpu0> WARN:devices/src/legacy/uart_pl011.rs:358 -- [Debug I/O port: Kernel code: 0x41] 1.1643821 seconds
[ 1.319101] Run /sbin/init as init process
[ 1.346630] systemd[1]: systemd 249.11-0ubuntu2 running in system mode (+PAM +AUDIT +SELINUX +APPARMOR +IMA +SMACK +SECCOMP +GCRYPT +GNUTLS -OPENSSL +ACL +BLKID +CURL +ELFUTILS -FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP -LIBFDISK +PCRE2 -PWQUALITY -P11KIT -QRENCODE +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
[ 1.346972] systemd[1]: Detected architecture arm64.
In the guest, test the NVMe device:
ls /dev/nvme0n1
mkdir nvme
mount /dev/nvme0n1 nvme
cd nvme
echo "hello world" > test.txt
cat test.txt
cd ..
umount nvme
A possible error when unmounting:
umount: /home/cloud/nvme: target is busy.
Solution:
lsof nvme
Kill the process that holds the mount point open, or cd out of the directory, then retry the umount.
More info:
- Issue "aarch64: Booting with NVMe vfio-user device fails": https://github.com/cloud-hypervisor/cloud-hypervisor/issues/5426
- CH maintainers' comment on the rejected PR: "This is something that the sysadmin should change e.g by editing /etc/security/limits.conf - not something that the process should change."
- PR "vfio_user: Fix the issue of 'Too many open files' when start VM with spdk": https://github.com/cloud-hypervisor/cloud-hypervisor/pull/5545
Setting up SPDK fails:
sudo ./scripts/rpc.py nvmf_subsystem_add_listener nqn.2019-07.io.spdk:cnode -t VFIOUSER -a /var/run -s 0
[2023-06-30 14:15:05.548885] vfio_user.c:4454:nvmf_vfio_user_listen: *ERROR*: /var/run: error to mmap file /var/run/bar0: Invalid argument.
[2023-06-30 14:15:05.548966] nvmf.c: 685:spdk_nvmf_tgt_listen_ext: *ERROR*: Unable to listen on address '/var/run'
Solution: https://review.spdk.io/gerrit/c/spdk/spdk/+/18627
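The mmap failure above only occurs when the host kernel uses 64K base pages; this is easy to check:

getconf PAGESIZE   # 65536 when CONFIG_ARM64_64K_PAGES=y, 4096 otherwise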
Starting the VM fails:
[2023-06-30 14:26:08.000027] vfio_user.c:3091:vfio_user_log: *ERROR*: /var/run: refusing client page size of 4096
[2023-06-30 14:26:08.000103] vfio_user.c:3091:vfio_user_log: *ERROR*: /var/run: failed to recv version: Invalid argument
Error booting VM: VmBoot(DeviceManager(VfioUserCreateClient(StreamRead(Error { kind: UnexpectedEof, message: "failed to fill whole buffer" }))))
Solution:
https://github.com/rust-vmm/vfio-user/pull/15
Update Cargo.lock so CH pulls the fixed vfio-user revision:
-source = "git+https://github.com/rust-vmm/vfio-user?branch=main#eef6bec4d421f08ed1688fe67c5ea33aabbf5069"
+source = "git+https://github.com/rust-vmm/vfio-user?branch=main#08a42bfc1539ab1315590c9c64b2c417c6d21270"
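One way to pick up the fixed revision, assuming CH tracks the crate as a git dependency named vfio-user (as the Cargo.lock entry above suggests), is to let cargo refresh the pin and then rebuild:

cargo update -p vfio-user
cargo build --release --target aarch64-unknown-linux-gnu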
Related code

Where Cloud Hypervisor gets the IRQ info (pci/src/vfio.rs, parse_capabilities):

if let Some(irq_info) = self.vfio_wrapper.get_irq_info(VFIO_PCI_MSIX_IRQ_INDEX) // irq_info.count is 512
{
    let msix_cap = self.parse_msix_capabilities(cap_next); // creates 512 eventfds
    self.initialize_msix(msix_cap, cap_next as u32, bdf, None);
}
Where SPDK (libvfio-user) reports the IRQ count of 512 (libvfio-user/lib/irq.c, handle_device_get_irq_info):

out_info = msg->out.iov.iov_base;
out_info->argsz = sizeof(*out_info);
out_info->flags = VFIO_IRQ_INFO_EVENTFD;
out_info->index = in_info->index;
out_info->count = vfu_ctx->irq_count[in_info->index]; // 512 for the MSI-X index
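The descriptor pressure is easy to observe on the host once the vfio-user NVMe device is attached; most of CH's open descriptors are the MSI-X eventfds:

# Assumes a single cloud-hypervisor process is running
ls /proc/$(pidof cloud-hypervisor)/fd | wc -l   # roughly 512 of these are MSI-X eventfds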
The second setup uses the SPDK vhost-user blk target; the VM and SPDK share guest memory through hugepages:
+----+----------+ +-------------+-----------+
| | | | | |
| |vhost-user|----------| vhost-user | dpdk |
| |blk device| | port 1 | |
| | | | | |
| +----------+ +-------------+-----------+
| | | |
| vm | | spdk |
| | | |
+--+----------------------------------------------------+--+
| | hugepages | |
| +----------------------------------------------------+ |
| |
| host |
| |
+----------------------------------------------------------+
When the base page size is 64K, the default hugepage size is 512MB, so echo 8 > /proc/sys/vm/nr_hugepages reserves 8 * 512MB = 4GB of memory:
cat /proc/meminfo
HugePages_Total: 8
HugePages_Free: 8
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 524288 kB
Hugetlb: 4194304 kB
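The same reservation can be made at boot time using the standard hugetlb kernel parameters, with 512M matching the default hugepage size shown above:

# Append to the kernel command line instead of using the runtime echo:
#   default_hugepagesz=512M hugepagesz=512M hugepages=8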
./build/bin/vhost -S /var/tmp -s 512 -m 0x3 &
# HugePages_Free: 7
scripts/rpc.py bdev_malloc_create 1024 4096 -b Malloc0
# HugePages_Free: 4
Kill the vhost process and restart it with a larger shared-memory size:
./build/bin/vhost -S /var/tmp -s 1024 -m 0x3 &
# HugePages_Free: 6
scripts/rpc.py bdev_malloc_create 1024 4096 -b Malloc0
# HugePages_Free: 3
scripts/rpc.py vhost_create_blk_controller --cpumask 0x1 vhost.1 Malloc0
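Before booting the VM, check that the controller and its vhost-user socket exist (vhost_get_controllers is a standard SPDK RPC):

ls -l /var/tmp/vhost.1                 # socket created by the blk controller
scripts/rpc.py vhost_get_controllers   # the Malloc0-backed vhost.1 controller should be listed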
Start the VM. The guest memory must be equal to or less than 512*4 = 2048M; prefault=on means all guest memory is allocated at startup.
./cloud-hypervisor-static-aarch64 \
--cpus boot=4,max=8 \
--memory size=2048M,hugepages=on,hugepage_size=512M,prefault=on \
--kernel /root/workloads/CLOUDHV_EFI.fd \
--disk path=/home/dom/images/ubuntu22.04.raw \
--net fd=3,mac=$mac 3<>"$tapdevice" \
--api-socket /tmp/cloud-hypervisor.sock \
--disk vhost_user=true,socket=/var/tmp/vhost.1,num_queues=4,queue_size=128
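Inside the guest, the vhost-user disk shows up as a regular virtio-blk device (here /dev/vdb); a quick check before the I/O tests below:

lsblk /dev/vdb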
dd if=/dev/vdb of=/dev/null bs=2M iflag=direct
dd of=/dev/vdb if=/dev/zero bs=2M oflag=direct count=256
On the host, while the VM is running:
HugePages_Free: 0
After shutting down the VM, clean up the SPDK objects:
scripts/rpc.py vhost_delete_controller vhost.1
scripts/rpc.py bdev_malloc_delete Malloc0