Cloud Hypervisor Errors and Fixes - AmpereComputing/ampere-lts-kernel---DEPRECATED GitHub Wiki

tty config

Start VM:

sudo ./target/release/cloud-hypervisor \
          --kernel ../linux-cloud-hypervisor/arch/arm64/boot/Image  \
          --disk path=$ROOTFS --disk path=/tmp/ubuntu-cloudinit.img  \
          --cmdline "console=hvc0 root=/dev/vda1 rw" \
          --cpus boot=4   \
          --memory size=0,shared=on  \
          --memory-zone id=mem0,size=1G,shared=on,host_numa_node=0 \
          --net "tap=,mac=,ip=,mask=" \
          --serial tty \
          --console off

The VM appears stuck after booting; the console output stops at one of the following points:

[    1.200267] EXT4-fs (vda1): re-mounted. Opts: (null). Quota mode: disabled.
[    1.888490] squashfs: SQUASHFS error: Xattrs in filesystem, these will be ignored
[    1.890036] unable to read xattr id index table
or
[    0.001761] Console: colour dummy device 80x25
[    0.002461] printk: console [tty0] enabled
[    0.003070] printk: bootconsole [pl11] disabled
cloud-hypervisor: 698.505152ms: <vcpu2> WARN:devices/src/legacy/uart_pl011.rs:358 -- [Debug I/O port: Kernel code: 0x41] 0.695055 seconds

The VM itself works normally, but the console stops printing.

It is still possible to log in to the VM through ssh.

The effects of several different configurations are as follows:

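--kernel Image --cmdline "console=ttyAMA0" --serial tty --console off  The kernel loads the ttyAMA0 driver; the full boot log and the login prompt are printed.
--kernel Image --serial tty --console off                              No tty is specified, so login is only possible through ssh.
                                                                       Running echo test > /dev/ttyAMA0 in the ssh session displays "test" in the VM.
--kernel Image --cmdline "console=hvc0"                                The default is --console tty; the kernel loads the hvc0 driver. This configuration shows the boot log and the login prompt.
--kernel Image                                                         No boot log is shown, but the login prompt is. Running echo test > /dev/hvc0 in the ssh session displays "test" in the VM.

--kernel CLOUDHV_EFI.fd --serial tty --console off                     --cmdline has no effect here; the settings in /boot/grub/grub.cfg on the disk are used. There are 3 seconds of kernel boot log,
                                                                       but no Ubuntu log. "Press ESCAPE for boot options" and the login prompt are shown.
                                                                       If the Ubuntu log is needed, add console=ttyAMA0 to grub.cfg manually.
--kernel CLOUDHV_EFI.fd --serial off --console off                     No log at all and no login prompt.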

If --console is not specified, it defaults to tty.
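
The configurations listed above can be condensed into a small helper that picks the kernel console= argument matching the --serial and --console values. This is only a minimal sketch based on the observations in this section; the function name console_arg is hypothetical and is not part of cloud-hypervisor.

```shell
# console_arg SERIAL [CONSOLE]
# SERIAL and CONSOLE are the values given to --serial and --console
# ("tty" or "off"). CONSOLE defaults to "tty", as noted above.
# Prints the kernel command-line argument for the device attached to the
# terminal; returns 1 when neither device is attached.
console_arg() {
    if [ "${1:-off}" = "tty" ]; then
        echo "console=ttyAMA0"   # PL011 serial port
    elif [ "${2:-tty}" = "tty" ]; then
        echo "console=hvc0"      # virtio console
    else
        return 1
    fi
}
```

For example, --cmdline "$(console_arg tty off) root=/dev/vda1 rw" would expand to the ttyAMA0 variant that printed the full boot log.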

OVS_DPDK

The following command sequence blocks on Ubuntu 20.04:

service openvswitch start
ovs-vsctl init
ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true

issue 1

Log in /var/log/openvswitch/ovs-vswitchd.log:

2023-04-23T02:20:26.062Z|00086|dpdk(ovs-vswitchd)|INFO|EAL ARGS: ovs-vswitchd --socket-mem 1024,1024,1024,1024,1024,1024,1024,1024 --socket-limit 1024,1024,1024,1024,1024,1024,1024,1024 -l 0.
2023-04-23T02:20:26.068Z|00089|dpdk(ovs-vswitchd)|ERR|EAL: invalid parameters for --socket-mem
2023-04-23T02:20:26.068Z|00090|dpdk(ovs-vswitchd)|ERR|EAL: Invalid 'command line' arguments.
2023-04-23T02:20:26.068Z|00091|dpdk(ovs-vswitchd)|EMER|Unable to initialize DPDK: Invalid argument

There are two solutions:

1. In the BIOS of the Ampere Altra, change the ANC mode from Quadrant to Monolithic; this reduces the number of NUMA nodes.
2. Run 'ovs-vsctl set Open_vSwitch' on Ubuntu 22.04, which ships a newer version of Open vSwitch.
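
The --socket-mem argument in the log above carries eight values, which suggests that the host exposes eight NUMA nodes. A quick way to confirm the node count before and after changing the ANC mode is to count the nodeN directories in sysfs. The sketch below assumes the standard Linux path /sys/devices/system/node; the function name count_numa_nodes is hypothetical.

```shell
# count_numa_nodes [DIR]
# Prints the number of nodeN directories under DIR
# (default: /sys/devices/system/node).
count_numa_nodes() {
    numa_dir=${1:-/sys/devices/system/node}
    numa_count=0
    for numa_entry in "$numa_dir"/node[0-9]*; do
        if [ -d "$numa_entry" ]; then
            numa_count=$((numa_count + 1))
        fi
    done
    echo "$numa_count"
}
```

Calling count_numa_nodes with no argument reports the node count of the running host.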

issue 2

Log in /var/log/openvswitch/ovs-ctl.log

modprobe: FATAL: Module openvswitch not found in directory /lib/modules/6.6.0-rc6
 * Inserting openvswitch module
rmmod: ERROR: Module bridge is in use by: br_netfilter
 * removing bridge module

Solution:

docker ps
docker exec -it eb63dbf9b52b bash
Run modinfo openvswitch.ko to check the module's dependencies (the "depends" field), then load them before openvswitch.ko:

insmod /usr/lib/modules/6.6.0-rc6/kernel/net/nsh/nsh.ko
insmod /usr/lib/modules/6.6.0-rc6_vttbr-dirty/kernel/net/netfilter/nf_conncount.ko
insmod /usr/lib/modules/6.6.0-rc6/kernel/net/openvswitch/openvswitch.ko
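
Because insmod does not resolve dependencies by itself, it can help to turn the comma-separated "depends" field into one module name per line and work through the list. The helper below is a minimal sketch; split_depends is a hypothetical name, and it only reformats a string that was obtained elsewhere, for example with modinfo -F depends.

```shell
# split_depends "a,b,c"
# Prints one module name per line; empty entries are dropped.
split_depends() {
    printf '%s\n' "$1" | tr ',' '\n' | sed '/^$/d'
}
```

Typical use would be split_depends "$(modinfo -F depends openvswitch.ko)". Each listed module may have dependencies of its own, so the same check may need to be repeated for them.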

Another, simpler solution:

On the host:
service openvswitch start

OVS and DPDK version compatibility: https://docs.openvswitch.org/en/latest/faq/releases/

Soft lockup

Start the VM with --cpus boot=160,max=160; it gets stuck, and every shell reports a soft lockup:

watchdog: BUG: soft lockup - CPU#130 stuck for 26s

Solution

Enter the BIOS, go to Chipset > CPU Configuration > "ARM ERRATA 1542419 workaround", and select "Software solution".

CPU numbers

Start the VM with --cpus boot=160,max=160 and run lscpu in the VM; it shows only 128 CPUs.

Solution:

Increase CONFIG_NR_CPUS in the guest kernel's config file so that it covers the requested CPU count, then rebuild the kernel.
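
To check the limit before booting, the value can be read straight from the kernel config file. The following is a minimal sketch; the function names nr_cpus_from_config and nr_cpus_ok are hypothetical, and the location of the config file (for example a .config in the kernel source tree) is an assumption.

```shell
# nr_cpus_from_config CONFIG_FILE
# Prints the numeric value of CONFIG_NR_CPUS, or nothing if it is not set.
nr_cpus_from_config() {
    sed -n 's/^CONFIG_NR_CPUS=\([0-9][0-9]*\)$/\1/p' "$1"
}

# nr_cpus_ok CONFIG_FILE WANTED
# Succeeds when CONFIG_NR_CPUS is set and is at least WANTED.
nr_cpus_ok() {
    nr_limit=$(nr_cpus_from_config "$1")
    [ -n "$nr_limit" ] && [ "$nr_limit" -ge "$2" ]
}
```

For the case above, nr_cpus_ok path/to/.config 160 would fail for a kernel built with CONFIG_NR_CPUS=128.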

If VirtualBox is already running when the VM is started, cloud-hypervisor reports that the device is busy:

thread 'vmm' panicked at 'called `Result::unwrap()` on an `Err` value: VmCreate(Device or resource busy (os error 16))', vmm/src/vm.rs:863:41

Starting the VM reports an error when the file specified by --disk does not exist:

Error booting VM: VmBoot(DeviceManager(Disk(Os { code: 2, kind: NotFound, message: "No such file or directory" })))
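
A pre-flight check in the launch script makes this failure easier to read than the VmBoot error. The sketch below is only an illustration; check_disks is a hypothetical helper, not a cloud-hypervisor feature.

```shell
# check_disks PATH...
# Reports every path that does not exist on stderr.
# Returns 0 if all paths exist, 1 otherwise.
check_disks() {
    disks_missing=0
    for disk_path in "$@"; do
        if [ ! -e "$disk_path" ]; then
            echo "missing disk image: $disk_path" >&2
            disks_missing=1
        fi
    done
    return "$disks_missing"
}
```

It could be called as check_disks "$ROOTFS" /tmp/ubuntu-cloudinit.img || exit 1 before invoking cloud-hypervisor.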

VM network; for reference, see:

Cloud-hypervisor-network-configuration

Note on brctl stp br0 on: this operation causes the switch port to be shut down directly, and every device connected to that port loses its network connection, so use it with caution.

Proxy config

[7] Couldn't connect to server (Failed to connect to 127.0.0.1 port 4781 after 0 ms: Connection refused); class=Net (12)

In config.json, configure the proxy to use the host's IP address instead of 127.0.0.1:

/root/.docker/config.json

 {
  "proxies":
  {
    "default":
    {
      "httpProxy": "socks5://10.0.0.41:4781",
      "httpsProxy": "socks5://10.0.0.41:4781",
      "noProxy": "*.test.example.com,.example2.com,127.0.0.0/8"
    }
  }
 }

Restart network

ifconfig eno2 0
ifconfig eno2 up
dhclient eno2

wget does not support socks5 proxy

wget --quiet

With --quiet, wget prints neither the download progress nor any error message. Removing the --quiet option reveals the error:

Error parsing proxy URL socks5://10.0.0.41:4781: Unsupported scheme 'socks5'.

To fix this, replace socks5://10.0.0.41:4781 in /root/.docker/config.json with http://10.0.0.41:4780
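
Since --quiet hides the error, a script can check the proxy URL scheme itself before calling wget. This is a minimal sketch that encodes only what was observed here (http:// works, socks5:// is rejected); wget_proxy_ok is a hypothetical name.

```shell
# wget_proxy_ok URL
# Succeeds only for http:// proxy URLs, the one scheme confirmed to work above.
wget_proxy_ok() {
    case "$1" in
        http://*) return 0 ;;
        *)        return 1 ;;
    esac
}
```

For example, wget_proxy_ok "$http_proxy" || echo "wget cannot use this proxy" >&2 would flag the socks5 URL before the silent failure occurs.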

MSHV

cargo build --no-default-features --features kvm,mshv --all --release --target aarch64-unknown-linux-gnu
use of undeclared crate or module `mshv`

mshv is Microsoft's contribution to the Linux kernel code, which can run multiple Windows

mshv: Added support for detecting nested hypervisors

hv, mshv : Change the interrupt vector of the nested root partition

boot_time

Test 'boot_time_ms' running .. (control: test_timeout = 2s, test_iterations = 10, overrides: )

thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 2, kind: NotFound, message: "No such file or directory" }', test_infra/src/lib.rs:1232:18

See Performance-Metrics

GLIBC

./cloud-hypervisor: /lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by ./cloud-hypervisor)
./cloud-hypervisor: /lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by ./cloud-hypervisor)
./cloud-hypervisor: /lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by ./cloud-hypervisor)

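Solution: add the Ubuntu 22.04 (jammy) repository, which ships a newer glibc, to /etc/apt/sources.list and upgrade libc6:

sudo vi /etc/apt/sources.list
deb http://th.archive.ubuntu.com/ubuntu jammy main    (line to add to sources.list)
sudo apt update
sudo apt install libc6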
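
To see whether the installed glibc is new enough for the binary, the installed version can be compared with the highest version named in the error (GLIBC_2.34 here). The helper below is a sketch that relies on GNU sort -V; version_ge is a hypothetical name.

```shell
# version_ge A B
# Succeeds when dotted version A is greater than or equal to version B.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}
```

On a glibc system the installed version can be obtained with getconf GNU_LIBC_VERSION, which prints a line such as "glibc 2.31"; the second field can then be passed to version_ge together with 2.34.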

Memory

--memory size=0 --memory-zone id=mem0,size=1G,hotplug_size=1G

cloud-hypervisor: 4.763259ms: <vmm> ERROR:vmm/src/memory_manager.rs:741 -- Invalid to set ACPI hotplug method for memory zones
Error booting VM: VmBoot(MemoryManager(InvalidHotplugMethodWithMemoryZones))

Solution:

--memory size=0,hotplug_method=virtio-mem --memory-zone id=mem0,size=1G,hotplug_size=1G

docker

After docker rmi <imageID>, the disk space is still occupied, and a subsequent docker build reuses the old image directly.

Solution:

docker system prune

dpdk

apt install openvswitch-switch-dpdk

ovs-vswitchd --version
ovs-vswitchd (Open vSwitch) 2.9.0

If the output does not mention DPDK, switch to the DPDK build:

update-alternatives --set ovs-vswitchd /usr/lib/openvswitch-switch-dpdk/ovs-vswitchd-dpdk

Service OpenVSwitch is probably not installed in system.

modprobe: FATAL: Module openvswitch not found in directory /lib/modules/6.6.0-rc3+
 * Inserting openvswitch module
rmmod: ERROR: Module bridge is in use by: br_netfilter

Solution:

Add "--volume /lib/modules/`uname -r`:/lib/modules/`uname -r`" to the docker run command so that the container can see the host's kernel modules.

Compile error:

../../../source/components/utilities/utdebug.c:213:36: error: storing the address of local variable ‘CurrentSp’ in ‘AcpiGbl_LowestStackPointer’ [-Werror=dangling-pointer=]
  213 |         AcpiGbl_LowestStackPointer = &CurrentSp;
      |         ~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~

Solution: export CFLAGS="-Wno-error=dangling-pointer=" && make

brotli/c/enc/encode.c:1474:14: error: argument 7 of type ‘uint8_t *’ {aka ‘unsigned char *’} declared as a pointer [-Werror=vla-parameter]
 1474 |     uint8_t* encoded_buffer) {

Solution:

Setting export CFLAGS="-Wno-error=vla-parameter" has no effect.
Switching to GCC 9 fixed it.