Cloud Hypervisor Errors and Fixes - AmpereComputing/ampere-lts-kernel---DEPRECATED GitHub Wiki
Start VM:
sudo ./target/release/cloud-hypervisor \
--kernel ../linux-cloud-hypervisor/arch/arm64/boot/Image \
--disk path=$ROOTFS --disk path=/tmp/ubuntu-cloudinit.img \
--cmdline "console=hvc0 root=/dev/vda1 rw" \
--cpus boot=4 \
--memory size=0,shared=on \
--memory-zone id=mem0,size=1G,shared=on,host_numa_node=0 \
--net "tap=,mac=,ip=,mask=" \
--serial tty \
--console off
The VM gets stuck after booting:
[ 1.200267] EXT4-fs (vda1): re-mounted. Opts: (null). Quota mode: disabled.
[ 1.888490] squashfs: SQUASHFS error: Xattrs in filesystem, these will be ignored
[ 1.890036] unable to read xattr id index table
or
[ 0.001761] Console: colour dummy device 80x25
[ 0.002461] printk: console [tty0] enabled
[ 0.003070] printk: bootconsole [pl11] disabled
cloud-hypervisor: 698.505152ms: <vcpu2> WARN:devices/src/legacy/uart_pl011.rs:358 -- [Debug I/O port: Kernel code: 0x41] 0.695055 seconds
The VM works normally, but the console does not print.
We can log in to the VM through ssh.
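The different configurations behave as follows:
--kernel Image --cmdline "console=ttyAMA0" --serial tty --console off: the kernel loads the ttyAMA0 driver and prints the complete boot log and the login prompt.
--kernel Image --serial tty --console off: no console is specified on the kernel command line, so login is only possible through ssh. Running echo test > /dev/ttyAMA0 in the ssh session displays test in the VM terminal.
--kernel Image --cmdline "console=hvc0": the default is --console tty and the kernel loads the hvc0 driver. This configuration shows the boot log and the login prompt.
--kernel Image: the boot log is not shown, but the login prompt is. Running echo test > /dev/hvc0 in the ssh session displays test in the VM terminal.
--kernel CLOUDHV_EFI.fd --serial tty --console off: --cmdline has no effect here; the settings in /boot/grub/grub.cfg on the disk are used instead. About 3 seconds of kernel boot log are shown, but no Ubuntu log. "Press ESCAPE for boot options" and the login prompt are shown. If the Ubuntu log is needed, add console=ttyAMA0 to grub.cfg manually.
--kernel CLOUDHV_EFI.fd --serial off --console off: no log at all and no login prompt.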
If --console is not specified, its default value is tty
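The observations above suggest a simple rule: the kernel console= argument should name the device that cloud-hypervisor actually attaches to the terminal (ttyAMA0 for --serial tty, hvc0 for --console tty). The following is a minimal sketch of that rule. The function name console_arg is hypothetical, and the precedence given to the serial port when both modes are tty is an arbitrary choice of this sketch, not documented behavior.

```shell
# Hypothetical helper: pick the kernel "console=" argument that matches the
# cloud-hypervisor --serial / --console modes, based on the notes above.
console_arg() {
    serial_mode=$1    # value passed to --serial (tty or off)
    console_mode=$2   # value passed to --console (tty or off)
    if [ "$serial_mode" = "tty" ]; then
        echo "console=ttyAMA0"   # PL011 serial port
    elif [ "$console_mode" = "tty" ]; then
        echo "console=hvc0"      # virtio console
    else
        echo ""                  # no device is attached to the terminal
    fi
}
```

For example, console_arg tty off yields console=ttyAMA0. By this rule, the start command at the top of the page (console=hvc0 together with --serial tty --console off) would not be expected to print to the terminal.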
The following command sequence blocks on Ubuntu 20.04:
service openvswitch start
ovs-vsctl init
ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true
Log in /var/log/openvswitch/ovs-vswitchd.log:
2023-04-23T02:20:26.062Z|00086|dpdk(ovs-vswitchd)|INFO|EAL ARGS: ovs-vswitchd --socket-mem 1024,1024,1024,1024,1024,1024,1024,1024 --socket-limit 1024,1024,1024,1024,1024,1024,1024,1024 -l 0.
2023-04-23T02:20:26.068Z|00089|dpdk(ovs-vswitchd)|ERR|EAL: invalid parameters for --socket-mem
2023-04-23T02:20:26.068Z|00090|dpdk(ovs-vswitchd)|ERR|EAL: Invalid 'command line' arguments.
2023-04-23T02:20:26.068Z|00091|dpdk(ovs-vswitchd)|EMER|Unable to initialize DPDK: Invalid argument
There are two solutions:
1. Change the ANC mode from Quadrant to Monolithic in the BIOS of the Ampere Altra, which reduces the number of NUMA nodes.
2. Run 'ovs-vsctl set Open_vSwitch' on Ubuntu 22.04, which ships a newer version of Open vSwitch.
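The EAL ARGS line in the log lists eight values for --socket-mem, which suggests the host exposes eight NUMA nodes. To confirm this before and after changing the ANC mode, the nodes can be counted from sysfs. This is a sketch only: the function name count_numa_nodes is hypothetical, and it assumes the usual Linux layout of one nodeN directory per NUMA node under /sys/devices/system/node.

```shell
# Hypothetical helper: count NUMA nodes by counting nodeN directories.
# The optional argument overrides the sysfs directory (useful for testing).
count_numa_nodes() {
    node_dir=${1:-/sys/devices/system/node}
    count=0
    for d in "$node_dir"/node[0-9]*; do
        if [ -d "$d" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}
```

Running count_numa_nodes with no argument on the host prints the current node count.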
Log in /var/log/openvswitch/ovs-ctl.log:
modprobe: FATAL: Module openvswitch not found in directory /lib/modules/6.6.0-rc6
* Inserting openvswitch module
rmmod: ERROR: Module bridge is in use by: br_netfilter
* removing bridge module
Solution:
docker ps
docker exec -it eb63dbf9b52b bash
Run modinfo openvswitch.ko to check the module's dependencies (the depends field), then load them before openvswitch.ko:
insmod /usr/lib/modules/6.6.0-rc6/kernel/net/nsh/nsh.ko
insmod /usr/lib/modules/6.6.0-rc6_vttbr-dirty/kernel/net/netfilter/nf_conncount.ko
insmod /usr/lib/modules/6.6.0-rc6/kernel/net/openvswitch/openvswitch.ko
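The dependency check can be scripted with modinfo -F depends, which prints only the comma-separated depends field. The sketch below splits that field into one module name per line. The function names split_deps and module_deps are hypothetical, and the sketch lists direct dependencies only; it does not resolve load order or nested dependencies.

```shell
# Turn a comma-separated "depends" field (read from stdin) into one
# module name per line, dropping empty lines.
split_deps() {
    tr ',' '\n' | sed '/^$/d'
}

# List the direct dependencies of a module file; the path is supplied by
# the caller, for example the openvswitch.ko path used above.
module_deps() {
    modinfo -F depends "$1" | split_deps
}
```

Each name printed by module_deps still has to be located under /usr/lib/modules and loaded with insmod, as in the commands above.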
Another, simpler solution is to start the service on the host:
service openvswitch start
OVS and DPDK version compatibility: https://docs.openvswitch.org/en/latest/faq/releases/
When the VM is started with --cpus boot=160,max=160, it gets stuck and every shell reports a soft lockup:
watchdog: BUG: soft lockup - CPU#130 stuck for 26s
Solution:
In the BIOS, go to Chipset > CPU Configuration > ARM ERRATA 1542419 workaround and select Software solution.
When the VM is started with --cpus boot=160,max=160, lscpu in the VM shows only 128 CPUs.
Solution:
Increase CONFIG_NR_CPUS in the guest kernel's config file to at least 160 and rebuild the kernel.
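Before rebuilding, the current limit can be read from the kernel config file. The following is a minimal sketch; the function names nr_cpus_from_config and nr_cpus_sufficient are hypothetical, and the config file path is whatever the build or distribution provides (for example the .config in the kernel source tree).

```shell
# Print the CONFIG_NR_CPUS value from a kernel config file.
# Prints nothing if the option is not set in that file.
nr_cpus_from_config() {
    sed -n 's/^CONFIG_NR_CPUS=\([0-9][0-9]*\)$/\1/p' "$1"
}

# Succeed if the config file ($1) allows at least $2 CPUs.
nr_cpus_sufficient() {
    limit=$(nr_cpus_from_config "$1")
    [ -n "$limit" ] && [ "$limit" -ge "$2" ]
}
```

For the case above, nr_cpus_sufficient .config 160 would be expected to fail on a config that still contains CONFIG_NR_CPUS=128.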
If VirtualBox is already running when the VM is started, cloud-hypervisor reports that the device is busy:
thread 'vmm' panicked at 'called `Result::unwrap()` on an `Err` value: VmCreate(Device or resource busy (os error 16))', vmm/src/vm.rs:863:41
Error booting VM: VmBoot(DeviceManager(Disk(Os { code: 2, kind: NotFound, message: "No such file or directory" })))
Cloud-hypervisor-network-configuration
brctl stp br0 on
This operation causes the switch port to be shut down directly, and any device connected to that port loses its network connection, so use it with caution.
[7] Couldn't connect to server (Failed to connect to 127.0.0.1 port 4781 after 0 ms: Connection refused); class=Net (12)
Solution: in /root/.docker/config.json, configure the proxy to use the IP address of the host instead of 127.0.0.1, because inside a container 127.0.0.1 refers to the container itself:
{
  "proxies": {
    "default": {
      "httpProxy": "socks5://10.0.0.41:4781",
      "httpsProxy": "socks5://10.0.0.41:4781",
      "noProxy": "*.test.example.com,.example2.com,127.0.0.0/8"
    }
  }
}
ifconfig eno2 0
ifconfig eno2 up
dhclient eno2
wget --quiet
With --quiet, wget prints neither the download progress nor any error message. Removing the --quiet option reveals the error:
Error parsing proxy URL socks5://10.0.0.41:4781: Unsupported scheme 'socks5'.
Solution: replace socks5://10.0.0.41:4781 in /root/.docker/config.json with http://10.0.0.41:4780.
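Because --quiet hides this failure, it can help to check the proxy URL scheme before calling wget. The sketch below only tests whether a URL starts with http://. The function name is_http_proxy is hypothetical, and the check reflects the error shown above rather than a complete list of schemes that wget accepts.

```shell
# Succeed only if the proxy URL uses the http:// scheme.
is_http_proxy() {
    case "$1" in
        http://*) return 0 ;;
        *)        return 1 ;;
    esac
}
```

For example, is_http_proxy "socks5://10.0.0.41:4781" fails, while is_http_proxy "http://10.0.0.41:4780" succeeds.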
cargo build --no-default-features --features kvm,mshv --all --release --target aarch64-unknown-linux-gnu
use of undeclared crate or module `mshv`
mshv is Microsoft's hypervisor support contributed to the Linux kernel, which can run multiple Windows instances. Related kernel commit titles:
mshv: Added support for detecting nested hypervisors
hv, mshv : Change the interrupt vector of the nested root partition
Test 'boot_time_ms' running .. (control: test_timeout = 2s, test_iterations = 10, overrides: )
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 2, kind: NotFound, message: "No such file or directory" }', test_infra/src/lib.rs:1232:18
./cloud-hypervisor: /lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by ./cloud-hypervisor)
./cloud-hypervisor: /lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by ./cloud-hypervisor)
./cloud-hypervisor: /lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by ./cloud-hypervisor)
Solution:
sudo vi /etc/apt/sources.list
deb http://th.archive.ubuntu.com/ubuntu jammy main
sudo apt update
sudo apt install libc6
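The errors above show that the binary needs glibc symbols up to GLIBC_2.34, so the installed glibc version can be compared against that before and after the upgrade. This is a sketch under assumptions: the function name version_ge is hypothetical, and it relies on sort -V (version sort), which GNU coreutils provides.

```shell
# Succeed if version $1 is greater than or equal to version $2.
# Uses version sort: if $2 sorts first (or the two are equal), $1 >= $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

On a glibc system, getconf GNU_LIBC_VERSION prints a line such as "glibc 2.35", so a check could look like version_ge "$(getconf GNU_LIBC_VERSION | awk '{print $2}')" 2.34.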
--memory size=0 --memory-zone id=mem0,size=1G,hotplug_size=1G
cloud-hypervisor: 4.763259ms: <vmm> ERROR:vmm/src/memory_manager.rs:741 -- Invalid to set ACPI hotplug method for memory zones
Error booting VM: VmBoot(MemoryManager(InvalidHotplugMethodWithMemoryZones))
Solution:
--memory size=0,hotplug_method=virtio-mem --memory-zone id=mem0,size=1G,hotplug_size=1G
After docker rmi imageID, the disk space is still occupied, and if docker build is run again it reuses the old image directly.
Solution:
docker system prune
apt install openvswitch-switch-dpdk
ovs-vswitchd --version
ovs-vswitchd (Open vSwitch) 2.9.0
If the output does not mention DPDK, switch to the DPDK build of ovs-vswitchd:
update-alternatives --set ovs-vswitchd /usr/lib/openvswitch-switch-dpdk/ovs-vswitchd-dpdk
Service OpenVSwitch is probably not installed in system.
modprobe: FATAL: Module openvswitch not found in directory /lib/modules/6.6.0-rc3+
* Inserting openvswitch module
rmmod: ERROR: Module bridge is in use by: br_netfilter
Solution:
Add "--volume /lib/modules/`uname -r`:/lib/modules/`uname -r`" to the docker run command.
../../../source/components/utilities/utdebug.c:213:36: error: storing the address of local variable ‘CurrentSp’ in ‘AcpiGbl_LowestStackPointer’ [-Werror=dangling-pointer=]
213 | AcpiGbl_LowestStackPointer = &CurrentSp;
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~
Solution:
export CFLAGS="-Wno-error=dangling-pointer=" && make
brotli/c/enc/encode.c:1474:14: error: argument 7 of type ‘uint8_t *’ {aka ‘unsigned char *’} declared as a pointer [-Werror=vla-parameter]
1474 | uint8_t* encoded_buffer) {
Solution:
export CFLAGS="-Wno-error=vla-parameter" has no effect here.
Switching to GCC 9 fixes the build.