Basic: CXL Test with CXL emulation in QEMU
CXL (Compute Express Link) is an open standard for high-speed, high-capacity central processing unit (CPU)-to-device and CPU-to-memory connections, designed for high-performance data center computers. CXL is built on the serial PCI Express (PCIe) physical and electrical interface and includes a PCIe-based block input/output protocol (CXL.io) and new cache-coherent protocols for accessing system memory (CXL.cache) and device memory (CXL.mem). -- From Wikipedia
CXL hardware design follows the open specification from the CXL Consortium (link). The specification is under active development and has evolved from 1.1 and 2.0 through 3.0 to the current 3.1 revision, released in November 2023. Since CXL hardware is not yet widely available on the market, software developers rely on emulation for debugging and testing CXL code, including the CXL Linux kernel drivers. As far as I know, QEMU is currently the only emulator that supports CXL hardware emulation.
QEMU is a generic and open-source machine emulator and virtualizer. It can emulate systems with different hardware configurations, including CPUs with different ISAs, memory configurations, and peripheral devices. Note that QEMU is designed to emulate system functionality, not timing, so it is not suitable for performance-related simulation and evaluation.
The mainline QEMU source code can be found (here). CXL-related code is located in the following places in the QEMU source tree:
- hw/cxl/
- include/hw/cxl/
- hw/mem/cxl_type3.c
- qapi/cxl.json
QEMU can currently emulate the following CXL 2.0 compliant system components (see the QEMU CXL documentation):
- CXL Host Bridge (CXL HB): equivalent to PCIe host bridge.
- CXL Root Ports (CXL RP): serves the same purpose as a PCIe Root Port. There are a number of CXL-specific Designated Vendor Specific Extended Capabilities (DVSEC) in PCIe configuration space, and the associated component registers are accessed via PCI BARs.
- CXL Switch: has a similar architecture to those in PCIe, with a single upstream port, internal PCI bus and multiple downstream ports.
- CXL Type 3 memory devices for memory expansion: the device can act as system RAM or a DAX device. Volatile and non-volatile memory emulation has already been merged into mainline QEMU. CXL 3.0 introduces a new kind of CXL memory device that implements dynamic capacity, the dynamic capacity device (DCD). DCD emulation support for QEMU has been posted to the mailing list and is expected to be merged soon.
- CXL Fixed Memory Windows (CFMW): a CFMW is a range of host physical address space routed to particular CXL host bridges. At generic software initialization time it has a fixed interleave configuration and an associated Quality of Service Throttling Group (QTG). This information is exposed to system software, which uses it when deciding how to configure interleave across the available CXL memory devices. It is provided as CFMW Structures (CFMWS) in the CXL Early Discovery Table (CEDT), an ACPI table.
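As a concrete illustration of CFMW routing, the host bridge that services a given host physical address (HPA) is selected by the interleave math sketched below. The base address, way count, and granularity are made-up example values, not taken from any real CEDT:

```shell
# Illustrative CFMW routing math (all values are made-up examples):
# target index = (offset into the window / granularity) mod interleave ways.
base=$((0x490000000))   # example CFMW base HPA
ways=2                  # 2-way interleave across two host bridges
gran=$((8 * 1024))      # 8 KiB interleave granularity
hpa=$((base + 0x6000))  # an HPA inside the window
offset=$((hpa - base))
target=$(( (offset / gran) % ways ))
echo "HPA offset $offset -> host bridge index $target"
```

Here offset 24576 falls in the fourth 8 KiB granule, so the address routes to the second host bridge (index 1).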
To test a CXL device with QEMU emulation, we need to have the following prerequisites:
- A QEMU emulator with CXL emulation support, either compiled from source or preinstalled;
- A Kernel image with CXL support (compiled in or as modules);
- A file system that serves as the root fs for booting the guest VM.
Building QEMU and the Linux kernel relies on some preinstalled packages. Here we use the Debian distribution ("bookworm") as an example.
sudo apt-get install libglib2.0-dev libgcrypt20-dev zlib1g-dev \
    autoconf automake libtool bison flex libpixman-1-dev bc qemu-kvm \
    make ninja-build libncurses-dev libelf-dev libssl-dev debootstrap \
    libcap-ng-dev libattr1-dev libslirp-dev libslirp0
It is recommended to build QEMU from source for two reasons:
- The precompiled binary can be old and lack the latest features supported by QEMU;
- Building from source allows us to customize QEMU to our needs, including development debugging, applying patches to test unmerged features, or modifying QEMU to try out ideas or fixes.
We can download QEMU source code from different sources, for example:
- Mainstream QEMU: https://github.com/qemu/qemu
- QEMU CXL maintainer Jonathan Cameron's tree for to-be-merged patches: https://gitlab.com/jic23/qemu
- Fan Ni's GitHub tree with the latest DCD emulation: https://github.com/moking/qemu/tree/dcd-v6
Below we will use DCD emulation setup as an example.
Step 1: download QEMU source code
git clone -b dcd-v6 https://github.com/moking/qemu.git
Step 2: configure QEMU
For example, configure QEMU with x86_64 CPU architecture and debug support:
cd $QEMU; ./configure --target-list=x86_64-softmmu --enable-debug
Step 3: Compile QEMU
make -j 16
If the compile succeeds, a new QEMU binary is generated under the build/ directory:
fan@DT ~/c/qemu (dcd-v6)> ls build/qemu-system-x86_64 -lh
-rwxr-xr-x 1 fan fan 59M Mar 25 12:12 build/qemu-system-x86_64
Note: here we build the CXL drivers as modules and load/unload them on demand.
Step 1: download Linux Kernel source code
Linux source code can be downloaded from different sources:
- https://git.kernel.org/
- CXL related development repository: https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/?h=fixes
- Kernel with DCD drivers: https://github.com/weiny2/linux-kernel/tree/dcd-2024-03-24
Below we will use DCD kernel code as an example.
git clone -b dcd-2024-03-24 https://github.com/weiny2/linux-kernel.git
Step 2: configure kernel
After downloading the source code, we need to configure the kernel features to be picked up in the following compile step.
make menuconfig
or,
make nconfig
After the kernel is configured, a .config file is generated under the root directory of the kernel source tree.
To enable CXL-related support, the following options need to be enabled in the .config file:
fan@DT ~/c/r/k/linux-dcd (dcd-2024-03-24)> egrep "CXL|DAX|_ND_" .config
CONFIG_ARCH_WANT_OPTIMIZE_DAX_VMEMMAP=y
CONFIG_CXL_BUS=m
CONFIG_CXL_PCI=m
CONFIG_CXL_MEM_RAW_COMMANDS=y
CONFIG_CXL_ACPI=m
CONFIG_CXL_PMEM=m
CONFIG_CXL_MEM=m
CONFIG_CXL_PORT=m
CONFIG_CXL_SUSPEND=y
CONFIG_CXL_REGION=y
CONFIG_CXL_REGION_INVALIDATION_TEST=y
CONFIG_CXL_PMU=m
CONFIG_ND_CLAIM=y
CONFIG_ND_BTT=m
CONFIG_ND_PFN=m
CONFIG_NVDIMM_DAX=y
CONFIG_DAX=m
CONFIG_DEV_DAX=m
CONFIG_DEV_DAX_PMEM=m
CONFIG_DEV_DAX_HMEM=m
CONFIG_DEV_DAX_CXL=m
CONFIG_DEV_DAX_HMEM_DEVICES=y
CONFIG_DEV_DAX_KMEM=m
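A quick way to confirm that the essential CXL options actually landed in .config is to grep for each one. The sketch below writes a minimal sample .config purely for demonstration; in a real tree, point cfg at the .config in your kernel source root instead:

```shell
# Demonstration: verify required CXL options are set to y or m in a .config.
# The sample .config written here is illustrative; in practice set
# cfg=/path/to/linux/.config instead.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
CONFIG_CXL_BUS=m
CONFIG_CXL_PCI=m
CONFIG_CXL_ACPI=m
CONFIG_CXL_MEM=m
CONFIG_CXL_PORT=m
CONFIG_CXL_REGION=y
CONFIG_DEV_DAX_CXL=m
EOF
missing=0
for opt in CONFIG_CXL_BUS CONFIG_CXL_PCI CONFIG_CXL_ACPI CONFIG_CXL_MEM \
           CONFIG_CXL_PORT CONFIG_CXL_REGION CONFIG_DEV_DAX_CXL; do
    grep -q "^${opt}=[ym]" "$cfg" || { echo "missing: $opt"; missing=1; }
done
[ "$missing" -eq 0 ] && echo "all required CXL options present"
```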
Step 3: Compile Kernel
make -j 16
After a successful compile, a new vmlinux file is generated under the kernel root directory. A compressed kernel image is also available:
fan@DT ~/c/r/k/linux-dcd (dcd-2024-03-24)> ls arch/x86/boot/bzImage -lh
-rw-r--r-- 1 fan fan 13M Mar 25 09:30 arch/x86/boot/bzImage
Step 4: Install kernel modules
sudo make modules_install
To create a disk image to serve as the root file system of the guest VM, we can leverage the tools built alongside QEMU:
fan@DT ~/c/qemu (dcd-v6)> find . -name "qemu-img"
./build/qemu-bundle/usr/local/bin/qemu-img
./build/qemu-img
- Create a QEMU disk image with qemu-img (e.g., 16 GiB).
qemu-img create $IMG 16G
- Create a file system for the image.
sudo mkfs.ext4 $IMG
- Mount the file system to a directory.
mkdir $DIR
sudo mount -o loop $IMG $DIR
- Install the Debian distribution into the file system.
sudo debootstrap --arch amd64 stable $DIR
- Set up directory sharing between the host and the guest VM.
echo '#!/bin/bash
mount -t 9p -o trans=virtio homeshare /home/fan
mount -t 9p -o trans=virtio modshare /lib/modules
' > /tmp/rc.local
chmod a+x /tmp/rc.local
sudo cp /tmp/rc.local $DIR/etc/
sudo mkdir -p $DIR/home/fan
sudo mkdir -p $DIR/lib/modules/
- Set up networking for the guest VM.
Create a config.yaml file with the following content under $DIR/etc/netplan:
network:
  version: 2
  renderer: networkd
  ethernets:
    enp0s2:
      dhcp4: true
- Unmount the file system.
sudo umount $DIR
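One note on the image-creation step above: qemu-img with no -f flag produces a raw image, which is just a sparse file. So if qemu-img is not on PATH, truncate(1) gives an equivalent starting point. A sketch (the path is illustrative):

```shell
# Sketch (illustrative path): a raw qemu-img image is a sparse file, so
# truncate(1) produces an equivalent 16 GiB image when qemu-img is absent.
IMG=/tmp/test-root.img
truncate -s 16G "$IMG"
du -h --apparent-size "$IMG"   # apparent size: 16G
du -h "$IMG"                   # actual blocks allocated: near zero
```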
Example 1: boot the VM with a 512 MiB CXL persistent memory device, directly attached to the root port of a host bridge.
qemu-system-x86_64 -s \
    -kernel /home/fan/cxl/repos/kdevops/linux-dcd/arch/x86/boot/bzImage \
    -append "root=/dev/sda rw console=ttyS0,115200 ignore_loglevel nokaslr \
      cxl_acpi.dyndbg=+fplm cxl_pci.dyndbg=+fplm cxl_core.dyndbg=+fplm \
      cxl_mem.dyndbg=+fplm cxl_pmem.dyndbg=+fplm cxl_port.dyndbg=+fplm \
      cxl_region.dyndbg=+fplm cxl_test.dyndbg=+fplm cxl_mock.dyndbg=+fplm \
      cxl_mock_mem.dyndbg=+fplm dax.dyndbg=+fplm dax_cxl.dyndbg=+fplm \
      device_dax.dyndbg=+fplm" \
    -smp 1 -accel kvm -serial mon:stdio -nographic \
    -qmp tcp:localhost:4444,server,wait=off \
    -netdev user,id=network0,hostfwd=tcp::2024-:22 -device e1000,netdev=network0 \
    -monitor telnet:127.0.0.1:12345,server,nowait \
    -drive file=/home/fan/cxl/images/qemu-image.img,index=0,media=disk,format=raw \
    -machine q35,cxl=on -m 8G,maxmem=32G,slots=8 \
    -virtfs local,path=/lib/modules,mount_tag=modshare,security_model=mapped \
    -virtfs local,path=/home/fan,mount_tag=homeshare,security_model=mapped \
    -object memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/cxltest.raw,size=512M \
    -object memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/lsa.raw,size=512M \
    -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
    -device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \
    -device cxl-type3,bus=root_port13,memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem0 \
    -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G,cxl-fmw.0.interleave-granularity=8k
Example 2: boot the VM with a CXL DCD setup: the device is directly attached to the only root port of a host bridge and has two dynamic capacity regions, each 2 GiB in size.
qemu-system-x86_64 -s \
    -kernel /home/fan/cxl/repos/kdevops/linux-dcd/arch/x86/boot/bzImage \
    -append "root=/dev/sda rw console=ttyS0,115200 ignore_loglevel nokaslr \
      cxl_acpi.dyndbg=+fplm cxl_pci.dyndbg=+fplm cxl_core.dyndbg=+fplm \
      cxl_mem.dyndbg=+fplm cxl_pmem.dyndbg=+fplm cxl_port.dyndbg=+fplm \
      cxl_region.dyndbg=+fplm cxl_test.dyndbg=+fplm cxl_mock.dyndbg=+fplm \
      cxl_mock_mem.dyndbg=+fplm dax.dyndbg=+fplm dax_cxl.dyndbg=+fplm \
      device_dax.dyndbg=+fplm" \
    -smp 1 -accel kvm -serial mon:stdio -nographic \
    -qmp tcp:localhost:4444,server,wait=off \
    -netdev user,id=network0,hostfwd=tcp::2024-:22 -device e1000,netdev=network0 \
    -monitor telnet:127.0.0.1:12345,server,nowait \
    -drive file=/home/fan/cxl/images/qemu-image.img,index=0,media=disk,format=raw \
    -machine q35,cxl=on -m 8G,maxmem=32G,slots=8 \
    -virtfs local,path=/lib/modules,mount_tag=modshare,security_model=mapped \
    -virtfs local,path=/home/fan,mount_tag=homeshare,security_model=mapped \
    -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
    -device cxl-rp,port=13,bus=cxl.1,id=root_port13,chassis=0,slot=2 \
    -object memory-backend-file,id=dhmem0,share=on,mem-path=/tmp/dhmem0.raw,size=4G \
    -object memory-backend-file,id=lsa0,share=on,mem-path=/tmp/lsa0.raw,size=512M \
    -device cxl-type3,bus=root_port13,volatile-dc-memdev=dhmem0,num-dc-regions=2,id=cxl-memdev0 \
    -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G,cxl-fmw.0.interleave-granularity=8K
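For DCD testing, dynamic capacity extents are added to and released from the device at run time via QMP commands introduced by the DCD patch series (defined in qapi/cxl.json), sent over the QMP socket opened above at localhost:4444. Below is a sketch of an add request; the command name and every argument here follow my reading of the dcd-v6 series and may differ between revisions, so treat all fields as assumptions and check qapi/cxl.json in the tree you actually built:

```
{ "execute": "cxl-add-dynamic-capacity",
  "arguments": {
    "path": "/machine/peripheral/cxl-memdev0",
    "region-id": 0,
    "extents": [ { "offset": 0, "len": 134217728 } ]
  }
}
```

Once the device accepts the extents, the guest is notified through the CXL event log and the kernel DCD driver can surface the new capacity.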
After the guest VM has started, we can install the ndctl tools for managing the CXL device.
Note: the following steps happen inside the QEMU guest VM.
git clone https://github.com/pmem/ndctl.git
cd ndctl
meson setup build
meson compile -C build
meson install -C build
After a successful compile, three tools (ndctl, cxl, and daxctl) are generated under the build directory:
root@DT:~/ndctl# ls build/cxl/cxl -lh
-rwxr-xr-x 1 root root 318K Nov 30 22:04 build/cxl/cxl
root@DT:~/ndctl# ls build/daxctl/daxctl -lh
-rwxr-xr-x 1 root root 181K Nov 30 22:04 build/daxctl/daxctl
Load the CXL drivers and list the memory device:
modprobe -a cxl_acpi cxl_core cxl_pci cxl_port cxl_mem cxl_pmem
root@DT:~# cxl list -u
{
  "memdev":"mem0",
  "pmem_size":"512.00 MiB (536.87 MB)",
  "serial":"0",
  "host":"0000:0d:00.0"
}
Create a cxl region:
cxl create-region -m -d decoder0.0 -w 1 mem0 -s 512M
{
  "region":"region0",
  "resource":"0xa90000000",
  "size":"512.00 MiB (536.87 MB)",
  "interleave_ways":1,
  "interleave_granularity":256,
  "decode_state":"commit",
  "mappings":[
    {
      "position":0,
      "memdev":"mem0",
      "decoder":"decoder2.0"
    }
  ]
}
cxl region: cmd_create_region: created 1 region
Create a namespace for the region:
ndctl create-namespace -m dax -r region0
{
  "dev":"namespace0.0",
  "mode":"devdax",
  "map":"dev",
  "size":257949696,
  "uuid":"8fb092ba-a4ef-4a9a-8d83-d022c518ddf7",
  "daxregion":{
    "id":0,
    "size":257949696,
    "align":2097152,
    "devices":[
      {
        "chardev":"dax0.0",
        "size":257949696,
        "target_node":1,
        "align":2097152,
        "mode":"devdax"
      }
    ]
  },
  "align":2097152
}
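The namespace ends up smaller than the 512 MiB region because a devdax namespace with "map":"dev" reserves part of the capacity on the device for page-map metadata, and the result is rounded to the 2 MiB alignment. The reported size can be sanity-checked with shell arithmetic:

```shell
# Sanity-check the namespace size reported by ndctl above.
size=257949696                          # bytes, from the JSON output
align=$((2 * 1024 * 1024))              # 2 MiB alignment from the output
echo "$((size / 1024 / 1024)) MiB"      # 246 MiB of the 512 MiB region
echo "aligned: $((size % align == 0))"  # 1 means 2 MiB-aligned
```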
Converting a regular devdax mode device to system-ram mode with daxctl:
daxctl reconfigure-device --mode=system-ram --no-online dax0.0
reconfigured 1 device
[
  {
    "chardev":"dax0.0",
    "size":257949696,
    "target_node":1,
    "align":2097152,
    "mode":"system-ram",
    "online_memblocks":0,
    "total_memblocks":1
  }
]
Show system memory:
lsmem
RANGE                                  SIZE   STATE REMOVABLE BLOCK
0x0000000000000000-0x000000007fffffff    2G  online       yes  0-15
0x0000000100000000-0x000000027fffffff    6G  online       yes 32-79
0x0000000a98000000-0x0000000a9fffffff  128M offline             339

Memory block size:       128M
Total online memory:       8G
Total offline memory:    128M
We can see that a new 128M memory block has shown up; it is still offline because we passed --no-online above.
- Wikipedia: https://en.wikipedia.org/wiki/Compute_Express_Link
- Setting up QEMU emulation of CXL
- QEMU CXL Page: Compute Express Link (CXL)
- CXL mailing list: https://lore.kernel.org/linux-cxl/