nvmf setup experiments
Deploying nvmf between a pair of VMs
I am using node1 to set up a two-VM configuration. /dev/sdb is the new 4 TB flash device, mounted at /mnt/sdb/, inside which I keep the atr folder with the VM images. I used the stosys VM image: I made a base image and then boot from diff (overlay) images on top of it. The steps I followed are below.
Setup host bridge
Follow: https://futurewei-cloud.github.io/ARM-Datacenter/qemu/network-aarch64-qemu-guests/
On the host Linux machine:
sudo ip link add br0 type bridge
sudo ip addr add 192.168.0.1/24 dev br0
sudo ip link set br0 up
# I put the allow rule in the locally built QEMU's bridge.conf:
echo 'allow br0' | sudo tee -a /home/atr/src/qemu-6.1.0/etc/qemu/bridge.conf
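A quick sanity check that the bridge and the allow rule are in place (paths assumed from the commands above):
# the bridge should be UP and carry the 192.168.0.1/24 address
ip -br addr show br0
# the locally built QEMU's bridge ACL should now contain the allow rule
cat /home/atr/src/qemu-6.1.0/etc/qemu/bridge.conf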
Then one can add a virtio-net device to the VM:
-netdev bridge,id=hn1,br=br??? -device virtio-net,netdev=hn1,mac=e6:c8:ff:09:76:99
When starting the VM, you may get the following error (because QEMU is not installed in the global path, so its default bridge helper and config file are not found):
qemu-system-x86_64: bridge helper failed
Then you have to pass the helper binary path explicitly, something like this:
-netdev bridge,id=hn0,br=br1,helper=/home/atr/src/qemu-6.1.0/build/qemu-bridge-helper
See https://lists.gnu.org/archive/html/qemu-discuss/2021-05/msg00069.html
Other references
Create a base image and do a diff boot
From the base image, make two overlay (diff) images:
sudo qemu-img create -f qcow2 -F qcow2 -b ./RO-ubuntu-20.04-stosys-v5.12.qcow ./vm-initiator.qcow
sudo qemu-img create -f qcow2 -F qcow2 -b ./RO-ubuntu-20.04-stosys-v5.12.qcow ./vm-target.qcow
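To confirm that each overlay points at the intended backing file:
# prints the format, size, and backing file of the overlay image
sudo qemu-img info ./vm-target.qcow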
Then use these diff images:
Target boot:
sudo /home/atr/src/qemu-6.1.0/build/qemu-system-x86_64 -name qemuzns -m 32G --enable-kvm -cpu host -smp 4 \
  -hda /mnt/sdb/atr/vm-target.qcow \
  -net user,hostfwd=tcp::7777-:22 -net nic \
  -drive file=/mnt/sdb/atr/nvmessd-4G.img,id=nvme-device,format=raw,if=none \
  -device nvme,drive=nvme-device,serial=nvme-dev,physical_block_size=512,logical_block_size=512 \
  -drive file=/mnt/sdb/atr/nvmessd2-4G.img,id=nvme-device2,format=raw,if=none \
  -device nvme,drive=nvme-device2,serial=nvme-dev2,physical_block_size=512,logical_block_size=512 \
  -netdev bridge,id=hn0,br=br1,helper=/home/atr/src/qemu-6.1.0/build/qemu-bridge-helper \
  -device virtio-net-pci,netdev=hn0,id=nic1,mac=e6:c8:ff:09:76:99 \
  --daemonize
Initiator boot:
/home/atr/src/qemu-6.1.0/build/qemu-system-x86_64 -name qemuzns2 -m 32G --enable-kvm -cpu host -smp 4 \
  -hda /mnt/sdb/atr/vm-initiator.qcow \
  -net user,hostfwd=tcp::8888-:22 -net nic \
  -netdev bridge,id=hn0,br=br1,helper=/home/atr/src/qemu-6.1.0/build/qemu-bridge-helper \
  -device virtio-net-pci,netdev=hn0,id=nic1,mac=e6:c8:ff:09:76:9c \
  -daemonize
Some issues:
- Always specify a MAC address: the default MAC address QEMU generates is the same for every VM, and in that case the host bridge does not know how to forward packets correctly (see the sketch below for generating distinct MACs).
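A minimal sketch for generating distinct MAC addresses per VM (52:54:00 is the conventional QEMU/KVM locally-administered prefix; the specific addresses used in the boot commands above were picked by hand):
# print a random MAC address with the QEMU/KVM prefix
printf '52:54:00:%02x:%02x:%02x\n' $((RANDOM % 256)) $((RANDOM % 256)) $((RANDOM % 256))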
Setup hostname
https://www.cyberciti.biz/faq/ubuntu-20-04-lts-change-hostname-permanently/
sudo hostnamectl set-hostname newNameHere
# update all references to the old hostname
sudo vim /etc/hosts
sudo reboot
Assigning static IPs
NIC=ens6
sudo ip addr add 190.160.10.8/24 dev $NIC
sudo ip link set $NIC up
An alternative way (using ifconfig and route): https://bytefreaks.net/gnulinux/how-to-set-a-static-ip-address-from-the-command-line-in-gnulinux-using-ifconfig-and-route
sudo ifconfig eth0 192.168.1.2 netmask 255.255.255.0;
sudo route add default gw 192.168.1.1 eth0;
I have not looked into netplan: https://linuxconfig.org/how-to-configure-static-ip-address-on-ubuntu-18-10-cosmic-cuttlefish-linux
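After assigning addresses on both VMs, a quick connectivity check from the initiator (the target address here is the one used in the NVMe-oF section below):
# basic reachability test across the host bridge
ping -c 3 192.168.0.16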
Other issues:
- Terminator typing double characters: https://askubuntu.com/questions/1189243/why-is-terminator-sending-double-characters-to-terminals-that-arent-selected-in
- netperf git repository: https://github.com/HewlettPackard/netperf (useful for checking the VM-to-VM bandwidth; see the sketch below)
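A minimal netperf sketch for checking the VM-to-VM TCP bandwidth before layering NVMe-oF on top (the target address is the one used further below; adjust as needed):
# on the target VM: start the netperf server (listens on port 12865 by default)
netserver
# on the initiator VM: run a 10-second TCP stream test against the target
netperf -H 192.168.0.16 -t TCP_STREAM -l 10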
Setting up NVMe-over-Fabrics (NVMe-oF):
Following: https://futurewei-cloud.github.io/ARM-Datacenter/qemu/nvme-of-tcp-vms/
Target:
sudo modprobe nvmet
sudo modprobe nvmet-tcp
cd /sys/kernel/config/nvmet/subsystems
sudo mkdir nvme-test-target
cd nvme-test-target/
echo 1 | sudo tee -a attr_allow_any_host > /dev/null
sudo mkdir namespaces/1
cd namespaces/1
echo -n /dev/nvme0n1 |sudo tee -a device_path > /dev/null
echo 1|sudo tee -a enable > /dev/null
sudo mkdir /sys/kernel/config/nvmet/ports/1
cd /sys/kernel/config/nvmet/ports/1
echo 192.168.0.16 |sudo tee -a addr_traddr > /dev/null
echo tcp|sudo tee -a addr_trtype > /dev/null
echo 4420|sudo tee -a addr_trsvcid > /dev/null
echo ipv4|sudo tee -a addr_adrfam > /dev/null
sudo ln -s /sys/kernel/config/nvmet/subsystems/nvme-test-target/ /sys/kernel/config/nvmet/ports/1/subsystems/nvme-test-target
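A few sanity checks on the target side once the above is done (paths as configured above):
# the namespace should report enabled (1) and the port should carry the target address
cat /sys/kernel/config/nvmet/subsystems/nvme-test-target/namespaces/1/enable
cat /sys/kernel/config/nvmet/ports/1/addr_traddr
# the in-kernel nvmet-tcp listener should show up on port 4420
ss -ltn | grep 4420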
Initiator:
sudo modprobe nvme
sudo modprobe nvme-tcp
sudo nvme discover -t tcp -a 192.168.0.16 -s 4420 --hostnqn=nqn.2014-08.org.nvmexpress:uuid:1b4e28ba-2fa1-11d2-883f-0016d3ccabcd
sudo nvme connect -t tcp -n nvme-test-target -a 192.168.0.16 -s 4420 --hostnqn=nqn.2014-08.org.nvmexpress:uuid:1b4e28ba-2fa1-11d2-883f-0016d3ccabcd
sudo nvme list
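# Once connected, the remote namespace appears as a regular local block device.
# Read-only smoke test; /dev/nvme1n1 is an assumption, the exact name depends
# on how many local NVMe devices the initiator already has.
sudo dd if=/dev/nvme1n1 of=/dev/null bs=1M count=1024 iflag=direct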
sudo nvme disconnect /dev/nvme0n1 -n nvme-test-target
See https://github.com/animeshtrivedi/utilities/tree/master/qemu/ for script versions of these commands.
More references:
- NVMe over InfiniBand: https://www.linuxjournal.com/content/data-flash-part-ii-using-nvme-drives-and-creating-nvme-over-fabrics-network
- NVMe over TCP: https://www.linuxjournal.com/content/data-flash-part-iii-nvme-over-fabrics-using-tcp
- Another TCP example: https://blogs.oracle.com/linux/post/nvme-over-tcp
Unresolved issue on the 5.19 kernel with the initiator:
atr@node6:~$ sudo nvme connect -t tcp -a node3 -s 4420 --hostnqn=nqn.2014-08.org.nvmexpress:uuid:1b4e28ba-2fa1-11d2-883f-0016d3ccabcd -n nvme-atr-target-1gbps
Failed to write to /dev/nvme-fabrics: Invalid argument
no controller found: failed to write to nvme-fabrics device
atr@node6:~$ sudo tail -f /var/log/kern.log
Nov 8 13:19:27 node6 kernel: [2757932.499818] nvme2: Admin Cmd(0x6), I/O Error (sct 0x0 / sc 0x2) DNR
Nov 8 13:19:27 node6 kernel: [2757932.500916] nvme3: Admin Cmd(0x6), I/O Error (sct 0x0 / sc 0x2) DNR
Nov 8 13:19:27 node6 kernel: [2757932.518615] nvme nvme6: Invalid MNAN value 1024
Filebench issues
Filebench quick reading: https://www.usenix.org/system/files/login/articles/login_spring16_02_tarasov.pdf
Compiling filebench: the build initially failed with
gcc: error: proto3-lexer.c: No such file or directory
gcc: fatal error: no input files
# Fix: install the missing parser/lexer generators
sudo apt install bison
sudo apt install flex
# reconfigure project
autoreconf -i
./configure
make
Filebench issue
11.060: Unexpected Process termination Code 3, Errno 0 around line 10
12.060: Run took 1 seconds...
12.060: NO VALID RESULTS! Filebench run terminated prematurely around line 10
The fix was to disable address space randomization (https://github.com/filebench/filebench/issues/112):
root# echo 0 > /proc/sys/kernel/randomize_va_space
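The same setting via sysctl (adding kernel.randomize_va_space = 0 to /etc/sysctl.conf makes it persist across reboots):
# disable address-space layout randomization system-wide
sudo sysctl -w kernel.randomize_va_space=0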
Another error, "could not obtain a file": https://github.com/filebench/filebench/issues/127
Other filebench references:
- All ops: https://github.com/filebench/filebench/wiki/Workload-model-language
- Workload personalities: https://github.com/filebench/filebench/wiki/Predefined-personalities
- Workload examples codes: https://github.com/filebench/filebench/tree/master/workloads
- Large files and shared memory issue: https://stackoverflow.com/questions/22688406/filebench-error-out-of-shared-memory-when-i-try-to-set-nfiles-a-very-big-numbe and https://github.com/filebench/filebench/issues/90
- I am not completely sure about the dirwidth issue: https://github.com/filebench/filebench/issues/98
I could not run filebench reliably, so I wrote a small program for myself to understand the basic behavior (a rough stand-in sketch is below).
- fsync man page, https://man7.org/linux/man-pages/man2/fsync.2.html
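I have not included that program here; as a rough stand-in (not the program mentioned above), dd can exercise the same write-then-fsync path. The file path and size below are made up:
# write 1 GiB to a test file (hypothetical path) and fsync it before dd exits; time the whole run
time dd if=/dev/zero of=/mnt/test/bigfile bs=1M count=1024 conv=fsync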
Misc
- Linux net namespace, https://linuxhint.com/use-linux-network-namespace/