zns - animeshtrivedi/notes GitHub Wiki
- They say that 5.4 should have all the zoned-block support; NVMe ZNS support only comes with 5.9: http://zonedstorage.io/linux/overview/#linux-zoned-storage-support-overview
The kernel build is badly broken; tried multiple things, now doing a deb package build to see if that succeeds.
Unlike SMR disks, NVMe ZNS does not mix conventional and sequential-write-required zones in the same namespace: http://zonedstorage.io/introduction/zns/ . A single drive or controller can have multiple namespaces, one with conventional zones and another with ZNS zones.
-
ZAC/ZBC commands: zone size is the total number of logical blocks in a zone
-
NVMe ZNS adds: zone capacity, the number of usable logical blocks within each zone, with capacity <= size. Why this distinction? GC issues? Other internal provisioning?
- This new attribute was introduced to allow the zone size to remain a power-of-two number of logical blocks (facilitating easy logical-block-to-zone-number conversions) while allowing an optimized mapping of a zone's storage capacity to the underlying media characteristics. For instance, in the case of a flash-based device, the zone capacity can be aligned to the size of the flash erase blocks without requiring that the device implement power-of-two-sized erase blocks.
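The power-of-two point can be sanity-checked with a bit of shell arithmetic; the geometry here (16384 blocks per zone, i.e., 64 MiB zones of 4096 B blocks) is just an example:

```shell
# 16384 logical blocks per zone is 2^14, so LBA -> zone number is a
# single shift and LBA -> in-zone offset is a single mask.
blocks_per_zone=16384
lba=50000
zone=$(( lba >> 14 ))                       # which zone the LBA falls in
offset=$(( lba & (blocks_per_zone - 1) ))   # offset within that zone
echo "zone=$zone offset=$offset"            # zone=3 offset=848
```

With a non-power-of-two zone size both would need a full division/modulo instead.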
-
Hence, the usable size of an NVMe ZNS namespace is the sum of the individual zone capacities, not the sum of the zone sizes.
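A toy calculation of the difference (the zone count, size, and capacity here are made up):

```shell
# 100 zones, each 64 MiB in size but only 48 MiB in capacity:
nzones=100
zone_size_mib=64   # LBA range each zone spans
zone_cap_mib=48    # usable blocks per zone
echo "LBA range: $(( nzones * zone_size_mib )) MiB"   # 6400 MiB
echo "usable:    $(( nzones * zone_cap_mib )) MiB"    # 4800 MiB
```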
-
Limits:
- General zoned-storage limit: the number of zones that can simultaneously be in the implicit open or explicit open conditions (max open zones)
- Additional NVMe ZNS limit: active zones, the number of zones that can be in the implicit open, explicit open, or closed conditions, i.e., a limit on the maximum number of "Active" zones.
- the maximum number of active zones imposes a limit on the number of zones that an application can concurrently choose for storing data. How do you deal with it? Note that it is not the maximum amount of data storage possible: a finished (full) zone no longer counts as active.
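If I read the spec right, these limits are reported per namespace in the ZNS Identify Namespace data (MOR = Maximum Open Resources, MAR = Maximum Active Resources), so something like this should show them (the device name is an example; check the flag spellings with nvme zns help id-ns):

```shell
# -H decodes the fields into human-readable form; look for the
# "mar" (max active resources) and "mor" (max open resources) lines.
sudo nvme zns id-ns /dev/nvme0n1 -H | grep -iE 'mar|mor'
```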
-
ZNS NVMe: Zone Append -- a special "nameless write" command that lets the device accept multiple outstanding requests per zone, with the device telling the host the written location instead of the host telling the device where to write.
- From the command responses of zone append, the host can reconstruct the write order. If the host needs to force a certain order, it must issue the commands one by one at the right zone write pointer offset, otherwise they fail.
- So in the normal case, tracking the write pointer offset is the responsibility of the host? I think it can also be queried from the device (zone report).
-
ZONE APPEND is a cool feature. The point being that the device is doing the block allocation for you, freeing the CPU. This shows in the performance gain, see https://www.youtube.com/watch?v=9yVWb3rbces. Build a tx log, or other applications? Build a single-machine Tango design.
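For reference, nvme-cli exposes the command directly; a sketch (the device, zone start LBA, and flag spellings are from my memory of nvme-cli 1.14, so double-check with nvme zns help zone-append):

```shell
# Append 4 KiB from data.bin to the zone starting at LBA 0x20000; the
# device picks the exact write location and returns the assigned LBA
# in the command completion.
sudo nvme zns zone-append /dev/nvme0n1 --zslba=0x20000 --data-size=4096 --data=data.bin
```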
-
Write Ordering Control -- so now the new problem with zoned writes on SCSI devices: if the kernel reorders the writes, then the write pointer offsets will be wrong, and hence the writes fail. To avoid this there is a serialization point in the kernel, which is currently only enabled with the mq-deadline scheduler, not none/noop, http://zonedstorage.io/getting-started/prerequisite/
- How does this work with NVMe? With deep queues?
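Checking and forcing the scheduler for a zoned device is a one-liner (nullb0 as the example device, per the prerequisite page above):

```shell
cat /sys/block/nullb0/queue/scheduler                          # active one is in [brackets]
echo mq-deadline | sudo tee /sys/block/nullb0/queue/scheduler  # force mq-deadline
```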
- util-linux is missing the blkzone command on Ubuntu 18
-
What is libzbc?
--> libzbc is a user-space library providing functions for manipulating ZBC and ZAC disks.
git clone https://github.com/karelzak/util-linux.git
# install autopoint and then usual configure --prefix=/home/atr/local; make; make install
# make install fails due to
# chgrp tty /home/atr/local/bin/wall # atr not being part of the tty group
# add an existing user to an existing group, https://askubuntu.com/questions/79565/how-to-add-existing-user-to-an-existing-group
usermod -a -G tty atr
# only comes into effect after logging out :(
# for now I have added the build path in the $PATH, works
# I installed this for /home/atr/vu/github/util-linux//blkzone command (and other new utilities?)
# modprobe null_blk nr_devices=1 zoned=1 # works
# modinfo
...
parm: no_sched:No io scheduler (int)
parm: submit_queues:Number of submission queues (int)
parm: home_node:Home node for the device (int)
parm: queue_mode:Block interface to use (0=bio,1=rq,2=multiqueue)
parm: gb:Size in GB (int)
parm: bs:Block size (in bytes) (int)
parm: nr_devices:Number of devices to register (uint)
parm: blocking:Register as a blocking blk-mq driver device (bool)
parm: shared_tags:Share tag set between devices for blk-mq (bool)
parm: irqmode:IRQ completion handler. 0-none, 1-softirq, 2-timer
parm: completion_nsec:Time in ns to complete a request in hardware. Default: 10,000ns (ulong)
parm: hw_queue_depth:Queue depth for each hardware queue. Default: 64 (int)
parm: use_per_node_hctx:Use per-node allocation for hardware context queues. Default: false (bool)
parm: zoned:Make device as a host-managed zoned block device. Default: false (bool)
parm: zone_size:Zone size in MB when block device is zoned. Must be power-of-two: Default: 256 (ulong)
parm: zone_nr_conv:Number of conventional zones when block device is zoned. Default: 0 (uint)
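Putting the module parameters above together, a zoned null_blk device can also be created directly at module load time (the values here are just an example):

```shell
# 4 GB memory-backed device, 4096 B blocks, 64 MB zones, 4 conventional zones
sudo modprobe null_blk nr_devices=1 gb=4 bs=4096 zoned=1 zone_size=64 zone_nr_conv=4
sudo blkzone report /dev/nullb0 | head
sudo rmmod null_blk    # tear it down again
```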
Features
atr@atr-XPS-13:~$ cat /sys/kernel/config/nullb/features
memory_backed,discard,bandwidth,cache,badblocks,zoned,zone_size
atr@atr-XPS-13:~$
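That features line means devices can also be built through configfs, which is roughly what the create-nullblk-zone.sh script below does (attribute names as in the zonedstorage.io script; treat this as a sketch, paths may differ per kernel version):

```shell
sudo mkdir /sys/kernel/config/nullb/nullb1
cd /sys/kernel/config/nullb/nullb1
echo 4096 | sudo tee blocksize      # sector size in bytes
echo 1024 | sudo tee size           # device size in MB
echo 1    | sudo tee memory_backed
echo 1    | sudo tee zoned
echo 64   | sudo tee zone_size      # zone size in MB
echo 4    | sudo tee zone_nr_conv   # number of conventional zones
echo 1    | sudo tee power          # instantiates /dev/nullb1
```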
# location of the script, http://zonedstorage.io/getting-started/nullblk/
atr@atr-XPS-13:~/local/bin$ ./create-nullblk-zone.sh 4096 64 4 8
atr@atr-XPS-13:~$ sudo /home/atr/vu/github/util-linux//blkzone report /dev/nullb1
start: 0x000000000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 0(nw) [type: 1(CONVENTIONAL)]
start: 0x000020000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 0(nw) [type: 1(CONVENTIONAL)]
start: 0x000040000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 0(nw) [type: 1(CONVENTIONAL)]
start: 0x000060000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 0(nw) [type: 1(CONVENTIONAL)]
start: 0x000080000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x0000a0000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x0000c0000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x0000e0000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x000100000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x000120000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x000140000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x000160000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
# example of configuration space,
atr@atr-XPS-13:~$ cat /sys/block/nullb0/queue/zoned
host-managed
atr@atr-XPS-13:~$
installed nvme-cli tools; the old ones do not have support for ZNS
/home/atr/vu/github/storage/nvme-cli
atr@atr-XPS-13:~/vu/github/storage/nvme-cli$ nvme zns
nvme-1.14
usage: nvme zns <command> [<device>] [<args>]
The '<device>' may be either an NVMe character device (ex: /dev/nvme0) or an
nvme block device (ex: /dev/nvme0n1).
Zoned Namespace Command Set
The following are all implemented sub-commands:
id-ctrl Retrieve ZNS controller identification
id-ns Retrieve ZNS namespace identification
zone-mgmt-recv Sends the zone management receive command
zone-mgmt-send Sends the zone management send command
report-zones Retrieve the Report Zones report
close-zone Closes one or more zones
finish-zone Finishes one or more zones
open-zone Opens one or more zones
reset-zone Resets one or more zones
offline-zone Offlines one or more zones
set-zone-desc Attaches zone descriptor extension data
zone-append Writes data and metadata (if applicable), appended to the end of the requested zone
changed-zone-list Retrieves the changed zone list log
version Shows the program version
help Display this help
See 'nvme zns help <command>' for more information on a specific command
atr@atr-XPS-13:~/vu/github/storage/nvme-cli$
With the newly compiled nvme commands:
The nullb0 device is not recognized as an NVMe device, hence no interaction with the nvme zns command.
Links
- RocksDB uses the zenfs plugin, which internally uses libzbd.
- https://github.com/westerndigitalcorporation/libzbd
- https://github.com/westerndigitalcorporation/zenfs Follow the instructions and it mostly worked.
sudo apt install autoconf
sudo apt-get install libgflags-dev
sudo apt-get install libtool
sudo apt install autoconf-archive
The autoconf-archive dependency is confusing... without it I got errors like:
atr@stosys-qemu-vm:/home/atr/src/libzbd$ ./configure
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /usr/bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking whether make supports nested variables... (cached) yes
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking whether gcc understands -c and -o together... yes
checking whether make supports the include directive... yes (GNU style)
checking dependency style of gcc... gcc3
checking how to run the C preprocessor... gcc -E
checking for grep that handles long lines and -e... /usr/bin/grep
checking for egrep... /usr/bin/grep -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking minix/config.h usability... no
checking minix/config.h presence... no
checking for minix/config.h... no
checking whether it is safe to define __EXTENSIONS__... yes
checking for special C compiler options needed for large files... no
checking for _FILE_OFFSET_BITS value needed for large files... no
checking for ar... ar
checking the archiver (ar) interface... ar
checking build system type... x86_64-pc-linux-gnu
checking host system type... x86_64-pc-linux-gnu
checking how to print strings... printf
checking for a sed that does not truncate output... /usr/bin/sed
checking for fgrep... /usr/bin/grep -F
checking for ld used by gcc... /usr/bin/ld
checking if the linker (/usr/bin/ld) is GNU ld... yes
checking for BSD- or MS-compatible name lister (nm)... /usr/bin/nm -B
checking the name lister (/usr/bin/nm -B) interface... BSD nm
checking whether ln -s works... yes
checking the maximum length of command line arguments... 1572864
checking how to convert x86_64-pc-linux-gnu file names to x86_64-pc-linux-gnu format... func_convert_file_noop
checking how to convert x86_64-pc-linux-gnu file names to toolchain format... func_convert_file_noop
checking for /usr/bin/ld option to reload object files... -r
checking for objdump... objdump
checking how to recognize dependent libraries... pass_all
checking for dlltool... no
checking how to associate runtime and link libraries... printf %s\n
checking for archiver @FILE support... @
checking for strip... strip
checking for ranlib... ranlib
checking command to parse /usr/bin/nm -B output from gcc object... ok
checking for sysroot... no
checking for a working dd... /usr/bin/dd
checking how to truncate binary pipes... /usr/bin/dd bs=4096 count=1
checking for mt... mt
checking if mt is a manifest tool... no
checking for dlfcn.h... yes
checking for objdir... .libs
checking if gcc supports -fno-rtti -fno-exceptions... no
checking for gcc option to produce PIC... -fPIC -DPIC
checking if gcc PIC flag -fPIC -DPIC works... yes
checking if gcc static flag -static works... yes
checking if gcc supports -c -o file.o... yes
checking if gcc supports -c -o file.o... (cached) yes
checking whether the gcc linker (/usr/bin/ld -m elf_x86_64) supports shared libraries... yes
checking whether -lc should be explicitly linked in... no
checking dynamic linker characteristics... GNU/Linux ld.so
checking how to hardcode library paths into programs... immediate
checking whether stripping libraries is possible... yes
checking if libtool supports shared libraries... yes
checking whether to build shared libraries... yes
checking whether to build static libraries... yes
./configure: line 9226: AX_PTHREAD: command not found
checking for rpmbuild... notfound
checking for rpm... notfound
checking for libgen.h... no
configure: error: Couldn't find libgen.h
atr@stosys-qemu-vm:/home/atr/src/libzbd$ locate libgen.h
/usr/include/libgen.h
Hint: https://github.com/ANSSI-FR/libapn/issues/1
# Update the Makefile to locate the path of the library
atr@atr-XPS-13:~/vu/github/storage/rocksdb$ git diff
diff --git a/Makefile b/Makefile
index 8571facaa..67dda5456 100644
--- a/Makefile
+++ b/Makefile
@@ -252,8 +252,8 @@ LIB_SOURCES += utilities/env_librados.cc
LDFLAGS += -lrados
endif
-AM_LINK = $(AM_V_CCLD)$(CXX) -L. $(patsubst lib%.a, -l%, $(patsubst lib%.$(PLATFORM_SHARED_EXT), -l%, $^)) $(EXEC_LDFLAGS) -o $@ $(LDFLAGS) $(COVERAGEFLAGS)
-AM_SHARE = $(AM_V_CCLD) $(CXX) $(PLATFORM_SHARED_LDFLAGS)$@ -L. $(patsubst lib%.$(PLATFORM_SHARED_EXT), -l%, $^) $(LDFLAGS) -o $@
+AM_LINK = $(AM_V_CCLD)$(CXX) -L/home/atr/local/lib -L/home/atr/local/usr/local/lib/ -L. $(patsubst lib%.a, -l%, $(patsubst lib%.$(PLATFORM_SHARED_EXT), -l%, $^)) $(EXEC_LDFLAGS) -o $@ $(LDFLAGS) $(COVERAGEFLAGS)
+AM_SHARE = $(AM_V_CCLD) $(CXX) $(PLATFORM_SHARED_LDFLAGS)$@ -L/home/atr/local/lib -L/home/atr/local/usr/local/lib/ -L. $(patsubst lib%.$(PLATFORM_SHARED_EXT), -l%, $^) $(LDFLAGS) -o $@
# Detect what platform we're building on.
# Export some common variables that might have been passed as Make variables
@@ -1455,7 +1455,7 @@ librocksdb_env_basic_test.a: $(OBJ_DIR)/env/env_basic_test.o $(LIB_OBJECTS) $(TE
$(AM_V_at)$(AR) $(ARFLAGS) $@ $^
db_bench: $(OBJ_DIR)/tools/db_bench.o $(BENCH_OBJECTS) $(TESTUTIL) $(LIBRARY)
- $(AM_LINK)
+ $(AM_LINK) -Wl,-rpath,/home/atr/local/lib/ -Wl,-rpath,/home/atr/local/usr/local/lib/
trace_analyzer: $(OBJ_DIR)/tools/trace_analyzer.o $(ANALYZE_OBJECTS) $(TOOLS_LIBRARY) $(LIBRARY)
$(AM_LINK)
atr@atr-XPS-13:~/vu/github/storage/rocksdb$
Building and running
$ DEBUG_LEVEL=0 ROCKSDB_PLUGINS=zenfs DESTDIR=/home/atr/local/ make -j db_bench install
$ sudo ./db_bench --fs_uri=zenfs://dev:nullb0 --benchmarks=fillrandom --use_direct_io_for_flush_and_compaction --compression_type=none
Some more issues: if I do not do an install, then the compilation fails like this:
atr@stosys-qemu-vm:/home/atr/src/rocksdb/plugin/zenfs/util$ make
Package rocksdb was not found in the pkg-config search path.
Perhaps you should add the directory containing `rocksdb.pc'
to the PKG_CONFIG_PATH environment variable
No package 'rocksdb' found
Package rocksdb was not found in the pkg-config search path.
Perhaps you should add the directory containing `rocksdb.pc'
to the PKG_CONFIG_PATH environment variable
No package 'rocksdb' found
g++ -o zenfs zenfs.cc
zenfs.cc:15:10: fatal error: rocksdb/file_system.h: No such file or directory
15 | #include <rocksdb/file_system.h>
| ^~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
make: *** [Makefile:12: zenfs] Error 1
atr@stosys-qemu-vm:/home/atr/src/rocksdb/plugin/zenfs/util$ cd -
What did not work
so much pain, so much pain
- the gcc include option is -I (modified it in the Makefile, but somehow it did not get picked up properly)
- export CPATH=:/home/atr/local/include/:/home/atr/local/usr/local/include/: (put this in the bashrc)
https://github.com/facebook/rocksdb/blob/master/INSTALL.md
$ DEBUG_LEVEL=0 ROCKSDB_PLUGINS=zenfs make -j db_bench # (I skipped the install step, but included db_bench)
Part of the bash/sudo environment can be preserved with sudo env "PATH=$PATH" "LD_LIBRARY_PATH=$LD_LIBRARY_PATH" zbd (note: env takes each VAR=value as a separate argument, not a ;-separated list)
atr@node3:/home/atr/src/storage/rocksdb$ ldd ./plugin/zenfs/util/zenfs
linux-vdso.so.1 (0x00007fff8d54c000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fc3b17b4000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fc3b17ae000)
libgflags.so.2.2 => /lib/x86_64-linux-gnu/libgflags.so.2.2 (0x00007fc3b1783000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fc3b1767000)
libzbd-1.3.0.so => /home/atr/local/lib/libzbd-1.3.0.so (0x00007fc3b175e000)
libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fc3b157b000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fc3b142c000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fc3b1411000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fc3b121f000)
/lib64/ld-linux-x86-64.so.2 (0x00007fc3b1ca1000)
atr@node3:/home/atr/src/storage/rocksdb$ sudo ldd ./plugin/zenfs/util/zenfs
linux-vdso.so.1 (0x00007ffd17f9b000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fb2f8f13000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fb2f8f0d000)
libgflags.so.2.2 => /lib/x86_64-linux-gnu/libgflags.so.2.2 (0x00007fb2f8ee2000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fb2f8ec6000)
libzbd-1.3.0.so => not found
libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fb2f8ce3000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fb2f8b94000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fb2f8b79000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fb2f8987000)
/lib64/ld-linux-x86-64.so.2 (0x00007fb2f9400000)
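sudo drops LD_LIBRARY_PATH, which is why root cannot find libzbd above. Instead of baking rpaths into every binary (as I ended up doing below), the private library directory can be registered with the dynamic loader; a sketch using my install prefix:

```shell
echo /home/atr/local/lib | sudo tee /etc/ld.so.conf.d/atr-local.conf
sudo ldconfig                                     # refresh the loader cache
sudo ldd ./plugin/zenfs/util/zenfs | grep libzbd  # should now resolve
```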
I had to modify the Makefile in zenfs/util/Makefile
to explicitly put in the path with the linker flags
-Wl,-rpath,/home/atr/local/lib/ -Wl,-rpath,/home/atr/local/usr/local/lib/
atr@atr-XPS-13:~/vu/github/storage/rocksdb/plugin/zenfs$ git diff
diff --git a/util/Makefile b/util/Makefile
index 3bd0ea1..e3dc5b4 100644
--- a/util/Makefile
+++ b/util/Makefile
@@ -9,7 +9,7 @@ LIBS = $(shell pkg-config --static --libs rocksdb)
all: $(TARGET)
$(TARGET): $(TARGET).cc
- $(CC) $(CPPFLAGS) -o $(TARGET) $< $(LIBS)
+ $(CC) $(CPPFLAGS) -L/home/atr/local/lib/ -L/home/atr/local/usr/local/lib/ -Wl,-rpath=/home/atr/local/lib -o $(TARGET) $< $(LIBS)
clean:
$(RM) $(TARGET)
atr@atr-XPS-13:~/vu/github/storage/rocksdb/plugin/zenfs$
Again there was a bit of a mess about where to put these flags in the Makefile; the AM_LINK flags from the RocksDB Makefile did not pick them up.
The pkg-config build/compile dependencies are picked up from here (though, unsuccessfully, by the zenfs build):
export PKG_CONFIG_PATH=/home/atr/local/usr/local/lib/pkgconfig/:$PKG_CONFIG_PATH
./plugin/zenfs/util/zenfs mkfs --zbd=/dev/<zoned block device> --aux-path=<path to store LOG and LOCK files>
What name should you pass here? There is a typo in the instructions: it should be just --zbd=<zoned block device name>
(without the /dev/ prefix)
Also there is a size issue,
atr@stosys-qemu-vm:/home/atr/src/rocksdb$ ~/src/zns-resources/scripts/create-nullblk-zone.sh
Usage: /home/atr//src/zns-resources/scripts/create-nullblk-zone.sh <sect size (B)> <zone size (MB)> <nr conv zones> <nr seq zones>
atr@stosys-qemu-vm:/home/atr/src/rocksdb$ sudo ~/src/zns-resources/scripts/create-nullblk-zone.sh 4096 1 0 8
Created /dev/nullb0
atr@stosys-qemu-vm:/home/atr/src/rocksdb$ sudo ./plugin/zenfs/util/zenfs mkfs --zbd=nullb0 --aux-path=/home/atr/rocksdb-aux-path/
Error: aux path exists
atr@stosys-qemu-vm:/home/atr/src/rocksdb$ sudo ./plugin/zenfs/util/zenfs mkfs --zbd=nullb0 --aux-path=/home/atr/rocksdb-aux-path-nullb0/
Failed to open zoned block device: nullb0, error: Not implemented: To few zones on zoned block device (32 required)
atr@stosys-qemu-vm:/home/atr/src/rocksdb$ sudo ~/src/zns-resources/scripts/destroy-nullblk-zone.sh
Usage: /home/atr//src/zns-resources/scripts/destroy-nullblk-zone.sh <nullb ID>
atr@stosys-qemu-vm:/home/atr/src/rocksdb$ sudo ~/src/zns-resources/scripts/destroy-nullblk-zone.sh nullb0
/dev/nullbnullb0: No such device
atr@stosys-qemu-vm:/home/atr/src/rocksdb$ sudo ~/src/zns-resources/scripts/destroy-nullblk-zone.sh 0
Destroyed /dev/nullb0
atr@stosys-qemu-vm:/home/atr/src/rocksdb$ sudo ~/src/zns-resources/scripts/create-nullblk-zone.sh 4096 1 0 32
Created /dev/nullb0
atr@stosys-qemu-vm:/home/atr/src/rocksdb$ sudo ./plugin/zenfs/util/zenfs mkfs --zbd=nullb0 --aux-path=/home/atr/rocksdb-aux-path-nullb0/
INFO: For ZBD nullb0, device scheduler is set to mq-deadline.
ZenFS file system created. Free space: 29 MB
atr@node3:/home/atr/src/storage/rocksdb$ sudo ./db_bench --fs_uri=zenfs://dev:nullb0 --benchmarks=fillrandom --use_direct_io_for_flush_and_compaction
I am here withthe path nullb0
HERE zdb_open with /dev/nullb0 ret code 3
Error: Not implemented: IO error: No such file or directory: While mkdir if missing: ~/tmp/: No such file or directory: zenfs://dev:nullb0
# At this point, I recreated the zenfs setup
atr@node3:/home/atr/src/storage/rocksdb$ sudo ./plugin/zenfs/util/zenfs mkfs --zbd=nullb0 --aux-path=/home/atr/tmp/zns/ --force
INFO: For ZBD nullb0, device scheduler is set to mq-deadline.
ZenFS file system created. Free space: 1856 MB
atr@node3:/home/atr/src/storage/rocksdb$ sudo ./db_bench --fs_uri=zenfs://dev:nullb0 --benchmarks=fillrandom --use_direct_io_for_flush_and_compaction
Initializing RocksDB Options from the specified file
Initializing RocksDB Options from command-line flags
open error: Invalid argument: Compression type Snappy is not linked with the binary.
https://github.com/facebook/rocksdb/issues/761
./db_bench --fs_uri=zenfs://dev:nullb0 --benchmarks=fillrandom --use_direct_io_for_flush_and_compaction --compression_type=none
atr@node3:/home/atr/src/storage/rocksdb$ sudo ./db_bench --fs_uri=zenfs://dev:nullb0 --benchmarks=fillrandom --use_direct_io_for_flush_and_compaction --compression_type=none
I am here withthe path nullb0
HERE zdb_open with /dev/nullb0 ret code 3
Initializing RocksDB Options from the specified file
Initializing RocksDB Options from command-line flags
RocksDB: version 6.19
Date: Sat May 1 09:56:17 2021
CPU: 40 * Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz
CPUCache: 14080 KB
Keys: 16 bytes each (+ 0 bytes user-defined timestamp)
Values: 100 bytes each (50 bytes after compression)
Entries: 1000000
Prefix: 0 bytes
Keys per prefix: 0
RawSize: 110.6 MB (estimated)
FileSize: 62.9 MB (estimated)
Write rate: 0 bytes/second
Read rate: 0 ops/second
Compression: NoCompression
Compression sampling rate: 0
Memtablerep: skip_list
Perf Level: 1
------------------------------------------------
Initializing RocksDB Options from the specified file
Initializing RocksDB Options from command-line flags
DB path: [rocksdbtest/dbbench]
fillrandom : 3.258 micros/op 306884 ops/sec; 33.9 MB/s
atr@node3:/home/atr/src/storage/rocksdb$
Doing an ls -l on zenfs -- the listing is kind of primitive and broken:
atr@node3:/home/atr/src/storage/rocksdb/plugin/zenfs$ sudo ./util/zenfs list --zbd nullb0 --path rocksdbtest/dbbench/
HDHDHDHD
I am here withthe path nullb0
HERE zdb_open with /dev/nullb0 ret code 3
0 LOCK
31418 LOG
Failed to get size of file 000009.sst
$ # you can see more details in the zenfs log file about which files were written on zenfs
$ less /tmp/zenfs_nullb0_2021-05-04_09\:28\:20.log
atr@node3:/home/atr/src/storage/rocksdb/plugin/zenfs$ cat /tmp/zenfs_nullb0_2021-05-04_09\:28\:20.log | grep "New writable file"
2021/05/04-09:28:20.262233 7f170499aac0 [DEBUG] New writable file: rocksdbtest/dbbench/000000.dbtmp direct: 0
2021/05/04-09:28:20.262809 7f170499aac0 [DEBUG] New writable file: rocksdbtest/dbbench/MANIFEST-000001 direct: 0
2021/05/04-09:28:20.263134 7f170499aac0 [DEBUG] New writable file: rocksdbtest/dbbench/000001.dbtmp direct: 0
2021/05/04-09:28:20.266392 7f170499aac0 [DEBUG] New writable file: rocksdbtest/dbbench/MANIFEST-000004 direct: 0
2021/05/04-09:28:20.266729 7f170499aac0 [DEBUG] New writable file: rocksdbtest/dbbench/000004.dbtmp direct: 0
2021/05/04-09:28:20.267530 7f170499aac0 [DEBUG] New writable file: rocksdbtest/dbbench/000005.log direct: 0
2021/05/04-09:28:20.268298 7f170499aac0 [DEBUG] New writable file: rocksdbtest/dbbench/OPTIONS-000006.dbtmp direct: 0
2021/05/04-09:28:20.301842 7f170499aac0 [DEBUG] New writable file: rocksdbtest/dbbench/000000.dbtmp direct: 0
2021/05/04-09:28:20.302332 7f170499aac0 [DEBUG] New writable file: rocksdbtest/dbbench/MANIFEST-000001 direct: 0
2021/05/04-09:28:20.302628 7f170499aac0 [DEBUG] New writable file: rocksdbtest/dbbench/000001.dbtmp direct: 0
2021/05/04-09:28:20.305607 7f170499aac0 [DEBUG] New writable file: rocksdbtest/dbbench/MANIFEST-000004 direct: 0
2021/05/04-09:28:20.305901 7f170499aac0 [DEBUG] New writable file: rocksdbtest/dbbench/000004.dbtmp direct: 0
2021/05/04-09:28:20.306770 7f170499aac0 [DEBUG] New writable file: rocksdbtest/dbbench/000005.log direct: 0
2021/05/04-09:28:20.307551 7f170499aac0 [DEBUG] New writable file: rocksdbtest/dbbench/OPTIONS-000006.dbtmp direct: 0
2021/05/04-09:28:29.584170 7f16fa993700 [DEBUG] New writable file: rocksdbtest/dbbench/000008.log direct: 0
2021/05/04-09:28:29.584808 7f16fc196700 [DEBUG] New writable file: rocksdbtest/dbbench/000009.sst direct: 1
2021/05/04-09:28:38.192693 7f16fa993700 [DEBUG] New writable file: rocksdbtest/dbbench/000010.log direct: 0
2021/05/04-09:28:38.193608 7f16fc196700 [DEBUG] New writable file: rocksdbtest/dbbench/000011.sst direct: 1
atr@node3:/home/atr/src/storage/rocksdb/plugin/zenfs$
DB log
atr@node3:/home/atr/src/storage/rocksdb$ sudo ./db_bench --fs_uri=zenfs://dev:nullb0 --benchmarks=fillrandom --use_direct_io_for_flush_and_compaction --compression_type=none
[atr] Allocating a new NewZenFS from uri = zenfs://dev:nullb0 and dev = nullb0
[atr] readonly = 0 read_f_ 4 read_direct_f (why a second pointer) 5 and write_f_ 6 , info.mode is ZBD_DM_HOST_MANAGED
[atr] total number of zones reported are 40
[all] m = 0 and i = 0 , addr 00000000000000 and wp 00000067108864 capacity is 00000067108864 len is 00000067108864
[all] m = 0 and i = 1 , addr 00000067108864 and wp 00000134217728 capacity is 00000067108864 len is 00000067108864
[all] m = 0 and i = 2 , addr 00000134217728 and wp 00000201326592 capacity is 00000067108864 len is 00000067108864
[all] m = 0 and i = 3 , addr 00000201326592 and wp 00000268435456 capacity is 00000067108864 len is 00000067108864
[all] m = 0 and i = 4 , addr 00000268435456 and wp 00000335544320 capacity is 00000067108864 len is 00000067108864
[all] m = 0 and i = 5 , addr 00000335544320 and wp 00000402653184 capacity is 00000067108864 len is 00000067108864
[all] m = 0 and i = 6 , addr 00000402653184 and wp 00000469762048 capacity is 00000067108864 len is 00000067108864
[all] m = 0 and i = 7 , addr 00000469762048 and wp 00000536870912 capacity is 00000067108864 len is 00000067108864
[all] m = 0 and i = 8 , addr 00000536870912 and wp 00000537174016 capacity is 00000067108864 len is 00000067108864
[atr] metadata zone being pushed, start 00000536870912 and write pointers 00000537174016 written data: 00000000303104
[all] m = 1 and i = 9 , addr 00000603979776 and wp 00000603979776 capacity is 00000067108864 len is 00000067108864
[atr] metadata zone being pushed, start 00000603979776 and write pointers 00000603979776 written data: 00000000000000
[all] m = 2 and i = 10 , addr 00000671088640 and wp 00000671088640 capacity is 00000067108864 len is 00000067108864
[atr] metadata zone being pushed, start 00000671088640 and write pointers 00000671088640 written data: 00000000000000
Initializing RocksDB Options from the specified file
Initializing RocksDB Options from command-line flags
RocksDB: version 6.19
Date: Tue May 4 09:41:39 2021
CPU: 40 * Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz
CPUCache: 14080 KB
Keys: 16 bytes each (+ 0 bytes user-defined timestamp)
Values: 100 bytes each (50 bytes after compression)
Entries: 1000000
Prefix: 0 bytes
Keys per prefix: 0
RawSize: 110.6 MB (estimated)
FileSize: 62.9 MB (estimated)
Write rate: 0 bytes/second
Read rate: 0 ops/second
Compression: NoCompression
Compression sampling rate: 0
Memtablerep: skip_list
Perf Level: 1
WARNING: Optimization is disabled: benchmarks unnecessarily slow
WARNING: Assertions are enabled; benchmarks unnecessarily slow
------------------------------------------------
Initializing RocksDB Options from the specified file
Initializing RocksDB Options from command-line flags
DB path: [rocksdbtest/dbbench]
fillrandom : 19.850 micros/op 50378 ops/sec; 5.6 MB/s
$
From the log: what was eventually written, and where
2021/05/04-09:28:40.083910 7f170499aac0 ZenFS shutting down
2021/05/04-09:28:40.083918 7f170499aac0 [DEBUG] Zone 0x2C000000 used capacity: 304 bytes (0 MB)
2021/05/04-09:28:40.083923 7f170499aac0 [DEBUG] Zone 0x30000000 used capacity: 6152 bytes (0 MB)
2021/05/04-09:28:40.083927 7f170499aac0 [DEBUG] Zone 0x34000000 used capacity: 44053053 bytes (42 MB)
2021/05/04-09:28:40.083930 7f170499aac0 [DEBUG] Zone 0x38000000 used capacity: 58307167 bytes (55 MB)
2021/05/04-09:28:40.083936 7f170499aac0 Files:
2021/05/04-09:28:40.083942 7f170499aac0 rocksdbtest/dbbench/000009.sst sz: 43993188 lh: 3
2021/05/04-09:28:40.083946 7f170499aac0 Extent 0 {start=0x38000000, zone=14, len=43993188}
2021/05/04-09:28:40.083952 7f170499aac0 rocksdbtest/dbbench/000010.log sz: 14313979 lh: 2
2021/05/04-09:28:40.083955 7f170499aac0 Extent 0 {start=0x3a9f5000, zone=14, len=14313979}
2021/05/04-09:28:40.083959 7f170499aac0 rocksdbtest/dbbench/000011.sst sz: 44053053 lh: 3
2021/05/04-09:28:40.083963 7f170499aac0 Extent 0 {start=0x34000000, zone=13, len=44053053}
2021/05/04-09:28:40.083967 7f170499aac0 rocksdbtest/dbbench/CURRENT sz: 16 lh: 0
2021/05/04-09:28:40.083970 7f170499aac0 Extent 0 {start=0x30000000, zone=12, len=16}
2021/05/04-09:28:40.083974 7f170499aac0 rocksdbtest/dbbench/IDENTITY sz: 37 lh: 0
2021/05/04-09:28:40.083977 7f170499aac0 Extent 0 {start=0x2c000000, zone=11, len=37}
2021/05/04-09:28:40.083981 7f170499aac0 rocksdbtest/dbbench/MANIFEST-000004 sz: 267 lh: 0
2021/05/04-09:28:40.083984 7f170499aac0 Extent 0 {start=0x2c003000, zone=11, len=57}
2021/05/04-09:28:40.083987 7f170499aac0 Extent 1 {start=0x2c004000, zone=11, len=104}
2021/05/04-09:28:40.083991 7f170499aac0 Extent 2 {start=0x2c005000, zone=11, len=106}
2021/05/04-09:28:40.083995 7f170499aac0 rocksdbtest/dbbench/OPTIONS-000007 sz: 6136 lh: 0
2021/05/04-09:28:40.083998 7f170499aac0 Extent 0 {start=0x30001000, zone=12, len=6136}
2021/05/04-09:28:40.084001 7f170499aac0 Sum of all files: 97 MB of data
- libzbd - libzbd is a user library providing functions for manipulating zoned block devices. http://zonedstorage.io/projects/libzbd/
- Unlike the libzbc library, libzbd does not implement direct command access to zoned block devices. Rather, libzbd uses the kernel provided zoned block device interface based on the ioctl() system call. A direct consequence of this is that libzbd will only allow access to zoned block devices supported by the kernel running. This includes both physical devices such as hard-disks supporting the ZBC and ZAC standards, as well as all logical block devices implemented by various device drivers such as nullblk and device mapper drivers.
- Hence, we only get whatever the running kernel supports, even if the device supports newer commands and standards.
- zenfs uses this, https://github.com/westerndigitalcorporation/zenfs
- Zenfs says that: ZenFS depends on libzbd and Linux kernel 5.4 or later to perform zone management operations. To use ZenFS on SSDs with Zoned Namespaces kernel 5.9 or later is required.
- libzbd also has a nice GUI visualizer (gzbd)
-
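libzbd also ships a zbd command-line tool that goes through the same kernel ioctl() interface; basic usage (subcommand names from my memory of the libzbd README, so verify with zbd --help):

```shell
sudo zbd report /dev/nullb0    # list zones, like blkzone report
sudo zbd reset /dev/nullb0     # reset zone write pointers
```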
sudo blkzone report /dev/nullb0
The zone sizes this reports looked broken at first, but blkzone prints start/len/wptr in units of 512-byte sectors regardless of the logical block size, so len 0x020000 is 131072 * 512 B = 64 MiB, matching the 64 MB zones:
atr@node3:/home/atr/src/storage/rocksdb$ sudo blkzone report /dev/nullb0
start: 0x000000000, len 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 0(nw) [type: 1(CONVENTIONAL)]
start: 0x000020000, len 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 0(nw) [type: 1(CONVENTIONAL)]
start: 0x000040000, len 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 0(nw) [type: 1(CONVENTIONAL)]
start: 0x000060000, len 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 0(nw) [type: 1(CONVENTIONAL)]
start: 0x000080000, len 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 0(nw) [type: 1(CONVENTIONAL)]
start: 0x0000a0000, len 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 0(nw) [type: 1(CONVENTIONAL)]
start: 0x0000c0000, len 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 0(nw) [type: 1(CONVENTIONAL)]
start: 0x0000e0000, len 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 0(nw) [type: 1(CONVENTIONAL)]
start: 0x000100000, len 0x020000, wptr 0x000250 reset:0 non-seq:0, zcond: 2(oi) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x000120000, len 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x000140000, len 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x000160000, len 0x020000, wptr 0x000030 reset:0 non-seq:0, zcond: 2(oi) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x000180000, len 0x020000, wptr 0x000018 reset:0 non-seq:0, zcond: 2(oi) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x0001a0000, len 0x020000, wptr 0x015018 reset:0 non-seq:0, zcond: 2(oi) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x0001c0000, len 0x020000, wptr 0x01bce0 reset:0 non-seq:0, zcond: 2(oi) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x0001e0000, len 0x020000, wptr 0x01d7f8 reset:0 non-seq:0, zcond: 2(oi) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x000200000, len 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x000220000, len 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x000240000, len 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x000260000, len 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x000280000, len 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x0002a0000, len 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x0002c0000, len 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x0002e0000, len 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x000300000, len 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x000320000, len 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x000340000, len 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x000360000, len 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x000380000, len 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x0003a0000, len 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x0003c0000, len 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x0003e0000, len 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x000400000, len 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x000420000, len 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x000440000, len 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x000460000, len 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x000480000, len 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x0004a0000, len 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x0004c0000, len 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x0004e0000, len 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
atr@node3:/home/atr/src/storage/rocksdb$
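A quick sanity check on the numbers in the report above (blkzone prints values in 512-byte sectors, independent of the device's logical block size):

```python
# blkzone report prints start/len/wptr in 512-byte sectors, regardless of
# the device's logical block size, which can make the zone sizes look off.
SECTOR = 512
zone_len_sectors = 0x020000          # "len 0x020000" from the report above
print(zone_len_sectors * SECTOR)     # 67108864 bytes, i.e. 64 MiB per zone
```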
-
libzbc : http://zonedstorage.io/projects/libzbc/ libzbc is a user library providing functions for manipulating ZBC and ZAC disks. libzbc command implementation is compliant with the latest published versions of the ZBC and ZAC standards defined by INCITS technical committees T10 and T13 (respectively). See the image here: http://zonedstorage.io/assets/img/projects-libzbc.png
- See about the generic SCSI target /dev/sgX in the README, https://github.com/westerndigitalcorporation/libzbc#library-overview
- http://sg.danny.cz/sg/
- https://olegkutkov.me/2020/02/10/linux-block-device-driver/
- https://www.opennet.ru/docs/FAQ/OS/Linux/SCSI-Generic-FAQ.html
-
libnvme : in the same spirit, libnvme is an open source user library providing definitions and functions for interacting with NVMe devices. While nvme-cli provides convenient ways for a user to interact with NVMe devices from the shell, libnvme provides similar access for other programs. http://zonedstorage.io/projects/libnvme/
What does fio support? http://zonedstorage.io/benchmarking/fio/
- libzbc : https://github.com/axboe/fio/blob/master/engines/libzbc.c
- libzbd : https://github.com/axboe/fio/blob/master/zbd.h
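The fio command lines used later in these notes can equivalently be written as a job file; a minimal sketch with values copied from the nullb0 runs below (adjust filename/offset/size for your device):

```ini
; zns-write.fio - minimal zonemode=zbd job, mirroring the command-line runs
[animesh]
filename=/dev/nullb0
direct=1
zonemode=zbd
offset=512m
size=2g
ioengine=io_uring
iodepth=8
rw=write
bs=4k
```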
git clone git@github.com:westerndigitalcorporation/libzbc.git
cd libzbc
atr@node1:/home/atr/src/storage/libzbc$ sh ./autogen.sh
atr@node1:/home/atr/src/storage/libzbc$ ./configure --prefix=/home/atr/local/
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /usr/bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking whether make supports nested variables... (cached) yes
./configure: line 2920: AX_RPM_INIT: command not found
./configure: line 2922: syntax error near unexpected token no,
./configure: line 2922: AX_CHECK_ENABLE_DEBUG(no, _DBG_)
atr@node1:/home/atr/src/storage/libzbc$ sudo apt install autoconf-archive
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
autoconf-archive
0 upgraded, 1 newly installed, 0 to remove and 124 not upgraded.
Need to get 665 kB of archives.
After this operation, 5,894 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu focal/universe amd64 autoconf-archive all 20190106-2.1ubuntu1 [665 kB]
Fetched 665 kB in 0s (9,783 kB/s)
Selecting previously unselected package autoconf-archive.
(Reading database ... 221809 files and directories currently installed.)
Preparing to unpack .../autoconf-archive_20190106-2.1ubuntu1_all.deb ...
Unpacking autoconf-archive (20190106-2.1ubuntu1) ...
Setting up autoconf-archive (20190106-2.1ubuntu1) ...
Processing triggers for install-info (6.7.0.dfsg.2-5) ...
atr@node1:/home/atr/src/storage/libzbc$ sh ./autogen.sh
libtoolize: putting auxiliary files in AC_CONFIG_AUX_DIR, 'build-aux'.
libtoolize: copying file 'build-aux/ltmain.sh'
libtoolize: putting macros in AC_CONFIG_MACRO_DIRS, 'm4'.
libtoolize: copying file 'm4/libtool.m4'
libtoolize: copying file 'm4/ltoptions.m4'
libtoolize: copying file 'm4/ltsugar.m4'
libtoolize: copying file 'm4/ltversion.m4'
libtoolize: copying file 'm4/lt~obsolete.m4'
configure.ac:33: installing 'build-aux/compile'
configure.ac:25: installing 'build-aux/missing'
lib/Makefile.am: installing 'build-aux/depcomp'
atr@node1:/home/atr/src/storage/libzbc$ ./configure --prefix=/home/atr/local/
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /usr/bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking whether make supports nested variables... (cached) yes
checking build system type... x86_64-pc-linux-gnu
checking host system type... x86_64-pc-linux-gnu
Not trying to build rpms for your system (use --enable-rpm-rules to override)
checking whether to enable debugging... no
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
atr@node1:/home/atr/src/storage/libzbc$ make install
----------------------------------------------------------------------
Libraries have been installed in:
/home/atr/local/lib
If you ever happen to want to link against installed libraries
in a given directory, LIBDIR, you must either use libtool, and
specify the full pathname of the library, or use the '-LLIBDIR'
flag during linking and do at least one of the following:
- add LIBDIR to the 'LD_LIBRARY_PATH' environment variable
during execution
- add LIBDIR to the 'LD_RUN_PATH' environment variable
during linking
- use the '-Wl,-rpath -Wl,LIBDIR' linker flag
- have your system administrator add LIBDIR to '/etc/ld.so.conf'
See any operating system documentation about shared libraries for
more information, such as the ld(1) and ld.so(8) manual pages.
----------------------------------------------------------------------
/usr/bin/mkdir -p '/home/atr/local/lib/pkgconfig'
/usr/bin/install -c -m 644 libzbc.pc '/home/atr/local/lib/pkgconfig'
/usr/bin/mkdir -p '/home/atr/local/include/libzbc'
/usr/bin/install -c -m 644 ../include/libzbc/zbc.h '/home/atr/local/include/libzbc'
make[2]: Leaving directory '/home/atr/src/storage/libzbc/lib'
make[1]: Leaving directory '/home/atr/src/storage/libzbc/lib'
Making install in tools
make[1]: Entering directory '/home/atr/src/storage/libzbc/tools'
Making install in .
make[2]: Entering directory '/home/atr/src/storage/libzbc/tools'
make[3]: Entering directory '/home/atr/src/storage/libzbc/tools'
/usr/bin/mkdir -p '/home/atr/local/bin'
/bin/bash ../libtool --mode=install /usr/bin/install -c zbc_info zbc_report_zones zbc_reset_zone zbc_open_zone zbc_close_zone zbc_finish_zone zbc_read_zone zbc_write_zone zbc_set_write_ptr zbc_set_zones gzbc gzviewer '/home/atr/local/bin'
libtool: install: /usr/bin/install -c .libs/zbc_info /home/atr/local/bin/zbc_info
libtool: install: /usr/bin/install -c .libs/zbc_report_zones /home/atr/local/bin/zbc_report_zones
libtool: install: /usr/bin/install -c .libs/zbc_reset_zone /home/atr/local/bin/zbc_reset_zone
libtool: install: /usr/bin/install -c .libs/zbc_open_zone /home/atr/local/bin/zbc_open_zone
libtool: install: /usr/bin/install -c .libs/zbc_close_zone /home/atr/local/bin/zbc_close_zone
libtool: install: /usr/bin/install -c .libs/zbc_finish_zone /home/atr/local/bin/zbc_finish_zone
libtool: install: /usr/bin/install -c .libs/zbc_read_zone /home/atr/local/bin/zbc_read_zone
libtool: install: /usr/bin/install -c .libs/zbc_write_zone /home/atr/local/bin/zbc_write_zone
libtool: install: /usr/bin/install -c .libs/zbc_set_write_ptr /home/atr/local/bin/zbc_set_write_ptr
libtool: install: /usr/bin/install -c .libs/zbc_set_zones /home/atr/local/bin/zbc_set_zones
libtool: install: /usr/bin/install -c .libs/gzbc /home/atr/local/bin/gzbc
libtool: install: /usr/bin/install -c .libs/gzviewer /home/atr/local/bin/gzviewer
/usr/bin/mkdir -p '/home/atr/local/share/man/man8'
/usr/bin/install -c -m 644 info/zbc_info.8 report_zones/zbc_report_zones.8 reset_zone/zbc_reset_zone.8 open_zone/zbc_open_zone.8 close_zone/zbc_close_zone.8 finish_zone/zbc_finish_zone.8 read_zone/zbc_read_zone.8 write_zone/zbc_write_zone.8 set_write_ptr/zbc_set_write_ptr.8 set_zones/zbc_set_zones.8 gui/gzbc.8 viewer/gzviewer.8 '/home/atr/local/share/man/man8'
make[3]: Leaving directory '/home/atr/src/storage/libzbc/tools'
make[2]: Leaving directory '/home/atr/src/storage/libzbc/tools'
make[1]: Leaving directory '/home/atr/src/storage/libzbc/tools'
make[1]: Entering directory '/home/atr/src/storage/libzbc'
make[2]: Entering directory '/home/atr/src/storage/libzbc'
make[2]: Nothing to be done for 'install-exec-am'.
make[2]: Nothing to be done for 'install-data-am'.
make[2]: Leaving directory '/home/atr/src/storage/libzbc'
make[1]: Leaving directory '/home/atr/src/storage/libzbc'
atr@node1:/home/atr/src/storage/libzbc$
Now set up the fio compilation ....
atr@node3:/home/atr/src/storage/fio$ sudo ./fio --name=animesh --filename=/dev/nullb0 --direct=1 --zonemode=zbd --offset=$((512*1024*1024)) --size=4G --ioengine=io_uring --iodepth=8 --rw=write --bs=4K
animesh: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=8
fio-3.26-39-g6308
Starting 1 process
/dev/nullb0: rounded down io_size from 4294967296 to 2147483648
Jobs: 1 (f=0): [f(1)][100.0%][w=522MiB/s][w=134k IOPS][eta 00m:00s]
animesh: (groupid=0, jobs=1): err= 0: pid=973937: Tue May 4 12:37:48 2021
write: IOPS=136k, BW=531MiB/s (557MB/s)(2048MiB/3855msec); 32 zone resets
slat (usec): min=4, max=444, avg= 6.64, stdev= 2.05
clat (nsec): min=1633, max=516246, avg=51482.56, stdev=13817.98
lat (usec): min=7, max=524, avg=58.24, stdev=15.55
clat percentiles (usec):
| 1.00th=[ 43], 5.00th=[ 44], 10.00th=[ 44], 20.00th=[ 45],
| 30.00th=[ 45], 40.00th=[ 45], 50.00th=[ 49], 60.00th=[ 50],
| 70.00th=[ 50], 80.00th=[ 52], 90.00th=[ 66], 95.00th=[ 95],
| 99.00th=[ 103], 99.50th=[ 104], 99.90th=[ 122], 99.95th=[ 159],
| 99.99th=[ 225]
bw ( KiB/s): min=511640, max=569544, per=100.00%, avg=547178.29, stdev=19402.23, samples=7
iops : min=127910, max=142384, avg=136794.57, stdev=4850.23, samples=7
lat (usec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=68.81%
lat (usec) : 100=29.19%, 250=1.97%, 500=0.01%, 750=0.01%
cpu : usr=11.11%, sys=36.22%, ctx=524329, majf=0, minf=28
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=100.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,524288,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=8
Run status group 0 (all jobs):
WRITE: bw=531MiB/s (557MB/s), 531MiB/s-531MiB/s (557MB/s-557MB/s), io=2048MiB (2147MB), run=3855-3855msec
Disk stats (read/write):
nullb0: ios=31/494194, merge=0/0, ticks=1/1904, in_queue=0, util=96.87%
atr@node3:/home/atr/src/storage/fio$
- the --size option determines the amount of data to be written, and hence the number of zones reset.
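The "32 zone resets" in the run above follows directly from the io size and the zone size (values taken from the fio output and the blkzone report earlier):

```python
# fio rounded io_size down to 2 GiB ("rounded down io_size ... to 2147483648")
# and nullb0 has 64 MiB zones (0x20000 sectors in the blkzone report), so:
zone_size = 64 * 1024 * 1024
io_size = 2147483648
print(io_size // zone_size)      # 32 zones written, hence 32 zone resets
```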
FIO example combinations (from shell history; note that some early entries are missing a space between --offset and --size)
1071 sudo ./fio --name=animesh --filename=/dev/nullb0 --direct=1 --zonemode=zbd --offset=140660178944 --size=$((512*1024*1024))
1073 sudo ./fio --name=animesh --filename=/dev/nullb0 --direct=1 --zonemode=zbd --offset=$((512*1024*1024))--size=1G
1074 sudo ./fio --name=animesh --filename=/dev/nullb0 --direct=1 --zonemode=zbd --offset=$((512*1024*1024))--size=2G
1075 sudo ./fio --name=animesh --filename=/dev/nullb0 --direct=1 --zonemode=zbd --offset=$((512*1024*1024))--size=20G
1076 sudo ./fio --name=animesh --filename=/dev/nullb0 --direct=1 --zonemode=zbd --offset=$((512*1024*1024))--size=20G --ioengine=libaio --iodepth=8 --rw=write --bs=256K
1077 sudo ./fio --name=animesh --filename=/dev/nullb0 --direct=1 --zonemode=zbd --offset=$((512*1024*1024))--size=20G --ioengine=io_uring --iodepth=8 --rw=write --bs=256K
1078 sudo ./fio --name=animesh --filename=/dev/nullb0 --direct=1 --zonemode=zbd --offset=$((512*1024*1024))--size=1G --ioengine=io_uring --iodepth=8 --rw=write --bs=256K
1079 sudo ./fio --name=animesh --filename=/dev/nullb0 --direct=1 --zonemode=zbd --offset=$((512*1024*1024))--size=1G --ioengine=io_uring --iodepth=8 --rw=write --bs=64M
1080 sudo ./fio --name=animesh --filename=/dev/nullb0 --direct=1 --zonemode=zbd --offset=$((512*1024*1024))--size=1G --ioengine=io_uring --iodepth=8 --rw=write --bs=128K
1081 sudo ./fio --name=animesh --filename=/dev/nullb0 --direct=1 --zonemode=zbd --offset=$((512*1024*1024))--size=1G --ioengine=io_uring --iodepth=8 --rw=write --bs=4K
1084 sudo ./fio --name=animesh --filename=/dev/nullb0 --direct=1 --zonemode=zbd --offset=$((512*1024*1024))--size=512M --ioengine=io_uring --iodepth=8 --rw=write --bs=4K
1085 sudo ./fio --name=animesh --filename=/dev/nullb0 --direct=1 --zonemode=zbd --offset=$((512*1024*1024)) --size=512M --ioengine=io_uring --iodepth=8 --rw=write --bs=4K
1086 sudo ./fio --name=animesh --filename=/dev/nullb0 --direct=1 --zonemode=zbd --offset=$((512*1024*1024)) --size=1G --ioengine=io_uring --iodepth=8 --rw=write --bs=4K
1087 sudo ./fio --name=animesh --filename=/dev/nullb0 --direct=1 --zonemode=zbd --offset=$((512*1024*1024)) --size=2G --ioengine=io_uring --iodepth=8 --rw=write --bs=4K
1088 sudo ./fio --name=animesh --filename=/dev/nullb0 --direct=1 --zonemode=zbd --offset=$((512*1024*1024)) --size=4G --ioengine=io_uring --iodepth=8 --rw=write --bs=4K
1089 sudo ./fio --name=animesh --filename=/dev/nullb0 --direct=1 --zonemode=zbd --offset=$((512*1024*1024)) --size=4G --ioengine=io_uring --iodepth=8 --rw=write --bs=4K --time_based
1090 sudo ./fio --name=animesh --filename=/dev/nullb0 --direct=1 --zonemode=zbd --offset=$((512*1024*1024)) --size=4G --ioengine=io_uring --iodepth=8 --rw=write --bs=4K --help
1091 sudo ./fio --name=animesh --filename=/dev/nullb0 --direct=1 --zonemode=zbd --offset=$((512*1024*1024)) --size=4G --ioengine=io_uring --iodepth=8 --rw=write --bs=4K runtime=30s --time_based
1092 sudo ./fio --name=animesh --filename=/dev/nullb0 --direct=1 --zonemode=zbd --offset=$((512*1024*1024)) --size=4G --ioengine=io_uring --iodepth=8 --rw=write --bs=4K --runtime=30s --time_based
1102 sudo ./fio --name=animesh --filename=/dev/nullb0 --direct=1 --zonemode=zbd --offset=$((512*1024*1024)) --size=4G --ioengine=io_uring --iodepth=8 --rw=write --bs=4K --runtime=30s --time_based
1103 sudo ./fio --name=animesh --filename=/dev/nullb0 --direct=1 --zonemode=zbd --offset=$((512*1024*1024)) --size=4G --ioengine=libaio --iodepth=1 --rw=write --bs=4K --runtime=10s --time_based
1104 sudo ./fio --name=animesh --filename=/dev/nullb0 --direct=1 --zonemode=zbd --offset=$((512*1024*1024)) --size=4G --ioengine=libaio --iodepth=2 --rw=write --bs=4K --runtime=10s --time_based
1105 sudo ./fio --name=animesh --filename=/dev/nullb0 --direct=1 --zonemode=zbd --offset=$((512*1024*1024)) --size=4G --ioengine=libaio --iodepth=2 --rw=write --bs=4K --runtime=10s --time_based --thread=2
1106 sudo ./fio --name=animesh --filename=/dev/nullb0 --direct=1 --zonemode=zbd --offset=$((512*1024*1024)) --size=4G --ioengine=libaio --iodepth=1 --rw=write --bs=4K --runtime=20s --time_based --thread=2
1107 sudo ./fio --name=animesh --filename=/dev/nullb0 --direct=1 --zonemode=zbd --offset=$((512*1024*1024)) --size=4G --ioengine=libaio --iodepth=1 --rw=write --bs=4K --runtime=20s --time_based --thread=4
1108 sudo ./fio --name=animesh --filename=/dev/nullb0 --direct=1 --zonemode=zbd --offset=$((512*1024*1024)) --size=4G --ioengine=libaio --iodepth=1 --rw=write --bs=4K --runtime=20s --time_based --thread=1
1146 cd /home/atr/src/storage/fio
1147 sudo ./fio --name=animesh --filename=/dev/nullb0 --direct=1 --zonemode=zbd --offset=$((512*1024*1024)) --size=4G --ioengine=libaio --iodepth=1 --rw=write --bs=4K --runtime=20s --time_based --thread=1
1148 sudo ./fio --name=animesh --filename=/dev/nullb0 --direct=1 --zonemode=zbd --offset=$((512*1024*1024)) --size=4G --ioengine=libaio --iodepth=1 --rw=write --bs=4K --runtime=20s --time_based --thread=1x
1149 sudo ./fio --name=animesh --filename=/dev/nullb0 --direct=1 --zonemode=zbd --offset=$((512*1024*1024)) --size=4G --ioengine=libaio --iodepth=1 --rw=write --bs=4K --runtime=20s --time_based --thread=2
1150 sudo ./fio --name=animesh --filename=/dev/nullb0 --direct=1 --zonemode=zbd --offset=$((512*1024*1024)) --size=2G --ioengine=libaio --iodepth=64 --rw=write --bs=4K --runtime=20s --time_based --thread=2
1151 sudo ./fio --name=animesh --filename=/dev/nullb0 --direct=1 --zonemode=zbd --offset=$((512*1024*1024)) --size=2G --ioengine=libaio --iodepth=64 --rw=write --bs=4K --runtime=20s --time_based --thread=1
1152 sudo ./fio --name=animesh --filename=/dev/nullb0 --direct=1 --zonemode=zbd --offset=$((512*1024*1024)) --size=2G --ioengine=libaio --iodepth=64 --rw=write --bs=4K --runtime=30s --time_based --thread=1
1153 sudo ./fio --name=animesh --filename=/dev/nullb0 --direct=1 --zonemode=zbd --offset=$((512*1024*1024)) --size=2G --ioengine=libaio --iodepth=64 --rw=read --bs=4K --runtime=30s --time_based --thread=1
1171 cd /home/atr/src/storage/fio
1172 sudo ./fio --name=animesh --filename=/dev/nullb0 --direct=1 --zonemode=zbd --offset=$((512*1024*1024)) --size=2G --ioengine=libaio --iodepth=64 --rw=read --bs=4K --runtime=30s --time_based --thread=1
1205 cd /home/atr/src/storage/fio
1206 sudo ./fio --name=animesh --filename=/dev/nullb0 --direct=1 --zonemode=zbd --offset=$((512*1024*1024)) --size=2G --ioengine=libaio --iodepth=64 --rw=read --bs=4K --runtime=30s --time_based --thread=1
1208 sudo ./fio --name=animesh --filename=/dev/nullb0 --direct=1 --zonemode=zbd --offset=$((512*1024*1024)) --size=2G --ioengine=libaio --iodepth=64 --rw=write --bs=4K --runtime=30s --time_based --thread=1
1266 cd /home/atr/src/storage/fio
1267 sudo ./fio --name=animesh --filename=/dev/nullb0 --direct=1 --zonemode=zbd --offset=$((512*1024*1024)) --size=2G --ioengine=libaio --iodepth=64 --rw=write --bs=4K --runtime=30s --time_based --thread=1
1271 #sudo ./fio --name=animesh --filename=/dev/nullb0 --direct=1 --zonemode=zbd --offset=$((512*1024*1024)) --size=2G --ioengine=libaio --iodepth=64 --rw=write --bs=4K --runtime=30s --time_based --thread=1
1272 sudo ./fio --name=animesh --filename=/dev/ram0 --direct=1 --offset=$((0*1024*1024)) --size=1G --ioengine=libaio --iodepth=64 --rw=write --bs=4K --runtime=30s --time_based --thread=1
1273 sudo ./fio --name=animesh --filename=/dev/ram0 --direct=1 --offset=$((0*1024*1024)) --size=1G --ioengine=libaio --iodepth=1 --rw=write --bs=4K --runtime=30s --time_based --thread=1
1274 sudo ./fio --name=animesh --filename=/dev/ram0 --direct=1 --offset=$((0*1024*1024)) --size=1G --ioengine=libaio --iodepth=1 --rw=read --bs=4K --runtime=30s --time_based --thread=1
1275 sudo ./fio --name=animesh --filename=/dev/nullb0 --direct=1 --zonemode=zbd --offset=$((512*1024*1024)) --size=2G --ioengine=libaio --iodepth=1 --rw=write --bs=4K --runtime=30s --time_based --thread=1
1276 sudo ./fio --name=animesh --filename=/dev/nullb0 --direct=1 --zonemode=zbd --offset=$((512*1024*1024)) --size=2G --ioengine=iouring --iodepth=1 --rw=write --bs=4K --runtime=30s --time_based --thread=1
1277 sudo ./fio --name=animesh --filename=/dev/nullb0 --direct=1 --zonemode=zbd --offset=$((512*1024*1024)) --size=2G --ioengine=uring --iodepth=1 --rw=write --bs=4K --runtime=30s --time_based --thread=1
1278 sudo ./fio --name=animesh --filename=/dev/nullb0 --direct=1 --zonemode=zbd --offset=$((512*1024*1024)) --size=2G --ioengine=ioring --iodepth=1 --rw=write --bs=4K --runtime=30s --time_based --thread=1
1279 sudo ./fio --name=animesh --filename=/dev/nullb0 --direct=1 --zonemode=zbd --offset=$((512*1024*1024)) --size=2G --ioengine=io_uring --iodepth=1 --rw=write --bs=4K --runtime=30s --time_based --thread=1
1280 sudo ./fio --name=animesh --numjobs=2 --filename=/dev/nullb0 --direct=1 --zonemode=zbd --offset=$((512*1024*1024)) --size=2G --ioengine=io_uring --iodepth=1 --rw=write --bs=4K --runtime=30s --time_based --thread=1
1281 sudo ./fio --name=animesh --numjobs=2 --filename=/dev/nullb0 --direct=1 --zonemode=zbd --offset=$((512*1024*1024)) --size=2G --ioengine=io_uring --iodepth=1 --rw=write --bs=4K --runtime=30s --time_based --thread
1282 sudo ./fio --name=animesh --numjobs=2 --filename=/dev/nullb0 --direct=1 --zonemode=zbd --offset=$((512*1024*1024)) --size=2G --ioengine=io_uring --iodepth=1 --rw=write --bs=4K --runtime=30s --time_based
1283 sudo ./fio --name=animesh --numjobs=2 --filename=/dev/nullb0 --direct=1 --zonemode=zbd --offset=$((512*1024*1024)) --size=2G --ioengine=io_uring --iodepth=1 --rw=write --bs=4K --runtime=30s --time_based --group_reporting
1284 sudo ./fio --name=animesh --numjobs=2 --filename=/dev/nullb0 --direct=1 --zonemode=zbd --offset=$((512*1024*1024)) --size=2G --ioengine=io_uring --iodepth=1 --rw=write --bs=4K --runtime=30s --time_based --group_reporting --thread
- the ioctl interface and primary code are in:
- oslib/blkzoned.h https://github.com/axboe/fio/blob/master/oslib/blkzoned.h
- oslib/linux-blkzoned.c https://github.com/axboe/fio/blob/master/oslib/linux-blkzoned.c (includes, #include <linux/blkzoned.h>)
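The records returned by the BLKREPORTZONE ioctl that fio's linux-blkzoned.c parses are arrays of struct blk_zone; a sketch of decoding one record in Python, assuming the <linux/blkzoned.h> layout from kernels 5.9+ (i.e. including the capacity field):

```python
import struct

# struct blk_zone (kernel 5.9+): start, len, wp (u64, in 512B sectors),
# type, cond, non_seq, reset (u8 each), 4 reserved bytes, capacity (u64),
# 24 reserved bytes -- 64 bytes total.
BLK_ZONE = struct.Struct("<QQQ BBBB 4x Q 24x")

def parse_zone(buf):
    start, length, wp, ztype, cond, non_seq, reset, capacity = BLK_ZONE.unpack(buf)
    return {"start": start, "len": length, "wp": wp,
            "type": ztype, "cond": cond, "capacity": capacity}

# synthetic record: a sequential-write-required zone with an advanced wp
raw = BLK_ZONE.pack(0x100000, 0x20000, 0x110000, 2, 2, 0, 0, 0x20000)
z = parse_zone(raw)
print(z["type"], hex(z["wp"]))   # 2 0x110000
```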
libnvme's test/Makefile is broken:
atr@atr-XPS-13:~/vu/github/storage/libnvme$ git diff
diff --git a/test/Makefile b/test/Makefile
index 1620622..aafd2de 100644
--- a/test/Makefile
+++ b/test/Makefile
@@ -1,5 +1,5 @@
CFLAGS ?= -g -O2
-override CFLAGS += -Wall -D_GNU_SOURCE -L../src/ -I../src/ -luuid
+override CFLAGS += -Wall -D_GNU_SOURCE -L../src/ -I../src/
include ../Makefile.quiet
@@ -23,10 +23,10 @@ all: $(all_targets)
CXXFLAGS ?= -lstdc++
%: %.cc
- $(QUIET_CC)$(CC) $(CFLAGS) $(LDFLAGS) $(CXXFLAGS) -o $@ $< -lnvme
+ $(QUIET_CC)$(CXX) $(CFLAGS) $(LDFLAGS) $(CXXFLAGS) -o $@ $< -lnvme
%: %.c
- $(QUIET_CC)$(CC) $(CFLAGS) $(LDFLAGS) -o $@ $< -lnvme
+ $(QUIET_CC)$(CC) $(CFLAGS) $(LDFLAGS) -o $@ $< -lnvme -luuid
clean:
rm -f $(all_targets)
atr@atr-XPS-13:~/vu/github/storage/libnvme$
#!/bin/bash
#qemu-img create -f raw znsssd.img 16777216
echo "needs qemu 6.0.0 or above"
sudo /home/atr/src/qemu-6.0.0/build/qemu-system-x86_64 -name qemuzns -m 4G --enable-kvm -cpu host -smp 2 \
-hda /home/atr/xfs/images/ubuntu-20.04-zns.qcow \
-net user,hostfwd=tcp::7777-:22,hostfwd=tcp::2222-:2000 -net nic \
-drive file=/home/atr/xfs/images/znsssd.img,id=znsd,format=raw,if=none \
-drive file=/home/atr/xfs/images/nvmessd.img,id=nvmd,format=raw,if=none \
-device nvme,drive=nvmd,serial=1234,physical_block_size=4096,logical_block_size=4096 \
-device nvme,serial=baz,id=nvme2,zoned.zasl=7 \
-device nvme-ns,id=ns2,drive=znsd,nsid=2,logical_block_size=4096,physical_block_size=4096,zoned=true,zoned.zone_size=131072,zoned.zone_capacity=131072,zoned.max_open=0,zoned.max_active=0,bus=nvme2
# https://github.com/qemu/qemu/blob/master/hw/nvme/ctrl.c
# * - `zoned.zasl`
# * Indicates the maximum data transfer size for the Zone Append command. Like
# * `mdts`, the value is specified as a power of two (2^n) and is in units of
# * the minimum memory page size (CAP.MPSMIN). The default value is 0 (i.e.
# * defaulting to the value of `mdts`).
# *
# * Setting `zoned` to true selects Zoned Command Set at the namespace.
# * In this case, the following namespace properties are available to configure
# * zoned operation:
# * zoned.zone_size=<zone size in bytes, default: 128MiB>
# * The number may be followed by K, M, G as in kilo-, mega- or giga-.
# *
# * zoned.zone_capacity=<zone capacity in bytes, default: zone size>
# * The value 0 (default) forces zone capacity to be the same as zone
# * size. The value of this property may not exceed zone size.
# *
# * zoned.descr_ext_size=<zone descriptor extension size, default 0>
# * This value needs to be specified in 64B units. If it is zero,
# * namespace(s) will not support zone descriptor extensions.
# *
# * zoned.max_active=<Maximum Active Resources (zones), default: 0>
# * The default value means there is no limit to the number of
# * concurrently active zones.
# *
# * zoned.max_open=<Maximum Open Resources (zones), default: 0>
# * The default value means there is no limit to the number of
# * concurrently open zones.
# *
# * zoned.cross_read=<enable RAZB, default: false>
# * Setting this property to true enables Read Across Zone Boundaries.
# */
# -- older version
sudo /home/atr/src/qemu-6.0.0/build/qemu-system-x86_64 -name qemuzns -m 4G --enable-kvm -cpu host -smp 2 \
-hda /home/atr/xfs/images/ubuntu-20.04-zns.qcow \
-net user,hostfwd=tcp::7777-:22,hostfwd=tcp::2222-:2000 -net nic \
-drive file=/home/atr/xfs/images/znsssd.img,id=mynvme,format=raw,if=none \
-device nvme,serial=baz,id=nvme2,zoned.zasl=7 \
-device nvme-ns,id=ns2,drive=mynvme,nsid=2,logical_block_size=4096,physical_block_size=4096,zoned=true,zoned.zone_size=131072,zoned.zone_capacity=131072,zoned.max_open=0,zoned.max_active=0,bus=nvme2
All code is at https://github.com/animeshtrivedi/zns-resources
http://zonedstorage.io/projects/zns/
They have basic r/w tests with zone appends as well. It worked in the VM:
atr@stosys-qemu-vm:/home/atr/src/zns-resources/scripts$ cat /sys/block/nvme0n1/queue/zoned
host-managed
atr@stosys-qemu-vm:/home/atr/src/zns-resources/scripts$ cat /sys/block/nvme0n1/queue/chunk_sectors
256
atr@stosys-qemu-vm:/home/atr/src/zns-resources/scripts$ cat /sys/block/nvme0n1/queue/nr_zones
128
atr@stosys-qemu-vm:/home/atr/src/zns-resources/scripts$ sudo nvme zns id-ctrl /dev/nvme0n1
NVMe ZNS Identify Controller:
zasl : 7
atr@stosys-qemu-vm:/home/atr/src/zns-resources/scripts$ sudo nvme zns id-ns /dev/nvme0n1
ZNS Command Set Identify Namespace:
zoc : 0
ozcs : 0
mar : 0xffffffff
mor : 0xffffffff
rrl : 0
frl : 0
lbafe 0: zsze:0x20 zdes:0
lbafe 1: zsze:0x20 zdes:0 (in use)
atr@stosys-qemu-vm:/home/atr/src/zns-resources/scripts$ echo "hello world" | sudo nvme zns zone-append /dev/nvme0n1 -z 4096
Success appended data to LBA 2
atr@stosys-qemu-vm:/home/atr/src/zns-resources/scripts$
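The numbers in the sessions above are all consistent with the QEMU config; a quick sketch of the arithmetic (values copied from the outputs above):

```python
# Cross-check sysfs and nvme-cli values against the QEMU launch parameters.

# chunk_sectors is the zone size in 512-byte sectors:
assert 256 * 512 == 131072            # matches zoned.zone_size=131072

# zsze is the zone size in logical blocks; the in-use LBA format has
# 4096-byte blocks (lbads:12), so:
assert 0x20 * 4096 == 131072          # same 128 KiB zone size

# nr_zones=128 zones of 128 KiB each -> the 16 MiB backing image
# created with qemu-img above:
assert 128 * 131072 == 16777216

# mar/mor of 0xffffffff mean "no limit", i.e. zoned.max_active=0 and
# zoned.max_open=0 in the QEMU config.
print("all consistent")
```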
- they have common, renamed structure definitions in
- nvme-cli/linux/nvme.h
- libnvme/src/nvme/types.h
- nvme-cli has support for SUBMIT_IO and PASSTHROUGH (passthrough goes directly to the device)
- SUBMIT_IO needs support from the kernel for the particular feature and command
- libnvme only has passthrough, as it does not use the SUBMIT_IO command set
https://github.com/qemu/qemu/blob/master/hw/nvme/ctrl.c#L23
atr@node3:/home/atr/src/zns-resources$ sudo nvme get-feature -H -f 1 /dev/nvme1n1
get-feature:0x1 (Arbitration), Current value:00000000
High Priority Weight (HPW): 1
Medium Priority Weight (MPW): 1
Low Priority Weight (LPW): 1
Arbitration Burst (AB): 1
atr@node3:/home/atr/src/zns-resources$
All code is in the zns-example repo.
- The TRIM command is Deallocate, and Write Uncorrectable is like artificially introducing errors so that certain LBA ranges cannot be read.
- There is a bit of a mess between how libnvme is developed (missing error codes) and the examples in nvme-cli. The current zns-example repo has code copied from all over the place.
- An LBA is an index that advances by 1 per logical block; it is not a byte offset in LBA_SIZE units.
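A one-line sketch of the LBA-to-byte conversion, using the lbads value from Identify Namespace (2^lbads is the LBA size):

```python
# An LBA advances by 1 per logical block; converting to a byte offset
# needs the block size, i.e. 2^lbads from Identify Namespace.
def lba_to_byte_offset(lba, lbads):
    return lba << lbads

print(lba_to_byte_offset(2, 12))   # LBA 2 with 4096-byte blocks -> byte 8192
```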
-
- nvme-cli: the code is in the nvme_ioctl.[ch] files. It also has a passthrough function, but it is not used anywhere.
- libnvme: types.h has many definitions, and the other logic is in the ioctl.[ch] files. It only uses passthrough.
The 4096-byte buffers are response structures for the command, which is always 64 bytes. See the CSI NS and Ctrl Identify examples in the main.cpp.
-
NVMe commands have a common structure and typically use DW10-15 for command-specific customization. See section 4.2 and figure 105 for the common command structure in the NVMe 1.4 base specification.
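The fixed 64-byte submission queue entry layout can be sketched with a struct packer; the opcode value used here (0x7a for Zone Management Receive, per the ZNS spec) is just for illustration:

```python
import struct

# An NVMe submission queue entry is always 64 bytes: CDW0 (opcode/flags/CID),
# NSID, CDW2-3, MPTR, DPTR (PRP1/PRP2), then command-specific CDW10-15
# (NVMe 1.4 base spec, section 4.2 / figure 105).
SQE = struct.Struct("<II Q Q QQ 6I")   # 4+4+8+8+16+24 = 64 bytes

def build_cmd(opcode, nsid, cdws=(0, 0, 0, 0, 0, 0)):
    cdw0 = opcode & 0xff               # flags and CID left zero in this sketch
    return SQE.pack(cdw0, nsid, 0, 0, 0, 0, *cdws)

cmd = build_cmd(0x7a, 1)               # Zone Management Receive, nsid 1
print(len(cmd))                        # 64
```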
-
Zones can be managed using the Zone Management Send/Receive commands. See sections 4.3 and 4.4. There is working code; I have not tried supporting the extended report attributes.
-
ZNS read/write commands are the same as in the NVMe base specification. Only zone management and appends are new, along with the associated logic for managing zones and their state transitions.
The LBA size can be extracted from the Identify Namespace command, like:
atr@node3:/home/atr/src/zns-resources/zns-rw-example$ sudo nvme id-ns /dev/nvme1n1
NVME Identify Namespace 1:
nsze : 0x209a97b0
ncap : 0x209a97b0
nuse : 0x209a97b0
nsfeat : 0
nlbaf : 0
flbas : 0
mc : 0
dpc : 0
dps : 0
nmic : 0
rescap : 0
fpi : 0
dlfeat : 0
nawun : 0
nawupf : 0
nacwu : 0
nabsn : 0
nabo : 0
nabspf : 0
noiob : 0
nvmcap : 0
mssrl : 0
mcl : 0
msrc : 0
anagrpid: 0
nsattr : 0
nvmsetid: 1
endgid : 1
nguid : 00000000000000000000000000000000
eui64 : 0000000000000000
lbaf 0 : ms:0 lbads:9 rp:0x2 (in use) # This is 9, 2^9 = 512 bytes
atr@node3:/home/atr/src/zns-resources/zns-rw-example$
Another example, from the VM:
atr@stosys-qemu-vm:~$ sudo nvme id-ns /dev/nvme0n1
NVME Identify Namespace 1:
nsze : 0x800
ncap : 0x800
nuse : 0x800
nsfeat : 0x14
nlbaf : 1
flbas : 0x1
mc : 0
dpc : 0
dps : 0
nmic : 0
rescap : 0
fpi : 0
dlfeat : 1
nawun : 0
nawupf : 0
nacwu : 0
nabsn : 0
nabo : 0
nabspf : 0
noiob : 0
nvmcap : 0
npwg : 0
npwa : 0
npdg : 0
npda : 0
nows : 0
mssrl : 0
mcl : 0
msrc : 0
anagrpid: 0
nsattr : 0
nvmsetid: 0
endgid : 0
nguid : 00000000000000000000000000000000
eui64 : 0000000000000000
lbaf 0 : ms:0 lbads:9 rp:0
lbaf 1 : ms:0 lbads:12 rp:0 (in use)
atr@stosys-qemu-vm:~$
It has two supported sizes, 512 bytes and 4096 bytes. The latter one is in use.
I also print this in the ZNS example code
a specific device name is passed : /dev/nvme1n1
device /dev/nvme1n1 opened successfully 3
nsze : 0x1000
ncap : 0x1000
nuse : 0x1000
nsfeat : 0x14
[4:4] : 0x1 NPWG, NPWA, NPDG, NPDA, and NOWS are Supported
[3:3] : 0 NGUID and EUI64 fields if non-zero, Reused
[2:2] : 0x1 Deallocated or Unwritten Logical Block error Supported
[1:1] : 0 Namespace uses AWUN, AWUPF, and ACWU
[0:0] : 0 Thin Provisioning Not Supported
nlbaf : 1
flbas : 0x1
[4:4] : 0 Metadata Transferred in Separate Contiguous Buffer
[3:0] : 0x1 Current LBA Format Selected
mc : 0
[1:1] : 0 Metadata Pointer Not Supported
[0:0] : 0 Metadata as Part of Extended Data LBA Not Supported
dpc : 0
[4:4] : 0 Protection Information Transferred as Last 8 Bytes of Metadata Not Supported
[3:3] : 0 Protection Information Transferred as First 8 Bytes of Metadata Not Supported
[2:2] : 0 Protection Information Type 3 Not Supported
[1:1] : 0 Protection Information Type 2 Not Supported
[0:0] : 0 Protection Information Type 1 Not Supported
dps : 0
[3:3] : 0 Protection Information is Transferred as Last 8 Bytes of Metadata
[2:0] : 0 Protection Information Disabled
nmic : 0
[0:0] : 0 Namespace Multipath Not Capable
rescap : 0
[7:7] : 0 Ignore Existing Key - Used as defined in revision 1.2.1 or earlier
[6:6] : 0 Exclusive Access - All Registrants Not Supported
[5:5] : 0 Write Exclusive - All Registrants Not Supported
[4:4] : 0 Exclusive Access - Registrants Only Not Supported
[3:3] : 0 Write Exclusive - Registrants Only Not Supported
[2:2] : 0 Exclusive Access Not Supported
[1:1] : 0 Write Exclusive Not Supported
[0:0] : 0 Persist Through Power Loss Not Supported
fpi : 0
[7:7] : 0 Format Progress Indicator Not Supported
dlfeat : 1
[4:4] : 0 Guard Field of Deallocated Logical Blocks is set to 0xFFFF
[3:3] : 0 Deallocate Bit in the Write Zeroes Command is Not Supported
[2:0] : 0x1 Bytes Read From a Deallocated Logical Block and its Metadata are 0x00
nawun : 0
nawupf : 0
nacwu : 0
nabsn : 0
nabo : 0
nabspf : 0
noiob : 0
nvmcap : 0
npwg : 0
npwa : 0
npdg : 0
npda : 0
nows : 0
mssrl : 128
mcl : 128
msrc : 127
anagrpid: 0
nsattr : 0
nvmsetid: 0
endgid : 0
nguid : 00000000000000000000000000000000
eui64 : 0000000000000000
LBA Format 0 : Metadata Size: 0 bytes - Data Size: 512 bytes - Relative Performance: 0 Best
LBA Format 1 : Metadata Size: 0 bytes - Data Size: 4096 bytes - Relative Performance: 0 Best (in use)
size of vs is 3712
number of LBA formats? 1 (a zero based value)
the LBA size is 4096
...
So, the LBA size is the minimum transfer unit. There are more transfer limits associated with the controller too, like
- Maximum Data Transfer Unit (MDTS)
- CAP.MPSMIN
atr@stosys-qemu-vm:~$ sudo nvme id-ctrl /dev/nvme0n1
NVME Identify Controller:
vid : 0x1b36
ssvid : 0x1af4
sn : 1234
mn : QEMU NVMe Ctrl
fr : 1.0
rab : 6
ieee : 525400
cmic : 0
mdts : 7
cntlid : 0
ver : 0x10400
rtd3r : 0
rtd3e : 0
oaes : 0x100
ctratt : 0
rrls : 0
cntrltype : 1
fguid :
crdt1 : 0
crdt2 : 0
crdt3 : 0
oacs : 0xa
acl : 3
aerl : 3
frmw : 0x3
lpa : 0x7
elpe : 0
npss : 0
avscc : 0
apsta : 0
wctemp : 343
cctemp : 373
mtfa : 0
hmpre : 0
hmmin : 0
tnvmcap : 0
unvmcap : 0
rpmbs : 0
edstt : 0
dsto : 0
fwug : 0
kas : 0
hctma : 0
mntmt : 0
mxtmt : 0
sanicap : 0
hmminds : 0
hmmaxd : 0
nsetidmax : 0
endgidmax : 0
anatt : 0
anacap : 0
anagrpmax : 0
nanagrpid : 0
pels : 0
sqes : 0x66
cqes : 0x44
maxcmd : 0
nn : 256
oncs : 0x15d
fuses : 0
fna : 0
vwc : 0x7
awun : 0
awupf : 0
icsvscc : 0
nwpc : 0
acwu : 0
sgls : 0x10001
mnan : 0
subnqn : nqn.2019-08.org.qemu:1234
ioccsz : 0
iorcsz : 0
icdoff : 0
fcatt : 0
msdbd : 0
ofcs : 0
ps 0 : mp:25.00W operational enlat:16 exlat:4 rrt:0 rrl:0
rwt:0 rwl:0 idle_power:- active_power:-
atr@stosys-qemu-vm:~$
See the NVMe Identify Controller data structure, figure 251, byte 77 in the 1.4 specification.
MDTS is 7; the value is in units of the minimum memory page size (CAP.MPSMIN) and is reported as a power of two (2^n).
See below: the MDTS for this device is 2^7 x 4 KiB = 128 x 4 KiB = 512 KiB.
Memory Page Size Minimum (MPSMIN): This field indicates the minimum host memory page size that the controller supports. The minimum memory page size is (2 ^ (12 + MPSMIN)). The host shall not configure a memory page size in CC.MPS that is smaller than this value.
This information is in the Offset 0h: CAP – Controller Capabilities register. See section 3.1.1 in the NVMe specification (figure 69). How do I read this register? Using the nvme-cli command:
atr@stosys-qemu-vm:~$ sudo nvme show-regs -H /dev/nvme1
cap : 4018200f0107ff
Controller Memory Buffer Supported (CMBS): The Controller Memory Buffer is Not Supported
Persistent Memory Region Supported (PMRS): The Persistent Memory Region is Not Supported
Memory Page Size Maximum (MPSMAX): 65536 bytes
Memory Page Size Minimum (MPSMIN): 4096 bytes
Boot Partition Support (BPS): No
Command Sets Supported (CSS): NVM command set is Supported
One or more I/O Command Sets are Supported
NVM Subsystem Reset Supported (NSSRS): No
Doorbell Stride (DSTRD): 4 bytes
Timeout (TO): 7500 ms
Arbitration Mechanism Supported (AMS): Weighted Round Robin with Urgent Priority Class is not supported
Contiguous Queues Required (CQR): Yes
Maximum Queue Entries Supported (MQES): 2048
version : 10400
NVMe specification 1.4
cc : 460061
I/O Completion Queue Entry Size (IOCQES): 16 bytes
I/O Submission Queue Entry Size (IOSQES): 64 bytes
Shutdown Notification (SHN): No notification; no effect
Arbitration Mechanism Selected (AMS): Round Robin
Memory Page Size (MPS): 4096 bytes
I/O Command Set Selected (CSS): All supported I/O Command Sets
Enable (EN): Yes
csts : 1
Processing Paused (PP): No
NVM Subsystem Reset Occurred (NSSRO): No
Shutdown Status (SHST): Normal operation (no shutdown has been requested)
Controller Fatal Status (CFS): False
Ready (RDY): Yes
nssr : 0
NVM Subsystem Reset Control (NSSRC): 0
intms : 0
Interrupt Vector Mask Set (IVMS): 0
intmc : 0
Interrupt Vector Mask Clear (IVMC): 0
aqa : 1f001f
Admin Completion Queue Size (ACQS): 32
Admin Submission Queue Size (ASQS): 32
asq : 11112d000
Admin Submission Queue Base (ASQB): 11112d000
acq : 111362000
Admin Completion Queue Base (ACQB): 111362000
cmbloc : 0
Controller Memory Buffer feature is not supported
cmbsz : 0
Controller Memory Buffer feature is not supported
bpinfo : 0
Boot Partition feature is not supported
bprsel : 0
Boot Partition feature is not supported
bpmbl : 0
Boot Partition feature is not supported
cmbmsc : 0
Controller Base Address (CBA): 0
Controller Memory Space Enable (CMSE): 0
Capabilities Registers Enabled (CRE): CMBLOC and CMBSZ registers are NOT enabled
cmbsts : 0
Controller Base Address Invalid (CBAI): 0
pmrcap : 0
Controller Memory Space Supported (CMSS): Referencing PMR with host supplied addresses is Not Supported
Persistent Memory Region Timeout (PMRTO): 0
Persistent Memory Region Write Barrier Mechanisms (PMRWBM): 0
Persistent Memory Region Time Units (PMRTU): PMR time unit is 500 milliseconds
Base Indicator Register (BIR): 0
Write Data Support (WDS): Write data to the PMR is not supported
Read Data Support (RDS): Read data from the PMR is not supported
pmrctl : 0
Enable (EN): PMR is Disabled
pmrsts : 0
Controller Base Address Invalid (CBAI): 0
Health Status (HSTS): Normal Operation
Not Ready (NRDY): The Persistent Memory Region is Not Ready to process PCI Express memory read and write requests
Error (ERR): 0
pmrebs : 0
PMR Elasticity Buffer Size Base (PMRWBZ): 0
Read Bypass Behavior : memory reads not conflicting with memory writes in the PMR Elasticity Buffer MAY bypass those memory writes
PMR Elasticity Buffer Size Units (PMRSZU): Bytes
pmrswtp : 0
PMR Sustained Write Throughput (PMRSWTV): 0
PMR Sustained Write Throughput Units (PMRSWTU): Bytes/second
pmrmscl : 0
Controller Base Address (CBA): 0
Controller Memory Space Enable (CMSE): 0
pmrmscu : 0
Controller Base Address (CBA): 0
atr@stosys-qemu-vm:~$
Figure 78, section 3.1.5 shows the register (CC) that can be used to change these values. I have not tried changing them yet; it does not look like it is supported directly by the nvme-cli command. For example, you can set the Memory Page Size (MPS) at bit offset 10:07.
See above, the controller register dump also shows the supported NVMe version. We have
- on Dell XPS
version : 10300 NVMe specification 1.3
- on Node 3
version : 10000 NVMe specification 1.0
- Inside the Ubuntu VM
version : 10400 NVMe specification 1.4
With the command nvme show-regs -H /dev/nvme1
(needs the character device)
Section 5.15 has a set of identify commands and associated components that should reply to them. There is CNS (figure 245) and associated values in figure 248. The two important ones are namespace (0) and controller (1). The namespace return data structure is defined in figure 249 and is 4096 bytes. The controller is in figure 251.
The values in figure 248 are not aware of the ZNS extensions. Hence, libnvme adds new non-standard extension values for CNS (in ioctl.h):
enum nvme_identify_cns {
NVME_IDENTIFY_CNS_NS = 0x00,
NVME_IDENTIFY_CNS_CTRL = 0x01,
NVME_IDENTIFY_CNS_NS_ACTIVE_LIST = 0x02,
NVME_IDENTIFY_CNS_NS_DESC_LIST = 0x03,
NVME_IDENTIFY_CNS_NVMSET_LIST = 0x04,
NVME_IDENTIFY_CNS_CSI_NS = 0x05, /* XXX: Placeholder until assigned */
NVME_IDENTIFY_CNS_CSI_CTRL = 0x06, /* XXX: Placeholder until assigned */
NVME_IDENTIFY_CNS_ALLOCATED_NS_LIST = 0x10,
NVME_IDENTIFY_CNS_ALLOCATED_NS = 0x11,
NVME_IDENTIFY_CNS_NS_CTRL_LIST = 0x12,
NVME_IDENTIFY_CNS_CTRL_LIST = 0x13,
NVME_IDENTIFY_CNS_PRIMARY_CTRL_CAP = 0x14,
NVME_IDENTIFY_CNS_SECONDARY_CTRL_LIST = 0x15,
NVME_IDENTIFY_CNS_NS_GRANULARITY = 0x16,
NVME_IDENTIFY_CNS_UUID_LIST = 0x17,
NVME_IDENTIFY_CNS_CSI_ALLOCATED_NS = 0x18, /* XXX: Placeholder until assigned */
};
You can see the new placeholders. They also use a new Command Set Identifier (CSI):
/**
* enum nvme_csi - Defined command set indicators
* @NVME_CSI_NVM: NVM Command Set Indicator
*/
enum nvme_csi {
NVME_CSI_NVM = 0,
NVME_CSI_ZNS = 2,
};
The combination of CNS and CSI is now used to identify the controller and namespace. This is how the identify command is packed
int nvme_identify(int fd, enum nvme_identify_cns cns, __u32 nsid, __u16 cntid,
__u16 nvmsetid, __u8 uuidx, __u8 csi, void *data)
{
__u32 cdw10 = NVME_SET(cntid, IDENTIFY_CDW10_CNTID) |
NVME_SET(cns, IDENTIFY_CDW10_CNS);
__u32 cdw11 = NVME_SET(nvmsetid, IDENTIFY_CDW11_NVMSETID) |
NVME_SET(csi, IDENTIFY_CDW11_CSI);
__u32 cdw14 = NVME_SET(uuidx, IDENTIFY_CDW14_UUID);
See figures 245 (DWord10) and 246 (DWord11) for packing the identify command. The funny thing is that NVME_CSI_NVM is effectively artificial since it has the value zero; hence, an all-zero DWord11 means a regular NVMe device.
Why are we doing this? To identify whether a device is a ZNS device or a regular NVMe device.
So now, there are a few combinations possible:
- Using the standard defined combinations NVME_IDENTIFY_CNS_NS and NVME_IDENTIFY_CNS_CTRL (with an implicit CSI_NVM). They return the well-defined struct nvme_id_ns (figure 249) and struct nvme_id_ctrl (figure 251) as their responses. This only works on standard NVMe devices.
- In QEMU, which supports the CSI extensions, we get two different data structures for the controller combinations:
  - NVME_IDENTIFY_CNS_CTRL with NVME_CSI_NVM = struct nvme_id_ctrl_nvm (I am not sure what this is modeled after?) (works with normal QEMU NVMe and ZNS devices)
  - NVME_IDENTIFY_CNS_CTRL with NVME_CSI_ZNS = struct nvme_zns_id_ctrl (defined in the ZNS specification at figure 10, section 3.1.2) (works with normal QEMU NVMe and ZNS devices)
- For QEMU namespaces:
  - NVME_IDENTIFY_CNS_NS with NVME_CSI_NVM = struct nvme_id_ns (I am not sure if the NVM CSI has other data structures?) (works with normal QEMU NVMe and ZNS devices)
  - NVME_IDENTIFY_CNS_NS with NVME_CSI_ZNS = struct nvme_zns_id_ns (defined in the ZNS specification at figure 10, section 3.1.2) (fails with a normal NVMe device, works with ZNS devices). The failure of the identify ZNS namespace command is how I currently detect a ZNS device.
A QEMU VM restart unconditionally re-initializes the zones inside the device. See here https://github.com/qemu/qemu/blob/master/hw/nvme/ns.c#L235
Fixing this could be a small project: there are clear shutdown and init functions that can be used to read/write the metadata and data back to the device.
atr@stosys-qemu-vm:/home/atr/src/zns-resources/zns-rw-example$ sudo nvme zns zone-append -zslba 0 -z 4096 -d ./4096slba0 -f /dev/nvme1n1
Success appended data to LBA 2
atr@stosys-qemu-vm:/home/atr/src/zns-resources/zns-rw-example$ sudo nvme zns zone-append -zslba 0 -z 4096 -d ./4096slba0 -f /dev/nvme1n1
Success appended data to LBA 3
atr@stosys-qemu-vm:/home/atr/src/zns-resources/zns-rw-example$ sudo nvme zns zone-append -zslba 0 -z 4096 -d ./4096slba0 -f /dev/nvme1n1
Success appended data to LBA 4
atr@stosys-qemu-vm:/home/atr/src/zns-resources/zns-rw-example$ sudo nvme zns zone-append -zslba 0 -z 4096 -d ./4096slba0 -f /dev/nvme1n1
Success appended data to LBA 5
atr@stosys-qemu-vm:/home/atr/src/zns-resources/zns-rw-example$ sudo nvme zns zone-append -zslba 0 -z 4096 -d ./4096slba0 -f /dev/nvme1n1
Success appended data to LBA 6
atr@stosys-qemu-vm:/home/atr/src/zns-resources/zns-rw-example$ sudo nvme zns zone-append -zslba 0 -z 4096 -d ./4096slba0 -f /dev/nvme1n1
Success appended data to LBA 7
atr@stosys-qemu-vm:/home/atr/src/zns-resources/zns-rw-example$ sudo nvme zns zone-append -zslba 0 -z 4096 -d ./4096slba0 -f /dev/nvme1n1
Success appended data to LBA 8
atr@stosys-qemu-vm:/home/atr/src/zns-resources/zns-rw-example$ sudo nvme zns report-zones /dev/nvme1n1 | less
atr@stosys-qemu-vm:/home/atr/src/zns-resources/zns-rw-example$ sudo nvme zns report-zones /dev/nvme1n1 | head -5
nr_zones: 128
SLBA: 0x0 WP: 0x9 Cap: 0x20 State: IMP_OPENED Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x20 WP: 0x20 Cap: 0x20 State: EMPTY Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x40 WP: 0x40 Cap: 0x20 State: EMPTY Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x60 WP: 0x60 Cap: 0x20 State: EMPTY Type: SEQWRITE_REQ Attrs: 0x0
atr@stosys-qemu-vm:/home/atr/src/zns-resources/zns-rw-example$
alias sudo='sudo env "PATH=$PATH" "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"'
atr@atr-XPS-13:~/vu/github/animeshtrivedi/qemu$ ./configure --target-list=x86_64-softmmu --enable-kvm --enable-linux-aio --enable-trace-backends=log --disable-werror
All QEMU NVMe config options:
atr@node3:/home/atr/src/zns-resources/scripts$ qemu-system-x86_64 -device nvme,help
nvme options:
acpi-index=<uint32> - (default: 0)
addr=<int32> - Slot and optional function number, example: 06.0 or 06 (default: -1)
aer_max_queued=<uint32> - (default: 64)
aerl=<uint8> - (default: 3)
bootindex=<int32>
cmb_size_mb=<uint32> - (default: 0)
discard_granularity=<size> - (default: 4294967295)
drive=<str> - Node name or ID of a block device to use as a backend
failover_pair_id=<str>
logical_block_size=<size> - A power of two between 512 B and 2 MiB (default: 0)
max_ioqpairs=<uint32> - (default: 64)
mdts=<uint8> - (default: 7)
min_io_size=<size> - (default: 0)
msix_qsize=<uint16> - (default: 65)
multifunction=<bool> - on/off (default: false)
num_queues=<uint32> - (default: 0)
opt_io_size=<size> - (default: 0)
physical_block_size=<size> - A power of two between 512 B and 2 MiB (default: 0)
pmrdev=<link<memory-backend>>
rombar=<uint32> - (default: 1)
romfile=<str>
romsize=<uint32> - (default: 4294967295)
serial=<str>
share-rw=<bool> - (default: false)
smart_critical_warning=<uint8>
subsys=<link<nvme-subsys>>
use-intel-id=<bool> - (default: false)
vsl=<uint8> - (default: 7)
write-cache=<OnOffAuto> - on/off/auto (default: "auto")
x-pcie-extcap-init=<bool> - on/off (default: true)
x-pcie-lnksta-dllla=<bool> - on/off (default: true)
zoned.zasl=<uint8> - (default: 0)
atr@node3:/home/atr/src/zns-resources/scripts$
With a device that is too small, db_bench fails:
atr@stosys-qemu-vm:/home/atr/src/rocksdb$ sudo ./db_bench --fs_uri=zenfs://dev:nvme1n1 --benchmarks=fillrandom --use_direct_io_for_flush_and_compaction --compression_type=none
Initializing RocksDB Options from the specified file
Initializing RocksDB Options from command-line flags
RocksDB: version 6.20
Date: Tue May 25 13:09:47 2021
CPU: 8 * Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz
CPUCache: 16384 KB
Keys: 16 bytes each (+ 0 bytes user-defined timestamp)
Values: 100 bytes each (50 bytes after compression)
Entries: 1000000
Prefix: 0 bytes
Keys per prefix: 0
RawSize: 110.6 MB (estimated)
FileSize: 62.9 MB (estimated)
Write rate: 0 bytes/second
Read rate: 0 ops/second
Compression: NoCompression
Compression sampling rate: 0
Memtablerep: skip_list
Perf Level: 1
------------------------------------------------
Initializing RocksDB Options from the specified file
Initializing RocksDB Options from command-line flags
DB path: [rocksdbtest/dbbench]
put error: IO error: No space left on device: Zone allocation failure
Now I made a device with 1 MB zones, a 4096-byte LBA size, and a 1 GB capacity; then:
atr@stosys-qemu-vm:/home/atr/src/rocksdb$ sudo ./db_bench --fs_uri=zenfs://dev:nvme1n1 --benchmarks=fillrandom --use_direct_io_for_flush_and_compaction --compression_type=none
Initializing RocksDB Options from the specified file
Initializing RocksDB Options from command-line flags
RocksDB: version 6.20
Date: Tue May 25 13:24:08 2021
CPU: 8 * Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz
CPUCache: 16384 KB
Keys: 16 bytes each (+ 0 bytes user-defined timestamp)
Values: 100 bytes each (50 bytes after compression)
Entries: 1000000
Prefix: 0 bytes
Keys per prefix: 0
RawSize: 110.6 MB (estimated)
FileSize: 62.9 MB (estimated)
Write rate: 0 bytes/second
Read rate: 0 ops/second
Compression: NoCompression
Compression sampling rate: 0
Memtablerep: skip_list
Perf Level: 1
------------------------------------------------
Initializing RocksDB Options from the specified file
Initializing RocksDB Options from command-line flags
DB path: [rocksdbtest/dbbench]
fillrandom : 2.448 micros/op 408442 ops/sec; 45.2 MB/s
Then the zone report looks as follows:
atr@stosys-qemu-vm:/home/atr/src/rocksdb$ sudo nvme zns report-zones /dev/nvme1n1
nr_zones: 1024
SLBA: 0x0 WP: 0x0 Cap: 0x100 State: EMPTY Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x100 WP: 0x142 Cap: 0x100 State: IMP_OPENED Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x200 WP: 0x200 Cap: 0x100 State: EMPTY Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x300 WP: 0x305 Cap: 0x100 State: IMP_OPENED Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x400 WP: 0x403 Cap: 0x100 State: IMP_OPENED Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x500 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x600 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x700 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x800 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x900 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0xa00 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0xb00 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0xc00 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0xd00 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0xe00 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0xf00 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x1000 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x1100 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x1200 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x1300 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x1400 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x1500 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x1600 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x1700 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x1800 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x1900 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x1a00 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x1b00 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x1c00 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x1d00 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x1e00 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x1f00 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x2000 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x2100 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x2200 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x2300 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x2400 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x2500 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x2600 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x2700 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x2800 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x2900 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x2a00 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x2b00 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x2c00 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x2d00 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x2e00 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x2f00 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x3000 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x3100 WP: 0x318d Cap: 0x100 State: IMP_OPENED Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x3200 WP: 0x3254 Cap: 0x100 State: IMP_OPENED Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x3300 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x3400 WP: 0x3400 Cap: 0x100 State: EMPTY Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x3500 WP: 0x3500 Cap: 0x100 State: EMPTY Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x3600 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x3700 WP: 0x3700 Cap: 0x100 State: EMPTY Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x3800 WP: 0x3800 Cap: 0x100 State: EMPTY Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x3900 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x3a00 WP: 0x3a00 Cap: 0x100 State: EMPTY Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x3b00 WP: 0x3b00 Cap: 0x100 State: EMPTY Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x3c00 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x3d00 WP: 0x3d00 Cap: 0x100 State: EMPTY Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x3e00 WP: 0x3e00 Cap: 0x100 State: EMPTY Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x3f00 WP: 0x3f00 Cap: 0x100 State: EMPTY Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x4000 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x4100 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x4200 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x4300 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x4400 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x4500 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x4600 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x4700 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x4800 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x4900 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x4a00 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x4b00 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x4c00 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x4d00 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x4e00 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x4f00 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x5000 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x5100 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x5200 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x5300 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x5400 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x5500 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x5600 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x5700 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x5800 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x5900 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x5a00 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x5b00 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x5c00 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x5d00 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x5e00 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x5f00 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x6000 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x6100 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x6200 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x6300 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x6400 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x6500 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x6600 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x6700 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x6800 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x6900 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x6a00 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x6b00 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x6c00 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x6d00 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x6e00 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x6f00 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x7000 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x7100 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x7200 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x7300 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x7400 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x7500 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x7600 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x7700 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x7800 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x7900 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x7a00 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x7b00 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x7c00 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x7d00 WP: 0x7d00 Cap: 0x100 State: EMPTY Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x7e00 WP: 0x7e00 Cap: 0x100 State: EMPTY Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x7f00 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x8000 WP: 0x8000 Cap: 0x100 State: EMPTY Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x8100 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x8200 WP: 0x8200 Cap: 0x100 State: EMPTY Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x8300 WP: 0x8300 Cap: 0x100 State: EMPTY Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x8400 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x8500 WP: 0x8500 Cap: 0x100 State: EMPTY Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x8600 WP: 0x8600 Cap: 0x100 State: EMPTY Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x8700 WP: 0xffffffffffffffff Cap: 0x100 State: FULL Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x8800 WP: 0x8800 Cap: 0x100 State: EMPTY Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x8900 WP: 0x8900 Cap: 0x100 State: EMPTY Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x8a00 WP: 0x8a00 Cap: 0x100 State: EMPTY Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x8b00 WP: 0x8b00 Cap: 0x100 State: EMPTY Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x8c00 WP: 0x8c00 Cap: 0x100 State: EMPTY Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x8d00 WP: 0x8d00 Cap: 0x100 State: EMPTY Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x8e00 WP: 0x8e00 Cap: 0x100 State: EMPTY Type: SEQWRITE_REQ Attrs: 0x0
SLBA: 0x8f00 WP: 0x8f00 Cap: 0x100 State: EMPTY Type: SEQWRITE_REQ Attrs: 0x0
... up to 1024 zones (not shown)
...
- https://github.com/hgst/libnvme -- fork from SPDK; needs hugepages and probably a true direct-access API (no support for ZNS)
- https://github.com/linux-nvme/libnvme/ -- using the driver passthrough ioctl interface
- https://blogs.oracle.com/linux/how-to-emulate-block-devices-with-qemu QEMU device emulation
Compiling xNVMe: to disable the SPDK backend, use ./configure --disable-be-spdk ; see https://xnvme.io/docs/latest/getting_started/
To check which NUMA node a PCIe device is attached to:
atr@node1:/home/atr/tmp/libnvme$ sudo cat /sys/class/nvme/nvme0/numa_node
1
atr@node1:/home/atr/tmp/libnvme$
On node3
$ sudo nvme show-regs -H /dev/nvme1n1
cap : 2004010fff
Controller Memory Buffer Supported (CMBS): The Controller Memory Buffer is Not Supported
Persistent Memory Region Supported (PMRS): The Persistent Memory Region is Not Supported
Memory Page Size Maximum (MPSMAX): 4096 bytes
Memory Page Size Minimum (MPSMIN): 4096 bytes
Boot Partition Support (BPS): No
Command Sets Supported (CSS): NVM command set is Supported
One or more I/O Command Sets are Not Supported
NVM Subsystem Reset Supported (NSSRS): No
Doorbell Stride (DSTRD): 4 bytes
Timeout (TO): 2000 ms
Arbitration Mechanism Supported (AMS): Weighted Round Robin with Urgent Priority Class is not supported
Contiguous Queues Required (CQR): Yes
Maximum Queue Entries Supported (MQES): 4096
[...]
$ sudo nvme id-ctrl -H /dev/nvme1n1
version : 10000
NVMe specification 1.0
cc : 460001
I/O Completion Queue Entry Size (IOCQES): 16 bytes
I/O Submission Queue Entry Size (IOSQES): 64 bytes
Shutdown Notification (SHN): No notification; no effect
Arbitration Mechanism Selected (AMS): Round Robin
Memory Page Size (MPS): 4096 bytes
I/O Command Set Selected (CSS): NVM Command Set
Enable (EN): Yes
NVME Identify Controller:
vid : 0x8086
ssvid : 0x8086
sn : PHM203410051280AGN
mn : INTEL SSDPE21D280GA
fr : E2010480
rab : 0
ieee : 5cd2e4
cmic : 0
[3:3] : 0 ANA not supported
[2:2] : 0 PCI
[1:1] : 0 Single Controller
[0:0] : 0 Single Port
mdts : 5
[...]
With MDTS being 5, the maximum transfer size is 4096 (MPSMIN) * 2^5 (MDTS) = 128 KB. Also remember that the LBA size is 512 bytes on this Optane drive.
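MDTS is a power-of-two exponent in units of the minimum memory page size (CAP.MPSMIN), so the limit can be computed as a quick sanity check:

```python
def max_transfer_bytes(mdts, mpsmin_bytes=4096):
    """Maximum data transfer size per command: MPSMIN * 2^MDTS."""
    return mpsmin_bytes << mdts

# Optane above: MDTS=5, MPSMIN=4096 -> 128 KiB
print(max_transfer_bytes(5))         # 131072
# With a 512-byte LBA size, that is 256 blocks per command
print(max_transfer_bytes(5) // 512)  # 256
# The QEMU default used later: MDTS=7 -> 512 KiB
print(max_transfer_bytes(7))         # 524288
```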
Testing the 128 KB limit:
atr@node3:~$ DD=256; sudo nvme read /dev/nvme1n1 -s 0x100 -b $(($DD -1)) -z $(($DD * 512)) -d 512KB
read: Success
atr@node3:~$ DD=256; sudo nvme write /dev/nvme1n1 -s 0x100 -b $(($DD -1)) -z $(($DD * 512)) -d 512KB
write: Success
atr@node3:~$ DD=257; sudo nvme write /dev/nvme1n1 -s 0x100 -b $(($DD -1)) -z $(($DD * 512)) -d 512KB
submit-io: Invalid argument
atr@node3:~$ DD=257; sudo nvme read /dev/nvme1n1 -s 0x100 -b $(($DD -1)) -z $(($DD * 512)) -d 512KB
submit-io: Invalid argument
atr@node3:~$
So 256 blocks * 512 bytes = 128 KB works, while 257 blocks fails, confirming the limit.
Now inside the VM with the ZNS device:
cap : 4018200f0107ff
Controller Memory Buffer Supported (CMBS): The Controller Memory Buffer is Not Supported
Persistent Memory Region Supported (PMRS): The Persistent Memory Region is Not Supported
Memory Page Size Maximum (MPSMAX): 65536 bytes
Memory Page Size Minimum (MPSMIN): 4096 bytes
Boot Partition Support (BPS): No
Command Sets Supported (CSS): NVM command set is Supported
One or more I/O Command Sets are Supported
NVM Subsystem Reset Supported (NSSRS): No
Doorbell Stride (DSTRD): 4 bytes
Timeout (TO): 7500 ms
Arbitration Mechanism Supported (AMS): Weighted Round Robin with Urgent Priority Class is not supported
Contiguous Queues Required (CQR): Yes
Maximum Queue Entries Supported (MQES): 2048
version : 10400
NVMe specification 1.4
cc : 460061
I/O Completion Queue Entry Size (IOCQES): 16 bytes
I/O Submission Queue Entry Size (IOSQES): 64 bytes
Shutdown Notification (SHN): No notification; no effect
Arbitration Mechanism Selected (AMS): Round Robin
Memory Page Size (MPS): 4096 bytes
I/O Command Set Selected (CSS): All supported I/O Command Sets
Enable (EN): Yes
[...]
And the id-ctrl output:
NVME Identify Controller:
vid : 0x1b36
ssvid : 0x1af4
sn : zns-dev
mn : QEMU NVMe Ctrl
fr : 1.0
rab : 6
ieee : 525400
cmic : 0
[3:3] : 0 ANA not supported
[2:2] : 0 PCI
[1:1] : 0 Single Controller
[0:0] : 0 Single Port
mdts : 7
cntlid : 0
ver : 0x10400
rtd3r : 0
rtd3e : 0
oaes : 0x100
The default QEMU MDTS is 7, so here the maximum I/O size is 4096 * 2^7 = 512 KB. The LBA size in use here is 4096 bytes.
This is a normal (non-zoned) NVMe device; note the seemingly random pattern of failures:
atr@stosys-qemu-vm:/home/atr/new/zns-resources/stosys-class/stosys-project-code$ DD=128; for((i=0;i<10;i++)); do sudo nvme write /dev/nvme1n1 -s 0x0 -b $(($DD -1)) -z $(($DD * 4096)) -d 512KB; done
write: Success
submit-io: Invalid argument
submit-io: Invalid argument
submit-io: Invalid argument
submit-io: Invalid argument
write: Success
write: Success
write: Success
write: Success
write: Success
atr@stosys-qemu-vm:/home/atr/new/zns-resources/stosys-class/stosys-project-code$ DD=128; for((i=0;i<10;i++)); do sudo nvme write /dev/nvme1n1 -s 0x0 -b $(($DD -1)) -z $(($DD * 4096)) -d 512KB; done
submit-io: Invalid argument
write: Success
write: Success
write: Success
write: Success
write: Success
write: Success
write: Success
submit-io: Invalid argument
submit-io: Invalid argument
atr@stosys-qemu-vm:/home/atr/new/zns-resources/stosys-class/stosys-project-code$ DD=128; for((i=0;i<10;i++)); do sudo nvme write /dev/nvme1n1 -s 0x0 -b $(($DD -1)) -z $(($DD * 4096)) -d 512KB; done
write: Success
submit-io: Invalid argument
submit-io: Invalid argument
submit-io: Invalid argument
submit-io: Invalid argument
submit-io: Invalid argument
write: Success
write: Success
write: Success
write: Success
atr@stosys-qemu-vm:/home/atr/new/zns-resources/stosys-class/stosys-project-code$
A similar story holds on the zoned devices, but with a zone reset in between. On the actual Optane NVMe it always works properly.
atr@atr-xps-13:~/vu/github/atr-zns-resources/stosys-class/stosys-project-code/src/m3$ make
g++ -std=c++11 -faligned-new -DHAVE_ALIGNED_NEW -DROCKSDB_PLATFORM_POSIX -DROCKSDB_LIB_IO_POSIX -DOS_LINUX -fno-builtin-memcmp -DROCKSDB_FALLOCATE_PRESENT -DGFLAGS=1 -DZLIB -DNUMA -DROCKSDB_MALLOC_USABLE_SIZE -DROCKSDB_PTHREAD_ADAPTIVE_MUTEX -DROCKSDB_BACKTRACE -DROCKSDB_RANGESYNC_PRESENT -DROCKSDB_SCHED_GETCPU_PRESENT -DROCKSDB_AUXV_GETAUXVAL_PRESENT -march=native -DHAVE_SSE42 -DHAVE_PCLMUL -DHAVE_AVX2 -DHAVE_BMI -DHAVE_LZCNT -DHAVE_UINT128_EXTENSION -DROCKSDB_SUPPORT_THREAD_LOCAL -isystem third-party/gtest-1.8.1/fused-src -isystem ./third-party/folly -I/usr/local/include -I/home/atr/vu/github/storage/rocksdb/ -L/home/atr/local/lib/ -L/home/atr/local/usr/local/lib/ -Wl,-rpath=/home/atr/local/lib -o m3 src/m3_main.o -L/usr/local/lib -ldl -lrocksdb -lpthread -lrt -ldl -lgflags -lz -lnuma -lzstd -lbz2 -llz4 -lsnappy -u m3_leveldb_reg
/usr/bin/ld: cannot find -lzstd
/usr/bin/ld: cannot find -lbz2
/usr/bin/ld: cannot find -llz4
/usr/bin/ld: cannot find -lsnappy
collect2: error: ld returned 1 exit status
make: *** [Makefile:17: m3] Error 1
atr@atr-xps-13:~/vu/github/atr-zns-resources/stosys-class/stosys-project-code/src/m3$
RocksDB needs these libraries:
sudo apt install autoconf libgflags-dev libtool autoconf-archive
sudo apt-get install libsnappy-dev zlib1g-dev libbz2-dev liblz4-dev libzstd-dev
see this: https://github.com/facebook/rocksdb/blob/main/INSTALL.md#dependencies
In case of missing typeinfo references like:
atr@atr-xps-13:~/vu/github/atr-zns-resources/stosys-class/stosys-project-code$ g++ -std=c++11 -frtti -faligned-new -DHAVE_ALIGNED_NEW -DROCKSDB_PLATFORM_POSIX -DROCKSDB_LIB_IO_POSIX -DOS_LINUX -fno-builtin-memcmp -DROCKSDB_FALLOCATE_PRESENT -DGFLAGS=1 -DZLIB -DNUMA -DROCKSDB_MALLOC_USABLE_SIZE -DROCKSDB_PTHREAD_ADAPTIVE_MUTEX -DROCKSDB_BACKTRACE -DROCKSDB_RANGESYNC_PRESENT -DROCKSDB_SCHED_GETCPU_PRESENT -DROCKSDB_AUXV_GETAUXVAL_PRESENT -march=native -DHAVE_SSE42 -DHAVE_PCLMUL -DHAVE_AVX2 -DHAVE_BMI -DHAVE_LZCNT -DHAVE_UINT128_EXTENSION -DROCKSDB_SUPPORT_THREAD_LOCAL -isystem third-party/gtest-1.8.1/fused-src -isystem ./third-party/folly -I/usr/local/include -I/home/atr/vu/github/storage/rocksdb/ -L/home/atr/local/lib/ -L/home/atr/local/usr/local/lib/ -Wl,-rpath=/home/atr/local/lib CMakeFiles/m3.dir/src/m3/src/m3_main.cpp.o CMakeFiles/m3.dir/src/m3/src/m3.cpp.o -o bin/m3 -L/usr/local/lib -ldl -lrocksdb -lpthread -lrt -ldl -lgflags -lz -lnuma -lzstd -lbz2 -llz4 -lsnappy -u m3_leveldb_reg
/usr/bin/ld: CMakeFiles/m3.dir/src/m3/src/m3.cpp.o:(.data.rel.ro._ZTIN7rocksdb2M3E[_ZTIN7rocksdb2M3E]+0x10): undefined reference to `typeinfo for rocksdb::FileSystem'
/usr/bin/ld: CMakeFiles/m3.dir/src/m3/src/m3.cpp.o:(.data.rel.ro._ZTIN7rocksdb17FileSystemWrapperE[_ZTIN7rocksdb17FileSystemWrapperE]+0x10): undefined reference to `typeinfo for rocksdb::FileSystem'
collect2: error: ld returned 1 exit status
atr@atr-xps-13:~/vu/github/atr-zns-resources/stosys-class/stosys-project-code$
Use the RTTI flag to compile RocksDB:
DEBUG_LEVEL=0 USE_RTTI=1 DESTDIR=/home/atr/local/ make -j 4 db_bench install
- https://github.com/facebook/rocksdb/issues/4329
- https://stackoverflow.com/questions/307352/g-undefined-reference-to-typeinfo
- https://github.com/facebook/rocksdb/issues/4903
atr@stosys-qemu-vm:~$ sudo nvme list-ns /dev/nvme1 -H
list-ns: unrecognized option '-H'
Usage: nvme list-ns <device> [OPTIONS]
For the specified controller handle, show the namespace list in the
associated NVMe subsystem, optionally starting with a given nsid.
Options:
[ --namespace-id=<NUM>, -n <NUM> ] --- first nsid returned list should
start from
[ --csi=<NUM>, -y <NUM> ] --- I/O command set identifier
[ --all, -a ] --- show all namespaces in the
subsystem, whether attached or
inactive
atr@stosys-qemu-vm:~$ sudo nvme list-ns /dev/nvme1 -a
[ 0]:0x1
atr@stosys-qemu-vm:~$
atr@stosys-qemu-vm:~$ sudo nvme id-ctrl /dev/nvme1 -H | grep 'NS Management'
[3:3] : 0x1 NS Management and Attachment Supported
atr@stosys-qemu-vm:~$
See Figure 251 (field OACS) in the NVMe 1.4 specification.
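The `-H` output above already decodes this, but the same check can be done on a raw OACS value: namespace management support is bit 3 of OACS per NVMe 1.4, Figure 251. A sketch (the 0x0008 value is a hypothetical example, not from this device's output):

```python
def supports_ns_mgmt(oacs):
    """OACS bit 3 = Namespace Management and Attachment supported
    (NVMe 1.4 spec, Figure 251)."""
    return bool((oacs >> 3) & 1)

print(supports_ns_mgmt(0x0008))  # True  (bit 3 set)
print(supports_ns_mgmt(0x0000))  # False
```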
atr@node1:/home/atr/zns-fw$ sudo nvme id-ctrl /dev/nvme1 | grep cntlid
cntlid : 0
atr@node1:/home/atr/zns-fw$
atr@stosys-qemu-vm:~$ sudo nvme detach-ns /dev/nvme1 -n 1 -c 0
NVMe status: INVALID_FIELD: A reserved coded value or an unsupported value in a defined field(0x4002)
atr@stosys-qemu-vm:~$
Trying to detach a namespace (the delete-ns help says you should detach it first):
atr@stosys-qemu-vm:~$ sudo nvme delete-ns -h
Usage: nvme delete-ns <device> [OPTIONS]
Delete the given namespace by sending a namespace management command to the
provided device. All controllers should be detached from the namespace prior
to namespace deletion. A namespace ID becomes inactive when that namespace
is detached or, if the namespace is not already inactive, once deleted.
Options:
[ --namespace-id=<NUM>, -n <NUM> ] --- namespace to delete
[ --timeout=<NUM>, -t <NUM> ] --- timeout value, in milliseconds
atr@stosys-qemu-vm:~$ sudo nvme delete-ns /dev/nvme1 -n 1
NVMe status: INVALID_OPCODE: The associated command opcode field is not valid(0x4001)
https://www.ibm.com/docs/en/linux-on-systems?topic=drive-deleting-stray-nvme-namespaces-nvme
atr@node1:/home/atr/zns-fw$ ll
total 7120
drwxrwxr-x 3 atr atr 4096 Nov 2 09:38 ./
drwxr-xr-x 8 atr atr 4096 Nov 2 09:38 ../
-rw-rw-r-- 1 atr atr 3645440 Oct 14 13:10 borabora_zns_GZ_R6Z10011.vpkg
-rw-rw-r-- 1 atr atr 3619845 Oct 27 08:27 borabora_zns_GZ_R6Z10011.zip
-rw-rw-r-- 1 atr atr 355 Apr 9 2021 create_zns.sh
-rw-rw-r-- 1 atr atr 372 Oct 27 08:08 load_fw.sh
drwxrwxr-x 2 atr atr 4096 Nov 2 09:38 Previous/
-rw-rw-r-- 1 atr atr 491 Oct 27 08:27 readme.txt
atr@node1:/home/atr/zns-fw$ cat readme.txt
Disclaimer: Firmware binary is shared under NDA and confidential.
How to apply the firmware:
1. Copy files to server
2. Load firmware. All device namespaces will be deleted.
sudo ./load_fw.sh /dev/nvmeX borabora_zns_GZ_R6ZXXXXX.vpdk
3. Cold reboot the system
Note that if namespaces are deleted, the drive will not be visible by "nvme list" until it has namespaces recreated.
To create a single large zoned namespace, the command ./create_zns.sh /dev/nvmeX may be used.
atr@node1:/home/atr/zns-fw$ cat load_fw.sh
#!/bin/sh
#nvme delete-ns $1 -n 0xffffffff
#nvme format $1 -n 0xffffffff -l 2 -f
sleep 1
nvme fw-download $1 -f $2
sleep 1
nvme fw-activate $1 -s 1 -a 0
sleep 1
nvme fw-download $1 -f $2
sleep 1
nvme fw-activate $1 -s 2 -a 0
sleep 1
nvme fw-download $1 -f $2
sleep 1
nvme fw-activate $1 -s 3 -a 0
sleep 1
nvme fw-download $1 -f $2
sleep 1
nvme fw-activate $1 -s 4 -a 1
atr@node1:/home/atr/zns-fw$ sudo nvme detach-ns /dev/nvme1
nvme1 nvme1n1 nvme1n2
atr@node1:/home/atr/zns-fw$ sudo nvme detach-ns /dev/nvme1 -n 1
NVMe status: CONTROLLER_LIST_INVALID: The controller list provided is invalid(0x611c)
atr@node1:/home/atr/zns-fw$ sudo nvme id-ctrl /dev/nvme1 | grep cntlid
cntlid : 0
atr@node1:/home/atr/zns-fw$ #sudo nvme detach-ns /dev/nvme1 -n 1 -c 0 C
atr@node1:/home/atr/zns-fw$ sudo nvme list-ns /dev/nvme1 -a
[ 0]:0x1
[ 1]:0x2
atr@node1:/home/atr/zns-fw$ sudo nvme detach-ns /dev/nvme1 -n 1 -c 0
detach-ns: Success, nsid:1
atr@node1:/home/atr/zns-fw$ sudo nvme detach-ns /dev/nvme1 -n 2 -c 0
detach-ns: Success, nsid:2
atr@node1:/home/atr/zns-fw$ chmod +x load_fw.sh
atr@node1:/home/atr/zns-fw$ sudo ./load_fw.sh /dev/nvme
nvme0 nvme0n1 nvme1
atr@node1:/home/atr/zns-fw$ sudo ./load_fw.sh /dev/nvme^C
atr@node1:/home/atr/zns-fw$ sudo nvme list-ns /dev/nvme1 -a
[ 0]:0x1
[ 1]:0x2
atr@node1:/home/atr/zns-fw$ sudo nvme delete-ns /dev/nvme1 -n 1
delete-ns: Success, deleted nsid:1
atr@node1:/home/atr/zns-fw$ sudo nvme delete-ns /dev/nvme1 -n 2
delete-ns: Success, deleted nsid:2
atr@node1:/home/atr/zns-fw$ sudo nvme list-ns /dev/nvme1 -a
atr@node1:/home/atr/zns-fw$ sudo ./load_fw.sh /dev/nvme
nvme0 nvme0n1 nvme1
atr@node1:/home/atr/zns-fw$ sudo ./load_fw.sh /dev/nvme1 ./borabora_zns_GZ_R6Z10011.vpkg
Firmware download success
NVMe status: FIRMWARE_SLOT: The firmware slot indicated is invalid or read only. This error is indicated if the firmware slot exceeds the number supported(0x6106)
Firmware download success
Success committing firmware action:0 slot:2
Firmware download success
Success committing firmware action:0 slot:3
Firmware download success
Success committing firmware action:1 slot:4
atr@node1:/home/atr/zns-fw$ sudo sync
atr@node1:/home/atr/zns-fw$ sudo sync
atr@node1:/home/atr/zns-fw$ sudo reboot
The NVMe 1.4 specification covers the completion queue entry status in Section 4.6; Section 4.6.1.2.1 lists the generic error codes in Figure 128.
Examples
atr@stosys-qemu-vm:/home/atr/msc-stosys-framework$ sudo nvme write /dev/nvme0n1 -s 0 -c 0 -z 4096 -d ./4kb
NVMe status: ZONE_INVALID_WRITE: The write to zone was not at the write pointer offset(0x41bc)
atr@stosys-qemu-vm:/home/atr/msc-stosys-framework$
Here the 0x41 part indicates a command-specific status code type (with the DNR bit set), and the 0xBC part is the ZNS-specific status code.
The 2.0 version of the specification has these details in Section 3.3.3.2.1.
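The status values printed by nvme-cli are the 15-bit CQE status field: bits 7:0 are the Status Code (SC), bits 10:8 the Status Code Type (SCT, 0 = generic, 1 = command specific), and bit 14 the Do Not Retry (DNR) flag. A small decoder as a sketch:

```python
def decode_status(status):
    """Decode the 15-bit NVMe status field as printed by nvme-cli."""
    return {
        "sc":   status & 0xFF,             # Status Code
        "sct":  (status >> 8) & 0x7,       # Status Code Type
        "crd":  (status >> 11) & 0x3,      # Command Retry Delay
        "more": bool(status & (1 << 13)),  # More info in log page
        "dnr":  bool(status & (1 << 14)),  # Do Not Retry
    }

print(decode_status(0x41bc))  # sct=1 (command specific), sc=0xbc -> ZONE_INVALID_WRITE
print(decode_status(0x4002))  # sct=0 (generic), sc=0x02 -> INVALID_FIELD
```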
atr@node3:~$ sudo nvme format /dev/nvme1n2 -lbaf 0
You are about to format nvme1n2, namespace 0x2.
Namespace nvme1n2 has parent controller(s):nvme1
WARNING: Format may irrevocably delete this device's data.
You have 10 seconds to press Ctrl-C to cancel this operation.
Use the force [--force|-f] option to suppress this warning.
Sending format operation ...
NVMe status: INVALID_FORMAT: The LBA Format specified is not supported. This may be due to various conditions(0x610a)
atr@node3:~$ sudo nvme id-ctrl /dev/nvme1 |grep tnvmcap
tnvmcap : 7924214661120
atr@node3:~$ sudo nvme id-ctrl /dev/nvme1 |grep unvmcap
unvmcap : 0
atr@node3:~$ sudo nvme delete-ns /dev/nvme1 -n 1
delete-ns: Success, deleted nsid:1
atr@node3:~$ sudo nvme delete-ns /dev/nvme1 -n 2
delete-ns: Success, deleted nsid:2
atr@node3:~$ sudo nvme create-ns /dev/nvme1 -s 8388608 -c 8388608 -b 512 --csi=0
create-ns: Success, created nsid:1
atr@node3:~$ sudo nvme create-ns /dev/nvme1 -s 15468593152 -c 15468593152 -b 512 --csi=2
create-ns: Success, created nsid:2
atr@node3:~$ sudo nvme create-ns /dev/nvme1 -s 15468593152 -c 15468593152 -b 512 --csi=2^C
atr@node3:~$ sudo nvme attach-ns /dev/nvme1 -n 1 -c 0
attach-ns: Success, nsid:1
atr@node3:~$ sudo nvme attach-ns /dev/nvme1 -n 2 -c 0
attach-ns: Success, nsid:2
atr@node3:~$ sudo ./nvme attach-ns /dev/nvme1 -n 1 -c 0^C
atr@node3:~$ sudo nvme list
Node SN Model Namespace Usage Format FW Rev
--------------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1 PHM20341005S280AGN INTEL SSDPE21D280GA 1 280.07 GB / 280.07 GB 512 B + 0 B E2010480
/dev/nvme1n1 21123U900167 WZS4C8T4TDSP303 1 0.00 B / 4.29 GB 512 B + 0 B R6Z0701D
/dev/nvme1n2 21123U900167 WZS4C8T4TDSP303 2 0.00 B / 7.92 TB 512 B + 0 B R6Z0701D
atr@node3:~$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 55.4M 1 loop /snap/core18/2128
loop1 7:1 0 55.5M 1 loop /snap/core18/2246
loop2 7:2 0 61.9M 1 loop /snap/core20/1169
loop4 7:4 0 61.8M 1 loop /snap/core20/1081
loop5 7:5 0 32.5M 1 loop /snap/snapd/13640
loop6 7:6 0 67.2M 1 loop /snap/lxd/21835
loop7 7:7 0 32.4M 1 loop /snap/snapd/13270
loop8 7:8 0 67.2M 1 loop /snap/lxd/21803
sda 8:0 0 447.1G 0 disk
├─sda1 8:1 0 1M 0 part
├─sda2 8:2 0 50G 0 part /boot
├─sda3 8:3 0 200G 0 part /
└─sda4 8:4 0 197.1G 0 part
nvme0n1 259:0 0 260.9G 0 disk
├─nvme0n1p1 259:3 0 150G 0 part /mnt/xfs
└─nvme0n1p2 259:4 0 110.8G 0 part
nvme1n1 259:1 0 4G 0 disk
nvme1n2 259:2 0 7.2T 0 disk
atr@node3:~$
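nvme create-ns takes -s/-c in logical blocks of the chosen block size (-b). A quick sanity check that the block counts passed above match the sizes nvme list reports:

```python
def ns_bytes(nblocks, block_size=512):
    """Namespace size in bytes from a block count, as passed to `nvme create-ns -s`."""
    return nblocks * block_size

print(ns_bytes(8388608))      # 4294967296  -> the 4.29 GB conventional namespace
print(ns_bytes(15468593152))  # 7919919693824 -> the ~7.92 TB zoned namespace
```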
The packaged mkfs.f2fs is old, so it needs to be updated. The required changes were merged upstream in April 2020: https://www.mail-archive.com/[email protected]/msg17381.html and https://www.mail-archive.com/[email protected]/msg17379.html
mkfs.f2fs utilities: https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs-tools.git/about/
git clone git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs-tools.git
cd f2fs-tools/
./autogen.sh
./configure --prefix=/home/atr/local/
make -j
If the conventional device is not large enough:
atr@node1:/home/atr/src/f2fs-tools$ sudo /home/atr/local/sbin/mkfs.f2fs -l atrf2fs -f -m -c /dev/nvme1n2 /dev/nvme1n1 # conventional and then the zone device
F2FS-tools: mkfs.f2fs Ver: 1.14.0 (2021-09-28)
Info: Disable heap-based policy
Info: Debug level = 0
Info: Label = atrf2fs
Info: Trim is enabled
Info: Host-managed zoned block device:
3688 zones, 0 randomly writeable zones
524288 blocks per zone
Info: Segments per section = 1024
Info: Sections per zone = 1
Info: sector size = 512
Info: total sectors = 15476981760 (7557120 MB)
Info: zone aligned segment0 blkaddr: 524288
Error: Conventional device /dev/nvme1n1 is too small, (14336 MiB needed).
Error: Failed to prepare a super block!!!
Error: Could not format the device!!!
If the sector sizes do not match:
atr@node1:/home/atr/src/f2fs-tools$ sudo /home/atr/local/sbin/mkfs.f2fs -l atrf2fs -f -m -c /dev/nvme1n2 /dev/nvme0n1
F2FS-tools: mkfs.f2fs Ver: 1.14.0 (2021-09-28)
Info: Disable heap-based policy
Info: Debug level = 0
Info: Label = atrf2fs
Info: Trim is enabled
Error: Different sector sizes!!!
Then I reformatted the namespace to 512-byte sectors; now:
atr@node1:/home/atr/src/f2fs-tools$ sudo /home/atr/local/sbin/mkfs.f2fs -l atrf2fs -f -m -c /dev/nvme1n2 /dev/nvme0n1
F2FS-tools: mkfs.f2fs Ver: 1.14.0 (2021-09-28)
Info: Disable heap-based policy
Info: Debug level = 0
Info: Label = atrf2fs
Info: Trim is enabled
Info: Host-managed zoned block device:
3688 zones, 0 randomly writeable zones
524288 blocks per zone
/dev/nvme0n1 appears to contain an existing filesystem (ext4).
Info: Segments per section = 1024
Info: Sections per zone = 1
Info: sector size = 512
Info: total sectors = 16015595440 (7820114 MB)
Info: zone aligned segment0 blkaddr: 742134
Info: format version with
"Linux version 5.12.0+ (atr@node3) (gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #6 SMP Tue Apr 27 14:01:35 UTC 2021"
Info: [/dev/nvme0n1] Discarding device
Info: This device doesn't support BLKSECDISCARD
Info: Discarded 267090 MB
Info: [/dev/nvme1n2] Discarding device
Info: Discarded 7553024 MB
Info: Overprovision ratio = 2.290%
Info: Overprovision segments = 184707 (GC reserved = 97624)
Info: format successful
Then mount it:
atr@node1:/home/atr/src/f2fs-tools$ sudo mount -t f2fs /dev/nvme0n1 /home/atr/mnt-f2fs/