ZFS FAQ - MidnightBSD/src GitHub Wiki

How do I remount a filesystem in read/write mode?

zfs set readonly=off zroot

How do I mount all filesystems?

zfs mount -a

What can cause some filesystems not to mount at startup?

There are a few possibilities.

  1. canmount=off property is set on the filesystem
  2. The mountpoint directory is not empty. e.g. for tank/foo/bar, if a non-empty directory called bar already exists under tank/foo's mountpoint when tank/foo is mounted, tank/foo/bar will fail to mount.
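Both cases can usually be diagnosed by inspecting the dataset's properties and its mountpoint directory (the dataset and path names below are illustrative):

```shell
# Check whether canmount is disabled and where the dataset should mount
zfs get canmount,mountpoint tank/foo/bar

# With the parent mounted, look for a pre-existing non-empty directory
# occupying the child's mountpoint
ls -la /tank/foo/bar
```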

Recovering EFI boot partition

If your boot loader gets corrupted, you can boot from MidnightBSD install media for your release and do the following from the live CD functionality.

(replace nvd0 with your disk. EFI should be on the first partition)

(more destructive; can wipe custom configurations here)

gpart bootcode -p /boot/boot1.efifat -i 1 nvd0

In some cases, you can simply copy loader.efi after mounting the EFI partition:

mount_msdosfs /dev/nvd0p1 /mnt
cp /boot/loader.efi /mnt/efi/boot/BOOTx64.efi

ZFS not mounting all partitions at boot

When upgrading from 3.2.x to 4.x, sometimes ZFS filesystems won't mount at boot.

The default location of zpool.cache changed from /boot/zfs/zpool.cache to /etc/zfs/zpool.cache. This is often the root cause.

Compare the two files using zdb

zdb -CU /etc/zfs/zpool.cache
zdb -CU /boot/zfs/zpool.cache
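A quick way to compare the two is to capture each zdb output and diff them (this assumes both cache files exist; temp file paths are illustrative):

```shell
# Dump both caches to text and compare; no diff output means they agree
zdb -CU /etc/zfs/zpool.cache > /tmp/etc-cache.txt
zdb -CU /boot/zfs/zpool.cache > /tmp/boot-cache.txt
diff /tmp/etc-cache.txt /tmp/boot-cache.txt
```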

You can sometimes fix this by forcing the pool to use the new location:

zpool set cachefile=/etc/zfs/zpool.cache tank

See also https://lists.freebsd.org/pipermail/freebsd-questions/2021-April/293823.html

NFS

Setup NFS shares

zfs set sharenfs="-maproot=0 -network 192.168.0.0/24" zroot/my/path

service mountd restart

Verify configuration

zfs get sharenfs zroot/my/path

Note that any entries from sharenfs commands get placed in /etc/zfs/exports
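After restarting mountd, the export can also be verified from the NFS side (localhost here stands in for the server's hostname):

```shell
# Show what mountd is actually exporting
showmount -e localhost

# Entries generated by sharenfs land in this file
cat /etc/zfs/exports
```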

Setting up multiple subnets

Note the new lines INSIDE the quotes

zfs set sharenfs="-maproot=root -network 10.0.0.0/24 
> /path/to/mountpoint -maproot=root -network 192.168.0.0/24 
> /path/to/mountpoint -maproot=root -network 172.16.0.0/24" pool0/space

Setup a ZFS mirror on existing zfs root

See this page https://dan.langille.org/2019/10/15/creating-a-mirror-from-your-zroot/

Effectively, you need to replicate the partition table using gpart backup / restore, add the zfs partition to the pool, and set it up as bootable either with the mnbsd-boot partition or the efi partition (assumes gpt) as necessary.

MidnightBSD partition types are mnbsd-boot and mnbsd-zfs on GPT.
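A condensed sketch of those steps, assuming the existing disk is ada0, the new disk is ada1, the pool is zroot, and the ZFS partition is index 3 (adjust disk names, partition indexes, and boot code paths for your system; the gptzfsboot/pmbr paths follow the usual BSD layout):

```shell
# Replicate the partition table from the existing disk onto the new one
gpart backup ada0 | gpart restore -F ada1

# Install boot code on the new disk (BIOS boot shown; use the EFI
# recovery steps above instead on an EFI system)
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada1

# Attach the new disk's ZFS partition to form a mirror with the old one
zpool attach zroot ada0p3 ada1p3

# Watch the resilver complete before relying on the mirror
zpool status zroot
```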

Tuning

i386 Tuning

Typically, you need to increase vm.kmem_size_max and vm.kmem_size (with vm.kmem_size_max >= vm.kmem_size) to avoid kernel panics. If you need to extend them beyond 512M, you need to recompile your kernel with increased KVA_PAGES in kernel config for vm.kmem_size beyond 1 GB:

options KVA_PAGES=512

By default, the kernel receives 1 GB of the 4 GB of address space available on the i386 architecture, and this is used for all of the kernel address space needs, not just the kmem map. By increasing KVA_PAGES, you can allocate a larger proportion of the 4 GB address space to the kernel (2 GB in the above example), allowing more room to increase vm.kmem_size. The trade-off is that user applications have less address space, and some programs may no longer run. If you change KVA_PAGES and the system reboots (no panic) after running for a while, this may be because the address space for userland applications is too small.

Example configuration for 768MB physical RAM:

in /boot/loader.conf

vm.kmem_size="330M"
vm.kmem_size_max="330M"
vfs.zfs.arc_max="40M"
vfs.zfs.vdev.cache.size="5M"

amd64

No tuning should be necessary when the system has more than 2GB of RAM.

General ARC advice

The value for vfs.zfs.arc_max needs to be smaller than the value for vm.kmem_size.

To improve random read performance, a separate L2ARC device can be used (zpool add <pool> cache <device>). A cheap solution is to add a USB memory stick (see http://www.leidinger.net/blog/2010/02/10/making-zfs-faster/). Be sure to use a high-quality drive. The high-performance solution is to add an SSD, preferably an NVMe drive. Optane is particularly useful for this workload due to its high write endurance and performance characteristics. L2ARC devices should be faster and/or have lower latency than the storage pool.

Using a L2ARC device will increase the memory ZFS needs to allocate, see http://www.mail-archive.com/[email protected]/msg34674.html for more info.

By default, the L2ARC does not attempt to cache prefetched or streaming workloads. Most data of this type is sequential, and typically the combined throughput of your pool disks exceeds the throughput of the L2ARC devices. If you believe otherwise (number of L2ARC devices × their max throughput > number of pool disks × their max throughput), then this can be changed with the following sysctl:

vfs.zfs.l2arc_noprefetch

The default value of 1 does not allow caching of streaming and/or sequential workloads and will not read from L2ARC when prefetching blocks. Switching it to 0 will allow prefetched/streaming reads to be cached. This may significantly improve performance if you store many small files in a large directory hierarchy (since many metadata blocks are read via the prefetcher and would ordinarily always be read from pool disks).
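If your pool fits that profile, the sysctl can be flipped at runtime and persisted across reboots (0 enables caching of prefetched/streaming reads):

```shell
# Allow streaming/prefetched reads into the L2ARC on the running system
sysctl vfs.zfs.l2arc_noprefetch=0

# Persist the setting across reboots
echo 'vfs.zfs.l2arc_noprefetch=0' >> /etc/sysctl.conf
```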

Tuning for PostgreSQL

Many guides recommend 8K or 16K for record size with PostgreSQL. Some benchmarks show better results with 16K, but most favor 8K, as that matches PostgreSQL's default block size.

zfs set recordsize=8K tank/usr/local/pgsql
zfs set primarycache=metadata tank/usr/local/pgsql
zfs set logbias=throughput tank/usr/local/pgsql
zfs set redundant_metadata=most tank/usr/local/pgsql

Tuning for large files

Setting a larger recordsize can benefit large files such as videos.

zfs set recordsize=1M tank/media/video

Tuning for Virtual Machine storage

For virtual machines (e.g. bhyve VMs), it's important to write data reliably to prevent damage during power loss or panics. Set the sync property on the filesystem in question.

zfs set sync=always tank/filesystem

A ZIL SLOG will not help much with this configuration. Further, it's essential to keep plenty of free space in your pool to maintain write performance. You're better off with a much larger, slower disk than a fast, small one for writing large files like VM images.

See https://www.truenas.com/community/threads/the-path-to-success-for-block-storage.81165/

Additionally, using primarycache=metadata property can help with performance, but only if the zfs blocksize matches what the VM is using. Otherwise, there can be a performance penalty due to alignment issues and writes.
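For bhyve guests backed by zvols, one way to keep block sizes aligned is to create the zvol with a volblocksize matching the guest's block size (the dataset name, 32G size, and 4K block size below are illustrative):

```shell
# 4K volblocksize to match a guest filesystem using 4K blocks
zfs create -V 32G -o volblocksize=4K tank/vm/guest0

# Cache only metadata for the VM's backing storage
zfs set primarycache=metadata tank/vm/guest0
```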

When to use SLOG

A SLOG is suitable for workloads with frequent small synchronous writes.

Be sure to mirror it. Don't use QLC SSDs.

See https://klarasystems.com/articles/what-makes-a-good-time-to-use-openzfs-slog-and-when-should-you-avoid-it/

Deduplication

It's usually a bad idea to use deduplication unless you have a lot of RAM or a very small file system. The minimum is usually around 5GB of RAM per 1TB of disk.
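As a rough sanity check, that rule of thumb can be turned into quick shell arithmetic (the 10 TB pool size below is a made-up example):

```shell
# Rule of thumb: ~5 GB of RAM per 1 TB of deduplicated pool data
pool_tb=10                   # hypothetical 10 TB pool
ram_gb=$((pool_tb * 5))
echo "Dedup table needs roughly ${ram_gb} GB of RAM"
```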

ZFS send / receive

To receive data as a non-root user, myuser:

zfs allow myuser mount,create,receive vm/vm

To send, first take a snapshot, then send:

zfs snapshot vm/vm/m3164b@luke
zfs send vm/vm/m3164b@luke | ssh myuser@myhost zfs recv vm/vm/m3164
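For subsequent transfers, an incremental send from the previous snapshot avoids resending the whole dataset (the @leia snapshot name is made up; the dataset and host names follow the example above):

```shell
# Take a new snapshot, then send only the delta since @luke
zfs snapshot vm/vm/m3164b@leia
zfs send -i vm/vm/m3164b@luke vm/vm/m3164b@leia | ssh myuser@myhost zfs recv vm/vm/m3164
```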