Coronis

Coronis (hostname: coronis.ucsc.edu) is a file server, acquired in January 2015 to expand the storage capacity of the Hyades cluster. Its hardware highlights, as detailed in the sections below, are two Intel 320 Series 120GB SSDs for the operating system, 61 4TB SAS hard drives attached to LSI SAS 9207-8i and 9207-8e Host Bus Adapters for bulk storage, and both GbE and 10GbE network interfaces.

Network Configurations

| Subnet        | IP Address      | Netmask         | Gateway         |
|---------------|-----------------|-----------------|-----------------|
| Public 10GbE  | 128.114.126.228 | 255.255.255.224 | 128.114.126.238 |
| Private GbE   | 10.6.7.4        | 255.255.0.0     |                 |
| Private 10GbE | 10.7.7.4        | 255.255.0.0     |                 |
| IPMI          | 10.9.7.4        | 255.255.0.0     |                 |


Installing FreeBSD on a mirrored ZFS root

The goal is to install the latest FreeBSD release on a root ZFS filesystem to be created on the two Intel SSDs. We'll closely follow the instructions in the FreeBSD wiki article Installing FreeBSD 9.0 (or later) Root on ZFS using GPT.

Notes:

  1. The instructions in Installing FreeBSD Root on ZFS (Mirror) using GPT apply only to FreeBSD 8.x and are therefore outdated;
  2. There is no support yet for booting FreeBSD from a ZFS root under UEFI, so we'll boot under legacy BIOS; we'll nonetheless use GPT instead of MBR.

Downloading FreeBSD 10.1

As of January 2015, the latest version of FreeBSD is 10.1. We'll use a USB stick to install FreeBSD 10.1 onto Coronis.

Download FreeBSD-10.1-RELEASE-amd64-memstick.img (the memory stick image of FreeBSD 10.1 for amd64) from ftp://ftp.freebsd.org/pub/FreeBSD/releases/amd64/amd64/ISO-IMAGES/10.1/

Write the image to a USB stick. On my iMac (running OS X Mavericks, 10.9), I did the following:

$ diskutil list                        # identify the USB stick; on this iMac it was /dev/disk2
$ diskutil umount /dev/disk2s2         # unmount the stick's mounted volume so dd can write to the raw device
$ sudo dd if=/dev/zero of=/dev/disk2 bs=64k count=10    # zero out any old partition table
$ sudo dd if=FreeBSD-10.1-RELEASE-amd64-memstick.img of=/dev/disk2 bs=64k

Caution: double-check the device name first; dd will destroy whatever disk it is pointed at.

Creating a bootable ZFS filesystem

Boot FreeBSD from the USB stick.

Go through the initial setup as usual[1].

When the partitioning dialogue in bsdinstall comes up, choose the Shell option.

Create a new GPT (GUID Partition Table) on each of the two Intel SSDs:

# camcontrol devlist | grep -i intel
<INTEL SSDSA2CW120G3 4PC10362>     at scbus2 target 0 lun 0 (ada0,pass17)
<INTEL SSDSA2CW120G3 4PC10362>     at scbus3 target 0 lun 0 (ada1,pass18)
# gpart destroy -F ada0
# gpart destroy -F ada1
# gpart create -s gpt ada0
# gpart create -s gpt ada1

Add partitions for the boot loader and swap, then install the protective MBR and the gptzfsboot boot loader. All partitions are aligned to 4k, for optimal performance on advanced-format drives.

# gpart add -s 222 -a 4k -t freebsd-boot -l boot0 ada0
# gpart add -s 8g -a 4k -t freebsd-swap -l swap0 ada0
# gpart add -a 4k -t freebsd-zfs -l disk0 ada0
# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada0

# gpart add -s 222 -a 4k -t freebsd-boot -l boot1 ada1
# gpart add -s 8g -a 4k -t freebsd-swap -l swap1 ada1
# gpart add -a 4k -t freebsd-zfs -l disk1 ada1
# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada1
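
To double-check the layout before proceeding, gpart can print the partition tables and the GPT labels:

# gpart show ada0
# gpart show ada1
# gpart show -l ada0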

Note: The Intel 320 Series SSDs don't appear to be advanced format drives:

# camcontrol identify ada0 | grep "sector size"
sector size           logical 512, physical 512, offset 0

But using 4k alignment and a 4k sector size (ashift=12) probably doesn't hurt.

Ensure ZFS uses the correct block size (ashift is the base-2 logarithm of the sector size: a value of 9 means 512-byte sectors and a value of 12 means 4096-byte sectors):

# sysctl vfs.zfs.min_auto_ashift=12
vfs.zfs.min_auto_ashift: 9 -> 12

Load the necessary kernel modules:

# kldload zfs

Create the ZFS pool:

# zpool create -o altroot=/mnt -O canmount=off -m none zroot mirror /dev/gpt/disk0 /dev/gpt/disk1

This creates a zpool called zroot that will not itself be mounted; it serves only as the container from which the other filesystems are derived.
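
As an optional sanity check, verify that the pool came up as a healthy two-way mirror:

# zpool status zroot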

Installing FreeBSD to the ZFS filesystem

Create ZFS filesystem hierarchy:

# zfs set checksum=fletcher4 zroot
# zfs set atime=off zroot

# zfs create -o mountpoint=none zroot/ROOT
# zfs create -o mountpoint=/ zroot/ROOT/default

# zfs create -o mountpoint=/home -o setuid=off zroot/home
# zfs create -o mountpoint=/tmp -o compression=lz4 -o setuid=off zroot/tmp
# chmod 1777 /mnt/tmp

# zfs create -o mountpoint=/usr zroot/usr
# zfs create zroot/usr/local
# zfs create zroot/usr/obj
# zfs create -o compression=lz4 -o setuid=off zroot/usr/ports
# zfs create -o compression=off -o exec=off -o setuid=off zroot/usr/ports/distfiles
# zfs create -o compression=off -o exec=off -o setuid=off zroot/usr/ports/packages
# zfs create -o compression=lz4 -o exec=off -o setuid=off zroot/usr/src

# zfs create -o mountpoint=/var zroot/var
# zfs create -o compression=lz4 -o exec=off -o setuid=off zroot/var/crash
# zfs create -o exec=off -o setuid=off zroot/var/db
# zfs create -o compression=lz4 -o exec=on -o setuid=off zroot/var/db/pkg
# zfs create -o exec=off -o setuid=off zroot/var/empty
# zfs create -o compression=lz4 -o exec=off -o setuid=off zroot/var/log
# zfs create -o compression=gzip -o exec=off -o setuid=off zroot/var/mail
# zfs create -o exec=off -o setuid=off zroot/var/run
# zfs create -o compression=lz4 -o exec=on -o setuid=off zroot/var/tmp
# chmod 1777 /mnt/var/tmp

Notes:

  1. The fletcher4 checksum algorithm is more robust than the old default fletcher2 algorithm.
  2. Set atime off to avoid writing a metadata change every time a file is accessed, a serious performance penalty.
  3. Compression may be set to on, off, lzjb, gzip, gzip-N (where N is an integer from 1 (fastest) to 9 (best compression ratio); gzip is equivalent to gzip-6).
  4. On FreeBSD 8.4 and 9.2 or later, lz4 compression is also supported, providing the best trade-off (significantly faster compression and decompression with moderately higher compression ratios).
  5. During installation, the new root file system is mounted under /mnt.

Set the dataset to boot from:

# zpool set bootfs=zroot/ROOT/default zroot
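
Optionally, confirm the boot dataset and review the dataset hierarchy before moving on:

# zpool get bootfs zroot
# zfs list -r zroot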

Add the swap devices to fstab, so that they will automatically show up when the system starts:

# cat << EOF > /tmp/bsdinstall_etc/fstab
# Device        Mountpoint  FStype  Options  Dump  Pass#
/dev/gpt/swap0  none        swap    sw       0     0
/dev/gpt/swap1  none        swap    sw       0     0
EOF

Once all of the ZFS filesystems have been created, type exit in the shell and proceed with the installation as normal. When prompted to 'Choose the services you would like to be started at boot', I chose sshd, ntpd, powerd, dumpdev.

Once the installation is complete, choose Exit from the main menu.

The next dialogue will offer the option to 'open a shell in the new system to make any final manual modifications'. Select Yes.

Configure ZFS:

# mount -t devfs devfs /dev
# echo 'zfs_enable="YES"' >> /etc/rc.conf
# echo 'zfs_load="YES"' >> /boot/loader.conf
# echo "sysctl vfs.zfs.min_auto_ashift=12" >> /etc/sysctl.conf

Set /var/empty read-only; it is supposed to be empty at all times:

# zfs set readonly=on zroot/var/empty

To finish the installation, exit the shell, remove the USB stick and choose the Reboot option from the next dialogue.

Creating ZFS filesystems on the SAS HDDs

The next step was to create ZFS filesystems on the 61 4TB SAS hard drives, 16 of which are attached to an LSI SAS 9207-8i Host Bus Adapter and 45 to an LSI SAS 9207-8e Host Bus Adapter.

List all drives:

# camcontrol devlist

Create a zpool on the 61 4TB SAS hard drives:

# zpool create -m none zang \
    raidz1 da1 da2 da3 da4 da5 da6 da7 da8 da9 da10 da11 da12 da13 da14 da15 \
    raidz1 da16 da17 da18 da19 da20 da21 da22 da23 da24 da25 da26 da27 da28 da29 da30 \
    raidz1 da31 da32 da33 da34 da35 da36 da37 da38 da39 da40 da41 da42 da43 da44 da45 \
    raidz1 da46 da47 da48 da49 da50 da51 da52 da53 da54 da55 da56 da57 da58 da59 da60 \
    spare da0

Notes:

  1. Here I created 4 raidz1 VDEVs (virtual devices), each spanning 15 physical hard drives[2]
  2. I then created a zpool that stripes across the 4 raidz1 VDEVs
  3. I used one physical hard drive as a hot spare
  4. This setup should strike a reasonable balance between IOPS performance and usable space[3]

Create a ZFS filesystem:
# zfs set checksum=fletcher4 zang
# zfs set atime=off zang
# zfs create -o mountpoint=/export/zang -o setuid=off zang/default
# chmod 1777 /export/zang
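
As a sanity check, confirm that the pool assembled as intended (four 15-drive raidz1 VDEVs plus the spare) and that the new filesystem is mounted at /export/zang:

# zpool status zang
# zfs list -r zang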

Network Configuration

Add the following stanza to /etc/rc.conf:

hostname="coronis.ucsc.edu"
ifconfig_ix0="inet 128.114.126.228 netmask 255.255.255.224"
ifconfig_ix1="inet 10.7.7.4 netmask 255.255.0.0 mtu 9000"
ifconfig_igb0="inet 10.6.7.4 netmask 255.255.0.0"
defaultrouter="128.114.126.238"

Apply the settings to the networking system:

# /etc/rc.d/netif restart && /etc/rc.d/routing restart
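
Optionally, verify that the addresses and the default route took effect:

# ifconfig ix0
# netstat -rn | grep default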

NFS

Configuring NFS Server

To enable starting NFS server at boot time, add these options to /etc/rc.conf[4]:

rpcbind_enable="YES"
nfs_server_enable="YES"
nfs_server_flags="-u -t -n 256"

Coronis will serve NFS over both UDP and TCP transports, using 256 daemons (nfsd -u -t -n 256).

Start the NFS server:

# service nfsd start
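
As a quick sanity check, rpcbind should now list the NFS-related services (nfsd and mountd) among its registered RPC programs:

# rpcinfo -p localhost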

Exporting the ZFS filesystems over NFS

The goal was to export the ZFS filesystem, /export/zang, over NFS to both of the private subnets in the Hyades cluster:

  1. to the private GbE subnet 10.6.0.0/16, with root_squash
  2. to the private 10GbE subnet 10.7.0.0/16, with no_root_squash

One can use the zfs set sharenfs command to export ZFS filesystems over NFS. However, there is no clean way to export a ZFS filesystem to two subnets using that command, so I use the following hack.

Export the ZFS filesystems to the private 10GbE subnet:

# zfs set sharenfs="-maproot=root -network=10.7.0.0/16" zang/default

This produced the following /etc/zfs/exports:

# !!! DO NOT EDIT THIS FILE MANUALLY !!!

/export/zang    -maproot=root -network=10.7.0.0/16

The shares are exported to the private 10GbE subnet immediately, with no_root_squash (-maproot=root); there is no need to reload or restart mountd.

Manually create /etc/exports:

/export/zang    -maproot=nobody -network=10.6.0.0/16

The shares will be exported to the private GbE subnet, with root_squash (-maproot=nobody). However, for the change to take effect immediately, we must force mountd to reread /etc/exports:

# service mountd onereload

Let's test it:

# showmount -e
Exports list on localhost:
/export/zang                       10.6.0.0 10.7.0.0

ZFS tunings

The combination of ZFS and NFS stresses the ZIL to the point that performance falls significantly below expected levels[5]. Let's disable the ZIL on the exported dataset. (Note that with sync=disabled, ZFS acknowledges synchronous writes before they reach stable storage, so a power failure can lose the last few seconds of acknowledged writes, though the pool itself stays consistent.)

# zfs set sync=disabled zang/default

Linux NFS clients

On FreeBSD, the NFS file locking daemon, rpc.lockd, provides monitored and unmonitored file and record locking services in an NFS environment. It typically operates in conjunction with rpc.statd, which monitors the status of hosts requesting locks. These two daemons are optional and are disabled by default. Note that a sane NFS server requires at least three running services: rpcbind, mountd and nfsd.

We use the noatime,nosuid,nolock options to mount /export/zang on Linux:

# mkdir /zang
# mount -t nfs -o noatime,nosuid,nolock 10.7.7.4:/export/zang /zang

Combined with the defaults, the above command results in the following mount options:

# grep zang /proc/mounts 
10.7.7.4:/export/zang /zang nfs rw,nosuid,noatime,vers=3,rsize=65536,wsize=65536,namlen=255,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.7.7.4,mountvers=3,mountport=684,mountproto=udp,local_lock=all,addr=10.7.7.4 0 0
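
To make the mount persistent across client reboots, an fstab entry along these lines (mirroring the mount options above) can be added on the Linux client:

10.7.7.4:/export/zang   /zang   nfs   noatime,nosuid,nolock   0 0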

Firewall

We run the PF firewall on Coronis[6].

To configure the system to start PF at boot time, add the following to /etc/rc.conf:

pf_enable="YES"

Create PF rulesets in /etc/pf.conf:

# Options
ext_if = "ix0" # macro for external interface
int_if = "{ igb0, ix1, lo0 }" # macro for internal interfaces
# reserved IPV4 addresses
martians = "0.0.0.0/8, 10.0.0.0/8, 100.64.0.0/10, \
    127.0.0.0/8, 169.254.0.0/16, 172.16.0.0/12, \
    192.0.0.0/24, 192.0.2.0/24, 192.88.99.0/24, \
    192.168.0.0/16, 198.18.0.0/15, 198.51.100.0/24, \
    203.0.113.0/24, 224.0.0.0/4, 240.0.0.0/4, \
    255.255.255.255/32"
set block-policy drop
set skip on $int_if

# Normalization
scrub in all fragment reassemble no-df max-mss 1440

# Queueing

# Translation

# Filtering
antispoof quick for ($ext_if) inet
block out quick inet6 all
block in quick inet6 all
block in quick from { $martians urpf-failed no-route } to any
block in all
pass out quick on $ext_if inet keep state
table <bruteforce> persist
block quick from <bruteforce>
pass in on $ext_if inet proto tcp to any port ssh \
    flags S/SA keep state \
    (max-src-conn 5, max-src-conn-rate 5/5, \
     overload <bruteforce> flush global)
pass inet proto icmp icmp-type echoreq
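
Before starting PF, it's wise to check the ruleset for syntax errors; pfctl's -n flag parses the file without loading it:

# pfctl -nf /etc/pf.conf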

Start the firewall:

# service pf start

Add the following lines to /etc/crontab:

#
# Remove <bruteforce> table entries in PF that have not been referenced for a day
10      1       *       *       *       root    pfctl -t bruteforce -T expire 86400
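
The table can also be inspected or pruned by hand at any time:

# pfctl -t bruteforce -T show
# pfctl -t bruteforce -T expire 86400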

Restart cron:

# service cron restart

Maintenance

SSH

To disable password authentication in SSH, add the following options to /etc/ssh/sshd_config:

PasswordAuthentication no
ChallengeResponseAuthentication no
PermitRootLogin without-password

and restart sshd:

# service sshd restart
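
A note of caution: a typo in sshd_config can prevent sshd from starting and lock you out of the machine, so it's prudent to validate the configuration with sshd's test mode before restarting:

# sshd -t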

freebsd-update

Applying Security Patches[7]

# freebsd-update fetch
# freebsd-update install
# shutdown -r now

Upgrading to FreeBSD 10.2

We upgraded Coronis to FreeBSD 10.2 in Nov 2015.

Upgrade Coronis from FreeBSD 10.1 to FreeBSD 10.2:

# freebsd-update -r 10.2-RELEASE upgrade
# freebsd-update install

Reboot the machine:

# shutdown -r now

After the reboot, run freebsd-update install again to complete the upgrade:

# freebsd-update install
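
Finally, confirm that both the kernel and the userland report the new release (-k prints the kernel version, -u the userland version):

# freebsd-version -ku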

See Also

References

  1. Installing FreeBSD 9.X and Later
  2. ZFS Administration - VDEVs
  3. ZFS RAIDZ stripe width
  4. FreeBSD Handbook - Network File System (NFS)
  5. ZFS Tuning Guide
  6. FreeBSD Handbook - PF
  7. FreeBSD Handbook - Updating and Upgrading FreeBSD