Coronis - shawfdong/hyades GitHub Wiki
Coronis (hostname: coronis.ucsc.edu) is a file server, acquired in January 2015 to expand the storage capacity for the Hyades cluster. Its hardware specifications are as follows:
- Two quad-core Intel Xeon E5-2609V2 CPUs at 2.5GHz
- 192GB (12 x 16GB) RAM at 1600MHz
- Two 120GB Intel 320 Series SSDs
- 61 Seagate ST4000NM0023 4TB SAS 6Gb/s nearline hard drives at 7200RPM, with
- 16 in a 3U Supermicro Chassis 836E16-R920B, controlled by an LSI SAS 9207-8i Host Bus Adapter
- 45 in a 4U Supermicro JBOD 847E16, controlled by an LSI SAS 9207-8e Host Bus Adapter
- Intel 10 Gigabit Network Adapter X520-DA2 (E10G42BTDA), with dual SFP+ ports
- Intel I350 Gigabit Ethernet Controller, with dual 1000Base-T ports
Subnet | IP Address | Netmask | Gateway |
---|---|---|---|
Public 10GbE | 128.114.126.228 | 255.255.255.224 | 128.114.126.238 |
Private GbE | 10.6.7.4 | 255.255.0.0 | |
Private 10GbE | 10.7.7.4 | 255.255.0.0 | |
IPMI | 10.9.7.4 | 255.255.0.0 | |
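The public netmask 255.255.255.224 in the table above corresponds to a /27. A quick shell sketch of the arithmetic (illustrative only; the octet value is taken from the table):

```shell
# Count the set bits in the last octet of 255.255.255.224 to get the prefix length
octet=224
bits=0
n=$octet
while [ "$n" -gt 0 ]; do
  bits=$(( bits + (n & 1) ))   # add the low bit
  n=$(( n >> 1 ))              # shift right
done
prefix=$(( 24 + bits ))                 # 255.255.255 contributes 24 bits
hosts=$(( (1 << (32 - prefix)) - 2 ))   # subtract network and broadcast addresses
echo "/${prefix}"               # /27
echo "${hosts} usable hosts"    # 30
```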
The goal is to install the latest FreeBSD release on a root ZFS filesystem to be created on the two Intel SSDs. We'll closely follow the instructions in the FreeBSD wiki article Installing FreeBSD 9.0 (or later) Root on ZFS using GPT.
Notes:
- The instructions in Installing FreeBSD Root on ZFS (Mirror) using GPT apply only to FreeBSD 8.x, and are thus outdated.
- There is no support yet for booting FreeBSD from a ZFS root under UEFI, so we'll boot under legacy BIOS; but we'll use GPT instead of MBR.
As of January 2015, the latest version of FreeBSD is 10.1. We'll use a USB stick to install FreeBSD 10.1 onto Coronis.
Download FreeBSD-10.1-RELEASE-amd64-memstick.img (the memory stick image of FreeBSD 10.1 for amd64) from ftp://ftp.freebsd.org/pub/FreeBSD/releases/amd64/amd64/ISO-IMAGES/10.1/
Write the image to a USB stick. On my iMac (running OS X Mavericks 10.9), I did the following:
```
$ diskutil list
$ diskutil umount /dev/disk2s2
$ sudo dd if=/dev/zero of=/dev/disk2 bs=64k count=10
$ sudo dd if=FreeBSD-10.1-RELEASE-amd64-memstick.img of=/dev/disk2 bs=64k
```
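A dd-based image write can be sanity-checked by reading the written bytes back and comparing them to the source image. A sketch using scratch files in place of the real image and device (the /tmp paths are hypothetical stand-ins):

```shell
# Stand-ins for the real image and the USB device node
img=/tmp/fake.img
dev=/tmp/fake.dev
head -c 131072 /dev/urandom > "$img"        # 128 KiB stand-in "image"
dd if="$img" of="$dev" bs=64k 2>/dev/null   # write it, as dd would to /dev/disk2
# Read back exactly the image's length and compare byte-for-byte
head -c "$(wc -c < "$img")" "$dev" | cmp - "$img" && echo "write verified"
```

The length-limited read matters on a real device: the device is larger than the image, so a whole-device compare would always differ.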
Boot FreeBSD from the USB stick.
Go through the initial setup as usual[1].
When the partitioning dialogue in bsdinstall comes up, choose the Shell option.
Create new GPT (GUID Partition Table) on the two Intel SSDs:
```
# camcontrol devlist | grep -i intel
<INTEL SSDSA2CW120G3 4PC10362>     at scbus2 target 0 lun 0 (ada0,pass17)
<INTEL SSDSA2CW120G3 4PC10362>     at scbus3 target 0 lun 0 (ada1,pass18)
# gpart destroy -F ada0
# gpart destroy -F ada1
# gpart create -s gpt ada0
# gpart create -s gpt ada1
```
Add partitions for the boot loader and swap, then install the protective MBR and the gptzfsboot boot loader. All partitions are aligned to 4k for optimal performance on Advanced Format drives.
```
# gpart add -s 222 -a 4k -t freebsd-boot -l boot0 ada0
# gpart add -s 8g -a 4k -t freebsd-swap -l swap0 ada0
# gpart add -a 4k -t freebsd-zfs -l disk0 ada0
# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada0
# gpart add -s 222 -a 4k -t freebsd-boot -l boot1 ada1
# gpart add -s 8g -a 4k -t freebsd-swap -l swap1 ada1
# gpart add -a 4k -t freebsd-zfs -l disk1 ada1
# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada1
```
Note: The Intel 320 Series SSDs don't appear to be advanced format drives:
```
# camcontrol identify ada0 | grep "sector size"
sector size           logical 512, physical 512, offset 0
```
but using a 4k sector size probably doesn't hurt.
Ensure ZFS uses the correct block size (an ashift value of 9 means 512-byte sectors; 12 means 4096-byte sectors):
```
# sysctl vfs.zfs.min_auto_ashift=12
vfs.zfs.min_auto_ashift: 9 -> 12
```
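The ashift value is the base-2 logarithm of the sector size ZFS will assume, which is why 9 and 12 correspond to 512 and 4096 bytes:

```shell
# ashift is log2 of the sector size: sector size = 2^ashift
echo "ashift 9  -> $(( 1 << 9 )) bytes"    # 512
echo "ashift 12 -> $(( 1 << 12 )) bytes"   # 4096
```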
Load the necessary kernel modules:
# kldload zfs
Create the ZFS pool:
```
# zpool create -o altroot=/mnt -O canmount=off -m none zroot mirror /dev/gpt/disk0 /dev/gpt/disk1
```
This will create a zpool called zroot which will not be mounted itself; it serves only as the parent from which the other filesystems are derived.
Create ZFS filesystem hierarchy:
```
# zfs set checksum=fletcher4 zroot
# zfs set atime=off zroot
# zfs create -o mountpoint=none zroot/ROOT
# zfs create -o mountpoint=/ zroot/ROOT/default
# zfs create -o mountpoint=/home -o setuid=off zroot/home
# zfs create -o mountpoint=/tmp -o compression=lz4 -o setuid=off zroot/tmp
# chmod 1777 /mnt/tmp
# zfs create -o mountpoint=/usr zroot/usr
# zfs create zroot/usr/local
# zfs create zroot/usr/obj
# zfs create -o compression=lz4 -o setuid=off zroot/usr/ports
# zfs create -o compression=off -o exec=off -o setuid=off zroot/usr/ports/distfiles
# zfs create -o compression=off -o exec=off -o setuid=off zroot/usr/ports/packages
# zfs create -o compression=lz4 -o exec=off -o setuid=off zroot/usr/src
# zfs create -o mountpoint=/var zroot/var
# zfs create -o compression=lz4 -o exec=off -o setuid=off zroot/var/crash
# zfs create -o exec=off -o setuid=off zroot/var/db
# zfs create -o compression=lz4 -o exec=on -o setuid=off zroot/var/db/pkg
# zfs create -o exec=off -o setuid=off zroot/var/empty
# zfs create -o compression=lz4 -o exec=off -o setuid=off zroot/var/log
# zfs create -o compression=gzip -o exec=off -o setuid=off zroot/var/mail
# zfs create -o exec=off -o setuid=off zroot/var/run
# zfs create -o compression=lz4 -o exec=on -o setuid=off zroot/var/tmp
# chmod 1777 /mnt/var/tmp
```
Notes:
- The fletcher4 checksum algorithm is more robust than the old default fletcher2 algorithm.
- Set atime off to avoid writing a metadata change every time a file is accessed, a serious performance penalty.
- Compression may be set to on, off, lzjb, gzip, gzip-N (where N is an integer from 1 (fastest) to 9 (best compression ratio); gzip is equivalent to gzip-6).
- On FreeBSD 8.4 and 9.2 or later, lz4 compression is also supported, providing the best trade-off (significantly faster compression and decompression with moderately higher compression ratios).
- During installation, the new root file system is mounted under /mnt.
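The gzip levels mentioned above trade compression speed for ratio. A host-side illustration with the standalone gzip(1) tool (the /tmp paths are hypothetical; ZFS applies the same zlib levels internally, so the shape of the trade-off carries over):

```shell
# Build a highly compressible sample file (repeated text)
yes "The quick brown fox jumps over the lazy dog" | head -n 10000 > /tmp/sample.txt
gzip -1 -c /tmp/sample.txt > /tmp/sample.gz1   # fastest
gzip -9 -c /tmp/sample.txt > /tmp/sample.gz9   # best compression ratio
orig=$(wc -c < /tmp/sample.txt)
s1=$(wc -c < /tmp/sample.gz1)
s9=$(wc -c < /tmp/sample.gz9)
echo "original=${orig} gzip-1=${s1} gzip-9=${s9}"
```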
Set the dataset the boot loader will boot from:

```
# zpool set bootfs=zroot/ROOT/default zroot
```
Add the swap devices to fstab, so that they will automatically show up when the system starts:
```
# cat << EOF > /tmp/bsdinstall_etc/fstab
# Device         Mountpoint  FStype  Options  Dump  Pass#
/dev/gpt/swap0   none        swap    sw       0     0
/dev/gpt/swap1   none        swap    sw       0     0
EOF
```
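The here-document can be dry-run against a scratch path first, to confirm both swap entries land in the file (the /tmp/fstab.test path is a hypothetical stand-in for the real fstab):

```shell
# Write the same two swap entries to a scratch file, not the real fstab
fstab=/tmp/fstab.test
cat << EOF > "$fstab"
# Device         Mountpoint  FStype  Options  Dump  Pass#
/dev/gpt/swap0   none        swap    sw       0     0
/dev/gpt/swap1   none        swap    sw       0     0
EOF
grep -c '^/dev/gpt/swap' "$fstab"   # both swap devices present
```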
Once all of the ZFS filesystems have been created, type exit in the shell and proceed with the installation as normal. When prompted to 'Choose the services you would like to be started at boot', I chose sshd, ntpd, powerd, dumpdev.
Once the installation is complete, choose Exit from the main menu.
The next dialogue will offer the option to 'open a shell in the new system to make any final manual modifications'. Select Yes.
Configure ZFS:
```
# mount -t devfs devfs /dev
# echo 'zfs_enable="YES"' >> /etc/rc.conf
# echo 'zfs_load="YES"' >> /boot/loader.conf
# echo "vfs.zfs.min_auto_ashift=12" >> /etc/sysctl.conf
```

Note: entries in /etc/sysctl.conf take the form variable=value, without a leading sysctl command.
Set read only on /var/empty, which is supposed to be empty at all times:
# zfs set readonly=on zroot/var/empty
To finish the installation, exit the shell, remove the USB stick and choose the Reboot option from the next dialogue.
The next step was to create ZFS filesystems on the 61 4TB SAS hard drives, among which 16 are controlled by an LSI SAS 9207-8i Host Bus Adapter and 45 by an LSI SAS 9207-8e Host Bus Adapter.
List all drives:
# camcontrol devlist
Create a zpool on the 61 4TB SAS hard drives:
```
# zpool create -m none zang \
    raidz1 da1 da2 da3 da4 da5 da6 da7 da8 da9 da10 da11 da12 da13 da14 da15 \
    raidz1 da16 da17 da18 da19 da20 da21 da22 da23 da24 da25 da26 da27 da28 da29 da30 \
    raidz1 da31 da32 da33 da34 da35 da36 da37 da38 da39 da40 da41 da42 da43 da44 da45 \
    raidz1 da46 da47 da48 da49 da50 da51 da52 da53 da54 da55 da56 da57 da58 da59 da60 \
    spare da0
```
Notes:
- Here I created 4 raidz1 VDEVs (virtual devices), each on 15 physical hard drives[2]
- I then created a zpool that stripes the 4 raidz1 VDEVs
- I used a physical hard drive as hot spare
- This setup should strike a reasonable balance between IOPS performance and usable space[3]
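The usable space of the layout above follows from each raidz1 vdev giving up one drive's worth of capacity to parity, with the hot spare contributing nothing:

```shell
# 4 raidz1 vdevs, 15 drives each, 1 parity drive per vdev, 4 TB per drive
vdevs=4; per_vdev=15; parity=1; tb=4
data_drives=$(( vdevs * (per_vdev - parity) ))
raw_tb=$(( data_drives * tb ))
echo "data drives: ${data_drives}"   # 56
echo "raw usable:  ${raw_tb} TB"     # 224 TB, before ZFS metadata overhead
```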
```
# zfs set checksum=fletcher4 zang
# zfs set atime=off zang
# zfs create -o mountpoint=/export/zang -o setuid=off zang/default
# chmod 1777 /export/zang
```
Add the following stanza to /etc/rc.conf:
```
hostname="coronis.ucsc.edu"
ifconfig_ix0="inet 128.114.126.228 netmask 255.255.255.224"
ifconfig_ix1="inet 10.7.7.4 netmask 255.255.0.0 mtu 9000"
ifconfig_igb0="inet 10.6.7.4 netmask 255.255.0.0"
defaultrouter="128.114.126.238"
```
Apply the settings to the networking system:
# /etc/rc.d/netif restart && /etc/rc.d/routing restart
To enable starting NFS server at boot time, add these options to /etc/rc.conf[4]:
```
rpcbind_enable="YES"
nfs_server_enable="YES"
nfs_server_flags="-u -t -n 256"
```
Coronis will serve NFS over both UDP and TCP transports using 256 daemons (nfsd -u -t -n 256).
Start the NFS server:
# service nfsd start
The goal was to export the ZFS filesystem, /export/zang, over NFS to both of the private subnets in the Hyades cluster:
- to private GbE subnet 10.6.0.0/16, with root_squash
- to private 10GbE subnet 10.7.0.0/16, with no_root_squash
Export the ZFS filesystems to the private 10GbE subnet:
```
# zfs set sharenfs="-maproot=root -network=10.7.0.0/16" zang/default
```
which produced the following /etc/zfs/exports:
```
# !!! DO NOT EDIT THIS FILE MANUALLY !!!
/export/zang    -maproot=root -network=10.7.0.0/16
```
The shares are immediately exported to the private 10GbE subnet, with no_root_squash (-maproot=root). There is no need to reload or restart mountd.
Manually create /etc/exports:
```
/export/zang    -maproot=nobody -network=10.6.0.0/16
```
The shares will be exported to the private GbE subnet, with root_squash (-maproot=nobody). However, to make the change take effect immediately, we have to force mountd to reread /etc/exports:
# service mountd onereload
Let's test it:
```
# showmount -e
Exports list on localhost:
/export/zang                       10.6.0.0 10.7.0.0
```
The combination of ZFS and NFS stresses the ZIL to the point that performance falls significantly below expected levels[5]. Let's disable synchronous writes (and with them the ZIL) on the exported dataset:
# zfs set sync=disabled zang/default
On FreeBSD, the NFS file locking daemon, rpc.lockd, provides monitored and unmonitored file and record locking services in an NFS environment. It typically operates in conjunction with rpc.statd, which monitors the status of hosts requesting locks. These two daemons are optional and disabled by default. Note that a sane NFS server requires at least three services running: rpcbind, mountd and nfsd.
We use the noatime,nosuid,nolock options to mount /export/zang on Linux:
```
# mkdir /zang
# mount -t nfs -o noatime,nosuid,nolock 10.7.7.4:/export/zang /zang
```
Combined with the defaults, the above command results in the following mount options:
```
# grep zang /proc/mounts
10.7.7.4:/export/zang /zang nfs rw,nosuid,noatime,vers=3,rsize=65536,wsize=65536,namlen=255,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.7.7.4,mountvers=3,mountport=684,mountproto=udp,local_lock=all,addr=10.7.7.4 0 0
```
We run the PF firewall on Coronis[6].
To configure the system to start PF at boot time, add the following to /etc/rc.conf:
pf_enable="YES"
Create PF rulesets in /etc/pf.conf:
```
# Options
ext_if = "ix0"                 # macro for external interface
int_if = "{ igb0, ix1, lo0 }"  # macro for internal interfaces

# reserved IPv4 addresses
martians = "0.0.0.0/8, 10.0.0.0/8, 100.64.0.0/10, \
            127.0.0.0/8, 169.254.0.0/16, 172.16.0.0/12, \
            192.0.0.0/24, 192.0.2.0/24, 192.88.99.0/24, \
            192.168.0.0/16, 198.18.0.0/15, 198.51.100.0/24, \
            203.0.113.0/24, 224.0.0.0/4, 240.0.0.0/4, \
            255.255.255.255/32"

set block-policy drop
set skip on $int_if

# Normalization
scrub in all fragment reassemble no-df max-mss 1440

# Queueing

# Translation

# Filtering
antispoof quick for ($ext_if) inet
block out quick inet6 all
block in quick inet6 all
block in quick from { $martians urpf-failed no-route } to any
block in all
pass out quick on $ext_if inet keep state

table <bruteforce> persist
block quick from <bruteforce>
pass in on $ext_if inet proto tcp to any port ssh \
    flags S/SA keep state \
    (max-src-conn 5, max-src-conn-rate 5/5, \
     overload <bruteforce> flush global)

pass inet proto icmp icmp-type echoreq
```
Start the firewall (pfctl -nf /etc/pf.conf can be run first to parse the ruleset without loading it):
# service pf start
Add the following lines to /etc/crontab:
```
#
# Remove <bruteforce> table entries in PF that have not been referenced for a day
10   1   *   *   *   root   pfctl -t bruteforce -T expire 86400
```
Restart cron:

```
# service cron restart
```
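The expire value of 86400 passed to pfctl above is simply one day expressed in seconds:

```shell
# 24 hours x 60 minutes x 60 seconds = 86400 seconds per day
echo $(( 24 * 60 * 60 ))   # 86400
```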
To disable password authentication in SSH, add the following options to /etc/ssh/sshd_config:

```
PasswordAuthentication no
ChallengeResponseAuthentication no
PermitRootLogin without-password
```
and restart sshd:
# service sshd restart
Applying Security Patches[7]
```
# freebsd-update fetch
# freebsd-update install
# shutdown -r now
```
We upgraded Coronis to FreeBSD 10.2 in Nov 2015.
Upgrade Coronis from FreeBSD 10.1 to FreeBSD 10.2:
```
# freebsd-update -r 10.2-RELEASE upgrade
# freebsd-update install
```
Reboot the machine:
# shutdown -r now
After the reboot, run freebsd-update again to complete the upgrade:
# freebsd-update install