DTN benchmarks - shawfdong/hyades GitHub Wiki

dtn.ucsc.edu is UCSC's Data Transfer Node.

The RAID controller of dtn is a PERC H700 Integrated, as reported by:

# omreport storage controller

but is actually a rebranded LSI MegaRAID SAS 2108:

# lspci | grep RAID
02:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2108 [Liberator] (rev 05)

Salient features of the PERC H700 Integrated are:[1]

  • 512 MB non-volatile cache
  • PCI-Express Gen2.0 support
  • 6Gb/s SAS (SAS 2.0) host interface
  • RAID Levels: 0, 1, 5, 6, 10, 50, 60
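
The controller's write-back cache can materially affect the write numbers below. One way to inspect the virtual disks and their cache policy is via OMSA or MegaCLI; this is a sketch only, and the MegaCli path and output fields vary by version:

# omreport storage vdisk controller=0
# /opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -aALL | grep -i 'cache policy'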

Two virtual disks are configured on the RAID controller: a RAID-1 volume (/dev/sda) built from two 146 GB SAS drives, and a RAID-6 volume (/dev/sdb) built from twelve 1 TB SAS drives. The former holds the OS; on the latter we created an XFS file system, which is mounted at /data.

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda6       117G  1.6G  109G   2% /
tmpfs            12G     0   12G   0% /dev/shm
/dev/sda1       248M  101M  135M  43% /boot
/dev/sda5       3.9G   72M  3.6G   2% /tmp
/dev/sda2       9.7G  485M  8.7G   6% /var
/dev/sdb1       9.1T   11G  9.1T   1% /data
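
For reference, the geometry of the XFS file system on /data can be recorded before benchmarking; a minimal sketch (xfs_info only needs the mount point):

$ xfs_info /data
$ mount | grep /data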

Here we mostly benchmark the I/O performance of /data. The partition is almost empty, so we are measuring the near-maximum I/O performance of the volume.

dd

We start with the humble dd:[2]

$ cd /data/shaw
$ dd if=/dev/zero of=10GB bs=1M count=10240 
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 7.78806 s, 1.4 GB/s

The number here is a bit off the mark:

  1. By default, dd does not sync. The command above merely commits 10 GB of data to the page cache (write buffer) and exits.
  2. The arithmetic deserves a second look: 10 GiB / 7.78806 s ≈ 1.28 GiB/s; dd reports 1.4 GB/s because it uses decimal units (10.74 GB / 7.79 s ≈ 1.38 GB/s).
  3. Either way, the number exceeds the rated speed of SAS 2.0 (6 Gb/s = 0.75 GB/s).
  4. We get the same result even after dropping the caches[3] (see the sketch after this list).
  5. Perhaps the inflated number is due to the non-volatile cache on the RAID controller?
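
For item 4, the page cache, dentries and inodes can be dropped between runs as root:

# sync
# echo 3 > /proc/sys/vm/drop_caches
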
Next we run dd with the option conv=fdatasync, which will cause dd to physically write output file data before finishing:
$ rm 10GB 
$ dd if=/dev/zero of=10GB bs=1M count=10240 conv=fdatasync
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 10.2523 s, 1.0 GB/s

The number is lower, but still exceeds the rated speed of SAS 2.0!
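
A cross-check (a sketch) is to time the write together with an explicit sync, so that data still sitting in the page cache is counted against the run:

$ rm 10GB
$ time sh -c 'dd if=/dev/zero of=10GB bs=1M count=10240 && sync'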

We also run dd with the option oflag=dsync, which will cause dd to use synchronized I/O for data:

$ rm 10GB 
$ dd if=/dev/zero of=10GB bs=1M count=10240 oflag=dsync
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 21.6774 s, 495 MB/s

Now the number is below the rated speed of SAS 2.0. However, it may be artificially low, because in this mode dd syncs after every block, i.e. every megabyte (bs=1M).

Raising bs to 1GB may give a slightly more accurate number:

# echo 3 > /proc/sys/vm/drop_caches
$ dd if=/dev/zero of=10GB bs=1G count=10 oflag=dsync
10+0 records in
10+0 records out
10737418240 bytes (11 GB) copied, 20.2551 s, 530 MB/s
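
Another way to take the page cache out of the picture is direct I/O. This is a sketch only: oflag=direct requires O_DIRECT support from the kernel and the file system (XFS provides it), and the block size must be suitably aligned:

$ rm 10GB
$ dd if=/dev/zero of=10GB bs=1M count=10240 oflag=direct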

bonnie++

We next move on to more sophisticated file system benchmarking tools. Bonnie++ is a benchmark suite that is aimed at performing a number of simple tests of hard drive and file system performance.[4]

$ man bonnie++
$ bonnie++ -d /data/shaw -m RAID6 -q > RAID6.csv
$ cat RAID6.csv | bon_csv2html

Here are the results:

Version 1.96            ------Sequential Output------ --Sequential Input- --Random-
                        -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size     K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
RAID6           47G      1265  99 1139579 96 423197 54  2467  99 1120794 66 632.3 42
Latency                 9192us    211us     114ms     7606us    87079us   90117us
                        ------Sequential Create------ --------Random Create--------
                        -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files      /sec %CP /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
RAID6            16     29273  95 +++++ +++ +++++ +++  2934  10 +++++ +++ 28754  98
Latency                  220us     92us     3631us     211us     10us      213us

A few quick notes:

  1. By default, bonnie++ uses a dataset twice the size of the machine's memory, in order to minimize the effect of file caching; dtn has 24 GB of memory, hence the ~47 GB dataset above. (A sketch of pinning the size explicitly follows this list.)
  2. The block sequential write speed is 1139579 K/sec = 1.09 GB/s, in line with the number given by dd with the option conv=fdatasync.
  3. The block sequential rewrite speed is 423197 K/sec = 413 MB/s, in line with the number given by dd with the option oflag=dsync.
  4. Sequential read and write speeds are about the same.
  5. The partition delivers 632 IOPS (Random Seeks in the table). By comparison, a 10,000 RPM SAS drive delivers ~140 IOPS, and an SSD more than 5,000 IOPS.
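
To pin the dataset size explicitly rather than rely on auto-detection, something along these lines should work. This is a sketch: -s sets the file size, -n the number of files (in multiples of 1024) for the create tests, and the output file name RAID6-48g.csv is just an example; check man bonnie++ for the accepted size suffixes.

$ bonnie++ -d /data/shaw -s 48g -n 16 -m RAID6 -q > RAID6-48g.csv
$ cat RAID6-48g.csv | bon_csv2html > RAID6-48g.html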

IOzone

IOzone is another popular file system benchmark tool. The benchmark generates and measures a variety of file operations. It tests file I/O performance for the following operations: read, write, re-read, re-write, read backwards, read strided, fread, fwrite, random read, pread, mmap, aio_read, and aio_write.[5]

By default, IOzone automatically creates temporary files ranging in size from 64 KB to 512 MB to perform its various tests, and generates a lot of data. Here we fix the test file size to 8 GB, which is perhaps more representative of scientific datasets:

$ cd /data/shaw
$ /opt/iozone/bin/iozone -h
$ /opt/iozone/bin/iozone -a -s 8g

Auto Mode
File size set to 50331648 KB
Command line used: /opt/iozone/bin/iozone -a -s 8g
Output is in Kbytes/sec
Time Resolution = 0.000001 seconds.
Processor cache size set to 1024 Kbytes.
Processor cache line size set to 32 bytes.
File stride size set to 17 * record size.

                                                            random  random    bkwd   record   stride                                   
     KB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
8388608       4 1278278 1792398  3782188  4172230 2645611  472406 3342890  4186844  3104421  1876873  1764100 4612878  4606743
8388608       8 1387521 1840763  5536001  6327749 4735639  412711 4649099  6173163  5031476  1891950  1734683 5539152  6215837
8388608      16 1530530 1824517  5532682  6631587 5572456  439025 5230515  6593869  5707732  1925902  1851019 5487075  5824362
8388608      32 1559452 1479965  4920313  5748630 5216838  448824 4996454  7515205  5326289  1949813  1822888 5227218  5746346
8388608      64 1620582 1904598  5526339  6666472 6341118  533976 5749484  8202099  6400482  1966015  1937222 5714279  6654383
8388608     128 1688951 1862371  4734829  4994547 4792736  531427 4598581  7931357  4862839  1493131  1818461 4727006  4986528
8388608     256 1639100 1423794  5205149  5898215 5833426  558253 5144740  6786322  5836876  1935167  1858435 5137160  5889321
8388608     512 1675583 1867929  5238974  5971130 5941256  634707 5176485  6750829  5942460  1909840  1876588 5438048  5963510
8388608    1024 1631046 1877555  4422876  4629597 4612647  772982 4273753  6457786  4615422  1880351  1833600 5501909  5977828
8388608    2048 1703468 1880169  4405730  4640178 4630253  997600 4282073  6710091  4630821  1790512  1795246 5242335  5995914
8388608    4096 1651548 1846247  4495717  4937829 4931508 1215636 4631677  6121315  4928341  1465070  1697145 5211998  5483348
8388608    8192 1530098 1873715  3196046  3294850 3289768 1144923 3976804  3338651  4269248  1810580  1698517 4021962  4102221
8388608   16384 1425880 1804983  3005383  3117677 3116822 1203485 3051333  2357394  3071841  1709233  1670472 3054361  3114826

Clearly, file caching distorts the IOzone results! The numbers for write are generally higher than those given by dd and bonnie++; and the numbers for read are simply outrageous, all above 2.87 GB/s and in some cases above 6 GB/s!
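
One way to reduce the cache effects is to include fsync in the timings and, where supported, use direct I/O. This is a sketch, assuming the installed IOzone build supports O_DIRECT: -e includes flush in the timing, -I requests direct I/O, -i 0 -i 1 restricts the run to the write/rewrite and read/reread tests, and -r sets the record size.

$ /opt/iozone/bin/iozone -e -I -s 8g -r 1m -i 0 -i 1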

References

  1. ^ DELL PERC H700 and H800 Technical Guide
  2. ^ How to use dd to benchmark your disk or CPU?
  3. ^ Drop Caches
  4. ^ Bonnie++ Documentation
  5. ^ 10 iozone Examples for Disk I/O Performance Measurement on Linux