DTN benchmarks
dtn.ucsc.edu is UCSC's Data Transfer Node.
The RAID controller of dtn is a PERC H700 Integrated, as reported by:
```
# omreport storage controller
```
but is actually a rebranded LSI MegaRAID SAS 2108:
```
# lspci | grep RAID
02:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2108 [Liberator] (rev 05)
```
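Since the controller is a rebranded LSI 2108, LSI's MegaCli utility should also be able to query it directly, for instance to confirm the cache size and the logical drive's cache policy. A sketch, assuming MegaCli is installed at its common default path (not verified on dtn):

```
# Adapter details, including on-board memory size:
/opt/MegaRAID/MegaCli/MegaCli64 -AdpAllInfo -aAll | grep -i 'memory size'
# Cache policy (write-back vs. write-through) of each logical drive:
/opt/MegaRAID/MegaCli/MegaCli64 -LDGetProp -Cache -LAll -aAll
```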
Salient features of the PERC H700 Integrated are:[1]
- 512 MB non-volatile cache
- PCI-Express Gen2.0 support
- 6 Gb/s SAS (SAS 2.0) host interface
- RAID Levels: 0, 1, 5, 6, 10, 50, 60
```
$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda6       117G  1.6G  109G   2% /
tmpfs            12G     0   12G   0% /dev/shm
/dev/sda1       248M  101M  135M  43% /boot
/dev/sda5       3.9G   72M  3.6G   2% /tmp
/dev/sda2       9.7G  485M  8.7G   6% /var
/dev/sdb1       9.1T   11G  9.1T   1% /data
```
Here we mostly benchmark the IO performance of /data. The partition is almost empty, so we are measuring its near-maximum IO performance.
We start with the humble dd:[2]
```
$ cd /data/shaw
$ dd if=/dev/zero of=10GB bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 7.78806 s, 1.4 GB/s
```
The number here is a bit off the mark:
- The default behavior of dd is not to sync. The command above merely commits 10 GB of data to the kernel page cache (write cache) and exits.
- The arithmetic depends on units: dd reports decimal units, and 10737418240 bytes / 7.78806 s ≈ 1.38 GB/s (which dd rounds to 1.4 GB/s); in binary units that is 1.28 GiB/s. See the quick check after this list.
- Even the corrected number exceeds the rated speed of SAS 2.0 (6 Gb/s = 0.75 GB/s).
- We get the same result even after dropping caches.[3]
- Perhaps the inflated number is due to the non-volatile cache on the RAID controller?
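A quick sanity check of the unit arithmetic, using the figures from the dd run above:

```
# Decimal GB/s, which is what dd itself reports (~1.38, rounded to 1.4):
$ echo '10737418240 / 7.78806 / 10^9' | bc -l
# Binary GiB/s (~1.28), the source of the apparent discrepancy:
$ echo '10737418240 / 7.78806 / 2^30' | bc -l
```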
With conv=fdatasync, dd physically writes the output file data to disk before exiting:

```
$ rm 10GB
$ dd if=/dev/zero of=10GB bs=1M count=10240 conv=fdatasync
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 10.2523 s, 1.0 GB/s
```

The number is lower, but still exceeds the rated speed of SAS 2.0!
We also run dd with the option oflag=dsync, which causes dd to use synchronized I/O for data, committing each block to disk before writing the next:
```
$ rm 10GB
$ dd if=/dev/zero of=10GB bs=1M count=10240 oflag=dsync
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 21.6774 s, 495 MB/s
```

Now the number is below the rated speed of SAS 2.0. However, it may be artificially low, because in this mode dd syncs after every megabyte (the block size bs).
Raising bs to 1GB may give a slightly more accurate number:
```
# echo 3 > /proc/sys/vm/drop_caches
$ dd if=/dev/zero of=10GB bs=1G count=10 oflag=dsync
10+0 records in
10+0 records out
10737418240 bytes (11 GB) copied, 20.2551 s, 530 MB/s
```
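For repeated measurements, the steps above can be wrapped in a small script. This is only a sketch: it must run as root (drop_caches requires it), and the scratch path is an assumption, not part of the original runs.

```
#!/bin/bash
# Sketch: drop the page cache, then time a synced 10 GB sequential write.
# Run as root; TARGET is a hypothetical scratch file on the partition under test.
TARGET=/data/shaw/ddtest.$$

sync                               # flush dirty pages first
echo 3 > /proc/sys/vm/drop_caches  # drop page cache, dentries and inodes

dd if=/dev/zero of="$TARGET" bs=1G count=10 oflag=dsync
rm -f "$TARGET"
```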
We next move on to more sophisticated file system benchmarking tools. Bonnie++ is a benchmark suite that is aimed at performing a number of simple tests of hard drive and file system performance.[4]
```
$ man bonnie++
$ bonnie++ -d /data/shaw -m RAID6 -q > RAID6.csv
$ cat RAID6.csv | bon_csv2html
```
Here are the results:
Version 1.96, machine name RAID6, dataset size 47G.

Sequential I/O and random seeks:

| Test | Rate | % CPU | Latency |
| --- | --- | --- | --- |
| Sequential output, per char | 1265 K/sec | 99 | 9192 us |
| Sequential output, block | 1139579 K/sec | 96 | 211 us |
| Sequential output, rewrite | 423197 K/sec | 54 | 114 ms |
| Sequential input, per char | 2467 K/sec | 99 | 7606 us |
| Sequential input, block | 1120794 K/sec | 66 | 87079 us |
| Random seeks | 632.3 /sec | 42 | 90117 us |

File creation and deletion (Num Files: 16):

| Test | Rate (/sec) | % CPU | Latency |
| --- | --- | --- | --- |
| Sequential create | 29273 | 95 | 220 us |
| Sequential read | +++++ | +++ | 92 us |
| Sequential delete | +++++ | +++ | 3631 us |
| Random create | 2934 | 10 | 211 us |
| Random read | +++++ | +++ | 10 us |
| Random delete | 28754 | 98 | 213 us |

(bonnie++ prints +++++ when a test completes too quickly to report a meaningful figure.)
A few quick notes:
- By default, bonnie++ uses a dataset twice the size of RAM, in order to minimize the effect of file caching; the total memory of dtn is 24 GB, hence the 47G dataset. (A sketch of setting the size explicitly follows this list.)
- The block sequential write speed is 1139579 KB/s = 1.09 GB/s, in line with the number given by dd with the option conv=fdatasync.
- The block sequential rewrite speed is 423197 K/sec ≈ 413 MB/s, in line with the numbers given by dd with the option oflag=dsync.
- Sequential read and write speeds are about the same.
- The partition delivers 632 IOPS (the Random Seeks figure in the table above). By comparison, a 10,000 RPM SAS drive delivers ~140 IOPS, and an SSD more than 5,000 IOPS.
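If you would rather pin the dataset size than rely on the 2× RAM default, bonnie++ takes an explicit size. A sketch (the -s and -n values here are illustrative, not from the original run; -u is only needed when running as root):

```
# 48 GiB dataset (twice dtn's RAM), 16*1024 files for the creation tests:
$ bonnie++ -d /data/shaw -s 48g -n 16 -m RAID6 -u root -q > RAID6.csv
```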
IOzone is another popular filesystem benchmark tool. The benchmark generates and measures a variety of file operations, testing file I/O performance for the following operations: read, write, re-read, re-write, read backwards, read strided, fread, fwrite, random read, pread, mmap, aio_read, and aio_write.[5]
By default, IOzone automatically creates temporary files of sizes from 64 KB to 512 MB for its various tests, and generates a lot of data. Here we fix the test file size to 8 GB, which is perhaps more representative of scientific datasets:
```
$ cd /data/shaw
$ /opt/iozone/bin/iozone -h
$ /opt/iozone/bin/iozone -a -s 8g
	Auto Mode
	File size set to 8388608 KB
	Command line used: /opt/iozone/bin/iozone -a -s 8g
	Output is in Kbytes/sec
	Time Resolution = 0.000001 seconds.
	Processor cache size set to 1024 Kbytes.
	Processor cache line size set to 32 bytes.
	File stride size set to 17 * record size.
                                                               random   random     bkwd   record   stride
              KB   reclen    write  rewrite     read   reread     read    write     read  rewrite     read   fwrite frewrite    fread  freread
         8388608        4  1278278  1792398  3782188  4172230  2645611   472406  3342890  4186844  3104421  1876873  1764100  4612878  4606743
         8388608        8  1387521  1840763  5536001  6327749  4735639   412711  4649099  6173163  5031476  1891950  1734683  5539152  6215837
         8388608       16  1530530  1824517  5532682  6631587  5572456   439025  5230515  6593869  5707732  1925902  1851019  5487075  5824362
         8388608       32  1559452  1479965  4920313  5748630  5216838   448824  4996454  7515205  5326289  1949813  1822888  5227218  5746346
         8388608       64  1620582  1904598  5526339  6666472  6341118   533976  5749484  8202099  6400482  1966015  1937222  5714279  6654383
         8388608      128  1688951  1862371  4734829  4994547  4792736   531427  4598581  7931357  4862839  1493131  1818461  4727006  4986528
         8388608      256  1639100  1423794  5205149  5898215  5833426   558253  5144740  6786322  5836876  1935167  1858435  5137160  5889321
         8388608      512  1675583  1867929  5238974  5971130  5941256   634707  5176485  6750829  5942460  1909840  1876588  5438048  5963510
         8388608     1024  1631046  1877555  4422876  4629597  4612647   772982  4273753  6457786  4615422  1880351  1833600  5501909  5977828
         8388608     2048  1703468  1880169  4405730  4640178  4630253   997600  4282073  6710091  4630821  1790512  1795246  5242335  5995914
         8388608     4096  1651548  1846247  4495717  4937829  4931508  1215636  4631677  6121315  4928341  1465070  1697145  5211998  5483348
         8388608     8192  1530098  1873715  3196046  3294850  3289768  1144923  3976804  3338651  4269248  1810580  1698517  4021962  4102221
         8388608    16384  1425880  1804983  3005383  3117677  3116822  1203485  3051333  2357394  3071841  1709233  1670472  3054361  3114826
```
Clearly, file caching distorts the IOzone results! The numbers for write are generally higher than those given by dd and bonnie++; and the numbers for read are simply outrageous: even the lowest read figure, 3005383 KB/s, is more than 2.87 GiB/s!
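One way to reduce the caching effect is to include the flush in the timings and to use a file larger than RAM. A sketch using standard IOzone flags (this exact invocation was not run on dtn):

```
# -e includes flush (fsync/fflush) in the timing calculations;
# -i 0 -i 1 restricts the run to sequential write/rewrite and read/reread;
# -r fixes the record size; 48 GB is twice dtn's 24 GB of RAM.
$ /opt/iozone/bin/iozone -e -i 0 -i 1 -r 1m -s 48g
```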