Periodic performance measurements - vincentkfu/fio-blog GitHub Wiki

Adam Manzanares provided helpful feedback on this post.

I have been using fio for over a decade but still regularly find features I am not familiar with. One such feature is the set of summary statistics fio reports for its periodic bandwidth and IOPS measurements. These appear in the two "bw" and "iops" lines highlighted in the output below, directly beneath the completion latency percentile panel.

$ fio --name=test --ioengine=null --filesize=1G --time_based --runtime=3s
test: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=null, iodepth=1
fio-3.34-46-g0771
Starting 1 process
Jobs: 1 (f=1): [R(1)][-.-%][r=13.0GiB/s][r=3399k IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=8498: Mon May  1 20:24:28 2023
  read: IOPS=3360k, BW=12.8GiB/s (13.8GB/s)(38.4GiB/3000msec)
    clat (nsec): min=37, max=62397, avg=46.13, stdev=146.11
     lat (nsec): min=73, max=62490, avg=87.04, stdev=197.77
    clat percentiles (nsec):
     |  1.00th=[   40],  5.00th=[   42], 10.00th=[   42], 20.00th=[   43],
     | 30.00th=[   43], 40.00th=[   44], 50.00th=[   44], 60.00th=[   45],
     | 70.00th=[   45], 80.00th=[   45], 90.00th=[   48], 95.00th=[   51],
     | 99.00th=[   57], 99.50th=[   78], 99.90th=[  110], 99.95th=[  113],
     | 99.99th=[  362]
+------------------------------------------------------------------------------------------+
|  bw (  MiB/s): min=12354, max=13351, per=100.00%, avg=13130.32, stdev=435.00, samples=5  |
|  iops        : min=3162738, max=3418112, avg=3361363.60, stdev=111358.31, samples=5      |
+------------------------------------------------------------------------------------------+
  lat (nsec)   : 50=93.49%, 100=6.26%, 250=0.23%, 500=0.01%, 750=0.01%
  lat (nsec)   : 1000=0.01%
  lat (usec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
  lat (usec)   : 100=0.01%
  cpu          : usr=99.93%, sys=0.00%, ctx=3, majf=0, minf=8
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=10079211,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=12.8GiB/s (13.8GB/s), 12.8GiB/s-12.8GiB/s (13.8GB/s-13.8GB/s), io=38.4GiB (41.3GB), run=3000-3000msec

Fio continuously monitors bytes transferred and I/O operations completed. By default fio will measure bandwidth and IOPS in each half-second interval and then report results based on these measurements.

The reported values are mostly straightforward. For bandwidth, the collected measurements had a minimum of 12,354 MiB/s and a maximum of 13,351 MiB/s. The job's measured bandwidth constituted 100 percent of the aggregate bandwidth of its reporting group (unsurprising, since the group included only this one job). The bandwidth measurements had an average of 13,130 MiB/s and a standard deviation of 435 MiB/s. In total, five bandwidth measurements were collected.

The statistics reported for IOPS are similar, although the group percentage is omitted. The five IOPS measurements had a minimum of 3,162,738, a maximum of 3,418,112, a mean of 3,361,364, and a standard deviation of 111,358. Note that the IOPS and bandwidth values are collected independently and should not be expected to correspond exactly with each other.
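
The individual per-interval samples behind these summary lines are not printed, but they can be captured by asking fio to write bandwidth and IOPS logs. Here is a sketch of such an invocation (hypothetical, not one of the runs shown in this post):

$ fio --name=test --ioengine=null --filesize=1G --time_based --runtime=3s \
      --write_bw_log=test --write_iops_log=test --log_avg_msec=500

With log_avg_msec set to match the 500ms measurement interval, the entries in the resulting log files (named something like test_bw.1.log and test_iops.1.log) should correspond to the samples summarized in the bw and iops lines, each entry recording a timestamp in milliseconds and the value measured over that interval.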

Two options control the size of the intervals over which bandwidth and IOPS are calculated. Here are the descriptions from the documentation:

| Name | Description |
|------|-------------|
| bwavgtime=int | Average the calculated bandwidth over the given time. Value is specified in milliseconds. If the job also does bandwidth logging through write_bw_log, then the minimum of this option and log_avg_msec will be used. Default: 500ms. |
| iopsavgtime=int | Average the calculated IOPS over the given time. Value is specified in milliseconds. If the job also does IOPS logging through write_iops_log, then the minimum of this option and log_avg_msec will be used. Default: 500ms. |

bwavgtime controls the size of the intervals over which bandwidth measurements are collected; by default bandwidth is calculated over half-second intervals. iopsavgtime does the same for IOPS measurements and likewise defaults to 500ms. Note that fio ignores any unit suffix attached to these values and always parses them as milliseconds (e.g., bwavgtime=1000s will have fio collect bandwidth measurements every second, not every 1000 seconds).
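
For example (a hypothetical invocation, not one of the runs shown here), measurements can be collected over two-second intervals instead of the default half second:

$ fio --name=test --ioengine=null --filesize=1G --time_based --runtime=10s \
      --bwavgtime=2000 --iopsavgtime=2000

Over a ten-second run this should yield on the order of four samples in each of the bw and iops summary lines, since the final interval is typically missed (see the notes at the end of this post).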

Also worth noting is the relationship between these two options and log_avg_msec. As the documentation states, the actual measurement interval will be the minimum of {bw|iops}avgtime and the value specified for log_avg_msec. The reason is that the code used for these periodic measurements is also used for write_{bw|iops}_log when log_avg_msec is set. Setting {bw|iops}avgtime below log_avg_msec can therefore also increase the frequency with which those logs are written.
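
As a concrete illustration of this interaction (again a hypothetical invocation, based on the behavior described above), a job that writes a bandwidth log with a large log_avg_msec but leaves bwavgtime at its 500ms default should end up with log entries roughly every 500ms, since the smaller of the two values determines the measurement interval:

$ fio --name=test --ioengine=null --filesize=1G --time_based --runtime=10s \
      --write_bw_log=test --log_avg_msec=2000

Raising bwavgtime to 2000 as well should bring the log back to one entry roughly every two seconds.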

Discussion

The most useful aspect of these reported values is the information they provide about how performance varied during the job run. The averages of these measurements are not especially useful, as they are just estimates of the overall bandwidth and IOPS for the full job, which are already reported exactly elsewhere in the output. However, nowhere else in the output do we see how performance varied during the job's run time. Was bandwidth essentially constant throughout the job? Or did IOPS vary wildly over the course of the run?

Here is an example of what these measurements can tell us about variability. I ran two sequential read workloads on a hard drive. Run 1 was carried out when the device was otherwise idle. Run 2 was carried out while a separate random read job was running on the same device. The table below lists the summary statistics for the periodic bandwidth and IOPS measurements.

| Statistic | Run 1: No background job | Run 2: Background random read job |
|---|---|---|
| Bandwidth (KiB/s) | | |
| standard deviation | 2,068 | 3,212 |
| range | 10,176 | 10,024 |
| average | 112,742 | 6,738 |
| minimum | 110,408 | 3,680 |
| maximum | 120,584 | 13,704 |
| N | 19 | 19 |
| IOPS | | |
| standard deviation | 517 | 803 |
| range | 2,544 | 2,506 |
| average | 28,185 | 1,684 |
| minimum | 27,602 | 920 |
| maximum | 30,146 | 3,426 |
| N | 19 | 19 |

For both bandwidth (3,212 KiB/s vs 2,068 KiB/s) and IOPS (803 vs 517), the standard deviation is higher for Run 2, which had a background random read job, than for Run 1, where the device was otherwise idle. When there is no background activity, a sequential read job should produce a smooth performance curve, so it is not surprising to see higher variability once a random read workload is running in the background.

The other measures of variability tell a similar story. The ranges (the difference between minimum and maximum) for bandwidth (10,024 vs 10,176) and IOPS (2,506 vs 2,544) are similar between the two trials, but this direct comparison does not take into account the actual performance levels. The bandwidth range for Run 1 is less than 10% of that run's overall average bandwidth, while the bandwidth range for Run 2 is nearly 1.5 times its run's overall average bandwidth. Thus, as a proportion of the respective performance levels, the bandwidth range for Run 2 dwarfs the bandwidth range for Run 1. The same is true for the IOPS ranges: in absolute terms they are similar, but relative to the performance levels of the respective runs, the Run 2 IOPS range is much larger than the Run 1 IOPS range.
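
To make the comparison concrete, dividing each range by the corresponding run's average from the table above gives:

Bandwidth: Run 1: 10,176 / 112,742 ≈ 0.09 (about 9% of the average); Run 2: 10,024 / 6,738 ≈ 1.49 (nearly 1.5 times the average)
IOPS:      Run 1: 2,544 / 28,185 ≈ 0.09; Run 2: 2,506 / 1,684 ≈ 1.49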

The full output for the two fio invocations is below:

Full comparison output
# ./fio --name=test --filename=/dev/sda --runtime=10 --direct=1 
test: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.34
Starting 1 process
Jobs: 1 (f=1): [R(1)][100.0%][r=110MiB/s][r=28.3k IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=528049: Tue May  2 14:09:51 2023
  read: IOPS=28.2k, BW=110MiB/s (115MB/s)(1101MiB/10001msec)
    clat (usec): min=25, max=11609, avg=34.81, stdev=22.28
     lat (usec): min=25, max=11610, avg=34.90, stdev=22.29
    clat percentiles (usec):
     |  1.00th=[   27],  5.00th=[   31], 10.00th=[   31], 20.00th=[   31],
     | 30.00th=[   32], 40.00th=[   33], 50.00th=[   35], 60.00th=[   38],
     | 70.00th=[   38], 80.00th=[   39], 90.00th=[   40], 95.00th=[   40],
     | 99.00th=[   44], 99.50th=[   47], 99.90th=[   64], 99.95th=[  113],
     | 99.99th=[  135]
   bw (  KiB/s): min=110408, max=120584, per=100.00%, avg=112741.89, stdev=2068.25, samples=19
   iops        : min=27602, max=30146, avg=28185.47, stdev=517.06, samples=19
  lat (usec)   : 50=99.74%, 100=0.20%, 250=0.06%, 500=0.01%
  lat (msec)   : 20=0.01%
  cpu          : usr=2.76%, sys=16.56%, ctx=281754, majf=0, minf=11
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=281748,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=110MiB/s (115MB/s), 110MiB/s-110MiB/s (115MB/s-115MB/s), io=1101MiB (1154MB), run=10001-10001msec

Disk stats (read/write):
  sda: ios=278433/0, merge=0/0, ticks=8554/0, in_queue=8554, util=99.09%

# ./fio --name=test --startdelay=3s --filename=/dev/sda --runtime=10 --direct=1 --name=bg --filename=/dev/sda --rw=randread --direct=1 --runtime=14s
test: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
bg: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.34
Starting 2 processes
Jobs: 1 (f=1): [_(1),r(1)][58.3%][r=2202KiB/s][r=550 IOPS][eta 00m:10s] 
test: (groupid=0, jobs=1): err= 0: pid=528042: Tue May  2 14:09:33 2023
  read: IOPS=1801, BW=7204KiB/s (7377kB/s)(70.4MiB/10001msec)
    clat (usec): min=104, max=87255, avg=554.04, stdev=3622.48
     lat (usec): min=104, max=87255, avg=554.16, stdev=3622.48
    clat percentiles (usec):
     |  1.00th=[  147],  5.00th=[  153], 10.00th=[  206], 20.00th=[  215],
     | 30.00th=[  219], 40.00th=[  225], 50.00th=[  227], 60.00th=[  229],
     | 70.00th=[  229], 80.00th=[  231], 90.00th=[  241], 95.00th=[  269],
     | 99.00th=[ 1369], 99.50th=[28181], 99.90th=[56361], 99.95th=[69731],
     | 99.99th=[82314]
   bw (  KiB/s): min= 3680, max=13704, per=100.00%, avg=6737.68, stdev=3211.98, samples=19
   iops        : min=  920, max= 3426, avg=1684.42, stdev=803.00, samples=19
  lat (usec)   : 250=91.42%, 500=4.55%, 750=0.89%, 1000=0.08%
  lat (msec)   : 2=2.11%, 10=0.03%, 20=0.23%, 50=0.52%, 100=0.16%
  cpu          : usr=0.55%, sys=1.20%, ctx=18016, majf=0, minf=12
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=18012,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1
bg: (groupid=0, jobs=1): err= 0: pid=528041: Tue May  2 14:09:33 2023
  read: IOPS=66, BW=267KiB/s (273kB/s)(3740KiB/14006msec)
    clat (usec): min=2736, max=70544, avg=14940.17, stdev=7633.70
     lat (usec): min=2736, max=70544, avg=14940.65, stdev=7633.71
    clat percentiles (usec):
     |  1.00th=[ 4490],  5.00th=[ 6194], 10.00th=[ 7570], 20.00th=[ 9372],
     | 30.00th=[10683], 40.00th=[11863], 50.00th=[13042], 60.00th=[14353],
     | 70.00th=[15926], 80.00th=[19268], 90.00th=[26870], 95.00th=[30540],
     | 99.00th=[36963], 99.50th=[41681], 99.90th=[70779], 99.95th=[70779],
     | 99.99th=[70779]
   bw (  KiB/s): min=  208, max=  360, per=4.88%, avg=264.89, stdev=45.27, samples=27
   iops        : min=   52, max=   90, avg=66.22, stdev=11.32, samples=27
  lat (msec)   : 4=0.43%, 10=24.49%, 20=56.79%, 50=18.07%, 100=0.21%
  cpu          : usr=0.28%, sys=0.12%, ctx=940, majf=0, minf=11
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=935,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=5411KiB/s (5541kB/s), 267KiB/s-7204KiB/s (273kB/s-7377kB/s), io=74.0MiB (77.6MB), run=10001-14006msec

Disk stats (read/write):
  sda: ios=18933/0, merge=0/0, ticks=23640/0, in_queue=23639, util=98.70%

Notes

  • I am not the only one unfamiliar with these measurements: this blog post was inspired by a recent inquiry on the fio mailing list asking about the multiple bandwidth and IOPS figures fio reports.
  • What if I am not interested in these periodic measurements and want to disable them? Fio does not have an option to explicitly turn this feature off. However, it is possible to reduce the frequency of the measurements by specifying large values for bwavgtime and iopsavgtime. On a system with 4-byte integers, bwavgtime=4294967295 (roughly 50 days) will essentially disable periodic bandwidth measurement.
  • The final measurement is typically missed. The first example above ran for three seconds with measurements taken every half second, yet only five samples were reported. This is because fio terminated the job and disabled periodic measurement before the final measurement could be collected.
  • Part of the reason this feature is not well known is that the discussion of these output lines in fio's documentation is quite terse. We should improve this. The discussion for log_avg_msec should also mention that the logging frequency can be affected by bwavgtime and iopsavgtime.