Detecting steady state with fio - vincentkfu/fio-blog GitHub Wiki

It is well known that random write performance for flash-based storage devices depends on the device state. When the device is brand new or freshly erased, random write performance is high as the controller does not engage in background garbage collection. However, once the device's available capacity and spare area are consumed, random write performance diminishes. At this point, in addition to committing incoming user writes to NAND, the drive must also copy forward valid data from blocks that are to be erased in preparation for storing new user data. This property of flash-based storage devices means that performance of a new device can be very different from how a device performs after it has been subject to a production workload for a long period.

Since the high performance period for a new device may last for many minutes or even hours, the appropriate metric for evaluating device performance is its long-run performance at steady state. However, the transition from initial high performance to long-run steady state performance levels can be gradual and it is difficult to predict in advance how long the transition will take. To address this problem, fio has a feature to automatically detect when performance levels have attained steady state.

Steady state detection in fio

How does fio detect steady state? Fio determines whether steady state has been attained based on a rolling window of per-second performance measurements. If the most recent measurements in this window meet the specified criterion, fio decides that steady state has been attained and stops the benchmark run. For example, suppose we have a thirty-second window. During the benchmark run, fio will continually collect per-second performance measurements. Once a full window's worth of measurements has been collected, fio will begin checking the stopping criterion. If at any point fio finds that the most recent set of thirty performance measurements meets the specified threshold, the job will stop.
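The rolling-window logic described above can be sketched in a few lines of Python. This is only an illustration of the idea, not fio's implementation: the function name seconds_until_steady, the is_steady callback, and the small_spread placeholder criterion are all hypothetical.

```python
from collections import deque
from typing import Callable, Iterable, Optional, Sequence


def seconds_until_steady(
    measurements: Iterable[float],
    window_size: int,
    is_steady: Callable[[Sequence[float]], bool],
) -> Optional[int]:
    """Return the 1-based second at which the criterion first holds, else None."""
    # A bounded deque drops the oldest sample automatically as new ones arrive.
    window = deque(maxlen=window_size)
    for second, value in enumerate(measurements, start=1):
        window.append(value)
        # The criterion is only checked once a full window has been collected.
        if len(window) == window_size and is_steady(list(window)):
            return second
    return None


def small_spread(window: Sequence[float]) -> bool:
    """Placeholder criterion for illustration only (not one of fio's criteria)."""
    return max(window) - min(window) <= 10


samples = [500, 400, 300, 205, 200, 203, 201, 199]
print(seconds_until_steady(samples, window_size=3, is_steady=small_spread))
```

With a three-sample window, the first window whose spread is small enough is the one ending at the sixth sample, so the sketch reports 6. A run that never satisfies the criterion returns None, which corresponds to a job that ends for some other reason such as reaching its runtime.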

fio uses either IOPS or bandwidth as the metric for steady state detection. For a fixed block size, these two metrics will be equivalent, but they will differ when jobs have mixed block sizes. For each of IOPS and bandwidth, fio offers two ways to assess whether steady state has been attained. The first uses the maximum mean deviation, that is, the largest absolute difference between any single measurement in the window and the window's mean. Suppose that we have chosen IOPS as our performance metric. Each second, fio will calculate the mean IOPS over the steady state window. If every performance measurement within the window deviates from that mean by less than the specified maximum, then fio will conclude that steady state has been attained and stop the benchmark run.

In Figure 1 below the third measurement is outside of the 0.5 percent mean deviation threshold. However, the final thirty measurements are all within the threshold and the job terminated once the steady state window shifted and the offending measurement dropped out. Note that the mean used to calculate the deviations only relies on the thirty measurements within the steady state window.

Figure 1 IOPS Mean Deviation example
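As a rough illustration of the maximum mean deviation check, here is a minimal Python sketch. The function names and the limit_is_percent flag are assumptions made for this example; they do not correspond to fio's internal code.

```python
from typing import Sequence


def max_mean_deviation(window: Sequence[float]) -> float:
    """Largest absolute difference between any sample and the window mean."""
    mean = sum(window) / len(window)
    return max(abs(value - mean) for value in window)


def steady_by_mean_deviation(
    window: Sequence[float], limit: float, limit_is_percent: bool = False
) -> bool:
    """True if every sample is within `limit` of the mean of this window."""
    mean = sum(window) / len(window)
    deviation = max_mean_deviation(window)
    if limit_is_percent:
        # Express the deviation as a percentage of the window mean.
        deviation = 100.0 * deviation / mean
    return deviation < limit


window = [100, 102, 98, 100]
print(max_mean_deviation(window))                                    # 2.0
print(steady_by_mean_deviation(window, 2.5, limit_is_percent=True))  # True
print(steady_by_mean_deviation(window, 1.0, limit_is_percent=True))  # False
```

Note that the mean is computed only from the samples inside the window, so an early outlier stops mattering as soon as the window slides past it.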

The second steady state assessment strategy is to calculate the least squares regression slope within the steady state window. If the slope is sufficiently close to zero, then steady state has been attained and fio will stop the benchmark run. The slope is the rate of change in performance over time. If performance is essentially unchanging then by definition steady state has been attained.

Figure 2 below shows a steady state run with a stopping criterion that the bandwidth slope be less than 4 KiB. In the first thirty-second sliding window, the slope was calculated as 17.99. This was larger than the specified steady state threshold and the job continued to run. New slopes were calculated from the shifting thirty-second window each second that the job continued to run. Seven seconds later, the linear regression slope was estimated at 1.24 and the job was terminated.

Figure 2 Bandwidth Slope Example
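The slope criterion can be sketched in Python as follows, using the textbook least squares formula with the sample index (0, 1, 2, ...) as the time axis. The function names are hypothetical. Comparing the absolute value of the slope against the limit is an assumption, chosen because the Example 3 output later in this article reports attained=no for a slope of -14816.775 against a limit of 4096.

```python
from typing import Sequence


def least_squares_slope(window: Sequence[float]) -> float:
    """Slope of the best-fit line through (0, y0), (1, y1), ..., (n-1, y_n-1)."""
    n = len(window)
    if n < 2:
        raise ValueError("need at least two samples to fit a line")
    x_mean = (n - 1) / 2
    y_mean = sum(window) / n
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(window))
    denominator = sum((x - x_mean) ** 2 for x in range(n))
    return numerator / denominator


def steady_by_slope(
    window: Sequence[float], limit: float, limit_is_percent: bool = False
) -> bool:
    """True if the slope magnitude is below `limit` (assumed comparison)."""
    slope = least_squares_slope(window)
    if limit_is_percent:
        # Express the slope as a percentage of the window mean.
        slope = 100.0 * slope / (sum(window) / len(window))
    return abs(slope) < limit


print(least_squares_slope([10, 12, 14, 16]))   # 2.0
print(steady_by_slope([10, 12, 14, 16], 4))    # True
print(steady_by_slope([16, 14, 12, 10], 1))    # False
```

Because each sample is one second apart, the slope is in units of the metric per second: for bandwidth measured in bytes per second, that is bytes per second per second.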

Both the maximum mean deviation and slope metrics can be expressed as specific numerical values or as percentages of the mean IOPS or mean bandwidth. 

Fio also has a steady state ramp time feature that delays the commencement of data collection for the steady state window. This is to allow all of the free space on the storage device to be consumed before the lower performance levels at steady state begin.

These are the descriptions from the fio documentation of the options related to steady state detection:

steadystate=str:float, ss=str:float

Define the criterion and limit for assessing steady state performance. The
first parameter designates the criterion whereas the second parameter sets the
threshold. When the criterion falls below the threshold for the specified
duration, the job will stop. For example, iops_slope:0.1% will direct fio to
terminate the job when the least squares regression slope falls below 0.1% of
the mean IOPS. If group_reporting is enabled this will apply to all jobs in the
group. Below is the list of available steady state assessment criteria. All
assessments are carried out using only data from the rolling collection window.
Threshold limits can be expressed as a fixed value or as a percentage of the
mean in the collection window.

When using this feature, most jobs should include the time_based and runtime 
options or the loops option so that fio does not stop running after it has
covered the full size of the specified file(s) or device(s).

    iops

    Collect IOPS data. Stop the job if all individual IOPS measurements are within
    the specified limit of the mean IOPS (e.g., iops:2 means that all individual
    IOPS values must be within 2 of the mean, whereas iops:0.2% means that all
    individual IOPS values must be within 0.2% of the mean IOPS to terminate the
    job).

    iops_slope

    Collect IOPS data and calculate the least squares regression slope. Stop the
    job if the slope falls below the specified limit.

    bw

    Collect bandwidth data. Stop the job if all individual bandwidth measurements
    are within the specified limit of the mean bandwidth.

    bw_slope

    Collect bandwidth data and calculate the least squares regression slope. Stop
    the job if the slope falls below the specified limit.

steadystate_duration=time, ss_dur=time

A rolling window of this duration will be used to judge whether steady state
has been reached. Data will be collected once per second. The default is 0
which disables steady state detection. When the unit is omitted, the value is
interpreted in seconds.

steadystate_ramp_time=time, ss_ramp=time

Allow the job to run for the specified duration before beginning data
collection for checking the steady state job termination criterion. The default
is 0. When the unit is omitted, the value is interpreted in seconds.
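To make the steadystate value format concrete (criterion, colon, limit, optional percent sign), here is a small Python sketch that splits such a string into its parts. It is a hypothetical helper for scripts that generate or inspect job files, not fio's own option parser, and it deliberately does not handle unit suffixes such as the 4k used in Example 2.

```python
from typing import NamedTuple

# The four criteria listed in the fio documentation excerpt above.
CRITERIA = {"iops", "iops_slope", "bw", "bw_slope"}


class SteadyStateSpec(NamedTuple):
    criterion: str
    limit: float
    limit_is_percent: bool


def parse_steadystate(value: str) -> SteadyStateSpec:
    """Split e.g. 'iops_slope:0.1%' into criterion, limit, and percent flag."""
    criterion, separator, limit_text = value.partition(":")
    if not separator or criterion not in CRITERIA:
        raise ValueError(f"unrecognized steadystate value: {value!r}")
    limit_is_percent = limit_text.endswith("%")
    if limit_is_percent:
        limit_text = limit_text[:-1]
    # float() raises ValueError for suffixed values like '4k' (not handled here).
    return SteadyStateSpec(criterion, float(limit_text), limit_is_percent)


print(parse_steadystate("iops_slope:0.1%"))
print(parse_steadystate("iops:2"))
```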

Usage notes

The usual fio benchmark stopping criteria (e.g., runtime, size, io_size) are still needed even when steady state detection is enabled. Steady state detection will stop the benchmark run early if steady state is attained, but the job may also end without attaining steady state if, for example, the maximum runtime is reached.

For detailed steady state data, use the json output format. This will include the per-second measurements in the steady state window. With the normal output format, there will be a single summary line with whether or not steady state was attained, IOPS and bandwidth in the steady state window, and the value of the stopping criterion.

Examples

The examples here use fio 3.30.

Example 1: IOPS maximum mean deviation

[iops_mean]
ioengine=null
size=1T
rw=randwrite
time_based=1
runtime=10m
randrepeat=0
norandommap=1
steadystate=iops:2%
steadystate_duration=30s
steadystate_ramp_time=10s

This example job uses per-second IOPS measurements for determining steady state. For these measurements it considers the maximum mean deviation. For each sliding thirty-second steady state window, fio will calculate the mean IOPS. If none of the measurements within the window are more than 2 percent above or below the mean, then fio will declare that steady state has been attained and stop the benchmark run.

The job above has a maximum runtime of 10 minutes. If steady state has not been attained by then the job will stop. The steadystate_ramp_time option is also used here. Fio will not begin collecting data for assessing steady state until the first 10 seconds of the benchmark run have elapsed.

# fio iops_mean.fio
iops_mean: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=null, iodepth=1
fio-3.30
Starting 1 process
Jobs: 1 (f=1): [w(1)][15.5%][w=12.6GiB/s][w=3292k IOPS][eta 08m:27s]
iops_mean: (groupid=0, jobs=1): err= 0: pid=38899: Thu May 12 12:00:26 2022
  write: IOPS=3267k, BW=12.5GiB/s (13.4GB/s)(1158GiB/92899msec); 0 zone resets
    clat (nsec): min=12, max=754885, avg=21.12, stdev=60.95
     lat (nsec): min=62, max=754941, avg=77.82, stdev=93.74
    clat percentiles (nsec):
     |  1.00th=[   17],  5.00th=[   17], 10.00th=[   18], 20.00th=[   18],
     | 30.00th=[   19], 40.00th=[   19], 50.00th=[   21], 60.00th=[   22],
     | 70.00th=[   22], 80.00th=[   23], 90.00th=[   23], 95.00th=[   23],
     | 99.00th=[   38], 99.50th=[   42], 99.90th=[   86], 99.95th=[  100],
     | 99.99th=[  195]
   bw (  MiB/s): min= 7938, max=12983, per=100.00%, avg=12771.17, stdev=617.98, samples=185
   iops        : min=2032342, max=3323790, avg=3269420.37, stdev=158202.85, samples=185
  lat (nsec)   : 20=40.38%, 50=59.30%, 100=0.26%, 250=0.04%, 500=0.01%
  lat (nsec)   : 750=0.01%, 1000=0.01%
  lat (usec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
  lat (usec)   : 100=0.01%, 1000=0.01%
  cpu          : usr=99.99%, sys=0.00%, ctx=560, majf=0, minf=8
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,303498967,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1
  steadystate  : attained=yes, bw=12.6GiB/s (13.2GB/s), iops=3301k, iops mean dev=1.311%

Run status group 0 (all jobs):
  WRITE: bw=12.5GiB/s (13.4GB/s), 12.5GiB/s-12.5GiB/s (13.4GB/s-13.4GB/s), io=1158GiB (1243GB), run=92899-92899msec

For this job, fio found that in the final thirty seconds the largest deviation from the mean IOPS was 1.3% of the 3301K IOPS mean. This is smaller than the specified 2 percent threshold and fio terminated the benchmark run.

Example 2: Bandwidth slope

[bw_slope]
direct=1
ioengine=psync
size=1G
rw=randread
time_based=1
runtime=10m
randrepeat=0
norandommap=1
steadystate=bw_slope:4k
steadystate_duration=30s
steadystate_ramp_time=30s

This example job uses per-second bandwidth measurements for determining steady state. For these measurements it considers the least squares regression slope. For each sliding thirty-second steady state window, fio will calculate the slope. If the magnitude of the slope is less than 4096 (the 4k limit expressed in bytes), then fio will declare that steady state has been attained and stop the benchmark run. Negative slopes are judged by their magnitude, as the output in Example 3 shows.

The job above has a maximum runtime of 10 minutes. If steady state has not been attained by then the job will stop. The steadystate_ramp_time option is also used here. Fio will not begin collecting data for assessing steady state until the first thirty seconds of the benchmark run have elapsed.

# fio bw_slope.fio
bw_slope: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.30
Starting 1 process
Jobs: 1 (f=1): [r(1)][13.2%][r=33.9MiB/s][r=8679 IOPS][eta 08m:41s]
bw_slope: (groupid=0, jobs=1): err= 0: pid=69744: Thu May 12 12:33:19 2022
  read: IOPS=8326, BW=32.5MiB/s (34.1MB/s)(2566MiB/78881msec)
    clat (usec): min=44, max=3081, avg=119.30, stdev=32.37
     lat (usec): min=44, max=3081, avg=119.39, stdev=32.38
    clat percentiles (usec):
     |  1.00th=[   79],  5.00th=[   80], 10.00th=[   89], 20.00th=[   91],
     | 30.00th=[   97], 40.00th=[  106], 50.00th=[  112], 60.00th=[  128],
     | 70.00th=[  139], 80.00th=[  141], 90.00th=[  153], 95.00th=[  184],
     | 99.00th=[  196], 99.50th=[  200], 99.90th=[  208], 99.95th=[  217],
     | 99.99th=[  314]
   bw (  KiB/s): min=32032, max=35664, per=100.00%, avg=33313.94, stdev=870.58, samples=157
   iops        : min= 8008, max= 8916, avg=8328.48, stdev=217.65, samples=157
  lat (usec)   : 50=0.01%, 100=31.73%, 250=68.25%, 500=0.01%, 750=0.01%
  lat (usec)   : 1000=0.01%
  lat (msec)   : 4=0.01%
  cpu          : usr=1.78%, sys=9.03%, ctx=657104, majf=0, minf=14
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=656775,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1
  steadystate  : attained=yes, bw=32.3MiB/s (33.1MB/s), iops=8277, bw slope=3031.097

Run status group 0 (all jobs):
   READ: bw=32.5MiB/s (34.1MB/s), 32.5MiB/s-32.5MiB/s (34.1MB/s-34.1MB/s), io=2566MiB (2690MB), run=78881-78881msec

Disk stats (read/write):
  sda: ios=656775/130, merge=0/39, ticks=73740/15, in_queue=73757, util=99.92%

For this job file, fio ran for nearly 79 seconds and in the final thirty seconds the bandwidth slope was 3031. In other words, bandwidth was increasing at a rate of 3031 bytes per second per second in the final thirty-second window. Because this was below the 4096 slope threshold specified in the job file, fio terminated the benchmark run.

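If we had specified --output-format=json when running the job file above, the JSON output would include a steadystate component of the following form. The values shown here do not match the run above (for example, the slope is -3325 rather than 3031), so they presumably come from a separate run of the same job file: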

"steadystate" : {
        "ss" : "bw_slope:4096.000000",
        "duration" : 30,
        "attained" : 1,
        "criterion" : "-3325.282471",
        "max_deviation" : 0.000000,
        "slope" : -3325.282536,
        "data" : {
          "bw_mean" : 34845486,
          "iops_mean" : 8506,
          "iops" : [
            8456,
            8578,
            8568,
            8538,
            8530,
            8579,
            8432,
            8505,
            8529,
            8508,
            8471,
            8488,
            8557,
            8489,
            8511,
            8516,
            8488,
            8449,
            8475,
            8487,
            8477,
            8549,
            8498,
            8452,
            8496,
            8438,
            8553,
            8550,
            8519,
            8520
          ],
          "bw" : [
            34637645,
            35135488,
            35094528,
            34973853,
            34938880,
            35141957,
            34539243,
            34836480,
            34936952,
            34850850,
            34697216,
            34768848,
            35049472,
            34772948,
            34861056,
            34883651,
            34766848,
            34608944,
            34715547,
            34762752,
            34723747,
            35018954,
            34809849,
            34621245,
            34801649,
            34563843,
            35035355,
            35020800,
            34895951,
            34900052
          ]
        }
      }

It includes the specified stopping criterion and window size. Also included is an indication of whether steady state was actually attained or not. The criterion field is the realized value of the stopping criterion, which for this example is listed as a slope of -3325. Also included are per-second IOPS and bandwidth measurements as well as their means for the steady state window.
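One use for this JSON data is to recompute the slope from the per-second measurements and compare it with the value fio reported. The Python sketch below does that. It assumes that the steadystate object sits under each entry of the top-level jobs array and that the data arrays are in chronological order; the helper names are hypothetical.

```python
import json
from typing import Sequence, Tuple


def least_squares_slope(values: Sequence[float]) -> float:
    """Least squares slope with the sample index as the time axis."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    denominator = sum((x - x_mean) ** 2 for x in range(n))
    return numerator / denominator


def load_steadystate(path: str, job_index: int = 0) -> dict:
    """Read fio JSON output; the jobs[...]['steadystate'] path is an assumption."""
    with open(path) as handle:
        return json.load(handle)["jobs"][job_index]["steadystate"]


def check_reported_slope(
    steadystate: dict, metric: str = "bw", tolerance: float = 1.0
) -> Tuple[float, bool]:
    """Return the recomputed slope and whether it matches the reported one."""
    recomputed = least_squares_slope(steadystate["data"][metric])
    return recomputed, abs(recomputed - steadystate["slope"]) <= tolerance


# Usage (the file name is hypothetical):
#   ss = load_steadystate("bw_slope.json")
#   print(check_reported_slope(ss, metric="bw"))
```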

Example 3: My job isn't stopping. Can I take a peek at the steady state criterion?

Yes, in fact, it is possible to query the value of the steady state criterion at any point during the job. Fio will print the current full job output at any time when it receives the USR1 signal. In a separate shell window, obtain the PID of the main fio process and then use kill to send it the USR1 signal.

user@ubuntu:~$ ps -C fio
    PID TTY          TIME CMD
  70454 pts/0    00:00:04 fio
  70528 ?        00:00:03 fio
user@ubuntu:~$ kill -s USR1 70454
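The same signal can be sent from a monitoring script. A minimal Python sketch, assuming a POSIX system and that the PID of the main fio process is already known (the function name is hypothetical):

```python
import os
import signal


def request_fio_status(pid: int) -> None:
    """Equivalent of `kill -s USR1 <pid>`: ask the process to print its status."""
    os.kill(pid, signal.SIGUSR1)


# Usage, with the PID taken from the ps output above:
#   request_fio_status(70454)
```

The status output still appears in the terminal where fio is running, not in the script that sent the signal.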

The terminal session running the fio job will look like the output below. The first output panel is the result of the USR1 signal. The signal was sent 62.5 seconds into the run and at that point steady state had not yet been attained because the bandwidth slope was -14816.8. The final panel is the usual fio output and indicates that steady state was finally attained 64.9 seconds into the job when the bandwidth slope reached -701.7.

user@ubuntu:~/fio-dev/fio-blog/0001-steadystate$ fio bw_slope.fio
bw_slope: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.30
Starting 1 process
Jobs: 1 (f=1): [r(1)][10.3%][r=32.2MiB/s][r=8235 IOPS][eta 08m:58s]
bw_slope: (groupid=0, jobs=1): err= 0: pid=71022: Thu May 12 14:50:10 2022
  read: IOPS=8388, BW=32.8MiB/s (34.4MB/s)(2047MiB/62475msec)
    clat (usec): min=45, max=3120, avg=118.36, stdev=31.54
     lat (usec): min=45, max=3120, avg=118.44, stdev=31.54
    clat percentiles (usec):
     |  1.00th=[   79],  5.00th=[   80], 10.00th=[   88], 20.00th=[   91],
     | 30.00th=[   97], 40.00th=[  106], 50.00th=[  111], 60.00th=[  128],
     | 70.00th=[  137], 80.00th=[  141], 90.00th=[  151], 95.00th=[  184],
     | 99.00th=[  196], 99.50th=[  198], 99.90th=[  208], 99.95th=[  212],
     | 99.99th=[  297]
   bw (  KiB/s): min=32160, max=36336, per=100.00%, avg=33574.31, stdev=924.79, samples=124
   iops        : min= 8040, max= 9084, avg=8393.57, stdev=231.20, samples=124
  lat (usec)   : 50=0.01%, 100=32.73%, 250=67.25%, 500=0.01%, 750=0.01%
  lat (usec)   : 1000=0.01%
  lat (msec)   : 4=0.01%
  cpu          : usr=1.75%, sys=9.03%, ctx=524337, majf=0, minf=11
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=524047,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1
  steadystate  : attained=no, bw=32.8MiB/s (33.6MB/s), iops=8405, bw slope=-14816.775

Run status group 0 (all jobs):
   READ: bw=32.8MiB/s (34.4MB/s), 32.8MiB/s-32.8MiB/s (34.4MB/s-34.4MB/s), io=2047MiB (2146MB), run=62475-62475msec

Disk stats (read/write):
  sda: ios=523293/68, merge=0/21, ticks=58302/10, in_queue=58312, util=99.89%
Jobs: 1 (f=1): [r(1)][10.8%][r=33.8MiB/s][r=8649 IOPS][eta 08m:55s]
bw_slope: (groupid=0, jobs=1): err= 0: pid=71022: Thu May 12 14:50:12 2022
  read: IOPS=8398, BW=32.8MiB/s (34.4MB/s)(2129MiB/64884msec)
    clat (usec): min=45, max=3120, avg=118.21, stdev=31.47
     lat (usec): min=45, max=3120, avg=118.30, stdev=31.47
    clat percentiles (usec):
     |  1.00th=[   79],  5.00th=[   80], 10.00th=[   88], 20.00th=[   91],
     | 30.00th=[   97], 40.00th=[  106], 50.00th=[  111], 60.00th=[  128],
     | 70.00th=[  137], 80.00th=[  139], 90.00th=[  151], 95.00th=[  184],
     | 99.00th=[  196], 99.50th=[  198], 99.90th=[  208], 99.95th=[  212],
     | 99.99th=[  322]
   bw (  KiB/s): min=32160, max=36336, per=100.00%, avg=33605.13, stdev=934.32, samples=129
   iops        : min= 8040, max= 9084, avg=8401.28, stdev=233.58, samples=129
  lat (usec)   : 50=0.01%, 100=33.12%, 250=66.86%, 500=0.01%, 750=0.01%
  lat (usec)   : 1000=0.01%
  lat (msec)   : 4=0.01%
  cpu          : usr=1.73%, sys=8.98%, ctx=545223, majf=0, minf=13
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=544925,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1
  steadystate  : attained=yes, bw=32.9MiB/s (33.7MB/s), iops=8421, bw slope=-701.744

Run status group 0 (all jobs):
   READ: bw=32.8MiB/s (34.4MB/s), 32.8MiB/s-32.8MiB/s (34.4MB/s-34.4MB/s), io=2129MiB (2232MB), run=64884-64884msec

Disk stats (read/write):
  sda: ios=544924/79, merge=0/22, ticks=60665/12, in_queue=60677, util=99.90%

Conclusion

Steady state detection was implemented in 2016 and its feature set has remained unchanged. Here are possible improvements that could be made.

Currently there is no way to measure latency for the steady state window. Fio's latency tracking data structures are quite large. So it is undesirable to snapshot the latency data every second. One alternative way to produce steady state latency data would be to first snapshot the data when steady state has been attained. Usually the benchmark run would stop at this point. However, instead of stopping, fio could continue running for the duration of the steady state window. Ideally the steady state criterion will remain satisfied; fio could confirm that this is the case and if so, the difference between the final latency data and the snapshot will represent the latency distribution during steady state. If the steady state criterion is not met in the final window, fio can repeat the process with a new latency snapshot.
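The snapshot-and-subtract idea can be illustrated with a toy histogram in Python. This is purely a sketch of the proposal: it does not reflect how fio stores latency data, and the bucket keys (latencies in nanoseconds) and function names are hypothetical.

```python
from collections import Counter


def window_histogram(snapshot: Counter, final: Counter) -> Counter:
    """Counts accumulated after the snapshot was taken (final minus snapshot)."""
    difference = Counter(final)
    difference.subtract(snapshot)
    # Keep only buckets that actually received completions in the window.
    return Counter({bucket: n for bucket, n in difference.items() if n > 0})


def percentile(histogram: Counter, pct: float) -> int:
    """Smallest bucket at or below which `pct` percent of the samples fall."""
    total = sum(histogram.values())
    if total == 0:
        raise ValueError("empty histogram")
    target = pct / 100 * total
    running = 0
    for bucket in sorted(histogram):
        running += histogram[bucket]
        if running >= target:
            return bucket
    return max(histogram)


snapshot = Counter({100: 50, 200: 10})         # taken when steady state is attained
final = Counter({100: 80, 200: 30, 400: 2})    # taken one window later
steady = window_histogram(snapshot, final)
print(steady, percentile(steady, 50), percentile(steady, 99))
```

Only two copies of the histogram are needed per attempt, which is the point of the proposal: it avoids snapshotting the latency data every second.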

Latency is in fact an additional criterion that fio could use to define steady state. For example, it would be reasonable to define the attainment of steady state as the point where the median latency has stabilized over a given window. This feature could be added by having fio calculate a specific latency percentile each second and then use the maximum mean deviation or slope criteria to assess these values over a steady state window.

There exists a Python script that carries out basic tests for steady state detection. These tests could be augmented to more rigorously check that the feature is operating as expected. By obtaining steady state JSON output before steady state is actually attained a test could confirm that fio is in fact terminating exactly when steady state is attained.

Finally, fio uses separate code paths to measure performance for steady state detection and for logging of bandwidth and IOPS. These could be abstracted out and unified to reduce code duplication.

Notes

  • Luis Chamberlain's fio-tests project (https://github.com/mcgrof/fio-tests) automates the construction of fio job files and uses fio's steady state detection feature.
  • How does group_reporting interact with steady state detection? If group reporting is enabled, steady state options must be identical for all jobs within the group and measurements collected will derive from all jobs within the group.
  • Fio has a separate bandwidth and IOPS logging facility. However, steady state detection can also be used for this purpose. Simply set an impossible steady state attainment criterion, set the steady state duration to the job's duration, and select the json output format. Within the JSON output will be objects containing per-second measurements. In some cases this can be more convenient for processing than the usual logging data. Note that all of the steady state measurements are stored in RAM, so this strategy would be impractical for extremely long windows.
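For the logging use case in the last note, the per-second measurements can be pulled out of the JSON output with a few lines of Python. This sketch assumes the layout shown in Example 2, with the steadystate object nested under each entry of a top-level jobs array that also carries a jobname field; the function name is hypothetical.

```python
import json
from typing import List, Tuple


def steadystate_series(fio_json: dict) -> List[Tuple[str, int, int, int]]:
    """Return (jobname, second, iops, bw) rows for every job with steady state data."""
    rows = []
    for job in fio_json.get("jobs", []):
        data = job.get("steadystate", {}).get("data")
        if not data:
            continue  # steady state detection was not enabled for this job
        for second, (iops, bw) in enumerate(zip(data["iops"], data["bw"])):
            rows.append((job.get("jobname", ""), second, iops, bw))
    return rows


# Usage (the file name is hypothetical):
#   with open("result.json") as handle:
#       for row in steadystate_series(json.load(handle)):
#           print(*row, sep=",")
```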