How volume latency metrics are calculated - NetApp/harvest GitHub Wiki

1. How the latency counter is cooked

Perf metric values are calculated from two consecutive polls, so no metrics are emitted after the first poll. The calculation depends on the `property` and `base-counter` attributes of each metric. The following properties are supported:

| property | formula | description |
|----------|---------|-------------|
| raw | `x = x_i` | no post-processing, value x is submitted as it is |
| delta | `x = x_i - x_{i-1}` | delta of two poll values, `x_i` and `x_{i-1}` |
| rate | `x = (x_i - x_{i-1}) / (t_i - t_{i-1})` | delta divided by the interval of the two polls in seconds |
| average | `x = (x_i - x_{i-1}) / (y_i - y_{i-1})` | delta divided by the delta of the base counter y |
| percent | `x = 100 * (x_i - x_{i-1}) / (y_i - y_{i-1})` | average multiplied by 100 |
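The table can be read as a small dispatch on the property name. The following is a minimal sketch of that reading, not Harvest's implementation: the function name `cook` and its parameter names are hypothetical, and zero denominators are deliberately left unhandled.

```python
from typing import Optional


def cook(prop: str,
         x_curr: float, x_prev: Optional[float] = None,
         t_curr: Optional[float] = None, t_prev: Optional[float] = None,
         y_curr: Optional[float] = None, y_prev: Optional[float] = None) -> float:
    """Apply the formula from the table for one metric (hypothetical helper).

    x_* are the counter values, t_* the poll timestamps in seconds,
    y_* the base-counter values. A zero denominator raises ZeroDivisionError.
    """
    if prop == "raw":
        return x_curr                      # submitted as it is
    delta = x_curr - x_prev                # every other property starts from the delta
    if prop == "delta":
        return delta
    if prop == "rate":
        return delta / (t_curr - t_prev)   # per-second rate over the poll interval
    if prop == "average":
        return delta / (y_curr - y_prev)   # delta divided by base-counter delta
    if prop == "percent":
        return 100 * delta / (y_curr - y_prev)
    raise ValueError(f"unsupported property: {prop}")


# Example: 600 operations over a 60 second poll interval.
ops_per_second = cook("rate", 1600, 1000, t_curr=120, t_prev=60)
```

Because every property except `raw` needs the previous poll, nothing can be computed from the first poll alone, which matches the behaviour described above.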

`latency_io_reqd` is a special parameter used only for latency metrics.

| parameter | type | description | default |
|-----------|------|-------------|---------|
| `latency_io_reqd` | int, optional | threshold of IOPs for calculating latency metrics (latencies based on very few IOPs are unreliable) | `10` |

For latency calculations, the base-counter is mandatory. Latency falls under the `average` case in the table above.

Step1:

Take the delta of the latency counter between the current poll and the previous poll.

Step2:

Take the delta of the base-counter between the current poll and the previous poll. This differs slightly from a plain average because a threshold is also involved in the calculation.

Step3:

The optional `latency_io_reqd` field controls the threshold value; its default is 10.

A minimumBase value is derived from `latency_io_reqd`. The latency counter is processed further only if the delta of the base-counter is greater than minimumBase. Otherwise the latency is set to 0 (zero), because latencies based on very few IOPs are unreliable.

Step4:

If the Step3 condition is met, apply the average formula (delta of latency counter / delta of base-counter) and export the latency counter.

Step5:

If the delta of the latency counter is less than 0, or the delta of the base-counter is less than 0, or in any other case not covered by Step4, the latency counter is not exported.
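The five steps can be sketched as a single function. This is an illustration under assumptions, not the Harvest code: the name `cook_latency` is hypothetical, `minimum_base` is passed in directly because the text does not say how it is derived from `latency_io_reqd`, the strict "greater than" comparison follows the wording of Step3, and a below-threshold result is returned as 0 while negative deltas return `None` to mean "not exported".

```python
from typing import Optional


def cook_latency(lat_curr: float, lat_prev: float,
                 base_curr: float, base_prev: float,
                 minimum_base: float = 10) -> Optional[float]:
    """Return the cooked latency, 0.0 if too few IOPs, or None if not exported."""
    lat_delta = lat_curr - lat_prev        # Step1: delta of the latency counter
    base_delta = base_curr - base_prev     # Step2: delta of the base-counter

    # Step5: negative deltas are not exported at all.
    if lat_delta < 0 or base_delta < 0:
        return None

    # Step3: only process further when enough IOPs were observed.
    if base_delta > minimum_base:
        # Step4: the average formula.
        return lat_delta / base_delta

    # Too few IOPs: the latency would be unreliable, so it is set to zero.
    return 0.0


busy = cook_latency(9000, 1000, 120, 100)      # 20 IOPs, above the threshold
idle = cook_latency(9000, 1000, 105, 100)      # 5 IOPs, below the threshold
```

With the strict comparison, a base-counter delta of 0 never reaches the division, so the sketch cannot divide by zero.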

2. Applying the threshold to any counter that ends with `latency` and has a property of `average` or `percent`

Latency calculation falls under the `average` case in the table above. Instead of the plain delta division used for `average`, it applies delta division with a threshold through the DivideWithThreshold function. As described in section 1, `latency_io_reqd` is used to derive minimumBase, which determines whether a latency counter is processed, because latencies based on very few IOPs are unreliable.

The links below show where this logic is used in all perf collectors.

ZapiPerf, RestPerf, KeyPerf
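The selection rule in this section's title can be written as a small predicate. This is only a sketch of the rule as stated here; the function name `needs_threshold` is hypothetical and the check is assumed to be a plain suffix match on the counter name.

```python
def needs_threshold(counter_name: str, prop: str) -> bool:
    """True when a counter should use threshold division instead of plain division."""
    return counter_name.endswith("latency") and prop in ("average", "percent")


# Hypothetical (counter name, property) pairs used only for illustration.
counters = [("read_latency", "average"), ("read_ops", "rate"), ("avg_latency", "percent")]
thresholded = [name for name, prop in counters if needs_threshold(name, prop)]
```

Counters that fail the predicate would go through the ordinary formulas from the property table.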

3. Special case: applying the minimum number of IOPs as the threshold

There are special cases where the usual minimumBase value does not help. Some objects, such as ontaps3_svm, and a few latency metrics, such as optimal_point_latency and scan_latency, have base counters that always report only a few IOPs. For these, minimumBase is intentionally set to 0 so that the calculation proceeds and the latencies are computed.

This is the link to where this logic is used in the Harvest code flow.
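One way to picture the special case is a lookup that returns 0 for the exceptions and the normal threshold otherwise. The sketch below is an assumption about the shape of that logic, not a copy of it: the names `ZERO_THRESHOLD_OBJECTS`, `ZERO_THRESHOLD_COUNTERS` and `minimum_base_for` are hypothetical, matching is assumed to be by exact name, and the sets contain only the examples named in the text.

```python
# Examples taken from the text; the real lists may be longer.
ZERO_THRESHOLD_OBJECTS = {"ontaps3_svm"}
ZERO_THRESHOLD_COUNTERS = {"optimal_point_latency", "scan_latency"}


def minimum_base_for(object_name: str, counter_name: str,
                     default_minimum_base: float = 10) -> float:
    """Pick the threshold for one latency counter (hypothetical helper)."""
    if object_name in ZERO_THRESHOLD_OBJECTS or counter_name in ZERO_THRESHOLD_COUNTERS:
        # Base counters here always show few IOPs, so do not filter on them.
        return 0
    return default_minimum_base


s3_threshold = minimum_base_for("ontaps3_svm", "get_object_latency")
volume_threshold = minimum_base_for("volume", "read_latency")
```

In the usage lines, `get_object_latency` is an illustrative counter name, not one taken from the text.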

4. Comparison of Prometheus, Grafana, ONTAP CLI statistics, PAS, and System Manager screenshots for the same time period

These are the latencies of the workload osc_vol01-wid33577 over the same time interval, compared across Prometheus, Grafana, ONTAP CLI, PAS, and System Manager. Read latency ranges between 250-450µs; write latency is 0µs.

Prometheus:

qos_read_latency:

qos_write_latency:

qos_latency:

Grafana:

This is from the Volume dashboard

This is from the Workload dashboard, which is exactly the same as above

ONTAP CLI:

qos statistics volume latency show -volume osc_vol01 -iterations 10 -vserver osc -interval 60

NOTES:

Prometheus is in a different timezone from the cluster, hence the time difference between the image and the CLI
The qos statistics command won't print anything for workloads without activity
This command updates every minute; rows at the top happen before rows at the bottom (time flows down)
These values match closely with the Perf ZAPI values shown in Prometheus above

Below are the first three minutes of output from the CLI. We're interested in the latency, which is the third column. As you can see, the latency ranges between 250-450µs (microseconds) and corresponds to the graph above collected by Harvest.

umeng-aff300-01-02::> qos statistics volume latency show -volume osc_vol01 -iterations 10 -vserver osc -interval 60
Workload            ID    Latency    Network    Cluster       Data       Disk    QoS Max    QoS Min      NVRAM      Cloud  FlexCache    SM Sync         VA     AVSCAN 
--------------- ------ ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- 
-total-              -   562.00us    58.00us        0ms   504.00us        0ms        0ms        0ms        0ms        0ms        0ms        0ms        0ms        0ms 
osc_vol01-wid..  33577   565.00us    58.00us        0ms   507.00us        0ms        0ms        0ms        0ms        0ms        0ms        0ms        0ms        0ms 
-total-              -   422.00us    60.00us        0ms   362.00us        0ms        0ms        0ms        0ms        0ms        0ms        0ms        0ms        0ms 
osc_vol01-wid..  33577   421.00us    60.00us        0ms   361.00us        0ms        0ms        0ms        0ms        0ms        0ms        0ms        0ms        0ms 
-total-              -   526.00us    55.00us     1.00us   470.00us        0ms        0ms        0ms        0ms        0ms        0ms        0ms        0ms        0ms 
osc_vol01-wid..  33577   531.00us    55.00us        0ms   476.00us        0ms        0ms        0ms        0ms        0ms        0ms        0ms        0ms        0ms 
-total-              -   522.00us    57.00us        0ms   465.00us        0ms        0ms        0ms        0ms        0ms        0ms        0ms        0ms        0ms 
osc_vol01-wid..  33577   522.00us    57.00us        0ms   465.00us        0ms        0ms        0ms        0ms        0ms        0ms        0ms        0ms        0ms 
-total-              -   572.00us    65.00us        0ms   493.00us        0ms        0ms        0ms        0ms        0ms    14.00us        0ms        0ms        0ms 
osc_vol01-wid..  33577   508.00us    57.00us        0ms   451.00us        0ms        0ms        0ms        0ms        0ms        0ms        0ms        0ms        0ms 
-total-              -   447.00us    71.00us    89.00us   286.00us        0ms        0ms        0ms        0ms        0ms     1.00us        0ms        0ms        0ms 
osc_vol01-wid..  33577   444.00us    59.00us        0ms   385.00us        0ms        0ms        0ms        0ms        0ms        0ms        0ms        0ms        0ms 
-total-              -   399.00us    71.00us   107.00us   221.00us        0ms        0ms        0ms        0ms        0ms        0ms        0ms        0ms        0ms 
osc_vol01-wid..  33577   381.00us    59.00us        0ms   322.00us        0ms        0ms        0ms        0ms        0ms        0ms        0ms        0ms        0ms 
-total-              -     2.57ms    54.00us    79.00us     2.19ms        0ms        0ms        0ms        0ms        0ms   244.00us        0ms        0ms        0ms 
osc_vol01-wid..  33577   469.00us    58.00us        0ms   411.00us        0ms        0ms        0ms        0ms        0ms        0ms        0ms        0ms        0ms 
-total-              -     3.32ms    75.00us     1.00us     2.49ms        0ms        0ms        0ms        0ms        0ms   754.00us        0ms        0ms        0ms 
osc_vol01-wid..  33577   525.00us    54.00us        0ms   471.00us        0ms        0ms        0ms        0ms        0ms        0ms        0ms        0ms        0ms 
-total-              -     3.36ms    78.00us        0ms     2.34ms        0ms        0ms        0ms        0ms        0ms   942.00us        0ms        0ms        0ms 
osc_vol01-wid..  33577   506.00us    55.00us        0ms   451.00us        0ms        0ms        0ms        0ms        0ms        0ms        0ms        0ms        0ms
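To compare the CLI output with values from other tools, it can help to pull the third column into a common unit. The sketch below is a hypothetical helper, not part of Harvest or ONTAP: it assumes whitespace-separated columns and only the `us` and `ms` suffixes that appear in the output above.

```python
from typing import List


def to_microseconds(text: str) -> float:
    """Convert a CLI value such as '562.00us' or '2.57ms' to microseconds."""
    text = text.strip()
    if text.endswith("us"):
        return float(text[:-2])
    if text.endswith("ms"):
        return float(text[:-2]) * 1000.0
    raise ValueError(f"unrecognised unit in {text!r}")


def workload_latencies_us(lines: List[str], workload_prefix: str) -> List[float]:
    """Collect the Latency column (third column) for rows of one workload."""
    values = []
    for line in lines:
        cols = line.split()
        # Header, separator and -total- rows do not start with the prefix.
        if len(cols) >= 3 and cols[0].startswith(workload_prefix):
            values.append(to_microseconds(cols[2]))
    return values


sample = [
    "-total-              -   562.00us    58.00us        0ms   504.00us",
    "osc_vol01-wid..  33577   565.00us    58.00us        0ms   507.00us",
    "osc_vol01-wid..  33577   421.00us    60.00us        0ms   361.00us",
]
latencies = workload_latencies_us(sample, "osc_vol01")
```

The `sample` rows are the first lines of the output above, truncated to six columns for brevity.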

PAS:

read_latency:

write_latency:

other_latency:

System Manager:
