How volume latency metrics are calculated - NetApp/harvest GitHub Wiki
Perf metric values are calculated from two consecutive polls (therefore, no metrics are emitted after the first poll). The calculation algorithm depends on the `property` and `base-counter` attributes of each metric. The following properties are supported:
property | formula | description |
---|---|---|
raw | $x = x_i$ | no post-processing, value $x$ is submitted as is |
delta | $x = x_i - x_{i-1}$ | delta of two poll values, $x_i$ and $x_{i-1}$ |
rate | $x = (x_i - x_{i-1}) / (t_i - t_{i-1})$ | delta divided by the interval between the two polls in seconds |
average | $x = (x_i - x_{i-1}) / (y_i - y_{i-1})$ | delta divided by the delta of the base counter $y$ |
percent | $x = 100 \cdot (x_i - x_{i-1}) / (y_i - y_{i-1})$ | average multiplied by 100 |
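The properties in the table can be read as a small dispatch on the property name. The following is only an illustrative sketch, not Harvest's actual code: `post_process` and its parameter names are hypothetical, with `x` the counter value, `t` the poll timestamp in seconds, and `y` the base counter value.

```python
from typing import Optional


def post_process(prop: str, x_cur: float, x_prev: float,
                 t_cur: Optional[float] = None, t_prev: Optional[float] = None,
                 y_cur: Optional[float] = None, y_prev: Optional[float] = None) -> float:
    """Apply one property from the table to two consecutive polls (hypothetical helper)."""
    if prop == "raw":
        # No post-processing: submit the current value as is.
        return x_cur
    delta = x_cur - x_prev
    if prop == "delta":
        return delta
    if prop == "rate":
        # Delta divided by the poll interval in seconds.
        return delta / (t_cur - t_prev)
    if prop in ("average", "percent"):
        # Delta divided by the delta of the base counter.
        ratio = delta / (y_cur - y_prev)
        return 100 * ratio if prop == "percent" else ratio
    raise ValueError(f"unsupported property: {prop}")
```

For example, a counter that grows from 1000 to 1600 over a 60 second interval has a `rate` of 10 per second.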
`latency_io_reqd` is a special field used only for latency metrics.
parameter | type | description | default |
---|---|---|---|
`latency_io_reqd` | int, optional | threshold of IOPS for calculating latency metrics (latencies based on very few IOPS are unreliable) | 10 |
For latency calculations, the base-counter is mandatory for further processing. Latency falls under the `average` case in the table above.
Step1:
Take the delta of the latency counter between the current poll and the previous poll.
Step2:
Take the delta of the base-counter between the current poll and the previous poll. The division differs slightly from a plain average because a threshold is also involved in the calculation.
Step3:
The optional `latency_io_reqd` field controls the threshold; its default value is 10. A `minimumBase` value is derived from `latency_io_reqd`. The latency counter is processed further only if the delta of the base-counter is greater than `minimumBase`; otherwise it is set to 0 (zero), because latencies based on very few IOPS are unreliable.
Step4:
If the Step3 condition is fulfilled, apply the `average` formula (delta of the latency counter / delta of the base-counter) and export the latency counter.
Step5:
If the delta of the latency counter is less than 0, the delta of the base-counter is less than 0, or in any case other than Step4, the latency counter is not exported.
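Step1 to Step5 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Harvest implementation: the function name `latency_from_polls` is hypothetical, `minimum_base` is passed in directly because how it is derived from `latency_io_reqd` is not spelled out here, a below-threshold sample returns 0 as described in Step3, and `None` stands for "not exported".

```python
from typing import Optional


def latency_from_polls(prev_latency: float, cur_latency: float,
                       prev_base: float, cur_base: float,
                       minimum_base: float = 10) -> Optional[float]:
    """Hypothetical sketch of the thresholded average used for latency counters."""
    # Step1: delta of the latency counter between the two polls.
    delta_latency = cur_latency - prev_latency
    # Step2: delta of the base-counter between the two polls.
    delta_base = cur_base - prev_base
    # Step5: negative deltas are not exported (None is an assumption of this sketch).
    if delta_latency < 0 or delta_base < 0:
        return None
    # Step3: too few IOPS make the latency unreliable, so it is set to 0.
    if delta_base <= minimum_base:
        return 0.0
    # Step4: the average formula, delta of latency / delta of base-counter.
    return delta_latency / delta_base
```

With a latency delta of 4000 and a base-counter delta of 20, the result is 200; with a base-counter delta of only 5, the result is 0.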
2. Applying the threshold when a counter name ends with `latency` and its property is `average` or `percent`
Latency calculation belongs to the `average` case in the table above. Instead of the normal delta division used for `average`, the delta division is done with a threshold, using the `DivideWithThreshold` function. As mentioned in section 1, the `latency_io_reqd` field is used to derive `minimumBase`, which decides whether a latency counter is processed, because latencies based on very few IOPS are unreliable.
This logic is used in all perf collectors.
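The selection rule from the heading of this section can be expressed as a simple predicate. The name `uses_threshold` is hypothetical and only mirrors the rule as described in the text; it is not taken from the Harvest source.

```python
def uses_threshold(counter_name: str, prop: str) -> bool:
    """Return True when a counter should use the thresholded division (sketch)."""
    # The rule: the counter name ends with "latency" and its
    # property is either "average" or "percent".
    return counter_name.endswith("latency") and prop in ("average", "percent")
```

Counters that do not match, for example a `rate` counter or a counter whose name does not end with `latency`, would use the normal formula for their property.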
There are some special cases where the `minimumBase` threshold does not help: objects such as `ontaps3_svm` and a few latency metrics such as `optimal_point_latency` and `scan_latency` have base counters that always report only a few IOPS. For these, `minimumBase` is intentionally set to 0 so that processing continues and the latencies are calculated.
This logic is part of the Harvest code flow.
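The special cases can be pictured as an override of the threshold. The sketch below is an assumption about shape only: the function name `minimum_base_for` and the two sets are hypothetical, and the sets contain just the examples named in the text, which may not be the complete list.

```python
# Examples named in the text; the real list may be longer.
ZERO_THRESHOLD_OBJECTS = {"ontaps3_svm"}
ZERO_THRESHOLD_COUNTERS = {"optimal_point_latency", "scan_latency"}


def minimum_base_for(object_name: str, counter_name: str,
                     default_minimum_base: float) -> float:
    """Pick the threshold for a latency counter (hypothetical helper)."""
    # Base counters for these objects and metrics always see few IOPS,
    # so the threshold is dropped to 0 to let the latency be calculated.
    if object_name in ZERO_THRESHOLD_OBJECTS or counter_name in ZERO_THRESHOLD_COUNTERS:
        return 0
    return default_minimum_base
```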
4. Comparison of Prometheus, Grafana, ONTAP CLI statistics, PAS, and System Manager screenshots for the same time period
These are the latencies of the workload `osc_vol01-wid33577` over the same time interval, compared across Prometheus, Grafana, ONTAP CLI, PAS, and System Manager.
Read latency ranges between 250 and 450µs; write latency is 0µs.
Prometheus:
`qos_read_latency`:
`qos_write_latency`:
`qos_latency`:
Grafana:
This is from the Volume dashboard.
This is from the Workload dashboard, which shows exactly the same values as above.
ONTAP CLI:
qos statistics volume latency show -volume osc_vol01 -iterations 10 -vserver osc -interval 60
NOTES:
- Prometheus is in a different time zone from the cluster, hence the time difference between the images and the CLI output
- the `qos statistics` command won't print anything for workloads without activity
- this command updates every minute
- rows at the top happen before rows at the bottom (time flows down)
- these values match closely with the Perf ZAPI values shown in Prometheus above
Below are the ten iterations of output from the CLI, one per minute. We're interested in the latency, which is the third column. As you can see, the latency of the `osc_vol01` workload stays in the range of a few hundred microseconds (381µs to 565µs in these rows), which is in line with the graph above collected by Harvest.
umeng-aff300-01-02::> qos statistics volume latency show -volume osc_vol01 -iterations 10 -vserver osc -interval 60
Workload ID Latency Network Cluster Data Disk QoS Max QoS Min NVRAM Cloud FlexCache SM Sync VA AVSCAN
--------------- ------ ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
-total- - 562.00us 58.00us 0ms 504.00us 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms
osc_vol01-wid.. 33577 565.00us 58.00us 0ms 507.00us 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms
-total- - 422.00us 60.00us 0ms 362.00us 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms
osc_vol01-wid.. 33577 421.00us 60.00us 0ms 361.00us 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms
-total- - 526.00us 55.00us 1.00us 470.00us 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms
osc_vol01-wid.. 33577 531.00us 55.00us 0ms 476.00us 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms
-total- - 522.00us 57.00us 0ms 465.00us 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms
osc_vol01-wid.. 33577 522.00us 57.00us 0ms 465.00us 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms
-total- - 572.00us 65.00us 0ms 493.00us 0ms 0ms 0ms 0ms 0ms 14.00us 0ms 0ms 0ms
osc_vol01-wid.. 33577 508.00us 57.00us 0ms 451.00us 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms
-total- - 447.00us 71.00us 89.00us 286.00us 0ms 0ms 0ms 0ms 0ms 1.00us 0ms 0ms 0ms
osc_vol01-wid.. 33577 444.00us 59.00us 0ms 385.00us 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms
-total- - 399.00us 71.00us 107.00us 221.00us 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms
osc_vol01-wid.. 33577 381.00us 59.00us 0ms 322.00us 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms
-total- - 2.57ms 54.00us 79.00us 2.19ms 0ms 0ms 0ms 0ms 0ms 244.00us 0ms 0ms 0ms
osc_vol01-wid.. 33577 469.00us 58.00us 0ms 411.00us 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms
-total- - 3.32ms 75.00us 1.00us 2.49ms 0ms 0ms 0ms 0ms 0ms 754.00us 0ms 0ms 0ms
osc_vol01-wid.. 33577 525.00us 54.00us 0ms 471.00us 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms
-total- - 3.36ms 78.00us 0ms 2.34ms 0ms 0ms 0ms 0ms 0ms 942.00us 0ms 0ms 0ms
osc_vol01-wid.. 33577 506.00us 55.00us 0ms 451.00us 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms 0ms
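When comparing the CLI output with values exported by Harvest, it can help to normalize the latency strings to microseconds first. The helper below is a hypothetical sketch, not part of Harvest or ONTAP; it handles only the `us` and `ms` suffixes that appear in the output above, and the sample values are the Latency column of the `osc_vol01` workload rows.

```python
import re

# Multipliers to microseconds for the suffixes seen in the CLI output.
_UNIT_TO_US = {"us": 1.0, "ms": 1000.0}


def to_microseconds(value: str) -> float:
    """Convert a CLI latency string such as '565.00us' or '0ms' to microseconds."""
    match = re.fullmatch(r"([0-9.]+)(us|ms)", value)
    if match is None:
        raise ValueError(f"unrecognized latency value: {value!r}")
    return float(match.group(1)) * _UNIT_TO_US[match.group(2)]


# Latency column of the osc_vol01 workload rows, top to bottom.
samples = ["565.00us", "421.00us", "531.00us", "522.00us", "508.00us",
           "444.00us", "381.00us", "469.00us", "525.00us", "506.00us"]
values = [to_microseconds(s) for s in samples]
lowest, highest = min(values), max(values)
```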
PAS:
`read_latency`:
`write_latency`:
`other_latency`:
System Manager:
