METRICS from Cumulo Horcrux Dashboard - Cumulo-pro/Horcrux-Architecture GitHub Wiki

Last block height signed

signer_last_precommit_height

Indicates the last block height for which a precommit was signed. A precommit is a signal that a validator agrees with the state of the blockchain at that specific height and is ready to move forward.

Value: block height (integer)

Last precommit round to be signed

signer_last_precommit_round

Represents the round of the last precommit that was signed. Rounds are iterations within the same block height to reach consensus. A value of 0 means the last signed precommit was in round 0, which implies that consensus was reached in the first round for that block height.

Value: round number of the last signed precommit (0 is the first round)

Consecutive Threshold Signature Parts Missed

signer_missed_ephemeral_shares

Number of consecutive times a node has failed to provide its part (share) of the signature in a threshold signature scheme. In these schemes, multiple parties generate and exchange parts of a signature that, when combined, create a valid signature on a message or transaction. Missed ephemeral parts may indicate connectivity problems, node failures or synchronisation problems. An increasing value indicates that the signing leader is not getting a response from the other nodes in the cluster.

Value: number of consecutive times a node has failed to provide its part of the signature

Time Last Threshold-Share Sign

signer_seconds_since_last_local_ephemeral_share_time

Measures the number of seconds elapsed since the last time an ephemeral share was signed locally. A prolonged time without producing a share may indicate underlying problems: a significantly high value can point to connectivity or synchronisation issues, or to difficulties in joining consensus.
It should not exceed the block time of the chain.

Value: time in seconds since the last time an ephemeral share was signed.
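
The block-time rule above can be checked directly against the text that a metrics endpoint returns. The following is a minimal sketch, assuming the gauge is exposed as an unlabelled sample in the Prometheus text format; the sample text, the 6-second block time and the helper names `read_gauge` and `share_is_stale` are illustrative assumptions, not part of Horcrux.

```python
from typing import Optional

METRIC = "signer_seconds_since_last_local_ephemeral_share_time"


def read_gauge(metrics_text: str, name: str) -> Optional[float]:
    """Return the value of an unlabelled `name value` sample, or None if absent."""
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        parts = line.split()
        if len(parts) >= 2 and parts[0] == name:
            return float(parts[1])
    return None


def share_is_stale(metrics_text: str, block_time_seconds: float) -> bool:
    """True when the time since the last local share exceeds the chain's block time."""
    value = read_gauge(metrics_text, METRIC)
    return value is not None and value > block_time_seconds


# Hypothetical scrape output, made up for this example.
SAMPLE = """\
# hypothetical sample, not real Horcrux output
signer_seconds_since_last_local_ephemeral_share_time 14.2
"""

# Assumed block time of 6 seconds; use the real block time of your chain.
print(share_is_stale(SAMPLE, block_time_seconds=6.0))
```

In practice the same comparison would more likely live in an alerting rule; the sketch only makes the threshold explicit.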

Error Total Insufficient Cosigners

signer_error_total_insufficient_cosigners

Counts the total number of times the number of cosigners did not reach the threshold needed to sign. In other words, this metric increases each time a signing operation that requires multiple parties cannot be completed because too few of them provided their signature.

Value: number of signing attempts that failed for lack of cosigners
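
To make the counting rule concrete, here is a toy model of how such a counter grows. It is a sketch under stated assumptions: the function name, the list of responding cosigners per attempt and the 2-of-3 threshold are all hypothetical, and this is not how Horcrux implements the metric internally.

```python
from typing import Iterable


def count_insufficient(responding_per_attempt: Iterable[int], threshold: int) -> int:
    """Count signing attempts in which fewer than `threshold` cosigners responded."""
    errors = 0
    for responding in responding_per_attempt:
        if responding < threshold:
            errors += 1  # this attempt could not be completed
    return errors


# Hypothetical 2-of-3 cluster: five signing attempts, two of them short of the threshold.
attempts = [3, 2, 1, 2, 0]
print(count_insufficient(attempts, threshold=2))  # prints 2
```

Because the metric is a running total, the useful signal is usually its rate of increase rather than its absolute value.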

Error Total Invalid Signatures

signer_error_total_invalid_signatures

Total number of times the combined signature produced by a signing operation is invalid. The ideal value of this metric is 0, indicating that no such errors have been recorded at the time of measurement.

Value: number of times the combined signature was invalid.

Total Nonces Requested When Cache is Drained

signer_total_drained_nonce_cache

Total count of times nonces (one-time numbers) have been requested while the node's nonce cache was exhausted. An exhausted nonce cache may indicate a high volume of transactions or activity on the node, leading to frequent requests for nonces.

Value: total count of times nonces have been requested with an exhausted cache

Total Parts of Ephemeral Signatures Lost

signer_total_missed_ephemeral_shares

Total count of times parts of the ephemeral signature have been lost in a threshold signature scheme. Loss of ephemeral parts of the signature may indicate connectivity problems, node failures or synchronisation problems in the blockchain system.

Value: count of times parts of the ephemeral signature have been lost

Time taken to get all cosigner signatures

signer_sign_block_cosigner_lag_seconds

Time taken to obtain the signatures of all cosigners required for a block. This metric covers the total time to collect every required signature, giving a complete picture of the signing process. The lower these values, the faster the availability threshold is reached. This metric is only available on the leader and will report 'NaN' on the followers.
Value: time in seconds to obtain the signatures of all cosigners for a block.

Time Threshold of cosigners available

signer_sign_block_threshold_lag_seconds

This metric summarises the time, in seconds, it takes to obtain signatures from the required number (threshold) of cosigners to sign a block. 90% of signing requests should complete in 0.x seconds or less. The cluster must reach the availability threshold quickly to keep the validator signing efficiently and securely.
Value: time in seconds needed to reach the required number of cosigners

Time cosigner signature

signer_cosigner_sign_lag_seconds
Time taken for each cosigner to return its signature, broken down by peer ID and quantile. High numbers may indicate a high-latency link or a saturated resource. This metric is only available on the leader and will report 'NaN' on the followers.
Value: time in seconds taken by a cosigner to sign
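
Since this metric is split by peer and quantile and reads 'NaN' on followers, a consumer has to filter both. The sketch below assumes the metric is exposed in the Prometheus text format with `peerid` and `quantile` labels; the sample values, the label values and the helper names `lag_by_peer` and `slowest_peer` are made up for illustration.

```python
import math
import re
from typing import Dict, Optional

METRIC = "signer_cosigner_sign_lag_seconds"
# Matches lines of the form: name{label="value",...} number
SAMPLE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)\{([^}]*)\}\s+(\S+)')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def lag_by_peer(metrics_text: str, quantile: str = "0.9") -> Dict[str, float]:
    """Map peer ID to its sign lag at the given quantile, skipping NaN samples."""
    result: Dict[str, float] = {}
    for line in metrics_text.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if not match or match.group(1) != METRIC:
            continue
        labels = dict(LABEL_RE.findall(match.group(2)))
        if labels.get("quantile") != quantile or "peerid" not in labels:
            continue
        value = float(match.group(3))  # float() accepts the literal "NaN"
        if not math.isnan(value):
            result[labels["peerid"]] = value
    return result


def slowest_peer(metrics_text: str, quantile: str = "0.9") -> Optional[str]:
    """Return the peer ID with the highest lag, or None when no data is available."""
    lags = lag_by_peer(metrics_text, quantile)
    return max(lags, key=lags.get) if lags else None


# Hypothetical scrape output from a leader and from a follower.
LEADER_SAMPLE = """\
signer_cosigner_sign_lag_seconds{peerid="2",quantile="0.9"} 0.031
signer_cosigner_sign_lag_seconds{peerid="3",quantile="0.9"} 0.212
signer_cosigner_sign_lag_seconds{peerid="3",quantile="0.5"} 0.100
"""
FOLLOWER_SAMPLE = 'signer_cosigner_sign_lag_seconds{peerid="2",quantile="0.9"} NaN\n'

print(slowest_peer(LEADER_SAMPLE))    # prints 3
print(slowest_peer(FOLLOWER_SAMPLE))  # prints None
```

Treating 'NaN' as "no data" rather than as zero avoids reporting a follower as a perfectly fast cosigner.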

Total Times Signatory is NOT a Raft Leader

signer_total_raft_not_leader
Total number of times the signer was not the leader in the Raft consensus algorithm and delegated the signing process to the Raft leader. This metric provides information on the signer's participation in the signing process when it is not the Raft leader. A high count may indicate that the signer is rarely elected Raft leader, or that signing is frequently delegated to the Raft leader.

Time Since Last Precommit

signer_seconds_since_last_precommit
Measures in seconds the time elapsed since the last precommit. It is especially useful for nodes acting as cosigners or in single-signer setups.
Value: seconds elapsed since the last precommit.

Time Since Last Prevote

signer_seconds_since_last_prevote
Measures in seconds the time elapsed since the last prevote. It is especially useful for nodes acting as cosigners or in single-signer setups, where it helps to understand the node's activity and behaviour in the prevote stage.
Value: seconds elapsed since the last prevote

Time Since Last Local Finish Sign

signer_seconds_since_last_local_sign_finish_time
Measures in seconds the time elapsed since the last local signature was completed. This value must remain below 2 times the block time.
Value: seconds since the last local signature was completed
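
As a worked example of the "below 2 times the block time" rule, the sketch below flags the samples that break it. The sample series, the 6-second block time and the function name `breaches` are hypothetical values chosen for illustration.

```python
from typing import List, Tuple

Sample = Tuple[int, float]  # (scrape time in seconds, metric value in seconds)


def breaches(samples: List[Sample], block_time_seconds: float) -> List[Sample]:
    """Return the samples whose value is not below twice the block time."""
    limit = 2.0 * block_time_seconds
    return [(t, value) for t, value in samples if value >= limit]


# Hypothetical scrapes of signer_seconds_since_last_local_sign_finish_time.
series = [(0, 1.2), (15, 5.9), (30, 12.0), (45, 27.5), (60, 0.8)]

# With an assumed 6-second block time, the limit is 12 seconds.
print(breaches(series, block_time_seconds=6.0))  # prints [(30, 12.0), (45, 27.5)]
```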

Time Since Last Local Start Sign

signer_seconds_since_last_local_sign_start_time
Measures in seconds the time elapsed since the last local signature was initiated. This value may increase beyond the block time, but that is rarely significant. It is useful for monitoring system performance and behaviour, but it is rarely a critical indicator and does not require immediate attention unless an unusual pattern is observed.
Value: seconds elapsed since the last local signature was initiated.

Total Times Signer is Raft Leader

signer_total_raft_leader
Total number of times the signer has acted as leader in the Raft consensus algorithm. Monitoring this metric can provide information on the stability and health of the Raft cluster, as well as on the node's performance and participation as Raft leader.

Raft Apply

horcrux_raft_apply
This is a counter that records the number of times log entries are successfully applied in the Horcrux cluster through the Raft consensus algorithm.

Total Raft Leader Election Failures

signer_total_raft_leader_election_timeout
Total number of times a Raft leader election has timed out for lack of available peers. If a node attempts to become leader but does not receive responses from enough peers within the election timeout, the election fails. This type of failure can be caused by connectivity problems, lost messages or the unavailability of other nodes in the cluster.

Consecutive Sentry Connect Retries

signer_sentry_connect_tries
Watch 'signer_sentry_connect_tries' for any increase, which indicates retry attempts to reach your sentry. If 'signer_total_sentry_connect_tries' is significant, it can indicate network or server issues.

Consecutive number of TCP connection attempts to sentry nodes

signer_sentry_connect_tries
Counts the consecutive number of times a TCP connection to a sentry has been attempted. A high count may indicate that the validator (or "signer") has been restarted several times, as each restart may result in repeated attempts to connect. This metric is useful for diagnosing connection and node stability problems, especially in environments where connection stability is critical.

Total number of TCP connection attempts to sentry nodes

signer_total_sentry_connect_tries
Counts the total number of TCP connection attempts to sentry nodes. Sentries are nodes that protect validators from direct attacks by acting as a proxy between the validator and the rest of the network. Unlike the signer_sentry_connect_tries metric, which counts consecutive attempts, this metric accumulates all attempts from a given starting point and is not reset by a successful connection. A high count may indicate frequent restarts of a validator or ongoing connection problems.
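
The difference between the consecutive and the total counter can be illustrated with a toy model. This is a sketch of the behaviour described above, not Horcrux's implementation; the class name, its attributes and the attempt sequence are hypothetical.

```python
class SentryConnectCounters:
    """Toy model of a consecutive counter and a running total of connection attempts."""

    def __init__(self) -> None:
        self.consecutive = 0  # models signer_sentry_connect_tries
        self.total = 0        # models signer_total_sentry_connect_tries

    def record_attempt(self, succeeded: bool) -> None:
        # Every attempt is counted in both metrics.
        self.total += 1
        self.consecutive += 1
        if succeeded:
            # A successful connection resets only the consecutive counter.
            self.consecutive = 0


counters = SentryConnectCounters()
for outcome in [False, False, True, False]:  # fail, fail, connect, fail
    counters.record_attempt(outcome)

print(counters.consecutive, counters.total)  # prints 1 4
```

A consecutive value that keeps climbing means the sentry is unreachable right now, while a growing total with a consecutive value of 0 points to past interruptions that have since recovered.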
