ADR‐001: Change in Truncated Chat Key Display Format

Status

Accepted

Context

The current user ID key format is constructed as follows:

  • A fixed prefix zQ3
  • The first 2 characters and last 3 characters of a base58-encoded 256-bit ECDSA public key (sh...abc)
  • Combined to give zQ3sh...abc.

This truncated representation exposes 5 base58 characters, which provides at most 5 × log2(58) ≈ 29.29 bits of entropy, or 58^5 = 58^2 × 58^3 = 3,364 × 195,112 = 656,356,768 possible combinations, assuming every exposed character is uniformly distributed. This initially seems sufficient.
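The arithmetic above can be checked with a few lines of Python. This is only a sanity check under the assumption of uniformly distributed characters; the names `ALPHABET_SIZE` and `EXPOSED_CHARS` are illustrative and not taken from any Status codebase.

```python
import math

ALPHABET_SIZE = 58      # size of the base58 alphabet
EXPOSED_CHARS = 2 + 3   # first 2 and last 3 characters after the zQ3 prefix

# Number of distinct truncated keys and the equivalent entropy in bits.
combinations = ALPHABET_SIZE ** EXPOSED_CHARS
entropy_bits = EXPOSED_CHARS * math.log2(ALPHABET_SIZE)

print(f"{combinations:,} combinations")
print(f"{entropy_bits:.2f} bits of entropy")
```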

However, a Birthday Paradox analysis shows that a collision becomes more likely than not at roughly 30,000 generated keys, and is practically certain long before the number of keys reaches 200,000.

Using the Poisson approximation, where k is the number of generated keys and N is the number of possible combinations:

$$ p \approx 1 - e^{-\frac{k^2}{2N}} $$

  • At k = 50,000 and N = 656,356,768, we have an 85% probability of collision.
  • At k = 100,000, the probability of collision increases to more than 99.9%.
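The approximation is simple to evaluate numerically. The sketch below assumes uniformly distributed truncated keys; `collision_probability` is a hypothetical helper name, and the sample sizes in the loop are arbitrary.

```python
import math

def collision_probability(k: int, n: int) -> float:
    """Birthday-bound approximation p ≈ 1 - exp(-k^2 / (2N))."""
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-(k * k) / (2 * n))

N_OLD = 58 ** 5  # 5 exposed base58 characters in the current format

for k in (10_000, 50_000, 100_000, 200_000):
    print(f"k = {k:>7,}: p ≈ {collision_probability(k, N_OLD):.4f}")
```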

These figures mean that collisions are almost guaranteed within a relatively small number of keys, making the current format unsuitable for large-scale deployment.

Decision

To mitigate the collision risk, we are increasing the number of exposed characters in the truncated key representation. The new format:

  • Retains the prefix zQ3
  • Expands to include the first 5 characters and last 5 characters of the base58-encoded key (sh3g5...abc72)
  • Combined to give zQ3sh3g5...abc72.
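The truncation itself can be sketched as a small helper. This is a minimal illustration, not the Status implementation: `truncate_chat_key` is a hypothetical name, the sample key is a made-up placeholder rather than a real public key, and the input is assumed to be a base58 string that already starts with `zQ3`.

```python
PREFIX = "zQ3"

def truncate_chat_key(key: str, head: int, tail: int) -> str:
    """Keep the prefix, the first `head` and the last `tail` characters."""
    if not key.startswith(PREFIX):
        raise ValueError("expected a key starting with the zQ3 prefix")
    body = key[len(PREFIX):]
    if len(body) <= head + tail:
        return key  # too short to elide anything
    return f"{PREFIX}{body[:head]}...{body[-tail:]}"

# Made-up placeholder, much shorter than a real encoded key.
sample = "zQ3sh3g5KqVnXy7RmTpWdabc72"

old_style = truncate_chat_key(sample, head=2, tail=3)  # current format
new_style = truncate_chat_key(sample, head=5, tail=5)  # new format
print(old_style)
print(new_style)
```

For this placeholder the two calls yield `zQ3sh...c72` and `zQ3sh3g5...abc72`.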

The new format exposes 10 base58 characters, raising the number of possible combinations to 58^5 × 58^5 = 58^10 ≈ 4.3080421 × 10^17 (about 58.58 bits of entropy). Applying the same statistical model:

  • With 100,000,000 keys, the probability of collision is 1.15%.
  • With 1,000,000,000 keys, the probability of collision rises to 68.67%.
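The same approximation can be inverted to estimate how many keys are needed to reach a given collision probability: solving for k gives k ≈ sqrt(-2N ln(1 - p)). The sketch below applies this to both formats; `keys_for_probability` is a hypothetical helper, and uniform distribution of the exposed characters is assumed.

```python
import math

def keys_for_probability(p: float, n: int) -> float:
    """Approximate number of keys at which the collision probability reaches p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    # Inverse of p = 1 - exp(-k^2 / (2N)); log1p(-p) computes ln(1 - p).
    return math.sqrt(-2 * n * math.log1p(-p))

for label, n in (("old (5 chars)", 58 ** 5), ("new (10 chars)", 58 ** 10)):
    threshold = keys_for_probability(0.5, n)
    print(f"{label}: 50% collision chance near {threshold:,.0f} keys")
```

Under these assumptions the 50% point moves from roughly 30 thousand keys to roughly 770 million.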

The longer format greatly reduces the likelihood of collision under our expected key generation rates.

Consequences

Positive

  • Improved Uniqueness: The increased entropy significantly lowers the probability of public key truncation collisions.
  • Scalability: Ensures that the format remains viable for large-scale deployment.

Negative

  • Longer Identifiers: The new format increases user ID key length, which may have UI/UX implications.

Alternatives Considered

  1. Maintaining the Existing Format
    • Rejected due to the high probability of collisions at scale.
  2. Using a Completely Different Encoding Scheme
    • Rejected because it would require extensive migration and could introduce compatibility issues.
  3. Dynamically Adjusting Key Length
    • Adds complexity without clear long-term benefits.

Next Steps

  • Implement the new format in all relevant systems.
  • Update UI components to accommodate the longer key representation.
  • Monitor system performance and user feedback post-implementation.

TL;DR

The user ID key format is changing from:

  • Old: zQ3sh...abc
  • New: zQ3sh3g5...abc72

This improves uniqueness, reduces the probability of collisions, and ensures scalability for larger user bases.