ADR‐001: Change in Truncated Chat Key Display Format - status-im/status-wiki GitHub Wiki
Status
Accepted
Context
The current user ID key format is constructed as follows:
- A fixed prefix `zQ3`
- The first 2 characters and last 3 characters of a base58-encoded 256-bit ECDSA public key (`sh...abc`)
- Combined to give `zQ3sh...abc`.
This results in a truncated representation that exposes 5 base58 characters, which provides `5 × log2(58) ≈ 29.29` bits of entropy, or `58^5 = 656,356,768` possible combinations. Counting the 2 leading and 3 trailing positions separately gives the same total: `58^2 × 58^3 = 3,364 × 195,112 = 656,356,768`. At first glance this seems sufficient.
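The arithmetic above can be reproduced with a short sketch. It assumes every exposed character is uniformly distributed over the 58-symbol alphabet, which is an idealization; the function name `truncated_key_space` is hypothetical and used only for illustration.

```python
import math

ALPHABET_SIZE = 58  # number of symbols in the base58 alphabet


def truncated_key_space(leading: int, trailing: int) -> tuple[int, float]:
    """Return (combinations, entropy_bits) for the exposed characters.

    Assumes each exposed character is uniform over the base58 alphabet.
    """
    exposed = leading + trailing
    combinations = ALPHABET_SIZE ** exposed
    entropy_bits = exposed * math.log2(ALPHABET_SIZE)
    return combinations, entropy_bits


combos, bits = truncated_key_space(leading=2, trailing=3)
print(f"{combos:,} combinations, {bits:.2f} bits")
# prints: 656,356,768 combinations, 29.29 bits
```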
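However, statistical analysis using the Birthday Paradox demonstrates a high likelihood of collision well before the number of generated keys reaches 200,000.
Using the Poisson approximation, where `k` is the number of generated keys and `N` is the number of possible truncated values:
$$ p \approx 1 - e^{-\frac{k^2}{2N}} $$
- At `k = 30,000` and `N = 656,356,768`, the probability of collision is already about 50%.
- At `k = 200,000`, the exponent `k^2 / (2N)` is about 30.5, so the probability of collision is effectively 100%.
- At `k = 300,000`, the exponent rises to about 68.6, and a collision is all but certain.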
This means that within a relatively small number of keys, collisions are almost guaranteed, making the current format unsuitable for large-scale deployment.
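As a rough illustration, the approximation can be evaluated for any key count with a few lines of Python. This is a sketch of the formula only, not code from any Status repository; `collision_probability` and `N_OLD` are hypothetical names, and the key counts in the loop are arbitrary sample points.

```python
import math


def collision_probability(k: int, n: int) -> float:
    """Birthday-problem approximation: p ~= 1 - exp(-k^2 / (2N)).

    expm1 keeps precision when the exponent is close to zero.
    """
    return -math.expm1(-(k * k) / (2 * n))


N_OLD = 58 ** 5  # truncated values in the 2 + 3 character format

for k in (10_000, 30_000, 100_000, 200_000):
    print(f"k={k:>7,}  p={collision_probability(k, N_OLD):.4f}")
```

Swapping `N_OLD` for a different exponent of 58 shows how quickly the curve shifts as more characters are exposed.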
Decision
To mitigate the collision risk, we are increasing the number of exposed characters in the truncated key representation. The new format:
- Retains the prefix `zQ3`
- Expands to include the first 5 characters and last 5 characters of the base58-encoded key (`sh3g5...abc72`)
- Combined to give `zQ3sh3g5...abc72`.
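A minimal sketch of the truncation rule follows. It assumes the input is the full key string beginning with `zQ3` and that the counted characters are those immediately after the prefix and at the very end; the ADR does not spell this out, so treat it as one possible reading. The function name `truncate_chat_key` and the sample key are made up for illustration and do not come from the Status codebase.

```python
PREFIX = "zQ3"


def truncate_chat_key(key: str, head: int = 5, tail: int = 5) -> str:
    """Return PREFIX + first `head` chars + '...' + last `tail` chars.

    Assumption: `key` is the full string including the 'zQ3' prefix.
    """
    if not key.startswith(PREFIX):
        raise ValueError(f"expected key to start with {PREFIX!r}")
    body = key[len(PREFIX):]
    if len(body) <= head + tail:
        return key  # too short to elide anything
    return f"{PREFIX}{body[:head]}...{body[-tail:]}"


# Placeholder string, not a real public key.
sample = "zQ3sh3g5" + "x" * 30 + "abc72"
print(truncate_chat_key(sample))                  # zQ3sh3g5...abc72
print(truncate_chat_key(sample, head=2, tail=3))  # zQ3sh...c72
```

Keeping `head` and `tail` as parameters makes it easy to render the old and new formats side by side during a migration.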
This raises the number of possible combinations to `58^5 × 58^5 = 58^10 ≈ 4.3080421 × 10^{17}`, or about 58.58 bits of entropy. Applying the same statistical model:
- With 100,000,000 keys, the probability of collision is 1.15%.
- With 1,000,000,000 keys, the probability of collision rises to 68.67%.
This greatly reduces the likelihood of collision under our expected key generation rates.
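The same model can be inverted to ask how many keys it takes to reach a given collision probability, which may be a more convenient way to compare against expected growth. The sketch below solves the approximation for `k`; `keys_for_probability` and `N_NEW` are hypothetical names, and the target probabilities are arbitrary examples.

```python
import math


def keys_for_probability(p: float, n: int) -> float:
    """Invert p ~= 1 - exp(-k^2 / (2N)) to get k = sqrt(-2N * ln(1 - p))."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.sqrt(-2 * n * math.log1p(-p))


N_NEW = 58 ** 10  # truncated values in the 5 + 5 character format

for p in (0.001, 0.01, 0.50):
    print(f"p={p:.3f}  k ~= {keys_for_probability(p, N_NEW):,.0f}")
```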
Consequences
Positive
- Improved Uniqueness: The increased entropy significantly lowers the probability of public key truncation collisions.
- Scalability: Ensures that the format remains viable for large-scale deployment.
Negative
- Longer Identifiers: The new format increases user ID key length, which may have UI/UX implications.
Alternatives Considered
- Maintaining the Existing Format
  - Rejected due to the high probability of collisions at scale.
- Using a Completely Different Encoding Scheme
  - Would require extensive migration and could introduce compatibility issues.
- Dynamically Adjusting Key Length
  - Adds complexity without clear long-term benefits.
Next Steps
- Implement the new format in all relevant systems.
- Update UI components to accommodate the longer key representation.
- Monitor system performance and user feedback post-implementation.
TL;DR
The user ID key format is changing from:
- Old: `zQ3sh...abc`
- New: `zQ3sh3g5...abc72`
This improves uniqueness, reduces the probability of collisions, and ensures scalability for larger user bases.