UUID
Universally Unique Identifiers (UUIDs) are 128-bit values designed so that independent systems can generate IDs without central coordination, with a negligible chance of collision.
As of 2024, RFC 9562 defines eight versions of UUIDs.
UUIDv1
UUIDv1 is known as the time-based UUID. Its textual form can be broken down into five hyphen-separated segments: time_low, time_mid, time_hi together with the version, the clock sequence together with the reserved bits, and the node.
The timestamp uses October 15, 1582 (the date the Gregorian calendar was first adopted) as its base. The embedded timestamp counts 100 ns intervals since that date and is split across the time_low, time_mid, and time_hi segments of the UUID. The third segment holds both the version and time_hi; the version occupies the first character of that segment. This placement of the version holds for all UUID versions.
The reserved portion is also known as the variant of the UUID, which determines how the bits within the UUID are interpreted. The last segment is the node, an address unique to the system generating the UUID, typically its MAC address.
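As a rough illustration of the breakdown above, the following sketch uses Python's standard `uuid` module to pull the version, variant, timestamp, and node out of a version 1 UUID. The helper name `describe_v1` is hypothetical and exists only for this example.

```python
import uuid
from datetime import datetime, timedelta

# Base date of the UUID timestamp (start of the Gregorian calendar).
GREGORIAN_EPOCH = datetime(1582, 10, 15)


def describe_v1(u: uuid.UUID) -> dict:
    """Split a version 1 UUID into the parts described above (illustrative helper)."""
    return {
        "segments": str(u).split("-"),   # the five hyphen-separated segments
        "version": u.version,            # first character of the third segment
        "variant": u.variant,            # the "reserved" bits
        "timestamp_100ns": u.time,       # 60-bit count of 100 ns intervals
        # 10 intervals of 100 ns make one microsecond.
        "created": GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10),
        "node": f"{u.node:012x}",        # 48-bit node, often a MAC address
    }


info = describe_v1(uuid.uuid1())
for key, value in info.items():
    print(f"{key}: {value}")
```

The `created` value is naive (no time zone attached); the UUID timestamp itself is defined in UTC.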
UUIDv2
Version 2 replaces time_low with a POSIX user ID. This version is rarely used: time_low is where much of the variability of a UUID resides, so replacing this segment increases the chance of collision.
UUIDv3 and v5
Versions 3 and 5 are similar: both are deterministic, so the same inputs always produce the same UUID. They take a namespace (itself a UUID) and a name, and run these values through a hash function to produce a value that is represented as a 128-bit UUID.
The key difference is that version 3 uses MD5 while version 5 uses SHA-1, whose output is truncated to fit 128 bits.
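The sketch below shows one way the namespace-plus-name hashing can be reproduced by hand and compared with Python's built-in `uuid.uuid3` and `uuid.uuid5`. The function `name_based` is a hypothetical helper written for this example, and `example.com` is just a placeholder name.

```python
import hashlib
import uuid


def name_based(namespace: uuid.UUID, name: str, version: int) -> uuid.UUID:
    """Build a version 3 (MD5) or version 5 (SHA-1) UUID by hand (illustrative)."""
    if version not in (3, 5):
        raise ValueError("only versions 3 and 5 are name-based")
    hasher = hashlib.md5 if version == 3 else hashlib.sha1
    # Hash the namespace's 16 raw bytes followed by the name.
    digest = hasher(namespace.bytes + name.encode("utf-8")).digest()
    # Keep the first 16 bytes; the constructor overwrites the version and variant bits.
    return uuid.UUID(bytes=digest[:16], version=version)


manual_v3 = name_based(uuid.NAMESPACE_DNS, "example.com", 3)
manual_v5 = name_based(uuid.NAMESPACE_DNS, "example.com", 5)

# The same inputs always yield the same UUID, and match the standard library.
print(manual_v3 == uuid.uuid3(uuid.NAMESPACE_DNS, "example.com"))
print(manual_v5 == uuid.uuid5(uuid.NAMESPACE_DNS, "example.com"))
```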
UUIDv4
Version 4 is known as the random version because the value of the UUID is almost entirely random; the only fixed parts are the bits that specify the version (always 4) and the variant.
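To make the fixed positions concrete, this small sketch generates a version 4 UUID with Python's `uuid.uuid4` and reads back the two characters that are not fully random. The helper `fixed_characters` is a hypothetical name used only here.

```python
import uuid


def fixed_characters(u: uuid.UUID) -> tuple:
    """Return the version character and the variant character of a UUID string."""
    text = str(u)
    version_char = text[14]  # first character of the third segment
    variant_char = text[19]  # first character of the fourth segment
    return version_char, variant_char


u = uuid.uuid4()
version_char, variant_char = fixed_characters(u)
print(u)
print("version character:", version_char)
print("variant character:", variant_char)
```

For UUIDs using the standard variant, the variant character is one of `8`, `9`, `a`, or `b`, because only its top two bits are fixed.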
UUIDv6
Version 6 is nearly identical to version 1, but the timestamp bits are reordered so that the most significant portion comes first. The goal is a version that stays compatible with v1 while sorting naturally by creation time.
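The reordering can be sketched as a conversion from a version 1 UUID to a version 6 UUID. This is a minimal illustration rather than a reference implementation; `v1_from_timestamp` and `v1_to_v6` are hypothetical helpers, and the clock sequence and node are set to zero for simplicity.

```python
import uuid


def v1_from_timestamp(ts: int) -> uuid.UUID:
    """Build a version 1 UUID from a 60-bit timestamp (clock sequence and node zeroed)."""
    time_low = ts & 0xFFFFFFFF                       # least significant 32 bits come first
    time_mid = (ts >> 32) & 0xFFFF
    time_hi_version = 0x1000 | ((ts >> 48) & 0x0FFF)
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version, 0x80, 0x00, 0))


def v1_to_v6(u: uuid.UUID) -> uuid.UUID:
    """Reorder the timestamp of a version 1 UUID so the most significant bits lead."""
    ts = u.time                                      # 60-bit timestamp
    time_high = (ts >> 28) & 0xFFFFFFFF              # most significant 32 bits come first
    time_mid = (ts >> 12) & 0xFFFF
    time_low_version = 0x6000 | (ts & 0x0FFF)        # version 6 plus the lowest 12 bits
    return uuid.UUID(fields=(time_high, time_mid, time_low_version,
                             u.clock_seq_hi_variant, u.clock_seq_low, u.node))


earlier = v1_from_timestamp(0xFFFFFFFF)
later = v1_from_timestamp(0x100000000)

# As strings, the v1 values sort in the wrong order; the v6 values do not.
print(str(earlier) < str(later))
print(str(v1_to_v6(earlier)) < str(v1_to_v6(later)))
```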
UUIDv7
Version 7 is also time-based, but it uses the more common Unix epoch timestamp (in milliseconds) instead of the Gregorian-based one. The other key difference is that the node is replaced with random bits, making these UUIDs harder to trace back to their source.
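A version 7 value can be assembled by hand to show where the Unix timestamp and the random bits go. The sketch below assumes the RFC 9562 layout (48-bit millisecond timestamp, version, 12 random bits, variant, 62 random bits); `uuid7_sketch` is a hypothetical helper, not a library function, and it makes no attempt at monotonic ordering within the same millisecond.

```python
import os
import time
import uuid
from typing import Optional


def uuid7_sketch(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Assemble a version 7 UUID from a millisecond timestamp and random bits."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")   # 80 random bits, 74 are used
    rand_a = (rand >> 62) & 0x0FFF                 # 12 random bits after the version
    rand_b = rand & ((1 << 62) - 1)                # 62 random bits after the variant

    value = (unix_ms & ((1 << 48) - 1)) << 80      # timestamp in the top 48 bits
    value |= 0x7 << 76                             # version 7
    value |= rand_a << 64
    value |= 0b10 << 62                            # standard variant
    value |= rand_b
    return uuid.UUID(int=value)


first = uuid7_sketch(1_700_000_000_000)
second = uuid7_sketch(1_700_000_000_001)
print(first)
print(second)
print(first < second)  # a later millisecond always compares greater
```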
UUIDv8
Version 8 is the latest version and permits vendor-specific implementations while still adhering to the RFC. The only requirement is that the version and variant are specified in their usual positions.
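As a hedged sketch of that requirement, the function below takes an arbitrary 128-bit payload and overwrites only the version and variant bits. What goes into the remaining bits is up to the vendor; `uuid8_sketch` and its payload are purely hypothetical.

```python
import uuid


def uuid8_sketch(payload: int) -> uuid.UUID:
    """Stamp version 8 and the standard variant onto a custom 128-bit payload."""
    value = payload & ((1 << 128) - 1)
    value &= ~(0xF << 76)   # clear the 4 version bits
    value |= 0x8 << 76      # set version 8
    value &= ~(0x3 << 62)   # clear the 2 variant bits
    value |= 0x2 << 62      # set the standard variant (binary 10)
    return uuid.UUID(int=value)


custom = uuid8_sketch(0x0123456789ABCDEF0123456789ABCDEF)
print(custom, custom.version)
```

Note that six bits of the payload are overwritten, so a real layout would need to leave those positions unused.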
UUIDs and MySQL
There are several tradeoffs to using a UUID as a primary key instead of an auto-incrementing integer.
Insert Performance
MySQL indexes take the form of a B+ tree, so inserting keys in random order can be very inefficient because of page splitting and rebalancing.
Higher Storage Utilization
An auto-incrementing 32-bit integer key consumes 32 bits of storage per value, while a compact binary UUID uses 128 bits (4x). A human-readable one takes much more: a CHAR(36) consumes 288 bits per UUID (9x).
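The arithmetic can be checked with a few lines of Python. This sketch assumes one byte per character for the textual form, which holds for the ASCII characters a UUID string contains; `storage_bits` is a hypothetical helper.

```python
import uuid

INT_KEY_BITS = 32  # a 32-bit auto-incrementing integer key


def storage_bits(u: uuid.UUID) -> dict:
    """Compare the size of an integer key with binary and textual UUID forms."""
    return {
        "int": INT_KEY_BITS,
        "binary": len(u.bytes) * 8,   # 16 raw bytes
        "char36": len(str(u)) * 8,    # 36 characters, assuming 1 byte each
    }


sizes = storage_bits(uuid.uuid4())
for name, bits in sizes.items():
    print(f"{name}: {bits} bits ({bits // INT_KEY_BITS}x)")
```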
In addition, secondary indexes consume more space too, since they use the primary key as a pointer to the actual row.
Some storage engines also assume that the primary key will increase predictably. When keys arrive in order, InnoDB fills pages to about 94% before creating a new page. When keys are random, page utilization can be as low as 50%.
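The effect of key order on page splits and page fill can be illustrated with a toy simulation. This is a deliberately simplified model and not InnoDB's actual algorithm: pages are plain sorted lists with a hypothetical capacity of 16 keys, a full page splits in half, and an insert past the right edge simply starts a new page. The names `PAGE_CAPACITY` and `simulate` are assumptions made for this example.

```python
import bisect
import random

PAGE_CAPACITY = 16  # hypothetical number of keys per page


def simulate(keys):
    """Insert unique keys into fixed-size sorted pages; return (splits, average fill)."""
    pages = [[]]
    splits = 0
    for key in keys:
        # Locate the page whose key range should contain this key.
        starts = [page[0] for page in pages[1:]]
        idx = bisect.bisect_right(starts, key)
        page = pages[idx]
        if len(page) < PAGE_CAPACITY:
            bisect.insort(page, key)
        elif idx == len(pages) - 1 and key > page[-1]:
            # Insert past the right edge: start a new page, nothing is moved.
            pages.append([key])
        else:
            # Insert into the middle of a full page: split it in half.
            mid = len(page) // 2
            left, right = page[:mid], page[mid:]
            pages[idx:idx + 1] = [left, right]
            bisect.insort(left if key < right[0] else right, key)
            splits += 1
    fill = sum(len(page) for page in pages) / (len(pages) * PAGE_CAPACITY)
    return splits, fill


sequential_keys = list(range(1600))
random_keys = sequential_keys[:]
random.Random(0).shuffle(random_keys)

seq_splits, seq_fill = simulate(sequential_keys)
rnd_splits, rnd_fill = simulate(random_keys)
print(f"sequential: {seq_splits} splits, {seq_fill:.0%} average fill")
print(f"random:     {rnd_splits} splits, {rnd_fill:.0%} average fill")
```

In this model, sequential keys never trigger a split and leave pages full, while shuffled keys split pages repeatedly and leave them partly empty. The exact percentages come from the toy model and should not be read as InnoDB measurements.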