What are Identicons? The History

In general, identicons are unique, recognizable images that represent data. They're nothing new: you've probably seen them used as avatars on WordPress blogs, GitHub and StackOverflow, and they're seeing use in cryptocurrency as well.

The identicon's inspiration has its roots in quilting, in the form of the 9-block pattern. Essentially, this boils down to symmetrical, repeating geometric patterns. Humans have been using similar patterns for thousands of years, from the ancient Greeks to the Aztecs, and appreciating patterns is perhaps innate in human nature. In fact, humans often see patterns where there are none, a phenomenon known as pareidolia. The most common example is a person recognizing human facial features in mundane things such as an electrical outlet or the clouds.

More examples: the Rorschach test is an interesting case where pareidolia is combined with symmetry (a horizontal reflection of the ink blot). Kaleidoscopes and fractals are other examples of repeating patterns.

Exploiting the way humans perceive patterns seems to be key to creating these kinds of identifiable icons. The problem is doing so algorithmically, in a way that allows a large number of unique icons. Typical identicons appear to have terrible collision statistics, and in practice are only useful for telling apart a small number of items at a time. A raw hash digest such as 63a8fb2aec6f5a1 can take far more possible values than the set of images used to represent it. My goal is to reduce this limitation as much as possible without sacrificing visual clarity.
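To put "terrible collision statistics" into numbers, here is a minimal sketch of the standard birthday-problem calculation: the chance that at least two of k users share a pattern when patterns are drawn from N possibilities. It assumes every pattern is equally likely, and the user count of 30 is an arbitrary illustration, not a measurement of any real identicon system.

```python
import math

def collision_probability(num_patterns: int, num_users: int) -> float:
    """Probability that at least two of `num_users` share a pattern,
    assuming each user's pattern is drawn uniformly from `num_patterns`."""
    if num_users > num_patterns:
        return 1.0  # pigeonhole principle: a collision is guaranteed
    # log P(no collision) = sum of log(1 - i/N); log1p keeps tiny terms accurate
    log_no_collision = sum(math.log1p(-i / num_patterns) for i in range(num_users))
    # P(collision) = 1 - exp(x), computed with expm1 to avoid cancellation near 0
    return -math.expm1(log_no_collision)

# 363 base patterns (the hand-drawn example in the Analysis section)
# versus a 15-hex-digit digest like 63a8fb2aec6f5a1 (16**15 = 2**60 values)
for label, space in [("363 patterns", 363), ("60-bit digest", 16**15)]:
    p = collision_probability(space, 30)
    print(f"{label}: {p:.3g} chance of a clash among 30 users")
```

With only a few hundred patterns, a clash among 30 users is more likely than not, while the raw digest makes one vanishingly unlikely.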

The rest of this article will focus on analyzing the structure and mathematical possibilities of the 9-block pattern, including derivative designs.

Analysis

Don Park is largely credited with inventing the identicon in January 2007, when he added icons to user comments on his blog. Building on "phishmarks", an earlier concept of his (2004) for defending against phishing attacks, it displayed a unique icon based on the commenter's IP address next to each comment. In effect, this allowed readers to identify comments by the same user. Don Park was in fact heavily inspired by Nine Block, a 3x3 pattern generator made in 2002 by Jared Tarbell.

With that out of the way, here is a simplification of the classic identicon, designed by hand:

It is composed of two types of triangles and a diamond. A blank square and a filled square could be considered two more possible shapes, making a total of 5. Each pattern uses only 3 distinct shapes: one in the center, one repeated in all four corners and one repeated along all four sides. If you ignore orientation, this means only 125 possible patterns (5³). If you consider all 4 rotations of the two non-symmetric triangles, there are 11 shapes (3 + 2×4), making 1,331 possible patterns (11³). If we limit the center to the 3 symmetrical shapes, it is instead 363 possible patterns (11×11×3). Color adds more possibilities, but the same 363 base patterns would be constantly reused. This, I think, is the main flaw of identicons.
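The counting above fits in two small helpers. This is a sketch that assumes exactly what the paragraph does: the center, corner and side shapes are chosen independently, corners and sides draw from the same pool, and each non-symmetric shape can appear in 4 orientations.

```python
def shape_count(symmetric: int, asymmetric: int, orientations: int = 4) -> int:
    """Distinct tiles: a symmetric shape looks the same in every orientation,
    while each non-symmetric shape contributes one tile per orientation."""
    return symmetric + asymmetric * orientations

def pattern_count(edge_shapes: int, center_shapes: int) -> int:
    """Corner and side are picked independently from the same pool;
    the center is picked from its own (possibly restricted) pool."""
    return edge_shapes * edge_shapes * center_shapes

# Hand-drawn example: blank, filled and diamond (symmetric) + 2 triangles
tiles = shape_count(3, 2)          # 3 + 2*4 = 11
print(pattern_count(5, 5))         # 125, orientation ignored
print(pattern_count(tiles, tiles)) # 1331
print(pattern_count(tiles, 3))     # 363, symmetric-only center
```

The same helpers reproduce the later figures: `shape_count(4, 12)` gives 52 and `pattern_count(52, 4)` gives 10,816 for Nine Block, and `pattern_count(10, 14)` gives 1,400 for Jdenticon.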

Nine Block (2002)

To confirm this, let's look at Nine Block, the original inspiration for identicons:

Jared defined 16 base shapes in total: 4 are symmetrical, and 12 have 4 alternate orientations each. The center can only use the 4 symmetrical shapes. This means there are actually 52 possible shapes (4 + 12×4), allowing for 10,816 unique patterns (52×52×4), a big improvement over my example.
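Whether a shape counts as symmetrical (1 orientation) or has 4 alternate orientations can be checked mechanically instead of by eye. A minimal sketch, assuming each tile is approximated as a small square bitmap; the three tiles are hypothetical stand-ins, not Jared's actual shapes, and a bitmap only approximates diagonal edges. The `bar` tile also shows that a shape with only 180-degree symmetry has just 2 distinct orientations, so 4 is an upper bound.

```python
Tile = tuple[tuple[int, ...], ...]

def rotate(tile: Tile) -> Tile:
    """Rotate a square bitmap 90 degrees clockwise."""
    return tuple(zip(*tile[::-1]))

def orientations(tile: Tile) -> int:
    """Number of visually distinct 90-degree rotations (1, 2 or 4)."""
    seen = set()
    for _ in range(4):
        seen.add(tile)
        tile = rotate(tile)
    return len(seen)

# Hypothetical 4x4 tiles (1 = filled)
square = ((0, 0, 0, 0), (0, 1, 1, 0), (0, 1, 1, 0), (0, 0, 0, 0))
bar = ((0, 0, 0, 0), (1, 1, 1, 1), (1, 1, 1, 1), (0, 0, 0, 0))
corner = ((1, 1, 0, 0), (1, 1, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0))

for name, tile in [("square", square), ("bar", bar), ("corner", corner)]:
    print(name, orientations(tile))  # square 1, bar 2, corner 4
```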

However, Jared's actual implementation (in the screenshot) has only 34 possible shapes (2 + 8×4), so the screenshot example has only 2,312 unique patterns (34×34×2). If you look closely, you'll even see repeated patterns. See the following list of shapes:

The Flash implementation uses only 10 shapes, and one of them doesn't even exist in Jared's original 16.

Jdenticon

Let's now look at jdenticon briefly, a modern variation of the design:

While this one is 4x4 instead of 3x3, the principle is the same: there are only 3 unique shapes on screen. But this time, there are 14 unique center shapes and only 10 corner and side shapes (2 + 2×4). This results in a total of 1,400 unique patterns (10×10×14), again ignoring color.
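To make "4x4 instead of 3x3" concrete, here is a sketch that gives each cell of the grid a role and a number of quarter turns. The split into 4 corner cells, 8 side cells and a 2x2 center, and the quadrant-based rotation rule, are our assumptions about how such a layout can be organized; they are not taken from Jdenticon's source.

```python
from collections import Counter

def cell_role(r: int, c: int, n: int = 4) -> str:
    """Classify a cell of an n x n grid as corner, side or center."""
    on_row_edge = r in (0, n - 1)
    on_col_edge = c in (0, n - 1)
    if on_row_edge and on_col_edge:
        return "corner"
    if on_row_edge or on_col_edge:
        return "side"
    return "center"

def quarter_turns(r: int, c: int, n: int = 4) -> int:
    """Quadrant index, clockwise from top-left (0) to bottom-left (3).
    Used as the number of 90-degree turns, it gives the icon
    4-fold rotational symmetry (n must be even)."""
    top, left = r < n // 2, c < n // 2
    return {(True, True): 0, (True, False): 1,
            (False, False): 2, (False, True): 3}[(top, left)]

n = 4
for r in range(n):
    # e.g. "co0" = corner shape with 0 quarter turns
    print(" ".join(f"{cell_role(r, c, n)[:2]}{quarter_turns(r, c, n)}" for c in range(n)))
print(Counter(cell_role(r, c, n) for r in range(n) for c in range(n)))
```

However the cells are arranged, only three shape choices are made per icon, which is why the count stays at corner × side × center.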

Theoretically, if we combine the 14 center shapes with the 52 shapes of Jared's code, there are potentially 37,856 patterns (52×52×14).

Don Park (2004, 2007)

Time for Don Park's contribution, starting with his 2004 work, for which information is sadly limited:

According to Don Park's technical notes, there are 3 bits for the center piece and 7 bits each for the corner and side shapes. That would allow up to 8 center shapes and 128 shapes for each of the other two, or at most 131,072 unique patterns (128×128×8, which is simply 2¹⁷ for the 3 + 7 + 7 bits). Seems promising, sure. But the lack of documentation prevents me from understanding the implementation.
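The bit budget can be illustrated by slicing a hash into fields. A minimal sketch: the use of SHA-1 over the IP address string, the field order and the function name `split_code` are hypothetical choices for illustration, not Don Park's documented layout.

```python
import hashlib

CENTER_BITS, CORNER_BITS, SIDE_BITS = 3, 7, 7

def split_code(ip_address: str) -> tuple[int, int, int]:
    """Hash an IP address and slice the low 17 bits into center/corner/side fields."""
    digest = hashlib.sha1(ip_address.encode("ascii")).digest()
    code = int.from_bytes(digest[:4], "big")  # first 32 bits of the digest
    center = code & ((1 << CENTER_BITS) - 1)                                # 0..7
    corner = (code >> CENTER_BITS) & ((1 << CORNER_BITS) - 1)               # 0..127
    side = (code >> (CENTER_BITS + CORNER_BITS)) & ((1 << SIDE_BITS) - 1)   # 0..127
    return center, corner, side

# 3 + 7 + 7 = 17 bits, so at most 2**17 distinct (center, corner, side) triples
print(2 ** (CENTER_BITS + CORNER_BITS + SIDE_BITS))  # 131072
print(split_code("192.0.2.1"))  # an address from the documentation range
```

Every field value is reachable, but distinct field values only produce distinct pictures if no two values draw the same shape.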

What I can verify is that all but one of the shapes in Jared's 16-shape set are present in the above example. It also uses a new development: inverted shapes. Essentially, a new shape is created from the negative space of the original shape, and the original shape becomes negative space. This technique may have a caveat that produces duplicate shapes, but more on that later.
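Inversion is easy to express on a bitmap approximation of a tile: every filled cell becomes empty and vice versa. The sketch below uses hypothetical 4x4 tiles (a bitmap only approximates the real vector shapes) and shows the caveat in its simplest form: inverting a half-filled tile produces nothing new, only a rotation of the original.

```python
Tile = tuple[tuple[int, ...], ...]

def invert(tile: Tile) -> Tile:
    """Swap filled (1) and empty (0) cells."""
    return tuple(tuple(1 - cell for cell in row) for row in tile)

def rotate(tile: Tile) -> Tile:
    """Rotate a square bitmap 90 degrees clockwise."""
    return tuple(zip(*tile[::-1]))

def rotations(tile: Tile) -> set[Tile]:
    """All four 90-degree rotations of a tile."""
    result = set()
    for _ in range(4):
        result.add(tile)
        tile = rotate(tile)
    return result

def inversion_is_new(tile: Tile) -> bool:
    """True if the inverted tile is not just a rotation of the original."""
    return invert(tile) not in rotations(tile)

left_half = ((1, 1, 0, 0),) * 4  # left half filled
small_dot = ((0, 0, 0, 0), (0, 1, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0))
print(inversion_is_new(left_half))  # False: inverted = rotated 180 degrees
print(inversion_is_new(small_dot))  # True
```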

I've counted 6 shapes for center pieces, and indeed it takes 3 bits to store 6 values. I've also counted 4 symmetric shapes and 11 non-symmetric shapes, 15 in total. All 11 non-symmetric shapes have inversions, and 2 of the symmetric shapes do, which accounts for the 6 center pieces (4 symmetric shapes + 2 inversions).

So the number of shapes is (1 + 1 + 1×2 + 1×2) + (11×2)×4 = 6 + 88 = 94. It just so happens that 94 also requires 7 bits (2⁶ = 64 < 94 ≤ 2⁷ = 128). Based on this, we can estimate that there are 53,016 possible unique patterns (94×94×6) in Don Park's 2004 implementation.
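The same estimate can be derived from a per-shape description: how many distinct orientations a group of shapes has and whether it has inverted variants. This sketch mirrors the formula term by term; the `ShapeGroup` type and the grouping are ours, for illustration only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ShapeGroup:
    count: int          # how many shapes share these properties
    orientations: int   # distinct rotations (1 = looks the same when rotated)
    invertible: bool    # whether an inverted variant exists

    def tiles(self) -> int:
        return self.count * self.orientations * (2 if self.invertible else 1)

# (1 + 1) + (1x2 + 1x2): single-orientation shapes, also usable in the center
single_orientation = [ShapeGroup(2, 1, False), ShapeGroup(2, 1, True)]
# (11x2)x4: shapes with 4 rotations, each with an inverted variant
rotating = [ShapeGroup(11, 4, True)]

center_tiles = sum(g.tiles() for g in single_orientation)          # 6
edge_tiles = center_tiles + sum(g.tiles() for g in rotating)       # 6 + 88 = 94
print(center_tiles, edge_tiles, edge_tiles.bit_length())           # 6 94 7
print(edge_tiles * edge_tiles * center_tiles)                      # 53016
```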

The 2007 implementation is unchanged for the most part.

But we now know what happened to the missing shape:

Patch 4 is no longer a rectangle but a 'bowtie', and patch 12's starting position has changed. Aside from that, it is identical to Jared's code. The use of inversions is a good idea, but it is poorly executed. For example, patch 2 rotated 180 degrees and inverted is identical to patch 2 rotated 0 degrees. This may not be the only example.

What needs to be done is to give each patch defined properties, instead of letting every patch be inverted, flipped, etc. A master list of shapes would then be generated from those properties (e.g. whether a shape can be inverted, whether it can be rotated). This would eliminate collisions caused by symmetry.
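One way to build such a master list is to generate every rotation and inversion of each patch and keep only the geometrically distinct results. The sketch below uses an exact representation that is our assumption, not anything from Jared's or Don Park's code: each patch sits on an n x n grid, and each cell is split by its diagonals into four triangles (top, right, bottom, left), so half-square triangles are represented without pixel rounding. The hypothetical `half_triangle` behaves like the patch 2 case: rotated 180 degrees and inverted, it comes back unchanged.

```python
from itertools import product

# A patch is a frozenset of (row, col, quarter) on an n x n grid.
# quarter: 0 = top, 1 = right, 2 = bottom, 3 = left triangle of the cell.
Patch = frozenset[tuple[int, int, int]]

def rotate(patch: Patch, n: int) -> Patch:
    """Rotate 90 degrees clockwise: cell (r, c) moves to (c, n-1-r)
    and each quarter turns one step (top -> right -> bottom -> left)."""
    return frozenset((c, n - 1 - r, (q + 1) % 4) for r, c, q in patch)

def invert(patch: Patch, n: int) -> Patch:
    """Swap filled and empty triangles."""
    return frozenset(product(range(n), range(n), range(4))) - patch

def variants(patch: Patch, n: int) -> set[Patch]:
    """All 4 rotations of the patch and of its inverse (at most 8 tiles)."""
    result = set()
    for shape in (patch, invert(patch, n)):
        for _ in range(4):
            result.add(shape)
            shape = rotate(shape, n)
    return result

def master_list(patches: list[Patch], n: int) -> list[Patch]:
    """Distinct tiles across all patches, duplicates removed."""
    seen: set[Patch] = set()
    ordered = []
    for patch in patches:
        for tile in sorted(variants(patch, n), key=sorted):  # deterministic order
            if tile not in seen:
                seen.add(tile)
                ordered.append(tile)
    return ordered

n = 1
half_triangle = frozenset({(0, 0, 2), (0, 0, 3)})  # below the top-left/bottom-right diagonal
full_square = frozenset(product(range(n), range(n), range(4)))
flipped = invert(rotate(rotate(half_triangle, n), n), n)
print(flipped == half_triangle)                           # True
print(len(master_list([half_triangle, full_square], n)))  # 6, not the naive 2 * 8 = 16
```

With a list built this way, the shape count that goes into the pattern formulas is the length of the deduplicated list rather than the naive product of rotations and inversions.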

{INSERT FINAL CALCULATION HERE}