Dictionary Totals - ligos/readablepassphrasegenerator GitHub Wiki

Why Two Different Dictionary Totals?

Word Count

Surely there's only one way to count the number of words in the dictionary!?! Right?

Well, as with all things statistical, it's not that simple.

Earlier versions of the Readable Passphrase Generator reported the total number of words in the dictionary or, more correctly, the total number of root words. That's the first number. It is what you'd get if you counted the entries in a real dictionary.

From version 0.12, the generator also counts the number of unique forms of words. That's the second number. It is what you'd get if you counted the words in a word list (like the ones many password crackers use).

What's the difference? The former counts run, running and will run as one word, because they all share the same root: run. The latter counts three, because each form is different. There are many more unique forms than roots.

Of course, some forms of run are the same: the singular and plural future tense forms are the same (will run), while the singular and plural past tense forms are different (was running vs were running). Two forms only count as one if they are identical down to the letter.

And sometimes a word will appear as a verb, as an adjective and perhaps even as a noun too. The root word total counts one for each part of speech, while the unique form total counts one in total, because the spellings are all identical.
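
To make the two totals concrete, here is a minimal Python sketch of both counts. The tiny dictionary, its field names (`root`, `pos`, `forms`) and the word forms in it are made up for illustration. They are not the generator's real data format or contents.

```python
# Hypothetical toy dictionary: one entry per root word per part of speech,
# with the forms that entry can take. Illustrative only; this is not the
# generator's real dictionary format.
entries = [
    {"root": "run", "pos": "verb",
     "forms": ["run", "running", "will run", "was running", "were running"]},
    {"root": "run", "pos": "noun", "forms": ["run", "runs"]},
    {"root": "light", "pos": "adjective", "forms": ["light"]},
    {"root": "light", "pos": "noun", "forms": ["light", "lights"]},
]

# First total: count entries, so a root is counted once per part of speech.
root_total = len(entries)

# Second total: count distinct spellings across every form of every entry.
# A set drops duplicates, so "run" (verb) and "run" (noun) count only once.
unique_forms = {form for entry in entries for form in entry["forms"]}
form_total = len(unique_forms)

print("Root words:", root_total)
print("Unique forms:", form_total)
```

In this toy data there are 4 root entries and 8 unique forms. The spellings run and light each appear under two parts of speech, but add only one to the second total.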

Fake Words

From version 1.3.0, a number of fake words are included in the dictionary (taken from ThisWordDoesNotExist.com). You can exclude these fake words when building a passphrase using the configuration screen in the KeePass plugin, or the --noFake option with the console app. Excluding the fake words reduces the size of the dictionary and so the number of possible combinations.
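
As a rough illustration of why a smaller dictionary means fewer combinations, the sketch below multiplies the number of candidate words for each slot in a phrase. The three-slot phrase and all word counts are invented for the example. They are not the generator's real figures, and the real generator's phrase structures may be more involved.

```python
import math

def combinations(slot_sizes):
    """Number of phrases when each slot is filled independently."""
    return math.prod(slot_sizes)

# Hypothetical word counts for a three-slot phrase (made-up numbers).
with_fake = [1000, 2000, 1500]      # fake words included
without_fake = [900, 1800, 1500]    # fake words removed from two slots

c_with = combinations(with_fake)
c_without = combinations(without_fake)

# Difference expressed in bits: log2 of the ratio between the two counts.
bits_lost = math.log2(c_with / c_without)

print(c_with, c_without)
print(round(bits_lost, 2))
```

With these made-up counts, excluding the fake words drops the total from 3,000,000,000 to 2,430,000,000 combinations, which is about 0.3 bits.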