Make Your Own Dictionary - ligos/readablepassphrasegenerator GitHub Wiki

Make Your Own Dictionary

A word of warning

This is very boring, very time-consuming work. It will take you upwards of 40 hours of work to create a dictionary with enough words to be usable (I speak from experience).

You must select every word by hand, enter XML for every form that word has in English (plurals, tenses, etc.), and then correct all the mistakes you made.

You cannot just grab a word list from a password cracker and automatically import it and expect good (or even bad) results.

How to make your own dictionary

(Note: there is now an alternate way to make a custom dictionary using a Pluggable Dictionary Loader)

OK, if I haven't scared you away yet, here's how the dictionary works.

It's a pretty simple XML file that lists words by part of speech (noun, verb, etc.), with each different form a word can take (plural and tense). The file can be plain UTF-8 XML or UTF-8 XML compressed with gzip. By default, the generator will look for a dictionary.xml (or dictionary.gz) in the current working directory or assembly entry point.
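
If you want to ship the compressed form, the gzip step can be done with any standard tool. The following is a minimal Python sketch, not part of the generator itself: the function names are hypothetical, and it assumes only what is stated above, namely that the generator accepts UTF-8 XML compressed with gzip. The second helper reads either form back so you can check the round trip.

```python
import gzip
import shutil
from pathlib import Path


def compress_dictionary(src: Path, dest: Path) -> None:
    """Write a gzip-compressed copy of the XML file at src to dest."""
    with src.open("rb") as fin, gzip.open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)


def read_dictionary_bytes(path: Path) -> bytes:
    """Return the raw XML bytes, decompressing if the file is gzipped.

    Gzip files start with the two magic bytes 0x1f 0x8b, so the file
    extension does not need to be trusted.
    """
    data = path.read_bytes()
    if data[:2] == b"\x1f\x8b":
        return gzip.decompress(data)
    return data
```

For example, `compress_dictionary(Path("dictionary.xml"), Path("dictionary.gz"))` produces the compressed file name mentioned above.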

Each part of speech is a single XML element and is represented as a Word object in code. The main ones you'll be interested in are nouns, adjectives, verbs and adverbs. Each of these is stored in a separate file in the code base and joined back together at compile time (it's easier to append words that way).

The best way to understand the dictionary is to see it.

<dictionary schemaVersion="6" language="aa-bb" name="MyNewDictionary">
  <article definite="the" indefinite="a" indefiniteBeforeVowel="an" />
  <demonstrative singular="this" plural="these" />
  <demonstrative singular="that" plural="those" />
  <personalPronoun singular="my" plural="our" />
  <personalPronoun singular="your" plural="your" />
  <indefinitePronoun singular="one" plural="ones" personal="true"/>
  <preposition value="above"/>
  <preposition value="across from"/>
  <numberRange start="1" end="9"/>
  <conjunction value="and" separates="nounsandphrases"/>
  <conjunction value="or" separates="nounsandphrases"/>
  <noun singular="waterway" plural="waterways"/>
  <noun plural="grog"/>
  <noun plural="pliers"/>
  <adjective value="downsized"/>
  <adverb value="ominously"/>
  <interrogative singular="why does" plural="why do" />
  <interrogative singular="how does" plural="how do" />
  <verb presentSingular="concocts" 
        pastSingular="concocted" 
        pastContinuousSingular="was concocting" 
        futureSingular="will concoct" 
        continuousSingular="is concocting" 
        perfectSingular="has concocted" 
        subjunctiveSingular="might concoct"
        presentPlural="concoct" 
        pastPlural="concocted" 
        pastContinuousPlural="were concocting" 
        futurePlural="will concoct" 
        continuousPlural="are concocting" 
        perfectPlural="have concocted" 
        subjunctivePlural="might concoct"/>
</dictionary>

As you can see, pronouns, demonstratives and nouns come in singular and plural forms. Nouns can have either form or both. Adjectives and adverbs have only a single form. And verbs are just horridly complicated, with up to 7 tense forms, each in singular and plural. Verbs should have at least present, past and future tenses (but the more forms you provide, the more combinations are possible).
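
Most of the 14 verb attributes are mechanical combinations of a few stems, so typing them by hand is where many mistakes creep in. Here is a rough Python sketch that builds a `<verb>` element from the stems you supply. The helper name `verb_element` and its parameters are hypothetical, and the attribute names are copied from the example above. It does no inflection of its own: you still have to provide every stem by hand, and you should still proofread the result.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def verb_element(base: str, third_person: str, past: str,
                 present_participle: str,
                 past_participle: Optional[str] = None) -> ET.Element:
    """Build a <verb> element using the attribute names from the example.

    base:               "concoct"
    third_person:       "concocts"
    past:               "concocted"
    present_participle: "concocting"
    past_participle:    defaults to the past form (true for regular verbs);
                        pass it explicitly for irregular verbs ("eaten").
    """
    if past_participle is None:
        past_participle = past
    attributes = {
        "presentSingular": third_person,
        "pastSingular": past,
        "pastContinuousSingular": "was " + present_participle,
        "futureSingular": "will " + base,
        "continuousSingular": "is " + present_participle,
        "perfectSingular": "has " + past_participle,
        "subjunctiveSingular": "might " + base,
        "presentPlural": base,
        "pastPlural": past,
        "pastContinuousPlural": "were " + present_participle,
        "futurePlural": "will " + base,
        "continuousPlural": "are " + present_participle,
        "perfectPlural": "have " + past_participle,
        "subjunctivePlural": "might " + base,
    }
    return ET.Element("verb", attributes)


if __name__ == "__main__":
    element = verb_element("concoct", "concocts", "concocted", "concocting")
    print(ET.tostring(element, encoding="unicode"))
```

Irregular verbs need the fifth argument, for example `verb_element("eat", "eats", "ate", "eating", "eaten")`.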

That's all there is to it!

Now you just need 5000 of them.

Note: while nouns, verbs, adjectives and adverbs will make up the bulk of any dictionary, you will need to include all the other parts of speech as well. The generator assumes there are a few of each kind, and will fail with strange errors if there are none.
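
Since a missing part of speech only shows up as a strange error later, it may be worth counting the elements before you try the dictionary. The sketch below is a hypothetical pre-flight check in Python, not something the generator provides. The list of element names is an assumption taken from the example above; the generator may require more or fewer kinds than these.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import List

# Element names as they appear in the example dictionary (an assumption:
# the generator's real requirements may differ).
EXPECTED_ELEMENTS = (
    "article", "demonstrative", "personalPronoun", "indefinitePronoun",
    "preposition", "numberRange", "conjunction", "noun", "adjective",
    "adverb", "interrogative", "verb",
)


def count_parts_of_speech(xml_text: str) -> Counter:
    """Count the direct children of the root element by tag name."""
    root = ET.fromstring(xml_text)
    return Counter(child.tag for child in root)


def missing_parts_of_speech(xml_text: str) -> List[str]:
    """Return the expected element names that do not occur at all."""
    counts = count_parts_of_speech(xml_text)
    return [name for name in EXPECTED_ELEMENTS if counts[name] == 0]
```

Running `missing_parts_of_speech` on the text of your dictionary and getting an empty list back means every kind from the example is present at least once. It says nothing about whether "a few" is enough.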