HunspellXML Format (Affixes) - TrnsltLife/HunspellXML GitHub Wiki



<affixes>...</affixes>

The <affixes>...</affixes> element contains all of the dictionary's morphological rules, which govern how prefixes and suffixes may attach to different classes of words. These rules come in two types: <prefix> rules and <suffix> rules.

<affixes>
<prefix flag="[flag]" cross="[boolean]">
	<rule add="[text]" remove="[text]" where="[regex]" morph="[morph]" combineFlags="[list of flags]" />
	<rule ... />
	<rule ... />
</prefix>
<prefix flag="[flag]" cross="[boolean]">
	<multiply>
		<group>
			<rule ... />
			<rule ... />
			<rule ... />
		</group>
		<group>
			<rule ... />
			<rule ... />
			<rule ... />
			<rule ... />
		</group>
	</multiply>
</prefix>
<suffix flag="[flag]" cross="[boolean]">
	<rule ... />
	<rule ... />
	<rule ... />
	<rule ... />
</suffix>
</affixes>

<prefix>...</prefix> and <suffix>...</suffix>

Attributes:

  • flag [flag] required - The flag that identifies this set of prefix or suffix rules
  • cross [boolean] optional - Indicates whether this affix can combine with the opposite kind of affix (i.e. prefix with suffix or vice versa)

Each <prefix> or <suffix> element has a required flag attribute that identifies the set of rules inside it. A word in the dictionary can be marked with this flag, which indicates that the word may combine with the prefixes or suffixes defined by the rules inside that <prefix> or <suffix> element.
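A minimal Python sketch of how flags tie dictionary words to affix groups, and of what cross changes. It assumes HunspellXML follows Hunspell's rule that a prefix and a suffix can be combined on one word only when both groups allow cross products. The flags RE and S, the class AffixGroup and the function word_forms are hypothetical, and each rule is reduced to a plain add string with no where or remove.

```python
from dataclasses import dataclass, field


@dataclass
class AffixGroup:
    """Simplified stand-in for one <prefix> or <suffix> element.

    Each rule is reduced to its add text (where=".", nothing removed).
    """
    kind: str                                 # "prefix" or "suffix"
    cross: bool                               # the cross attribute
    adds: list = field(default_factory=list)  # add text of each rule


def word_forms(word, flags, groups):
    """Return every form of `word` allowed by the flags it is marked with."""
    prefixes = [groups[f] for f in flags if groups[f].kind == "prefix"]
    suffixes = [groups[f] for f in flags if groups[f].kind == "suffix"]
    forms = {word}
    for p in prefixes:
        forms.update(a + word for a in p.adds)
    for s in suffixes:
        forms.update(word + a for a in s.adds)
    # Assumption (Hunspell semantics): a prefix and a suffix combine
    # on the same word only when both groups have cross set.
    for p in prefixes:
        for s in suffixes:
            if p.cross and s.cross:
                forms.update(pa + word + sa for pa in p.adds for sa in s.adds)
    return forms


# Hypothetical flags: RE (prefix "re") and S (suffix "s").
groups = {
    "RE": AffixGroup("prefix", True, ["re"]),
    "S": AffixGroup("suffix", True, ["s"]),
}
print(sorted(word_forms("play", ["RE", "S"], groups)))
# ['play', 'plays', 'replay', 'replays']
```

If either group had cross set to false, "replays" would no longer be generated, while "replay" and "plays" still would.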

For example, given the following words in the dictionary:

<dictionaryFile>
	<words flags="NS"><!--these words will combine with the NS suffix rules below-->
	dog
	cat
	ostrich
	fish
	octopus
	fox
	</words>
</dictionaryFile>

and given the following affix rules:

<affixes>
	<suffix flag="NS">
		<rule add="s" where="[^hsx]" /> <!--add 's' to words not ending in 'h', 's', or 'x'-->
		<rule add="es" where="[hsx]" /> <!--add 'es' to words ending in 'h', 's', or 'x'-->
	</suffix>
</affixes>

The set of noun pluralization suffix rules identified by the flag NS allows the dictionary words marked with the NS flag to take the -s and -es suffixes, so that all of the following words would be recognized by Hunspell:

dog	dogs
cat	cats
ostrich	ostriches
fish	fishes
octopus	octopuses
fox	foxes

Note that since the word 'dog' does not end in 'h', 's', or 'x', it will never combine with the second NS rule to produce 'doges'.
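The matching in this example can be reproduced with a short Python sketch. It assumes, as in Hunspell, that a suffix rule's where condition is tested against the end of the word, and it passes the conditions to Python's re module because the simplified forms used here (plain characters, dot and bracket sets) are also valid Python regular expressions. NS_RULES and suffix_forms are illustrative names only.

```python
import re

# The two NS rules from the example: (add, where).
NS_RULES = [("s", "[^hsx]"), ("es", "[hsx]")]


def suffix_forms(word, rules):
    """Apply every suffix rule whose condition matches the end of `word`."""
    forms = []
    for add, where in rules:
        # The condition is anchored at the end of the word.
        if re.search("(?:" + where + ")$", word):
            forms.append(word + add)
    return forms


for w in ["dog", "cat", "ostrich", "fish", "octopus", "fox"]:
    print(w, suffix_forms(w, NS_RULES))
```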

<rule .../>

Attributes:

  • add [text] optional
  • remove [text] optional
  • where [regex] optional
  • combineFlags [list of flags] optional
  • morph [text] optional

  • where: This corresponds to what Hunspell calls the "condition": it decides which words the rule applies to. A prefix rule's condition is tested against the beginning of the word, and a suffix rule's condition against the end. If you omit the attribute or specify where=".", the rule always matches, and the remove and add steps are applied to produce another valid word-form. Otherwise, you can specify a simplified regular expression to decide what matches:

    • dot (.) matches any character. A prefix rule with where=".n" would match any word beginning with one character followed by "n", e.g. words beginning with "an", "en", "in", "on", "sn", "un", etc.
    • a set of characters between [square brackets] matches any single one of those characters. So a prefix rule with where="[iu]n" would match words starting with "in" or "un" but not "on" or anything else.
    • a caret (^) as the first character inside the square brackets negates the set, so it matches any one character except those listed. A suffix rule with where="[^hsx]" would match a word ending in any character other than "h", "s", or "x".
  • remove: If a word matches the where condition, the remove step is applied first, and then the add step. If you omit the attribute or specify remove="" or remove="0", nothing is removed. Otherwise, the characters you specify are removed from the beginning of the word (prefix rules) or from the end of the word (suffix rules). This value is literal text and does not use regular expressions.

  • add: The text to add to the beginning (prefix rules) or to the end (suffix rules) of the word. Specify a single affix to add onto the word-form.

  • combineFlags: If the rule matches, combineFlags lists the flags of the other affix rules that may apply after this one. If you omit this attribute or leave its value blank, no other rules apply, and the word-form produced by this rule is final, with no additional prefixes or suffixes possible.

  • morph: The morphological information to attach to the word-form produced by this rule when the where condition matches (see Morphological Tags below).
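The order of operations described above (test where, then apply remove, then add) can be sketched in Python as follows. This is an illustration, not HunspellXML's implementation: apply_rule is a hypothetical helper, where is passed straight to Python's re module, and a rule whose remove text is not actually present at the edge of the word is assumed not to apply. combineFlags and morph are left out.

```python
import re


def apply_rule(word, kind, add="", remove="", where="."):
    """Return the word-form produced by one rule, or None if it does not apply.

    kind is "prefix" or "suffix"; remove="" and remove="0" both remove nothing.
    """
    if remove == "0":
        remove = ""
    # Prefix conditions are tested at the start of the word, suffix ones at the end.
    if kind == "prefix":
        pattern = "^(?:" + where + ")"
    else:
        pattern = "(?:" + where + ")$"
    if not re.search(pattern, word):
        return None
    # remove is literal text, so plain string comparison is enough.
    if kind == "prefix":
        if not word.startswith(remove):
            return None  # assumption: the rule does not apply if remove text is absent
        return add + word[len(remove):]
    if not word.endswith(remove):
        return None  # assumption: the rule does not apply if remove text is absent
    return word[:len(word) - len(remove)] + add


print(apply_rule("rally", "suffix", add="ied", remove="y", where="[^aeiou]y"))  # rallied
print(apply_rule("play", "suffix", add="ied", remove="y", where="[^aeiou]y"))   # None
print(apply_rule("happy", "prefix", add="un"))                                  # unhappy
```

Slicing with len(word) - len(remove) rather than -len(remove) keeps the empty-remove case correct, since word[:-0] would return an empty string.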

Examples

Here is a set of suffix rules that distinguish between the past tense spellings of regular English verbs:

<suffix flag="ED">
	<rule where="e" add="d" morph="is:past"/>
	<rule where="[^aeiou]y" remove="y" add="ied" morph="is:past"/>
	<rule where="[^ey]" remove="0" add="ed" morph="is:past"/>
	<rule where="[aeiou]y" remove="" add="ed" morph="is:past"/>
</suffix>

<rule where="e" add="d" morph="is:past"/>
In this example, the first rule matches only words that end in "e" (where="e"). Nothing is removed (there is no remove attribute), a "d" is added to the end, and the morphological tag "is:past" is attached. If the original word is "shade", this suffix rule produces "shaded".

<rule where="[^aeiou]y" remove="y" add="ied" morph="is:past"/>
The next rule matches only words that end in "y" preceded by any letter except "a", "e", "i", "o", or "u"; put another way, words that end in a consonant + "y", such as "rally" and "bully". Remember, [aeiou] is the regular expression for "any one letter in the list a, e, i, o, u". With the addition of a caret, [^aeiou] means "any one letter except a, e, i, o, u". The removal rule remove="y" means "remove the last character if it is 'y'", which turns "rally" and "bully" into "rall" and "bull". The rule add="ied" then adds "ied" to the end, resulting in "rallied" and "bullied".

<rule where="[^ey]" remove="0" add="ed" morph="is:past"/>
The first two rules handle words ending in "e" and words ending in a consonant + "y", and the last rule handles words ending in a vowel + "y". This rule covers everything else: it matches all words that don't end in "e" or "y", using the condition where="[^ey]". For words that match, nothing is removed (remove="0") and "ed" is added. So a verb like "post" would match and be transformed into "posted".

<rule where="[aeiou]y" remove="" add="ed" morph="is:past"/>
The final rule, in contrast to the second rule, looks for words that end in a vowel (a, e, i, o, u) + "y". Words that match include "play", "buoy", "prey", etc. The value remove="" means nothing is removed from the end of the word. The add rule adds "ed", resulting in words like "played", "buoyed", "preyed", etc.
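Hunspell does not stop at the first matching rule: any rule whose condition matches can produce a form. That is why these four conditions are written so that they never overlap. The Python sketch below applies the ED rules to the verbs used above and reports every form produced, so a word matching more than one rule would show up immediately. As in the other sketches, it assumes suffix conditions are tested against the end of the word; ED_RULES and past_tense_forms are illustrative names.

```python
import re

# The four ED rules from the example: (where, remove, add).
ED_RULES = [
    ("e", "", "d"),
    ("[^aeiou]y", "y", "ied"),
    ("[^ey]", "0", "ed"),
    ("[aeiou]y", "", "ed"),
]


def past_tense_forms(verb):
    """Return the forms produced by every ED rule that matches `verb`."""
    forms = []
    for where, remove, add in ED_RULES:
        if re.search("(?:" + where + ")$", verb):
            strip = "" if remove == "0" else remove  # "0" means remove nothing
            forms.append(verb[:len(verb) - len(strip)] + add)
    return forms


for verb in ["shade", "rally", "bully", "post", "play", "buoy", "prey"]:
    forms = past_tense_forms(verb)
    print(verb, forms, "rules matched:", len(forms))
```

Every verb here matches exactly one rule. Changing the third condition to [^y], for instance, would let "shade" match it as well and produce "shadeed".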

Morphological Tags

Morphological description fields should consist of a two-letter code, followed by a colon (:), followed by a text label, e.g. is:past.

  • Multiple morphological description fields may be used. They are separated from each other by spaces.
  • Morphological information is used for parsing and is not needed for spell checking.
  • The morphological field codes that Hunspell defines are:
    • ph: Phonetic
    • st: Stem
    • al: Allomorph(s)
    • po: Part of speech
    • ds: Derivational suffix(es)
    • is: Inflectional suffix(es)
    • ts: Terminal suffix(es)
    • sp: Surface prefix
    • pa: Parts of the compound words
    • dp: Derivational prefix
    • ip: Inflectional prefix
    • tp: Terminal prefix
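A small Python sketch of splitting a morph value into its fields, following the format described above: space-separated fields, each a two-letter code, a colon and a label. parse_morph is a hypothetical helper; it assumes codes are lowercase and labels contain no spaces, and it does not check codes against the list above.

```python
import re

# One field: a two-letter code, a colon, then a label without spaces.
FIELD = re.compile(r"([a-z]{2}):(\S+)")


def parse_morph(morph):
    """Split a morph attribute value into (code, label) pairs, in order."""
    fields = []
    for token in morph.split():
        m = FIELD.fullmatch(token)
        if m is None:
            raise ValueError("not a morphological field: %r" % token)
        fields.append((m.group(1), m.group(2)))
    return fields


print(parse_morph("st:shade is:past"))  # [('st', 'shade'), ('is', 'past')]
```

It returns a list rather than a dict so that a code appearing more than once in the same description is not silently lost.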

<multiply><group>...</group></multiply>

TODO

For the time being, see the Lingala Verb Example for examples of how to use the <multiply><group>...</group></multiply> elements.
