Phonological condition - adamb924/mortal-engine GitHub Wiki

The <phonological-condition> tag is used for regular expression matching against what precedes or follows the morpheme. “Phonological” may be a slightly misleading term here. (Phonological conditions could be represented with tag match conditions instead.) The kernel is that you're doing regular expression matching on the parsed string.

(At the risk of stating the obvious: if you need to match both what precedes and what follows the morpheme, you can use two <phonological-condition> tags.)
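As a rough sketch of what that might look like, the allomorph below carries one condition of each type. The preceding expression and the form are made up for illustration (they are not taken from the example files), and the sketch assumes that the allomorph is only available when both conditions match:

```xml
<allomorph>
    <!-- hypothetical: the text before the morpheme ends in a nasal -->
    <phonological-condition type="preceding">
        <match-expression lang="wk-LA">[mnŋ]$</match-expression>
    </phonological-condition>
    <!-- the text after the morpheme contains i, e, or o -->
    <phonological-condition type="following">
        <match-expression lang="wk-LA">.*[ieo]</match-expression>
    </phonological-condition>
    <!-- placeholder form, for illustration only -->
    <form lang="wk-LA">e</form>
</allomorph>
```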

We'll consider the example in examples/12-Phonological-Allomorphy.xml. Here we have a simple classifier prefix that is [e] in ATR words, and [ɛ] in non-ATR words. As we expect, there is a “Classifier” morpheme with those two allomorphs. And since this is a page about the <phonological-condition> tag, we're not surprised to see those tags at work.

<?xml version="1.0" encoding="UTF-8"?>
<morphology
    xmlns="https://www.adambaker.org/mortal-engine"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="https://www.adambaker.org/mortal-engine morphology.xsd">
    <writing-systems src="writing-systems.xml"/>
    <model label="Nouns">
        <morpheme label="Classifier">
            <allomorph>
                <phonological-condition type="following">
                     <match-expression lang="wk-LA">.*[ieo]</match-expression>
                </phonological-condition>
                <form lang="wk-LA">e</form>
            </allomorph>
            <allomorph>
                <phonological-condition type="following">
                     <match-expression lang="wk-LA">.*[ɪɛɔ]</match-expression>
                </phonological-condition>
                <form lang="wk-LA">ɛ</form>
            </allomorph>
        </morpheme>
        <stem-list label="Stem">
            <filename>12-stems.xml</filename>
            <matching-tag>noun</matching-tag>
        </stem-list>
    </model>
</morphology>

To pull out a snippet:

<allomorph>
    <phonological-condition type="following">
         <match-expression lang="wk-LA">.*[ieo]</match-expression>
    </phonological-condition>
    <form lang="wk-LA">e</form>
</allomorph>

The <phonological-condition> tag has the attribute type="following". That means that the match expression is going to match what comes after the morpheme in the parsing. (The alternative is type="preceding".)

This is perhaps not the best table, but for these inputs, these are the strings that the <phonological-condition> tag is going to match against:

Input	Morpheme	What it matches with type="following"
esit	e	sit
ɛben	ɛ	ben
ekɛm	e	kɛm

Two things about <match-expression>:

It is specific to a writing system. Of course it has to be: different writing systems can use different letters.
It is a Perl-compatible regular expression. (For which see the Qt documentation or the Perl documentation.)

So the regular expression above, .*[ieo], means “any number of characters, followed by one of i, e, or o”.

Suppose you wanted the very next letter to be a vowel. The regular expression for that would be: ^[aeiou] (or whatever counts as a vowel).
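Dropped into a condition, that expression might look like the sketch below. The form and the vowel set are placeholders rather than part of the example file:

```xml
<allomorph>
    <phonological-condition type="following">
        <!-- ^ anchors at the start of what follows the morpheme,
             so only the very next letter is tested -->
        <match-expression lang="wk-LA">^[aeiou]</match-expression>
    </phonological-condition>
    <!-- placeholder form, for illustration only -->
    <form lang="wk-LA">e</form>
</allomorph>
```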

I acknowledge that this is not much of a tutorial. There are lots of places you can learn about regular expressions.

To complete the example, we can see from the output below that we've implemented a basic ATR harmony system.

Classifiers are obligatory (i.e., no bare stems)
Success: The input sɪn (wk-LA) was rejected by the model, which is correct. 
Words with ATR vowels get an ATR classifier
Success: The input esit (wk-LA) was accepted by the model, which is correct. [Classifier][Stem]
Success: The input eben (wk-LA) was accepted by the model, which is correct. [Classifier][Stem]
Success: The input emot (wk-LA) was accepted by the model, which is correct. [Classifier][Stem]
Words with ATR vowels can't take a non-ATR classifier
Success: The input ɛsit (wk-LA) was rejected by the model, which is correct. 
Success: The input ɛben (wk-LA) was rejected by the model, which is correct. 
Success: The input ɛmot (wk-LA) was rejected by the model, which is correct. 
Non-ATR words take a non-ATR classifier
Success: The input ɛsɪn (wk-LA) was accepted by the model, which is correct. [Classifier][Stem]
Success: The input ɛkɛm (wk-LA) was accepted by the model, which is correct. [Classifier][Stem]
Success: The input ɛkɔŋ (wk-LA) was accepted by the model, which is correct. [Classifier][Stem]
Non-ATR words cannot take an ATR classifier
Success: The input esɪn (wk-LA) was rejected by the model, which is correct. 
Success: The input ekɛm (wk-LA) was rejected by the model, which is correct. 
Success: The input ekɔŋ (wk-LA) was rejected by the model, which is correct.

Tag conditions vs. phonological conditions

As noted above, you can also use tag match conditions to make phonological distinctions. If you use tags, then you have to add them manually. In general, if you can make a phonological condition work, then that will be easier, because it's automatic. But there may be a bunch of exceptions in the language, or the orthography of a language may not record all of the phonetic information you need to distinguish between words. In that case, it's necessary to use a tag match condition instead of a phonological condition.
