Tag: morpheme - adamb924/mortal-engine GitHub Wiki

The basics

At a minimum, a <morpheme> tag contains one <allomorph> tag, which in turn contains one or more <form> tags. The snippet below is from examples/02-Affixes.xml.

<morpheme label="Plural">
    <allomorph>
        <form lang="wk-AR">لار</form>
        <form lang="wk-LA">lar</form>
    </allomorph>
</morpheme>

This morpheme has the label “Plural” and two forms, one for each of two writing systems. Each <allomorph> tag can have at most one <form> tag per writing system.

During parsing, a morpheme matches if any of its allomorphs matches. Mortal Engine does not stop at the first match: it tries every matching allomorph to see which ones succeed.

As a corollary, the following snippet always behaves the same as the preceding one:

<morpheme label="Plural">
    <allomorph>
        <form lang="wk-AR">لار</form>
    </allomorph>
    <allomorph>
        <form lang="wk-LA">lar</form>
    </allomorph>
</morpheme>

When allomorphy works differently in different writing systems, it's often convenient to have separate <allomorph> tags for different writing systems.
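As a rough sketch of what that can look like, the snippet below gives the Latin-script suffix two allomorphs and the Arabic-script suffix only one. This is not taken from the example files: the “ler” form is borrowed from the allomorphy discussion further down, and the idea that the Arabic-script spelling has a single shape is an assumption made purely for illustration.

```xml
<!-- Hypothetical sketch: one allomorph for wk-AR, two for wk-LA -->
<morpheme label="Plural">
    <!-- Arabic script: assumed here to have a single shape -->
    <allomorph>
        <form lang="wk-AR">لار</form>
    </allomorph>
    <!-- Latin script: two shapes, each in its own allomorph -->
    <allomorph>
        <form lang="wk-LA">lar</form>
    </allomorph>
    <allomorph>
        <form lang="wk-LA">ler</form>
    </allomorph>
</morpheme>
```

Keeping the writing systems in separate <allomorph> tags means the Latin-script variants can be added or changed without touching the Arabic-script form. On its own, this sketch does not restrict where “lar” and “ler” may occur.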

Missing forms

If an <allomorph> does not have a <form> for a particular writing system, then it's as if the allomorph does not exist for that writing system.

In the snippet below (from examples/02a-Missing-Form.xml), if you're parsing an input in the “wk-AR” writing system, it's as if the “Plural” morpheme isn't there at all: it will never match.

<morpheme label="Plural">
    <allomorph>
        <form lang="wk-LA">lar</form>
    </allomorph>
</morpheme>

In fact, if the <morpheme> is required (i.e., it lacks the <optional/> tag), then parsing will never succeed for the “wk-AR” writing system: you're requiring a node to match that can never match. This is shown in examples/02b-Missing-Form-2.xml.
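If the intent is only that “wk-AR” inputs should still parse, one possible way out is to mark the morpheme as optional. The snippet below is a sketch, not one of the example files; it assumes that <optional/> is placed as the first child of <morpheme>.

```xml
<!-- Hypothetical sketch: the morpheme may be skipped, so a missing wk-AR form no longer blocks the parse -->
<morpheme label="Plural">
    <optional/>
    <allomorph>
        <form lang="wk-LA">lar</form>
    </allomorph>
</morpheme>
```

Under that assumption, a “wk-AR” input can be parsed by skipping the node, though it will still never be analyzed as containing “Plural”. For “wk-LA” inputs the suffix becomes optional as well, which may or may not be what the model needs.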

Null (empty) morphemes

Mortal Engine will accept null (or empty) forms. If you want there to be a null morpheme for a particular writing system, you must include the <form> tag for that writing system, and the tag must be empty. In the example below, the “Plural” morpheme is a null morpheme for the “wk-LA” writing system:

<morpheme label="Plural">
    <allomorph>
        <form lang="wk-AR">لار</form>
        <form lang="wk-LA"></form>
    </allomorph>
</morpheme>

Consequently, if we have a noun stem “ata”, then the input “ata” will parse as both [Stem] and [Stem][Plural]. (The morphology has no idea, of course, whether the Plural morpheme is there or not, since it's a null morpheme anyway.)

This is crucially different from leaving out the <form> tag. In the example below, the “Plural” morpheme will never be parsed in the “wk-LA” writing system. It's as if the morpheme doesn't exist for that writing system.

<morpheme label="Plural">
    <allomorph>
        <form lang="wk-AR">لار</form>
    </allomorph>
</morpheme>

Null morphemes in your model come with a performance cost. The cost is not huge, but it is one more reason, on top of the purely linguistic ones, to avoid null morphemes if you can.

Allomorphy & the need for constraints

Suppose we have a plural suffix with forms “lar” and “ler”, which occur respectively in back and front phonological contexts. You might be tempted to try something like this:

<morpheme label="Plural">
    <optional/>
    <allomorph>
        <form lang="wk-LA">ler</form>
    </allomorph>
    <allomorph>
        <form lang="wk-LA">lar</form>
    </allomorph>
</morpheme>

But this will not produce the desired results. The grammar will accept “atalar”, but it will also accept “ataler”. It will accept “gözler”, but it will also accept “gözlar”.

What is needed is a way to specify which allomorphs go where, a way to constrain where each allomorph can occur. This is a big topic, so it gets its own page.