Tutorial: Game Dialogue Tags


Build a complete visual novel / game dialogue DSL from scratch. By the end you will have color, shake, wait, speed, speaker, and choice tags -- plus a typewriter renderer that consumes the token tree.

The Scenario

You are building a visual novel engine. Text is written by writers and designers, not programmers. They need a small markup language embedded in their script files:

Tag	Form	Purpose
`$$color(red | This text is red)$$`	inline, pipe	Color a span of text
`$$shake(This text shakes)$$`	inline	Apply a shake animation to text
`$$wait(500)$$`	inline	Pause the typewriter for 500 ms (no visible output)
`$$speed(50)$$`	inline	Change typewriter speed to 50 ms/char
`$$speaker(Alice)* ... *end$$`	block	Attribute a block of dialogue to a character

Sample dialogue script

Here is what a writer's script file looks like:

$$speaker(Alice)*
Hello! $$color(blue | Nice to meet you)$$.
$$wait(300)$$
Have you seen the $$shake(strange creature)$$ in the forest?
*end$$

$$speaker(Bob)*
$$speed(30)$$Yes... it was $$color(red | terrifying)$$.
*end$$

Two speakers, nested inline tags inside block tags, a pause between sentences, and a speed change. The rest of this tutorial shows how to parse that script and feed the result into a typewriter renderer.

Step 1: Set Up the Parser

Design decisions first

Before writing code, decide which DSL form each tag should use:

Tag	Form	Why
color	inline with pipe	It wraps a span of text, and it needs an argument (the color name). Pipe separates the argument from the content: `$$color(red | text)$$`.
shake	inline, no pipe	It wraps text, but needs no argument -- the tag name itself is the effect. `$$shake(text)$$`.
wait	inline	It appears in the flow of text (between sentences, inside dialogue), but it produces no visible content. Its `value` will be `""`. The argument (delay in ms) is the entire inline content.
speed	inline	Same pattern as wait: a "command" tag that changes state. No visible output, argument is the content.
speaker	block	It contains multiple lines of dialogue with nested tags. Block form gives writers a natural start/end structure and supports recursive parsing of the body.

The distinction between "content" tags (color, shake) and "command" tags (wait, speed) is important. Content tags wrap children and render them with a visual effect. Command tags produce no visible output -- they inject a side-effect into the rendering pipeline. Both are inline because they appear within the text flow.
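The content/command split can be sketched as a discriminated union. The `SketchToken` type and the `isCommandToken` helper below are hypothetical names made up for illustration; they mirror the shapes described above and are not exported by the library.

```typescript
// Hypothetical local types mirroring the token shapes described above.
type SketchToken =
    | { type: "text"; value: string }
    | { type: "color"; color: string; value: SketchToken[] }
    | { type: "shake"; value: SketchToken[] }
    | { type: "wait"; delay: number; value: "" }
    | { type: "speed"; delay: number; value: "" };

// A content tag carries child tokens; a command tag carries an empty string.
function isCommandToken(token: SketchToken): boolean {
    return token.type !== "text" && token.value === "";
}

const sample: SketchToken[] = [
    { type: "text", value: "Hello" },
    { type: "wait", delay: 300, value: "" },
    { type: "shake", value: [{ type: "text", value: "!" }] },
];

const commandTypes = sample.filter(isCommandToken).map((t) => t.type);
console.log(commandTypes); // the wait token is the only command here
```

A renderer can use a check like this to decide whether to recurse into `value` or to read the metadata fields instead.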

Handler implementations

import {
    createParser,
    createPipeHandlers,
    materializeTextTokens,
    extractText,
    declareMultilineTags,
    type TextToken,
    type DslContext,
    type TagHandler,
    type TokenDraft,
} from "yume-dsl-rich-text";

// ── color ──────────────────────────────────────────────
// Inline with pipe: $$color(red | This text is red)$$
//
// Pipe segment 0 → color name (plain string)
// Pipe segment 1+ → the content to colorize (token tree, may contain nested tags)
//
// We use createPipeHandlers so the pipe splitting is handled for us.
// materializedTailTokens(1) collects everything after the first pipe
// into a single token array, which means the writer can use pipes
// inside the colored text if they escape them.

const pipeTags = createPipeHandlers({
    color: {
        inline: (args, ctx) => ({
            type: "color",
            color: args.text(0),                      // "red", "blue", "#ff0"
            value: args.materializedTailTokens(1),     // the colored content
        }),
    },
});

// ── shake ──────────────────────────────────────────────
// Inline, no pipe: $$shake(This text shakes)$$
//
// The entire inline content is the text to animate.
// materializeTextTokens resolves escape sequences in text leaves
// while preserving nested tag structure (e.g., $$shake($$color(red | wow)$$)$$).

const shakeHandler: Record<string, TagHandler> = {
    shake: {
        inline: (tokens, ctx) => ({
            type: "shake",
            value: materializeTextTokens(tokens, ctx),
        }),
    },
};

// ── wait ───────────────────────────────────────────────
// Inline command: $$wait(500)$$
//
// No visible output — value is "".
// The number inside the parens is the delay in milliseconds.
// extractText pulls the raw text content from the token array,
// then we parse it as an integer.
//
// Why inline and not raw? Because wait appears *inside* dialogue text,
// between sentences. It sits in the same text flow as words and other
// inline tags. Raw form would require its own line pair, which is
// awkward for a tiny command embedded in a paragraph.

const waitHandler: Record<string, TagHandler> = {
    wait: {
        inline: (tokens, ctx) => {
            const ms = parseInt(extractText(tokens), 10) || 0;
            return { type: "wait", delay: ms, value: "" };
        },
    },
};

// ── speed ──────────────────────────────────────────────
// Inline command: $$speed(50)$$
//
// Same pattern as wait: inline command, no visible output.
// The number is the new typewriter delay per character in ms.

const speedHandler: Record<string, TagHandler> = {
    speed: {
        inline: (tokens, ctx) => {
            const ms = parseInt(extractText(tokens), 10) || 0;
            return { type: "speed", delay: ms, value: "" };
        },
    },
};

// ── speaker ────────────────────────────────────────────
// Block form: $$speaker(Alice)*\n...\n*end$$
//
// The arg is the speaker's name.
// The block body is recursively parsed, so the writer can use
// color, shake, wait, speed — any inline tag — inside the dialogue.
//
// Why block and not inline? Because dialogue spans multiple lines.
// The block form gives a clear visual boundary that writers understand:
//   $$speaker(Alice)*
//   ...lines of dialogue...
//   *end$$

const speakerHandler: Record<string, TagHandler> = {
    speaker: {
        block: (arg, content, ctx) => ({
            type: "speaker",
            name: arg ?? "???",    // fallback if writer forgets the name
            value: content,         // recursively parsed dialogue body
        }),
    },
};

// ── Assemble the parser ────────────────────────────────

const dsl = createParser({
    handlers: {
        ...pipeTags,
        ...shakeHandler,
        ...waitHandler,
        ...speedHandler,
        ...speakerHandler,
    },
    blockTags: declareMultilineTags(["speaker"]),
});
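The wait and speed handlers parse their argument with `parseInt(...) || 0`, which accepts input such as `12abc` (parsed as 12) and lets negative numbers through. If stricter validation is wanted, one possible sketch is the helper below. `parseDelay` is a hypothetical name, not part of yume-dsl-rich-text.

```typescript
// Hypothetical helper, not part of yume-dsl-rich-text.
// Parses a millisecond argument such as "500" and returns the fallback
// when the text is not a plain non-negative integer.
function parseDelay(text: string, fallback: number): number {
    const trimmed = text.trim();
    if (!/^\d+$/.test(trimmed)) return fallback;
    return Number.parseInt(trimmed, 10);
}

console.log(parseDelay("300", 0));   // 300
console.log(parseDelay(" 50 ", 0));  // 50
console.log(parseDelay("fast", 50)); // 50 (fallback)
console.log(parseDelay("-20", 0));   // 0 (fallback, negative values rejected)
```

In a handler, the call would take the place of the `parseInt` expression, for example `parseDelay(extractText(tokens), 0)`.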

Why `declareMultilineTags`?

speaker is a block-level container tag. In the DSL, the block form is written as:

$$speaker(Alice)*
Hello!
*end$$

Authors naturally end the opening line with )* and put *end$$ on its own line, but this means the raw content becomes "\nHello!\n", with a boundary line break at each end. Without normalization, the rendered dialogue box would show an extra blank line above and below the content.

declareMultilineTags(["speaker"]) tells the parser to strip exactly one line break at each boundary in block / raw forms, so content starts cleanly at the first actual line and ends cleanly at the last.

In most cases, the parser auto-derives this from handlers that have block / raw methods. Declaring manually makes the intent explicit and covers situations where auto-derivation is insufficient. See Handler Helpers — declareMultilineTags for the full explanation.
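To make the normalization rule concrete, here is a standalone sketch of "strip exactly one line break at each boundary". It only illustrates the behavior described above; it is not the library's implementation, and `stripBoundaryLineBreaks` is a hypothetical name.

```typescript
// Illustration only: this is not the library's implementation.
// Removes at most one leading and one trailing line break.
function stripBoundaryLineBreaks(raw: string): string {
    return raw.replace(/^\r?\n/, "").replace(/\r?\n$/, "");
}

console.log(JSON.stringify(stripBoundaryLineBreaks("\nHello!\n")));     // "Hello!"
console.log(JSON.stringify(stripBoundaryLineBreaks("\n\nHello!\n\n"))); // "\nHello!\n"
```

The second call shows why the rule says "exactly one": blank lines that the writer added on purpose survive.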

Step 2: Parse the Script

Feed the sample dialogue into the parser:

const script = `$$speaker(Alice)*
Hello! $$color(blue | Nice to meet you)$$.
$$wait(300)$$
Have you seen the $$shake(strange creature)$$ in the forest?
*end$$

$$speaker(Bob)*
$$speed(30)$$Yes... it was $$color(red | terrifying)$$.
*end$$`;

const tokens = dsl.parse(script);

Token tree walkthrough

The result is an array of top-level tokens. Let us walk through them:

[
    // ── Token 0: Alice's dialogue block ──
    {
        type: "speaker",
        name: "Alice",
        id: "rt-...",
        value: [
            // Line 1: "Hello! " + color tag + "."
            {type: "text", value: "Hello! ", id: "rt-..."},
            {
                type: "color",
                color: "blue",
                id: "rt-...",
                value: [
                    {type: "text", value: "Nice to meet you", id: "rt-..."}
                ]
            },
            {type: "text", value: ".\n", id: "rt-..."},

            // Line 2: wait command (no visible text)
            {type: "wait", delay: 300, value: "", id: "rt-..."},

            // Line 3: text + shake tag + text
            {type: "text", value: "\nHave you seen the ", id: "rt-..."},
            {
                type: "shake",
                id: "rt-...",
                value: [
                    {type: "text", value: "strange creature", id: "rt-..."}
                ]
            },
            {type: "text", value: " in the forest?\n", id: "rt-..."},
        ]
    },

    // ── Between blocks: whitespace ──
    {type: "text", value: "\n\n", id: "rt-..."},

    // ── Token 1: Bob's dialogue block ──
    {
        type: "speaker",
        name: "Bob",
        id: "rt-...",
        value: [
            // speed command at the start of the line
            {type: "speed", delay: 30, value: "", id: "rt-..."},

            // "Yes... it was " + color tag + "."
            {type: "text", value: "Yes... it was ", id: "rt-..."},
            {
                type: "color",
                color: "red",
                id: "rt-...",
                value: [
                    {type: "text", value: "terrifying", id: "rt-..."}
                ]
            },
            {type: "text", value: ".\n", id: "rt-..."},
        ]
    },
]

Key observations:

speaker tokens have value: TextToken[] -- the recursively parsed dialogue body.
color and shake tokens wrap their children in value: TextToken[].
wait and speed tokens have value: "" -- they are commands, not content.
Newlines between lines appear as part of adjacent text tokens.
The whitespace between the two speaker blocks is a plain text token.
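These observations are enough to write a small tree consumer. The sketch below flattens a token tree into its visible text, relying only on the rule that `value` is either a string or a child array. `TreeToken` and `visibleText` are hypothetical names, and the sample data is typed by hand rather than produced by the parser.

```typescript
// Hypothetical minimal token shape, modeled on the tree printed above.
interface TreeToken {
    type: string;
    value: string | TreeToken[];
}

// Concatenate the visible text of a token tree, depth first.
function visibleText(tokens: TreeToken[]): string {
    let out = "";
    for (const token of tokens) {
        if (typeof token.value === "string") {
            // Text leaves contribute their string; command tokens contribute "".
            out += token.value;
        } else {
            out += visibleText(token.value);
        }
    }
    return out;
}

const bobLine: TreeToken[] = [
    { type: "speed", value: "" },
    { type: "text", value: "Yes... it was " },
    { type: "color", value: [{ type: "text", value: "terrifying" }] },
    { type: "text", value: "." },
];

console.log(visibleText(bobLine)); // Yes... it was terrifying.
```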

Step 3: Build a Typewriter Renderer

Now connect the token tree to a real rendering engine. This typewriter state machine walks the tree and produces a queue of timed rendering instructions.

Rendering instruction types

// A single instruction for the rendering engine
type RenderOp =
    | { kind: "char"; char: string; delay: number; styles: StyleStack }
    | { kind: "pause"; delay: number }
    | { kind: "speaker"; name: string }
    | { kind: "linebreak" };

// Style state: which effects are active at any point
interface StyleState {
    color: string | null;
    shake: boolean;
}

// Styles in effect for one character. This tutorial uses a single flat
// state; a real engine might keep a true stack for nested effects.
type StyleStack = StyleState;

The renderer

import {walkTokens, type TextToken} from "yume-dsl-rich-text";

function buildRenderQueue(tokens: TextToken[]): RenderOp[] {
    const ops: RenderOp[] = [];
    let speed = 50; // default: 50ms per character

    // We need the active styles for each text token. walkTokens visits
    // depth-first pre-order and exposes the parent token as ctx.parent.

    // Helper: compute current styles from the token's immediate parent.
    // This simplified approach does not see grandparents, so text inside
    // color inside shake gets the color only. A real engine would
    // maintain an explicit style stack.
    function getStyles(token: TextToken, parent: TextToken | null): StyleStack {
        const state: StyleState = {color: null, shake: false};

        // Walk up through ancestors by checking token context
        // For this tutorial, we read color/shake from the immediate parent
        if (parent) {
            if (parent.type === "color" && typeof parent.color === "string") {
                state.color = parent.color;
            }
            if (parent.type === "shake") {
                state.shake = true;
            }
        }
        return state;
    }

    walkTokens(tokens, {
        // ── Speaker: emit a speaker header ──
        speaker: (token) => {
            if (typeof token.name === "string") {
                ops.push({kind: "speaker", name: token.name});
            }
            // Children will be visited automatically by walkTokens
        },

        // ── Text: queue each character with the current speed ──
        text: (token, ctx) => {
            if (typeof token.value !== "string") return;
            const styles = getStyles(token, ctx.parent);

            for (const char of token.value) {
                if (char === "\n") {
                    ops.push({kind: "linebreak"});
                } else {
                    ops.push({kind: "char", char, delay: speed, styles});
                }
            }
        },

        // ── Wait: insert a pause ──
        wait: (token) => {
            const delay = typeof token.delay === "number" ? token.delay : 0;
            ops.push({kind: "pause", delay});
        },

        // ── Speed: change the character delay ──
        speed: (token) => {
            const newSpeed = typeof token.delay === "number" ? token.delay : 50;
            speed = newSpeed;
        },

        // color and shake don't need their own visitors —
        // their effect is picked up by the text visitor via getStyles().
    });

    return ops;
}
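Because getStyles reads only the immediate parent, text inside `$$shake($$color(red | wow)$$)$$` would receive the color but not the shake. One way to handle nesting is a plain recursive walk that passes inherited styles down to each subtree. The sketch below uses hypothetical local types (`StyledToken`, `Styles`) and does not use walkTokens, so it makes no claim about that function's API.

```typescript
// Hypothetical local types; the real TextToken comes from the library.
interface StyledToken {
    type: string;
    value: string | StyledToken[];
    color?: string;
}

interface Styles {
    color: string | null;
    shake: boolean;
}

interface StyledChar {
    char: string;
    styles: Styles;
}

// Recursive walk that passes the inherited styles down to every child,
// so text inside color inside shake receives both effects.
function collectStyledChars(
    tokens: StyledToken[],
    inherited: Styles = { color: null, shake: false },
): StyledChar[] {
    const out: StyledChar[] = [];
    for (const token of tokens) {
        if (typeof token.value === "string") {
            // Only text leaves produce characters; command tokens such as
            // wait and speed are skipped in this sketch.
            if (token.type !== "text") continue;
            for (const char of token.value) {
                out.push({ char, styles: inherited });
            }
            continue;
        }
        // Derive the styles for this subtree without mutating the parent's.
        const next: Styles = { ...inherited };
        if (token.type === "color" && typeof token.color === "string") {
            next.color = token.color;
        }
        if (token.type === "shake") {
            next.shake = true;
        }
        out.push(...collectStyledChars(token.value, next));
    }
    return out;
}

const nested: StyledToken[] = [
    {
        type: "shake",
        value: [
            { type: "color", color: "red", value: [{ type: "text", value: "wow" }] },
        ],
    },
];

// The first character carries both color "red" and shake: true.
console.log(collectStyledChars(nested)[0]);
```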

Running the renderer

const queue = buildRenderQueue(tokens);

// Example: play the queue in a browser
async function play(queue: RenderOp[]) {
    for (const op of queue) {
        switch (op.kind) {
            case "speaker":
                // Show speaker name plate
                showSpeakerName(op.name);
                break;

            case "char":
                // Append one character with styles
                appendChar(op.char, op.styles);
                await sleep(op.delay);
                break;

            case "pause":
                // Freeze the typewriter
                await sleep(op.delay);
                break;

            case "linebreak":
                appendLineBreak();
                break;
        }
    }
}

function sleep(ms: number): Promise<void> {
    return new Promise((resolve) => setTimeout(resolve, ms));
}

What this demonstrates

The parser does not know about typewriters, DOM elements, or animation. It produces a clean, typed token tree. The renderer walks that tree with walkTokens and translates each node into engine-specific instructions. This separation means:

Writers work with readable DSL markup.
The parser validates structure and produces tokens.
The renderer maps tokens to platform-specific effects.

You can swap the renderer for a Unity C# renderer, a terminal renderer, or a test harness -- the parser and DSL stay the same.
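As an example of such a swap, a test harness can consume the same queue without any DOM. The sketch below redeclares the op shapes locally (with `styles` omitted for brevity) and adds two hypothetical helpers, `totalDuration` and `transcript`. The queue is written by hand here; in practice it would come from buildRenderQueue.

```typescript
// Local copy of the op shapes used in this tutorial (styles omitted for brevity).
type Op =
    | { kind: "char"; char: string; delay: number }
    | { kind: "pause"; delay: number }
    | { kind: "speaker"; name: string }
    | { kind: "linebreak" };

// Sum the time the typewriter would spend sleeping for a queue.
function totalDuration(queue: Op[]): number {
    let ms = 0;
    for (const op of queue) {
        if (op.kind === "char" || op.kind === "pause") {
            ms += op.delay;
        }
    }
    return ms;
}

// Render the queue as a plain transcript, ignoring timing and styles.
function transcript(queue: Op[]): string {
    let out = "";
    for (const op of queue) {
        switch (op.kind) {
            case "speaker":
                out += `[${op.name}] `;
                break;
            case "char":
                out += op.char;
                break;
            case "linebreak":
                out += "\n";
                break;
            case "pause":
                break; // no visible output
        }
    }
    return out;
}

const demoQueue: Op[] = [
    { kind: "speaker", name: "Bob" },
    { kind: "char", char: "H", delay: 30 },
    { kind: "char", char: "i", delay: 30 },
    { kind: "pause", delay: 300 },
];

console.log(transcript(demoQueue));    // [Bob] Hi
console.log(totalDuration(demoQueue)); // 360
```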

Step 4: Add a Custom Tag -- `$$choice(option1 | option2 | option3)$$`

Visual novels need player choices. Add a choice tag where each pipe segment is one option:

$$speaker(Alice)*
What should we do?
$$choice(Run away | Fight the creature | Hide in the bushes)$$
*end$$

Handler implementation

This is a natural fit for createPipeHandlers -- each pipe segment maps to a choice option:

import { createPipeHandlers } from "yume-dsl-rich-text";

const choiceTag = createPipeHandlers({
    choice: {
        inline: (args, ctx) => {
            // Collect all pipe segments as plain text strings.
            // args.parts tells us how many segments exist.
            const options: string[] = [];
            for (let i = 0; i < args.parts.length; i++) {
                options.push(args.text(i));
            }

            return {
                type: "choice",
                options,       // ["Run away", "Fight the creature", "Hide in the bushes"]
                value: "",     // no display content -- the renderer shows buttons
            };
        },
    },
});

Add it to the parser

const dsl = createParser({
    handlers: {
        ...pipeTags,
        ...shakeHandler,
        ...waitHandler,
        ...speedHandler,
        ...speakerHandler,
        ...choiceTag,           // ← add choice
    },
    blockTags: declareMultilineTags(["speaker"]),
});

Extend the renderer

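Add a visitor for the new token type in the walkTokens call. The op it pushes has kind "choice", so the RenderOp union also needs a matching variant such as { kind: "choice"; options: string[] }: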

walkTokens(tokens, {
    // ... existing handlers ...

    choice: (token) => {
        if (Array.isArray(token.options)) {
            ops.push({
                kind: "choice" as const,
                options: token.options as string[],
            });
        }
    },
});

In the browser renderer, choice ops create clickable buttons:

case "choice": {
    // Pause typewriter and show choice buttons
    const chosen = await showChoiceButtons(op.options);
    handlePlayerChoice(chosen);
    break;
}

Why inline?

A choice tag appears inside dialogue flow, after the question text. It does not span multiple lines of content -- it is a single command that lists options. Inline is the right form. The pipe divider naturally separates the options, so createPipeHandlers handles the splitting.
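Since the options arrive as free text typed by writers, the renderer may want to tidy them before showing buttons. The sketch below builds a choice op from an unknown token field, trimming whitespace and dropping blank entries. `ChoiceOp` and `toChoiceOp` are hypothetical names, and whether `args.text(i)` already trims its result is not assumed either way.

```typescript
// Hypothetical op shape for the choice tag.
type ChoiceOp = { kind: "choice"; options: string[] };

// Build a choice op from an unknown token field, dropping blank options.
// Returns null when no usable option remains.
function toChoiceOp(options: unknown): ChoiceOp | null {
    if (!Array.isArray(options)) return null;
    const cleaned = options
        .filter((o): o is string => typeof o === "string")
        .map((o) => o.trim())
        .filter((o) => o.length > 0);
    return cleaned.length > 0 ? { kind: "choice", options: cleaned } : null;
}

console.log(toChoiceOp([" Run away ", "", "Hide in the bushes"]));
console.log(toChoiceOp("not an array")); // null
```

Returning null for an empty list lets the renderer skip the op instead of pausing on a prompt with no buttons.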

Step 5: Custom Syntax for Your Game Engine

Some game engines use $ for variable interpolation (e.g., $playerName). Having $$ as the DSL prefix would conflict. Use createEasySyntax to switch to @@:

import {createEasySyntax, createParser} from "yume-dsl-rich-text";

const syntax = createEasySyntax({tagPrefix: "@@"});

const dsl = createParser({
    syntax,
    handlers: {
        ...pipeTags,
        ...shakeHandler,
        ...waitHandler,
        ...speedHandler,
        ...speakerHandler,
        ...choiceTag,
    },
    blockTags: declareMultilineTags(["speaker"]),
});

The script with `@@` prefix

All compound tokens update automatically -- endTag becomes )@@, blockClose becomes *end@@:

@@speaker(Alice)*
Hello! @@color(blue | Nice to meet you)@@.
@@wait(300)@@
Have you seen the @@shake(strange creature)@@ in the forest?
*end@@

@@speaker(Bob)*
@@speed(30)@@Yes... it was @@color(red | terrifying)@@.
*end@@

The handlers are unchanged. Only the writer-facing syntax is different. createEasySyntax derives all compound tokens from the new prefix:

Token	Default (`$$`)	Custom (`@@`)
`tagPrefix`	`$$`	`@@`
`endTag`	`)$$`	`)@@`
`rawClose`	`%end$$`	`%end@@`
`blockClose`	`*end$$`	`*end@@`

The tagOpen, tagClose, tagDivider, rawOpen, blockOpen, and escapeChar tokens remain at their defaults: (, ), |, )%, )*, \.

If you also want to change the shared "end" part inside *end@@ / %end@@, you can stay in easy mode:

const syntax = createEasySyntax({ tagPrefix: "@@", closeMiddle: "fin" });
// rawClose -> "%fin@@"   blockClose -> "*fin@@"
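The derivation rule in the table can be written out as a few string concatenations. The function below is only an illustration of that rule, assuming the default `)`, `%`, and `*` markers; `deriveCompoundTokens` is a hypothetical name and this is not how createEasySyntax is necessarily implemented.

```typescript
interface DerivedTokens {
    tagPrefix: string;
    endTag: string;
    rawClose: string;
    blockClose: string;
}

// Illustration of the table above: each compound token is built from
// the prefix, the shared middle word, and a default marker character.
function deriveCompoundTokens(tagPrefix: string, closeMiddle = "end"): DerivedTokens {
    return {
        tagPrefix,
        endTag: `)${tagPrefix}`,
        rawClose: `%${closeMiddle}${tagPrefix}`,
        blockClose: `*${closeMiddle}${tagPrefix}`,
    };
}

console.log(deriveCompoundTokens("@@").blockClose);        // *end@@
console.log(deriveCompoundTokens("@@", "fin").rawClose);   // %fin@@
```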

When to use `createEasySyntax` vs `createSyntax`

createEasySyntax -- change one or two base tokens, or add closeMiddle, and let compound tokens auto-derive. Recommended for most cases.
createSyntax -- full manual control. Use when your syntax is irregular (e.g., different open/close bracket types, non-standard raw/block markers).

What You Learned

This tutorial covered four core concepts:

1. Choosing the right tag form

Need	Form	Example
Wrap text with an effect	inline	`$$color(red | text)$$`, `$$shake(text)$$`
Inject a command into text flow	inline (value `""`)	`$$wait(500)$$`, `$$speed(50)$$`
Contain multiple lines of content	block	`$$speaker(Alice)* ... *end$$`

The decision comes down to: does the tag wrap content or issue a command? Does it span one phrase or multiple lines?

2. "Command" tags vs "content" tags

Both are inline, but they differ in how the renderer treats them:

Content tags (color, shake) produce value: TextToken[] -- the renderer recurses into children and applies visual effects.
Command tags (wait, speed) produce value: "" -- the renderer reads the tag's metadata fields (delay) and changes its internal state.

The parser treats them identically. The semantic difference lives entirely in your handler's return value and your renderer's interpretation.

3. Connecting the parser to a rendering engine

walkTokens is the bridge between parsed tokens and your engine. The visitor pattern lets you:

Dispatch on type to handle each tag differently.
Access ctx.parent to pick up styles from the enclosing tag.
Maintain mutable state (like speed) across the traversal.

The parser is framework-agnostic. The same token tree can feed a DOM typewriter, a Unity coroutine, a terminal emulator, or a test assertion.

4. Adapting syntax to your environment

createEasySyntax({ tagPrefix: "@@" }) changes the writer-facing syntax without touching handlers. This lets you avoid conflicts with your engine's existing conventions ($ for variables, # for comments, etc.).

Next: Tutorial: Safe UGC Chat -- whitelist tags, block dangerous forms, and handle malformed input in a user-generated-content system.
