Tutorial: Game Dialogue - chiba233/yumeDSL GitHub Wiki
Build a complete visual novel / game dialogue DSL from scratch. By the end you will have color, shake, wait, speed, speaker, and choice tags -- plus a typewriter renderer that consumes the token tree.
You are building a visual novel engine. Text is written by writers and designers, not programmers. They need a small markup language embedded in their script files:
| Tag | Form | Purpose |
|---|---|---|
| `$$color(red \| This text is red)$$` | inline, pipe | Color a span of text |
| `$$shake(This text shakes)$$` | inline | Apply a shake animation to text |
| `$$wait(500)$$` | inline | Pause the typewriter for 500 ms (no visible output) |
| `$$speed(50)$$` | inline | Change typewriter speed to 50 ms/char |
| `$$speaker(Alice)* ... *end$$` | block | Attribute a block of dialogue to a character |
Here is what a writer's script file looks like:
$$speaker(Alice)*
Hello! $$color(blue | Nice to meet you)$$.
$$wait(300)$$
Have you seen the $$shake(strange creature)$$ in the forest?
*end$$
$$speaker(Bob)*
$$speed(30)$$Yes... it was $$color(red | terrifying)$$.
*end$$
Two speakers, nested inline tags inside block tags, a pause between sentences, and a speed change. The rest of this tutorial shows how to parse that script and feed the result into a typewriter renderer.
Before writing code, decide which DSL form each tag should use:
| Tag | Form | Why |
|---|---|---|
| color | inline with pipe | It wraps a span of text, and it needs an argument (the color name). The pipe separates the argument from the content: `$$color(red \| text)$$`. |
| shake | inline, no pipe | It wraps text, but needs no argument -- the tag name itself is the effect. `$$shake(text)$$`. |
| wait | inline | It appears in the flow of text (between sentences, inside dialogue), but it produces no visible content. Its value will be "". The argument (delay in ms) is the entire inline content. |
| speed | inline | Same pattern as wait: a "command" tag that changes state. No visible output, argument is the content. |
| speaker | block | It contains multiple lines of dialogue with nested tags. Block form gives writers a natural start/end structure and supports recursive parsing of the body. |
The distinction between "content" tags (color, shake) and "command" tags (wait, speed) is important. Content tags wrap children and render them with a visual effect. Command tags produce no visible output -- they inject a side-effect into the rendering pipeline. Both are inline because they appear within the text flow.
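Because a command tag's argument arrives as plain inline text, the handler has to convert it to a number before the renderer can use it. The following is a minimal sketch of that conversion. `parseDelayMs` and its `fallback` parameter are hypothetical names introduced for illustration; they are not part of yume-dsl-rich-text.

```typescript
// Hypothetical helper: convert the inline content of a command tag
// such as $$wait(500)$$ into a non-negative delay in milliseconds.
function parseDelayMs(raw: string, fallback = 0): number {
  const parsed = Number.parseInt(raw.trim(), 10);
  // NaN (e.g. "soon") and negative values fall back to the default.
  if (Number.isNaN(parsed) || parsed < 0) return fallback;
  return parsed;
}

console.log(parseDelayMs("500"));      // 500
console.log(parseDelayMs(" 300 "));    // 300
console.log(parseDelayMs("soon", 50)); // 50
```

Rejecting negative values here keeps a writer's typo from ever reaching the timing code in the renderer.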
import {
createParser,
createPipeHandlers,
materializeTextTokens,
extractText,
declareMultilineTags,
type TextToken,
type DslContext,
type TagHandler,
type TokenDraft,
} from "yume-dsl-rich-text";
// ── color ──────────────────────────────────────────────
// Inline with pipe: $$color(red | This text is red)$$
//
// Pipe segment 0 → color name (plain string)
// Pipe segment 1+ → the content to colorize (token tree, may contain nested tags)
//
// We use createPipeHandlers so the pipe splitting is handled for us.
// materializedTailTokens(1) collects everything after the first pipe
// into a single token array, which means the writer can use pipes
// inside the colored text if they escape them.
const pipeTags = createPipeHandlers({
color: {
inline: (args, ctx) => ({
type: "color",
color: args.text(0), // "red", "blue", "#ff0"
value: args.materializedTailTokens(1), // the colored content
}),
},
});
// ── shake ──────────────────────────────────────────────
// Inline, no pipe: $$shake(This text shakes)$$
//
// The entire inline content is the text to animate.
// materializeTextTokens resolves escape sequences in text leaves
// while preserving nested tag structure (e.g., $$shake($$color(red | wow)$$)$$).
const shakeHandler: Record<string, TagHandler> = {
shake: {
inline: (tokens, ctx) => ({
type: "shake",
value: materializeTextTokens(tokens, ctx),
}),
},
};
// ── wait ───────────────────────────────────────────────
// Inline command: $$wait(500)$$
//
// No visible output — value is "".
// The number inside the parens is the delay in milliseconds.
// extractText pulls the raw text content from the token array,
// then we parse it as an integer.
//
// Why inline and not raw? Because wait appears *inside* dialogue text,
// between sentences. It sits in the same text flow as words and other
// inline tags. Raw form would require its own line pair, which is
// awkward for a tiny command embedded in a paragraph.
const waitHandler: Record<string, TagHandler> = {
wait: {
inline: (tokens, ctx) => {
const ms = parseInt(extractText(tokens), 10) || 0;
return { type: "wait", delay: ms, value: "" };
},
},
};
// ── speed ──────────────────────────────────────────────
// Inline command: $$speed(50)$$
//
// Same pattern as wait: inline command, no visible output.
// The number is the new typewriter delay per character in ms.
const speedHandler: Record<string, TagHandler> = {
speed: {
inline: (tokens, ctx) => {
const ms = parseInt(extractText(tokens), 10) || 0;
return { type: "speed", delay: ms, value: "" };
},
},
};
// ── speaker ────────────────────────────────────────────
// Block form: $$speaker(Alice)*\n...\n*end$$
//
// The arg is the speaker's name.
// The block body is recursively parsed, so the writer can use
// color, shake, wait, speed — any inline tag — inside the dialogue.
//
// Why block and not inline? Because dialogue spans multiple lines.
// The block form gives a clear visual boundary that writers understand:
// $$speaker(Alice)*
// ...lines of dialogue...
// *end$$
const speakerHandler: Record<string, TagHandler> = {
speaker: {
block: (arg, content, ctx) => ({
type: "speaker",
name: arg ?? "???", // fallback if writer forgets the name
value: content, // recursively parsed dialogue body
}),
},
};
// ── Assemble the parser ────────────────────────────────
const dsl = createParser({
handlers: {
...pipeTags,
...shakeHandler,
...waitHandler,
...speedHandler,
...speakerHandler,
},
blockTags: declareMultilineTags(["speaker"]),
});
speaker is a block-level container tag. In the DSL, the block form is written as:
$$speaker(Alice)*
Hello!
*end$$
Authors naturally place )* and *end$$ on their own lines, but this means the raw content becomes "\nHello!\n" — with a boundary line break at each end. Without normalization, the rendered dialogue box would show an extra blank line above and below the content.
declareMultilineTags(["speaker"]) tells the parser to strip exactly one line break at each boundary in block / raw forms, so content starts cleanly at the first actual line and ends cleanly at the last.
In most cases, the parser auto-derives this from handlers that have block / raw methods. Declaring manually makes the intent explicit and covers situations where auto-derivation is insufficient. See Handler Helpers — declareMultilineTags for the full explanation.
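To make the "exactly one line break at each boundary" rule concrete, here is a small standalone sketch of that normalization. It illustrates the rule as described above and is not the library's implementation. The name `stripBoundaryLineBreaks` is hypothetical, and treating `\r\n` as a single line break is an assumption.

```typescript
// Illustration only: remove at most one leading and one trailing line break.
// Assumption: "\r\n" counts as a single line break.
function stripBoundaryLineBreaks(body: string): string {
  let result = body;
  if (result.startsWith("\r\n")) result = result.slice(2);
  else if (result.startsWith("\n")) result = result.slice(1);
  if (result.endsWith("\r\n")) result = result.slice(0, -2);
  else if (result.endsWith("\n")) result = result.slice(0, -1);
  return result;
}

console.log(JSON.stringify(stripBoundaryLineBreaks("\nHello!\n")));   // "Hello!"
console.log(JSON.stringify(stripBoundaryLineBreaks("\n\nHello!\n"))); // "\nHello!"
```

The second call shows why the rule strips only one break: a blank line that the writer added on purpose survives.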
Feed the sample dialogue into the parser:
const script = `$$speaker(Alice)*
Hello! $$color(blue | Nice to meet you)$$.
$$wait(300)$$
Have you seen the $$shake(strange creature)$$ in the forest?
*end$$
$$speaker(Bob)*
$$speed(30)$$Yes... it was $$color(red | terrifying)$$.
*end$$`;
const tokens = dsl.parse(script);
The result is an array of top-level tokens. Let us walk through them:
[
// ── Token 0: Alice's dialogue block ──
{
type: "speaker",
name: "Alice",
id: "rt-...",
value: [
// Line 1: "Hello! " + color tag + "."
{type: "text", value: "Hello! ", id: "rt-..."},
{
type: "color",
color: "blue",
id: "rt-...",
value: [
{type: "text", value: "Nice to meet you", id: "rt-..."}
]
},
{type: "text", value: ".\n", id: "rt-..."},
// Line 2: wait command (no visible text)
{type: "wait", delay: 300, value: "", id: "rt-..."},
// Line 3: text + shake tag + text
{type: "text", value: "\nHave you seen the ", id: "rt-..."},
{
type: "shake",
id: "rt-...",
value: [
{type: "text", value: "strange creature", id: "rt-..."}
]
},
{type: "text", value: " in the forest?\n", id: "rt-..."},
]
},
// ── Between blocks: whitespace ──
{type: "text", value: "\n\n", id: "rt-..."},
// ── Token 1: Bob's dialogue block ──
{
type: "speaker",
name: "Bob",
id: "rt-...",
value: [
// speed command at the start of the line
{type: "speed", delay: 30, value: "", id: "rt-..."},
// "Yes... it was " + color tag + "."
{type: "text", value: "Yes... it was ", id: "rt-..."},
{
type: "color",
color: "red",
id: "rt-...",
value: [
{type: "text", value: "terrifying", id: "rt-..."}
]
},
{type: "text", value: ".\n", id: "rt-..."},
]
},
]
Key observations:
- speaker tokens have `value: TextToken[]` -- the recursively parsed dialogue body.
- color and shake tokens wrap their children in `value: TextToken[]`.
- wait and speed tokens have `value: ""` -- they are commands, not content.
- Newlines between lines appear as part of adjacent text tokens.
- The whitespace between the two speaker blocks is a plain text token.
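One way to check these observations is to flatten a token tree back into its visible text. The sketch below uses a local stand-in type, `SketchToken`, which models only the `type` and `value` fields from the output above. Both the type and the `visibleText` function are hypothetical names for this example, not library exports.

```typescript
// Local stand-in for the token shape shown above (assumption: only
// `type` and `value` matter for this sketch; other fields are ignored).
interface SketchToken {
  type: string;
  value: string | SketchToken[];
}

// Concatenate the visible text of a token tree, depth-first.
// Command tokens contribute nothing because their value is "".
function visibleText(tokens: SketchToken[]): string {
  let out = "";
  for (const token of tokens) {
    out += typeof token.value === "string" ? token.value : visibleText(token.value);
  }
  return out;
}

const aliceFirstLine: SketchToken[] = [
  { type: "text", value: "Hello! " },
  { type: "color", value: [{ type: "text", value: "Nice to meet you" }] },
  { type: "text", value: ".\n" },
  { type: "wait", value: "" },
];

console.log(JSON.stringify(visibleText(aliceFirstLine))); // "Hello! Nice to meet you.\n"
```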
Now connect the token tree to a real rendering engine. This typewriter state machine walks the tree and produces a queue of timed rendering instructions.
// A single instruction for the rendering engine
type RenderOp =
| { kind: "char"; char: string; delay: number; styles: StyleStack }
| { kind: "pause"; delay: number }
| { kind: "speaker"; name: string }
| { kind: "linebreak" };
// Style state: which effects are active at any point
interface StyleState {
color: string | null;
shake: boolean;
}
// Styles active for a single character. Despite the name, this tutorial
// keeps one flat state rather than a real stack (see getStyles below).
type StyleStack = StyleState;
import {walkTokens, type TextToken} from "yume-dsl-rich-text";
function buildRenderQueue(tokens: TextToken[]): RenderOp[] {
const ops: RenderOp[] = [];
let speed = 50; // default: 50ms per character
// We need to track style context as we descend into the tree.
// walkTokens visits depth-first pre-order, so we can maintain
// a style stack by examining the parent chain.
// Helper: compute current styles from the token's ancestor chain.
// In a real engine you would maintain an explicit stack; here we
// use a simplified approach for clarity.
function getStyles(token: TextToken, parent: TextToken | null): StyleStack {
const state: StyleState = {color: null, shake: false};
// Walk up through ancestors by checking token context
// For this tutorial, we read color/shake from the immediate parent
if (parent) {
if (parent.type === "color" && typeof parent.color === "string") {
state.color = parent.color;
}
if (parent.type === "shake") {
state.shake = true;
}
}
return state;
}
walkTokens(tokens, {
// ── Speaker: emit a speaker header ──
speaker: (token) => {
if (typeof token.name === "string") {
ops.push({kind: "speaker", name: token.name});
}
// Children will be visited automatically by walkTokens
},
// ── Text: queue each character with the current speed ──
text: (token, ctx) => {
if (typeof token.value !== "string") return;
const styles = getStyles(token, ctx.parent);
for (const char of token.value) {
if (char === "\n") {
ops.push({kind: "linebreak"});
} else {
ops.push({kind: "char", char, delay: speed, styles});
}
}
},
// ── Wait: insert a pause ──
wait: (token) => {
const delay = typeof token.delay === "number" ? token.delay : 0;
ops.push({kind: "pause", delay});
},
// ── Speed: change the character delay ──
speed: (token) => {
const newSpeed = typeof token.delay === "number" ? token.delay : 50;
speed = newSpeed;
},
// color and shake don't need their own visitors —
// their effect is picked up by the text visitor via getStyles().
});
return ops;
}
const queue = buildRenderQueue(tokens);
// Example: play the queue in a browser
async function play(queue: RenderOp[]) {
for (const op of queue) {
switch (op.kind) {
case "speaker":
// Show speaker name plate
showSpeakerName(op.name);
break;
case "char":
// Append one character with styles
appendChar(op.char, op.styles);
await sleep(op.delay);
break;
case "pause":
// Freeze the typewriter
await sleep(op.delay);
break;
case "linebreak":
appendLineBreak();
break;
}
}
}
function sleep(ms: number): Promise<void> {
return new Promise((resolve) => setTimeout(resolve, ms));
}
The parser does not know about typewriters, DOM elements, or animation. It produces a clean, typed token tree. The renderer walks that tree with walkTokens and translates each node into engine-specific instructions. This separation means:
- Writers work with readable DSL markup.
- The parser validates structure and produces tokens.
- The renderer maps tokens to platform-specific effects.
You can swap the renderer for a Unity C# renderer, a terminal renderer, or a test harness -- the parser and DSL stay the same.
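As an example of swapping the renderer, the sketch below is a test harness that consumes a render queue without touching the DOM or sleeping. `SketchOp` is a local copy of the op shapes from this tutorial with the `styles` field left out for brevity. `renderToTranscript` and the `[name]` prefix format are hypothetical choices for this example.

```typescript
// Local copy of the op shapes used in this tutorial, with styles omitted
// (assumption: a test harness only needs text and timing).
type SketchOp =
  | { kind: "char"; char: string; delay: number }
  | { kind: "pause"; delay: number }
  | { kind: "speaker"; name: string }
  | { kind: "linebreak" };

// Render a queue to a transcript and sum its delays instead of sleeping.
function renderToTranscript(queue: SketchOp[]): { text: string; totalMs: number } {
  let text = "";
  let totalMs = 0;
  for (const op of queue) {
    switch (op.kind) {
      case "speaker":
        text += `[${op.name}] `;
        break;
      case "char":
        text += op.char;
        totalMs += op.delay;
        break;
      case "pause":
        totalMs += op.delay;
        break;
      case "linebreak":
        text += "\n";
        break;
    }
  }
  return { text, totalMs };
}

const sampleQueue: SketchOp[] = [
  { kind: "speaker", name: "Bob" },
  { kind: "char", char: "H", delay: 30 },
  { kind: "char", char: "i", delay: 30 },
  { kind: "pause", delay: 300 },
];

const transcript = renderToTranscript(sampleQueue);
console.log(transcript.text);    // [Bob] Hi
console.log(transcript.totalMs); // 360
```

Because the harness only sums delays, a test can assert on pacing (for example, that a scene takes no longer than some budget) without waiting in real time.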
Visual novels need player choices. Add a choice tag where each pipe segment is one option:
$$speaker(Alice)*
What should we do?
$$choice(Run away | Fight the creature | Hide in the bushes)$$
*end$$
This is a natural fit for createPipeHandlers -- each pipe segment maps to a choice option:
import { createPipeHandlers } from "yume-dsl-rich-text";
const choiceTag = createPipeHandlers({
choice: {
inline: (args, ctx) => {
// Collect all pipe segments as plain text strings.
// args.parts tells us how many segments exist.
const options: string[] = [];
for (let i = 0; i < args.parts.length; i++) {
options.push(args.text(i));
}
return {
type: "choice",
options, // ["Run away", "Fight the creature", "Hide in the bushes"]
value: "", // no display content -- the renderer shows buttons
};
},
},
});
const dsl = createParser({
handlers: {
...pipeTags,
...shakeHandler,
...waitHandler,
...speedHandler,
...speakerHandler,
...choiceTag, // ← add choice
},
blockTags: declareMultilineTags(["speaker"]),
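});
Add a handler for the new token type in the walkTokens visitor. The RenderOp union defined earlier has no choice variant, so add { kind: "choice"; options: string[] } to it as well: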
walkTokens(tokens, {
// ... existing handlers ...
choice: (token) => {
if (Array.isArray(token.options)) {
ops.push({
kind: "choice" as const,
options: token.options as string[],
});
}
},
});
In the browser renderer, choice ops create clickable buttons:
case "choice": {
// Pause typewriter and show choice buttons
const chosen = await showChoiceButtons(op.options);
handlePlayerChoice(chosen);
break;
}
A choice tag appears inside dialogue flow, after the question text. It does not span multiple lines of content -- it is a single command that lists options. Inline is the right form. The pipe divider naturally separates the options, so createPipeHandlers handles the splitting.
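Since the options come from writer-authored text, the renderer may want to validate them before showing buttons. The sketch below shows one way to do that on a reduced op type. `ChoiceAwareOp` and `toChoiceOp` are hypothetical names, and trimming labels and dropping empty ones are assumptions about what a game would want, not behavior guaranteed by the parser.

```typescript
// Assumption: the existing ops are reduced to two variants here so the
// sketch stays short; only the "choice" variant is new.
type ChoiceAwareOp =
  | { kind: "char"; char: string; delay: number }
  | { kind: "pause"; delay: number }
  | { kind: "choice"; options: string[] };

// Hypothetical helper: validate the options read from a choice token.
// Returns null when nothing usable remains, so the caller can skip the op.
function toChoiceOp(options: unknown): ChoiceAwareOp | null {
  if (!Array.isArray(options)) return null;
  const labels = options
    .filter((o): o is string => typeof o === "string")
    .map((o) => o.trim())
    .filter((o) => o.length > 0);
  return labels.length > 0 ? { kind: "choice", options: labels } : null;
}

const choiceOp = toChoiceOp(["Run away ", " Fight the creature", "", 42]);
console.log(JSON.stringify(choiceOp));
// {"kind":"choice","options":["Run away","Fight the creature"]}
```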
Some game engines use $ for variable interpolation (e.g., $playerName). Having $$ as the DSL prefix would
conflict. Use createEasySyntax to switch to @@:
import {createEasySyntax, createParser} from "yume-dsl-rich-text";
const syntax = createEasySyntax({tagPrefix: "@@"});
const dsl = createParser({
syntax,
handlers: {
...pipeTags,
...shakeHandler,
...waitHandler,
...speedHandler,
...speakerHandler,
...choiceTag,
},
blockTags: declareMultilineTags(["speaker"]),
});
All compound tokens update automatically -- endTag becomes )@@, blockClose becomes *end@@:
@@speaker(Alice)*
Hello! @@color(blue | Nice to meet you)@@.
@@wait(300)@@
Have you seen the @@shake(strange creature)@@ in the forest?
*end@@
@@speaker(Bob)*
@@speed(30)@@Yes... it was @@color(red | terrifying)@@.
*end@@
The handlers are unchanged. Only the writer-facing syntax is different. createEasySyntax derives all compound tokens from the new prefix:
| Token | Default ($$) | Custom (@@) |
|---|---|---|
| tagPrefix | `$$` | `@@` |
| endTag | `)$$` | `)@@` |
| rawClose | `%end$$` | `%end@@` |
| blockClose | `*end$$` | `*end@@` |
The tagOpen, tagClose, tagDivider, rawOpen, blockOpen, and escapeChar tokens remain at their defaults: `(`, `)`, `|`, `)%`, `)*`, `\`.
If you also want to change the shared "end" part inside *end@@ / %end@@, you can stay in easy mode:
const syntax = createEasySyntax({ tagPrefix: "@@", closeMiddle: "fin" });
// rawClose -> "%fin@@" blockClose -> "*fin@@"
- createEasySyntax -- change one or two base tokens, or add closeMiddle, and let compound tokens auto-derive. Recommended for most cases.
- createSyntax -- full manual control. Use when your syntax is irregular (e.g., different open/close bracket types, non-standard raw/block markers).
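The derivation rule in the table above can be written out as a few string concatenations. The sketch below only illustrates that rule. It is not how createEasySyntax is implemented, and `deriveCompoundTokens`, `EasyOptions`, and `CompoundTokens` are hypothetical names.

```typescript
// Illustration of the derivation rule from the token table.
// This is NOT the library's implementation; all names are hypothetical.
interface EasyOptions {
  tagPrefix?: string;
  closeMiddle?: string;
}

interface CompoundTokens {
  endTag: string;
  rawClose: string;
  blockClose: string;
}

function deriveCompoundTokens(options: EasyOptions = {}): CompoundTokens {
  const prefix = options.tagPrefix ?? "$$";
  const middle = options.closeMiddle ?? "end";
  return {
    endTag: ")" + prefix,             // tagClose + tagPrefix
    rawClose: "%" + middle + prefix,  // "%" + closeMiddle + tagPrefix
    blockClose: "*" + middle + prefix, // "*" + closeMiddle + tagPrefix
  };
}

console.log(deriveCompoundTokens({ tagPrefix: "@@" }).blockClose);                    // *end@@
console.log(deriveCompoundTokens({ tagPrefix: "@@", closeMiddle: "fin" }).rawClose); // %fin@@
```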
This tutorial covered four core concepts:
| Need | Form | Example |
|---|---|---|
| Wrap text with an effect | inline | `$$color(red \| text)$$`, `$$shake(text)$$` |
| Inject a command into text flow | inline (value "") | `$$wait(500)$$`, `$$speed(50)$$` |
| Contain multiple lines of content | block | `$$speaker(Alice)* ... *end$$` |
The decision comes down to: does the tag wrap content or issue a command? Does it span one phrase or multiple lines?
Content tags and command tags are both inline, but they differ in how the renderer treats them:
- Content tags (color, shake) produce `value: TextToken[]` -- the renderer recurses into children and applies visual effects.
- Command tags (wait, speed) produce `value: ""` -- the renderer reads the tag's metadata fields (delay) and changes its internal state.
The parser treats them identically. The semantic difference lives entirely in your handler's return value and your renderer's interpretation.
walkTokens is the bridge between parsed tokens and your engine. The visitor pattern lets you:
- Dispatch on `type` to handle each tag differently.
- Access `ctx.parent` to inherit styles from ancestor tags.
- Maintain mutable state (like `speed`) across the traversal.
The parser is framework-agnostic. The same token tree can feed a DOM typewriter, a Unity coroutine, a terminal emulator, or a test assertion.
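The getStyles helper in this tutorial reads only the immediate parent, so text inside `$$shake($$color(red | wow)$$)$$` would lose the shake effect. One possible fix is to pass the inherited style down explicitly. The sketch below does this with plain recursion over a local stand-in token type rather than walkTokens. `StyledToken`, `Style`, `StyledRun`, and `collectRuns` are hypothetical names for this example.

```typescript
// Local stand-ins for the tutorial's token and style shapes.
interface StyledToken {
  type: string;
  value: string | StyledToken[];
  color?: string;
}
interface Style {
  color: string | null;
  shake: boolean;
}
interface StyledRun {
  text: string;
  style: Style;
}

// Walk the tree, passing the inherited style down so that deeply nested
// text sees every enclosing color/shake tag, not just its direct parent.
function collectRuns(
  tokens: StyledToken[],
  inherited: Style = { color: null, shake: false },
  out: StyledRun[] = [],
): StyledRun[] {
  for (const token of tokens) {
    if (typeof token.value === "string") {
      // Only non-empty text leaves produce output; command tokens are skipped.
      if (token.type === "text" && token.value !== "") {
        out.push({ text: token.value, style: inherited });
      }
      continue;
    }
    // Copy before modifying so sibling subtrees are not affected.
    const next: Style = { ...inherited };
    if (token.type === "color" && typeof token.color === "string") next.color = token.color;
    if (token.type === "shake") next.shake = true;
    collectRuns(token.value, next, out);
  }
  return out;
}

// Token tree for: $$shake($$color(red | wow)$$)$$
const nestedTokens: StyledToken[] = [
  {
    type: "shake",
    value: [{ type: "color", color: "red", value: [{ type: "text", value: "wow" }] }],
  },
];

console.log(JSON.stringify(collectRuns(nestedTokens)));
// [{"text":"wow","style":{"color":"red","shake":true}}]
```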
createEasySyntax({ tagPrefix: "@@" }) changes the writer-facing syntax without touching handlers. This lets you avoid
conflicts with your engine's existing conventions ($ for variables, # for comments, etc.).
Next: Tutorial: Safe UGC Chat -- whitelist tags, block dangerous forms, and handle malformed input in a user-generated-content system.