en Token Structure - chiba233/yumeDSL GitHub Wiki

Token Structure


The parser turns DSL text into a token tree. Every node in the tree is a TextToken. Your handler returns a TokenDraft (an unfinished token); the parser then adds id and, when position tracking is enabled, position to produce the final TextToken.


The big picture

Your DSL text
    │
    ▼
Parser scans
    │
    ├─ Plain text → TextToken { type: "text", value: "Hello ", id: "rt-0" }
    │
    └─ $$bold(world)$$ → calls your handler
                              │
                              ▼
                         Handler returns TokenDraft
                         { type: "bold", value: [...children] }
                              │
                              ▼
                         Parser adds id + position
                              │
                              ▼
                         TextToken { type: "bold", value: [...], id: "rt-1" }

TextToken

interface TextToken {
    type: string;                // "text" or the type returned by your handler
    value: string | TextToken[]; // text content or child token array
    id: string;                  // unique within a parse
    position?: SourceSpan;       // source coordinates (only with trackPositions)
    [key: string]: unknown;      // extra fields from handler
}

Fields at a glance

| Field | What it is |
| --- | --- |
| type | "text" for plain text, or the type your handler returned (usually the tag name like "bold", but can be any string like "version-note") |
| value | Text node → string; inline/block tag → TextToken[] (children); raw tag → string (raw content) |
| id | Sequential by default (rt-0, rt-1, ...). For stable IDs, see Stable Token IDs |
| position | Only when trackPositions: true. See Source Position Tracking |
| [key] | Whatever extra fields your handler returns, e.g. link's url, code's lang |

To discriminate on the value type, check typeof token.value === "string": if true, the token is a text node or raw tag content; otherwise value is a child token array.
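The typeof check above can be sketched as a small recursive helper. This is only an illustration: the TextToken interface is a local copy of the shape documented on this page (so the snippet runs without the library), plainText is a hypothetical helper name, and the ids in the sample tree are made up.

```typescript
// Local copy of the documented TextToken shape, so the sketch is self-contained.
interface TextToken {
    type: string;
    value: string | TextToken[];
    id: string;
    [key: string]: unknown;
}

// Hypothetical helper (not a library export): concatenates all text in a subtree.
function plainText(token: TextToken): string {
    // string → plain text node or raw tag content
    if (typeof token.value === "string") return token.value;
    // otherwise → child token array, so recurse into each child
    return token.value.map((child) => plainText(child)).join("");
}

// Hand-written sample tree; the ids are illustrative only.
const tree: TextToken[] = [
    { type: "text", value: "Hello ", id: "rt-0" },
    { type: "bold", value: [{ type: "text", value: "world", id: "rt-2" }], id: "rt-1" },
];

const flattened = tree.map((token) => plainText(token)).join("");
```

Because the check is on value rather than type, the helper keeps working for tag types it has never seen.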


TokenDraft

interface TokenDraft {
    type: string;
    value: string | TextToken[];
    [key: string]: unknown;
}

A TokenDraft is what handlers return. It has the same shape as TextToken but without id and position, which the parser adds.

A handler must set type and value. Any extra fields are preserved:

return {
    type: "link",                     // required
    value: childTokens,               // required
    url: "https://example.com",       // extra field — kept on final TextToken
};
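To make the draft-to-token step concrete, here is a minimal sketch of what finalization might look like. It is not the parser's actual implementation: finalizeDraft is a hypothetical function, the interfaces are local copies of the shapes documented on this page, and position handling is left out.

```typescript
// Local copies of the documented shapes, so the sketch is self-contained.
interface TokenDraft {
    type: string;
    value: string | TextToken[];
    [key: string]: unknown;
}

interface TextToken extends TokenDraft {
    id: string;
}

// Hypothetical stand-in for the parser's finalization step:
// copy every field from the draft (including extras) and attach a sequential id.
function finalizeDraft(draft: TokenDraft, index: number): TextToken {
    return { ...draft, id: `rt-${index}` };
}

const linkDraft: TokenDraft = {
    type: "link",
    value: [],
    url: "https://example.com", // extra field supplied by the handler
};

const linkToken = finalizeDraft(linkDraft, 1);
```

The spread keeps url on the result, which mirrors the "extra fields are preserved" rule described above.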

Strong Typing

The base TextToken uses an index signature for flexibility, but you can — and should — define precise types for your tags.

Clarify the boundaries first

  • Public library boundary: parseRichText() returns generic TextToken[] (open for unknown extra fields)
  • App boundary: your renderer/handlers narrow tokens with a local TokenMap

Keeping these two layers separate gives you both extensibility and strict type checks.

Recommended: NarrowToken + createTokenGuard (1.1.0+)

Define a token map, then use createTokenGuard for zero-boilerplate narrowing:

import {
    type NarrowToken,
    type NarrowDraft,
    type NarrowTokenUnion,
    createTokenGuard,
    parseRichText,
    type TextToken,
} from "yume-dsl-rich-text";

// 1. Define a token map — each key is a type, each value is extra fields
interface MyTokenMap {
    text: Record<string, never>;
    bold: Record<string, never>;
    link: { url: string };
    code: { lang: string };
}

type MyToken = NarrowTokenUnion<MyTokenMap>;

// 2. Create a type guard
const is = createTokenGuard<MyTokenMap>();

const renderChildren = (value: TextToken["value"]) =>
    Array.isArray(value) ? value.map(render).join("") : value;

// 3. Narrow in if branches — TypeScript infers extra fields automatically
function render(token: TextToken): string {
    if (is(token, "text")) return typeof token.value === "string" ? token.value : renderChildren(token.value);
    if (is(token, "bold")) return `<b>${renderChildren(token.value)}</b>`;
    if (is(token, "link")) return `<a href="${token.url}">${renderChildren(token.value)}</a>`;
    if (is(token, "code")) return `<pre data-lang="${token.lang}">${renderChildren(token.value)}</pre>`;
    return "";
}

// 4. If you want a discriminated union in specific modules, cast once at the parse boundary
//    (`input` and `handlers` come from your application):
const tokens = parseRichText(input, { handlers }) as MyToken[];

Utility types:

| Type | What it does |
| --- | --- |
| NarrowToken<TType, TExtra?> | Narrow a TextToken to a specific type literal + known extra fields |
| NarrowDraft<TType, TExtra?> | Narrow a TokenDraft for handler return type annotations |
| NarrowTokenUnion<TMap> | Generate a union of NarrowToken from a token map; useful for exhaustive switch |
| createTokenGuard<TMap>() | Create a runtime type guard that narrows TextToken by type key |

Tip: for token types with no extra fields, prefer Record<string, never> over {} to avoid strict ESLint no-empty-object-type warnings.
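For readers curious how map-based narrowing can work, the sketch below shows one way a guard factory could be written. It is an assumption-laden illustration, not the library's source: SketchNarrow, SketchExtras, and makeGuardSketch are hypothetical names, TextToken is a local copy of the documented shape, and the real createTokenGuard may differ.

```typescript
// Local copy of the documented TextToken shape, so the sketch is self-contained.
interface TextToken {
    type: string;
    value: string | TextToken[];
    id: string;
    [key: string]: unknown;
}

// Hypothetical: treat "no extra fields" entries (Record<string, never>) as adding nothing.
type SketchExtras<T> = T extends Record<string, never> ? unknown : T;

// Hypothetical narrowed token: literal type plus the map's extra fields.
type SketchNarrow<K extends string, TExtra> = TextToken & { type: K } & SketchExtras<TExtra>;

// Hypothetical guard factory: compares token.type at runtime, narrows at compile time.
function makeGuardSketch<TMap extends object>() {
    return <K extends keyof TMap & string>(
        token: TextToken,
        type: K,
    ): token is SketchNarrow<K, TMap[K]> => token.type === type;
}

interface DemoMap {
    bold: Record<string, never>;
    link: { url: string };
}

const isDemo = makeGuardSketch<DemoMap>();
const sample: TextToken = { type: "link", value: [], id: "rt-0", url: "https://example.com" };

// Inside the narrowed branch, sample.url is typed as string instead of unknown.
const href: string = isDemo(sample, "link") ? sample.url : "";
```

Note that the guard only compares type at runtime; it trusts the token map for the extra fields, so the map has to match what your handlers actually return.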

Handler-side type safety with NarrowDraft:

import { type NarrowDraft, type TagHandler, parsePipeArgs } from "yume-dsl-rich-text";

type LinkDraft = NarrowDraft<"link", { url: string }>;

const linkHandler: TagHandler = {
    inline: (tokens, ctx): LinkDraft => {
        const args = parsePipeArgs(tokens, ctx);
        return {
            type: "link",
            url: args.text(0),              // ← forget this and TS reports an error
            value: args.materializedTailTokens(1),
        };
    },
};

Alternative: discriminated union (manual)

If you prefer explicit interfaces:

import { parseRichText, type TextToken } from "yume-dsl-rich-text";

// 1. Define interfaces per tag
interface PlainText extends TextToken { type: "text"; value: string; }
interface BoldToken extends TextToken { type: "bold"; value: TextToken[]; }
interface LinkToken extends TextToken { type: "link"; url: string; value: TextToken[]; }
interface CodeBlockToken extends TextToken { type: "code"; lang: string; value: string; }

// 2. Union type
type MyToken = PlainText | BoldToken | LinkToken | CodeBlockToken;

// 3. Cast once at parse boundary
const tokens = parseRichText(input, { handlers }) as MyToken[];

// 4. Exhaustive switch
function render(token: MyToken): string {
    switch (token.type) {
        case "text": return token.value;
        case "bold": return `<b>${token.value.map(t => render(t as MyToken)).join("")}</b>`;
        case "link": return `<a href="${token.url}">${token.value.map(t => render(t as MyToken)).join("")}</a>`;
        case "code": return `<pre data-lang="${token.lang}">${token.value}</pre>`;
        default: { const _: never = token; return String(_); }
    }
}

Simple cases: typeof narrowing

Don't want to define interfaces? Runtime typeof works too:

if (token.type === "link" && typeof token.url === "string") {
    console.log("Link to:", token.url);
}

Less safe (no exhaustiveness check), but fine for ad-hoc access.
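As a slightly larger ad-hoc example, the same runtime check can be applied while walking a whole tree. This is a sketch under assumptions: collectLinkUrls is a hypothetical helper, TextToken is a local copy of the documented shape, and the sample tokens are hand-written.

```typescript
// Local copy of the documented TextToken shape, so the sketch is self-contained.
interface TextToken {
    type: string;
    value: string | TextToken[];
    id: string;
    [key: string]: unknown;
}

// Hypothetical helper: gathers the url of every "link" token in a tree.
function collectLinkUrls(tokens: TextToken[], out: string[] = []): string[] {
    for (const token of tokens) {
        const url = token.url; // unknown until checked at runtime
        if (token.type === "link" && typeof url === "string") {
            out.push(url);
        }
        // Descend into children when value is a token array.
        if (Array.isArray(token.value)) collectLinkUrls(token.value, out);
    }
    return out;
}

// Hand-written sample tree; ids are illustrative only.
const sampleTree: TextToken[] = [
    { type: "text", value: "See ", id: "rt-0" },
    {
        type: "bold",
        id: "rt-1",
        value: [{ type: "link", id: "rt-2", url: "https://example.com", value: [] }],
    },
    { type: "link", id: "rt-3", url: 42, value: [] }, // non-string url is skipped
];

const urls = collectLinkUrls(sampleTree);
```

The typeof check silently skips the token whose url is not a string, which is exactly the trade-off noted above: convenient, but nothing tells you a tag was mishandled.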
