Handler Helpers - chiba233/yumeDSL GitHub Wiki

Handler Helpers


You've got a dozen tags to register (bold, link, code…) and each one needs a handler object? That's tedious. Handler helpers let you register tags in bulk — one line per tag.

This page covers "how to register quickly". For lower-level control — manual pipe splitting, unescaping, character-level scanning — see Handler Utilities.

Signature note: The createPipeHandlers callback signatures differ from the raw TagHandler interface. createPipeHandlers pre-parses pipe arguments for you, so the first parameter is PipeArgs, not raw tokens or arg. For the raw TagHandler signatures, see Writing Tag Handlers.

When to use which

                    "I need to register tag handlers"
                            │
              ┌─────────────┼──────────────┐
              ▼             ▼              ▼
        Need pipe params?  Simple wrapper?  Just declare
        Multiple forms?    (no pipe)        line-break normalization?
              │             │              │
              ▼             ▼              ▼
     createPipeHandlers   createSimple*   declareMultilineTags
     (recommended, all-in-one) Handlers   (used with blockTags)
                            │
                  ┌─────────┼─────────┐
                  ▼         ▼         ▼
              Inline     Block      Raw
              Handlers   Handlers   Handlers

Quick reference:

Helper	What it does	When to use
`createPipeHandlers`	One definition covers any combo of inline/raw/block, auto-parses pipe args	Use this for most tags
`createSimpleInlineHandlers`	Pass a name array, get bulk inline wrappers	Simple tags like `["bold", "italic", "underline"]` that don't need params
`createSimpleBlockHandlers`	Same, but block form	Simple block wrappers
`createSimpleRawHandlers`	Same, but raw form	`["code", "math"]` — content that shouldn't be recursively parsed
Empty handler objects	Declare tag names and rely on default materialization / fallback	Zero-cost declaration syntax for tags that only need implicit inline output
`declareMultilineTags`	Does not create handlers — just tells the parser which tags need line-break normalization	Only for tags that need special treatment (merges with auto-derivation)

`createPipeHandlers(definitions)`

function createPipeHandlers<const T extends Record<string, PipeHandlerDefinition>>(
    definitions: T
): { [K in keyof T]: TagHandler }

The recommended handler helper. It does one simple thing: it splits pipe arguments before your callback runs (parsePipeArgs for the inline form, parsePipeTextArgs for the raw and block forms), so the first argument you receive is an already-split PipeArgs instead of raw tokens or arg.

What it actually saves you

The two snippets below are fully equivalent. The first is a hand-written TagHandler, the second uses createPipeHandlers:

// ── Hand-written TagHandler (you call parsePipeArgs yourself) ──
const handlers = {
    link: {
        inline: (tokens, ctx) => {
            const args = parsePipeArgs(tokens, ctx);
            return { type: "link", url: args.text(0), value: args.materializedTailTokens(1) };
        },
    },
};

// ── createPipeHandlers (it calls parsePipeArgs for you) ──
const handlers = createPipeHandlers({
    link: {
        inline: (args, ctx) => ({
            type: "link", url: args.text(0), value: args.materializedTailTokens(1),
        }),
    },
});

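For the inline form shown here, the only difference is that the first callback parameter changes from tokens: TextToken[] to args: PipeArgs. For the raw and block forms, the first parameter changes from the arg string to args; content and ctx stay in the same position with the same type, and the original arg is appended as rawArg.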

`PipeHandlerDefinition` — the callback signatures you write

interface PipeHandlerDefinition {
    inline?: (args: PipeArgs, ctx?: DslContext) => TokenDraft;
    raw?: (args: PipeArgs, content: string, ctx?: DslContext, rawArg?: string) => TokenDraft;
    block?: (args: PipeArgs, content: TextToken[], ctx?: DslContext, rawArg?: string) => TokenDraft;
}

Side-by-side with the underlying TagHandler:

Form	`TagHandler` (underlying)	`PipeHandlerDefinition` (what you write)	What changed
inline	`(tokens, ctx) => TokenDraft`	`(args, ctx) => TokenDraft`	`tokens` → `args` (auto `parsePipeArgs`)
raw	`(arg, content, ctx) => …`	`(args, content, ctx, rawArg) => …`	`arg` → `args` (auto `parsePipeTextArgs`); original `arg` moved to `rawArg`
block	`(arg, content, ctx) => …`	`(args, content, ctx, rawArg) => …`	Same as raw

rawArg is the original arg string the underlying TagHandler receives (before pipe splitting). Use it only when you need the un-split original.
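To make the signature change concrete, here is a minimal sketch of how a wrapper of this kind could adapt a pipe-style raw callback to the underlying `(arg, content, ctx)` shape. It is a simplified model, not the library's implementation: `SketchArgs`, `splitPipes`, and `wrapRaw` are hypothetical names, arguments are plain strings, and escape handling is omitted.

```typescript
// Simplified model: the real PipeArgs carries token segments, not strings.
interface SketchArgs {
    text: (index: number, fallback?: string) => string;
}

// Hypothetical splitter: splits on "|" and trims. Escaped pipes are NOT handled.
function splitPipes(arg: string): SketchArgs {
    const parts = arg.split("|").map((part) => part.trim());
    return {
        text: (index, fallback = "") => parts[index] || fallback,
    };
}

type SketchDraft = { type: string; [key: string]: unknown };
type PipeRawCallback = (args: SketchArgs, content: string, ctx?: unknown, rawArg?: string) => SketchDraft;
type UnderlyingRaw = (arg: string, content: string, ctx?: unknown) => SketchDraft;

// Adapts a pipe-style callback to the underlying shape:
// the arg string is split first, then passed along untouched as rawArg.
function wrapRaw(callback: PipeRawCallback): UnderlyingRaw {
    return (arg, content, ctx) => callback(splitPipes(arg), content, ctx, arg);
}

const codeHandler = wrapRaw((args, content, _ctx, rawArg) => ({
    type: "code",
    lang: args.text(0, "text"),
    title: args.text(1),
    value: content,
    rawArg,
}));

const draft = codeHandler("ts | Example", "const x = 1;");
console.log(draft);
```

In this model the callback sees both views of the same argument: `args.text(0)` yields `ts`, while `rawArg` is still the un-split `ts | Example`.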

`PipeArgs` — the argument object you receive

interface PipeArgs {
    parts: TextToken[][];
    has: (index: number) => boolean;
    text: (index: number, fallback?: string) => string;
    materializedTokens: (index: number, fallback?: TextToken[]) => TextToken[];
    materializedTailTokens: (startIndex: number, fallback?: TextToken[]) => TextToken[];
}

Method	In a nutshell
`parts`	Raw token segments split by pipe
`has(i)`	Does segment i exist?
`text(i)`	Plain text of segment i (unescaped, trimmed)
`materializedTokens(i)`	Tokens of segment i (text unescaped, structure preserved)
`materializedTailTokens(start)`	All segments from start merged into one array — for "label content that may itself contain pipes"
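The accessors above can be pictured with a small stand-in. This is a sketch under assumptions, not the library's code: `SketchToken`, `SketchPipeArgs`, and `makeSketchPipeArgs` are hypothetical, tokens are reduced to `{ type, value }`, unescaping is skipped, and returning the fallback for a missing segment is a guess at the behaviour. The `tail` method only illustrates "all segments from start merged into one array"; whether the real helper re-inserts pipe separators is not modeled here.

```typescript
// Simplified token: the real TextToken may carry more fields.
interface SketchToken {
    type: string;
    value: string;
}

interface SketchPipeArgs {
    parts: SketchToken[][];
    has: (index: number) => boolean;
    text: (index: number, fallback?: string) => string;
    tail: (startIndex: number) => SketchToken[];
}

// Builds the accessor object from already-split segments.
function makeSketchPipeArgs(parts: SketchToken[][]): SketchPipeArgs {
    return {
        parts,
        has: (index) => index >= 0 && index < parts.length,
        text: (index, fallback = "") => {
            const segment = parts[index];
            if (segment === undefined) return fallback;
            return segment.map((token) => token.value).join("").trim();
        },
        // Concatenates every segment from startIndex onward into one flat array.
        tail: (startIndex) => ([] as SketchToken[]).concat(...parts.slice(startIndex)),
    };
}

const sketchArgs = makeSketchPipeArgs([
    [{ type: "text", value: "https://example.com " }],
    [{ type: "text", value: " click" }],
    [{ type: "text", value: " here" }],
]);

console.log(sketchArgs.has(3));          // false
console.log(sketchArgs.text(0));         // "https://example.com"
console.log(sketchArgs.text(5, "n/a"));  // "n/a"
console.log(sketchArgs.tail(1).length);  // 2
```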

Full example

import {createPipeHandlers} from "yume-dsl-rich-text";

const handlers = createPipeHandlers({
    // Inline-only: $$link(https://example.com | click here)$$
    link: {
        inline: (args, ctx) => ({
            type: "link",
            url: args.text(0),
            value: args.materializedTailTokens(1),
        }),
    },

    // Inline + block: $$info(tip)$$ or $$info(tip)*\ncontent\n*end$$
    info: {
        inline: (args, ctx) => ({
            type: "info",
            title: args.text(0),
            value: args.materializedTailTokens(1),
        }),
        block: (args, content, ctx) => ({
            type: "info",
            title: args.text(0),
            value: content,
        }),
    },

    // Raw-only: $$code(ts)%\nconst x = 1;\n%end$$
    code: {
        raw: (args, content, ctx) => ({
            type: "code",
            lang: args.text(0, "text"),
            value: content,
        }),
    },
});

`createSimpleInlineHandlers(names)`

function createSimpleInlineHandlers<const T extends readonly string[]>(
    names: T
): Record<T[number], TagHandler>

Pass a tag-name array, get bulk inline handlers. Each one auto-unescapes child tokens and wraps them as { type: tagName, value: materializedTokens }.

const handlers = createSimpleInlineHandlers(["bold", "italic", "underline"]);
// Equivalent to writing, for each name: { inline: (tokens, ctx) => ({ type: name, value: materializeTextTokens(tokens, ctx) }) }
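The name-array-to-record mapping can be sketched locally as follows. This is an illustration only: `makeSimpleInline` and `SketchInlineHandler` are hypothetical names, tokens are modeled as plain strings, and the unescaping step the real helper performs is left out.

```typescript
// Simplified handler shape: tokens are plain strings in this sketch.
type SketchInlineHandler = {
    inline: (tokens: string[]) => { type: string; value: string[] };
};

// Builds one inline wrapper per tag name; the record keys are the names themselves.
function makeSimpleInline<N extends string>(names: readonly N[]): Record<N, SketchInlineHandler> {
    const result = {} as Record<N, SketchInlineHandler>;
    names.forEach((tagName) => {
        result[tagName] = {
            // The real helper would unescape the tokens here; this sketch passes them through.
            inline: (tokens) => ({ type: tagName, value: tokens }),
        };
    });
    return result;
}

const simpleInline = makeSimpleInline(["bold", "italic", "underline"]);
console.log(simpleInline.bold.inline(["world"]));
```

Because the key type is inferred from the array literal, `simpleInline.bold` type-checks while a misspelled tag name would not.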

Standard implicit syntax: empty handler objects

When you only want to declare that certain tag names exist, the shortest form is:

const handlers = {
    bold: {},
    italic: {},
};

This means:

the tag names are registered
final output is delegated to the parser's default materialization / fallback logic
you did not explicitly define a fixed output shape
in practice, this gives you only the default inline output path

Compared with createSimpleInlineHandlers(...):

Form	Meaning
`createSimpleInlineHandlers(["bold"])`	installs an explicit `inline` handler and always returns `{ type: "bold", value: materializedTokens }`
`bold: {}`	only declares the tag name and relies on default materialization / fallback

Observed output shape:

parseRichText("$$bold(world)$$", {
    handlers: {bold: {}},
});
// => [{ type: "bold", value: [{ type: "text", value: "world" }] }]

parseRichText("$$code(js)%\nconst x = 1;\n%end$$", {
    handlers: {code: {}},
});
// => [{ type: "text", value: "$$code(js)%\nconst x = 1;\n%end$$" }]

parseRichText("$$info(note)*\nhello\n*end$$", {
    handlers: {info: {}},
});
// => [{ type: "text", value: "$$info(note)*\nhello\n*end$$" }]

So:

the empty-object form is a recommended zero-cost declaration syntax
it does not auto-enable raw or block forms
if the same tag should support inline plus raw/block, write inline + raw / block explicitly

`createSimpleBlockHandlers(names)`

function createSimpleBlockHandlers<const T extends readonly string[]>(
    names: T
): Record<T[number], TagHandler>

Bulk-generate block handlers. Each handler passes through arg and recursively-parsed content directly, without unescaping. Output: { type: tagName, arg, value: content }.

const handlers = createSimpleBlockHandlers(["info", "warning", "collapse"]);

`createSimpleRawHandlers(names)`

function createSimpleRawHandlers<const T extends readonly string[]>(
    names: T
): Record<T[number], TagHandler>

Bulk-generate raw handlers. Like block, but content is a string (not recursively parsed).

const handlers = createSimpleRawHandlers(["code", "math", "latex"]);

`declareMultilineTags(names)`

function declareMultilineTags<const T extends readonly BlockTagInput[]>(
    names: T
): BlockTagInput[]

The problem it solves

For any tag with block-level / container rendering semantics — dialogue boxes, code blocks, collapsible panels, info cards — the multiline DSL syntax introduces boundary line breaks that don't belong in the content.

Take the block form as an example:

$$speaker(Alice)*
Hello!
*end$$

Authors naturally place )* and *end$$ on their own lines. But from the parser's perspective, there is a \n immediately after )* and another \n immediately before *end$$ — so the raw content becomes "\nHello!\n" instead of "Hello!". These boundary line breaks are an artefact of multiline syntax, not intentional content.

Without normalization, every block-level tag produces extra blank lines in the rendered output — an extremely subtle and hard-to-debug visual bug.

What line-break normalization does

For declared tags, the parser strips exactly one leading and one trailing line break at the content boundaries:

Position	Raw content	After normalization
After `)*` / `)%`	`\n` or `\r\n` → stripped	Content starts at first actual line
Before `*end$$` / `%end$$`	`\n` or `\r\n` → stripped	Content ends at last actual line

Exactly one line break is stripped at each boundary. If the author deliberately included additional blank lines, only the line break touching the boundary is removed; the rest are preserved.

The offset produced by stripping is precisely fed back to the position tracker, so source-location mapping remains accurate even with trackPositions enabled.
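The stripping rule can be sketched as a small standalone function. `stripBoundaryLineBreaks` is a hypothetical name, not a library export; it removes at most one `\n` or `\r\n` from each end and reports how many characters were removed, which is the kind of offset a position tracker would need.

```typescript
// Strips at most one line break ("\r\n" or "\n") from each end of the content.
// Also returns the stripped lengths so a caller could adjust source offsets.
function stripBoundaryLineBreaks(content: string): { value: string; leading: number; trailing: number } {
    let start = 0;
    let end = content.length;

    if (content.startsWith("\r\n")) start = 2;
    else if (content.startsWith("\n")) start = 1;

    // Check the trailing break on the remainder so a lone "\n" is not stripped twice.
    const rest = content.slice(start);
    if (rest.endsWith("\r\n")) end -= 2;
    else if (rest.endsWith("\n")) end -= 1;

    return { value: content.slice(start, end), leading: start, trailing: content.length - end };
}

const normalized = stripBoundaryLineBreaks("\nHello!\n");
console.log(JSON.stringify(normalized.value)); // "Hello!"

const keepsExtra = stripBoundaryLineBreaks("\n\nHello!\n\n");
console.log(JSON.stringify(keepsExtra.value)); // "\nHello!\n"
```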

Default behaviour: auto-derivation

In most cases you don't need to call this manually. When creating the parser, it automatically scans handlers:

Handler has a raw method → that tag is normalized in raw form
Handler has a block method → that tag is normalized in block form

In other words, as long as you register multiline handlers with createSimpleBlockHandlers, createSimpleRawHandlers, createPipeHandlers, etc., normalization is already in effect.
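As a rough picture of the scan described above, the sketch below derives the normalized forms from which methods a handler object defines. `deriveNormalizedForms`, `SketchForm`, and `SketchHandler` are hypothetical names, and the parser's actual internals may differ.

```typescript
type SketchForm = "raw" | "block" | "inline";
// Only the presence of each method matters for this sketch.
type SketchHandler = { inline?: unknown; raw?: unknown; block?: unknown };

// A tag is normalized in raw form if it has a raw method, and in block form if it has a block method.
function deriveNormalizedForms(handlers: Record<string, SketchHandler>): Record<string, SketchForm[]> {
    const derived: Record<string, SketchForm[]> = {};
    Object.keys(handlers).forEach((tag) => {
        const handler = handlers[tag];
        if (!handler) return;
        const forms: SketchForm[] = [];
        if (handler.raw !== undefined) forms.push("raw");
        if (handler.block !== undefined) forms.push("block");
        if (forms.length > 0) derived[tag] = forms;
    });
    return derived;
}

const derivedForms = deriveNormalizedForms({
    bold: { inline: () => null },                     // inline only: not derived
    info: { inline: () => null, block: () => null },  // block form derived
    code: { raw: () => null },                        // raw form derived
    note: {},                                         // empty object: not derived
});
console.log(JSON.stringify(derivedForms)); // {"info":["block"],"code":["raw"]}
```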

How `blockTags` merges with auto-derivation

Auto-derivation always runs as the base. When you pass blockTags, overrides are per-tag, not global:

Tags you explicitly list in blockTags → your declaration completely replaces auto-derivation for that tag: only the forms you list are normalized, and any form you omit is disabled for that tag
Tags you don't mention → auto-derivation stays in effect, untouched

This means you only need to declare the tags that need special treatment. You never have to re-list all your raw/block tags just to add inline normalization to one. But if you do list a tag, make sure you include all the forms you want — auto-derivation won't fill in the rest.

// Only need to declare center — all other raw/block tags keep auto-derivation
blockTags: declareMultilineTags([{tag: "center", forms: ["inline"]}])
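The per-tag override rule amounts to a shallow merge keyed by tag name. The sketch below assumes both sides have already been reduced to a tag-to-forms map; `mergeBlockTags`, `FormName`, and `FormMap` are hypothetical names used for illustration only.

```typescript
type FormName = "raw" | "block" | "inline";
type FormMap = Record<string, FormName[]>;

// An explicit entry replaces the derived entry for the same tag wholesale;
// tags without an explicit entry keep their derived forms.
function mergeBlockTags(autoDerived: FormMap, explicit: FormMap): FormMap {
    return { ...autoDerived, ...explicit };
}

const mergedForms = mergeBlockTags(
    { info: ["block"], code: ["raw", "block"] },  // derived from handlers
    { code: ["raw"], center: ["inline"] },         // declared via blockTags
);
console.log(JSON.stringify(mergedForms));
// {"info":["block"],"code":["raw"],"center":["inline"]}
```

Note that `code` loses its block normalization here because the explicit entry lists only `raw`, which is why a listed tag should include every form you want.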

Rule of thumb

For tags with block-level / container rendering semantics, always make sure they appear in blockTags (whether auto-derived or manually declared). Otherwise boundary line breaks leak into the content, causing extra blank lines at render time.

When to declare manually

When auto-derivation doesn't match your needs:

Scenario	Action
Tag registered only with empty handler objects, but you know it will be used in block form	Declare manually
Tag renders as block-level but only has an inline handler — auto-derivation won't cover it	Declare with `{ tag, forms: ["inline"] }`
Tag has both raw and block handlers, but you only want normalization in raw form	Use `{ tag, forms: ["raw"] }` — overrides auto-derivation for that tag only

Usage

Pass a string: declares normalization for all three forms (raw + block + inline) — the most common approach.

blockTags: declareMultilineTags(["info", "warning", "center"])

Pass an object: use { tag, forms } to control exactly which forms get normalization.

blockTags: declareMultilineTags([
    "info",                                // string: all three forms
    {tag: "code", forms: ["raw"]},       // only raw form
    {tag: "center", forms: ["inline"]},  // only inline form
])

forms accepts:

Value	Normalization	Use case
`"raw"`	Strip leading `\n` after `)%`, trailing `\n` before `%end$$`	Multiline raw tags (`$$code(ts)%\n...\n%end$$`)
`"block"`	Strip leading `\n` after `)*`, trailing `\n` before `*end$$`	Multiline block tags (`$$info()*\n...\n*end$$`)
`"inline"`	Strip trailing `\n` immediately after inline close `$$`	Tags using inline syntax that render as block-level elements (`$$center(...)$$`)

Object form without forms defaults to ["raw", "block"] (backward compatible).
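The defaults for the two input shapes can be summarized in a few lines. `SketchBlockTagInput` is a guess at the input shape based on the usage shown above, and `resolveForms` is a hypothetical helper, not part of the library.

```typescript
type TagForm = "raw" | "block" | "inline";
// Assumed shape, inferred from the examples: a bare name or { tag, forms? }.
type SketchBlockTagInput = string | { tag: string; forms?: TagForm[] };

function resolveForms(input: SketchBlockTagInput): { tag: string; forms: TagForm[] } {
    // String form: all three forms get normalization.
    if (typeof input === "string") return { tag: input, forms: ["raw", "block", "inline"] };
    // Object form: explicit forms win; omitted forms default to raw + block.
    return { tag: input.tag, forms: input.forms ?? ["raw", "block"] };
}

console.log(resolveForms("info").forms);                                 // raw, block, inline
console.log(resolveForms({ tag: "code" }).forms);                        // raw, block
console.log(resolveForms({ tag: "center", forms: ["inline"] }).forms);   // inline
```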

Note: declareMultilineTags does not create handlers or register tags — it only controls line-break normalization policy. Register tags using the other helpers on this page or custom handlers.

Deprecated

Deprecated	Use instead
`createPipeBlockHandlers`	`createPipeHandlers` with a `block` method
`createPipeRawHandlers`	`createPipeHandlers` with a `raw` method
`createPassthroughTags`	Empty handler objects / local helper (if you intentionally want the implicit fallback)