Handler Utilities - chiba233/yumeDSL GitHub Wiki
Handler Utilities
Writing Tag Handlers | Token Structure
Lower-level utility functions for writing handlers.
Relationship with `createPipeHandlers`: `createPipeHandlers` (see Handler Helpers) internally calls `parsePipeArgs`/`parsePipeTextArgs` from this page. If you use `createPipeHandlers`, you typically don't need these functions yourself. Reach for them when `createPipeHandlers`' wrapping isn't enough, e.g. when you need to pre-process input before pipe splitting, or when you don't use pipe splitting at all.
The utilities split into two tiers: the everyday ones cover most cases; the advanced ones are the building blocks underneath.
Big picture: who calls who
```
createPipeHandlers (Handler Helpers page)
        │ internally calls
        ▼
Your handler code (or createPipeHandlers internals)
  │
  ├─ parsePipeArgs(tokens, ctx): split inline tokens by pipe
  │    └─ splitTokensByPipe: low-level, char-by-char scan, escape handling
  │
  ├─ parsePipeTextArgs(text, ctx): same, but input is a string (raw/block arg)
  │    └─ createTextToken → parsePipeArgs
  │
  ├─ parsePipeTextList(text, ctx): pipe split → plain string array
  │    └─ parsePipeTextArgs
  │
  ├─ extractText(tokens): pull plain text from a token tree
  │
  ├─ createTextToken(value, ctx): manually create a text token
  │    └─ createToken
  │
  └─ materializeTextTokens(tokens): recursively unescape text leaves
       └─ unescapeInline
            └─ readEscaped
                 └─ readEscapedSequence: lowest level, char-level escape scan
```
Quick pick
| You want to… | Use | Note |
|---|---|---|
| Split pipe params in an inline handler | `parsePipeArgs(tokens, ctx)` | `createPipeHandlers` does this for you |
| Split the arg string in a raw/block handler | `parsePipeTextArgs(arg ?? "", ctx)` | `createPipeHandlers` does this for you |
| Just get a plain string array, no tokens | `parsePipeTextList(text, ctx)` | |
| Extract plain text from a token tree (search/display) | `extractText(tokens)` | |
| Manually create a text token | `createTextToken(value, ctx)` | |
Everyday Utilities
parsePipeArgs(tokens, ctx?)
```ts
function parsePipeArgs(tokens: TextToken[], ctx?: DslContext): PipeArgs
```

Split inline tokens by the pipe divider and return a `PipeArgs` view. Use `.text(0)` for plain text, `.materializedTokens(0)` for unescaped tokens.
If you use `createPipeHandlers`, this step is already done for you: your callback receives `PipeArgs` directly. Call `parsePipeArgs` manually when you don't use `createPipeHandlers`, or when you need finer-grained control in a custom handler.
```ts
const linkHandler: TagHandler = {
  inline: (tokens, ctx) => {
    const args = parsePipeArgs(tokens, ctx);
    return {
      type: "link",
      url: args.text(0),
      value: args.materializedTailTokens(1),
    };
  },
};
```
Edge cases: empty tokens → one empty segment; no pipe → the entire array is one segment; an escaped `\|` is not a split point.
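These edge-case rules can be sketched with a minimal string-based splitter. This is a simplified model for illustration only: the real function operates on token arrays, `splitPipes` is a made-up name rather than a library export, and whitespace trimming after the pipe is omitted.

```typescript
// Simplified model of escape-aware pipe splitting, on a plain string.
// Illustrative only; the library splits token arrays, not strings.
function splitPipes(input: string): string[] {
  const segments: string[] = [];
  let current = "";
  for (let i = 0; i < input.length; i++) {
    if (input[i] === "\\" && input[i + 1] === "|") {
      current += "\\|"; // escaped pipe: kept in escaped form, not a split point
      i++;
    } else if (input[i] === "|") {
      segments.push(current); // unescaped pipe: segment boundary
      current = "";
    } else {
      current += input[i];
    }
  }
  segments.push(current); // empty input still yields one (empty) segment
  return segments;
}

splitPipes("");          // → [""] -- one empty segment
splitPipes("no pipes");  // → ["no pipes"] -- the whole input is one segment
splitPipes("a \\| b|c"); // → ["a \\| b", "c"] -- escaped pipe survives
```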
parsePipeTextArgs(text, ctx?)
```ts
function parsePipeTextArgs(text: string, ctx?: DslContext): PipeArgs
```

Same as `parsePipeArgs` but takes a string instead of a token array. Use it in raw/block handlers, since raw/block args are strings, not tokens.
`createPipeHandlers`' raw/block callbacks already call `parsePipeTextArgs` for you. The original string arg is passed through as the `rawArg` fourth parameter when you need it.
```ts
const codeHandler: TagHandler = {
  raw: (arg, content, ctx) => {
    const args = parsePipeTextArgs(arg ?? "", ctx);
    return { type: "code", lang: args.text(0, "text"), value: content };
  },
};
```
parsePipeTextList(text, ctx?)
```ts
function parsePipeTextList(text: string, ctx?: DslContext): string[]
```

Pipe split → plain string array. No tokens involved; the simplest option.

```ts
parsePipeTextList("ts | Demo | Label");
// → ["ts", "Demo", "Label"]

parsePipeTextList("a \\| b | c");
// → ["a | b", "c"] -- escaped pipe is not a split point
```
extractText(tokens?)
```ts
function extractText(tokens?: TextToken[]): string
```

Recursively concatenate all plain text from a token tree. Does not unescape (`\|` stays as-is) and does not preserve structure (a bold token wrapping "hello" contributes the same string as plain "hello").

```ts
const tokens = parseRichText("Hello $$bold(world)$$", { handlers });
extractText(tokens);
// → "Hello world"
```

Need unescaped text? Use `unescapeInline(extractText(tokens), ctx)` or `PipeArgs.text()`.
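The flattening behavior can be sketched over a simplified token shape. `extractTextSketch` and the `Tok` type below are illustrative assumptions, not the library's real `TextToken`:

```typescript
// Sketch of recursive text extraction over a simplified token shape.
// Illustrative only; the real TextToken carries more fields (id, position).
type Tok = { type: string; value?: string; children?: Tok[] };

function extractTextSketch(tokens?: Tok[]): string {
  let out = "";
  for (const t of tokens ?? []) {
    if (t.type === "text" && t.value != null) out += t.value; // plain leaf text
    if (t.children) out += extractTextSketch(t.children);     // recurse; wrapper structure is discarded
  }
  return out;
}

const tree: Tok[] = [
  { type: "text", value: "Hello " },
  { type: "bold", children: [{ type: "text", value: "world" }] },
];
extractTextSketch(tree); // → "Hello world" -- the bold wrapper itself contributes nothing
```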
createTextToken(value, ctx?)
```ts
function createTextToken(value: string, ctx?: DslContext): TextToken
```

Manually create a `{ type: "text", value, id }` token. Useful for inserting separator tokens inside a handler.

```ts
const separator = createTextToken(" | ", ctx);
```
Advanced Utilities
Building blocks underneath the everyday functions. Use when you need lower-level control.
splitTokensByPipe(tokens, ctx?)
```ts
function splitTokensByPipe(tokens: TextToken[], ctx?: DslContext): TextToken[][]
```

Raw pipe splitting without the `PipeArgs` wrapper. Scans text tokens char by char and splits on the pipe.
Behavior details:

- An escaped pipe `\|` is not a split point; it is kept in escaped form
- Whitespace after the pipe is consumed (`"a | b"` → `["a "]` + `["b"]`)
- Empty segments are possible (`"a || b"` → three segments)
- Non-text tokens are not scanned; they are appended to the current segment as-is
Most of the time parsePipeArgs is all you need.
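A sketch of these splitting rules over a simplified token array. Names and shapes here are illustrative assumptions, not the library source:

```typescript
// Sketch of the behavior above: text tokens are scanned char by char;
// non-text tokens are appended to the current segment without scanning.
type Tok = { type: string; value?: string };

function splitByPipeSketch(tokens: Tok[]): Tok[][] {
  const segments: Tok[][] = [[]];
  for (const tok of tokens) {
    if (tok.type !== "text") {
      segments[segments.length - 1].push(tok); // non-text: appended as-is
      continue;
    }
    const val = tok.value ?? "";
    let buf = "";
    for (let i = 0; i < val.length; i++) {
      if (val[i] === "\\" && val[i + 1] === "|") { buf += "\\|"; i++; continue; } // escaped: no split
      if (val[i] === "|") {
        if (buf) segments[segments.length - 1].push({ type: "text", value: buf });
        segments.push([]); // segment boundary (empty segments possible)
        buf = "";
        while (val[i + 1] === " ") i++; // whitespace after the pipe is consumed
        continue;
      }
      buf += val[i];
    }
    if (buf) segments[segments.length - 1].push({ type: "text", value: buf });
  }
  return segments;
}

splitByPipeSketch([{ type: "text", value: "a | b" }]);
// → [[{ type: "text", value: "a " }], [{ type: "text", value: "b" }]]
```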
materializeTextTokens(tokens, ctx?)
```ts
function materializeTextTokens(tokens: TextToken[], ctx?: DslContext): TextToken[]
```

Recursively unescape text leaf tokens: `\|` → `|`, `\)$$` → `)$$`, etc.

Why "materialize"? During parsing, text tokens deliberately keep escape sequences (so pipe splitting can tell real pipes from escaped ones). Materialization is the final step that resolves escapes to literal values.

Why skip non-text tokens? A raw-code token's `\n` is real JavaScript, not a DSL escape. Only processing `type === "text"` tokens is safe.
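A sketch of that rule, assuming a simplified token shape and an assumed escapable set (`|`, `(`, `)`, `\`); this is not the library source:

```typescript
// Sketch of materialization: unescape only type === "text" values,
// recurse into children, leave other token types (e.g. raw code) untouched.
type Tok = { type: string; value?: string; children?: Tok[] };

// Assumed escapable set, for illustration.
const unescapeText = (s: string) => s.replace(/\\([|()\\])/g, "$1");

function materializeSketch(tokens: Tok[]): Tok[] {
  return tokens.map((t) => ({
    ...t,
    value: t.type === "text" && t.value != null ? unescapeText(t.value) : t.value,
    children: t.children ? materializeSketch(t.children) : t.children,
  }));
}

materializeSketch([
  { type: "text", value: "a \\| b" }, // → value becomes "a | b"
  { type: "raw", value: "line\\n" },  // → unchanged: not a text token
]);
```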
unescapeInline(str, ctx?)
```ts
function unescapeInline(str: string, ctx?: DslContext | SyntaxConfig): string
```

Unescape all DSL escape sequences in a string. Scans left to right, consuming one "output unit" at a time.

```ts
unescapeInline("hello \\) world"); // → "hello ) world"
unescapeInline("a \\| b \\| c");   // → "a | b | c"
unescapeInline("path\\to\\file");  // → "path\\to\\file" (t is not escapable, kept as-is)
```
readEscapedSequence(text, i, ctx?)
```ts
function readEscapedSequence(text: string, i: number, ctx?): [string | null, number]
```

Character-level escape scanner. Checks whether position `i` starts an escape sequence.

- Found an escape → `[literal value, position after the sequence]`
- Not an escape → `[null, i]` (position unchanged; the caller advances)

```ts
readEscapedSequence("hello \\| world", 6); // → ["|", 8]
readEscapedSequence("hello \\| world", 0); // → [null, 0]
```
readEscaped(text, i, ctx?)
```ts
function readEscaped(text: string, i: number, ctx?): [string, number]
```

`readEscapedSequence` with a fallback: returns the literal value if an escape is found, otherwise returns the current character. Always returns a value, which makes it ideal for scan loops.

```ts
readEscaped("a\\|b", 0); // → ["a", 1]
readEscaped("a\\|b", 1); // → ["|", 3] -- consumed \|, output |
readEscaped("a\\|b", 3); // → ["b", 4]
```

`unescapeInline` is essentially a loop calling `readEscaped` and concatenating the results.
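That relationship can be sketched directly. The escapable set here (`|`, `(`, `)`, `\`) is an assumption for illustration, and the names are not the library's:

```typescript
// Sketch: a readEscaped-style scanner plus the loop that turns it
// into an unescape function. Assumed escapable set; illustrative only.
const ESCAPABLE = new Set(["|", "(", ")", "\\"]);

function readEscapedSketch(text: string, i: number): [string, number] {
  if (text[i] === "\\" && ESCAPABLE.has(text[i + 1] ?? "")) {
    return [text[i + 1], i + 2]; // escape found: literal value, skip both chars
  }
  return [text[i], i + 1]; // fallback: the current character
}

function unescapeSketch(text: string): string {
  let out = "";
  let i = 0;
  while (i < text.length) {
    const [val, next] = readEscapedSketch(text, i);
    out += val;
    i = next;
  }
  return out;
}

unescapeSketch("a \\| b");        // → "a | b"
unescapeSketch("path\\to\\file"); // → "path\\to\\file" -- 't' and 'f' are not escapable
```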
createToken(draft, position?, ctx?)
```ts
function createToken(draft: TokenDraft, position?: SourceSpan, ctx?: DslContext | CreateId): TextToken
```

Build a final `TextToken` from a `TokenDraft`: assigns the ID and attaches the position. All token creation flows through this function.

ID resolution order:

1. `ctx` is a function → used directly as the `CreateId`
2. `ctx` is a `DslContext` with `createId` → use it
3. Module-level `activeCreateId` (deprecated path)
4. Fallback: sequential counter `rt-0`, `rt-1`, ...

New code should pass a `DslContext`, not a bare function.
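The resolution order can be sketched as follows. Types are simplified, `resolveCreateId` is an illustrative name, and the deprecated `activeCreateId` step appears only as a comment:

```typescript
// Sketch of the ID resolution order. Simplified; not the library source.
type CreateId = () => string;
type Ctx = { createId?: CreateId };

let counter = 0;
function resolveCreateId(ctx?: Ctx | CreateId): CreateId {
  if (typeof ctx === "function") return ctx; // 1. bare function: used directly
  if (ctx?.createId) return ctx.createId;    // 2. DslContext.createId
  // 3. (deprecated) module-level activeCreateId would be checked here
  return () => `rt-${counter++}`;            // 4. fallback: sequential counter
}

// resolveCreateId()()                 yields "rt-0", "rt-1", ... over repeated calls
// resolveCreateId({ createId: myFn }) returns myFn
```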
Performance
Measured on Kunpeng 920 aarch64 / Node v24.14.0, 5000 iterations, averaged.
unescapeInline (since 1.1.0)
Rewritten to batch non-escape runs via `slice()` instead of per-character `readEscaped()` + `+=`. When no escape sequences are found, it returns the original string unchanged (a zero-allocation fast path).
| Scenario | Input size | 1.0.x | 1.1.0 | Speedup |
|---|---|---|---|---|
| No escapes (most common) | 4950 chars | 0.164 ms | 0.054 ms | 3.0x |
| Many escapes (`\(`, `\)`, escaped pipes, `\\`) | 2500 chars | 0.151 ms | 0.127 ms | 1.2x |
The "no escapes" case is the dominant path in real-world usage: most text content contains no DSL escape sequences. The 3x speedup comes from eliminating ~5000 individual `readEscaped` calls (each doing a `text.slice(i, i + 1)` single-char allocation) and replacing them with a single `return str` when no `escapeChar` is found.
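The optimization pattern (fast path plus batched slicing) can be sketched like this. The escapable set and the function name are assumptions for illustration, not the library's code:

```typescript
// Sketch of the batching pattern: jump between escape chars with indexOf,
// slice whole non-escape runs, and return the input untouched when there
// is no escape char at all (the zero-allocation fast path).
const ESCAPABLE = new Set(["|", "(", ")", "\\"]); // assumed escapable set

function unescapeBatched(str: string): string {
  let i = str.indexOf("\\");
  if (i === -1) return str; // fast path: no escape char anywhere
  let out = "";
  let runStart = 0;
  while (i !== -1) {
    if (ESCAPABLE.has(str[i + 1] ?? "")) {
      out += str.slice(runStart, i) + str[i + 1]; // batch the run, emit the literal
      runStart = i + 2;
      i = str.indexOf("\\", runStart);
    } else {
      i = str.indexOf("\\", i + 1); // lone backslash: not an escape, keep scanning
    }
  }
  return out + str.slice(runStart);
}

unescapeBatched("no escapes"); // returns the original string, zero allocations
unescapeBatched("a \\| b");    // → "a | b"
```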
splitTokensByPipe (since 1.1.0)
Rewritten to track a run start position instead of per-character `buffer += val[i]`. Slices the entire non-divider/non-escape run at once.
| Scenario | Input size | 1.0.x | 1.1.0 | Speedup |
|---|---|---|---|---|
| 50 pipe segments | 587 chars | 0.068 ms | 0.051 ms | 1.3x |
extractText
Tested `string[]` + `join("")` as an alternative to recursive `+=`. Benchmarks showed that V8's ConsString optimization makes `+=` faster for typical token tree sizes (~300 tokens). No change from 1.0.x behavior; the original implementation was kept.
What this means for your app
These utilities are called inside handler code: `unescapeInline` runs during `parsePipeArgs`, `materializeTextTokens`, and `parsePipeTextList`. If your DSL document has 1000 tags, each with pipe parameters, the 3x `unescapeInline` improvement saves ~100 ms of handler processing time on a ~50 KB document.
For most applications the difference is invisible. If you're building a real-time editor with keystroke-level re-parsing, the savings compound.