Handler Utilities - chiba233/yumeDSL GitHub Wiki
Handler Utilities
Writing Tag Handlers | Token Structure
Lower-level utility functions for writing handlers.
Relationship with `createPipeHandlers`: `createPipeHandlers` (see Handler Helpers) internally calls `parsePipeArgs`/`parsePipeTextArgs` from this page. If you use `createPipeHandlers`, you typically don't need these functions yourself. Reach for them when `createPipeHandlers`' wrapping isn't enough, e.g. when you need to pre-process input before pipe splitting, or when you don't use pipe splitting at all.
The utilities split into two tiers: the everyday ones cover most cases; the advanced ones are the building blocks underneath.
Big picture: who calls who
```
createPipeHandlers (Handler Helpers page)
        │ internally calls
        ▼
Your handler code (or createPipeHandlers internals)
  │
  ├─ parsePipeArgs(tokens, ctx): split inline tokens by pipe
  │    └─ splitTokensByPipe: low-level, char-by-char scan, escape handling
  │
  ├─ parsePipeTextArgs(text, ctx): same, but input is a string (raw/block arg)
  │    └─ createTextToken → parsePipeArgs
  │
  ├─ parsePipeTextList(text, ctx): pipe split → plain string array
  │    └─ parsePipeTextArgs
  │
  ├─ extractText(tokens): pull plain text from a token tree
  │
  ├─ createTextToken(value, ctx): manually create a text token
  │    └─ createToken
  │
  └─ materializeTextTokens(tokens): recursively unescape text leaves
       └─ unescapeInline
            └─ readEscaped
                 └─ readEscapedSequence: lowest level, char-level escape scan
```
Quick pick
| You want to… | Use | Note |
|---|---|---|
| Split pipe params in an inline handler | `parsePipeArgs(tokens, ctx)` | `createPipeHandlers` does this for you |
| Split the arg string in a raw/block handler | `parsePipeTextArgs(arg ?? "", ctx)` | `createPipeHandlers` does this for you |
| Just get a plain string array, no tokens | `parsePipeTextList(text, ctx)` | |
| Extract plain text from a token tree (search/display) | `extractText(tokens)` | |
| Manually create a text token | `createTextToken(value, ctx)` | |
Everyday Utilities
parsePipeArgs(tokens, ctx?)
```ts
function parsePipeArgs(tokens: TextToken[], ctx?: DslContext): PipeArgs
```

Split inline tokens by the pipe divider and return a `PipeArgs` view. Use `.text(0)` for plain text, `.materializedTokens(0)` for unescaped tokens.
If you use `createPipeHandlers`, this step is already done for you: your callback receives `PipeArgs` directly. Call `parsePipeArgs` manually when you don't use `createPipeHandlers`, or when you need finer-grained control in a custom handler.
```ts
const linkHandler: TagHandler = {
  inline: (tokens, ctx) => {
    const args = parsePipeArgs(tokens, ctx);
    return {
      type: "link",
      url: args.text(0),
      value: args.materializedTailTokens(1),
    };
  },
};
```
Edge cases: empty tokens → one empty segment; no pipe → the entire array is one segment; an escaped `\|` is not a split point.
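These edge-case rules can be sketched with a minimal string-based splitter. This is a simplified model for illustration only: the real function operates on token arrays, `splitPipes` is a made-up name rather than a library export, and whitespace trimming after the pipe is omitted.

```typescript
// Simplified model of escape-aware pipe splitting, on a plain string.
// Illustrative only; the library splits token arrays, not strings.
function splitPipes(input: string): string[] {
  const segments: string[] = [];
  let current = "";
  for (let i = 0; i < input.length; i++) {
    if (input[i] === "\\" && input[i + 1] === "|") {
      current += "\\|"; // escaped pipe: kept in escaped form, not a split point
      i++;
    } else if (input[i] === "|") {
      segments.push(current); // unescaped pipe: segment boundary
      current = "";
    } else {
      current += input[i];
    }
  }
  segments.push(current); // empty input still yields one (empty) segment
  return segments;
}

splitPipes("");          // → [""] -- one empty segment
splitPipes("no pipes");  // → ["no pipes"] -- the whole input is one segment
splitPipes("a \\| b|c"); // → ["a \\| b", "c"] -- escaped pipe survives
```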
parsePipeTextArgs(text, ctx?)
```ts
function parsePipeTextArgs(text: string, ctx?: DslContext): PipeArgs
```

Same as `parsePipeArgs` but takes a string instead of a token array. Use it in raw/block handlers, since raw/block args are strings, not tokens.
`createPipeHandlers`' raw/block callbacks already call `parsePipeTextArgs` for you. The original string arg is passed through as the `rawArg` fourth parameter when you need it.
```ts
const codeHandler: TagHandler = {
  raw: (arg, content, ctx) => {
    const args = parsePipeTextArgs(arg ?? "", ctx);
    return { type: "code", lang: args.text(0, "text"), value: content };
  },
};
```
parsePipeTextList(text, ctx?)
```ts
function parsePipeTextList(text: string, ctx?: DslContext): string[]
```

Pipe split → plain string array. No tokens involved; the simplest option.

```ts
parsePipeTextList("ts | Demo | Label");
// → ["ts", "Demo", "Label"]

parsePipeTextList("a \\| b | c");
// → ["a | b", "c"] -- escaped pipe is not a split point
```
extractText(tokens?)
```ts
function extractText(tokens?: TextToken[]): string
```

Recursively concatenate all plain text from a token tree. Does not unescape (`\|` stays as-is) and does not preserve structure (a bold token wrapping "hello" contributes the same string as plain "hello").

```ts
const tokens = parseRichText("Hello $$bold(world)$$", { handlers });
extractText(tokens);
// → "Hello world"
```

Need unescaped text? Use `unescapeInline(extractText(tokens), ctx)` or `PipeArgs.text()`.
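The flattening behavior can be sketched over a simplified token shape. `extractTextSketch` and the `Tok` type below are illustrative assumptions, not the library's real `TextToken`:

```typescript
// Sketch of recursive text extraction over a simplified token shape.
// Illustrative only; the real TextToken carries more fields (id, position).
type Tok = { type: string; value?: string; children?: Tok[] };

function extractTextSketch(tokens?: Tok[]): string {
  let out = "";
  for (const t of tokens ?? []) {
    if (t.type === "text" && t.value != null) out += t.value; // plain leaf text
    if (t.children) out += extractTextSketch(t.children);     // recurse; wrapper structure is discarded
  }
  return out;
}

const tree: Tok[] = [
  { type: "text", value: "Hello " },
  { type: "bold", children: [{ type: "text", value: "world" }] },
];
extractTextSketch(tree); // → "Hello world" -- the bold wrapper itself contributes nothing
```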
createTextToken(value, ctx?)
```ts
function createTextToken(value: string, ctx?: DslContext): TextToken
```

Manually create a `{ type: "text", value, id }` token. Useful for inserting separator tokens inside a handler.

```ts
const separator = createTextToken(" | ", ctx);
```
Advanced Utilities
Building blocks underneath the everyday functions. Use when you need lower-level control.
splitTokensByPipe(tokens, ctx?)
```ts
function splitTokensByPipe(tokens: TextToken[], ctx?: DslContext): TextToken[][]
```

Raw pipe splitting without the `PipeArgs` wrapper. Scans text tokens char by char and splits on the pipe.
Behavior details:

- An escaped pipe `\|` is not a split point; it is kept in escaped form
- Whitespace after the pipe is consumed (`"a | b"` → `["a "]` + `["b"]`)
- Empty segments are possible (`"a || b"` → three segments)
- Non-text tokens are not scanned; they are appended to the current segment as-is
Most of the time parsePipeArgs is all you need.
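A sketch of these splitting rules over a simplified token array. Names and shapes here are illustrative assumptions, not the library source:

```typescript
// Sketch of the behavior above: text tokens are scanned char by char;
// non-text tokens are appended to the current segment without scanning.
type Tok = { type: string; value?: string };

function splitByPipeSketch(tokens: Tok[]): Tok[][] {
  const segments: Tok[][] = [[]];
  for (const tok of tokens) {
    if (tok.type !== "text") {
      segments[segments.length - 1].push(tok); // non-text: appended as-is
      continue;
    }
    const val = tok.value ?? "";
    let buf = "";
    for (let i = 0; i < val.length; i++) {
      if (val[i] === "\\" && val[i + 1] === "|") { buf += "\\|"; i++; continue; } // escaped: no split
      if (val[i] === "|") {
        if (buf) segments[segments.length - 1].push({ type: "text", value: buf });
        segments.push([]); // segment boundary (empty segments possible)
        buf = "";
        while (val[i + 1] === " ") i++; // whitespace after the pipe is consumed
        continue;
      }
      buf += val[i];
    }
    if (buf) segments[segments.length - 1].push({ type: "text", value: buf });
  }
  return segments;
}

splitByPipeSketch([{ type: "text", value: "a | b" }]);
// → [[{ type: "text", value: "a " }], [{ type: "text", value: "b" }]]
```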
materializeTextTokens(tokens, ctx?)
```ts
function materializeTextTokens(tokens: TextToken[], ctx?: DslContext): TextToken[]
```

Recursively unescape text leaf tokens: `\|` → `|`, `\)$$` → `)$$`, etc.

Why "materialize"? During parsing, text tokens deliberately keep escape sequences (so pipe splitting can tell real pipes from escaped ones). Materialization is the final step that resolves escapes to literal values.

Why skip non-text tokens? A raw-code token's `\n` is real JavaScript, not a DSL escape. Only processing `type === "text"` tokens is safe.
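A sketch of that rule, assuming a simplified token shape and an assumed escapable set (`|`, `(`, `)`, `\`); this is not the library source:

```typescript
// Sketch of materialization: unescape only type === "text" values,
// recurse into children, leave other token types (e.g. raw code) untouched.
type Tok = { type: string; value?: string; children?: Tok[] };

// Assumed escapable set, for illustration.
const unescapeText = (s: string) => s.replace(/\\([|()\\])/g, "$1");

function materializeSketch(tokens: Tok[]): Tok[] {
  return tokens.map((t) => ({
    ...t,
    value: t.type === "text" && t.value != null ? unescapeText(t.value) : t.value,
    children: t.children ? materializeSketch(t.children) : t.children,
  }));
}

materializeSketch([
  { type: "text", value: "a \\| b" }, // → value becomes "a | b"
  { type: "raw", value: "line\\n" },  // → unchanged: not a text token
]);
```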
unescapeInline(str, ctx?)
```ts
function unescapeInline(str: string, ctx?: DslContext | SyntaxConfig): string
```

Unescape all DSL escape sequences in a string. Scans left to right, consuming one "output unit" at a time.

```ts
unescapeInline("hello \\) world"); // → "hello ) world"
unescapeInline("a \\| b \\| c");   // → "a | b | c"
unescapeInline("path\\to\\file");  // → "path\\to\\file" (t is not escapable, kept as-is)
```
readEscapedSequence(text, i, ctx?)
```ts
function readEscapedSequence(text: string, i: number, ctx?): [string | null, number]
```

Character-level escape scanner. Checks whether position `i` starts an escape sequence.

- Found an escape → `[literal value, position after the sequence]`
- Not an escape → `[null, i]` (position unchanged; the caller advances)

```ts
readEscapedSequence("hello \\| world", 6); // → ["|", 8]
readEscapedSequence("hello \\| world", 0); // → [null, 0]
```
readEscaped(text, i, ctx?)
```ts
function readEscaped(text: string, i: number, ctx?): [string, number]
```

`readEscapedSequence` with a fallback: returns the literal value if an escape is found, otherwise returns the current character. Always returns a value, which makes it ideal for scan loops.

```ts
readEscaped("a\\|b", 0); // → ["a", 1]
readEscaped("a\\|b", 1); // → ["|", 3] -- consumed \|, output |
readEscaped("a\\|b", 3); // → ["b", 4]
```

`unescapeInline` is essentially a loop calling `readEscaped` and concatenating the results.
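That relationship can be sketched directly. The escapable set here (`|`, `(`, `)`, `\`) is an assumption for illustration, and the names are not the library's:

```typescript
// Sketch: a readEscaped-style scanner plus the loop that turns it
// into an unescape function. Assumed escapable set; illustrative only.
const ESCAPABLE = new Set(["|", "(", ")", "\\"]);

function readEscapedSketch(text: string, i: number): [string, number] {
  if (text[i] === "\\" && ESCAPABLE.has(text[i + 1] ?? "")) {
    return [text[i + 1], i + 2]; // escape found: literal value, skip both chars
  }
  return [text[i], i + 1]; // fallback: the current character
}

function unescapeSketch(text: string): string {
  let out = "";
  let i = 0;
  while (i < text.length) {
    const [val, next] = readEscapedSketch(text, i);
    out += val;
    i = next;
  }
  return out;
}

unescapeSketch("a \\| b");        // → "a | b"
unescapeSketch("path\\to\\file"); // → "path\\to\\file" -- 't' and 'f' are not escapable
```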
createToken(draft, position?, ctx?)
```ts
function createToken(draft: TokenDraft, position?: SourceSpan, ctx?: DslContext | CreateId): TextToken
```

Build a final `TextToken` from a `TokenDraft`: assigns the ID and attaches the position. All token creation flows through this function.

ID resolution order:

1. `ctx` is a function → used directly as the `CreateId`
2. `ctx` is a `DslContext` with `createId` → use it
3. Module-level `activeCreateId` (deprecated path)
4. Fallback: sequential counter `rt-0`, `rt-1`, ...

New code should pass a `DslContext`, not a bare function.
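The resolution order can be sketched as follows. Types are simplified, `resolveCreateId` is an illustrative name, and the deprecated `activeCreateId` step appears only as a comment:

```typescript
// Sketch of the ID resolution order. Simplified; not the library source.
type CreateId = () => string;
type Ctx = { createId?: CreateId };

let counter = 0;
function resolveCreateId(ctx?: Ctx | CreateId): CreateId {
  if (typeof ctx === "function") return ctx; // 1. bare function: used directly
  if (ctx?.createId) return ctx.createId;    // 2. DslContext.createId
  // 3. (deprecated) module-level activeCreateId would be checked here
  return () => `rt-${counter++}`;            // 4. fallback: sequential counter
}

// resolveCreateId()()                 yields "rt-0", "rt-1", ... over repeated calls
// resolveCreateId({ createId: myFn }) returns myFn
```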
Performance
Measured on Kunpeng 920 aarch64 / Node v24.14.0, 5000 iterations, averaged.
unescapeInline (since 1.1.0)
Rewritten to batch non-escape runs via `slice()` instead of per-character `readEscaped()` + `+=`. When no escape sequences are found, it returns the original string unchanged (a zero-allocation fast path).
| Scenario | Input size | 1.0.x | 1.1.0 | Speedup |
|---|---|---|---|---|
| No escapes (most common) | 4950 chars | 0.164 ms | 0.054 ms | 3.0x |
| Many escapes (`\(`, `\)`, escaped pipes, `\\`) | 2500 chars | 0.151 ms | 0.127 ms | 1.2x |
The "no escapes" case is the dominant path in real-world usage: most text content contains no DSL escape sequences. The 3x speedup comes from eliminating ~5000 individual `readEscaped` calls (each doing a `text.slice(i, i + 1)` single-char allocation) and replacing them with a single `return str` when no `escapeChar` is found.
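The optimization pattern (fast path plus batched slicing) can be sketched like this. The escapable set and the function name are assumptions for illustration, not the library's code:

```typescript
// Sketch of the batching pattern: jump between escape chars with indexOf,
// slice whole non-escape runs, and return the input untouched when there
// is no escape char at all (the zero-allocation fast path).
const ESCAPABLE = new Set(["|", "(", ")", "\\"]); // assumed escapable set

function unescapeBatched(str: string): string {
  let i = str.indexOf("\\");
  if (i === -1) return str; // fast path: no escape char anywhere
  let out = "";
  let runStart = 0;
  while (i !== -1) {
    if (ESCAPABLE.has(str[i + 1] ?? "")) {
      out += str.slice(runStart, i) + str[i + 1]; // batch the run, emit the literal
      runStart = i + 2;
      i = str.indexOf("\\", runStart);
    } else {
      i = str.indexOf("\\", i + 1); // lone backslash: not an escape, keep scanning
    }
  }
  return out + str.slice(runStart);
}

unescapeBatched("no escapes"); // returns the original string, zero allocations
unescapeBatched("a \\| b");    // → "a | b"
```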
splitTokensByPipe (since 1.1.0)
Rewritten to track a run start position instead of per-character `buffer += val[i]`. Slices the entire non-divider/non-escape run at once.
| Scenario | Input size | 1.0.x | 1.1.0 | Speedup |
|---|---|---|---|---|
| 50 pipe segments | 587 chars | 0.068 ms | 0.051 ms | 1.3x |
extractText
Tested `string[]` + `join("")` as an alternative to recursive `+=`. Benchmarks showed that V8's ConsString optimization makes `+=` faster for typical token tree sizes (~300 tokens). No change from 1.0.x behavior; the original implementation was kept.
What this means for your app
These utilities are called inside handler code: `unescapeInline` runs during `parsePipeArgs`, `materializeTextTokens`, and `parsePipeTextList`. If your DSL document has 1000 tags, each with pipe parameters, the 3x `unescapeInline` improvement saves ~100 ms of handler processing time on a ~50 KB document.
For most applications the difference is invisible. If you're building a real-time editor with keystroke-level re-parsing, the savings compound.