API Reference - chiba233/yumeDSL GitHub Wiki
API Reference
Four core functions, four jobs:
                         DSL source text
                               │
            ┌──────────────────┼──────────────────┐
            ▼                  ▼                  ▼
      parseRichText      stripRichText      parseStructural
      → TextToken[]      → plain string     → StructuralNode[]
      (for rendering)    (search/preview)   (highlight/lint/editor)
                                                  │
                                                  ▼
                                            printStructural
                                            → DSL source (lossless round-trip)
Recommended: use createParser to bind config once, then call .parse() / .strip() / .structural() /
.print() everywhere.
createParser(defaults) -- recommended entry point
createParser binds your ParseOptions into a reusable instance. This is the recommended way to use the parser --
define your tag handlers once, then call dsl.parse() / dsl.strip() / dsl.structural() / dsl.print() everywhere
without repeating config.
import {
createParser,
createSimpleInlineHandlers,
parsePipeArgs,
} from "yume-dsl-rich-text";
const dsl = createParser({
handlers: {
...createSimpleInlineHandlers(["bold", "italic", "underline"]),
link: {
inline: (tokens, ctx) => {
const args = parsePipeArgs(tokens, ctx);
return {
type: "link",
url: args.text(0),
value: args.materializedTailTokens(1),
};
},
},
},
});
// Use everywhere -- handlers are already bound
dsl.parse("Hello $$bold(world)$$!");
dsl.strip("Hello $$bold(world)$$!");
Parser interface
interface Parser {
parse: (text: string, overrides?: ParseOptions) => TextToken[];
strip: (text: string, overrides?: ParseOptions) => string;
structural: (text: string, overrides?: StructuralParseOptions) => StructuralNode[];
print: (nodes: StructuralNode[], overrides?: PrintOptions) => string;
}
What createParser binds
Most of the time you only need to bind handlers. The rest tags along for convenience.
| Option | What it does when pre-bound |
|---|---|
| `handlers` | Your tag definitions -- the main reason to use createParser |
| `syntax` | Custom syntax tokens (if you override $$ prefix, etc.) |
| `tagName` | Custom tag-name character rules |
| `allowForms` | Restrict accepted tag forms (default: all forms enabled) |
| `implicitInlineShorthand` | Enable tag(...) shorthand for inline tags (since 1.3) |
| `depthLimit` | Nesting limit -- rarely changes per call |
| `createId` | Custom token id generator (can be overridden per call) |
| `blockTags` | Tags that receive block-level line-break normalization |
| `onError` | Default error handler (can still be overridden per call) |
| `trackPositions` | Attach source positions to all output nodes (can be overridden per call) |
Methods
| Method | Input | Output | Inherits from defaults |
|---|---|---|---|
| `parse` | DSL text + overrides? | `TextToken[]` | All ParseOptions -- overrides merge one level deep for syntax/tagName |
| `strip` | DSL text + overrides? | `string` | Same as parse |
| `structural` | DSL text + overrides? | `StructuralNode[]` | handlers, allowForms, implicitInlineShorthand, syntax, tagName, depthLimit, trackPositions |
| `print` | StructuralNode[] + overrides? | `string` | syntax only -- overrides merge with defaults. Lossless serializer, no gating |
Per-call override merging (since 1.0.11)
Per-call overrides are shallow-merged onto defaults, but syntax and tagName additionally merge one level deep so
that partial overrides keep the rest of the defaults:
const dsl = createParser({
handlers,
syntax: {tagPrefix: "@@"},
});
// This override merges into the existing syntax -- tagPrefix stays "@@"
dsl.parse(text, {syntax: {tagDivider: ";"}});
// Effective syntax: { tagPrefix: "@@", tagDivider: ";", ...rest from DEFAULT_SYNTAX }
Internally, createParser performs the merge like this:
const merge = <T extends ParserBaseOptions>(overrides: T): ParseOptions & T => {
const merged = {...defaults, ...overrides};
if (defaults.syntax && overrides.syntax) {
merged.syntax = {...defaults.syntax, ...overrides.syntax};
}
if (defaults.tagName && overrides.tagName) {
merged.tagName = {...defaults.tagName, ...overrides.tagName};
}
return merged;
};
Note: one-level-deep merging only happens when both defaults and overrides contain the field. If only the
override has syntax, it is used as-is (no merge with defaults).
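The note above is easy to trip over, so here is a minimal, self-contained sketch of the same merge rule. The `Syntax` and `Options` types and the `mergeOptions` name are simplified stand-ins for illustration, not the library's actual types.

```typescript
// Simplified stand-ins for the real option types (assumption: only two syntax fields shown).
interface Syntax {
  tagPrefix?: string;
  tagDivider?: string;
}

interface Options {
  syntax?: Syntax;
  depthLimit?: number;
}

// Shallow merge, plus a one-level-deep merge of `syntax` when BOTH sides define it.
function mergeOptions(defaults: Options, overrides: Options): Options {
  const merged: Options = { ...defaults, ...overrides };
  if (defaults.syntax && overrides.syntax) {
    merged.syntax = { ...defaults.syntax, ...overrides.syntax };
  }
  return merged;
}

// Both sides have `syntax`: the default tagPrefix survives the partial override.
// both.syntax is { tagPrefix: "@@", tagDivider: ";" }
const both = mergeOptions(
  { syntax: { tagPrefix: "@@" } },
  { syntax: { tagDivider: ";" } },
);

// Only the override has `syntax`: it is used as-is, nothing is merged in.
// overrideOnly.syntax is { tagDivider: ";" }
const overrideOnly = mergeOptions({ depthLimit: 10 }, { syntax: { tagDivider: ";" } });
```

In the real parser, fields still missing after this merge fall back to DEFAULT_SYNTAX, as the "Effective syntax" comment in the earlier example indicates.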
With vs without createParser
// Without createParser -- repetitive, must pass handlers everywhere
parseRichText(text1, {handlers});
parseRichText(text2, {handlers});
stripRichText(text3, {handlers});
parseStructural(text4, {handlers});
// With createParser -- bind once, use everywhere
const dsl = createParser({handlers});
dsl.parse(text1);
dsl.parse(text2);
dsl.strip(text3);
dsl.structural(text4);
dsl.print(tree);
parseRichText(text, options?)
function parseRichText(text: string, options?: ParseOptions): TextToken[];
The core parse function. Takes DSL source text and returns a TextToken[] tree. Unregistered or malformed tags degrade
to plain text -- never throws.
Parameters
| Parameter | Type | Description |
|---|---|---|
| `text` | `string` | DSL source text. If empty, returns [] immediately. |
| `options` | `ParseOptions` | Optional. Tag handlers, syntax config, error callback, and other settings. |
ParseOptions fields
| Field | Type | Default | Description |
|---|---|---|---|
| `handlers` | `Record<string, TagHandler>` | `{}` | Tag handler map. Keys are tag names, values define how each tag is parsed. |
| `allowForms` | `readonly TagForm[]` | `["inline","raw","block"]` | Restrict accepted tag forms. Unlisted forms degrade gracefully. |
| `depthLimit` | `number` | `50` | Maximum nesting depth. Tags beyond this limit degrade to plain text. |
| `syntax` | `Partial<SyntaxInput>` | `DEFAULT_SYNTAX` | Override DSL syntax tokens. |
| `tagName` | `Partial<TagNameConfig>` | `DEFAULT_TAG_NAME` | Override tag-name character rules. |
| `createId` | `CreateId` | `() => "rt-${seed++}"` | Token id generator. Defaults to a parse-local counter: rt-0, rt-1, ... |
| `blockTags` | `readonly BlockTagInput[]` | (derived from handlers) | Tags that receive block-level line-break normalization. Defaults to every tag with a raw/block handler. Each entry is either a plain tag name (both raw and block forms) or `{ tag, forms }` to restrict to specific multiline forms. |
| `onError` | `(error: ParseError) => void` | (silent) | Called for every parse error. If omitted, errors are silently discarded. |
| `trackPositions` | `boolean` | `false` | Attach position: SourceSpan to every TextToken. |
| `baseOffset` | `number` | `0` | Base offset added to all source positions. Use when parsing a substring from a larger document. |
| `tracker` | `PositionTracker` | (none) | Pre-built position tracker from the original full document. Build with buildPositionTracker(text). |
| `mode` | `"render"` | `"render"` | Parse mode. Currently only "render" is supported. |
| `implicitInlineShorthand` | `boolean \| readonly string[]` | `false` | Enable tag(...) shorthand for inline tags. true = all handlers, string[] = allowlist. Since 1.3. See ParseOptions. |
See ParseOptions for the full deep-dive.
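To illustrate what "parse-local counter" means for the default `createId`, here is a small sketch. The `IdGenerator` type and the `createCounterId` factory are hypothetical names for this example; the library's `CreateId` type may take arguments that are not shown here.

```typescript
// Simplified id-generator type (assumption: the library's CreateId may receive arguments).
type IdGenerator = () => string;

// Returns a fresh generator whose counter lives in its own closure,
// so two generators never share or leak state.
function createCounterId(prefix: string = "rt"): IdGenerator {
  let seed = 0;
  return () => `${prefix}-${seed++}`;
}

// Each "parse" gets its own generator, so numbering restarts at 0 every time.
const firstParse = createCounterId();
const secondParse = createCounterId();

const idsA = [firstParse(), firstParse(), firstParse()]; // "rt-0", "rt-1", "rt-2"
const idsB = [secondParse()];                             // "rt-0" again
```

Because the default ids restart for every parse, they are sequential within one result but not unique across results; supply your own `createId` when you need ids that stay distinct between calls.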
Return value
TextToken[] -- an array of token objects:
interface TextToken {
type: string; // "text" for plain text, or the handler-defined type
value: string | TextToken[]; // plain string or nested children
id: string; // sequential id (default) or custom via createId
position?: SourceSpan; // present when trackPositions is true
[key: string]: unknown; // handler-defined extra fields (url, lang, etc.)
}
When to use parseRichText vs createParser
Use createParser for application code where you reuse the same handler set. Use parseRichText directly for:
- One-off utility scripts
- When you need full per-call control over every option
- Testing and prototyping
Edge cases
- Empty string: returns [] immediately (no allocation, no side effects).
- No handlers: all tag-like syntax degrades to plain text.
- Unclosed tags: degrade to plain text with an onError callback if provided (error codes: INLINE_NOT_CLOSED, SHORTHAND_NOT_CLOSED, BLOCK_NOT_CLOSED, RAW_NOT_CLOSED). Surrounding content is never corrupted.
- Malformed close markers: reported via BLOCK_CLOSE_MALFORMED or RAW_CLOSE_MALFORMED and degrade gracefully.
- Unexpected close: a close marker without a matching open is reported via UNEXPECTED_CLOSE and treated as plain text.
- Depth limit exceeded: the offending tag degrades to plain text with error code DEPTH_LIMIT.
Example
import {parseRichText, createSimpleInlineHandlers} from "yume-dsl-rich-text";
const tokens = parseRichText("Hello $$bold(world)$$!", {
handlers: createSimpleInlineHandlers(["bold"]),
});
// [
// { type: "text", value: "Hello ", id: "rt-0" },
// { type: "bold", value: [{ type: "text", value: "world", id: "rt-1" }], id: "rt-2" },
// { type: "text", value: "!", id: "rt-3" },
// ]
stripRichText(text, options?)
function stripRichText(text: string, options?: ParseOptions): string;
Parses DSL text and extracts only the plain text content, discarding all tag structure.
Parameters
Identical to parseRichText. Accepts the same ParseOptions.
Return value
A plain string with all tag markup removed. Only the text content of tokens remains.
Implementation detail
Internally calls parseRichText(text, options) then extractText(tokens). The cost is identical to parseRichText --
there is no cheaper "strip-only" code path. If you need both tokens and plain text, call parseRichText once and then
extractText on the result to avoid parsing twice.
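The "parse once, then derive the text" advice can be sketched without the library. The `Token` type below mirrors the documented TextToken shape, and `collectText` is a hypothetical stand-in that assumes extractText simply concatenates string values depth-first; the real helper may differ in details.

```typescript
// Minimal token shape, mirroring the TextToken interface documented above.
interface Token {
  type: string;
  value: string | Token[];
  id: string;
}

// Hypothetical stand-in for extractText: concatenates string values depth-first.
function collectText(tokens: Token[]): string {
  let out = "";
  for (const token of tokens) {
    out += typeof token.value === "string" ? token.value : collectText(token.value);
  }
  return out;
}

// Hand-written tokens equivalent to the documented result of
// parsing "Hello $$bold(world)$$!" with a bold handler.
const tokens: Token[] = [
  { type: "text", value: "Hello ", id: "rt-0" },
  { type: "bold", value: [{ type: "text", value: "world", id: "rt-1" }], id: "rt-2" },
  { type: "text", value: "!", id: "rt-3" },
];

// One parse result serves both purposes: render `tokens`, index `plain`.
const plain = collectText(tokens); // "Hello world!"
```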
Examples
import {stripRichText, createSimpleInlineHandlers} from "yume-dsl-rich-text";
stripRichText("Hello $$bold(world)$$!", {
handlers: createSimpleInlineHandlers(["bold"]),
});
// "Hello world!"
stripRichText("");
// ""
Edge cases
- Empty string: returns "" immediately.
- Degradation behavior is inherited from parseRichText, which affects strip output:
  - Unregistered inline tags: strip outputs the inner content (delimiters stripped)
  - Unregistered raw/block tags: strip outputs the entire raw markup (including delimiters)
  - Unsupported forms or allowForms restrictions: strip outputs the entire raw markup
Use cases
- Extracting searchable plain text for indexing
- Generating text previews or summaries
- Building accessibility labels (e.g. aria-label)
- Character/word counting that ignores markup
parseStructural(text, options?) -- structural parse
function parseStructural(text: string, options?: StructuralParseOptions): StructuralNode[];
For structural consumers -- highlighting, linting, editors, source inspection, or any scenario where you need to know which tag form was used, not just the semantic result. Preserves the tag form (inline / raw / block) explicitly in the output tree.
It shares the same language configuration (handlers, allowForms, syntax, tagName, depthLimit,
trackPositions) as parseRichText, so you do not maintain two separate sets of DSL rules.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `text` | `string` | -- | DSL source. Returns [] for empty string. |
| `options.handlers` | `Record<string, TagHandler>` | (none) | Tag recognition and form gating. Omit to accept all syntactically valid tags/forms without semantic gating. |
| `options.allowForms` | `readonly TagForm[]` | (all forms) | Restrict accepted forms (requires handlers). |
| `options.depthLimit` | `number` | `50` | Max nesting depth. |
| `options.syntax` | `Partial<SyntaxInput>` | `DEFAULT_SYNTAX` | Override syntax tokens. |
| `options.tagName` | `Partial<TagNameConfig>` | `DEFAULT_TAG_NAME` | Override tag-name character rules. |
| `options.trackPositions` | `boolean` | `false` | Attach position: SourceSpan to every StructuralNode. |
| `options.baseOffset` | `number` | `0` | Base offset for position tracking. |
| `options.tracker` | `PositionTracker` | (none) | Pre-built position tracker from the original document. |
| `options.implicitInlineShorthand` | `boolean \| readonly string[]` | `false` | Enable tag(...) shorthand for inline tags. true = all handlers, string[] = allowlist. Since 1.3. |
StructuralParseOptions
interface StructuralParseOptions extends ParserBaseOptions {
trackPositions?: boolean;
}
interface ParserBaseOptions {
handlers?: Record<string, TagHandler>;
allowForms?: readonly TagForm[];
implicitInlineShorthand?: InlineShorthandOption;
depthLimit?: number;
syntax?: Partial<SyntaxInput>;
tagName?: Partial<TagNameConfig>;
baseOffset?: number;
tracker?: PositionTracker;
}
Note: StructuralParseOptions does not include createId, blockTags, mode, or onError -- those are
ParseOptions-only fields for the render pipeline.
StructuralNode variants
| Type | Fields | Description |
|---|---|---|
| `text` | `value: string` | Plain text |
| `escape` | `raw: string` | Escape sequence (e.g. `\)`) |
| `separator` | -- | Pipe `\|` |
| `inline` | `tag: string, children: StructuralNode[], implicitInlineShorthand?: boolean` | `$$tag(...)$$` or `tag(...)` shorthand |
| `raw` | `tag: string, args: StructuralNode[], content: string` | `$$tag(...)% ... %end$$` |
| `block` | `tag: string, args: StructuralNode[], children: StructuralNode[]` | `$$tag(...)* ... *end$$` |
All variants carry an optional position?: SourceSpan when trackPositions is enabled.
Example
import {parseStructural} from "yume-dsl-rich-text";
const tree = parseStructural("$$bold(hello)$$ and $$code(ts)%\nconst x = 1;\n%end$$");
// [
// { type: "inline", tag: "bold", children: [{ type: "text", value: "hello" }] },
// { type: "text", value: " and " },
// { type: "raw", tag: "code",
// args: [{ type: "text", value: "ts" }],
// content: "\nconst x = 1;\n" },
// ]
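A tree like the one above is convenient to consume as a discriminated union on `type`. The sketch below uses a local `SNode` type that copies only the fields from the variants table (positions and the shorthand flag are left out), and `collectTags` is a hypothetical helper, not a library export.

```typescript
// Local copy of the documented variants (assumption: optional fields omitted).
type SNode =
  | { type: "text"; value: string }
  | { type: "escape"; raw: string }
  | { type: "separator" }
  | { type: "inline"; tag: string; children: SNode[] }
  | { type: "raw"; tag: string; args: SNode[]; content: string }
  | { type: "block"; tag: string; args: SNode[]; children: SNode[] };

// Collects "form:tag" labels in document order, descending into args and children.
function collectTags(nodes: SNode[], out: string[] = []): string[] {
  for (const node of nodes) {
    switch (node.type) {
      case "inline":
        out.push(`inline:${node.tag}`);
        collectTags(node.children, out);
        break;
      case "raw":
        out.push(`raw:${node.tag}`);
        collectTags(node.args, out); // raw content is a plain string, nothing to descend into
        break;
      case "block":
        out.push(`block:${node.tag}`);
        collectTags(node.args, out);
        collectTags(node.children, out);
        break;
      default:
        break; // text, escape, and separator carry no tag
    }
  }
  return out;
}

// Same shape as the parseStructural example output.
const sample: SNode[] = [
  { type: "inline", tag: "bold", children: [{ type: "text", value: "hello" }] },
  { type: "text", value: " and " },
  { type: "raw", tag: "code", args: [{ type: "text", value: "ts" }], content: "\nconst x = 1;\n" },
];

const found = collectTags(sample); // ["inline:bold", "raw:code"]
```

A highlighter or linter would typically do the same walk but read `position` on each node, which requires parsing with `trackPositions: true`.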
Handler gating behavior
When handlers is provided: gating is identical to parseRichText. The same supportsInlineForm decision table
and filterHandlersByForms logic are used (shared code in resolveOptions.ts, not mirrored). Handler functions
themselves are never called -- only the presence of inline / raw / block methods on the handler object
matters for determining which forms a tag supports.
When handlers is omitted: all syntactically valid tags in all forms are accepted. This is the typical mode for
syntax highlighting and linting tools that need to see the full source structure without semantic filtering.
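The idea that only the presence of handler methods matters can be sketched as follows. This is a simplified illustration: `HandlerLike` and `supportedForms` are hypothetical names, and the library's actual decision table (supportsInlineForm, filterHandlersByForms) may cover cases this sketch ignores.

```typescript
type Form = "inline" | "raw" | "block";

// Hypothetical handler shape: each form is an optional method.
interface HandlerLike {
  inline?: (...args: unknown[]) => unknown;
  raw?: (...args: unknown[]) => unknown;
  block?: (...args: unknown[]) => unknown;
}

const ALL_FORMS: readonly Form[] = ["inline", "raw", "block"];

// A form is supported when its method exists AND allowForms permits it.
// The methods are only inspected, never invoked.
function supportedForms(
  handler: HandlerLike,
  allowForms: readonly Form[] = ALL_FORMS,
): Form[] {
  return ALL_FORMS.filter(
    (form) => typeof handler[form] === "function" && allowForms.indexOf(form) !== -1,
  );
}

// Count invocations to show that gating never calls the handler functions.
let calls = 0;
const codeHandler: HandlerLike = {
  raw: () => { calls++; return null; },
  block: () => { calls++; return null; },
};

const forms = supportedForms(codeHandler);                        // ["raw", "block"]
const restricted = supportedForms(codeHandler, ["inline", "raw"]); // ["raw"]
```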
Differences from parseRichText
These are features, not bugs -- the two parsers serve different audiences.
| Aspect | `parseRichText` | `parseStructural` |
|---|---|---|
| Tag recognition | Same (shared ParserBaseOptions) | Same (shared ParserBaseOptions) |
| Form gating | Same | Same |
| Line-break normalization | Always strips (render mode) | Always preserves |
| Pipe `\|` | Part of text (consumed by handlers via parsePipeArgs) | separator node in args; text elsewhere |
| Error reporting | onError callback | Silent degradation |
| Escape handling | Unescaped at root level (produces literal text) | Structural escape nodes (preserves the raw sequence) |
| Position tracking | trackPositions on TextToken.position (normalized spans) | trackPositions on StructuralNode.position (raw syntax spans) |
| Output type | `TextToken[]` | `StructuralNode[]` |
Which one should I use?
- If your goal is rendering content (Vue component, HTML, terminal output), use parseRichText.
- If your goal is analyzing source structure (syntax highlighting, linting, editor integration, source inspection), use parseStructural.
Deprecated ambient state warning
Deprecated: when called inside a withSyntax() / withTagNameConfig() wrapper without explicit options.syntax / options.tagName, parseStructural still reads the ambient state as a fallback, but this path is deprecated and emits a console.warn. Pass options.syntax / options.tagName explicitly instead. This ambient fallback will be removed in a future major version.
printStructural(nodes, options?) -- structural print
function printStructural(nodes: StructuralNode[], options?: PrintOptions): string;
The inverse of parseStructural -- serializes a StructuralNode[] tree back to DSL source text. This is a lossless
serializer with no gating or validation. Inline nodes marked with implicitInlineShorthand: true are serialized as
shorthand (tag(...)) only when they appear in an inline-argument context; all other nodes use full syntax.
Parameters
| Parameter | Type | Description |
|---|---|---|
| `nodes` | `StructuralNode[]` | The structural tree to serialize. |
| `options.syntax` | `Partial<SyntaxInput>` | Override syntax tokens. Must match the syntax used during parseStructural. |
PrintOptions
interface PrintOptions {
syntax?: Partial<SyntaxInput>;
}
The syntax field is the only option. Internally, printStructural calls createSyntax(options?.syntax) to resolve
the full syntax config.
Round-trip example
When the structural tree preserves the original syntax-relevant information and the same syntax configuration is used, round-trip serialization is lossless:
import {parseStructural, printStructural} from "yume-dsl-rich-text";
const input = "Hello $$bold(world)$$!";
const tree = parseStructural(input);
printStructural(tree); // "Hello $$bold(world)$$!"
Programmatic tree building
You can construct StructuralNode[] trees programmatically and serialize them:
import type {StructuralNode} from "yume-dsl-rich-text";
import {printStructural} from "yume-dsl-rich-text";
const tree: StructuralNode[] = [
{type: "text", value: "Hello "},
{
type: "inline",
tag: "bold",
children: [{type: "text", value: "world"}],
},
];
printStructural(tree); // "Hello $$bold(world)$$"
Building a raw node:
const rawTree: StructuralNode[] = [
{
type: "raw",
tag: "code",
args: [{type: "text", value: "ts"}],
content: "\nconst x = 1;\n",
},
];
printStructural(rawTree); // "$$code(ts)%\nconst x = 1;\n%end$$"
Building a block node with pipe-separated args:
const blockTree: StructuralNode[] = [
{
type: "block",
tag: "info",
args: [
{type: "text", value: "title"},
{type: "separator"},
{type: "text", value: "subtitle"},
],
children: [{type: "text", value: "Block content here"}],
},
];
printStructural(blockTree); // "$$info(title|subtitle)*Block content here*end$$"
Shorthand-aware serialization (since 1.3)
When parseStructural is called with implicitInlineShorthand enabled, inline nodes produced by the shorthand syntax
carry implicitInlineShorthand: true. printStructural respects this flag when the node appears inside another inline
tag's arguments, producing tag(...) instead of $$tag(...)$$:
import {parseStructural, printStructural, createSimpleInlineHandlers} from "yume-dsl-rich-text";
const source = "$$bold(bold(x))$$";
const tree = parseStructural(source, {
handlers: createSimpleInlineHandlers(["bold"]),
implicitInlineShorthand: true,
});
printStructural(tree); // "$$bold(bold(x))$$" — inner `bold(x)` stays shorthand
At top level (outside any inline args), a shorthand-flagged node still serializes as full syntax to ensure the output is always valid DSL regardless of the consumer's shorthand configuration.
createParser integration
parser.print(nodes) inherits syntax from the parser's defaults closure.
Per-call overrides are merged one level deep into the default syntax, the same rule parse and structural use, so a partial override keeps the remaining default tokens:
import {createParser, createSimpleInlineHandlers} from "yume-dsl-rich-text";
const dsl = createParser({
syntax: {tagPrefix: "@@", tagOpen: "[", tagClose: "]", endTag: "]@@"},
handlers: createSimpleInlineHandlers(["bold"]),
});
const tree = dsl.structural("@@bold[hello]@@");
dsl.print(tree); // "@@bold[hello]@@" -- syntax inherited from createParser defaults
// Per-call override: deep-merged with defaults
dsl.print(tree, {syntax: {tagPrefix: "%%", endTag: "]%%"}});
// "%%bold[hello]%%" -- tagPrefix/endTag overridden, tagOpen/tagClose kept from defaults
This means you do not need to re-specify syntax when round-tripping through a createParser instance.
When you need a different syntax for a single print call, pass it as an override — defaults are not mutated.
Internal behavior
The printer switches on each node's type:
- text: appends node.value as-is.
- escape: appends node.raw as-is (preserves the escape sequence).
- separator: appends syntax.tagDivider.
- inline: appends tagPrefix + tag + tagOpen + (recurse children) + endTag.
- raw: appends tagPrefix + tag + tagOpen + (recurse args) + rawOpen + content + rawClose. The content field is emitted as-is with no recursive processing.
- block: appends tagPrefix + tag + tagOpen + (recurse args) + blockOpen + (recurse children) + blockClose.
Position information (position fields) is ignored during printing.
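The per-type behavior described above can be sketched as a small recursive function. This is an illustration only: `PNode` and `printNodes` are local names, shorthand handling is omitted, and the token values are inferred from the examples on this page rather than read from DEFAULT_SYNTAX.

```typescript
// Local copy of the documented node variants (position and shorthand flag omitted).
type PNode =
  | { type: "text"; value: string }
  | { type: "escape"; raw: string }
  | { type: "separator" }
  | { type: "inline"; tag: string; children: PNode[] }
  | { type: "raw"; tag: string; args: PNode[]; content: string }
  | { type: "block"; tag: string; args: PNode[]; children: PNode[] };

// Assumed token values, inferred from the printed examples on this page.
const syntax = {
  tagPrefix: "$$", tagOpen: "(", endTag: ")$$", tagDivider: "|",
  rawOpen: ")%", rawClose: "%end$$", blockOpen: ")*", blockClose: "*end$$",
};

function printNodes(nodes: PNode[]): string {
  let out = "";
  for (const node of nodes) {
    switch (node.type) {
      case "text": out += node.value; break;
      case "escape": out += node.raw; break;
      case "separator": out += syntax.tagDivider; break;
      case "inline":
        out += syntax.tagPrefix + node.tag + syntax.tagOpen
          + printNodes(node.children) + syntax.endTag;
        break;
      case "raw": // content is emitted verbatim, never recursed into
        out += syntax.tagPrefix + node.tag + syntax.tagOpen
          + printNodes(node.args) + syntax.rawOpen + node.content + syntax.rawClose;
        break;
      case "block":
        out += syntax.tagPrefix + node.tag + syntax.tagOpen
          + printNodes(node.args) + syntax.blockOpen
          + printNodes(node.children) + syntax.blockClose;
        break;
    }
  }
  return out;
}

const printedBlock = printNodes([{
  type: "block",
  tag: "info",
  args: [{ type: "text", value: "title" }, { type: "separator" }, { type: "text", value: "subtitle" }],
  children: [{ type: "text", value: "Block content here" }],
}]);
// printedBlock === "$$info(title|subtitle)*Block content here*end$$"
```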
buildZones(nodes) -- zone grouping
function buildZones(nodes: readonly StructuralNode[]): Zone[];
Groups a top-level StructuralNode[] into contiguous Zone[] for zone-level caching or batch processing.
Requires nodes parsed with trackPositions: true. Throws if the first node has no position (likely forgot
trackPositions). Empty input returns [] without error.
Rules
- Adjacent text / escape / separator / inline nodes merge into one zone
- Each raw or block node gets a dedicated zone
- A new zone starts after every raw / block node
Zone
interface Zone {
startOffset: number; // source offset (inclusive)
endOffset: number; // source offset (exclusive)
nodes: StructuralNode[];
}
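The grouping rules above can be sketched in a few lines. This is not the library's implementation: `ZNode` flattens the position into plain `start` / `end` offsets (the real nodes carry `position: SourceSpan`), and `groupZones` and `ZoneSketch` are hypothetical names.

```typescript
// Simplified node: only the type and a flat span (assumption, see lead-in).
interface ZNode {
  type: "text" | "escape" | "separator" | "inline" | "raw" | "block";
  start: number;
  end: number;
}

interface ZoneSketch {
  startOffset: number; // inclusive
  endOffset: number;   // exclusive
  nodes: ZNode[];
}

function groupZones(nodes: readonly ZNode[]): ZoneSketch[] {
  const zones: ZoneSketch[] = [];
  let current: ZoneSketch | null = null; // open zone of mergeable nodes, if any
  for (const node of nodes) {
    if (node.type === "raw" || node.type === "block") {
      // Dedicated zone; whatever follows starts a fresh zone.
      zones.push({ startOffset: node.start, endOffset: node.end, nodes: [node] });
      current = null;
    } else if (current) {
      // Adjacent text / escape / separator / inline nodes extend the open zone.
      current.nodes.push(node);
      current.endOffset = node.end;
    } else {
      current = { startOffset: node.start, endOffset: node.end, nodes: [node] };
      zones.push(current);
    }
  }
  return zones;
}

const grouped = groupZones([
  { type: "text", start: 0, end: 6 },
  { type: "inline", start: 6, end: 21 },
  { type: "raw", start: 21, end: 50 },
  { type: "block", start: 50, end: 80 },
  { type: "text", start: 80, end: 84 },
]);
// Four zones: [0,21) with two nodes, [21,50), [50,80), [80,84)
```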
Example
import {createParser, createSimpleInlineHandlers, createSimpleRawHandlers, buildZones} from "yume-dsl-rich-text";
const parser = createParser({
handlers: {
...createSimpleInlineHandlers(["bold"]),
...createSimpleRawHandlers(["code"]),
},
});
const tree = parser.structural(
"hello $$bold(world)$$\n$$code(ts)%\nlet x = 1;\n%end$$\nend",
{trackPositions: true},
);
const zones = buildZones(tree);
// zones[0]: { startOffset: 0, endOffset: 22, nodes: [text, inline:bold] }
// zones[1]: { startOffset: 22, endOffset: 51, nodes: [raw:code] }
// zones[2]: { startOffset: 51, endOffset: 55, nodes: [text] }
Use case: zone-level render caching
In an editor, you can cache rendered HTML per zone and only re-render zones that overlap with the edit range:
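// `cache` (e.g. a Map<string, string>) and `renderTokens` (your renderer) are assumed to exist in your app.
const tracker = buildPositionTracker(source);
const tree = parser.structural(source, {trackPositions: true});
const zones = buildZones(tree);
for (const zone of zones) {
// Key by the zone's source text, so an unchanged zone still hits the cache when its offsets shift
const key = source.slice(zone.startOffset, zone.endOffset);
if (cache.has(key)) continue; // reuse cached render
const tokens = parseSlice(source, zone.nodes[0].position, parser, tracker);
cache.set(key, renderTokens(tokens));
}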
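To make the cache behavior concrete without the parser, here is a self-contained sketch of the same keying strategy. `ZoneRange`, `renderZoneText`, and `renderZones` are hypothetical names, and the zone offsets are written by hand instead of coming from buildZones.

```typescript
interface ZoneRange {
  startOffset: number;
  endOffset: number;
}

// Hypothetical renderer: counts invocations so cache hits are observable.
let renderCalls = 0;
function renderZoneText(text: string): string {
  renderCalls++;
  return `<p>${text}</p>`;
}

// Cache keyed by zone source text, as in the loop above.
const zoneCache = new Map<string, string>();

function renderZones(source: string, zones: ZoneRange[]): string[] {
  return zones.map((zone) => {
    const key = source.slice(zone.startOffset, zone.endOffset);
    let html = zoneCache.get(key);
    if (html === undefined) {
      html = renderZoneText(key); // cache miss: render and remember
      zoneCache.set(key, html);
    }
    return html;
  });
}

const twoZones: ZoneRange[] = [
  { startOffset: 0, endOffset: 3 },
  { startOffset: 4, endOffset: 7 },
];

// First pass: both zones are rendered.
renderZones("aaa|bbb", twoZones);
const afterFirst = renderCalls; // 2

// An edit inside the second zone: only that zone is rendered again.
const secondPass = renderZones("aaa|bXb", twoZones);
const afterSecond = renderCalls; // 3
```

Keying by text keeps the sketch short, but the map grows with every distinct zone text it sees; a real editor would likely want an eviction policy.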
If the tree contains nodes whose form is not supported by the runtime parser, they are still printed with full syntax and will naturally degrade to plain text when re-parsed. This is intentional -- the printer is a serializer, not a validator.
Incremental parsing: parseIncremental / createIncrementalSession
Incremental structural caching API for editor workflows. This maintains an updated StructuralNode[] + Zone[]
snapshot across edits without reparsing the entire document every time.
Full details and examples: Incremental Parsing.
parseIncremental(source, options?)
function parseIncremental(
source: string,
options?: IncrementalParseOptions,
): IncrementalDocument;
Builds a snapshot by parsing structural tree with trackPositions: true (forced) and running buildZones(...).
createIncrementalSession(source, options?, sessionOptions?)
function createIncrementalSession(
source: string,
options?: IncrementalParseOptions,
sessionOptions?: IncrementalSessionOptions,
): IncrementalSession;
Creates a session-level incremental controller (getDocument / applyEdit / rebuild) with automatic full-fallback
handling for edits.
Performance: printStructural, buildZones, walkTokens, mapTokens
Measured on a ~200 KB document (126,270 chars, 12,161 structural nodes, 9,042 render tokens).
Environment: Kunpeng 920 aarch64 / Node v24.14.0 (1.1.6) — 5 rounds x 5 iterations, median.
| Operation | Time | Notes |
|---|---|---|
| printStructural (12,161 nodes) | 2.0 ms | Lossless round-trip verified |
| buildZones (→ 1,873 zones) | 0.87 ms | From 6,235 top-level structural nodes |
| walkTokens (9,042 visited) | 0.39 ms | Pre-order DFS over 9,042 render tokens |
| mapTokens identity | 0.86 ms | Returns same tree, no allocations beyond wrapper |
| mapTokens transform | 1.2 ms | Renames bold → strong across entire tree |
All five operations complete in 2.0 ms or less on this document. In an editor pipeline where parseStructural (1.1.6: ~24.6 ms)
is the bottleneck, printStructural / buildZones / walkTokens add comparatively little cost.
For parseRichText and parseStructural benchmarks, see Performance.
For createEasyStableId benchmarks, see Stable Token IDs — Performance.
For unescapeInline / splitTokensByPipe benchmarks,
see Handler Utilities — Performance.