API Reference - chiba233/yumeDSL GitHub Wiki

API Reference


Four core functions, four jobs:

                        DSL source text
                            │
         ┌──────────────────┼──────────────────┐
         ▼                  ▼                  ▼
    parseRichText     stripRichText      parseStructural
    → TextToken[]     → plain string     → StructuralNode[]
    (for rendering)   (search/preview)   (highlight/lint/editor)
                                               │
                                               ▼
                                        printStructural
                                        → DSL source (lossless round-trip)

Recommended: use createParser to bind config once, then call .parse() / .strip() / .structural() / .print() everywhere.


createParser(defaults) -- recommended entry point

createParser binds your ParseOptions into a reusable instance. This is the recommended way to use the parser -- define your tag handlers once, then call dsl.parse() / dsl.strip() / dsl.structural() / dsl.print() everywhere without repeating config.

import {
    createParser,
    createSimpleInlineHandlers,
    parsePipeArgs,
} from "yume-dsl-rich-text";

const dsl = createParser({
    handlers: {
        ...createSimpleInlineHandlers(["bold", "italic", "underline"]),

        link: {
            inline: (tokens, ctx) => {
                const args = parsePipeArgs(tokens, ctx);
                return {
                    type: "link",
                    url: args.text(0),
                    value: args.materializedTailTokens(1),
                };
            },
        },
    },
});

// Use everywhere -- handlers are already bound
dsl.parse("Hello $$bold(world)$$!");
dsl.strip("Hello $$bold(world)$$!");

Parser interface

interface Parser {
    parse: (text: string, overrides?: ParseOptions) => TextToken[];
    strip: (text: string, overrides?: ParseOptions) => string;
    structural: (text: string, overrides?: StructuralParseOptions) => StructuralNode[];
    print: (nodes: StructuralNode[], overrides?: PrintOptions) => string;
}

What createParser binds

Most of the time you only need to bind handlers. The rest tags along for convenience.

| Option | What it does when pre-bound |
| --- | --- |
| handlers | Your tag definitions -- the main reason to use createParser |
| syntax | Custom syntax tokens (if you override $$ prefix, etc.) |
| tagName | Custom tag-name character rules |
| allowForms | Restrict accepted tag forms (default: all forms enabled) |
| implicitInlineShorthand | Enable tag(...) shorthand for inline tags (since 1.3) |
| depthLimit | Nesting limit -- rarely changes per call |
| createId | Custom token id generator (can be overridden per call) |
| blockTags | Tags that receive block-level line-break normalization |
| onError | Default error handler (can still be overridden per call) |
| trackPositions | Attach source positions to all output nodes (can be overridden per call) |

Methods

| Method | Input | Output | Inherits from defaults |
| --- | --- | --- | --- |
| parse | DSL text + overrides? | TextToken[] | All ParseOptions -- overrides merge one level deep for syntax/tagName |
| strip | DSL text + overrides? | string | Same as parse |
| structural | DSL text + overrides? | StructuralNode[] | handlers, allowForms, implicitInlineShorthand, syntax, tagName, depthLimit, trackPositions |
| print | StructuralNode[] + overrides? | string | syntax only -- overrides merge with defaults. Lossless serializer, no gating |

Per-call override merging (since 1.0.11)

Per-call overrides are shallow-merged onto defaults, but syntax and tagName additionally merge one level deep so that partial overrides keep the rest of the defaults:

const dsl = createParser({
    handlers,
    syntax: {tagPrefix: "@@"},
});

// This override merges into the existing syntax -- tagPrefix stays "@@"
dsl.parse(text, {syntax: {tagDivider: ";"}});
// Effective syntax: { tagPrefix: "@@", tagDivider: ";", ...rest from DEFAULT_SYNTAX }

Internally, createParser performs the merge like this:

const merge = <T extends ParserBaseOptions>(overrides: T): ParseOptions & T => {
    const merged = {...defaults, ...overrides};
    if (defaults.syntax && overrides.syntax) {
        merged.syntax = {...defaults.syntax, ...overrides.syntax};
    }
    if (defaults.tagName && overrides.tagName) {
        merged.tagName = {...defaults.tagName, ...overrides.tagName};
    }
    return merged;
};

Note: one-level-deep merging only happens when both defaults and overrides contain the field. If only the override has syntax, it is used as-is (no merge with defaults).
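The note above can be checked with a small standalone sketch. It does not import the library: `SketchSyntax`, `SketchOptions`, and `mergeOptions` are hypothetical names that only mirror the merge rule quoted above, reduced to the syntax field.

```typescript
// Hypothetical, trimmed-down option shapes used for illustration only.
interface SketchSyntax {
    tagPrefix?: string;
    tagDivider?: string;
}

interface SketchOptions {
    syntax?: SketchSyntax;
    depthLimit?: number;
}

// Shallow merge first, then one extra level for `syntax`,
// but only when BOTH defaults and overrides define it.
function mergeOptions(defaults: SketchOptions, overrides: SketchOptions): SketchOptions {
    const merged: SketchOptions = {...defaults, ...overrides};
    if (defaults.syntax && overrides.syntax) {
        merged.syntax = {...defaults.syntax, ...overrides.syntax};
    }
    return merged;
}

// Both sides define syntax: the partial override keeps tagPrefix from defaults.
const bothSides = mergeOptions(
    {syntax: {tagPrefix: "@@"}},
    {syntax: {tagDivider: ";"}},
);
// bothSides.syntax -> { tagPrefix: "@@", tagDivider: ";" }

// Only the override defines syntax: it is used as-is, other defaults survive.
const overrideOnly = mergeOptions(
    {depthLimit: 10},
    {syntax: {tagDivider: ";"}},
);
// overrideOnly.syntax -> { tagDivider: ";" }, overrideOnly.depthLimit -> 10
```

In the second call there is nothing on the defaults side to merge with, so any syntax keys you leave out fall back to whatever the parse function itself uses.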

With vs without createParser

// Without createParser -- repetitive, must pass handlers everywhere
parseRichText(text1, {handlers});
parseRichText(text2, {handlers});
stripRichText(text3, {handlers});
parseStructural(text4, {handlers});

// With createParser -- bind once, use everywhere
const dsl = createParser({handlers});
dsl.parse(text1);
dsl.parse(text2);
dsl.strip(text3);
dsl.structural(text4);
dsl.print(tree);

parseRichText(text, options?)

function parseRichText(text: string, options?: ParseOptions): TextToken[];

The core parse function. Takes DSL source text and returns a TextToken[] tree. Unregistered or malformed tags degrade to plain text -- never throws.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| text | string | DSL source text. If empty, returns [] immediately. |
| options | ParseOptions | Optional. Tag handlers, syntax config, error callback, and other settings. |

ParseOptions fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| handlers | Record<string, TagHandler> | {} | Tag handler map. Keys are tag names, values define how each tag is parsed. |
| allowForms | readonly TagForm[] | ["inline","raw","block"] | Restrict accepted tag forms. Unlisted forms degrade gracefully. |
| depthLimit | number | 50 | Maximum nesting depth. Tags beyond this limit degrade to plain text. |
| syntax | Partial<SyntaxInput> | DEFAULT_SYNTAX | Override DSL syntax tokens. |
| tagName | Partial<TagNameConfig> | DEFAULT_TAG_NAME | Override tag-name character rules. |
| createId | CreateId | () => "rt-${seed++}" | Token id generator. Defaults to a parse-local counter: rt-0, rt-1, ... |
| blockTags | readonly BlockTagInput[] | (derived from handlers) | Tags that receive block-level line-break normalization. Defaults to every tag with a raw/block handler. Each entry is either a plain tag name (both raw and block forms) or { tag, forms } to restrict to specific multiline forms. |
| onError | (error: ParseError) => void | (silent) | Called for every parse error. If omitted, errors are silently discarded. |
| trackPositions | boolean | false | Attach position: SourceSpan to every TextToken. |
| baseOffset | number | 0 | Base offset added to all source positions. Use when parsing a substring from a larger document. |
| tracker | PositionTracker | (none) | Pre-built position tracker from the original full document. Build with buildPositionTracker(text). |
| mode | "render" | "render" | Parse mode. Currently only "render" is supported. |
| implicitInlineShorthand | boolean \| readonly string[] | false | Enable tag(...) shorthand for inline tags. true = all handlers, string[] = allowlist. Since 1.3. See ParseOptions. |

See ParseOptions for the full deep-dive.

Return value

TextToken[] -- an array of token objects:

interface TextToken {
    type: string;                 // "text" for plain text, or the handler-defined type
    value: string | TextToken[];  // plain string or nested children
    id: string;                   // sequential id (default) or custom via createId
    position?: SourceSpan;        // present when trackPositions is true
    [key: string]: unknown;       // handler-defined extra fields (url, lang, etc.)
}

When to use parseRichText vs createParser

Use createParser for application code where you reuse the same handler set. Use parseRichText directly for:

  • One-off utility scripts
  • When you need full per-call control over every option
  • Testing and prototyping

Edge cases

  • Empty string: returns [] immediately (no allocation, no side effects).
  • No handlers: all tag-like syntax degrades to plain text.
  • Unclosed tags: degrade to plain text with an onError callback if provided (error codes: INLINE_NOT_CLOSED, SHORTHAND_NOT_CLOSED, BLOCK_NOT_CLOSED, RAW_NOT_CLOSED). Surrounding content is never corrupted.
  • Malformed close markers: reported via BLOCK_CLOSE_MALFORMED or RAW_CLOSE_MALFORMED and degrade gracefully.
  • Unexpected close: a close marker without a matching open is reported via UNEXPECTED_CLOSE and treated as plain text.
  • Depth limit exceeded: the offending tag degrades to plain text with error code DEPTH_LIMIT.

Example

import {parseRichText, createSimpleInlineHandlers} from "yume-dsl-rich-text";

const tokens = parseRichText("Hello $$bold(world)$$!", {
    handlers: createSimpleInlineHandlers(["bold"]),
});
// [
//   { type: "text", value: "Hello ", id: "rt-0" },
//   { type: "bold", value: [{ type: "text", value: "world", id: "rt-1" }], id: "rt-2" },
//   { type: "text", value: "!", id: "rt-3" },
// ]

stripRichText(text, options?)

function stripRichText(text: string, options?: ParseOptions): string;

Parses DSL text and extracts only the plain text content, discarding all tag structure.

Parameters

Identical to parseRichText. Accepts the same ParseOptions.

Return value

A plain string with all tag markup removed. Only the text content of tokens remains.

Implementation detail

Internally calls parseRichText(text, options) then extractText(tokens). The cost is identical to parseRichText -- there is no cheaper "strip-only" code path. If you need both tokens and plain text, call parseRichText once and then extractText on the result to avoid parsing twice.
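To make the "parse once, then extract" advice concrete, here is a minimal sketch of what a text extractor over a token tree can look like. It is not the library's extractText, whose handling of handler-defined fields may differ; `SketchToken` and `extractTextSketch` are hypothetical names, and the token shape is a trimmed copy of the TextToken interface shown earlier.

```typescript
// Trimmed local copy of the TextToken shape (no position or extra fields).
interface SketchToken {
    type: string;
    value: string | SketchToken[];
    id: string;
}

// Concatenate string values depth-first; nested children are visited in order.
function extractTextSketch(tokens: SketchToken[]): string {
    let out = "";
    for (const token of tokens) {
        out += typeof token.value === "string"
            ? token.value
            : extractTextSketch(token.value);
    }
    return out;
}

// Token tree matching the parseRichText example for "Hello $$bold(world)$$!".
const parsedOnce: SketchToken[] = [
    {type: "text", value: "Hello ", id: "rt-0"},
    {type: "bold", value: [{type: "text", value: "world", id: "rt-1"}], id: "rt-2"},
    {type: "text", value: "!", id: "rt-3"},
];

// The same tree can feed both the renderer and the plain-text consumer.
const plainText = extractTextSketch(parsedOnce);  // "Hello world!"
```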

Examples

import {stripRichText, createSimpleInlineHandlers} from "yume-dsl-rich-text";

stripRichText("Hello $$bold(world)$$!", {
    handlers: createSimpleInlineHandlers(["bold"]),
});
// "Hello world!"

stripRichText("");
// ""

Edge cases

  • Empty string: returns "" immediately.
  • Degradation behavior is inherited from parseRichText, which affects strip output:
    • Unregistered inline tags: strip outputs the inner content (delimiters stripped)
    • Unregistered raw/block tags: strip outputs the entire raw markup (including delimiters)
    • Unsupported forms or allowForms restrictions: strip outputs the entire raw markup

Use cases

  • Extracting searchable plain text for indexing
  • Generating text previews or summaries
  • Building accessibility labels (e.g. aria-label)
  • Character/word counting that ignores markup

parseStructural(text, options?) -- structural parse

function parseStructural(text: string, options?: StructuralParseOptions): StructuralNode[];

For structural consumers -- highlighting, linting, editors, source inspection, or any scenario where you need to know which tag form was used, not just the semantic result. Preserves the tag form (inline / raw / block) explicitly in the output tree.

It shares the same language configuration (handlers, allowForms, syntax, tagName, depthLimit, trackPositions) as parseRichText, so you do not maintain two separate sets of DSL rules.

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| text | string | -- | DSL source. Returns [] for empty string. |
| options.handlers | Record<string, TagHandler> | (none) | Tag recognition and form gating. Omit to accept all syntactically valid tags/forms without semantic gating. |
| options.allowForms | readonly TagForm[] | (all forms) | Restrict accepted forms (requires handlers). |
| options.depthLimit | number | 50 | Max nesting depth. |
| options.syntax | Partial<SyntaxInput> | DEFAULT_SYNTAX | Override syntax tokens. |
| options.tagName | Partial<TagNameConfig> | DEFAULT_TAG_NAME | Override tag-name character rules. |
| options.trackPositions | boolean | false | Attach position: SourceSpan to every StructuralNode. |
| options.baseOffset | number | 0 | Base offset for position tracking. |
| options.tracker | PositionTracker | (none) | Pre-built position tracker from the original document. |
| options.implicitInlineShorthand | boolean \| readonly string[] | false | Enable tag(...) shorthand for inline tags. true = all handlers, string[] = allowlist. Since 1.3. |

StructuralParseOptions

interface StructuralParseOptions extends ParserBaseOptions {
    trackPositions?: boolean;
}

interface ParserBaseOptions {
    handlers?: Record<string, TagHandler>;
    allowForms?: readonly TagForm[];
    implicitInlineShorthand?: InlineShorthandOption;
    depthLimit?: number;
    syntax?: Partial<SyntaxInput>;
    tagName?: Partial<TagNameConfig>;
    baseOffset?: number;
    tracker?: PositionTracker;
}

Note: StructuralParseOptions does not include createId, blockTags, mode, or onError -- those are ParseOptions-only fields for the render pipeline.

StructuralNode variants

| Type | Fields | Description |
| --- | --- | --- |
| text | value: string | Plain text |
| escape | raw: string | Escape sequence (e.g. \)) |
| separator | -- | Pipe `\|` |
| inline | tag: string, children: StructuralNode[], implicitInlineShorthand?: boolean | $$tag(...)$$ or tag(...) shorthand |
| raw | tag: string, args: StructuralNode[], content: string | $$tag(...)% ... %end$$ |
| block | tag: string, args: StructuralNode[], children: StructuralNode[] | $$tag(...)* ... *end$$ |

All variants carry an optional position?: SourceSpan when trackPositions is enabled.
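Because each variant is distinguished by its type field, the tree maps naturally onto a TypeScript discriminated union. The sketch below is an assumption about how a consumer might model and walk it, not the library's own type definition: `SketchNode` and `countTagForms` are hypothetical names, and position is left out for brevity.

```typescript
// Local model of the variants listed above (position omitted).
type SketchNode =
    | {type: "text"; value: string}
    | {type: "escape"; raw: string}
    | {type: "separator"}
    | {type: "inline"; tag: string; children: SketchNode[]; implicitInlineShorthand?: boolean}
    | {type: "raw"; tag: string; args: SketchNode[]; content: string}
    | {type: "block"; tag: string; args: SketchNode[]; children: SketchNode[]};

// Count how often each tag appears in each form, e.g. for a lint rule.
function countTagForms(
    nodes: SketchNode[],
    counts: Record<string, number> = {},
): Record<string, number> {
    const bump = (key: string): void => {
        counts[key] = (counts[key] ?? 0) + 1;
    };
    for (const node of nodes) {
        switch (node.type) {
            case "inline":
                bump(`${node.tag}:inline`);
                countTagForms(node.children, counts);
                break;
            case "raw":
                bump(`${node.tag}:raw`);
                countTagForms(node.args, counts);  // content is a plain string
                break;
            case "block":
                bump(`${node.tag}:block`);
                countTagForms(node.args, counts);
                countTagForms(node.children, counts);
                break;
            default:
                break;  // text, escape, and separator carry no tag
        }
    }
    return counts;
}

const sampleTree: SketchNode[] = [
    {type: "inline", tag: "bold", children: [{type: "text", value: "hello"}]},
    {type: "text", value: " and "},
    {type: "raw", tag: "code", args: [{type: "text", value: "ts"}], content: "\nconst x = 1;\n"},
];

const formCounts = countTagForms(sampleTree);
// formCounts -> { "bold:inline": 1, "code:raw": 1 }
```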

Example

import {parseStructural} from "yume-dsl-rich-text";

const tree = parseStructural("$$bold(hello)$$ and $$code(ts)%\nconst x = 1;\n%end$$");
// [
//   { type: "inline", tag: "bold", children: [{ type: "text", value: "hello" }] },
//   { type: "text", value: " and " },
//   { type: "raw", tag: "code",
//     args: [{ type: "text", value: "ts" }],
//     content: "\nconst x = 1;\n" },
// ]

Handler gating behavior

When handlers is provided: gating is identical to parseRichText. The same supportsInlineForm decision table and filterHandlersByForms logic are used (shared code in resolveOptions.ts, not mirrored). Handler functions themselves are never called -- only the presence of inline / raw / block methods on the handler object matters for determining which forms a tag supports.

When handlers is omitted: all syntactically valid tags in all forms are accepted. This is the typical mode for syntax highlighting and linting tools that need to see the full source structure without semantic filtering.
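The idea that only the presence of handler methods matters can be sketched as follows. This is a simplified illustration under stated assumptions: `SketchHandler` and `supportedFormsSketch` are hypothetical, and the real supportsInlineForm decision table may cover cases this sketch ignores.

```typescript
type SketchForm = "inline" | "raw" | "block";

// Hypothetical handler shape: any subset of the three methods may be present.
type SketchHandler = Partial<Record<SketchForm, (...args: unknown[]) => unknown>>;

// Derive the supported forms by inspecting which methods exist.
// The methods themselves are never invoked.
function supportedFormsSketch(handler: SketchHandler): SketchForm[] {
    const allForms: SketchForm[] = ["inline", "raw", "block"];
    return allForms.filter((form) => typeof handler[form] === "function");
}

let handlerCalls = 0;
const codeHandler: SketchHandler = {
    raw: () => {
        handlerCalls += 1;  // would only run in the render pipeline
        return null;
    },
};

const codeForms = supportedFormsSketch(codeHandler);
// codeForms -> ["raw"], and handlerCalls is still 0
```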

Differences from parseRichText

These are features, not bugs -- the two parsers serve different audiences.

| Aspect | parseRichText | parseStructural |
| --- | --- | --- |
| Tag recognition | Same (shared ParserBaseOptions) | Same (shared ParserBaseOptions) |
| Form gating | Same | Same |
| Line-break normalization | Always strips (render mode) | Always preserves |
| Pipe \| | Part of text (consumed by handlers via parsePipeArgs) | separator node in args; text elsewhere |
| Error reporting | onError callback | Silent degradation |
| Escape handling | Unescaped at root level (produces literal text) | Structural escape nodes (preserves the raw sequence) |
| Position tracking | trackPositions on TextToken.position (normalized spans) | trackPositions on StructuralNode.position (raw syntax spans) |
| Output type | TextToken[] | StructuralNode[] |

Which one should I use?

  • If your goal is rendering content (Vue component, HTML, terminal output), use parseRichText.
  • If your goal is analyzing source structure (syntax highlighting, linting, editor integration, source inspection), use parseStructural.

Deprecated ambient state warning

Deprecated: when called inside a withSyntax() / withTagNameConfig() wrapper without explicit options.syntax / options.tagName, parseStructural still reads the ambient state as a fallback, but this path is deprecated and emits a console.warn. Pass options.syntax / options.tagName explicitly instead. This ambient fallback will be removed in a future major version.


printStructural(nodes, options?) -- structural print

function printStructural(nodes: StructuralNode[], options?: PrintOptions): string;

The inverse of parseStructural -- serializes a StructuralNode[] tree back to DSL source text. This is a lossless serializer with no gating or validation. Inline nodes marked with implicitInlineShorthand: true are serialized as shorthand (tag(...)) only when they appear in an inline-argument context; all other nodes use full syntax.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| nodes | StructuralNode[] | The structural tree to serialize. |
| options.syntax | Partial<SyntaxInput> | Override syntax tokens. Must match the syntax used during parseStructural. |

PrintOptions

interface PrintOptions {
    syntax?: Partial<SyntaxInput>;
}

The syntax field is the only option. Internally, printStructural calls createSyntax(options?.syntax) to resolve the full syntax config.

Round-trip example

When the structural tree preserves the original syntax-relevant information and the same syntax configuration is used, round-trip serialization is lossless:

import {parseStructural, printStructural} from "yume-dsl-rich-text";

const input = "Hello $$bold(world)$$!";
const tree = parseStructural(input);
printStructural(tree);  // "Hello $$bold(world)$$!"

Programmatic tree building

You can construct StructuralNode[] trees programmatically and serialize them:

import type {StructuralNode} from "yume-dsl-rich-text";
import {printStructural} from "yume-dsl-rich-text";

const tree: StructuralNode[] = [
    {type: "text", value: "Hello "},
    {
        type: "inline",
        tag: "bold",
        children: [{type: "text", value: "world"}],
    },
];

printStructural(tree);  // "Hello $$bold(world)$$"

Building a raw node:

const rawTree: StructuralNode[] = [
    {
        type: "raw",
        tag: "code",
        args: [{type: "text", value: "ts"}],
        content: "\nconst x = 1;\n",
    },
];

printStructural(rawTree);  // "$$code(ts)%\nconst x = 1;\n%end$$"

Building a block node with pipe-separated args:

const blockTree: StructuralNode[] = [
    {
        type: "block",
        tag: "info",
        args: [
            {type: "text", value: "title"},
            {type: "separator"},
            {type: "text", value: "subtitle"},
        ],
        children: [{type: "text", value: "Block content here"}],
    },
];

printStructural(blockTree);  // "$$info(title|subtitle)*Block content here*end$$"

Shorthand-aware serialization (since 1.3)

When parseStructural is called with implicitInlineShorthand enabled, inline nodes produced by the shorthand syntax carry implicitInlineShorthand: true. printStructural respects this flag when the node appears inside another inline tag's arguments, producing tag(...) instead of $$tag(...)$$:

import {parseStructural, printStructural, createSimpleInlineHandlers} from "yume-dsl-rich-text";

const source = "$$bold(bold(x))$$";
const tree = parseStructural(source, {
    handlers: createSimpleInlineHandlers(["bold"]),
    implicitInlineShorthand: true,
});
printStructural(tree);  // "$$bold(bold(x))$$" -- inner `bold(x)` stays shorthand

At top level (outside any inline args), a shorthand-flagged node still serializes as full syntax to ensure the output is always valid DSL regardless of the consumer's shorthand configuration.

createParser integration

parser.print(nodes) inherits syntax from the parser's defaults closure. Per-call overrides are merged one level deep with those defaults, matching the behavior of parse and structural:

import {createParser, createSimpleInlineHandlers} from "yume-dsl-rich-text";

const dsl = createParser({
    syntax: {tagPrefix: "@@", tagOpen: "[", tagClose: "]", endTag: "]@@"},
    handlers: createSimpleInlineHandlers(["bold"]),
});

const tree = dsl.structural("@@bold[hello]@@");
dsl.print(tree);  // "@@bold[hello]@@" -- syntax inherited from createParser defaults

// Per-call override: merged one level deep with defaults
dsl.print(tree, {syntax: {tagPrefix: "%%", endTag: "]%%"}});
// "%%bold[hello]%%" -- tagPrefix/endTag overridden, tagOpen/tagClose kept from defaults

This means you do not need to re-specify syntax when round-tripping through a createParser instance. When you need a different syntax for a single print call, pass it as an override; defaults are not mutated.

Internal behavior

The printer switches on each node's type:

  • text: appends node.value as-is.
  • escape: appends node.raw as-is (preserves the escape sequence).
  • separator: appends syntax.tagDivider.
  • inline: appends tagPrefix + tag + tagOpen + (recurse children) + endTag.
  • raw: appends tagPrefix + tag + tagOpen + (recurse args) + rawOpen + content + rawClose. The content field is emitted as-is with no recursive processing.
  • block: appends tagPrefix + tag + tagOpen + (recurse args) + blockOpen + (recurse children) + blockClose.

Position information (position fields) is ignored during printing.
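The per-type rules above can be sketched as a small recursive printer. This is an illustration, not the library's implementation: `PrintNode`, `ASSUMED_SYNTAX`, and `printSketch` are hypothetical names, the token values are assumptions inferred from the examples on this page, and shorthand-aware serialization is left out.

```typescript
// Local stand-in for the StructuralNode variants (not imported from the library).
type PrintNode =
    | {type: "text"; value: string}
    | {type: "escape"; raw: string}
    | {type: "separator"}
    | {type: "inline"; tag: string; children: PrintNode[]}
    | {type: "raw"; tag: string; args: PrintNode[]; content: string}
    | {type: "block"; tag: string; args: PrintNode[]; children: PrintNode[]};

// ASSUMPTION: default token values inferred from this page's examples.
const ASSUMED_SYNTAX = {
    tagPrefix: "$$", tagOpen: "(", tagDivider: "|", endTag: ")$$",
    rawOpen: ")%", rawClose: "%end$$", blockOpen: ")*", blockClose: "*end$$",
};

function printSketch(nodes: PrintNode[], s = ASSUMED_SYNTAX): string {
    let out = "";
    for (const node of nodes) {
        const head = "tag" in node ? s.tagPrefix + node.tag + s.tagOpen : "";
        switch (node.type) {
            case "text": out += node.value; break;
            case "escape": out += node.raw; break;  // keep the escape sequence
            case "separator": out += s.tagDivider; break;
            case "inline":
                out += head + printSketch(node.children, s) + s.endTag;
                break;
            case "raw":  // content is emitted verbatim, never recursed into
                out += head + printSketch(node.args, s) + s.rawOpen + node.content + s.rawClose;
                break;
            case "block":
                out += head + printSketch(node.args, s) + s.blockOpen
                    + printSketch(node.children, s) + s.blockClose;
                break;
        }
    }
    return out;
}

const printedBlock = printSketch([{
    type: "block",
    tag: "info",
    args: [{type: "text", value: "title"}, {type: "separator"}, {type: "text", value: "subtitle"}],
    children: [{type: "text", value: "Block content here"}],
}]);
// printedBlock -> "$$info(title|subtitle)*Block content here*end$$"
```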


buildZones(nodes) -- zone grouping

function buildZones(nodes: readonly StructuralNode[]): Zone[];

Groups a top-level StructuralNode[] into contiguous Zone[] for zone-level caching or batch processing.

Requires nodes parsed with trackPositions: true. Throws if the first node has no position, which usually means trackPositions was not enabled. Empty input returns [] without error.

Rules

  • Adjacent text / escape / separator / inline nodes merge into one zone
  • Each raw or block node gets a dedicated zone
  • A new zone starts after every raw / block node
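The three rules can be sketched as a single pass over the top-level nodes. This is a simplified model and not the library's buildZones: `ZoneNode`, `SketchZone`, and `groupZonesSketch` are hypothetical names, and positions are flattened to plain start / end offsets instead of a SourceSpan.

```typescript
// Hypothetical flattened node: only the form and its source offsets.
interface ZoneNode {
    type: "text" | "escape" | "separator" | "inline" | "raw" | "block";
    start: number;  // inclusive
    end: number;    // exclusive
}

interface SketchZone {
    startOffset: number;
    endOffset: number;
    nodes: ZoneNode[];
}

function groupZonesSketch(nodes: readonly ZoneNode[]): SketchZone[] {
    const zones: SketchZone[] = [];
    let open: SketchZone | null = null;  // zone currently collecting inline-level nodes
    for (const node of nodes) {
        if (node.type === "raw" || node.type === "block") {
            // Rule 2: dedicated zone. Rule 3: whatever follows starts a new zone.
            zones.push({startOffset: node.start, endOffset: node.end, nodes: [node]});
            open = null;
        } else if (open) {
            // Rule 1: adjacent text / escape / separator / inline nodes merge.
            open.nodes.push(node);
            open.endOffset = node.end;
        } else {
            open = {startOffset: node.start, endOffset: node.end, nodes: [node]};
            zones.push(open);
        }
    }
    return zones;
}

// Made-up offsets: text, inline, block, raw, text.
const groupedZones = groupZonesSketch([
    {type: "text", start: 0, end: 5},
    {type: "inline", start: 5, end: 20},
    {type: "block", start: 20, end: 40},
    {type: "raw", start: 40, end: 60},
    {type: "text", start: 60, end: 63},
]);
// groupedZones -> 4 zones covering [0,20), [20,40), [40,60), [60,63)
```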

Zone

interface Zone {
    startOffset: number;   // source offset (inclusive)
    endOffset: number;     // source offset (exclusive)
    nodes: StructuralNode[];
}

Example

import {createParser, createSimpleInlineHandlers, createSimpleRawHandlers, buildZones} from "yume-dsl-rich-text";

const parser = createParser({
    handlers: {
        ...createSimpleInlineHandlers(["bold"]),
        ...createSimpleRawHandlers(["code"]),
    },
});

const tree = parser.structural(
    "hello $$bold(world)$$\n$$code(ts)%\nlet x = 1;\n%end$$\nend",
    {trackPositions: true},
);
const zones = buildZones(tree);
// zones[0]: { startOffset: 0,  endOffset: 22, nodes: [text, inline:bold, text] }
// zones[1]: { startOffset: 22, endOffset: 51, nodes: [raw:code] }
// zones[2]: { startOffset: 51, endOffset: 55, nodes: [text] }

Use case: zone-level render caching

In an editor, you can cache rendered HTML per zone and only re-render zones that overlap with the edit range:

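import {buildPositionTracker, buildZones, parseSlice} from "yume-dsl-rich-text";

// `renderTokens` is application-defined; `parser` comes from createParser.
const cache = new Map<string, string>();
const tracker = buildPositionTracker(source);  // one tracker, reused for every slice

const tree = parser.structural(source, {trackPositions: true});
const zones = buildZones(tree);

for (const zone of zones) {
    const key = source.slice(zone.startOffset, zone.endOffset);
    if (cache.has(key)) continue;  // reuse cached render
    // Note: this passes only the first node's span; a multi-node zone
    // needs a span that covers the whole zone.
    const tokens = parseSlice(source, zone.nodes[0].position, parser, tracker);
    cache.set(key, renderTokens(tokens));
}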
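The caching idea itself does not depend on the parser, so it can be sketched in isolation. The helper below is a hypothetical illustration: `CacheZone`, `renderZonesCached`, and the render callback are assumed names, and the callback stands in for whatever parse-and-render step your application runs per zone.

```typescript
// Only the offsets matter for cache-key purposes.
interface CacheZone {
    startOffset: number;
    endOffset: number;
}

// Render every zone, reusing cached output when the zone's source text is unchanged.
function renderZonesCached(
    source: string,
    zones: readonly CacheZone[],
    cache: Map<string, string>,
    render: (zoneSource: string) => string,
): string[] {
    return zones.map((zone) => {
        const key = source.slice(zone.startOffset, zone.endOffset);
        let html = cache.get(key);
        if (html === undefined) {
            html = render(key);  // cache miss: do the expensive work once
            cache.set(key, html);
        }
        return html;
    });
}

const zoneCache = new Map<string, string>();
let renderCount = 0;
const fakeRender = (zoneSource: string): string => {
    renderCount += 1;
    return `<p>${zoneSource}</p>`;
};
const demoZones: CacheZone[] = [
    {startOffset: 0, endOffset: 3},
    {startOffset: 3, endOffset: 6},
];

const firstPass = renderZonesCached("aaabbb", demoZones, zoneCache, fakeRender);
// Second pass after an edit that only touched the second zone.
const secondPass = renderZonesCached("aaaBBB", demoZones, zoneCache, fakeRender);
// renderCount -> 3: two renders on the first pass, one on the second
```

Keying by zone text means two zones with identical source share one cache entry, which is only safe when rendering depends on nothing but that text.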

Note on printing: if a tree contains nodes whose form the runtime parser does not support, printStructural still emits them with full syntax, and they degrade to plain text when re-parsed. This is intentional -- the printer is a serializer, not a validator.


Incremental parsing: parseIncremental / createIncrementalSession

Incremental structural caching API for editor workflows. This maintains an updated StructuralNode[] + Zone[] snapshot across edits without reparsing the entire document every time.

Full details and examples: Incremental Parsing.

parseIncremental(source, options?)

function parseIncremental(
  source: string,
  options?: IncrementalParseOptions,
): IncrementalDocument;

Builds a snapshot by parsing the structural tree with trackPositions: true (forced) and then running buildZones(...) on the result.

createIncrementalSession(source, options?, sessionOptions?)

function createIncrementalSession(
  source: string,
  options?: IncrementalParseOptions,
  sessionOptions?: IncrementalSessionOptions,
): IncrementalSession;

Creates a session-level incremental controller (getDocument / applyEdit / rebuild) with automatic full-fallback handling for edits.


Performance: printStructural, buildZones, walkTokens, mapTokens

Measured on a ~200 KB document (126,270 chars, 12,161 structural nodes, 9,042 render tokens). Environment: Kunpeng 920 aarch64 / Node v24.14.0 (1.1.6) — 5 rounds x 5 iterations, median.

| Operation | Time | Notes |
| --- | --- | --- |
| printStructural (12,161 nodes) | 2.0 ms | Lossless round-trip verified |
| buildZones (→ 1,873 zones) | 0.87 ms | From 6,235 top-level structural nodes |
| walkTokens (9,042 visited) | 0.39 ms | Pre-order DFS over 9,042 render tokens |
| mapTokens identity | 0.86 ms | Returns same tree, no allocations beyond wrapper |
| mapTokens transform | 1.2 ms | Renames bold → strong across entire tree |

All five operations finish in about 2 ms or less on this document. In an editor pipeline where parseStructural (~24.6 ms on 1.1.6) is the bottleneck, printStructural / buildZones / walkTokens are effectively free.
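If you want to reproduce numbers like these on your own documents, a "rounds x iterations, median" harness can be sketched as below. This is an assumption about the methodology, not the script used for the table: `medianOf` and `benchSketch` are hypothetical names, and the sketch takes the mean time per iteration within each round and then the median across rounds.

```typescript
// Median of a non-empty list of numbers.
function medianOf(values: number[]): number {
    if (values.length === 0) throw new Error("medianOf needs at least one value");
    const sorted = [...values].sort((a, b) => a - b);
    const mid = Math.floor(sorted.length / 2);
    return sorted.length % 2 === 1
        ? sorted[mid]
        : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Time `fn` in several rounds; each round averages over `iterations` calls.
// Returns the median per-call time in milliseconds.
function benchSketch(fn: () => void, rounds = 5, iterations = 5): number {
    const perRound: number[] = [];
    for (let r = 0; r < rounds; r++) {
        const start = performance.now();
        for (let i = 0; i < iterations; i++) fn();
        perRound.push((performance.now() - start) / iterations);
    }
    return medianOf(perRound);
}

// Usage with a stand-in workload; swap in e.g. () => printStructural(tree).
const sampleMs = benchSketch(() => {
    let sum = 0;
    for (let i = 0; i < 10_000; i++) sum += i;
    return void sum;
});
```

Using the median across rounds keeps a single slow round (GC pause, JIT warm-up) from skewing the reported figure.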

For parseRichText and parseStructural benchmarks, see Performance. For createEasyStableId benchmarks, see Stable Token IDs — Performance. For unescapeInline / splitTokensByPipe benchmarks, see Handler Utilities — Performance.