DSL Syntax - chiba233/yumeDSL GitHub Wiki

DSL Syntax


yume-dsl-rich-text has three tag forms. Memorize this diagram and you're good:


Three tag forms

┌─ Inline ─────────────────────────────────────────┐
│  $$bold(Hello $$italic(world)$$)$$               │
│                                                  │
│  → Opens and closes within text flow             │
│  → Content parsed recursively (nesting!)         │
│  → Most common: bold, italic, links              │
│                                                  │
│  Shorthand (1.3): $$bold(Hello italic(world))$$  │
│  → Requires implicitInlineShorthand              │
│  → Closed by `)` alone, no `$$` wrapper          │
└──────────────────────────────────────────────────┘

┌─ Raw ────────────────────────────────────────────┐
│  $$code(typescript)%                             │
│  const x = 1;                                    │
│  %end$$                                          │
│                                                  │
│  → Content NOT parsed, kept verbatim             │
│  → Close marker %end$$ must be on its own line   │
│  → For code blocks, math, etc.                   │
└──────────────────────────────────────────────────┘

┌─ Block ──────────────────────────────────────────┐
│  $$info(Note)*                                   │
│  This is $$bold(important)$$ info.               │
│  *end$$                                          │
│                                                  │
│  → Content parsed recursively (like inline)      │
│  → Close marker *end$$ must be on its own line   │
│  → For callouts, warnings, collapsibles          │
└──────────────────────────────────────────────────┘

Quick comparison

Form              Syntax                  Content parsed?  Close marker            Typical use
Inline            $$tag(content)$$        Yes              )$$ in text flow        Bold, links, inline annotations
Inline shorthand  tag(content)            Yes              ) in text flow          Same (requires implicitInlineShorthand, since 1.3)
Raw               $$tag(arg)% ... %end$$  No               %end$$ on its own line  Code, math
Block             $$tag(arg)* ... *end$$  Yes              *end$$ on its own line  Callouts, collapsibles

Tag names

Allowed: a-z, A-Z, 0-9, _, -. First character can't be a digit or -.

✅ bold  myTag  h1  code_block  custom-tag
❌ 1tag  -tag
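
The naming rule above fits in a single regular expression. The sketch below is a standalone illustration, not the library's own validation code: `isValidTagName` and `TAG_NAME_PATTERN` are hypothetical names, and the pattern assumes that an underscore is acceptable as a first character (the rule only excludes digits and `-`). It covers the default character set only.

```typescript
// Hypothetical helper, not part of yume-dsl-rich-text.
// First character: a letter or underscore (assumed). Rest: letters, digits, _ or -.
const TAG_NAME_PATTERN = /^[A-Za-z_][A-Za-z0-9_-]*$/;

function isValidTagName(name: string): boolean {
  return TAG_NAME_PATTERN.test(name);
}

// Print the verdict for the names listed above.
const sampleTagNames = ["bold", "myTag", "h1", "code_block", "custom-tag", "1tag", "-tag"];
for (const name of sampleTagNames) {
  console.log(name, isValidTagName(name));
}
```

The first five names pass and the last two fail, matching the ✅ / ❌ lists.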

Want colons, dots, etc.? See Custom Tag Name Characters.


Inline tags

$$tagName(content)$$

The most common form. Content is parsed recursively, so tags can nest:

$$bold(Hello $$italic(beautiful)$$ world)$$

→ bold wrapping an italic is perfectly valid. Nest as deep as you want (default limit: 50 levels).

Implicit inline shorthand

Since 1.3. Requires the implicitInlineShorthand option to be enabled.

Inside an inline argument context, registered tag names can use a shorter name(...) form β€” no $$ prefix needed:

$$bold(Hello italic(world))$$

is equivalent to:

$$bold(Hello $$italic(world)$$)$$

Rules:

  • Only works inside inline args, never at top level.
  • Full DSL syntax ($$tag(...)$$) always takes priority over shorthand.
  • Literal parentheses in shorthand args must be escaped: \( and \).
  • Only tags registered in handlers (and supporting inline form) are recognized as shorthand.

Token competition and nearest match

The parser is a stack-top-priority stepping state machine. It scans characters left to right, always checking against the topmost frame's close token. Two core rules:

  1. Nearest match: the topmost frame gets first dibs on every token. If it matches, the frame closes. If not, the token is treated as text and the scanner moves on. No lookahead, no future prediction.
  2. Longer token wins at the same position: when the shorthand close token `)` is a prefix of the full endTag `)$$`, and the full endTag matches at the current position, endTag takes priority. The shorthand yields and degrades to plain text. This is local lexical disambiguation, not syntactic lookahead.
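
The two rules can be modeled as one small decision function. This is a simplified sketch, not the parser's actual code: `decideAt`, `FrameKind` and `Decision` are hypothetical names, and the function looks only at the topmost frame and the current position, ignoring escapes and everything else the real state machine tracks.

```typescript
// Hypothetical model of "nearest match" + "longer token wins".
type FrameKind = "full" | "shorthand";
type Decision = "close" | "yield" | "text";

function decideAt(
  input: string,
  pos: number,
  top: FrameKind,
  endTag: string = ")$$",
  tagClose: string = ")",
): Decision {
  const fullMatches = input.startsWith(endTag, pos);
  if (top === "full") {
    // A full-form frame only reacts to the complete endTag.
    return fullMatches ? "close" : "text";
  }
  // Shorthand frame on top: if the longer endTag matches here, the shorthand yields.
  if (fullMatches) return "yield";
  return input.startsWith(tagClose, pos) ? "close" : "text";
}

// Index 14 is the ")" right after "hi"; it is also the start of ")$$".
console.log(decideAt("$$bold(bold(hi)$$)$$", 14, "shorthand")); // "yield"
// Index 12 is the first ")" after "c"; the text there is ")))", not ")$$".
console.log(decideAt("$$bold(a(b(c)))$$", 12, "shorthand")); // "close"
```

Passing `endTag` and `tagClose` as parameters lets the same check run against a custom syntax such as `>=` / `>`.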

Why token disambiguation is needed

The shorthand close token ) is a prefix of the full close token )$$. Without disambiguation, the shorthand would split )$$ into ) + $$, breaking the outer tag's close token atomicity.

$$bold(bold(hi)$$)$$
              ^
shorthand bold('s ) happens to be part of )$$
→ if shorthand takes ), )$$ is broken, outer $$bold(...) can never close
→ disambiguation: )$$ fully matches, shorthand yields, "bold(hi" degrades to text
→ outer $$bold(...) closes normally ✅

) not overlapping )$$: shorthand closes normally

$$bold(bold(hi $$italic(world)$$))$$
$$bold(     -> push bold (wants )$$)
bold(       -> push shorthand (wants ))
hi          -> text
$$italic(   -> push italic (wants )$$)
world       -> text
)$$         -> italic closes
)           -> shorthand wants ), next chars are )$$ not $$
             -> current position is NOT )$$ -> shorthand closes normally
)$$         -> bold closes

Result: bold -> [shorthand bold -> [text "hi ", italic("world")]]. All three layers close correctly.

Shorthand yields, rescan finds two )$$

$$bold(bold(hi $$italic(world)$$)$$

Note the tail: world + )$$ (italic close) + )$$ (bold close).

$$bold(     -> push bold (wants )$$)
bold(       -> push shorthand (wants ))
hi          -> text
$$italic(   -> push italic (wants )$$)
world       -> text
)$$         -> italic closes
)$$         -> shorthand wants ), but )$$ fully matches
             -> yields, "bold(" degrades to text, bold rescans from argStartI
hi          -> text (bold rescanning)
$$italic(   -> push italic (wants )$$) (bold re-encounters same segment)
world       -> text
)$$         -> italic closes (first )$$)
)$$         -> bold closes (second )$$)

Result: bold's children are "bold(hi " + italic("world"). The shorthand degraded to text, italic was re-parsed, and the two )$$ close italic and bold respectively.

Shorthands nesting inside shorthands

$$bold(a(b(c)))$$
$$bold(   -> push bold (wants )$$)
a(        -> push shorthand a (wants ))
b(        -> push shorthand b (wants ))
c         -> text
)         -> b closes (next chars are ))$$, not $$)
)         -> a closes (next chars are )$$, not $$)
)$$       -> bold closes

Each ) doesn't overlap with )$$, so nearest match applies cleanly at every level.

Classic cases (verified against current behavior)

All outputs below are from real parseRichText + extractText runs.

Default syntax ($$ / ) / )$$)

Input                                                     implicitInlineShorthand=false        implicitInlineShorthand=true
$$bold(bold(1))$$                                         bold(1)                              1
$$bold(Hello italic(world))$$                             Hello italic(world)                  Hello world
$$bold(天気がbold(い$$italic(い)$$)から)$$散歩しましょう  天気がbold(いい)から散歩しましょう  天気がいいから散歩しましょう

Note: in default syntax, shorthand is only active inside inline-arg context; the switch only affects name(...).

Custom syntax (tagPrefix="=", tagOpen="<", tagClose=">", endTag=">=")

Input                                                     implicitInlineShorthand=false                     implicitInlineShorthand=true
=bold<bold<>=                                             bold<                                             bold<
=bold<bold<1>>=                                           bold<1>                                           1
=bold<bold<=bold<>=>=                                     bold<                                             bold<
=bold<天気がbold<い=italic<い>=>から>=散歩しましょう      天気がbold<いい>から散歩しましょう                天気がいいから散歩しましょう
=bold<天気がbold<いlink<baidu.com>=>から>=散歩しましょう  天気がbold<いlink<baidu.com>から>=散歩しましょう  天気がbold<いlink<baidu.com>から>=散歩しましょう

Note: malformed-input recovery is intentionally local (local success/local failure at nearest boundaries), and is not guaranteed to be byte-for-byte identical to historical versions.

Ownership poison case (shorthand must be rejected)

Input:

=bold<天気がbold<いlink<baidu.com>=>から>=散歩しましょう

Expected (current spec):

天気がbold<いlink<baidu.com>から>=散歩しましょう

Intuitively, link<baidu.com> looks like a well-formed shorthand tag. But applying the rules:

  1. The >= after baidu.com is a complete endTag, not "link's > followed by a stray =".
  2. Full close wins: >= is indivisible and belongs to the outer =bold<...>=, which closes.
  3. The inner bold< and link< have no independent close token, so they degrade locally to text.

If > were allowed to win instead (letting link close), the shorthand would close and return to root context. But root cannot host shorthand, so the entire deduction chain becomes self-contradictory. EndTag priority is not a preference; it is the only logically consistent rule.

To make link close normally, just insert a space between > and = so they don't form an endTag:

=bold<天気がbold<いlink<baidu.com> =>から>=散歩しましょう

Rules are explicit; the user is in control.

See ParseOptions β€” implicitInlineShorthand for configuration.


Raw tags

$$tagName(arg)%
raw content, not parsed
even $$fake(tags)$$ are kept as-is
%end$$

arg in parentheses is an optional parameter (e.g., language name for code). %end$$ must be on its own line.

Example (code block):

$$code(typescript)%
function greet(name: string) {
    return `Hello, ${name}!`;
}
%end$$

Handler receives: arg = "typescript", content = "function greet(name: string) {...}".
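
To make the arg / content split concrete, here is a rough, regex-based sketch that pulls both parts out of a single raw tag. It is an illustration only: the library does not parse with this regex, `matchRawTag` and `RawTag` are hypothetical names, and the pattern assumes a non-empty body, an argument without `)`, and a `%end$$` line with nothing else on it.

```typescript
// Hypothetical extractor for one raw tag; the real parser is a state machine.
interface RawTag {
  name: string;
  arg: string;
  content: string;
}

function matchRawTag(source: string): RawTag | null {
  // $$name(arg)% newline body newline %end$$ alone on its line
  const pattern = /^\$\$([A-Za-z_][A-Za-z0-9_-]*)\(([^)]*)\)%\n([\s\S]*?)\n%end\$\$$/m;
  const m = pattern.exec(source);
  if (m === null) return null;
  return { name: m[1] ?? "", arg: m[2] ?? "", content: m[3] ?? "" };
}

const rawSource = [
  "$$code(typescript)%",
  "const x = 1;",
  "even $$fake(tags)$$ are kept as-is",
  "%end$$",
].join("\n");

console.log(matchRawTag(rawSource));
```

The body comes back verbatim, including the `$$fake(tags)$$` line, which is the point of the raw form.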


Block tags

$$tagName(arg)*
content is parsed recursively, other tags work inside
*end$$

*end$$ must be on its own line. Block tags are meant for larger structural containers.

Example (info callout):

$$info(Note)*
This is $$bold(important)$$ information.
*end$$

Pipe parameters

Inside any tag's parentheses, | separates parameters:

$$link(Click here | https://example.com)$$
$$link(Click here | https://example.com | _blank)$$

Works in raw/block tags too:

$$code(typescript | line-numbers)%
const x = 1;
%end$$

Use parsePipeArgs / PipeArgs in your handler to split them.

Escaping a literal pipe

Use \| for a pipe that shouldn't be treated as a separator:

$$tag(a\|b | c)$$
→ two params: "a|b" and "c"
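
The splitting rule can be sketched in a few lines. This is not the library's parsePipeArgs (its signature is not shown on this page); `splitPipes` is a hypothetical stand-in, and trimming whitespace around each parameter is an assumption inferred from the "a|b" / "c" result above.

```typescript
// Hypothetical stand-in for pipe splitting; NOT the library's parsePipeArgs.
function splitPipes(arg: string, escapeChar: string = "\\"): string[] {
  const parts: string[] = [];
  let current = "";
  for (let i = 0; i < arg.length; i++) {
    const ch = arg.charAt(i);
    if (ch === escapeChar && arg.charAt(i + 1) === "|") {
      current += "|"; // escaped pipe becomes a literal pipe
      i++;            // skip the pipe we just consumed
    } else if (ch === "|") {
      parts.push(current.trim()); // assumption: surrounding whitespace is trimmed
      current = "";
    } else {
      current += ch;
    }
  }
  parts.push(current.trim());
  return parts;
}

console.log(splitPipes("Click here | https://example.com | _blank"));
console.log(splitPipes("a\\|b | c")); // the DSL text a\|b | c
```

In a real handler, prefer the library's own parsePipeArgs / PipeArgs; the sketch only shows the rule.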

Escape sequences

Backslash \ is the default escape character. Escaping only works for tokens that have structural meaning in the current context. If a token has no special meaning at the current position, the backslash is kept as-is.

Escapable tokens by context

In default syntax, tagOpen = `(`, tagClose = `)`, endTag = `)$$`. Note that `$$` is the tagPrefix, not an independent token, so it cannot be escaped.

Root (top-level text)
  Escapable tokens: `(` (tagOpen), `)` (tagClose), `)$$` (endTag)
  Notes: Only tokens that can open or close tags. `|`, `\`, etc. have no structural meaning at root, so the backslash is kept verbatim.

Args (tag argument region)
  Escapable tokens: tagOpen `(`, tagClose `)`, endTag `)$$`, tagDivider `|`, escapeChar `\`, rawOpen `)%`, blockOpen `)*`
  Notes: Richest syntax context, most escapable tokens. But rawClose (`%end$$`) and blockClose (`*end$$`) have no meaning inside args, so they are not escapable.

Block content (block tag body)
  Escapable tokens: tagOpen `(`, tagClose `)`, endTag `)$$`, blockClose `*end$$`
  Notes: Block bodies parse recursively, so you can escape structural tokens plus the block's own close marker.

Raw content
  Escapable tokens: rawClose `%end$$`
  Notes: parseStructural does not generate escape nodes inside raw; content is kept verbatim. parseRichText recognizes `\%end$$` and unescapes it to `%end$$` in the output.

Common escapes inside args

Sequence  Output  Purpose
\(        (       Prevent opening a tag
\)        )       Prevent closing a tag
\)$$      )$$     Prevent endTag
\|        |       Prevent pipe separation
\\        \       Literal backslash

Common escapes inside block content

Sequence  Output  Purpose
\(        (       Prevent opening a tag
\)$$      )$$     Prevent endTag
\*end$$   *end$$  Prevent block close marker

Common escapes at root

Sequence  Output  Purpose
\(        (       Prevent opening a tag
\)$$      )$$     Prevent endTag

Note: At root level, \|, \\, \%end$$, \*end$$ etc. are not escaped. The backslash is kept verbatim, because these tokens have no structural meaning at root.

The escape character itself can be changed via SyntaxConfig.escapeChar. See Custom Syntax.
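
The context-dependent behavior can be illustrated with a small unescape function. This is a sketch under stated assumptions, not the parser's implementation: `unescapeIn`, `EscapeContext` and `ESCAPABLE` are hypothetical names, the token lists are copied from the contexts described above for the default syntax, and the function only rewrites escape sequences without doing any tag parsing.

```typescript
// Hypothetical illustration of context-dependent escaping (default syntax only).
type EscapeContext = "root" | "args" | "block" | "raw";

// Longer tokens come first so ")$$" is tried before ")".
const ESCAPABLE: Record<EscapeContext, string[]> = {
  root: [")$$", "(", ")"],
  args: [")$$", ")%", ")*", "(", ")", "|", "\\"],
  block: ["*end$$", ")$$", "(", ")"],
  raw: ["%end$$"],
};

function unescapeIn(text: string, context: EscapeContext): string {
  let out = "";
  let i = 0;
  while (i < text.length) {
    if (text.charAt(i) === "\\") {
      const token = ESCAPABLE[context].find((t) => text.startsWith(t, i + 1));
      if (token !== undefined) {
        out += token;          // drop the backslash, keep the token literally
        i += 1 + token.length;
        continue;
      }
    }
    out += text.charAt(i);     // no structural meaning here: keep as-is
    i++;
  }
  return out;
}

console.log(unescapeIn("a\\|b", "args")); // a|b
console.log(unescapeIn("a\\|b", "root")); // a\|b (backslash kept)
```

The same input produces different output depending on the context, which is the rule the text describes.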


Graceful degradation rules

The parser never crashes or throws on malformed input. All unexpected patterns degrade to literal plain text, with errors reported when possible.

Degradation principles

All degradation behavior is derived from two core rules, with no special cases:

  1. Nearest match: the topmost stack frame gets first dibs on every token. A close token belongs to the nearest matching frame. No skipping, no lookahead.
  2. Full close wins: when a short token (e.g. `)`) is a prefix of a longer token (e.g. `)$$`), and the longer token fully matches at the current position, the longer token wins.

From these two rules, one degradation strategy follows: local error, local degradation.

  • Correctly written parts keep their structure; only the layer that is actually broken degrades to plain text.
  • The blast radius of an error is precisely contained to the nearest layer; parent and sibling nodes are unaffected.
  • The parser never guesses user intent and never attempts heuristic "maximize salvage" recovery. Determinism is preferred over cosmetically nicer output.

Example (shorthand yields to protect the outer structure):

$$bold(bold(hi)$$)$$

The )$$ after bold(hi is both the shorthand's ) and the outer bold's )$$. Full close wins, so )$$ belongs to the outer bold; the shorthand bold( has no close token and degrades to plain text. The outer bold closes normally; its content is bold(hi. The trailing )$$ is a stray close marker.

If ) were allowed to win instead, the shorthand would close and split )$$ into ) + $$, destroying the outer bold's close token, and the entire tree would collapse.

Overview

Situation                                       Behavior                                                Error code
Handler doesn't implement the form user wrote   Entire markup degrades to plain text                    None (silent)
Tag not registered (not in handlers)            Parsed as inline form                                   None
Nesting exceeds depthLimit                      Tag head degrades to plain text                         DEPTH_LIMIT
Unbalanced brackets (missing close bracket)     Forced inline child frame, character-by-character scan  Depends on inner state
Close marker missing (EOF without close)        Content falls back to plain text                        INLINE_NOT_CLOSED / SHORTHAND_NOT_CLOSED / RAW_NOT_CLOSED / BLOCK_NOT_CLOSED
%end$$ / *end$$ malformed                       Content falls back to plain text                        RAW_CLOSE_MALFORMED / BLOCK_CLOSE_MALFORMED
Stray )$$ with no matching open                 Output as plain text                                    UNEXPECTED_CLOSE
Shorthand ) competes with )$$ at same position  Shorthand yields, tag head degrades to plain text       None (yield only)

Form mismatch degradation

When a handler only declares some forms, writing an undeclared form makes the entire markup degrade to literal text, with no error.

// handler only declares raw
code: { raw: (arg, content) => ({ type: "code", value: content }) }

// user writes inline form → degrades to plain text
$$code(hello world)$$
→ plain text output: "$$code(hello world)$$"

Decision rules (supportsInlineForm table, first match wins top-to-bottom):

Condition                               Result
Global allowInline = false              Reject inline
Handler missing + tag not registered    Allow inline (passthrough)
Handler missing + tag is registered     Reject (filtered out by allowForms)
Handler declares inline                 Allow
Handler only declares raw and/or block  Reject
Handler is empty {}                     Allow (passthrough)
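
Read top to bottom, the decision table maps onto a chain of early returns. The function below is a sketch of that reading, not the library's supportsInlineForm: `inlineFormAllowed` and `HandlerForms` are hypothetical names, and "registered" is modeled as a plain boolean supplied by the caller.

```typescript
// Hypothetical transcription of the decision table; first matching row wins.
interface HandlerForms {
  inline?: unknown;
  raw?: unknown;
  block?: unknown;
}

function inlineFormAllowed(
  allowInline: boolean,
  handler: HandlerForms | undefined,
  isRegistered: boolean,
): boolean {
  if (!allowInline) return false;                       // global switch off
  if (handler === undefined) return !isRegistered;      // passthrough only if unregistered
  if (handler.inline !== undefined) return true;        // handler declares inline
  if (handler.raw !== undefined || handler.block !== undefined) return false; // raw/block only
  return true;                                          // empty {} handler: passthrough
}

const noop = (): null => null;
console.log(inlineFormAllowed(true, { raw: noop }, true));               // false
console.log(inlineFormAllowed(true, { inline: noop, raw: noop }, true)); // true
```

The two calls correspond to the "raw only" and "inline + raw" handler shapes that matter when nesting raw tags inside inline arguments.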

⚠️ Nesting raw / block tags inside inline arguments

This is the most common gotcha. To nest a raw or block tag inside an inline argument, the handler must also declare inline.

Why

The inline argument scanner advances character-by-character and checks every nested tag against supportsInlineForm before pushing a child frame. Only tags that pass this check enter the child frame; only inside the child frame can the parser see )% or )* to switch to the raw / block branch.

                    ┌── must pass supportsInlineForm ──┐
                    │                                  │
$$bold( ... $$code(arg)%...%end$$ ... )$$
              │
              └── code only has raw, no inline
                  → can't enter child frame → entire $$code(...)%...%end$$ becomes plain text

Wrong

// ❌ code only has raw → degrades to plain text when nested inside inline args
const handlers = {
    bold: { inline: (tokens) => ({ type: "bold", value: tokens }) },
    code: { raw: (arg, content) => ({ type: "code", value: content }) },
};

// $$bold(before $$code(ts)%const x = 1;%end$$ after)$$
// → code section inside bold becomes plain text

Correct

// ✅ code also declares inline → parser enters child frame, then sees )% and switches to raw
const handlers = {
    bold: { inline: (tokens) => ({ type: "bold", value: tokens }) },
    code: {
        inline: (tokens) => ({ type: "code", value: tokens }),  // ← add this
        raw: (arg, content) => ({ type: "code", value: content }),
    },
};

Note: The inline handler doesn't need complex logic. It is just a "ticket" that lets the parser push a child frame for that tag. If your tag doesn't semantically need an inline form, the inline handler can return a fallback output (e.g., echo the text as-is).

The same applies to block tags nested inside inline arguments. They also need inline declared:

// ✅ warn declares inline + block → can nest block form inside inline args
warn: {
    inline: (tokens) => ({ type: "warn", value: tokens }),
    block: (arg, content) => ({ type: "warn", value: content }),
}

Nesting depth limit

When depth exceeds depthLimit (default 50), the tag head degrades to plain text with a DEPTH_LIMIT error. Outer tags are unaffected.

// with depthLimit: 3
$$a($$b($$c($$d(too deep)$$)$$)$$)$$
            ^^^^^^^^^^^^^^^
            degrades to plain text "$$d(too deep)$$"

Shorthand form also respects depthLimit (fixed in 1.3.1).
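
The depth check can be sketched with a simple counter. This is a simplified model, not the parser's code: `findDepthLimitHits` is a hypothetical name, it recognizes only full-form `$$name(` heads and `)$$` closes in the default syntax, and it assumes a degraded head still consumes its own `)$$` so the outer tags stay balanced. How the real parser treats tags nested inside a degraded head is not covered here.

```typescript
// Hypothetical sketch: returns the start index of every tag head that would
// exceed depthLimit. Only full-form tags in default syntax are considered.
function findDepthLimitHits(input: string, depthLimit: number): number[] {
  const openPattern = /^\$\$[A-Za-z_][A-Za-z0-9_-]*\(/;
  const hits: number[] = [];
  let depth = 0; // open heads so far, including degraded ones (assumption)
  let i = 0;
  while (i < input.length) {
    const open = openPattern.exec(input.slice(i));
    if (open !== null) {
      if (depth >= depthLimit) hits.push(i); // this head would be one level too deep
      depth++;
      i += open[0].length;
    } else if (depth > 0 && input.startsWith(")$$", i)) {
      depth--;
      i += 3;
    } else {
      i++;
    }
  }
  return hits;
}

console.log(findDepthLimitHits("$$a($$b($$c($$d(too deep)$$)$$)$$)$$", 3)); // [ 12 ]
```

Index 12 is where `$$d(` starts, matching the span marked in the example above.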


Unbalanced brackets (fixed in 1.3.1)

When brackets inside the argument region don't balance (e.g., raw parentheses in content, or a missing close bracket), the fast argument-close scanner fails. In this case:

  • The tag does not degrade entirely to plain text
  • A forced inline child frame scans character-by-character for the real close token `)$$`
  • Only the innermost unbalanced tag is affected; outer tags are preserved

$$bold(Hello $$italic(world)$$)$$
→ normal parse ✅

$$bold(Hello $$italic(world)$$ )$$
                              ^ extra space but brackets balance → normal ✅

$$bold(some text with ( inside)$$
                      ^ raw bracket → forced inline fallback → bold still closes correctly ✅

Unclosed tags

Tag opened but no close marker found before EOF:

  • inline: content falls back to plain text, reports INLINE_NOT_CLOSED
  • raw: content falls back to plain text, reports RAW_NOT_CLOSED
  • block: content falls back to plain text, reports BLOCK_NOT_CLOSED
  • shorthand: content falls back to plain text, reports SHORTHAND_NOT_CLOSED (may co-occur with INLINE_NOT_CLOSED when the outer full-form tag also remains unclosed)

$$bold(never closed
→ plain text: "$$bold(never closed"
→ error: INLINE_NOT_CLOSED

Malformed close markers

%end$$ / *end$$ not on its own line:

$$code(ts)%
const x = 1;  %end$$     ← not at line start
%end$$                     ← correct position

Reports RAW_CLOSE_MALFORMED / BLOCK_CLOSE_MALFORMED, content falls back to plain text.
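
The own-line requirement can be checked line by line. The sketch below is an illustration only: `classifyCloseLines` and `CloseLine` are hypothetical names, it assumes the marker must be the entire line (whether the real parser tolerates surrounding whitespace is not stated here), and it ignores escaped markers such as \%end$$.

```typescript
// Hypothetical line classifier for raw/block close markers.
type CloseLineStatus = "close" | "malformed";

interface CloseLine {
  line: number; // zero-based line index within the body
  status: CloseLineStatus;
}

function classifyCloseLines(body: string, marker: string): CloseLine[] {
  const result: CloseLine[] = [];
  body.split("\n").forEach((text, line) => {
    if (text === marker) {
      result.push({ line, status: "close" });     // marker alone on its line
    } else if (text.includes(marker)) {
      result.push({ line, status: "malformed" }); // marker shares the line with other text
    }
  });
  return result;
}

const rawBody = ["const x = 1;  %end$$", "%end$$"].join("\n");
console.log(classifyCloseLines(rawBody, "%end$$"));
```

For the example above, line 0 is flagged as malformed and line 1 as a valid close; the same function works for `*end$$`.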


Stray close markers

)$$ appearing without a matching open:

Hello )$$ world
→ plain text: "Hello )$$ world"
→ error: UNEXPECTED_CLOSE