en DSL Syntax - chiba233/yumeDSL GitHub Wiki
yume-dsl-rich-text has three tag forms. Memorize this diagram and you're good:
┌─ Inline ─────────────────────────────────────────┐
│ $$bold(Hello $$italic(world)$$)$$                │
│                                                  │
│ • Opens and closes within text flow              │
│ • Content parsed recursively (nesting!)          │
│ • Most common: bold, italic, links               │
│                                                  │
│ Shorthand (1.3): $$bold(Hello italic(world))$$   │
│ • Requires implicitInlineShorthand               │
│ • Closed by `)` alone, no `$$` wrapper           │
└──────────────────────────────────────────────────┘
┌─ Raw ────────────────────────────────────────────┐
│ $$code(typescript)%                              │
│ const x = 1;                                     │
│ %end$$                                           │
│                                                  │
│ • Content NOT parsed, kept verbatim              │
│ • Close marker %end$$ must be on its own line    │
│ • For code blocks, math, etc.                    │
└──────────────────────────────────────────────────┘
┌─ Block ──────────────────────────────────────────┐
│ $$info(Note)*                                    │
│ This is $$bold(important)$$ info.                │
│ *end$$                                           │
│                                                  │
│ • Content parsed recursively (like inline)       │
│ • Close marker *end$$ must be on its own line    │
│ • For callouts, warnings, collapsibles           │
└──────────────────────────────────────────────────┘
| Form | Syntax | Content parsed? | Close marker | Typical use |
|---|---|---|---|---|
| Inline | `$$tag(content)$$` | Yes | `)$$` in text flow | Bold, links, inline annotations |
| Inline shorthand | `tag(content)` | Yes | `)` in text flow | Same as inline (requires `implicitInlineShorthand`, since 1.3) |
| Raw | `$$tag(arg)% ... %end$$` | No | `%end$$` on its own line | Code, math |
| Block | `$$tag(arg)* ... *end$$` | Yes | `*end$$` on its own line | Callouts, collapsibles |
Allowed: a-z, A-Z, 0-9, _, -. First character can't be a digit or -.
✅ bold myTag h1 code_block custom-tag
❌ 1tag -tag
Want colons, dots, etc.? → Custom Tag Name Characters
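The default naming rule can be checked with a single regular expression. The sketch below is an illustration derived from the rule stated above, not the library's own validation code; `TAG_NAME_PATTERN` and `isValidTagName` are hypothetical names.

```typescript
// Assumed pattern, derived from the rule above (not taken from the library source):
// first character is a letter or underscore, the rest may also be digits or hyphens.
const TAG_NAME_PATTERN = /^[A-Za-z_][A-Za-z0-9_-]*$/;

function isValidTagName(name: string): boolean {
  return TAG_NAME_PATTERN.test(name);
}

const samples = ["bold", "myTag", "h1", "code_block", "custom-tag", "1tag", "-tag"];
for (let i = 0; i < samples.length; i++) {
  console.log(samples[i] + ": " + (isValidTagName(samples[i]) ? "valid" : "invalid"));
}
```

Custom tag name characters (colons, dots, and so on) would need a different pattern; this sketch only covers the defaults.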
$$tagName(content)$$
The most common form. Content is parsed recursively, tags can nest:
$$bold(Hello $$italic(beautiful)$$ world)$$
→ bold wrapping an italic: totally valid. Nest as deep as you want (default limit: 50 levels).
Since 1.3. Requires the `implicitInlineShorthand` option to be enabled.
Inside an inline argument context, registered tag names can use a shorter name(...) form, with no $$ prefix needed:
$$bold(Hello italic(world))$$
is equivalent to:
$$bold(Hello $$italic(world)$$)$$
Rules:
- Only works inside inline args, never at top level.
- Full DSL syntax (`$$tag(...)$$`) always takes priority over shorthand.
- Literal parentheses in shorthand args must be escaped: `\(` and `\)`.
- Only tags registered in `handlers` (and supporting inline form) are recognized as shorthand.
The parser is a stack-top-priority stepping state machine. It scans characters left to right, always checking against the topmost frame's close token. Two core rules:
- Nearest match: the topmost frame gets first dibs on every token. If it matches, the frame closes. If not, the token is treated as text and the scanner moves on. No lookahead, no future prediction.
- Longer token wins at the same position: when the shorthand close token (`)`) is a prefix of the full endTag (`)$$`), and the full endTag matches at the current position, endTag takes priority; the shorthand yields and degrades to plain text. This is local lexical disambiguation, not syntactic lookahead.
The shorthand close token `)` is a prefix of the full close token `)$$`. Without disambiguation, the shorthand would split `)$$` into `)` + `$$`, breaking the outer tag's close token atomicity.
$$bold(bold(hi)$$)$$
              ^
              shorthand bold('s ) happens to be part of )$$
→ if shorthand takes ), )$$ is broken, outer $$bold(...) can never close
→ disambiguation: )$$ fully matches, shorthand yields, "bold(hi" degrades to text
→ outer $$bold(...) closes normally ✓
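The "longer token wins" check is a comparison at a single position. The following is a minimal sketch of that decision for the default syntax, assuming only the two frame kinds discussed here; `classifyAt`, `FrameKind`, and `CloseDecision` are hypothetical names and do not mirror the library's internals.

```typescript
type FrameKind = "full" | "shorthand";
type CloseDecision = "endTag" | "shorthandClose" | "text";

const END_TAG = ")$$";
const SHORTHAND_CLOSE = ")";

// Decide what the character at `pos` means for the topmost frame.
// The full endTag is tested first, so the shorthand ")" can never split ")$$".
function classifyAt(input: string, pos: number, top: FrameKind): CloseDecision {
  if (input.slice(pos, pos + END_TAG.length) === END_TAG) {
    // For a shorthand frame this means "yield"; for a full frame it means "close".
    return "endTag";
  }
  if (top === "shorthand" && input.charAt(pos) === SHORTHAND_CLOSE) {
    return "shorthandClose";
  }
  return "text";
}

// Index 14 is the ")" right after "hi": it is the start of ")$$", so endTag wins.
console.log(classifyAt("$$bold(bold(hi)$$)$$", 14, "shorthand")); // "endTag"
// Index 12 is the first ")" after "c": followed by "))", so the shorthand closes.
console.log(classifyAt("$$bold(a(b(c)))$$", 12, "shorthand")); // "shorthandClose"
```

Only the current position is inspected, which is why the rule counts as lexical disambiguation and not lookahead.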
$$bold(bold(hi $$italic(world)$$))$$
$$bold( -> push bold (wants )$$)
bold( -> push shorthand (wants ))
hi -> text
$$italic( -> push italic (wants )$$)
world -> text
)$$ -> italic closes
) -> shorthand wants ), next chars are )$$ not $$
-> current position is NOT )$$ -> shorthand closes normally
)$$ -> bold closes
Result: bold -> [shorthand bold -> [text "hi ", italic("world")]]. All three layers close correctly.
$$bold(bold(hi $$italic(world)$$)$$
Note the tail: world + )$$ (italic close) + )$$ (bold close).
$$bold( -> push bold (wants )$$)
bold( -> push shorthand (wants ))
hi -> text
$$italic( -> push italic (wants )$$)
world -> text
)$$ -> italic closes
)$$ -> shorthand wants ), but )$$ fully matches
-> yields, "bold(" degrades to text, bold rescans from argStartI
hi -> text (bold rescanning)
$$italic( -> push italic (wants )$$) (bold re-encounters same segment)
world -> text
)$$ -> italic closes (first )$$)
)$$ -> bold closes (second )$$)
Result: bold's children are "bold(hi " + italic("world"). The shorthand degraded to text, italic was re-parsed, and the two )$$ close italic and bold respectively.
$$bold(a(b(c)))$$
$$bold( -> push bold (wants )$$)
a( -> push shorthand a (wants ))
b( -> push shorthand b (wants ))
c -> text
) -> b closes (next chars are ))$$, not $$)
) -> a closes (next chars are )$$, not $$)
)$$ -> bold closes
Each ) doesn't overlap with )$$, so nearest match applies cleanly at every level.
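The traces above can be reproduced with a small toy model. The sketch below is a recursive simplification written for illustration, not the library's stack machine: it handles only the inline and shorthand forms of the default syntax, ignores escapes, raw/block forms, depth limits, and error reporting, and returns plain text only. All names (`scan`, `extractTextSketch`, `Mode`) are hypothetical.

```typescript
interface ScanResult { text: string; next: number }
type Mode = "root" | "full" | "shorthand";

const NAME = /^[A-Za-z_][A-Za-z0-9_-]*/;

// Returns the plain text of a frame, or null when the frame cannot close
// (the caller then emits the tag head as text and keeps scanning).
function scan(input: string, start: number, mode: Mode, tags: string[], shorthandOn: boolean): ScanResult | null {
  let text = "";
  let i = start;
  while (i < input.length) {
    // Rule 2: the full endTag is checked before the shorthand ")".
    if (input.slice(i, i + 3) === ")$$") {
      if (mode === "full") return { text: text, next: i + 3 };
      if (mode === "shorthand") return null; // shorthand yields
      text += ")$$"; // stray close at root stays as text
      i += 3;
      continue;
    }
    // Rule 1: the topmost frame gets first dibs on its own close token.
    if (mode === "shorthand" && input.charAt(i) === ")") {
      return { text: text, next: i + 1 };
    }
    let child: ScanResult | null = null;
    if (input.slice(i, i + 2) === "$$") {
      const m = NAME.exec(input.slice(i + 2));
      if (m && input.charAt(i + 2 + m[0].length) === "(") {
        child = scan(input, i + 3 + m[0].length, "full", tags, shorthandOn);
      }
    } else if (shorthandOn && mode !== "root") {
      // Shorthand: registered names only, never at top level.
      const m = NAME.exec(input.slice(i));
      if (m && tags.indexOf(m[0]) !== -1 && input.charAt(i + m[0].length) === "(") {
        child = scan(input, i + 1 + m[0].length, "shorthand", tags, shorthandOn);
      }
    }
    if (child) {
      text += child.text;
      i = child.next;
    } else {
      text += input.charAt(i); // degrade to plain text, one character at a time
      i++;
    }
  }
  return mode === "root" ? { text: text, next: i } : null; // EOF inside a tag: not closed
}

function extractTextSketch(input: string, tags: string[], shorthandOn: boolean): string {
  const result = scan(input, 0, "root", tags, shorthandOn);
  return result ? result.text : input;
}

const tags = ["bold", "italic"];
console.log(extractTextSketch("$$bold(bold(hi $$italic(world)$$))$$", tags, true)); // "hi world"
console.log(extractTextSketch("$$bold(bold(hi $$italic(world)$$)$$", tags, true));  // "bold(hi world"
console.log(extractTextSketch("$$bold(bold(hi)$$)$$", tags, true));                 // "bold(hi)$$"
```

Returning `null` from a shorthand frame plays the role of "yield and rescan": the parent falls back to emitting the head character by character and re-encounters the same segment, as in the second trace.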
All outputs below are from real parseRichText + extractText runs.
| Input | `implicitInlineShorthand=false` | `implicitInlineShorthand=true` |
|---|---|---|
| `$$bold(bold(1))$$` | `bold(1)` | `1` |
| `$$bold(Hello italic(world))$$` | `Hello italic(world)` | `Hello world` |
| `$$bold(天気がbold(い$$italic(い)$$)から)$$散歩しましょう` | `天気がbold(いい)から散歩しましょう` | `天気がいいから散歩しましょう` |
Note: in default syntax, shorthand is only active inside inline-arg context; the switch only affects name(...).
| Input | `implicitInlineShorthand=false` | `implicitInlineShorthand=true` |
|---|---|---|
| `=bold<bold<>=` | `bold<` | `bold<` |
| `=bold<bold<1>>=` | `bold<1>` | `1` |
| `=bold<bold<=bold<>=>=` | `bold<` | `bold<` |
| `=bold<天気がbold<い=italic<い>=>から>=散歩しましょう` | `天気がbold<いい>から散歩しましょう` | `天気がいいから散歩しましょう` |
| `=bold<天気がbold<いlink<baidu.com>=>から>=散歩しましょう` | `天気がbold<いlink<baidu.com>から>=散歩しましょう` | `天気がbold<いlink<baidu.com>から>=散歩しましょう` |
Note: malformed-input recovery is intentionally local (local success/local failure at nearest boundaries), and is not guaranteed to be byte-for-byte identical to historical versions.
Input:
=bold<天気がbold<いlink<baidu.com>=>から>=散歩しましょう
Expected (current spec):
天気がbold<いlink<baidu.com>から>=散歩しましょう
Intuitively, link<baidu.com> looks like a well-formed shorthand tag. But applying the rules:
- The `>=` after `baidu.com` is a complete endTag, not "link's `>` followed by a stray `=`".
- Full close wins: `>=` is indivisible and belongs to the outer `=bold<...>=`, which closes.
- The inner `bold<` and `link<` have no independent close token, so they degrade locally to text.
If > were allowed to win instead (letting link close), the shorthand would close and return to root context. But root cannot host shorthand, so the entire deduction chain becomes self-contradictory. EndTag priority is not a preference; it is the only logically consistent rule.
To make link close normally, just insert a space between > and = so they don't form an endTag:
=bold<天気がbold<いlink<baidu.com> =>から>=散歩しましょう
Rules are explicit; the user is in control.
See ParseOptions → implicitInlineShorthand for configuration.
$$tagName(arg)%
raw content, not parsed
even $$fake(tags)$$ are kept as-is
%end$$
arg in parentheses is an optional parameter (e.g., language name for code). %end$$ must be on its own line.
Example (code block):
$$code(typescript)%
function greet(name: string) {
return `Hello, ${name}!`;
}
%end$$
Handler receives: arg = "typescript", content = "function greet(name: string) {...}".
$$tagName(arg)*
content is parsed recursively, so other tags work inside
*end$$
*end$$ must be on its own line. For larger structural containers.
Example (info callout):
$$info(Note)*
This is $$bold(important)$$ information.
*end$$
Inside any tag's parentheses, | separates parameters:
$$link(Click here | https://example.com)$$
$$link(Click here | https://example.com | _blank)$$
Works in raw/block tags too:
$$code(typescript | line-numbers)%
const x = 1;
%end$$
Use parsePipeArgs / PipeArgs in your handler to split them.
Use \| for a pipe that shouldn't be treated as a separator:
$$tag(a\|b | c)$$
→ two params: "a|b" and "c"
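To make the splitting rule concrete, here is a standalone sketch of splitting an argument string on unescaped pipes. It is not the library's `parsePipeArgs`; `splitPipeArgs` is a hypothetical helper, and trimming whitespace around each parameter is an assumption inferred from the `"a|b"` and `"c"` result above. Other escape sequences are left untouched.

```typescript
// Hypothetical helper, not the library's parsePipeArgs.
// Splits on "|" unless it is preceded by the escape character.
function splitPipeArgs(arg: string, escapeChar: string = "\\"): string[] {
  const parts: string[] = [];
  let current = "";
  for (let i = 0; i < arg.length; i++) {
    const ch = arg.charAt(i);
    const next = arg.charAt(i + 1);
    if (ch === escapeChar && (next === "|" || next === escapeChar)) {
      current += next; // escaped pipe or escaped backslash becomes a literal
      i++;
    } else if (ch === "|") {
      parts.push(current.trim()); // assumption: parameters are trimmed
      current = "";
    } else {
      current += ch;
    }
  }
  parts.push(current.trim());
  return parts;
}

console.log(splitPipeArgs("Click here | https://example.com | _blank"));
// [ "Click here", "https://example.com", "_blank" ]
console.log(splitPipeArgs("a\\|b | c"));
// [ "a|b", "c" ]
```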
Backslash \ is the default escape character. Escaping only works for tokens that have structural meaning in the current context. If a token has no special meaning at the current position, the backslash is kept as-is.
In default syntax, tagOpen = (, tagClose = ), endTag = )$$. Note that $$ is the tagPrefix, not an independent token, so it cannot be escaped.
| Context | Escapable tokens | Notes |
|---|---|---|
| Root (top-level text) | `(` (tagOpen), `)` (tagClose), `)$$` (endTag) | Only tokens that can open or close tags. The pipe, `\\`, etc. have no structural meaning at root, so the backslash is kept verbatim |
| Args (tag argument region) | tagOpen (`(`), tagClose (`)`), endTag (`)$$`), tagDivider (pipe), escapeChar (`\\`), rawOpen (`)%`), blockOpen (`)*`) | Richest syntax context, most escapable tokens. But rawClose (`%end$$`) and blockClose (`*end$$`) have no meaning inside args, so they are not escapable |
| Block content (block tag body) | tagOpen (`(`), tagClose (`)`), endTag (`)$$`), blockClose (`\*end$$`) | Block bodies parse recursively: escape structural tokens plus their own close marker |
| Raw content | rawClose (`\%end$$`) | `parseStructural` does not generate escape nodes inside raw; content is kept verbatim. `parseRichText` recognizes `\%end$$` and unescapes it to `%end$$` in the output |
In args:
| Sequence | Output | Purpose |
|---|---|---|
| `\(` | `(` | Prevent opening a tag |
| `\)` | `)` | Prevent closing a tag |
| `\)$$` | `)$$` | Prevent endTag |
| `\|` (backslash + pipe) | literal pipe | Prevent pipe separation |
| `\\` | `\` | Literal backslash |
In block content:
| Sequence | Output | Purpose |
|---|---|---|
| `\(` | `(` | Prevent opening a tag |
| `\)$$` | `)$$` | Prevent endTag |
| `\*end$$` | `*end$$` | Prevent block close marker |
At root:
| Sequence | Output | Purpose |
|---|---|---|
| `\(` | `(` | Prevent opening a tag |
| `\)$$` | `)$$` | Prevent endTag |
Note: At root level, `\|`, `\\`, `\%end$$`, `\*end$$`, etc. are not escaped. The backslash is kept verbatim, because these tokens have no structural meaning at root.
The escape character itself can be changed via SyntaxConfig.escapeChar. See Custom Syntax.
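The context-dependent behaviour can be modelled as a lookup from context to escapable tokens. The sketch below transcribes the token lists from the context table for the default syntax and applies them to a string; it is an illustration only, and `ESCAPABLE`, `EscapeContext`, and `unescapeSketch` are hypothetical names with no relation to the parser's real data structures.

```typescript
type EscapeContext = "root" | "args" | "block" | "raw";

// Token lists transcribed from the context table (default syntax only).
const ESCAPABLE: Record<EscapeContext, string[]> = {
  root: [")$$", "(", ")"],
  args: [")$$", ")%", ")*", "(", ")", "|", "\\"],
  block: ["*end$$", ")$$", "(", ")"],
  raw: ["%end$$"],
};

// Drops the backslash only when it precedes a token that is structural in `context`;
// everywhere else the backslash is kept verbatim.
function unescapeSketch(text: string, context: EscapeContext): string {
  const tokens = ESCAPABLE[context];
  let out = "";
  let i = 0;
  while (i < text.length) {
    let matched = "";
    if (text.charAt(i) === "\\") {
      for (let t = 0; t < tokens.length; t++) {
        const tok = tokens[t];
        // Prefer the longest token so "\)$$" is not read as "\)" followed by "$$".
        if (tok.length > matched.length && text.slice(i + 1, i + 1 + tok.length) === tok) {
          matched = tok;
        }
      }
    }
    if (matched) {
      out += matched;
      i += 1 + matched.length;
    } else {
      out += text.charAt(i);
      i++;
    }
  }
  return out;
}

console.log(unescapeSketch("a\\|b", "args")); // "a|b"
console.log(unescapeSketch("a\\|b", "root")); // "a\|b" (backslash kept)
```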
The parser never crashes or throws on malformed input. All unexpected patterns degrade to literal plain text, with errors reported when possible.
All degradation behavior is derived from two core rules, with no special cases:
- Nearest match: the topmost stack frame gets first dibs on every token. A close token belongs to the nearest matching frame. No skipping, no lookahead.
- Full close wins: when a short token (e.g. `)`) is a prefix of a longer token (e.g. `)$$`), and the longer token fully matches at the current position, the longer token wins.
From these two rules, one degradation strategy follows: local error, local degradation.
- Correctly written parts keep their structure; only the layer that is actually broken degrades to plain text.
- The blast radius of an error is contained to the nearest layer; parent and sibling nodes are unaffected.
- The parser never guesses user intent and never attempts heuristic "maximize salvage" recovery; determinism is preferred over cosmetically nicer output.
Example: shorthand yields to protect the outer structure:
$$bold(bold(hi)$$)$$
The )$$ after bold(hi is both the shorthand's ) and the outer bold's )$$. Full close wins: )$$ belongs to the outer bold; the shorthand bold( has no close token and degrades to plain text. The outer bold closes normally; its content is bold(hi. The trailing )$$ is a stray close marker.
If ) were allowed to win instead, the shorthand would close and split )$$ into ) + $$, destroying the outer bold's close token, and the entire tree would collapse.
| Situation | Behavior | Error code |
|---|---|---|
| Handler doesn't implement the form user wrote | Entire markup degrades to plain text | None (silent) |
| Tag not registered (not in handlers) | Parsed as inline form | None |
| Nesting exceeds `depthLimit` | Tag head degrades to plain text | `DEPTH_LIMIT` |
| Unbalanced brackets (missing close bracket) | Forced inline child frame, character-by-character scan | Depends on inner state |
| Close marker missing (EOF without close) | Content falls back to plain text | `INLINE_NOT_CLOSED` / `SHORTHAND_NOT_CLOSED` / `RAW_NOT_CLOSED` / `BLOCK_NOT_CLOSED` |
| `%end$$` / `*end$$` malformed | Content falls back to plain text | `RAW_CLOSE_MALFORMED` / `BLOCK_CLOSE_MALFORMED` |
| Stray `)$$` with no matching open | Output as plain text | `UNEXPECTED_CLOSE` |
| Shorthand `)` competes with `)$$` at same position | Shorthand yields, tag head degrades to plain text | None (yield only) |
When a handler only declares some forms, writing an undeclared form makes the entire markup degrade to literal text, with no error.
// handler only declares raw
code: { raw: (arg, content) => ... }
// user writes inline form → degrades to plain text
$$code(hello world)$$
→ plain text output: "$$code(hello world)$$"
Decision rules (supportsInlineForm table, first match wins top-to-bottom):
| Condition | Result |
|---|---|
| Global `allowInline = false` | Reject inline |
| Handler missing + tag not registered | Allow inline (passthrough) |
| Handler missing + tag is registered | Reject (filtered out by `allowForms`) |
| Handler declares `inline` | Allow |
| Handler only declares `raw` and/or `block` | Reject |
| Handler is empty `{}` | Allow (passthrough) |
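Read top to bottom, the decision table maps onto a short chain of early returns. The sketch below is one way to express it, assuming the three inputs the table mentions are available as plain values; `supportsInlineFormSketch`, `InlineCheckInput`, and the `registered` flag are hypothetical and do not reflect the library's actual signature.

```typescript
// Hypothetical shapes for illustration; the real handler type lives in the library.
interface TagHandlerSketch { inline?: unknown; raw?: unknown; block?: unknown }

interface InlineCheckInput {
  allowInline: boolean;       // the global switch from the first row
  registered: boolean;        // whether the tag name is registered
  handler?: TagHandlerSketch; // undefined when no handler is available for the tag
}

// First matching row wins, in the same order as the table.
function supportsInlineFormSketch(input: InlineCheckInput): boolean {
  if (!input.allowInline) return false;               // global allowInline = false
  const handler = input.handler;
  if (handler === undefined) return !input.registered; // missing: passthrough only if unregistered
  if (handler.inline !== undefined) return true;       // declares inline
  if (handler.raw !== undefined || handler.block !== undefined) return false; // raw/block only
  return true;                                         // empty {} handler: passthrough
}

const rawOnly: TagHandlerSketch = { raw: function () { return null; } };
console.log(supportsInlineFormSketch({ allowInline: true, registered: true, handler: rawOnly })); // false
console.log(supportsInlineFormSketch({ allowInline: true, registered: true, handler: {} }));      // true
```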
This is the most common gotcha. To nest a raw or block tag inside an inline argument, the handler must also declare inline.
The inline argument scanner advances character-by-character and checks every nested tag against supportsInlineForm before pushing a child frame. Only tags that pass this check enter the child frame; only inside the child frame can the parser see )% or )* to switch to the raw / block branch.
            ┌── must pass supportsInlineForm ──┐
            │                                  │
$$bold( ... $$code(arg)%...%end$$ ... )$$
            │
            └── code only has raw, no inline
                → can't enter child frame → entire $$code(...)%...%end$$ becomes plain text
// ❌ code only has raw → degrades to plain text when nested inside inline args
const handlers = {
  bold: { inline: (tokens) => ({ type: "bold", value: tokens }) },
  code: { raw: (arg, content) => ({ type: "code", value: content }) },
};
// $$bold(before $$code(ts)%const x = 1;%end$$ after)$$
// → code section inside bold becomes plain text
// ✅ code also declares inline → parser enters child frame, then sees )% and switches to raw
const handlers = {
  bold: { inline: (tokens) => ({ type: "bold", value: tokens }) },
  code: {
    inline: (tokens) => ({ type: "code", value: tokens }), // ← add this
    raw: (arg, content) => ({ type: "code", value: content }),
  },
};
Note: The inline handler doesn't need complex logic. It's just a "ticket" to let the parser push a child frame for that tag. If your tag doesn't semantically need an inline form, the inline handler can return a fallback output (e.g., echo the text as-is).
Same applies to block tags nested inside inline arguments; they also need inline declared:
// ✅ warn declares inline + block → can nest block form inside inline args
warn: {
  inline: (tokens) => ({ type: "warn", value: tokens }),
  block: (arg, content) => ({ type: "warn", value: content }),
}
When depth exceeds depthLimit (default 50), the tag head degrades to plain text with a DEPTH_LIMIT error. Outer tags are unaffected.
// with depthLimit: 3
$$a($$b($$c($$d(too deep)$$)$$)$$)$$
            ^^^^^^^^^^^^^^^
            degrades to plain text "$$d(too deep)$$"
Shorthand form also respects depthLimit (fixed in 1.3.1).
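As a rough illustration of the depth rule, the sketch below counts nested full-form opens and reports the tag names that would exceed a given limit. It is a counting aid only, not the parser: it ignores shorthand, raw/block forms, and escapes. The counting convention (the outermost tag is depth 1, and a tag degrades when its depth is greater than the limit) is an assumption inferred from the `depthLimit: 3` example above; `findTagsOverDepthLimit` is a hypothetical name.

```typescript
const FULL_OPEN = /^\$\$([A-Za-z_][A-Za-z0-9_-]*)\(/;

// Returns the names of full-form tags opened deeper than depthLimit.
// Assumption: the outermost tag counts as depth 1.
function findTagsOverDepthLimit(input: string, depthLimit: number): string[] {
  const over: string[] = [];
  let depth = 0;
  let i = 0;
  while (i < input.length) {
    const m = FULL_OPEN.exec(input.slice(i));
    if (m) {
      depth++;
      if (depth > depthLimit) over.push(m[1]);
      i += m[0].length;
    } else if (input.slice(i, i + 3) === ")$$") {
      if (depth > 0) depth--;
      i += 3;
    } else {
      i++;
    }
  }
  return over;
}

console.log(findTagsOverDepthLimit("$$a($$b($$c($$d(too deep)$$)$$)$$)$$", 3)); // [ "d" ]
```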
When brackets inside the argument region don't balance (e.g., raw parentheses in content, or a missing close bracket), the fast argument-close scanner fails. In this case:
- The tag does not degrade entirely to plain text
- A forced inline child frame scans character-by-character for the real close token (`)$$`)
- Only the innermost unbalanced tag is affected; outer tags are preserved
$$bold(Hello $$italic(world)$$)$$
→ normal parse ✓
$$bold(Hello $$italic(world)$$ )$$
                              ^ extra space but brackets balance → normal ✓
$$bold(some text with ( inside)$$
                      ^ raw bracket → forced inline fallback → bold still closes correctly ✓
Tag opened but no close marker found before EOF:
- inline: content falls back to plain text, reports `INLINE_NOT_CLOSED`
- raw: content falls back to plain text, reports `RAW_NOT_CLOSED`
- block: content falls back to plain text, reports `BLOCK_NOT_CLOSED`
- shorthand: content falls back to plain text, reports `SHORTHAND_NOT_CLOSED` (and may co-occur with `INLINE_NOT_CLOSED` when the outer full form also remains unclosed)
$$bold(never closed
→ plain text: "$$bold(never closed"
→ error: INLINE_NOT_CLOSED
%end$$ / *end$$ not on its own line:
$$code(ts)%
const x = 1; %end$$ ← not at line start
%end$$ ← correct position
Reports RAW_CLOSE_MALFORMED / BLOCK_CLOSE_MALFORMED, content falls back to plain text.
)$$ appearing without a matching open:
Hello )$$ world
→ plain text: "Hello )$$ world"
→ error: UNEXPECTED_CLOSE