Linear Time Complexity - chiba233/yumeDSL GitHub Wiki

Linear-Time Complexity


This document stays within a single Markdown page: the front half is the theorem body for the current full-parse implementation, and the back half is an in-document appendix preserving walkthroughs, branch inventories, historical paths, empirical constants, and incremental-parsing analysis. That keeps everything on one page without letting the theorem body drift into engineering commentary.

Proof target and main theorem

The main text proves exactly two claims, and only for the current implementation of the normal full-parse entry points.

Theorem S (parseStructural)

T_parseStructural(n) = Θ(n)

Theorem R (parseRichText, library-owned portion)

T_parseRichText, library-owned(n) = Θ(n)

If user-supplied handler work is included in the total wall-clock time, this document does not silently relabel that work as a library theorem. The precise total formula is instead:

T_parseRichText,total(n) = Θ(n + T_user-handlers(n))

Model, input size, assumptions, and proof boundary

  1. Input size: n = text.length, measured in UTF-16 code units. (Later cost terms speak informally of "bytes"; those counts are over the same code units.)
  2. Proof object: the normal full-parse entry paths of parseStructural(text, options) and parseRichText(text, options); not parseIncremental(...), session.applyEdit(...), or session.applyEditWithDiff(...).
  3. Current-implementation boundary: the main text speaks only about the current structural / render / ownership / EOF-recovery implementation. Historical implementations and version evolution are moved to appendices instead of being interleaved with the theorem body.
  4. Render boundary: the parseRichText theorem counts library-owned render work on parser-produced structural trees: traversal, materialization, degradation, trimming, buffering, and token assembly performed by the library itself.
  5. Explicit exclusions: the complexity of user handler.inline/raw/block bodies, highly overlapping external trees manually injected by callers, and caller-driven output amplification are not smuggled into the library's Θ(n) guarantee.

Under that boundary, the main text does only four things: define the charging scheme, prove why the current implementation keeps those charged quantities linear, close the Θ(n) upper and lower bounds for parseStructural, and then lift the same boundary-aware argument to parseRichText.

Cost decomposition and charging scheme

The main text now leaves branch inventory behind and works directly with a cost decomposition. The point is to define summable library-owned quantities and then show why none of them can repeatedly charge the same source region without bound. Let the source length be n = |source|.

Notation: decompose the structural phase into summable cost terms

Let:

  • N_iter: total iterations of the top-level while (stack.length > 0) loop;
  • B_skip: total bytes skipped by findNextBoundaryChar(...) over ordinary text runs;
  • B_escape: total bytes consumed by escape / literal-text branches;
  • B_head: total bytes scanned by tag-head / end-tag / separator / shorthand-head recognition;
  • B_form: total bytes scanned by form-classification scans such as findTagArgClose(...);
  • B_close: total bytes scanned by raw / block close finders;
  • B_probe: total bytes touched by ownership checks, ancestor-owner comparisons, and close-ownership probes;
  • B_replay: total bytes/work involved in EOF malformed-inline replay / downgrade / child attachment;
  • N_frame: total parse-frame creation / pop / parent-child attachment operations.

Then the structural phase can first be written as:

T_structural(n)
  = c0·N_iter
  + c1·B_skip
  + c2·B_escape
  + c3·B_head
  + c4·B_form
  + c5·B_close
  + c6·B_probe
  + c7·B_replay
  + c8·N_frame

Writing the formula is the easy part. The real proof obligation is to show that none of these terms can grow by repeatedly rescanning the same source region without bound.

First derivation layer: why B_skip + B_escape + B_head + B_form + B_close stays linear

B_skip <= n

findNextBoundaryChar(...) advances across maximal ordinary-text runs. Different loop iterations skip disjoint source intervals, so:

B_skip = Σ length(skipped_run_i) <= n
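The disjoint-run charging can be illustrated with a small sketch. This is a toy stand-in for the library's findNextBoundaryChar(...); the helper names and the boundary set here are invented for illustration:

```typescript
// Hypothetical sketch: skip maximal runs of ordinary text between boundary
// characters. Each call starts where the previous one stopped, so the
// skipped runs are disjoint and their total length is at most source.length.
function findNextBoundary(source: string, from: number, boundaries: Set<string>): number {
  let i = from;
  while (i < source.length && !boundaries.has(source[i])) i++;
  return i; // index of the next boundary char, or source.length
}

// Sum the lengths of all skipped runs over one full pass.
function totalSkipped(source: string, boundaries: Set<string>): number {
  let i = 0;
  let skipped = 0;
  while (i < source.length) {
    const next = findNextBoundary(source, i, boundaries);
    skipped += next - i; // disjoint run [i, next)
    i = next + 1;        // step past the boundary char itself
  }
  return skipped;
}
```

For "ab(cd)ef" with boundaries { "(", ")" }, the skipped runs are "ab", "cd", "ef": total 6, bounded by the source length 8.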

B_escape <= n

The escape branch either consumes escapeChar + nextChar as a pair or consumes the current byte as literal text. In either case frame.i moves forward, and bytes already consumed by the escape path are not re-consumed by that same escape path on the same frame. Hence:

B_escape = Σ length(escape_chunk_i) <= n

B_head = O(n)

B_head is the term that most often triggers skepticism because tag-head, end-tag, separator, and shorthand recognition can look like repeated speculative probing. The summable invariant is:

  1. successful head recognition corresponds to a real finite head interval in the source;
  2. failed recognition in the current implementation either degrades immediately to text or advances frame.i past the current decision point, rather than staying pinned at the same head start and repeating an unbounded full scan;
  3. in the current shorthand-ownership design, push and close are split, so the same tail is no longer re-judged through repeated frame-to-frame full reclassification.

Lemma H (head-attempt charging). For any source position p, the number of times it can be charged into B_head is bounded by a constant.

The point is not merely to assert that “it does not seem to rescan much,” but to pin down the finite event classes that can follow a single head-attempt:

  1. Successful recognition. The head interval is immediately assigned to a created structure (full tag, shorthand child, or end-tag-like dispatch), so the current frame does not keep retrying the same start position in the same role without bound;
  2. Failed recognition with forward progress. After failure, the implementation either degrades immediately to text or advances frame.i to a strictly larger position, so the same frame cannot remain pinned at that head start and perform unbounded full retries;
  3. Recovery replay. A failed frame's tag head may be replayed back into the parent during recovery, but the replay object is exactly one failed-frame head, and each failed-frame head enters that replay accounting at most once.

Therefore each position p can be charged by only finitely many event classes: one normal recognition/failure path plus constant-many recovery replays. There is no path on which the same offset is charged by unboundedly many head-attempts. So B_head is the sum of finitely many head scans, with each head start triggering only constant-many bounded scans, and for some constant C_head:

B_head = Σ length(head_scan_i) <= C_head · n = O(n)
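A toy model of Lemma H's two main event classes (hypothetical countHeadAttempts, not the library's readTagStartInfo): every head-attempt either succeeds and consumes its head interval, or fails and makes forward progress, so the number of attempts is bounded by the number of tag-open occurrences, hence by n:

```typescript
// Hypothetical charging model: each source position triggers at most one
// head-attempt, because both outcomes move the cursor strictly forward.
function countHeadAttempts(source: string, tagOpen: string): { attempts: number; heads: number } {
  let i = 0, attempts = 0, heads = 0;
  while (i < source.length) {
    if (source[i] === tagOpen) {
      attempts++;
      const close = source.indexOf(")", i); // stand-in for a bounded head scan
      if (close !== -1) { heads++; i = close + 1; } // success: head interval consumed
      else i++;                                     // failure: forward progress, no pinned retry
    } else {
      i++; // ordinary text
    }
  }
  return { attempts, heads };
}
```

On "x(a)y(bz" this model performs two attempts (one successful, one failed at EOF) and never retries the same head start.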

B_form = O(n)

B_form is counted separately for form-classification scans such as findTagArgClose(...), rather than being folded into B_close. The reason is semantic as well as analytical: this family does not search for a raw/block closer, but classifies a newly encountered head once.

More precisely, each findTagArgClose(...) call is tied to the event “classify this newly encountered head”:

  1. on success, that head immediately enters the corresponding structure-creation path;
  2. on failure, that head degrades or causes the current frame to advance;
  3. under the current implementation, the same frame does not relaunch an unbounded number of fresh form-classification windows from the same start position.

Hence:

B_form = Σ length(form_scan_i) = O(n)

B_close = O(n)

It is not enough here to collapse every scan into one vague “close-search” bucket. The proof-relevant scanner families are:

  1. findRawClose for raw-content close search;
  2. findBlockClose for block-content close search, with its own depth/nesting state machine;
  3. shorthand ownership probe, which is not counted in B_close at all but in B_probe.

Lemma C (close-scan family charging). For each of findRawClose and findBlockClose, a given window is not relaunched as a fresh full search unboundedly many times under the same parent-frame context.

Concretely:

  • for one successful or failed findRawClose search, the internal scan pointer advances monotonically across the target raw window; after the search, the parent either advances to nextI or the structure degrades, so the same raw window is not reopened as an unbounded number of fresh full searches under the same parent frame;
  • for one successful or failed findBlockClose search, the call may manage block depth and skip nested raw/inline regions, but the scan pointer inside that call is still monotone over the inspected block window; the current implementation does not relaunch the same block window as a fresh full search unboundedly many times under the same parent frame.

So the linearity of B_close is not “trust me, these scans are all roughly the same”; it is that the genuine close-finder families for raw/block windows separately satisfy the charging condition “window-local monotone advance + no unbounded relaunch.”

So:

B_close = Σ length(close_scan_i) = O(n)
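The “window-local monotone advance” property of Lemma C can be sketched as follows. This hypothetical findBlockCloseSketch is far simpler than the real findBlockClose, which also skips nested raw/inline regions, but it shows the single monotone scan with a depth state machine:

```typescript
// Hypothetical sketch of a block-close finder: one monotone pass over the
// window, tracking nesting depth, never revisiting an inspected position.
function findBlockCloseSketch(source: string, from: number, open: string, close: string): number {
  let depth = 1;
  for (let i = from; i < source.length; i++) {
    if (source.startsWith(open, i)) { depth++; i += open.length - 1; }
    else if (source.startsWith(close, i)) {
      depth--;
      if (depth === 0) return i; // closeStart for this window
      i += close.length - 1;
    }
  }
  return -1; // unclosed: the caller degrades instead of relaunching the scan
}
```

On failure the caller degrades the construct rather than reopening the window, which is exactly the “no unbounded relaunch” half of the charging condition.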

Second derivation layer: why recovery remains linear under the current implementation

This subsection gives only the closure formula for recovery. The actual proof bridges are the support lemmas later in this section:

  1. E1: why the probe performed by resolveShorthandOwnershipPush(...) is amortized linear;
  2. E2: why close-site defer-parent is a one-shot downgrade rather than a multi-round rewind of the same suffix;
  3. E3: why EOF malformed-inline replay introduces only one extra recovery rescan.

ancestorEndTagOwnerIndex, buildMalformedInlineReplayPlan(...), replayMalformedInlineChainAtEof(...), and downgradeInlineIntoParent(...) matter here because they are the implementation facts that make E1/E2/E3 true; this paragraph is not intended to replace those lemmas with a summary slogan.

So the new recovery term looks like:

T_new-recovery(m)
  = Θ(number_of_frames)
  + Θ(total_replayed_head_bytes)
  + Θ(total_child_attach_ops)

Each frame, replayed head segment, and child attachment is charged only once in that accounting, so they can be absorbed into:

B_probe + B_replay = O(n)

In an even more explicit summation form, where m counts the recovery events (frames, replayed head segments, and child attachments):

T_new-recovery(m) = Σ Θ(1) + Σ replay_bytes_i + Σ attach_i = O(m) <= O(n)

That is the formal source of why the current recovery term still folds into the linear upper bound.

Third derivation layer: close the structural formula

Collecting the previous bounds gives:

T_structural(n)
  = c0·N_iter
  + c1·B_skip
  + c2·B_escape
  + c3·B_head
  + c4·B_form
  + c5·B_close
  + c6·B_probe
  + c7·B_replay
  + c8·N_frame

with:

N_iter  = O(n)
B_skip  <= n
B_escape <= n
B_head  = O(n)
B_form  = O(n)
B_close = O(n)
B_probe = O(n)
B_replay = O(n)
N_frame = O(n)

Therefore there exists a constant C_s > 0 such that:

T_structural(n) <= C_s·n

And structural parsing is not o(n) either, because it must at least inspect the input's boundary-relevant characters and perform node/frame work proportional to the parse it constructs. So:

T_structural(n) = Θ(n)

Fourth derivation layer: cost decomposition for the render phase

Let:

  • N_render_iter: iterations of the explicit render-stack main loop;
  • B_textbuf: total text-like bytes touched by bufferText(...) / flushTextBuf(...);
  • B_raw_unescape: total bytes scanned while unescaping raw content inside renderRawNode(...);
  • B_materialize: total token/byte volume materialized into TextToken[] on unknown-tag passthrough / degrade paths;
  • B_trim: total token/byte volume handled by trim helpers such as trimBlockBoundaryTokens(...);
  • N_dispatch: total inline/raw/block/text/comment dispatch operations;
  • T_user-handlers(n): extra work performed inside user-supplied handler bodies on this input of length n.

Then the render side can be written as:

T_render(n)
  = d0·N_render_iter
  + d1·B_textbuf
  + d2·B_raw_unescape
  + d3·B_materialize
  + d4·B_trim
  + d5·N_dispatch
  + T_user-handlers(n)

The separation between library-owned work and user-handler work is essential; otherwise the claimed Θ(n) silently changes meaning.

Why the library-owned render terms are still linearly summable

  1. N_render_iter = O(number_of_nodes) = O(n): each structural node is pushed, popped, and dispatched only finitely many times;
  2. B_textbuf = O(n): text-like content enters the buffer and is flushed into tokens without unbounded repeated concatenation of the same library-owned text segment;
  3. B_raw_unescape = O(n): renderRawNode(...) performs a single forward scan over raw content, and lazy parts allocation changes only constants;
  4. B_materialize = O(n): full materialization of the same TextToken[] object happens only finitely many times; the current implementation tracks already-materialized arrays to avoid repeated full expansion of the same array object;
  5. B_trim = O(n): trim helpers touch only boundary token regions, and the cumulative size of those regions is bounded by total library-owned token volume.

Hence, when we count library-owned work only, we get:

T_render, library-owned(n)
  = d0·N_render_iter
  + d1·B_textbuf
  + d2·B_raw_unescape
  + d3·B_materialize
  + d4·B_trim
  + d5·N_dispatch
  <= C_r·n

so:

T_render, library-owned(n) = O(n)

And because the library-owned render side must still traverse a linear-size set of nodes and output, under the normal assumption that library-owned output volume is linear in the input size, we also have:

T_render, library-owned(n) = Θ(n)
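The explicit render-stack main loop behind N_render_iter can be illustrated with a minimal sketch. The Node shape and renderSketch helper here are hypothetical; the library's real node kinds, token assembly, and buffering are much richer:

```typescript
// Hypothetical node shape standing in for the structural tree.
interface Node { text?: string; children?: Node[] }

// Explicit render stack: every node is pushed and popped exactly once,
// so the iteration count is proportional to the node count, with no
// re-traversal of any already-rendered subtree.
function renderSketch(roots: Node[]): { output: string; iterations: number } {
  const stack = [...roots].reverse();
  let output = "";
  let iterations = 0;
  while (stack.length > 0) {
    iterations++;
    const node = stack.pop()!;
    if (node.text !== undefined) output += node.text; // B_textbuf-style append
    if (node.children) {
      // push children in reverse so they pop in source order
      for (let k = node.children.length - 1; k >= 0; k--) stack.push(node.children[k]);
    }
  }
  return { output, iterations };
}
```

A four-node tree yields exactly four iterations, matching the N_render_iter = O(number_of_nodes) claim above.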

Fifth derivation layer: closing the rich-text theorem

So the complete, non-slogan formula is:

T_parseStructural(n) = T_structural(n) = Θ(n)
T_parseRichText, library-owned(n)
  = T_structural(n) + T_render, library-owned(n)
  = Θ(n) + Θ(n)
  = Θ(n)

If user-handler complexity is included, the precise statement must instead be:

T_parseRichText,total(n) = Θ(n + T_user-handlers(n))

That is why this page keeps insisting on library-owned Θ(n): not to dodge the hard part, but to separate the linear guarantee owned by the library from the extra complexity deliberately introduced by the caller.


Proof spine and dependency structure

To avoid reading the same Θ(n) claim as “two separate proofs,” the dependency order of the main text should be fixed explicitly:

  1. the main proof spine is the charging scheme in the previous section, where T_structural(n) and T_parseRichText, library-owned(n) are decomposed into summable cost terms;
  2. the next section is not a second independent proof; it is a support-lemma layer whose only purpose is to prove that those charged terms are each O(n);
  3. the theorem closes only once: by substituting the bounds established below back into the charging scheme above.

More concretely, the support obligations map to the earlier symbols as follows:

  • I₁ + main-loop iteration analysis support N_iter = O(n) and the claim that source-consumption terms do not overlap without bound;
  • the auxiliary-scan covering lemma + branch-level analysis support B_form = O(n) and B_close = O(n), together with B_head = O(n) for the structural scanning terms;
  • the recovery-overhead lemmas support B_probe = O(n) and B_replay = O(n);
  • the render-side linear-summation argument supports T_render, library-owned(n) = O(n).

So the intended reading order is not “one proof in the first half, another proof in the second half,” but:

charging scheme
  -> support lemmas for each charged term
  -> substitute bounds back into the total formula
  -> close the theorem

Support lemmas for proving the charged terms are O(n)

The intuition "every character is scanned once" needs a tighter statement. Here is the proof sketch, derived directly from the source.

The task of this section is not to re-prove the whole theorem from scratch. It is to discharge the upper-bound obligations required by the charging scheme above.

Invariant I₁: at the successful-frame level, each source position is consumed by exactly one frame

The structural parser (parseNodesWithFactory, structural.ts line 624+) maintains an explicit stack: ParseFrame[]. Each iteration of the main while (stack.length > 0) loop does exactly one of:

  1. Frame completionframe.i >= frame.textEnd: pops the frame (no character consumed).
  2. Character consumption — all other branches: advances frame.i by ≥ 1.

Definition. A frame f is called a successful frame if it completes via the A2 path (flushBuffer + stack.pop() + completeChild, structural.ts lines 648–652) rather than via the A1 error-recovery path (INLINE_NOT_CLOSED / SHORTHAND_NOT_CLOSED, lines 629–646). The root frame (returnKind === null) always completes via A2.

Definition. The consumed range of a frame f is the interval [startPos, endPos) that f actually scans through character-consuming iterations (startPos = initial value of f.i, endPos = value of f.i when f completes or is popped).

Invariant I₁ statement. At the end of parseNodesWithFactory execution, the consumed ranges of all successful frames form a partition of [0, n) — i.e., (a) their union = [0, n), and (b) they are pairwise disjoint.

Inductive proof

Strong induction on the total number of main-loop iterations T.

Base case (T = 1). The input is the empty string (n = 0). The root frame is created with frame.i = 0 = textEnd, immediately triggering A2 completion. The sole successful frame's consumed range is [0, 0) = ∅, consistent with [0, 0).

Inductive step. Assume I₁ holds for all executions with T′ < T iterations. Consider which branch the top-of-stack frame enters at iteration T. We analyze all 8 branches (A–H):

(A) Frame completion — frame.i >= frame.textEnd (lines 628–652).

  • A1 (unclosed inline error recovery, lines 629–646): The frame is NOT a successful frame; its consumed range is discarded. appendBuf(parent, frame.tagStartI, frame.argStartI) appends the tag-head text to the parent's buffer; parent.i = frame.argStartI makes the parent resume scanning from argStartI. This means the region [argStartI, textEnd) that the frame had scanned will be re-scanned by the parent (or its subsequent children). Since the frame itself is not a successful frame, this overlap does not violate I₁. But it introduces extra iterations — see Error recovery overhead below.

  • A2 (normal completion, lines 648–652): The frame is a successful frame. completeChild sets parent.i to the right endpoint of the child frame's consumed range:

    • inline: parent.i = closeStart + child.inlineCloseWidth (line 370). For full DSL inline, inlineCloseWidth = endTag.length; for normal shorthand close, inlineCloseWidth = tagClose.length. Under full-form close competition, shorthand does not close with width=0; it yields via ownership arbitration (degrades and rewinds to argStart).
    • rawArgs / blockArgs / blockContent: parent.i is already set to nextI by the respective push path (lines 609, 670), or the child frame's [contentStart, closeStart) naturally does not extend beyond the parent's textEnd.

    Thus the parent continues from where the child stopped — no overlap, no gap.

(B) Escape sequence (lines 659–672). frame.i = next, where next > i (readEscapedSequence advances by ≥ 1). Single-frame advancement, no frame operations.

(C) Inline frame argClose detection (lines 675–836). Fires only when inlineCloseToken !== null. The frame model uses two fields — inlineCloseToken (the token that closes this frame) and inlineCloseWidth (how many source chars to consume on close). Sub-branches:

  • C1 (shorthand frame, inlineCloseToken === tagClose, lines 678–684): It first runs same-position ownership arbitration: if the cursor is also a full-form endTag for the parent, shorthand yields and degrades; otherwise startsWith(tagClose, i) closes the shorthand frame with inlineCloseWidth = tagClose.length, then flushBuffer + stack.pop() + completeChild. Still O(1) per cursor, no per-level rescan.

  • C2 (full DSL frame, endTag match → inline close, lines 689–695): startsWith(endTag, i) → inlineCloseWidth = endTag.length, flushBuffer + stack.pop() + completeChild. completeChild sets parent.i = closeStart + inlineCloseWidth.

  • C3 (rawOpen )% match, lines 638–697): After finding raw close, parent.i = nextI (line 609), where nextI = closeStart + rawClose.length. The child frame consumes [argStart, argClose). The raw content range [contentStart, closeStart) is not character-consumed by any frame (passed directly as a literal to factory.raw). The parent skips the entire construct. If findRawClose returns -1 (line 565), the degraded path treats the tag head + rawOpen as text, parent.i = contentStart.

  • C4 (blockOpen )* match, lines 699–770): Similar to C3. parent.i = nextI = closeStart + blockClose.length (line 670). A blockContent child frame is pushed with textEnd = closeStart, non-overlapping with the parent.

  • C5 (ordinary ) with no recognized suffix, lines 772–775): frame.i += tagClose.length. Single-frame advancement.

(D) Unexpected endTag in non-inline frame (lines 841–845). frame.i += endTag.length. Pure text consumption, single-frame advancement.

(E) Pipe separator (lines 850–858). frame.i += tagDivider.length. Fires only in insideArgs frames. Single-frame advancement.

(F) No tag match / ordinary character (lines 862–870). When readTagStartInfo returns null:

  • If the frame is inline (inlineCloseToken !== null), readInlineShorthandStart is tried. If a shorthand match is found and gating allows it, tryPushInlineShorthandChild pushes a child frame with inlineCloseToken = tagClose, implicitInlineShorthand = true — same push semantics as H1 but for the shorthand form. If no shorthand match, falls through.
  • Otherwise: appendBuf(frame, i, i + 1), frame.i++. The simplest branch — consumes 1 character.

(G) Depth limit degradation (lines 876–890). frame.i = degradedEnd, where degradedEnd = skipTagBoundary(…) returns the position after the tag boundary. Single-frame skip over the entire tag head. No child frame pushed.

(H) Tag head recognized, form determination and dispatch (lines 799–904).

  • H1 (nested tag inside inline frame, lines 892–918): When a full DSL tag head is detected inside an inline frame (inlineCloseToken !== null):

    • Current implementation does not close shorthand here via inlineCloseWidth = 0. Competition is resolved at close-token positions (C1/C2), not at tag-head detection.
    • tryPushInlineChild succeeds → pushes a child frame (child.i = argStart, child.textEnd = frame.textEnd). Does NOT call getTagCloserType. The parent's frame.i does not advance — the child takes over. Upon child completion, completeChild sets parent.i. If tryPushInlineChild is rejected by gating → the tag head is skipped via skipTagBoundary and treated as text.
  • H2 (non-inline frame, getTagCloserType returns null, lines 923–941): findTagArgClose found no matching ). The entire tag head degrades to text: appendBuf(frame, i, info.argStart), frame.i = info.argStart. frame.i advances by ≥ 1 (info.argStart > i because the tag name is at least one character plus the opening ().

  • H3 (inline form closerInfo.closer === endTag, lines 938–952): Same push logic as H1.

  • H4 (raw form, lines 946–995): findRawClose locates the closing marker. frame.i = nextI (line 830), pushes a rawArgs child frame with textEnd = argClose.

  • H5 (block form, lines 998–1060): Similar to H4. frame.i = nextI (line 892), pushes blockArgs then blockContent child frames. All child frame ranges stay within the parent's textEnd.

Synthesis. Across all branches:

  • When a child frame is pushed, its [startPos, endPos) is always a sub-interval of the parent's not-yet-consumed range.
  • When a child completes, completeChild or the push path sets parent.i past the child's consumed range.
  • Therefore parent and successful child consumed ranges do not overlap and have no gaps.

The root frame starts at 0 with textEnd = n. All successful frames' consumed ranges form a partition of [0, n).
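Invariant I₁'s partition condition is mechanically checkable. A minimal sketch of that check (hypothetical isPartition helper; Range is just a half-open interval):

```typescript
// Hypothetical check of invariant I₁: the consumed ranges of successful
// frames must partition [0, n) — pairwise disjoint, union equal to [0, n).
type Range = [start: number, end: number]; // half-open [start, end)

function isPartition(ranges: Range[], n: number): boolean {
  const sorted = [...ranges].sort((a, b) => a[0] - b[0]);
  let cursor = 0;
  for (const [start, end] of sorted) {
    if (start !== cursor || end < start) return false; // gap or overlap detected
    cursor = end;
  }
  return cursor === n; // the union must cover exactly [0, n)
}
```

Empty ranges like [3, 3) (the base-case root frame on empty content) are allowed and do not break the partition.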

Note. This proof does not cover findTagArgClose / findBlockClose / findInlineClose auxiliary boundary scans — they access the source outside the main loop. Their complexity is analyzed separately in Auxiliary scan covering lemma.

Recovery-overhead lemma under the current implementation

This section has to be read together with the 1.4.4 changelog, because this is exactly where the newly documented complexity escape hatch used to live.

Before 1.4.4, the most dangerous path was not ordinary well-formed inline nesting. It was the combination of broken shorthand / unclosed inline chains with ownership competition and EOF recovery, producing a pattern like “pop one frame → return to the parent → re-enter the main loop → discover the next layer is still broken”.

The problem was not that one individual branch was intrinsically expensive. The problem was that the same remaining suffix could be dragged back into the main loop repeatedly across the nested chain, so the accumulated work could form a triangular series:

k + (k - 1) + (k - 2) + ... + 1

That is O(k^2).

What 1.4.4 does is remove the two multipliers that made that escape hatch real:

  1. Nearest ancestor full-form close ownership is now carried explicitly on each frame. ParseFrame.ancestorEndTagOwnerIndex lets shorthand logic reach the nearest ancestor endTag owner in O(1) during both push and close arbitration. It no longer needs to “re-discover” that owner indirectly by falling back into the parent and re-observing the same close position.
  2. Malformed EOF inline chains are now planned once and replayed once. buildMalformedInlineReplayPlan(...) computes the whole malformed chain first, and replayMalformedInlineChainAtEof(...) emits errors, pops the chain, and performs only one recovery resume.

So after 1.4.4, the right mental model is no longer “every broken layer may drag the same tail back into the main loop again”. It is:

  • shorthand candidates that can be rejected at push time are rejected at push time,
  • shorthand closes that must yield to an ancestor full-form close are downgraded at the close site,
  • malformed EOF chains are replayed in one batch,
  • the parser no longer performs frame-by-frame re-entry across the same suffix.

The three recovery costs should therefore be analyzed separately.

E1. Push-time shorthand ownership probing is still linear

tryPushInlineShorthandChild(...) no longer relies on the older monolithic resolveShorthandOwnership(...) shape. It delegates directly to resolveShorthandOwnershipPush(...) in structuralOwnership.ts.

That helper does three conceptually different things:

  1. O(1) owner checks first

    • if the current frame itself is the full-form owner and argStart overlaps a same-cursor endTag, it immediately returns defer-parent;
    • if the nearest ancestor owner matches endTag at that same point, it also immediately returns defer-parent.
  2. A forward probe only when the current frame itself is a full-form owner

    • it scans rightward from argStart;
    • escape sequences are skipped in one jump;
    • a full DSL tag head stops the probe with reject = false;
    • a tagClose stops the probe and then checks whether that position is also the start of endTag.
  3. Probe results are cached in shorthandProbe

    • the cache key is effectively “this frame under this textEnd scan boundary, over this [startI, boundaryI] window”, not a global free-floating position;
    • later shorthand candidates inside the same reusable window reuse the cached result instead of probing the same window from scratch again.

So the key fact here is not “one probe may scan several characters”. The key fact is: the same window is not rebuilt over and over.

For a fixed frame and fixed textEnd, a source position is either:

  • visited during the first window-building probe, or
  • covered by later reuse of currentProbe.

Therefore the total push-time shorthand probing cost is the sum of first-time window constructions, which is linear under amortization, not “scan the same tail again for every shorthand candidate”.
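The amortization can be made concrete with a toy version of the shorthandProbe cache. The probeWindow helper, its stop condition, and the cache shape below are hypothetical; the real cache key and probe rules in resolveShorthandOwnershipPush(...) are richer:

```typescript
// Hypothetical probe cache: the first shorthand candidate in a window pays
// for one forward probe; later candidates in the same window reuse the
// cached result, so each window is constructed at most once.
interface ProbeCache { startI: number; boundaryI: number; reject: boolean }

function probeWindow(
  source: string, startI: number, boundaryI: number,
  cache: { current?: ProbeCache }, probes: { count: number },
): boolean {
  const c = cache.current;
  if (c && startI >= c.startI && startI <= c.boundaryI && c.boundaryI === boundaryI) {
    return c.reject; // reuse: no rescan of the same window
  }
  probes.count++; // first-time window construction
  let reject = true;
  for (let i = startI; i < boundaryI; i++) {
    if (source[i] === "(") { reject = false; break; } // stand-in stop condition
  }
  cache.current = { startI, boundaryI, reject };
  return reject;
}
```

Two candidates inside the same window cost one probe, which is the amortized-linear claim in miniature.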

E2. defer-parent at the close site is now a one-shot downgrade, not a full tail rewind

The other crucial 1.4.4 change is downgradeInlineIntoParent(...).

Its current semantics are:

  1. flushBuffer(frame): finalize any text still buffered in the child frame;
  2. appendBuf(parent, frame.tagStartI, frame.argStartI): replay the shorthand / inline tag head as plain parent text;
  3. flushBuffer(parent): stabilize the parent's current text state;
  4. appendNodesWithMergedText(parent.nodes, frame.nodes): attach the already parsed child nodes directly to the parent;
  5. parent.i = nextParentI: continue in the parent from the current close site onward.

This is fundamentally different from the old “rewind to argStartI and re-scan the whole tail” shape.

  • Old reading: the child frame's already parsed body is effectively thrown away, and the parent has to reconstruct the suffix again.
  • Current reading: the tag head is degraded to text, but the child frame's already established body nodes are preserved and attached upward; the parent continues from nextParentI.

So for each shorthand / inline frame that yields via defer-parent:

  • the frame can degrade at most once,
  • once degraded it leaves the active stack and does not compete for ownership again,
  • its already produced frame.nodes are moved upward only once.

If m_deg is the number of degraded frames in a parse and N_deg_nodes is the total number of nodes attached upward through this path, then:

T_defer-parent = O(m_deg + N_deg_nodes)

and both terms are ≤ O(n), so close-site ownership yielding remains O(n) overall.
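The one-shot character of the downgrade can be sketched as follows. The FrameSketch shape and downgradeSketch helper are hypothetical stand-ins for downgradeInlineIntoParent(...); node values are plain strings for illustration:

```typescript
// Hypothetical one-shot downgrade: the tag head is replayed as parent text
// once, the child's already parsed nodes move upward once, and the frame
// then leaves the accounting entirely — no rewind over the child's body.
interface FrameSketch { tagHead: string; nodes: string[] }

function downgradeSketch(parentNodes: string[], frame: FrameSketch): void {
  parentNodes.push(frame.tagHead);  // head degraded to plain text, charged once
  parentNodes.push(...frame.nodes); // body nodes attached upward, charged once
  frame.nodes = [];                 // degraded frame never competes again
}
```

Each degraded frame contributes O(1 + |frame.nodes|) and is then inert, which is exactly the T_defer-parent = O(m_deg + N_deg_nodes) accounting.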

E3. EOF malformed inline chains are now replayed in bulk, with only one extra resume scan

When tryFinalizeFrameAtEof(...) sees frame.inlineCloseToken !== null, it no longer handles the situation as “current frame errors → pop one frame → return to the main loop”. It directly calls replayMalformedInlineChainAtEof(frame).

That path has three distinct phases:

  1. Build the replay plan

    • buildMalformedInlineReplayPlan(startFrame, getFrameByParentIndex) walks only the parent chain;
    • it does not scan source characters;
    • therefore its cost is O(h), where h is the length of the malformed inline chain.
  2. Emit errors and pop the whole chain

    • each frame in replayPlan.chain emits one error;
    • then each is popped from the stack;
    • again this is O(h), proportional only to chain length.
  3. Perform a single recovery resume

    • if a first non-inline parent exists above the chain, the current implementation only replays the outermost degraded tag head from the plan: appendBuf(parent, resumeTagStartI, resumeArgStartI);
    • then it sets parent.i = resumeArgStartI;
    • then the parent continues.

The key point is: the whole malformed chain causes only one recovery resume scan.

EOF happens at most once in a parse call, so the cost of “resume from resumeArgStartI and scan what remains” is bounded by:

T_eof-rescan ≤ parent.textEnd - resumeArgStartI ≤ n

It is no longer multiplied by chain length, and it no longer forms the frame-by-frame triangular re-entry series.
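The O(h) parent-chain walk of buildMalformedInlineReplayPlan(...) can be modeled with a minimal sketch (hypothetical ChainFrame shape; the real plan also records the outermost tag-head offsets for the single resume):

```typescript
// Hypothetical replay-plan construction: walk the parent chain once,
// collecting the malformed inline frames; the first non-inline ancestor
// is the single resume point. No source characters are scanned here.
interface ChainFrame { parent: ChainFrame | null; inline: boolean }

function buildReplayPlanSketch(start: ChainFrame): { chain: ChainFrame[]; resume: ChainFrame | null } {
  const chain: ChainFrame[] = [];
  let f: ChainFrame | null = start;
  while (f !== null && f.inline) { chain.push(f); f = f.parent; } // O(h) walk
  return { chain, resume: f }; // one error per chain frame, one resume total
}
```

The plan's cost is proportional to the chain length h alone, and the single resume afterwards is the one bounded rescan counted in T_eof-rescan.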

Lemma E₁ (current implementation)

For a single parseNodesWithFactory(...) call, the extra work introduced by inline / shorthand ownership and EOF recovery satisfies:

T_recovery-extra(n)
    = T_probe + T_defer-parent + T_replay-plan + T_eof-rescan
    ≤ O(n) + O(n) + O(h) + O(n)
    = O(n)

where h is the malformed inline-chain length at EOF, and h ≤ the number of live frames ≤ O(n).

Proof ingredients.

  • T_probe: first-time probe-window construction in resolveShorthandOwnershipPush(...), linear under amortization;
  • T_defer-parent: each yielding frame degrades once, and its already produced nodes are attached upward once;
  • T_replay-plan: parent-chain walk only, no source scan;
  • T_eof-rescan: one resume after replay, not one resume per broken layer.

So inline / shorthand malformed recovery no longer creates quadratic re-entry.

Bounding main-loop iterations

For the main while loop inside parseNodesWithFactory, the post-1.4.4 count is best split into four parts:

  1. Successful-frame source-consuming iterations ≤ n

    • directly from invariant I₁.
  2. Bulk non-boundary text-run append length ≤ n

    • since 1.4.3, plain text no longer enters the full branch tree one character at a time;
    • but the sum of all run lengths is still bounded by total source length.
  3. Extra ownership / downgrade / EOF-replay work ≤ c_err · n

    • from Lemma E₁ above;
    • importantly, c_err is now an implementation constant, not a frame-by-frame multiplier such as 2^D from the old re-entry reading.
  4. Stack operations: frame completion / push / pop / child attachment ≤ O(number of frames) ≤ O(n)

    • each frame corresponds to at least one recognized structural fragment or tag head;
    • push, pop, completeChild(...), and appendNodesWithMergedText(...) all run linearly in those frame / node counts.

So the combined loop-and-recovery bound is:

T_loop(n) ≤ c_scan · n + c_err · n + c_stack · n
         = C_loop · n

where C_loop depends on syntax density, input shape, and implementation constants, but not on n.

For well-formed input, c_err is close to 0. For malformed input that used to trigger the 1.4.4 bad path, c_err is still a constant, rather than a multiplier that grows with re-entry depth.
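The split between boundary-site iterations and bulk text-run iterations can be illustrated with a minimal fast-skip loop; `isBoundary` stands in for `findNextBoundaryChar`'s boundary set, and the real cascade does more per boundary:

```typescript
// Minimal fast-skip main loop: each iteration either handles one boundary
// position or consumes an entire non-boundary text run in one step, so the
// iteration count is bounded by (boundary positions) + (text runs) <= n.
function countIterations(text: string, isBoundary: (c: string) => boolean): number {
  let i = 0;
  let iterations = 0;
  while (i < text.length) {
    iterations += 1;
    if (isBoundary(text[i])) {
      i += 1; // boundary-site decision: one position
      continue;
    }
    let j = i;
    while (j < text.length && !isBoundary(text[j])) j += 1; // bulk run scan
    i = j; // whole run appended in one appendBuf-style step
  }
  return iterations;
}
```

On `"aaa(bbb)ccc"` with `(` and `)` as boundaries this takes 5 iterations for 11 positions; the inner run scan is exactly the O(batch size) cost whose batch sizes sum to ≤ n.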

Auxiliary scan covering lemma

Root / block frames call getTagCloserType, which in turn calls findTagArgClose to classify tag form. Since 1.4.4, shorthand ownership probing must be included in the auxiliary-scan story as well; otherwise the question "is the main loop linear while probe / replay work quietly explodes off to the side?" would remain open.

So this section should distinguish two auxiliary-scan families:

  1. root / block form-classification scans: findTagArgClose, findBlockClose, findRawClose
  2. inline shorthand ownership scans: the forward probe in resolveShorthandOwnershipPush(...)

Lemma A₁. For one complete parse, total auxiliary scan work satisfies:

T_aux(n) = T_form-scan(n) + T_shorthand-probe(n) ≤ c_aux · n

where c_aux depends on syntax and input shape, but not on n.

Proof ingredients.

  • findTagArgClose / findBlockClose / findRawClose still run only in root / block-level form classification, not once per inline nesting level;
  • after successful form classification, either a child consumes that region or the parent advances i to nextI, so those scan intervals are absorbed by subsequent consumption / skipping;
  • shorthand ownership probing is already covered by Lemma E₁ as linear under amortization because each (frame, textEnd) window is built once and then reused;
  • the pre-1.4.4 multiplier “auxiliary scan itself is cheap, but frame-by-frame replay multiplies it” is gone; the relationship is additive now, not multiplicative.

Therefore total auxiliary scan work still fits c_aux · n.

Bounding per-iteration work

For 1.4.3+, per-iteration work is best split into three buckets:

  1. Boundary-position branch cost

    • when execution actually enters the branch cascade, it runs a fixed set of startsWith / equality checks;
    • under the default syntax this is still well approximated by a bounded constant k.
  2. Bulk text-run append cost

    • appendBuf(...) should no longer be described as "always O(1)";
    • more precisely, one invocation is O(batch size), where batch size is the distance to the next boundary returned by findNextBoundaryChar(...);
    • but the sum of all such batch sizes is still ≤ n on the successful path.
  3. Auxiliary scan and cache cost

    • non-inline form detection may still perform findTagArgClose scans;
    • 1.4.3 adds getTagCloserTypeWithCache(...) for large windows, but its cache lifetime is only a single parse call;
    • escape-token matching now pays a one-time per-SyntaxConfig preparation cost, then checks only the same-lead candidate bucket on subsequent uses.

So the parser is no longer best described as "O(1) helper work per consumed character". The more accurate statement is:

  • successful-frame consumption + bulk text append = O(n)
  • root/block auxiliary scans = amortized O(n)
  • cache construction changes constants, not the asymptotic bound
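The per-SyntaxConfig escape preparation mentioned in item 3 can be sketched as lead-character bucketing; the function names here are illustrative, not the library's API:

```typescript
// One-time preparation: bucket escapable tokens by their first character.
function buildEscapeBuckets(tokens: string[]): Map<string, string[]> {
  const buckets = new Map<string, string[]>();
  for (const token of tokens) {
    const bucket = buckets.get(token[0]) ?? [];
    bucket.push(token);
    buckets.set(token[0], bucket);
  }
  return buckets;
}

// Per-attempt check: only candidates sharing the lead character are tested,
// instead of scanning every escapable token.
function matchEscape(text: string, i: number, buckets: Map<string, string[]>): string | null {
  for (const token of buckets.get(text[i]) ?? []) {
    if (text.startsWith(token, i)) return token;
  }
  return null;
}
```

The preparation cost is paid once per syntax configuration; each subsequent escape attempt touches only the same-lead bucket, which changes the constant factor but not the asymptotic class.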

Closed-form bound

Combining the components, we can write:

T_structural(n) ≤ k · n + s · n
                = C₁ · n

where:

  • k is the main-loop per-character branch budget (empirically about 3–15 under default syntax),
  • s is the amortized constant from auxiliary scans such as root/block-level form detection and block/raw closing-boundary searches.

So C₁ = k + s is a constant that depends on the implementation and input shape, but not on n.

For the render layer, the core traversal (renderNodes) visits each IndexedStructuralNode exactly once. If handlers further call materializeTextTokens, then under WeakSet-based skipping, each same-reference TextToken[] array is fully materialized at most once within one render pass.

Under a fixed handler implementation where single-handler output size remains linearly bounded by the input, we have nodes ≤ O(n) and tokens ≤ O(n), so:

T_render(n) ≤ C₂ · n

Preconditions. The bound above relies on the following constraints on handler implementations:

  1. Per-handler output size ≤ α · (source span of the input node), where α is a constant depending on the handler. This guarantees total handler output ≤ α · n.
  2. Handlers do not perform super-linear external operations (e.g., full-text search, sorting). If a handler contains O(n log n) or higher logic, T_render's bound increases accordingly.
  3. materializeTextTokens's WeakSet deduplication is keyed by array identity: the same TextToken[] reference is materialized at most once within one render pass. Distinct arrays with equivalent logical content are still materialized separately.

These conditions are satisfied by all built-in yumeDSL handlers. Third-party custom handlers must ensure them independently.
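Precondition 3 can be illustrated with a minimal identity-keyed dedup sketch; `materializeAll` is a stand-in for the real materializeTextTokens, counting only how many arrays are actually materialized:

```typescript
type ToyToken = { text: string };

// Identity-keyed dedup: repeated references to the same array are skipped,
// but distinct arrays with equal content are still materialized separately.
function materializeAll(arrays: ToyToken[][]): number {
  const seen = new WeakSet<ToyToken[]>();
  let materialized = 0;
  for (const arr of arrays) {
    if (seen.has(arr)) continue; // same TextToken[] reference: skipped
    seen.add(arr);
    materialized += 1;
  }
  return materialized;
}
```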

Therefore:

T_parseRichText(n) = T_structural(n) + T_render(n) ≤ (C₁ + C₂) · n = C · n

This gives the formal T(n) ≤ C · n bound.

Ω(n) lower bound

The above proves T(n) ≤ C · n (upper bound). For completeness, we show T(n) = Ω(n) (lower bound) also holds, yielding T(n) = Θ(n).

Proposition. For any input of length n > 0, parseRichText performs at least n library-owned primitive source touches, so T(n) ≥ c · n for some constant c > 0.

Proof. The root frame of parseNodesWithFactory has textEnd = n. The root frame (or its descendant frames) must consume every position in [0, n) — otherwise some source content would not appear in the output AST (the root frame's textEnd covers the full text, and I₁ guarantees all positions are consumed by successful frames).

But the lower bound can no longer be phrased as “each character-consuming iteration advances only a constant number of positions,” because 1.4.3+ fast skip may advance across an entire non-boundary text run in one iteration. The correct lower bound here is a source-touch lower bound:

  1. for any position p ∈ [0, n), either it is ultimately consumed by some successful frame, or it lies inside a non-boundary text run scanned by findNextBoundaryChar(...);
  2. if p lies in a non-boundary run, the parser must at least read/compare that position in order to determine that it is not a current-frame boundary character and keep advancing the fast-skip pointer;
  3. if p is a boundary position, the parser must at least perform one boundary-site decision there to determine whether it is escape, tag head, close, separator, or ordinary text;
  4. therefore, regardless of which class p falls into, the current implementation incurs at least one library-owned primitive source touch on it (a read, comparison, or inclusion in a monotone scan).

Hence the total number of source touches is at least linear in n:

source_touches(n) ≥ n

and each primitive source touch has constant cost, so:

T(n) = Ω(n)

Combined: T(n) = O(n) ∧ T(n) = Ω(n) ⟹ T(n) = Θ(n).
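The source-touch count can be made concrete with a loop that tallies one primitive touch per position, whether it is a boundary-site decision or a comparison inside a fast-skip run. In this simplified model the count is exactly n; the real parser may touch a position more than once, never fewer:

```typescript
// Tally one primitive source touch per position in [0, n): boundary
// positions get a decision touch, non-boundary positions are read/compared
// while the fast-skip pointer advances across the run.
function countSourceTouches(text: string, isBoundary: (c: string) => boolean): number {
  let touches = 0;
  let i = 0;
  while (i < text.length) {
    if (isBoundary(text[i])) {
      touches += 1; // boundary-site decision
      i += 1;
      continue;
    }
    let j = i;
    while (j < text.length && !isBoundary(text[j])) {
      touches += 1; // comparison inside the monotone fast-skip scan
      j += 1;
    }
    i = j;
  }
  return touches;
}
```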


The in-document appendix below keeps the implementation walkthroughs, branch-coverage inventories, historical bad-path comparisons, empirical constants, and incremental-parsing analysis.

In-document appendix

The material below stays in the same page: it supplies implementation location, historical comparison, and engineering context, but it does not change the theorem object proved above.

Appendix 0: where the historical quadratic term came from

The next section explains only why the historical recovery chain was quadratic. The main proof page now proves only the current implementation.

Second derivation layer: why the historical bad path produced a quadratic term

The quadratic term did not come from the mere existence of the main loop. It came from the historical malformed-shorthand / EOF-recovery path that repeatedly re-entered the same remaining suffix frame by frame, redoing ownership and close classification.

Let a toxic sample contain a malformed shorthand chain of depth m that must be downgraded or recovered at EOF. The old path can be abstracted as:

  • the first fallback rechecks a remaining suffix of size about m;
  • the second fallback rechecks a suffix of size about m-1;
  • ...
  • the last fallback rechecks a suffix of size about 1.

Then the extra recovery cost of the old behavior is:

T_old-recovery(m)
  = Θ(m) + Θ(m-1) + ... + Θ(1)
  = Σ_{k=1..m} Θ(k)
  = Θ(m²)

If m scales with input size, that single recovery branch is enough to destroy the linear upper bound.
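The series can be checked numerically; `oldRecoveryCost` is a model of the abstracted old behavior, not surviving code:

```typescript
// The k-th fallback rescans a suffix of size about k, so the total is the
// triangular number m(m+1)/2 = Theta(m^2).
function oldRecoveryCost(m: number): number {
  let total = 0;
  for (let k = 1; k <= m; k++) total += k;
  return total;
}
```

Doubling m roughly quadruples the cost, the signature of a quadratic term.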

The appendices preserve code-location walkthroughs, branch inventories, version evolution, historical bad paths, empirical constants, and incremental-parsing analysis. They are engineering context, not part of the theorem object's definition.

Appendix A: full-pipeline walkthrough

The diagram below lists every step parseRichText("...", options) actually takes, based on the current implementation in parse.ts / resolveOptions.ts / structural.ts / scanner.ts / render.ts / builders.ts. No ellipses, no "and so on" — every function named here corresponds to a locatable piece of source.

1.4.3 reading note. Since 1.4.3, the structural main loop is no longer best read as "every character enters the full branch cascade". Before the cascade, it now runs findNextBoundaryChar(...) and bulk-skips non-boundary text runs. The current boundary set includes tagPrefix, tagClose, tagDivider (inside args only), escapeChar (when escapes are enabled for the frame), the current inlineCloseToken, and tagName.isTagStartChar(...) when shorthand mode is active. So the chart below should be read as a boundary-triggered control-flow map, not a per-character branch tree.

flowchart TD
    A[text + ParseOptions] --> B[parseRichText parse.ts]
    B --> C{empty text?}
    C -->|yes| Z[return empty array]
    C -->|no| D[resolveParsePipelineBase parse.ts]

    subgraph BASE[Base Resolve]
        D --> D1[buildGatingContext resolveOptions.ts]
        D --> D2[resolveBlockTags parse.ts]
        D --> D3[resolveBaseOptions resolveOptions.ts]
        D3 --> D4[resolve syntax/tagName/tracker]
    end

    D --> E[createId seed or options.createId]
    E --> F[createRenderContextFromBase parse.ts]
    F --> G[withLegacyAmbientState compat wrapper]
    G --> H[parseStructuralWithResolved structural.ts]

    subgraph STRUCT[structural.ts main path]
        H --> H1[parseNodesWithFactory]
        H1 --> H2[prepare syntax closures]
        H2 --> H3[stack = makeFrame root]
        H3 --> H4[while stack.length > 0]
        H4 --> H4A[A frame-finish branch]
        H4 --> H4B[B escape branch]
        H4 --> H4C[C inline argClose branch]
        H4 --> H4D[D unexpected endTag branch]
        H4 --> H4E[E args separator branch]
        H4 --> H4F[F tag-head branch]
        H4 --> H4G[G inline shorthand branch]
        H4 --> H4H[H text fallback branch]
        H4F --> H5[getTagCloserType]
        H5 --> H6[findTagArgClose scanner.ts]
    end

    H1 --> I[StructuralNode array]

    subgraph SCAN[scanner.ts helpers]
        I --> I1[findBlockClose]
        I --> I2[findRawClose]
        I --> I3[findInlineClose compat-only]
    end

    I --> J[renderNodes render.ts]
    J --> J1[explicit stack traversal]
    J1 --> J2[mergeTextToken / handler dispatch]
    J2 --> J3[materializeTextTokens builders.ts]
    J3 --> J4[WeakSet hit => skip repeated materialization]
    J4 --> K[TextToken array]

Full node guide (no abbreviation)

The chart maps to concrete execution stages as follows:

  1. Entry stage (parse.ts)

    • parseRichText(text, options) first runs an empty-string short-circuit.
    • Non-empty input enters resolveParsePipelineBase(text, options).
    • createId is resolved (options.createId or default rt-${seed++}).
    • createRenderContextFromBase(...) builds render context.
    • withLegacyAmbientState(...) wraps the structural + render pipeline for legacy ambient compatibility.
  2. Base resolve stage (parse.ts + resolveOptions.ts)

    • buildGatingContext(handlers, allowForms) splits handlers by form and prepares gating indices.
    • resolveBlockTags(gating.handlers, options.blockTags) derives block-tag lookup and applies explicit overrides.
    • resolveBaseOptions(text, options) resolves final syntax / tagName / tracker.
  3. Structural entry (structural.ts)

    • parseStructuralWithResolved(...) calls parseNodesWithFactory(...).
    • parseNodesWithFactory initializes closure-local syntax tokens and helper functions.
    • The parser uses an explicit stack = [makeFrame(root)] and iterates with while (stack.length > 0).
  4. Structural main-loop branches (A to H)

    • A frame done: flush/pop/complete child or return root.
    • B escape sequence: emit escape node and advance.
    • C inline argClose: when inlineCloseToken !== null, detects the frame's close token. For shorthand frames ( inlineCloseToken === tagClose), a bare ) closes the frame. For full DSL frames, ) triggers suffix-based dispatch: )$$ → inline close, )% → raw form, )* → block form.
    • D unexpected endTag in non-inline frame.
    • E args separator (tagDivider) handling.
    • F tag-head detection + form decision via getTagCloserType / getTagCloserTypeWithCache (findTagArgClose entry for non-inline frames only).
    • G inline shorthand recognition: when inside an inline frame and no tag-head matches, readInlineShorthandStart tries to match name( patterns for implicit inline shorthand.
    • H fallback text append / bulk non-boundary run append.
  5. Scanner helper stage (scanner.ts)

    • findTagArgClose for form decision.
    • findBlockClose for block close boundary.
    • findRawClose for raw close boundary.
    • findInlineClose remains compatibility path.
  6. Render stage (render.ts + builders.ts)

    • renderNodes(...) uses explicit-stack traversal over structural nodes.
    • Node dispatch handles text / escape / inline / raw / block with handler calls.
    • appendToken merges adjacent text tokens.
    • materializeTextTokens(...) uses WeakSet(materializedArrays) to skip repeated subtree materialization.
  7. Output stage

    • parseRichText(...) returns TextToken[].
    • parseStructural(...) returns StructuralNode[] at structural stage exit.
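The Branch C dispatch described in stage 4 can be sketched for the default syntax (tagClose = ")", endTag = ")$$"); this simplification omits the ownership arbitration the real branch performs for competing closes:

```typescript
type CloseKind = "inline-close" | "raw-form" | "block-form" | "shorthand-close" | "none";

// Same-cursor suffix dispatch at ')' under the default syntax; illustrative
// only. A shorthand frame closes on a bare ')'; a full DSL frame inspects the
// suffix to pick inline close, raw form, or block form.
function classifyCloseAt(text: string, i: number, isShorthandFrame: boolean): CloseKind {
  if (!text.startsWith(")", i)) return "none";
  if (isShorthandFrame) return "shorthand-close";
  if (text.startsWith(")$$", i)) return "inline-close";
  if (text.startsWith(")%", i)) return "raw-form";
  if (text.startsWith(")*", i)) return "block-form";
  return "none"; // a bare ')' in a full DSL frame falls through to other branches
}
```

Because the decision is made at the current cursor, there is no "scan ahead again to find the closer" step; that is exactly what removed the pre-1.1.2 re-sweeping.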

Three places in this diagram are worth a second look — they are exactly the spots that turned the pre-1.1.2 O(n²) into Θ(n):

  • Branch C (inline argClose) is fully resolved inside the main loop. When it sees ), it checks the frame's inlineCloseToken to decide on the spot whether this is a shorthand close (token = ), with ownership arbitration when full-form close competes), or for full DSL frames checks the suffix ($$ / % / *) to decide whether this is an inline close, a raw form, or a block form. There is no "scan ahead again to find the closer" step. Before 1.1.2 this used to call findInlineClose / findTagArgClose, re-sweeping the tail at every inline level — that was O(n·d) at depth d.
  • findTagArgClose in Branch F still exists, but it only runs when a new tag header appears and only on non-inline frames. Since 1.4.3, large remaining windows may go through getTagCloserTypeWithCache(...), which reuses closer decisions within the same parse call. This changes constants, not the asymptotic class.
  • Escape lookup also changed in 1.4.3: escapable token sets are cached per SyntaxConfig, and candidate tokens are bucketed by first character. In practice this makes each escape attempt closer to "check only matching-lead candidates" rather than "scan every token".
  • The render-layer WeakSet dedup only guarantees that the same TextToken[] array object is fully materialized at most once within a single render pass. Distinct arrays that contain the same logical subtree are still materialized separately.

If you want to see how this diagram plays out on a concrete input step by step, read on to "Quick tour 2".


Appendix B: per-iteration walkthroughs

1.4.3 note. Since 1.4.3, the main loop runs findNextBoundaryChar(...) before entering the branch cascade, so consecutive non-boundary characters are bulk-advanced in a single appendBuf step. The current boundary set includes tagPrefix, tagClose, tagDivider (inside args only), escapeChar, the current inlineCloseToken, and tagName.isTagStartChar(...) when shorthand mode is enabled. The table below intentionally preserves the pre-1.4.3 per-character expansion only to explain branch semantics; the fast skip changes constants, not correctness or asymptotic complexity.

To avoid hand-waving with ellipses, the table below lists every iteration of the parseNodesWithFactory main loop. The input exercises a tag header, inline nesting, plain text, and the inline-close cascade all at once (n = 23):

index: 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
char : h  i        $  $  b  (  w  o  $  $  i  (  r  l  d  )  $  $  )  $  $  !

Relevant tokens under the default syntax: tagPrefix = "$$", tagOpen = "(", tagClose = ")", endTag = ")$$" (length 3). The inline frame's close detection first matches tagClose and then matches endTag starting at the same i. In the current implementation this is a same-cursor scanEndTagAt(...) check followed by same-cursor suffix dispatch inside the full DSL inline path.

Initial state: stack = [root], where root is built by makeFrame(text, 0, false, 0) with textEnd = 23 and inlineCloseToken = null.

Legend for the table:

  • frame is the top of the stack (b / i denote inline child frames named by their tag; R is the root).
  • stack shows the entire stack at the start of the iteration (top on the right).
  • i₀ → i₁ is frame.i before and after the iteration.
  • branch identifies the source-level branch actually taken (line numbers are current structural.ts).
  • buf / nodes is the cumulative state of that frame after this iteration.
| # | frame | stack | i₀ → i₁ | branch taken (line) | accumulated state of this frame |
|---|---|---|---|---|---|
| 1 | R | [R] | 0 → 1 | frame not done (628) → escape miss (659) → inlineCloseToken === null, skip inline argClose block (675) → unexpected endTag miss (non-inline branch) → readTagStartInfo miss → appendBuf | R.buf = "h" |
| 2 | R | [R] | 1 → 2 | same path as #1, appendBuf | R.buf = "hi" |
| 3 | R | [R] | 2 → 3 | same as above, appendBuf | R.buf = "hi " |
| 4 | R | [R] | 3 → 7 | escape miss → inlineCloseToken null → unexpected endTag miss → readTagStartInfo matches $$b(: tagPrefix $$ at 3, name b at 5, tagOpen ( at 6. Then getTagCloserType → findTagArgClose decides this is the inline form → tryPushInlineChild (497): flushBuffer pushes [0,3) as text node "hi " into R.nodes, builds child frame b and pushes it; child has b.i = info.argStart = 7, b.inlineCloseToken = endTag (i.e. )$$), b.tagStartI = 3, b.argStartI = 7. Stack top is now b; next iteration starts on b. | R.nodes = [text "hi "]; R.buf cleared |
| 5 | b | [R, b] | 7 → 8 | escape miss → inlineCloseToken !== null enters inline argClose block (675): inlineCloseToken !== tagClose (full DSL frame), so enters suffix path; startsWith(")", 7) → false → falls through to tag-head check; readTagStartInfo at 7 → 'w' is not tagPrefix → inline shorthand miss → appendBuf | b.buf = "w" |
| 6 | b | [R, b] | 8 → 9 | same path as #5, appendBuf | b.buf = "wo" |
| 7 | b | [R, b] | 9 → 13 | escape miss → tagClose ")" miss → readTagStartInfo matches $$i(: tagPrefix at 9, name i at 11, tagOpen at 12. Since b is an inline frame (inlineCloseToken !== null), takes H1 path (no getTagCloserType) → tryPushInlineChild: flushBuffer pushes "wo" as text node into b.nodes; pushes child frame i with i.i = 13, i.inlineCloseToken = endTag ()$$), i.tagStartI = 9, i.argStartI = 13 | b.nodes = [text "wo"]; b.buf cleared |
| 8 | i | [R, b, i] | 13 → 14 | escape miss → tagClose ")" at 13 miss → readTagStartInfo at 13 → 'r' is not tagPrefix → inline shorthand miss → appendBuf | i.buf = "r" |
| 9 | i | [R, b, i] | 14 → 15 | same as #8, appendBuf | i.buf = "rl" |
| 10 | i | [R, b, i] | 15 → 16 | same as #8, appendBuf | i.buf = "rld" |
| 11 | i | [R, b, i] | 16 → 19 | escape miss → inlineCloseToken !== tagClose (full DSL frame); tagClose ")" at 16 hits, enters suffix check: startsWith(")$$", 16) → chars 16, 17, 18 = )$$ → true → inline close path: i.inlineCloseWidth = endTag.length = 3, flushBuffer pushes "rld" as text node into i.nodes → stack.pop() → completeChild(i): closeStart = 16, nextI = closeStart + 3 = 19, pushes factory.inline("i", i.nodes, meta, false) into b.nodes and sets b.i = 19. Stack top returns to b. | frame i is gone; b.nodes = [text "wo", inline "i"(...)] |
| 12 | b | [R, b] | 19 → 22 | escape miss → tagClose ")" at 19 hits; endTag check: chars 19, 20, 21 = )$$ → true → inline close: b.inlineCloseWidth = endTag.length = 3, flushBuffer (b.buf empty, no-op) → pop → completeChild(b): closeStart = 19, nextI = 22, pushes factory.inline("b", b.nodes, meta, false) into R.nodes and sets R.i = 22 | frame b is gone; R.nodes = [text "hi ", inline "b"(...)] |
| 13 | R | [R] | 22 → 23 | escape miss → inlineCloseToken null, skip inline argClose block → unexpected endTag miss → readTagStartInfo miss → appendBuf | R.buf = "!" |
| 14 | R | [R] | 23 | frame.i >= frame.textEnd (628) hits → flushBuffer pushes "!" as text node into R.nodes → pop → returnKind === null → return frame.nodes | R.nodes = [text "hi ", inline "b"(...), text "!"] |

The main loop runs 14 iterations total to consume 23 source positions plus 2 inline pops. You can verify:

  • Char-consuming iterations (#1–#3, #5–#6, #8–#10, #13) = 9, exactly matching the 9 plain text characters h i ␠ w o r l d !. Each one strictly advances i by 1.
  • Tag-header jump iterations (#4, #7) = 2. Each one lets the parent frame jump i past an entire $$X( header (4 characters) in one step. Those 4 positions are then consumed by the child frame in its own subsequent scanning — so at the final-frame level every position still belongs to exactly one frame.
  • Inline-close iterations (#11, #12) = 2. Each one advances i past )$$ (3 characters); again, those 3 positions are accounted for in the parent frame as parent-frame consumption, with no overlap across frames.
  • Frame-done iteration (#14) = 1, no character consumed.

Add up "the 4-character jump when the parent pushes a child" with "the child's own subsequent character-by-character scan", and every source position is consumed by exactly one frame at the final-frame level. This is the invariant the formal section below proves rigorously.

Helper-scan inventory for the same input

Outside the main loop, this input triggers the following helper scans (all from scanner.ts, only at root/block level):

| call site | start | end | scanned interval | length |
|---|---|---|---|---|
| findTagArgClose (#4) | 7 | 19 | [7, 19) | 12 |
| findTagArgClose (#7) | 13 | 16 | [13, 16) | 3 |

The second findTagArgClose runs in a deeper context (frame b is already an inline child), but the caller is not the inline frame itself — it is getTagCloserType doing a one-shot form decision for the new tag $$i(. Physically that call is nested inside an inline frame, but semantically it is still root/block-level form detection because getTagCloserType only fires when readTagStartInfo first encounters a new tag header. That is a fundamentally different cost model from "rescan the closer at every inline nesting level": the former costs O(number-of-tag-headers), the latter costs O(depth · n) and gave the old O(n²).

Helper scans cover 12 + 3 = 15 characters total, partially overlapping with the 23 positions consumed by the main loop. Total work stays inside n + Σ(helper-scan lengths) = O(n), matching the closed-form bound T_structural(n) ≤ k·n + s·n from the "Formal upper bound → Per-iteration work" subsection below.

Walkthrough 2b: per-iteration trace with shorthand mode

The example above contains only full DSL structures ($$name(...)$$). Since 1.3 the parser supports shorthand form name(...) inside inline argument regions — the prefix and suffix are omitted and a bare ) closes the frame. The following input mixes full DSL with shorthand nesting, tracing every iteration to show shorthand push / close / preemption by full DSL.

Input (n = 27):

index: 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
char:  $  $  b  o  l  d  (  h  e  l  l  o     b  o  l  d  (  w  o  r  l  d  )  )  $  $

Meaning: $$bold(hello bold(world))$$ — the outer tag is a full DSL bold; inside its argument region there is a shorthand bold(world).

Relevant tokens are the same as before: tagPrefix = "$$", tagOpen = "(", tagClose = ")", endTag = ")$$". Shorthand frames have inlineCloseToken = tagClose (i.e. )); full DSL frames have inlineCloseToken = endTag (i.e. )$$).

| # | frame | stack | i₀ → i₁ | branch taken (line) | accumulated state of this frame |
|---|---|---|---|---|---|
| 1 | R | [R] | 0 → 7 | readTagStartInfo matches $$bold(: tagPrefix at 0, name "bold" at 2, tagOpen at 6. getTagCloserType → findTagArgClose decides inline form → tryPushInlineChild: pushes child bold, bold.i = 7, bold.inlineCloseToken = endTag | R.nodes = []; R.buf cleared |
| 2 | bold | [R, bold] | 7 → 8 | escape miss → tagClose ")" at 7 miss → readTagStartInfo miss → readInlineShorthandStart at 7: name "hello" read successfully but startsWith("(", 12) → char 12 is space → false → shorthand miss → appendBuf | bold.buf = "h" |
| 3 | bold | [R, bold] | 8 → 9 | same as #2, appendBuf | bold.buf = "he" |
| 4 | bold | [R, bold] | 9 → 10 | same | bold.buf = "hel" |
| 5 | bold | [R, bold] | 10 → 11 | same | bold.buf = "hell" |
| 6 | bold | [R, bold] | 11 → 12 | same | bold.buf = "hello" |
| 7 | bold | [R, bold] | 12 → 13 | space ' ' is not tagStartChar → shorthand miss → appendBuf | bold.buf = "hello " |
| 8 | bold | [R, bold] | 13 → 18 | readTagStartInfo miss ('b' is not tagPrefix $$) → readInlineShorthandStart at 13: name "bold", tagNameEnd=17, startsWith("(", 17) → true → tag "bold" in registeredTags → supportsInlineForm passes → returns info. tryPushInlineShorthandChild: ambiguity probe scans from argStart=18, checking escape / full DSL tag head / tagClose at each position; hits ) at 23 (boundary=23), startsWith(")$$", 23) → chars 23, 24, 25 are ), ), $ — not )$$ → reject = false → probe passes. flushBuffer pushes "hello " into bold.nodes. Pushes shorthand child bold(sh): bold(sh).i = 18, bold(sh).inlineCloseToken = ")", bold(sh).implicitInlineShorthand = true | bold.nodes = [text "hello "] |
| 9 | bold(sh) | [R, bold, bold(sh)] | 18 → 19 | inlineCloseToken = ")" === tagClose → shorthand frame → startsWith(")", 18) → false → readTagStartInfo miss → readInlineShorthandStart miss ("world" not followed by "(") → appendBuf | bold(sh).buf = "w" |
| 10 | bold(sh) | [R, bold, bold(sh)] | 19 → 20 | same as #9 | bold(sh).buf = "wo" |
| 11 | bold(sh) | [R, bold, bold(sh)] | 20 → 21 | same | bold(sh).buf = "wor" |
| 12 | bold(sh) | [R, bold, bold(sh)] | 21 → 22 | same | bold(sh).buf = "worl" |
| 13 | bold(sh) | [R, bold, bold(sh)] | 22 → 23 | same | bold(sh).buf = "world" |
| 14 | bold(sh) | [R, bold, bold(sh)] | 23 → 24 | inlineCloseToken = ")"; first checks scanEndTagAt(23), which is not a full endTag in this example, so ownership does not defer. Then startsWith(")", 23) → true → C1 shorthand close: inlineCloseWidth = tagClose.length = 1, flushBuffer pushes "world" into bold(sh).nodes → pop → completeChild: closeStart=23, nextI=24, factory.inline("bold", nodes, meta, true) pushed into bold.nodes, sets bold.i = 24 | bold(sh) gone; bold.nodes = [text "hello ", inline "bold"(sh)(...)] |
| 15 | bold | [R, bold] | 24 → 27 | tagClose ")" at 24 hits → startsWith(")$$", 24) → chars 24, 25, 26 = )$$ → true → C2 inline close: inlineCloseWidth = 3, flushBuffer (buf empty) → pop → completeChild: closeStart=24, nextI=27, factory.inline("bold", nodes, meta, false) pushed into R.nodes, sets R.i = 27 | bold gone; R.nodes = [inline "bold"(...)] |
| 16 | R | [R] | 27 | frame.i >= frame.textEnd → flushBuffer (buf empty) → pop → return frame.nodes | R.nodes = [inline "bold"(...)] |

The main loop runs 16 iterations to consume 27 source positions plus 2 pops.

  • Char-consuming iterations (#2–#7, #9–#13) = 11, matching the 11 text characters h e l l o ␠ w o r l d.
  • Tag-header jump iterations (#1, #8) = 2. #1 skips the full DSL header $$bold( (7 chars); #8 skips the shorthand header bold( (5 chars). In both cases the child frame will account for those positions in its own scanning.
  • Frame-close iterations (#14, #15) = 2. #14 (shorthand close) consumes just 1 character ). #15 (full DSL close) consumes 3 characters )$$.
  • Frame-done iteration (#16) = 1, no character consumed.

Shorthand-specific helper scans

In addition to findTagArgClose (seen in Walkthrough 2), shorthand mode introduces one extra helper scan — the ambiguity probe (tryPushInlineShorthandChild(...), which in the current implementation delegates its ownership decision to resolveShorthandOwnershipPush(...)).

When the parent frame is a full DSL frame (inlineCloseToken === endTag), the parser must verify that the first "boundary" inside the shorthand argument region does not cause the shorthand to consume the parent's endTag. The probe scans forward character by character, stopping at one of three conditions:

  1. Escape sequence — skipped, scanning continues.
  2. Full DSL tag header (readTagStartInfo matches $$name() — boundary is set to this position, reject = false (the shorthand frame would be preempted by the full DSL structure before reaching it, so no ambiguity).
  3. tagClose ()) — boundary is set to this position; startsWith(endTag, boundary) is checked: if the ) is the start of endTag ()$$), reject = true and the shorthand is rejected.

In this example (#8), the probe scans from 18 to 23 and finds ) (boundary=23). startsWith(")$$", 23) → false → reject = false → probe passes. If the boundary lands at textEnd, scanEndTagAt may report only a truncated prefix of endTag; this is treated as not a full close. The result is cached in shorthandProbe: ShorthandProbeState | null (lazily created, containing textEnd / startI / boundaryI / reject) and is reused only under the same textEnd scan-boundary version, to avoid stale cross-window reuse.

| call site | start | end | scanned interval | length |
|---|---|---|---|---|
| findTagArgClose (#1) | 7 | 23 | [7, 23) | 16 |
| tryPushInlineShorthandChild probe (#8) | 18 | 23 | [18, 23) | 5 |

Helper scans cover 16 + 5 = 21 characters total, partially overlapping with the 27 positions consumed by the main loop. Total work stays inside n + Σ(helper-scan lengths) = O(n).

Walkthrough 2c: poison input, iteration by iteration (ownership competition)

This is the 1.3.x "poison" sample that most easily causes wrong ownership if the parser uses ad-hoc special cases. Like 2b, we list the main-loop progression, but this case includes two explicit "degrade + rewind to argStart" cycles.

Input (n = 45, custom syntax):

=bold<天気がbold<いlink<baidu.com|い>=>から>=散歩しましょう

Syntax used in this walkthrough (same as tests/shorthandBehavior.test.ts):

  • tagPrefix = "="
  • tagOpen = "<"
  • tagClose = ">"
  • tagDivider = "|"
  • endTag = ">="

Character index map:

index:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44
char :  =  b  o  l  d  <  天 気 が b  o  l  d  <  い l  i  n  k  <  b  a  i  d  u  .  c  o  m  |  い  >  =  >  か  ら  >  =  散 歩 し  ま  し  ょ  う

Stack labels:

  • R: root frame
  • Bf: outer full-form =bold<...>= frame (inlineCloseToken = ">=")
  • Bs: shorthand bold<...> candidate frame (inlineCloseToken = ">")
  • Ls: shorthand link<...> candidate frame (inlineCloseToken = ">")
| # | frame | stack | i₀ → i₁ | Matched branch (single ownership resolver across phases) | State snapshot |
|---|---|---|---|---|---|
| 1 | R | [R] | 0 → 6 | readTagStartInfo matches =bold<; tryPushInlineChild pushes full-form Bf. | R.buf cleared |
| 2 | Bf | [R, Bf] | 6 → 7 | Plain-text append (天). | Bf.buf = "天" |
| 3 | Bf | [R, Bf] | 7 → 8 | Plain-text append (気). | Bf.buf = "天気" |
| 4 | Bf | [R, Bf] | 8 → 9 | Plain-text append (が). | Bf.buf = "天気が" |
| 5 | Bf | [R, Bf] | 9 → 14 | readInlineShorthandStart matches bold<; resolveShorthandOwnershipPush(...) returns allow; push Bs. | Bf.nodes += text("天気が") |
| 6 | Bs | [R, Bf, Bs] | 14 → 15 | Plain-text append (い). | Bs.buf = "い" |
| 7 | Bs | [R, Bf, Bs] | 15 → 20 | readInlineShorthandStart matches link<; resolveShorthandOwnershipPush(...) returns allow; push Ls. | Bs.nodes += text("い") |
| 8 | Ls | [R, Bf, Bs, Ls] | 20 → 31 | Scans baidu.com, the divider, and い: the text segments use text append; the divider hits tagDivider and emits a separator. | Ls emits a separator and keeps the surrounding text |
| 9 | Ls | [R, Bf, Bs, Ls] | 31 → 31 | At tagClose (>), resolveShorthandOwnershipClose(...) runs first: the same cursor can form the outer endTag (>=), so the result is defer-parent. Ls degrades. | Ls enters degrade |
| 10 | Bs | [R, Bf, Bs] | 20 → 31 | After Ls degradation, the parent rewinds to argStart and rescans; the link<... segment is absorbed as plain text in Bs. | Bs absorbs that segment |
| 11 | Bs | [R, Bf, Bs] | 31 → 31 | At the same competition point, Bs also gets resolveShorthandOwnershipClose(...) = defer-parent; Bs degrades too. | Bs enters degrade |
| 12 | Bf | [R, Bf] | 14 → 36 | Parent Bf rewinds to argStart and rescans, accumulating the corresponding text span; no full close occurs before index 36. | Bf.buf updated |
| 13 | Bf | [R, Bf] | 36 → 38 | Full endTag (>=) matches; Bf closes normally via completeChild and is emitted to root. | R.nodes += inline(bold) |
| 14 | R | [R] | 38 → 45 | Tail 散歩しましょう appended as plain text; EOF pop and return. | done |

Output (implicitInlineShorthand=false/true are identical):

天気がbold<いlink<baidu.com|い>から>=散歩しましょう

Note: the | is processed by the pipe-divider branch during scanning (not as a plain text character); it appears preserved in final text because this path later degrades/rewinds and settles as text.

Why this matters: push, close, and fallback all call the same ownership resolver. That prevents split-brain behavior where one phase "salvages" shorthand and steals close ownership from a stable full-form parent.

Shorthand vs full DSL priority

When a shorthand close point competes with a full DSL endTag at the same cursor, ownership arbitration runs first: if endTag fully matches, shorthand yields (degrades to text and rewinds to argStart) and the parent frame consumes that same close token. This guarantees full DSL syntax takes priority over shorthand.
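The same-cursor arbitration can be sketched like this (hypothetical names; the real resolver also consults frame state): the endTag check runs before the shorthand tagClose check, so full DSL syntax always wins the tie.

```typescript
// Minimal sketch of same-cursor close arbitration: a shorthand frame may only
// consume tagClose if the parent's full endTag does not also match here.
type CloseDecision = "consume-shorthand" | "defer-parent" | "no-close";

function resolveCloseAt(
  text: string,
  i: number,
  tagClose: string,          // e.g. ">"
  endTag: string,            // e.g. ">=" — checked first because full DSL wins
  parentWantsEndTag: boolean // a full-form ancestor is waiting for endTag
): CloseDecision {
  if (parentWantsEndTag && text.startsWith(endTag, i)) return "defer-parent";
  if (text.startsWith(tagClose, i)) return "consume-shorthand";
  return "no-close";
}
```

With the walkthrough syntax, index 31 of the poison input sees ">=" and therefore defers to the parent, exactly as rows 9 and 11 show.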

Key observations

  1. i is monotonic apart from bounded degrade rewinds. In walkthrough 2b, frame.i either stays the same (frame-done pop) or strictly increases; no branch rewinds i. Walkthrough 2c adds the one exception: the explicit degrade-and-rewind to argStart. Since 1.4.4 a degrade replays only the tag head and keeps already-parsed child nodes, so the total re-read volume stays O(n).
  2. Inline close needs no rescanning. Full DSL frames hit ) and perform same-position token checks (endTag / rawOpen / blockOpen) — findInlineClose is never called. Shorthand frames also stay local: they run same-position ownership arbitration before consuming ). This keeps each iteration O(1) and is the key reason inline nesting stays O(n) since 1.1.2.
  3. Helper scans are enumerable. Both examples produce helper scans independent of inline nesting depth — findTagArgClose is triggered by root/block-level form decisions, and the shorthand ambiguity probe is triggered by disambiguation with cacheable results. That is the structural reason the asymptotic bound stays linear.

Scale these walkthroughs to n = 200,000 and both the iteration count and the total helper-scan coverage grow linearly — which is exactly what the formal theorem body proves rigorously.


Appendix C: implementation branch coverage, version notes, and library-owned boundaries

Since 1.1.2, both parser entry points have a worst-case Θ(n) time complexity:

| API | Worst-case | Reason |
| --- | --- | --- |
| parseStructural | Θ(n) | Single scan pointer i in a while main loop; no per-level rescan |
| parseRichText | Θ(n) | parseStructural Θ(n) + render traversal Θ(n) = Θ(n); if handlers further call materialization helpers, those additional steps also remain linear |

What changed in 1.1.2

Two independent O(n²) bottlenecks were eliminated:

  1. Structural layer — inline forward scan removed. Before 1.1.2, every inline nesting level called findInlineClose / findTagArgClose (scanner.ts) to pre-scan forward for the matching close. With d nesting levels each scanning ~n characters, the worst case was O(n·d) = O(n²).

    Now: inline child frames use an inlineCloseToken field (structural.ts, main loop) that records the exact token needed to close the frame — endTag (i.e. )$$) for full DSL inline, or tagClose (i.e. )) for shorthand inline. The frame checks for its close token at the current scan position — the comment at the push site says (translated): "no pre-scan via findTagArgClose is needed … pure deep inline nesting stays O(n)". The old findInlineClose / findTagArgClose functions still exist in scanner.ts but are only called from getTagCloserType (root/block-level form detection) and findBlockClose (block-close boundary search); neither path is entered inside an inline frame.

  2. Render layer — materializeTextTokens re-traversal eliminated. Each handler call used to recurse into the full subtree. A module-level WeakSet (materializedArrays in builders.ts) now marks processed arrays; subsequent calls hitting the same array skip recursion entirely (builders.ts lines 86–87), collapsing O(n²) to O(n).

Both fixes are documented in the Performance deep-nesting section.
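The WeakSet fix can be sketched as below. Shapes and names are illustrative; the real materializeTextTokens is stack-iterative, and recursion is used here only for brevity.

```typescript
// Sketch of WeakSet-based once-only processing: an array object that has
// already been materialized is skipped in O(1) on every later visit.
type TextToken = { text: string; children?: TextToken[] };

const materializedArrays = new WeakSet<TextToken[]>();
let visitCount = 0; // counts per-token work, to show the second call is free

function materialize(tokens: TextToken[]): void {
  if (materializedArrays.has(tokens)) return; // already processed: O(1) skip
  materializedArrays.add(tokens);
  for (const t of tokens) {
    visitCount++;
    if (t.children) materialize(t.children);
  }
}
```

Because membership is keyed on the array object itself, repeated handler calls over the same subtree no longer multiply the traversal cost, which is the O(n²) → O(n) collapse described above.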

Remaining forward scans

getTagCloserType / getTagCloserTypeWithCache still call findTagArgClose at root/block level to determine whether a tag is inline / raw / block form. Since 1.4.3 the call is split by remaining text length: ≤ 256 characters uses the uncached getTagCloserType; > 256 characters uses getTagCloserTypeWithCache (sharing argClose scan results via a lazily created Map cache). This does not reintroduce O(n²) because:

  • It runs once per tag at the top level or inside a block content frame — not per nesting level.
  • findBlockClose (scanner.ts) internally handles nested block depth, skips inner raw spans via findRawClose, and skips inner inline spans via findInlineClose. Within a single findBlockClose call, the scan pointer still advances monotonically and does not reintroduce per-level backtracking rescans, so the total work of that call remains O(n).

Branch-coverage matrix: which branch complexities are actually covered here

Saying only “the main loop is Θ(n)” is not the same as accounting for every library-owned branch in the current implementation. This section makes that explicit, function by function:

  • where the branch is triggered,
  • what its local one-hit cost is,
  • which global bound absorbs it,
  • and where the analysis stops because we would otherwise be counting user handler code rather than library-owned control flow.

Here “covered” means specifically:

  1. the normal parseStructural(...) / parseRichText(...) entry paths,
  2. the current structural + render implementation shipped by the library,
  3. excluding the internal complexity of user-provided handler callback bodies.

Structural parser: top-level while-loop branch coverage

| Top-level branch | Trigger point | Local one-hit cost | Where it is absorbed |
| --- | --- | --- | --- |
| tryFinalizeFrameAtEof(frame) | start of each loop iteration | O(1) on normal completion; malformed EOF replay handled separately below | normal completion folds into O(number of frames); post-1.4.4 EOF replay folds into T_recovery-extra = O(n) |
| findNextBoundaryChar(...) fast skip | non-boundary text run | O(run length) for one skip | the sum of all skipped run lengths is ≤ n, so it folds into source-consumption O(n) |
| readEscapedForFrame(...) | escape attempt | O(1) on miss; O(escape-sequence length) on hit | escape spans are disjoint and fold into total source/escape consumption O(n) |
| tryConsumeInlineCloseAtCursor(...) | inline-frame close site | O(1) same-cursor arbitration; raw/block close search accounted separately | C1/C2/C5 fold into main-loop O(n); C3/C4 close searches fold into auxiliary scans O(n) |
| non-inline scanEndTagAt(...)=full | unexpected close | O(1) | folds into main-loop O(n) |
| insideArgs && startsWith(tagDivider) | arg separator | O(1) | folds into main-loop O(n) |
| tryConsumeTagOrTextAtCursor(...) | full tag / shorthand / text dispatch | depends on its sub-branch | sub-branches are split below into tag-dispatch O(n) + helper-scan O(n) |
| defensive fallback appendBuf(...); i++ | theoretical safety net | O(1) | folds into main-loop O(n) |

Structural parser: sub-branch coverage inside tryConsumeInlineCloseAtCursor(...)

| Sub-branch | Local cost | Global absorption |
| --- | --- | --- |
| C1 shorthand normal close | O(1) same-cursor ownership check + O(1) close | folds into main-loop O(n) |
| C1 shorthand defer-parent | O(number of already produced child nodes attached upward) | post-1.4.4 folds into T_defer-parent = O(n) |
| C2 full DSL inline close (endTag) | O(1) | folds into main-loop O(n) |
| C3 raw form ()%) | one linear scan of that raw-content window via findRawClose | all raw-close searches fold into auxiliary scans O(n) |
| C3 raw-unclosed degradation | O(1) error + O(tag-head span) source fallback | folds into auxiliary scanning / degraded-source total O(n) |
| C4 block form ()*) | one linear scan of that block window via findBlockClose | all block-close searches fold into auxiliary scans O(n) |
| C4 block-unclosed degradation | O(1) error + O(tag-head span) source fallback | folds into auxiliary scanning / degraded-source total O(n) |
| C5 ordinary ) | O(1) | folds into main-loop O(n) |

Structural parser: sub-branch coverage inside tag/text dispatch and shorthand push

| Sub-branch | Local cost | Global absorption |
| --- | --- | --- |
| readTagStartInfo(...) recognizes a full DSL head | O(tag-name length) | total tag-head character volume is ≤ n, so this folds into O(n) |
| inline-frame tryPushInlineChild(...) | O(1) push; does not call getTagCloserType | folds into O(number of frames) ≤ O(n) |
| readInlineShorthandStart(...) recognizes a shorthand head | O(tag-name length) | folds into total tag-head volume O(n) |
| resolveShorthandOwnershipPush(...) owner fast-path | O(1) | folds into T_probe / T_recovery-extra = O(n) |
| resolveShorthandOwnershipPush(...) forward probe | O(probe-window length) per first build | each (frame, textEnd) window is built once, so total cost is amortized O(n) |
| getTagCloserType(...) / getTagCloserTypeWithCache(...) | O(length of that tag's arg window) | only root/block-level form classification triggers it, so total folds into auxiliary scans O(n) |
| findTagArgClose(...) returns null (H2) | failed scan O(remaining window) + tag-head degradation | under the current 1.4.4 recovery model this folds into auxiliary scans + recovery-extra O(n) |
| gating rejection → skipTagBoundary(...) | O(tag-boundary length) | total skipped boundary length is ≤ n, so this folds into O(n) |
| depthLimit degradation | O(tag-boundary length) via skipTagBoundary(...) | still absorbed by total boundary length O(n) |
| ordinary text fallback | O(1) or batch O(run length) | folds into source consumption O(n) |

Post-1.4.4 recovery-branch coverage

These recovery branches need to be listed explicitly, because they determine whether recovery can reintroduce superlinear work. The currently covered recovery branches are:

| Recovery branch | Current implementation | Global absorption |
| --- | --- | --- |
| shorthand push-time owner conflict | resolveShorthandOwnershipPush(...) | can reject immediately O(1), or amortized probe O(n) |
| shorthand close-site owner conflict | resolveShorthandOwnershipClose(...) + downgradeInlineIntoParent(...) | T_defer-parent = O(n) |
| EOF malformed inline / shorthand chain | buildMalformedInlineReplayPlan(...) + replayMalformedInlineChainAtEof(...) | T_replay-plan + T_eof-rescan = O(n) |
| historical bad path: frame-by-frame re-entry on the same suffix | existed before 1.4.4 | explicitly analyzed here as old O(n²), new O(n) |

Renderer: renderNodes(...) main-loop branch coverage

| Render-loop branch | Local cost | Global absorption |
| --- | --- | --- |
| frame end: frame.index >= frame.nodes.length | flushTextBuf + pop/resume; flushTextBuf is O(buffered text length) | every buffered text segment is joined/merged once, so this folds into output-text volume O(n) |
| consumeNextLB trims leading line break | O(trimmed newline length), typically O(1) | total trimmed line-break length is bounded by rendered text length, so O(n) |
| text / escape / separator | bufferText(...) amortized O(1); root-mode readEscaped(...) is O(raw escape length) | total text-like output volume folds into O(rendered text size) ≤ O(n) on the library-owned side |
| inline node | flushTextBuf + push child render frame | folds into O(number of nodes) ≤ O(n) |
| raw node | flushTextBuf + renderRawNode(...) | see next table; total raw-content work folds into O(n) |
| block node | flushTextBuf + push child render frame | folds into O(number of nodes) ≤ O(n) |

Renderer: node-dispatch branch coverage

| Function / branch | Local cost | Global absorption |
| --- | --- | --- |
| renderInlineNode(...): !supportsInlineForm(...) → degradeToSource(...) | O(node source-span length) | in the normal parseRichText pipeline many illegal forms are already filtered by structural gating; remaining degradation cost is charged to degraded source span and kept inside O(n) |
| renderInlineNode(...): unknown-tag passthrough | one materializeTextTokens(childTokens) + appendToken loop | each identical TextToken[] object is fully materialized at most once, so this folds into O(total token materialization) |
| renderInlineNode(...): handler.inline / fallback | handler call itself is outside the library guarantee; library-side createToken / appendToken is O(size of produced token draft) | library-owned part folds into O(rendered token count); user handler-body work is separate |
| renderRawNode(...): no handler.raw → degradeToSource(...) | O(raw-node source span) | charged to degraded source span, folding into O(n) |
| renderRawNode(...): escaped raw-close restoration | O(rawContent length) with lazy parts allocation | total rawContent length is bounded by source length, so O(n) |
| renderRawNode(...): normalizeBlockTagContent(...) | O(rawContent length) | folds into raw-content total O(n) |
| renderBlockNode(...): no handler.block → degradeToSource(...) | O(block-node source span) | charged to degraded source span, folding into O(n) |
| renderBlockNode(...): fast path in trimBlockBoundaryTokens(...) | O(1) | folds into O(number of block nodes) ≤ O(n) |
| renderBlockNode(...): trimming path with cloneToken(...) | O(size of the trimmed edge-token subtree) | only modified edge tokens are cloned; for normal parser output the total folds into overall token size O(n) |
| bufferText(...) / flushTextBuf(...) | O(1) amortized per push; O(buffer length) per flush | every buffered segment is joined/merged once, so total cost folds into text-output volume O(n) |
| mergeTextToken(...) / appendToken(...) | O(1), including adjacent-text merge | folds into token-append count O(n) |
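The buffering amortization in the last two rows can be sketched as follows (hypothetical shapes, not the renderer's actual API): pushes are O(1), and each buffered segment is joined exactly once at flush.

```typescript
// Sketch of bufferText/flushTextBuf amortization: total cost tracks the
// volume of emitted text, never the number of flush calls.
function makeTextBuffer() {
  const buf: string[] = [];
  const out: string[] = [];
  return {
    bufferText(s: string) { buf.push(s); },  // amortized O(1) per push
    flushTextBuf() {
      if (buf.length === 0) return;          // empty flush is a no-op
      out.push(buf.join(""));                // each segment joined once
      buf.length = 0;
    },
    tokens: out,
  };
}
```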

Helper / materialization / degradation branches and the coverage boundary

| Helper path | Covered here? | Boundary note |
| --- | --- | --- |
| materializeTextTokens(...) | yes | analysis assumes library-generated token arrays; the same array object is fully materialized at most once |
| degradeToSource(...) | yes, with boundary note | for the normal parseRichText pipeline, degraded span total is charged as O(n); if a user manually feeds render a highly overlapping external tree, that is outside the normal parser guarantee |
| trimBlockBoundaryTokens(...) / cloneToken(...) | yes | under normal parser output, cloning touches only trimmed edge-token subtrees and folds into total token size |
| createToken(...) | only the library-side wrapper cost | if a user handler constructs a huge draft object, that additional construction cost is not the library's own complexity |
| user handler.inline/raw/block bodies | not covered | user code can itself do O(m²) work, which obviously exceeds the library-owned Θ(n) guarantee |

Conclusion: what “full branch coverage” means after this addition

The more precise statement is now:

  1. structural top-level branches and their sub-branches are covered, not only the headline main loop;
  2. the 1.4.4 ownership / EOF recovery path is covered in both old and new form;
  3. render-loop, node-dispatch, text-buffer, raw/block trim/degrade/materialization branches are covered on the library-owned side;
  4. user handler bodies are explicitly excluded rather than smuggled into the library's Θ(n) claim.

So the accurate umbrella statement becomes:

T_library-owned(parseRichText, n) = Θ(n)

where “library-owned” explicitly excludes extra work performed inside user-provided handlers. Under that scope, the current structural / render / recovery / helper branches are now all brought under the complexity coverage of this page.

Appendix D: the historical N² recovery chain repaired in 1.4.4

This appendix is historical only. It explains why the old bad path was quadratic and which frame-by-frame re-entry mechanism the current implementation removes. It is not part of the main theorem statement.

Old behavior vs. new behavior: the 1.4.4 quadratic path

When the changelog says “a quadratic path was found”, it is not talking about the normal well-formed fast path, nor about every malformed input in the abstract. It is talking about the specific combination of:

  • deep shorthand / inline nesting,
  • ownership competition with an ancestor full-form close,
  • and malformed state carried all the way into EOF recovery.

Putting old and new readings side by side makes the significance clearer:

| Dimension | Bad-path reading before 1.4.4 | Reading since 1.4.4 |
| --- | --- | --- |
| ancestor close ownership | had to be rediscovered after parent re-entry at the same close position | ancestorEndTagOwnerIndex gives the nearest owner in O(1) |
| shorthand yielding | could degrade into "yield one layer, then let the parent reconsider the whole suffix" | resolveShorthandOwnershipClose(...) + downgradeInlineIntoParent(...) resolve it in place and continue from nextParentI |
| malformed EOF chain | could keep feeding control back into the main loop layer by layer | buildMalformedInlineReplayPlan(...) computes once, replay pops once |
| suffix rescans | could accumulate as a triangular series | reduced to one recovery resume |
| worst-case cost of that path | O(n²) | O(n) |

So the 1.4.4 change is not merely “a constant-factor optimization”. It restores the complexity class of a real recovery path from quadratic back to linear.

Historical supplement: preserving the old frame-by-frame re-entry model

The analysis above describes the current implementation. But if this page is meant to function as a design note or paper-style explanation, the old model should remain visible too. Otherwise the reader sees that the quadratic path is gone, but not why it used to exist.

The old multiplier can be abstracted as this chain:

deep shorthand / inline close competition
    → current layer cannot stabilize ownership
    → return to the parent and observe the same suffix again
    → parent carries part of that suffix back into the main loop
    → the next layer repeats the same pattern

If the still-unresolved suffix length at a given moment is k, the accumulated work can be written as:

W_old(k) = k + (k - δ1) + (k - δ1 - δ2) + ...

where each δi > 0 means “some progress happened”. But as long as the parent still has to re-examine essentially the same tail after each recovery bounce, this is a triangular series; in the worst uniform case it becomes:

W_old(k) = k + (k - 1) + (k - 2) + ... + 1 = Θ(k^2)
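A quick numeric check of this series, under the uniform assumption δᵢ = 1: the old recovery cost is the triangular sum k(k+1)/2 = Θ(k²), while the post-1.4.4 model touches the suffix once, i.e. O(k).

```typescript
// Worst uniform case of the old recovery chain: each bounce re-examines the
// still-unresolved suffix, shrinking it by exactly one character per round.
function oldRecoveryCost(k: number): number {
  let total = 0;
  for (let remaining = k; remaining >= 1; remaining--) total += remaining;
  return total; // equals k * (k + 1) / 2
}
```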

In other words, the old problem was not “there exists an O(n) scan helper”. The real problem was:

  • the same suffix re-entered the main loop too many times,
  • ownership information was not local enough, so parent re-entry had to re-establish it,
  • EOF recovery did not settle the whole malformed chain in one shot.

1.4.4 cuts all three multipliers:

| Old multiplier | 1.4.4 cut point |
| --- | --- |
| nearest owner had to be rediscovered through parent re-entry | ancestorEndTagOwnerIndex makes nearest ownership frame-local O(1) data |
| shorthand yielding returned too much of the tail to the parent | downgradeInlineIntoParent(...) only replays the tag head, keeps already parsed child nodes, and jumps to nextParentI |
| EOF recovery propagated pressure layer by layer | buildMalformedInlineReplayPlan(...) + replayMalformedInlineChainAtEof(...): plan first, replay once |

Keeping the old reasoning visible matters because it shows:

  • how the old complexity actually grew, not just that it disappeared,
  • which exact causal chain the new implementation breaks,
  • why the changelog says “restores the linear guarantee” rather than merely “improves constants”.

Appendix E: worst-case input model

The constant C is not uniform — different input shapes stress different branches. Here are the three worst-case axes and the concrete inputs that maximize them.

1. Maximum branch density: escape-prefix spam

\z\z\z\z\z\z\z\z\z\z...

Every \ triggers the escape-prefix hit path. readEscapedSequence matches the escape char, then checks all 9 escapableTokens in length-descending order (default syntax: )$$, )%, )*, %%, **, (, ), |, \). Since z doesn't start any of them, all 9 miss. The \ is then treated as a plain character (not an escape). This is the input that maximizes per-character branch count: k ≈ 12–15 per position.

A valid escape like \(\(\(\(... matches ( after 5 misses (the five longer tokens )$$, )%, )*, %%, ** do not start with (), so the per-\ cost is lower but still significant.

In both cases there is no rescan — the pointer still advances by at least 1 per iteration, so total work remains linear.

2. Maximum stack depth: pure inline nesting

$$a($$a($$a($$a($$a($$a(x)$$)$$)$$)$$)$$)$$

Each $$a( pushes one frame onto the explicit stack. With default syntax, minimum tag overhead is 4 chars ($$ + 1 name char + (), so maximum stack depth ≤ n/4. Since 1.1.2, the parser uses an explicit array stack — not the call stack — so this is bounded only by heap memory. No stack overflow at any depth.

The 50-million-layer stress test in the Performance page demonstrates this empirically.
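The explicit-array-stack pattern can be sketched as below; Frame here is a stand-in, not the library's frame type. Depth is bounded by heap memory rather than the call stack, so deep nesting cannot overflow.

```typescript
// Sketch: one frame pushed per open tag onto a plain array, no recursion.
interface Frame { depth: number }

function maxDepthIterative(openCount: number): number {
  const stack: Frame[] = [{ depth: 0 }];  // explicit stack, root frame at bottom
  let max = 0;
  for (let k = 0; k < openCount; k++) {
    const top = stack[stack.length - 1];
    stack.push({ depth: top.depth + 1 }); // push one frame per open tag
    max = Math.max(max, stack.length - 1);
  }
  while (stack.length > 1) stack.pop();   // close everything at EOF
  return max;
}
```

A recursive parser would hit the engine's call-stack limit at a few thousand levels; the array stack has no such ceiling.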

3. Maximum form-detection overhead: root-level inline tags

$$a(x)$$$$a(x)$$$$a(x)$$$$a(x)$$...

Every tag sits at root level, so each one triggers getTagCloserType → findTagArgClose. The form-detection scan overlaps with characters later parsed by child frames, so those regions incur an additional constant-factor linear overhead. This does not change the Θ(n) result.

Combined worst case: three-axis mutual exclusivity

No single input can maximize all three axes simultaneously. Here is the formal argument.

Proposition. Let E be the number of characters consumed by escape sequences, T be the number of characters consumed by tag structures (tag head + arguments + end tags), and P be the number of plain-text characters (neither escaped nor tag structure) in an input of length n. Then E + T + P = n.

Axis 1 vs. Axis 2 mutual exclusivity. Each escape sequence \x occupies 2 characters and is not recognized as a tag head (the escape check precedes the tag-head check, structural.ts line 659). As E increases, the characters available for tag structures decrease: T ≤ n − E. Maximum stack depth ≤ T/4 (each tag head ≥ 4 characters), so:

stack_depth ≤ (n − E) / 4

Axis 2 vs. Axis 3 mutual exclusivity. Deep nesting requires tags to recursively nest as inline form. Inside inline frames (H1 path), getTagCloserType is not called (structural.ts lines 892–918), so no form-detection overhead is incurred. Only root/block-level frame tags trigger getTagCloserType. At nesting depth d, the number of root-level tags in a given region is ≤ n/(4d), reducing total form-detection scan accordingly.

Axis 1 vs. Axis 3 mutual exclusivity. Escape characters do not form tag heads, so they never trigger form detection. More escapes means fewer form-detection calls.

Conclusion. The three axes' costs form a convex combination bounded linearly:

C_total = c₁ · E + c₂ · T + c₃ · P ≤ max(c₁, c₂, c₃) · n = O(n)

where c₁, c₂, c₃ are the per-character costs of the escape branch, tag processing (including stack operations + form detection), and plain-text branch, respectively. All three are constants independent of n. The realized worst case is always a trade-off among them, and the total stays within C · n.


Appendix F: per-character branch budget (k)

The main loop at each character position i executes a fixed cascade of checks. Here is the exact sequence, counted from source (structural.ts line 624+):

| Step | Check | Calls | When |
| --- | --- | --- | --- |
| 0 | Fast text skip: findNextBoundaryChar → charCodeAt comparisons (up to 5 boundary lead codes) | ≤ 5 | Always (since 1.4.3); also checks isTagStartChar when shorthand enabled |
| 1 | Frame end: frame.i >= frame.textEnd | 1 cmp | Always |
| 2 | Escape: readEscapedSequence → startsWith(escapeChar) | 1 | Always |
| 2b | Escapable scan (only if escape char matches) | ≤ B | B = lead-char bucket size (since 1.4.3: tokens bucketed by first char, no longer scanning all E) |
| 3 | Inline argClose: inlineCloseToken check + startsWith(tagClose) | 1–2 | Inline frame only |
| 3b | Suffix: startsWith(endTag), startsWith(rawOpen), startsWith(blockOpen) | ≤ 3 | tagClose hit in full DSL frame |
| 4 | Unexpected endTag: startsWith(endTag) | 1 | Non-inline frame only |
| 5 | Separator: startsWith(tagDivider) | 1 | insideArgs only |
| 6 | Tag head: readTagStartInfo → startsWith(tagPrefix) + char class checks | 1+ | Always (if no earlier match) |
| 6b | Inline shorthand: readInlineShorthandStart → tag name + ( check | 1 | Inline frame, no tag-head match |
| 7 | (removed in 1.3: bare ( parenDepth tracking no longer needed) | — | — |

1.4.3 changes. Step 0's fast text skip bulk-skips consecutive non-boundary characters, so plain-text spans no longer walk steps 1–6b character by character. Step 2b's escapable scan changed from "iterate all E tokens" to "bucket by leading character and only test the matching bucket" — sequences like \z whose first character does not match any token's leading character now return immediately instead of scanning all 9 tokens.
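The lead-character bucketing can be sketched like this, using the default escapable-token list quoted earlier (helper names are illustrative, not the library's API):

```typescript
// Escapable tokens from the default syntax, longest-first so that within a
// bucket the longest candidate (e.g. ")$$") is tested before ")".
const escapableTokens = [")$$", ")%", ")*", "%%", "**", "(", ")", "|", "\\"];

// Built once: group tokens by their first character.
const buckets = new Map<string, string[]>();
for (const tok of escapableTokens) {
  const b = buckets.get(tok[0]);
  if (b) b.push(tok); else buckets.set(tok[0], [tok]);
}

// Returns the matched escapable token at position i, or null.
function matchEscapable(text: string, i: number): string | null {
  const bucket = buckets.get(text[i]);
  if (!bucket) return null;           // e.g. "z": immediate miss, no token scan
  for (const tok of bucket) {         // only same-lead tokens are tested
    if (text.startsWith(tok, i)) return tok;
  }
  return null;
}
```

This is why the escape-heavy row drops from k ≈ 12–15 to 5–7: a \z now costs one map lookup instead of nine startsWith probes.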

Totals by input class (default syntax):

| Input class | k before 1.4.3 | k in 1.4.3 (with fast skip) | Path taken |
| --- | --- | --- | --- |
| Plain text (no syntax chars) | 3–4 | ≤ 5 (boundary scan only, bulk skip) | fast skip jumps entire span → next iteration hits boundary |
| Normal rich text | 4–6 | 4–6 | boundary hit → normal branch cascade |
| Dense inline | 5–8 | 5–8 | frequent tagClose / tagPrefix hits, fast skip has limited benefit |
| Dense inline + shorthand | 6–9 | 6–9 | tagClose miss → tagPrefix miss → shorthand check → appendBuf or push |
| Escape-prefix heavy | 12–15 | 5–7 | lead-char bucketing: \z bucket miss returns immediately, no longer scans all tokens |

Fast skip in 1.4.3 mainly benefits the plain-text and escape-heavy extremes; tag-dense input sees roughly unchanged k. For the dominant real-world case (normal rich text), k ≈ 4–6.


Appendix G: definition of n and benchmark sizes

n = text.length          // UTF-16 code units

For ASCII DSL fixtures, bytes ≈ code units ≈ characters. The benchmark inputs:

| Benchmark group | n |
| --- | --- |
| Full-document parse | 204,803 |
| Position tracking | 204,840 |
| Lightweight utilities | 200,067 |

For deep-nesting fixtures the test parameter is nesting depth d, but each layer adds a constant number of source characters, so n = c · d + O(1) and Θ(n) = Θ(d).
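Both definitions can be checked concretely. The fixture generator below is illustrative, built from the default syntax used throughout this page; note that .length counts UTF-16 code units, so astral-plane characters count twice.

```typescript
// Each layer adds a constant 7 code units ("$$a(" is 4, ")$$" is 3),
// so n = 7·d + 1 and Θ(n) = Θ(d) for these fixtures.
function deepInlineFixture(d: number): string {
  return "$$a(".repeat(d) + "x" + ")$$".repeat(d);
}
```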


Appendix H: historical empirical constants (v1.1.9 baseline)

This section is kept as a historical baseline from the v1.1.9 era. It is useful for understanding constant-factor scale under a fixed methodology, but it should not be read as the current 1.4.3 release number set.

For current release-era numbers, use the Performance page. The table below stays here only as an older same-methodology reference point.

Environment: Kunpeng 920 / aarch64 / Node v24.14.0. The numbers below all come from the same benchmark script (.tmp/bench-pos-single.mjs) under identical methodology:

  • Input: fixed fixture, n = 204,803 bytes (~200 KB)
  • Each cell = 20 fresh cold-boot node processes, run strictly sequentially with no concurrency, so JIT / heap state never leaks across trials
  • Inside each process: 100 warmups (so V8 fully tiers up the parser; anything less leaves the measurement bimodal, because the optimization threshold may or may not have fired by call 31), then 15 measurement runs; the per-process median is that process's representative sample
  • The reported ms = arithmetic mean of the 20 per-process medians; ns/cu = ms × 10⁶ / n
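The aggregation can be sketched as follows (illustrative helpers, not the benchmark script itself): per-process median of the measurement runs, then the arithmetic mean of the per-process medians.

```typescript
// Median of one process's measurement runs (sorted copy, middle element).
function median(xs: number[]): number {
  const s = [...xs].sort((a, b) => a - b);
  const m = s.length >> 1;
  return s.length % 2 ? s[m] : (s[m - 1] + s[m]) / 2;
}

// Reported ms = mean of the per-process medians.
function reportMs(perProcessRunsMs: number[][]): number {
  const medians = perProcessRunsMs.map(median);
  return medians.reduce((a, b) => a + b, 0) / medians.length;
}

// ns/cu = ms × 10⁶ / n (n in UTF-16 code units).
function nsPerCodeUnit(ms: number, n: number): number {
  return (ms * 1e6) / n;
}
```

The median-of-runs step discards per-process outliers (GC pauses, tiering blips); the mean across cold processes then averages out process-level variance.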

Full parse + position-tracking overhead (~200 KB, n = 204,803)

| API | tracking off | tracking on | overhead |
| --- | --- | --- | --- |
| parseRichText | 25.58 ms (124.9 ns/cu) | 34.06 ms (166.3 ns/cu) | +33.2% |
| parseStructural | 17.75 ms (86.7 ns/cu) | 22.61 ms (110.4 ns/cu) | +27.4% |

These are lower than the older "30.6 / 23.3 ms" wiki numbers for two reasons. First, the v1.1.9 baseline already included several rounds of render / scanner constant-factor wins on top of the version those older numbers came from. Second, this methodology (100 warmups + 15-sample median × 20 cold processes) is more stable than the old "3 warmups + 20 in-process samples for all 4 cases in one process", where in-process JIT state penalised whichever case ran first. To reproduce, run node .tmp/bench-pos-single.mjs <target> directly.

Lightweight utilities (~200 KB, n = 200,067)

Data represents the average of 5 independent process runs, each containing 50 internal samples.

| Operation | Time | Per unit |
| --- | --- | --- |
| printStructural | ~2.00 ms | ~10.0 ns/char — Θ(total node text size) |
| buildZones | ~0.74 ms | Θ(top-level nodes); 1,897 zones from 6,321 nodes |
| walkTokens | ~0.50 ms | Θ(total tokens); 9,165 visits |

buildZones iterates only the top-level StructuralNode[] array (zones.ts), so its cost scales with the number of top-level nodes, not with source length directly. The per-char figure (~3.7 ns/char) is a derived number from this specific benchmark and should not be extrapolated to inputs with different node-to-character ratios.

These constants are machine-specific and benchmark-specific; they are not comparable across groups.

Stack safety notes for utility functions

The two parser entry points (parseStructural, parseRichText) and materializeTextTokens are fully stack-iterative (explicit stack + while loop). All utility functions are now also stack-iterative:

| Function | Recursion style | Stack-safe? |
| --- | --- | --- |
| printStructural | Iterative (explicit stack + while loop) | Yes |
| walkTokens | Iterative (explicit stack + while loop) | Yes |
| mapTokens | Iterative (explicit stack + while loop) | Yes |

Because every traversal is iterative, depth is bounded by heap memory rather than the call stack: typical document depths (hundreds of levels) are trivially safe, and even pathological inputs nested tens of thousands of levels deep will not overflow.


Appendix I: what controls the constant factor

The complexity is always Θ(n), but the wall-clock time for a given n depends on how much work each source character causes. Four input-shape dimensions dominate:

1. Tag density

How often the scanner hits a tag prefix and enters open/close/gating logic.

| Input profile | Effective multiplier |
| --- | --- |
| Plain text / low DSL | ~1× |
| Normal rich text | ~1.5–3× |
| Dense inline tags | ~3–6× |
| Dense inline + deep nest | ~4–8× |

2. Output node density

More nodes per character → more object allocation, more children array work, more token materialization on the parseRichText path.

The flushBuffer path in the structural scanner uses a segment-pair optimization: for the common case of 1–2 text segments it concatenates directly without allocating a temporary array (structural.ts lines 312–315). This keeps the per-flush constant low for typical inputs.
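The segment-pair fast path can be sketched like this (illustrative, not the actual flushBuffer): the 1–2 segment cases concatenate directly, and only 3+ segments pay for a join.

```typescript
// Sketch of the segment-pair optimization: avoid array-join machinery for the
// overwhelmingly common case of one or two buffered text segments.
function flushSegments(segments: string[]): string {
  if (segments.length === 0) return "";
  if (segments.length === 1) return segments[0];               // no allocation
  if (segments.length === 2) return segments[0] + segments[1]; // direct concat
  return segments.join("");                                    // general case
}
```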

3. Tree shape: wide vs. deep

Dense siblings and deep chains have similar n but different cache behavior and different children attachment patterns. The Performance memory profile already shows this: 200 KB dense inline and 20k nested inline have different heap/RSS profiles at similar input scales.

4. API path

parseRichText = structural scan + render materialization, so its constant is always higher than parseStructural on the same input.


Appendix J: a more useful cost model

For engineering estimation, the total cost decomposes as:

T ≈ α · n_chars + β · n_tags + γ · n_nodes + δ · depth_max

For parseRichText, add + ε · n_tokens.

All these derived quantities are upper-bounded by n_chars, so the asymptotic class stays Θ(n). But in practice these are the knobs that determine fast vs. slow for a given source length.

| Derived metric | What it captures |
| --- | --- |
| n_chars | Baseline scan cost |
| n_tags | Tag-prefix hits, open/close/gating branches |
| n_nodes | Object allocation, children arrays |
| depth_max | Explicit stack frame depth, cache locality |
| n_tokens | Render-layer materialization (rich text only) |
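A back-of-envelope use of this model, with made-up placeholder coefficients rather than measured yumeDSL constants: every derived term is bounded by n_chars, so the estimate is always O(n).

```typescript
// T ≈ α·n_chars + β·n_tags + γ·n_nodes + δ·depth_max (+ ε·n_tokens for rich text).
interface CostInputs {
  nChars: number; nTags: number; nNodes: number; depthMax: number; nTokens: number;
}

function estimateCost(
  c: CostInputs,
  alpha = 1, beta = 3, gamma = 2, delta = 1, epsilon = 2 // placeholder constants
): number {
  return alpha * c.nChars + beta * c.nTags + gamma * c.nNodes
       + delta * c.depthMax + epsilon * c.nTokens;
}
```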

Appendix K: legacy summary

n is always source length; what varies across inputs is the per-character work density — driven by tag density, node density, nesting depth, and the render path.


Appendix L: incremental parsing complexity (outside the full-parse Θ(n) theorem)

Sections 1.1–1.2.3 analyse the initial parse, which is always Θ(n). Incremental parsing targets a fundamentally different cost model: after a document has been parsed once, subsequent edits should cost proportional to the changed region, not the full document.

Main theorem, proof boundary, and exclusions

The proof object in this section is the call-time cost of the update APIs. It is not “whatever lazy work may happen later is retroactively charged to the same call.”

Theorem U (single incremental update-path call)

T_update-call(n) = O(n)

Here T_update-call(n) means the worst-case time of one incremental update-path call itself (core path: updateIncrementalInternal(...)). It includes:

  1. assertValidEdit(...)
  2. fingerprint comparison
  3. findDirtyRange(...)
  4. reparseDirtyWindowUntilStable(...)
  5. isSafeRightReuse(...)
  6. deferShiftZone(...)
  7. document assembly plus installLazyDocument(...)
  8. cloneParseOptions(...) when the incremental path is actually committed

It explicitly excludes two later-stage costs:

  1. lazy materialization triggered by a subsequent first read of doc.tree / doc.zones;
  2. incremental diff added by session.applyEditWithDiff(...) after the update completes.

Theorem U′ (session call)

T_session-applyEdit-call(n) = O(n)

because createIncrementalSession(...).applyEdit(...) only wraps updateIncrementalInternal(...) with constant-cost strategy gating, sampling, and fallback orchestration.

Proof spine

This section uses the same structure as the main theorem body: cost decomposition first, support lemmas second, closure last. The spine is not the pipeline diagram by itself, but the additive formula:

T_edit-call
  = T_validate
  + T_fingerprint
  + T_dirtyFind
  + T_reparse
  + T_probe
  + T_shift
  + T_assemble
  + T_clone

The subsections below discharge these terms one by one. The final closure then uses Δ ≤ n, Z ≤ n, Z_right ≤ Z ≤ n, and the boundary assumption that H does not grow with document length n, to derive the worst-case O(n) bound.

The linear guarantee in this section covers the update path itself (the path used by session.applyEdit(...)).

applyEditWithDiff(...) is not a standalone exported function; it is a method on the object returned by createIncrementalSession(...). Starting in 1.3.8, it adds an extra structural-diff phase after the update completes; that diff stage has its own complexity model and should not be conflated with the parser/updater Θ(n) / O(n) guarantee.

The implementation core is updateIncrementalInternal(...). Public update calls reach it through createIncrementalSession(...), whose returned session object layers adaptive strategy gating on top.

Variable definitions used throughout this section:

| Symbol | Definition |
| --- | --- |
| n | newSource.length — full document length after the edit |
| Δ | edit.newText.length — inserted text length |
| d_old | edit.oldEndOffset - edit.startOffset — deleted span in previous source |
| Z | number of zones in the previous snapshot (prevZones.length) |
| z_dirty | number of zones in the dirty range after expansion |
| H | total handler data size across parseOptions.handlers |
| W_dirty | character width of the dirty window (the reparsed slice) |
| Z_right | number of right-side reused zones |
| W_probe | character width of the seam-probe reparse window |

Update pipeline decomposition

```mermaid
flowchart LR
    subgraph INCR["updateIncrementalInternal pipeline"]
        direction LR
        S1["① assertValidEdit<br/>O(Δ)"]
        S2["② fingerprint check<br/>O(H) / O(1)"]
        S3["③ zone guards<br/>O(1)"]
        S4["④ findDirtyRange<br/>O(Z)"]
        S5["⑤ reparseDirtyWindow<br/>≤ O(n)"]
        S6["⑥ seam probe<br/>O(W_probe)"]
        S7["⑦ deferShiftZone<br/>O(Z_right)"]
        S8["⑧ assemble<br/>O(Z)"]
        S1 --> S2 --> S3 --> S4 --> S5 --> S6 --> S7 --> S8
    end

    S3 -. " empty zones / low zone / tail gap " .-> FB["fullRebuild Θ(n)"]
    S2 -. " fingerprint mismatch " .-> FB
    S5 -. " budget exceeded " .-> FB
    S6 -. " signature mismatch " .-> FB
```

updateIncrementalInternal(...) executes a fixed sequence of pipeline stages. Each stage is analysed independently; no stage feeds back into an earlier one.

1. Edit validation — assertValidEdit (incremental.ts:941–965)

Three checks:

  • Range bounds: O(1) comparisons.
  • Expected-length: one subtraction + one addition → O(1).
  • Text match: newSource.slice(edit.startOffset, edit.startOffset + edit.newText.length) produces a string of length Δ, then a === comparison of that string against edit.newText → O(Δ).

T_validate(Δ) = O(Δ)
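The three checks can be sketched as follows. This is a simplified stand-in for assertValidEdit with an assumed edit shape, not the library source; only the slice in step 3 does non-constant work.

```typescript
// Sketch of the three validation checks described above (hypothetical shapes).
interface TextEdit {
  startOffset: number;
  oldEndOffset: number;
  newText: string;
}

function sketchAssertValidEdit(edit: TextEdit, oldLength: number, newSource: string): void {
  // 1. Range bounds — O(1) comparisons.
  if (edit.startOffset < 0 || edit.oldEndOffset < edit.startOffset || edit.oldEndOffset > oldLength) {
    throw new Error("edit range out of bounds");
  }
  // 2. Expected-length — one subtraction plus one addition → O(1).
  const expected = oldLength - (edit.oldEndOffset - edit.startOffset) + edit.newText.length;
  if (newSource.length !== expected) throw new Error("length mismatch");
  // 3. Text match — slice of length Δ plus one string comparison → O(Δ).
  const inserted = newSource.slice(edit.startOffset, edit.startOffset + edit.newText.length);
  if (inserted !== edit.newText) throw new Error("newText mismatch");
}
```

Step 2 is what lets step 3 compare a fixed-length slice: once lengths agree, only the Δ inserted characters need checking.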

2. Fingerprint check — buildParseOptionsFingerprint (incremental.ts:346–369)

Hashes the syntax fields (8 keys → O(1)), tagName (2 function refs → O(1)), and allowForms (array of length k → O(k)). The handlers fingerprint is folded in via a single u32 feed per handler entry.

T_fingerprint(H) = O(H)    where H = handler count + |allowForms| + |syntax keys|

For a repeated call on the same doc.parseOptions object the result is cached via getCachedOptionsFingerprint (WeakMap lookup → O(1)). In the session path, options rarely change between edits, so the common case is O(1).
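The WeakMap caching pattern looks roughly like this. The fingerprint body here is a stand-in; the real buildParseOptionsFingerprint also folds in syntax fields, tagName, and per-handler u32 feeds.

```typescript
// Sketch of identity-keyed fingerprint caching (illustrative, not library code).
const fingerprintCache = new WeakMap<object, number>();

// Minimal FNV-1a hash over a string — O(length of hashed content).
function fnv1a(text: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h = Math.imul((h ^ text.charCodeAt(i)) >>> 0, 0x01000193) >>> 0;
  }
  return h;
}

function getCachedFingerprint(options: { allowForms?: string[] }): number {
  const hit = fingerprintCache.get(options);
  if (hit !== undefined) return hit; // repeated call on the same object → O(1)
  const fp = fnv1a((options.allowForms ?? []).join("\u0000")); // first call → O(H)
  fingerprintCache.set(options, fp);
  return fp;
}
```

Because the session reuses the same options object across edits, the O(H) hash is paid once and every subsequent edit hits the O(1) WeakMap path.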

3. Dirty range finding — findDirtyRange (incremental.ts:840–872)

Single-pass linear scan of prevZones that simultaneously tracks the overlap range and the insertion index (for pure-insertion edits with no overlap). The scan touches each zone at most once → O(Z).

After overlap detection the range is expanded by LEFT_LOOKBEHIND_ZONES = 1 (incremental.ts:431) on the left. The right boundary is the last overlapping zone.

T_dirtyFind(Z) = O(Z)
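A minimal sketch of the single-pass overlap scan, assuming simplified zone shapes. The real findDirtyRange additionally tracks an insertion index for zero-overlap edits; here the no-overlap case just returns null.

```typescript
// Single-pass scan over zone spans in the spirit of findDirtyRange (sketch only).
interface ZoneSpan {
  startOffset: number;
  endOffset: number;
}

function sketchFindDirtyRange(zones: ZoneSpan[], editStart: number, editEnd: number) {
  let first = -1;
  let last = -1;
  for (let i = 0; i < zones.length; i++) { // touches each zone once → O(Z)
    const z = zones[i];
    if (z.endOffset > editStart && z.startOffset < editEnd) {
      if (first === -1) first = i;
      last = i;
    }
  }
  if (first === -1) return null; // pure-insertion / no-overlap case (simplified)
  const LEFT_LOOKBEHIND_ZONES = 1; // left expansion constant from the text
  return { from: Math.max(0, first - LEFT_LOOKBEHIND_ZONES), to: last };
}
```

Note how the left boundary is expanded by one zone while the right boundary stays at the last overlapping zone, matching the description above.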

4. Dirty window reparse — reparseDirtyWindowUntilStable (incremental.ts:877–935)

This is the critical path. The function parses the slice newSource[dirtyStartNew..dirtyEndNew] and checks whether the right boundary of the produced tree aligns with the old zone boundary. If not, it expands dirtyTo by one zone and re-parses the (now wider) window.

Each iteration:

  • computes window boundaries → O(1)
  • calls parseWithPositions on a window of width W_i → O(W_i)
  • calls buildZones on the resulting tree → O(nodes in W_i)
  • checks boundary stability → O(1)

In the worst case, expansion runs k times with successive window widths W_1, W_2, …, W_k. Each W_i ≥ W_{i−1} because the window only grows. The cumulative reparsed byte count is therefore Σ_{i=1}^{k} W_i — a potential O(k · W_k) quadratic term.

But: the cumulative budget guard enforces a fixed internal limit:

cumulativeBudget = Math.max(newSource.length * 2, 1024)

This is an implementation constant, not a public tuning knob. The loop checks nextCumulativeReparsedBytes > cumulativeBudget. Once exceeded, the function returns budgetExceeded = true immediately, and updateIncrementalInternal falls back to fullRebuild() — which is a fresh parseIncremental(newSource), itself O(n) (proven in §1.2).

Therefore the total reparse work across all expansion iterations is bounded by min(2n, Σ W_i). If the budget is reached, the subsequent full rebuild is O(n). In either path:

T_reparse(n) ≤ O(n)

5. Seam probe — isSafeRightReuse (incremental.ts:613–660)

Validates that the zones to the right of the dirty window can be reused without re-parsing. Detailed analysis follows in the next section.

6. Lazy delta shifting — deferShiftZone per right zone (incremental.ts:1145)

Each surviving right zone is wrapped with a deferred offset delta. Per-zone cost is O(1) (see §Lazy delta shifting cost model below).

T_shift(Z_right) = O(Z − z_dirty)

7. Document assembly (incremental.ts:1145–1160)

const nextRawZones = [...leftZones, ...dirtyZones, ...rightZones];

Three array spreads. Total elements = Z_left + z_dirty_new + Z_right, where Z_left + Z_right ≤ Z because the reused zones are a subset of the previous snapshot. The z_dirty_new freshly built zones are already paid for by the reparse stage (each spans at least one reparsed character), so the residual assembly cost is linear in the previous zone count:

T_assemble(Z) = O(Z)

Amortized reparse budget

This section proves the key invariant that prevents quadratic blowup in reparseDirtyWindowUntilStable.

Claim. The total number of source bytes reparsed across all expansion iterations of a single updateIncrementalInternal call is at most 2n, where n = newSource.length.

Proof.

  1. The cumulative byte counter cumulativeReparsedBytes is initialised to 0 at the start of updateIncrementalInternal(...).

  2. Each iteration of reparseDirtyWindowUntilStable(...) adds the current window width reparsedWindowSize to the counter.

  3. Before parsing, the guard checks:

    if (nextCumulativeReparsedBytes > cumulativeBudget) return { budgetExceeded: true, … }
    

    where cumulativeBudget = Math.max(newSource.length * 2, 1024).

  4. If the guard fires, the caller invokes fullRebuild(), which is a single full-document parse — O(n) by the analysis of §1.2.

  5. If the guard never fires, the sum of all window widths is ≤ 2n. Each window is parsed in O(W_i) time (linear-time parser, §1.2), so the total parse work is O(Σ W_i) ≤ O(2n) = O(n).

  6. Therefore, regardless of whether the budget is exhausted:

T_reparse(n) ≤ O(n)

Worst-case trigger. The budget fires when every zone boundary in the dirty window is "just unstable" — the right-boundary check at lines 791–793 fails each time, forcing one more zone into the window. Even in this pathological case, the budget caps total work at 2n before falling back to a fresh O(n) parse. The combined cost is O(n) + O(n) = O(n). ∎
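The budget guard reduces to the following skeleton. Window widths are passed in synthetically here; the real loop derives them from zone expansion and reparses newSource slices.

```typescript
// Skeleton of the cumulative-budget pattern from the proof: expansion-iteration
// window widths are summed and capped at max(2n, 1024). Sketch only.
function runWithReparseBudget(windowWidths: number[], sourceLength: number) {
  const cumulativeBudget = Math.max(sourceLength * 2, 1024);
  let cumulativeReparsedBytes = 0;
  for (const width of windowWidths) {
    const next = cumulativeReparsedBytes + width;
    if (next > cumulativeBudget) {
      // Caller falls back to a single O(n) full rebuild.
      return { budgetExceeded: true, reparsedBytes: cumulativeReparsedBytes };
    }
    cumulativeReparsedBytes = next; // "parse" the window: O(width)
  }
  return { budgetExceeded: false, reparsedBytes: cumulativeReparsedBytes };
}
```

Either branch is linear: if the guard never fires, Σ W_i ≤ 2n of parse work was done; if it fires, at most 2n was done before a single O(n) rebuild.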

Seam probe bounded cost

isSafeRightReuse(...) re-parses a small window around the seam (the boundary between dirty zones and reused right zones) and compares the result against the old zones via structural signatures.

Probe window width.

probeZoneCount      = min(RIGHT_REUSE_PROBE_ZONES, oldRightZones.length)
                    = min(2, oldRightZones.length)

probeWindowZones    = min(probeZoneCount + RIGHT_REUSE_PROBE_EXTRA_ZONES,
                          oldRightZones.length)
                    = min(probeZoneCount + 1, oldRightZones.length)

The constants are currently hard-coded as RIGHT_REUSE_PROBE_ZONES = 2 and RIGHT_REUSE_PROBE_EXTRA_ZONES = 1, so the probe covers at most 3 old zones mapped to new-source offsets. Let W_probe be the character width of those 3 zones in the new source. The probe calls parseWithPositions(newSource.slice(…), …) on a slice of length W_probe → O(W_probe) parse cost.

Signature comparison cost.

After parsing, the probe compares at most probeZoneCount (≤ 2) zones by structural signature. The signature budget is:

RIGHT_REUSE_PROBE_SIGNATURE_NODE_BUDGET = 4096        // line 332

Initialised at line 510:

const signatureBudget: SignatureBudget = {remaining: 4096};

nodeSignature (incremental.ts:500–560) consumes 1 budget unit per node (line 410). If the budget reaches 0, signature computation returns undefined and the probe bails — forcing a full rebuild.

Per node, nodeSignature calls fnvFeedStringBounded (hash.ts:36–49) which touches at most BOUNDED_HEAD + BOUNDED_TAIL = 64 characters regardless of content length (hash.ts:34–35). The remaining hash operations (fnvFeedU32, fnvInit) are O(1).

T_signature_per_node = O(1)           (bounded by 64-char window)
T_signature_total   ≤ O(4096) = O(1)  (constant budget)

Old-zone signatures are typically cached (WeakMap zoneSignatureCache, incremental.ts:448) from the initial parseIncremental call (line 982), so zoneSignature(expected) at line 521 is O(1).
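The head+tail bounding can be illustrated as follows. The 32/32 split and the hashing details are assumptions made for this sketch; the text above only guarantees BOUNDED_HEAD + BOUNDED_TAIL = 64.

```typescript
// Sketch of a bounded string hash feed in the style of fnvFeedStringBounded:
// at most BOUNDED_HEAD + BOUNDED_TAIL characters are touched, so per-node
// signature cost is O(1) regardless of text length. Split is an assumption.
const BOUNDED_HEAD = 32;
const BOUNDED_TAIL = 32;

function fnvFeedStringBoundedSketch(hash: number, text: string): number {
  const feed = (h: number, s: string): number => {
    for (let i = 0; i < s.length; i++) {
      h = Math.imul((h ^ s.charCodeAt(i)) >>> 0, 0x01000193) >>> 0;
    }
    return h;
  };
  if (text.length <= BOUNDED_HEAD + BOUNDED_TAIL) return feed(hash, text);
  // Long strings: hash head, tail, and the (cheap) length instead of the body.
  let h = feed(hash, text.slice(0, BOUNDED_HEAD));
  h = feed(h, text.slice(-BOUNDED_TAIL));
  return feed(h, String(text.length));
}
```

The trade-off is deliberate: two long texts differing only in their untouched middle hash identically, which is acceptable for a probe signature that falls back to full rebuild on any doubt.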

Probe reparse cost.

W_probe spans at most 3 zone widths. In the typical case, zones are much smaller than n, so W_probe ≪ n. However, in the degenerate case of very few, very large zones, W_probe can approach n.

This is acceptable because the probe's reparsed bytes are not added to the cumulativeBudget counter of reparseDirtyWindowUntilStable (they are tracked separately as probeSliceBytes); even in the worst case, the total is O(n) for the probe reparse plus O(n) for the full-rebuild fallback at line 989, which is still O(n):

T_probe(n)  = O(W_probe) + O(1)
            ≤ O(n) + O(1)         (worst case)
            = O(n)

Snapshot clone cost

cloneParseOptions (incremental.ts:258–284) deep-copies the parse options only after all early full-rebuild guards have passed (1.2.5+). If any guard triggers a full rebuild, no clone occurs — parseIncrementalInternal clones internally.

  • Top-level spread: O(|fields|) = O(1) (fixed set of known keys).
  • cloneHandlersSnapshot (incremental.ts:239–251): iterates handler keys and recursively deep-clones each handler's data property. Cost: O(H) where H = Σ (handler key count + nested data size).
  • allowForms spread: O(|allowForms|), folded into H.
  • frozenSnapshots WeakSet guard (line 179): if the options object has already been cloned and frozen, the function returns immediately → O(1).

In the session path, if options is undefined (the common case — options don't change between edits), the old snapshot is reused directly — no clone at all.

T_clone(H) = O(H)     per edit, where H is independent of n

Lazy delta shifting cost model

When right zones survive reuse, their source offsets must be shifted by delta = newSource.length − doc.source.length. Eagerly traversing every node in every right zone would cost O(total_nodes_in_right_zones) — potentially O(n). The implementation defers this work.

deferShiftZone (incremental.ts:768–782):

  1. pendingShiftDeltas.get(zone) — WeakMap lookup → O(1).
  2. Creates a new Zone object with shifted startOffset/endOffset but the same nodes array reference → O(1).
  3. pendingShiftDeltas.set(newZone, totalDelta) → O(1).
  4. Copies signature cache entry if present → O(1).
T_deferShift_per_zone = O(1)
T_deferShift_total    = O(Z_right) = O(Z − z_dirty)

materializeZone (incremental.ts:786–797):

Triggered lazily when a consumer first accesses doc.zones (line 682). Calls shiftNode on every node in the zone. shiftNode uses an iterative explicit stack, visiting every node in the subtree once:

T_materialize_per_zone = O(nodes_in_zone)
T_materialize_total    = O(total_nodes) = O(n)

Crucially, materialization happens on first consumer access, not during the edit. If the consumer never reads doc.zones (e.g., only reads doc.source), materialization cost is 0.

If the consumer reads after k successive edits, the accumulated deltas are composed: each deferShiftZone adds its delta to the existing pending delta (line 635–649). Materialization still traverses each node exactly once, applying the composed delta in a single pass:

T_materialize_after_k_edits = O(n)      (same as after 1 edit)

Aggregate amortized analysis. Consider an editing session with k consecutive applyEdit calls, after which the consumer reads doc.tree, triggering a single materialization.

  • Each applyEdit costs ≤ O(W_dirty_i + Z + H), where W_dirty_i is the dirty window width for the i-th edit. The materialization flag is set (dirty = true), but no materialization is performed.
  • The final materialization traverses all nodes, executing shiftNode's explicit-stack iteration: each node is visited once. Total cost = O(node_count) = O(n).

Therefore, the total cost for k edits + 1 materialization:

Σᵢ T_edit(i) + T_materialize = Σᵢ O(W_dirty_i + Z + H) + O(n)

If each edit is local (W_dirty_i ≪ n), edit total ≤ k · O(W_dirty_max + Z + H), and the O(n) materialization cost is paid only once. Amortizing materialization across k edits:

amortized_per_edit = O(W_dirty + Z + H) + O(n/k)

When k ≥ n / W_dirty, the amortized materialization cost is ≤ O(W_dirty), matching the edit cost. This makes the payoff of lazy materialization precise: frequent edits combined with deferred reads yield optimal amortization.

Closed-form bound

Combining all pipeline stages:

T_edit(n, Δ, Z, H)
    ≤ T_validate(Δ)  + T_fingerprint(H)  + T_dirtyFind(Z)
    + T_reparse(n)   + T_probe(n)         + T_shift(Z)
    + T_assemble(Z)  + T_clone(H)

    ≤ O(Δ) + O(H) + O(Z) + O(n) + O(n) + O(Z) + O(Z) + O(H)

    = O(n + Δ + Z + H)

Since Δ ≤ n (inserted text cannot exceed new source length) and Z ≤ n (each zone spans ≥ 1 character) and H is independent of n:

T_edit(n) ≤ O(n)     (worst case, dominated by reparse budget or full-rebuild fallback)

For the typical case where the edit is localized (small Δ, small dirty window, most zones reused):

T_edit_typical ≤ O(W_dirty + Z_right + H)

where W_dirty is the dirty window width (proportional to edit locality, not to n) and Z_right is the number of right zones getting lazy-shifted (O(1) each). In an editor scenario with single-character keystrokes, W_dirty is typically a few hundred characters and Z_right is the number of zones after the cursor — each costing O(1) via deferred shifting.

Formal conditions for "typical local edit". The W_dirty ≪ n assumption does not hold for all edits. The following are sufficient conditions for sub-linear behavior:

  1. Bounded edit span: The replacement interval [replaceStart, replaceEnd) satisfies replaceEnd − replaceStart ≤ δ, where δ is a constant independent of n (e.g., δ = 0 for single-character insertion, δ ≤ line_width for single-line replacement).
  2. Edit does not cross zone boundaries: The replacement interval falls entirely within a single zone z_i. This ensures findDirtyRange only inspects z_i and its neighbors, giving W_dirty ≤ |z_i| + |z_{i±1}|.
  3. Edit does not break tag closure structure: If the edit inserts or deletes syntax delimiters ($$, (, ), etc.), it may trigger large-scale reparsing. "Typical" edits modify characters within text regions, not syntax boundary characters.
  4. Reparse budget not exhausted: cumulativeBudget = Math.max(newSource.length * 2, 1024) is not consumed by a single reparse. If the dirty window's reparsing exceeds this budget, it falls back to a full rebuild (O(n)).

Under these four conditions: W_dirty ≤ |z_max| + |z_max| = 2 · |z_max|, where |z_max| is indirectly bounded by softZoneNodeCap (default 64 nodes; invalid user values are normalized back to that default) — each zone contains at most 64 nodes, and each node's source span has a practical upper bound in typical documents. Therefore W_dirty = O(1) relative to n, giving T_edit_typical = O(Z + H) = O(Z).

Zone count vs. incremental overhead trade-off (1.2.4+)

The closed-form bound makes the Z dependency explicit: findDirtyRange, deferShiftZone, and assemble all cost O(Z). When the edit is highly localised (W_dirty ≪ n), Z becomes the dominant term. More zones ≠ faster.

Zone splitting changes the source of Z. Before 1.2.4, Z was entirely determined by the distribution of raw / block nodes ("hard zone breakers") in the document. 1.2.4 introduces softZoneNodeCap (default 64): non-breaker nodes are flushed into a new zone every time the accumulator reaches the cap. Invalid or non-positive user values are normalized back to the default, so this cap remains active. The resulting zone count is approximately:

Z ≈ N_breaker + ⌈N_nonbreaker / softNodeCap⌉

where N_breaker is the raw/block node count and N_nonbreaker is the text/inline/escape/separator node count. For a given document, smaller softNodeCap → more zones → narrower dirty windows (W_dirty ↓) but higher scaffold cost (Z ↑).
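A quick numeric check of the approximation (an illustrative helper, not library code):

```typescript
// Z ≈ N_breaker + ceil(N_nonbreaker / softNodeCap)
function approxZoneCount(nBreaker: number, nNonbreaker: number, softNodeCap: number): number {
  return nBreaker + Math.ceil(nNonbreaker / softNodeCap);
}

// A mostly-inline document with ~16 000 non-breaker nodes and the default
// cap of 64 lands in the low hundreds of zones; shrinking the cap inflates Z.
console.log(approxZoneCount(0, 16_000, 64)); // hundreds of zones
console.log(approxZoneCount(0, 16_000, 8));  // thousands of zones
```

At the default cap this lands in the same order of magnitude as the pure-inline sweet-spot measurement below, while an aggressively small cap pushes Z toward the scaffold-dominated regime.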

Empirical data (1 MB document, Kunpeng 920 aarch64, Node v24.14.0, head-of-document single-character replacement):

| Zone count | Source | Incremental median | Full parse | Speedup | Notes |
| --- | --- | --- | --- | --- | --- |
| 264 | Pure inline, softCap=64 | ~12 ms | ~127 ms | ~10× | Sweet spot |
| 1 571 | Sparse raw (every 50 lines) | ~12 ms | ~130 ms | ~10× | Sweet spot |
| 3 757 | Moderate raw (every 20 lines) | ~15 ms | ~130 ms | ~9× | Still good |
| 7 015 | Dense raw (every 10 lines) | ~19 ms | ~130 ms | ~7× | Scaffold rising |
| 17 871 | Ultra-dense raw (every 3 lines) | ~38 ms | ~136 ms | ~3.5× | Z dominates |

Head-of-document edits are worst-case direction (Z_right ≈ Z); tail edits are faster.

Sweet spot: a few hundred to a few thousand zones. In this range the dirty window is narrow enough while scaffold overhead remains negligible, yielding ~10× speedup. Above ~10 000 zones, the scaffold cost (primarily the single-pass scan in findDirtyRange and per-zone object creation in deferShiftZone) consumes most of the savings.

This is why SOFT_ZONE_NODE_CAP defaults to 64 rather than a smaller value — it keeps Z in the hundreds-to-thousands range for typical documents, landing squarely in the sweet spot.

Low-zone guard (1.2.4+)

When prevZones.length ≤ 1 (the entry guard in updateIncrementalInternal(...)), the incremental path has no reusable zone boundaries — the dirty window is the entire document. In this case the findDirtyRange + seam probe + assemble overhead is pure waste.

1.2.4 adds a guard at the entry of updateIncrementalInternal: if prevZones.length ≤ 1, it immediately calls fullRebuild(), skipping the entire incremental pipeline. This guarantees that single-zone documents (pure text, pure inline, no handlers, etc.) are never slower than a bare full parse.

Session adaptive strategy complexity

createIncrementalSession (src/incremental/incremental.ts) wraps tryUpdateIncrementalInternal with an adaptive strategy layer.

applyEdit:

| Step | Cost |
| --- | --- |
| Strategy check (strategy === "full-only") | O(1) |
| Edit ratio computation | O(1) |
| Ratio gate (editRatio > maxEditRatioForIncremental) | O(1) |
| Cooldown gate (preferFullMode && cooldownRemaining > 0) | O(1) |
| tryUpdateIncrementalInternal call | T_edit |
| Timing: now() before + after | O(1) |
| recordBounded(incrementalDurations, …) | O(1) |
| recordBounded(fallbackMarks, …) | O(1) |
| maybeAdaptPolicy() | see below |

maybeAdaptPolicy:

  • Early exit if strategy !== "auto" → O(1).
  • average(fallbackMarks) iterates at most sampleWindowSize elements → O(sampleWindowSize).
  • average(incrementalDurations) and average(fullDurations) → same bound.
  • sampleWindowSize is bounded by session options (default 24); the recordBounded helper enforces bucket.length ≤ sampleWindowSize by shifting excess entries.
T_adaptPolicy = O(sampleWindowSize) = O(24) = O(1)

Therefore the total session overhead per edit is O(1) — constant, independent of n:

T_session_edit = T_edit + O(1) = same asymptotic class as the raw update core path
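The bounded-sample pattern behind these O(1) bounds can be sketched directly. The function names follow the text; the bodies are illustrative, not the library source.

```typescript
// Sketch of bounded sample recording: the bucket never exceeds
// sampleWindowSize entries, so averaging is O(sampleWindowSize) = O(1).
function recordBounded(bucket: number[], value: number, sampleWindowSize: number): void {
  bucket.push(value);
  while (bucket.length > sampleWindowSize) bucket.shift(); // drop oldest samples
}

function average(bucket: number[]): number {
  if (bucket.length === 0) return 0;
  let sum = 0;
  for (const v of bucket) sum += v; // at most sampleWindowSize iterations
  return sum / bucket.length;
}
```

Because the window size is a session option (default 24) rather than a function of n, every policy-adaptation pass over these buckets is constant-time relative to document length.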

Incremental diff complexity (1.3.8+)

applyEditWithDiff(...) has two parts:

T_applyEditWithDiff = T_applyEdit + T_diff

T_applyEdit has already been proven worst-case O(n). The additional cost comes from T_diff.

From 1.4.0+, this diff path is also explicitly budgeted by default:

  • refinementDepthCap = 64
  • maxComparedNodes = 20_000
  • maxAnchorCandidates = 128
  • maxOps = 512
  • maxSubtreeNodes = 256
  • maxMilliseconds = 8

There is also a hard cutoff:

  • MAX_FULL_FALLBACK_DIFF_REFINEMENT_SOURCE_LENGTH = 20_000

If the session is already in full-fallback mode and either side exceeds that size, the implementation skips deep refinement and goes directly to conservative diff.

These budgets and cutoffs are there to bound degradation and allow coarser output. They do not turn session.applyEditWithDiff(...) into a linear-time guarantee.

1. Conservative diff path

When fine-grained refinement throws, or when the session has already taken the full-fallback route on a large document, the implementation switches to buildConservativeTokenDiff(...).

That path does only three meaningful things:

  1. reads previousDoc.tree.length / nextDoc.tree.length → O(1)
  2. builds the whole-tree patches / ops envelope → O(1)
  3. copies oldNodes / newNodes once with slice() → O(N_old + N_new)

Let:

N_old  = previousDoc.tree.length
N_new  = nextDoc.tree.length
N_root = N_old + N_new

Then:

T_diff_conservative = O(N_root)

So the conservative route remains linear.

2. Fine-grained diff path: computeTokenDiff(...)

The refined route is driven by diffNodeArrays(...) plus processNestedNodeArrayDiffs(...). It is not a single left-to-right pass: it may perform anchor discovery, partition a range, then continue refining the remaining subranges and nested containers.

Define:

M = total old/new nodes actually refined by this diff
P = total emitted structural diff ops

For any one container-range pair with combined length m, diffNodeArrays(...) performs:

  1. Prefix/suffix trimming — linear scan from both ends → O(m)
  2. Anchor search via findAnchors(...)
    • old/new signature-count maps → O(m)
    • candidate sorting → O(a log a), a ≤ m
    • LIS extraction → O(a log a)
    • so the anchor phase is O(m log m)
  3. If no anchors survive
    • equal-length branch: aligned comparison + optional nested-task enqueue → O(m)
    • non-equal-length branch: splice fallback + node copying → O(m)

Therefore one range satisfies:

R(m) ≤ O(m log m) + child-range refinement cost

3. Why this is not a linear guarantee

In the worst case, each anchor step isolates only a tiny stable island and leaves one large subrange of size m - 1 for the next iteration. That gives:

R(m) = R(m - 1) + O(m log m)

Unrolling the recurrence:

R(m) = O(Σ_{i=1..m} i log i) = O(m² log m)

So fine-grained structural diff is not Θ(n) in the worst case. Its goal is high-quality local change description, not the same strict linear bound the parser itself provides.

4. Depth cap and final sort

  • diffRefinementDepthCap only limits how far refinement continues into nested children / args.
  • Once the depth cap is exceeded, the algorithm degrades to a container-level splice.
  • This bounds deep-chain refinement cost, but it does not remove the worst-case O(m² log m) behaviour caused by repeated anchor partitioning on wide ranges.
  • After refinement, the final ops array is sorted once:
T_sort_ops = O(P log P)

5. Closed-form bound

So for applyEditWithDiff(...):

T_applyEditWithDiff
    = T_applyEdit + T_diff
    ≤ O(n) + min(
          O(N_root),                 // conservative diff
          O(M² log M + P log P)      // refined diff
      )

In the common local-edit case, M is usually limited to a small root slice plus a small number of nested containers, so real cost is far below this upper bound. But the strict linear-time guarantee belongs to the parser / updater, not to fine-grained diff refinement.

Comparison with full parse

| Operation | Worst case | Typical (local edit) | Notes |
| --- | --- | --- | --- |
| parseIncremental (initial) | Θ(n) | Θ(n) | Equivalent to parseStructural + buildZones |
| session.applyEdit | O(n) | O(W_dirty) | Plus O(1) strategy overhead |
| session.applyEditWithDiff | Typically near O(n + T_diff_local); under default budgets the worst case is forced back toward an O(n)-bounded update path via coarse degradation | O(W_dirty + T_diff_local) | Fine-grained diff is not strictly linear, but the default session now hard-stops once compare / anchor / op / subtree / wall-clock budgets are exceeded |
| Materializing doc.zones | O(n) | O(n) | Lazy delta applied in single pass |

In the common editor scenario (keystroke-level edits, W_dirty ≪ n):

Full parse per keystroke:    Θ(n) × k keystrokes = Θ(k·n)
Incremental per keystroke:   O(W_dirty) × k       = O(k·W_dirty)     (+ one O(n) materialization)

The default session diff path also carries hard refinement budgets: if the root pass exceeds budget it immediately falls back to a whole-tree conservative diff; if nested refinement exceeds budget it keeps the root patch shape but degrades to root-level splice ops. In other words, **the worst case no longer keeps paying unbounded cost just to chase a smaller patch**.
Those budgets are accumulated through an internal `DiffBudgetState`; exhaustion is not treated as an exception, but as an ordinary control-flow branch that immediately coarsens the diff.
In addition, when an edit stays on the incremental update path, root-level fine diff is now constrained to the source window of the incremental dirty zone before anchor / nested refinement begins. That keeps the normal-case cost closer to `O(W_dirty + T_diff_local)` instead of refining the entire root tree.
If you care which version introduced this as the default behavior, see [Version Semantics Notes](en-Version-Semantics).

The incremental path wins whenever k · W_dirty + n < k · n, which holds for any W_dirty < n and k ≥ 2.

Worst-case input models

Three orthogonal axes produce worst-case behaviour specific to the incremental path.

1. Maximum dirty window expansion: every zone boundary is just unstable

Construction. A document where every zone's last node is a partial tag that can absorb content from the next zone. Each expansion step adds one zone, and the right-boundary stability check fails every time.

Analysis. Expansion proceeds until either:

  • The boundary stabilises (loop terminates), or
  • cumulativeReparsedBytes > cumulativeBudget = max(2n, 1024).

Window widths form an increasing sequence W_1 < W_2 < … < W_k. Even without overlap, Σ W_i ≤ 2n before the budget fires. The subsequent fullRebuild() is O(n). Total cost:

T_worst_expand = O(2n) + O(n) = O(n)

The potentially quadratic Σ_{i=1}^{k} W_i term never materializes, because the budget prevents the sum from exceeding 2n.

2. Maximum seam probe cost: giant zones make probe window approach n

Construction. A document with Z = 3 zones of roughly equal width n/3 each. An edit touches zone 1; zones 2–3 are right candidates. The probe window spans min(3, 2) = 2 zones of width ≈ 2n/3.

1.2.4 note: The low-zone guard (prevZones.length ≤ 1 in updateIncrementalInternal(...)) diverts prevZones.length ≤ 1 to full rebuild before the probe is reached. Therefore the degenerate case of a single giant zone never hits the seam probe in practice. However, Z = 2–3 still reaches the probe with large windows.

Analysis. Probe reparse: O(2n/3) = O(n). Signature comparison: bounded by RIGHT_REUSE_PROBE_SIGNATURE_NODE_BUDGET = 4096 nodes, each costing O(1) via fnvFeedStringBounded (hash.ts:36–49, bounded to 64 characters). Signature cost = O(4096) = O(1).

T_worst_probe = O(n) + O(1) = O(n)

Combined with reparse budget: the total across dirty-window reparse + probe is at most O(n) + O(n) = O(n). If both hit their worst case simultaneously, the full-rebuild fallback caps the total at O(n).

3. Maximum fingerprint mismatch: every edit brings new options

Construction. The caller passes a different parseOptions object on every applyEdit call, causing the fingerprint check (incremental.ts:1052–1076) to detect a mismatch every time.

Analysis. Mismatch triggers fullRebuild() immediately — before any dirty range computation, reparse, or probe. Each full rebuild is O(n).

T_worst_fingerprint = O(n) per edit

This is the expected fallback behaviour, not a bug. The fingerprint check is O(H) — one hash computation — which is negligible compared to the O(n) rebuild it triggers. The session's maybeAdaptPolicy will detect the repeated fallbacks and switch strategy to "full-only" after minSamplesForAdaptation (default 6) samples, eliminating the fingerprint overhead entirely.
