Performance

This page collects the parser performance notes that used to be scattered across multiple wiki pages.


Daily Document Benchmark

This section is now organized as one continuous story instead of a separate "historical" block and a separate "follow-up" block. The tables are still split by what they answer, but they should be read together.

First, the measurement framing:

  • The 200 KB headline benchmark uses a mixed document shaped like a normal blog / CMS / editor document
  • The memory profile still uses dense inline and 20k nested inline cases
  • The expanded run methodology is one case per process, --expose-gc, forced GC before and after each measured round, and at most 20 measured rounds per case
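
A minimal sketch of that per-case loop, assuming Node is launched with --expose-gc; the yumedsl import path and the fixture name are stand-ins for illustration, not confirmed API:

    import { readFileSync } from "node:fs";
    import { parseStructural } from "yumedsl"; // hypothetical package name

    const source = readFileSync("bench/fixtures/mixed-200kb.txt", "utf8"); // stand-in fixture
    const samples: number[] = [];
    for (let round = 0; round < 20; round++) { // at most 20 measured rounds per case
      (globalThis as any).gc?.(); // forced GC before the measured round
      const t0 = performance.now();
      parseStructural(source);
      samples.push(performance.now() - t0);
      (globalThis as any).gc?.(); // forced GC after the measured round
    }
    samples.sort((a, b) => a - b);
    console.log(`median ~${samples[samples.length >> 1].toFixed(2)} ms`);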

Environment:

  • Existing tables: Kunpeng 920 aarch64 / Node v24.14.0
  • Expanded run: Kunpeng 920 aarch64 / Node v24.15.0

Node minor version changed between runs, so cross-table reads should focus on band and trend, not on tiny 1–2 ms differences.

Version grouping overview (1.1.8+)

This is not a replacement for the raw tables below. It is just the reading guide.

Group Versions How to read it
Speed-advantage versions 1.2.1 / 1.2.2 / 1.2.4 / 1.2.6 / 1.2.7 / 1.4.3 / 1.4.4 This group captures the stronger full-parse speed line: 1.2.6 / 1.2.7 remain the best broad-speed representatives, while 1.4.3 / 1.4.4 are the rare late-line versions that pull the structural hot path back down sharply
Memory-advantage versions 1.1.9 / 1.2.4 / 1.2.6 / 1.3.4 / 1.3.5 Lower heapUsed after parseStructural; 1.3.4 / 1.3.5 are especially strong at 200 KB and 20k nested
Balanced versions 1.1.9 / 1.2.1 / 1.2.2 / 1.2.4 / 1.2.6 No obvious weakness in either speed or memory
Function-first phase 1.3.0 ~ 1.4.2, especially 1.3.6+ / 1.4.0+ Much of this line is paying for shorthand semantics, context-aware escaping, incremental diff semantics, and stable integration; as a group it is still not a memory-first line. 1.4.3 is the first version after this phase to shift focus back to constant-factor optimization

Full-document parse (~200 KB, version-line baseline)

This round was re-measured version-by-version, process-by-process: 1.1.0 / 1.1.1 / 1.1.2 / 1.1.3 / 1.1.4 / 1.1.5 / 1.1.6 / 1.1.7 each ran in its own Node process, so heap and JIT state from one version did not pollute the next. Test input was 204,803 bytes.

API 1.1.0 1.1.1 1.1.2 1.1.3 1.1.4 1.1.5 1.1.6 1.1.7
parseRichText ~4514.2 ms ~40.5 ms ~37.4 ms ~38.1 ms ~34.0 ms ~42.3 ms ~29.9 ms ~30.6 ms
parseStructural ~36.1 ms ~35.0 ms ~33.4 ms ~29.7 ms ~26.2 ms ~34.7 ms ~29.0 ms ~23.3 ms

This is the headline benchmark for normal blog / CMS / editor documents.

Three immediate reads:

  • 1.1.0 has an obviously abnormal parseRichText path and should no longer be used as the performance reference for the current line
  • within 1.1.1 to 1.1.5, the fastest parseRichText build is 1.1.3
  • parseStructural keeps tightening across the line; in the current data both 1.1.6 and 1.1.7 sit in the 20-something-ms class (~29.0 ms and ~23.3 ms here), and an added 1.1.7 re-run measured ~20 ms

Extended full-parse table (1.1.8 ~ 1.4.4)

This extension keeps the mixed-document headline benchmark and widens the size ladder to 200 KB / 1 MB / 2 MB, so day-to-day document speed and medium / large document scaling can be read from the same table.

Version parseRichText 200 KB parseStructural 200 KB parseRichText 1 MB parseStructural 1 MB parseRichText 2 MB parseStructural 2 MB
1.1.8 ~36.83 ms ~25.86 ms ~141.5 ms ~96.28 ms ~244.9 ms ~163.9 ms
1.1.9 ~34.59 ms ~27.96 ms ~143.2 ms ~97.97 ms ~241.7 ms ~161.0 ms
1.2.0 ~34.22 ms ~25.79 ms ~142.0 ms ~100.8 ms ~257.2 ms ~162.9 ms
1.2.1 ~32.75 ms ~25.53 ms ~142.4 ms ~97.18 ms ~251.4 ms ~163.7 ms
1.2.2 ~32.74 ms ~26.07 ms ~139.3 ms ~97.68 ms ~256.3 ms ~161.0 ms
1.2.3 ~33.24 ms ~27.50 ms ~141.5 ms ~97.45 ms ~253.7 ms ~163.8 ms
1.2.4 ~31.70 ms ~26.56 ms ~141.6 ms ~97.24 ms ~254.5 ms ~164.1 ms
1.2.5 ~33.40 ms ~25.83 ms ~139.9 ms ~98.54 ms ~255.4 ms ~163.2 ms
1.2.6 ~32.22 ms ~27.29 ms ~140.7 ms ~96.57 ms ~251.6 ms ~159.5 ms
1.2.7 ~32.58 ms ~25.39 ms ~142.1 ms ~100.2 ms ~243.5 ms ~162.1 ms
1.3.0 ~34.46 ms ~26.06 ms ~140.0 ms ~97.68 ms ~248.9 ms ~163.3 ms
1.3.1 ~34.28 ms ~27.01 ms ~144.8 ms ~100.1 ms ~253.2 ms ~163.6 ms
1.3.2 ~34.82 ms ~27.10 ms ~145.3 ms ~97.08 ms ~245.1 ms ~162.3 ms
1.3.3 ~34.19 ms ~27.27 ms ~142.6 ms ~96.29 ms ~253.6 ms ~162.1 ms
1.3.4 ~45.70 ms ~37.64 ms ~153.8 ms ~103.7 ms ~254.3 ms ~178.2 ms
1.3.5 ~44.33 ms ~37.47 ms ~153.6 ms ~100.6 ms ~252.6 ms ~183.7 ms
1.3.6 ~51.10 ms ~40.27 ms ~151.1 ms ~112.0 ms ~292.5 ms ~192.7 ms
1.3.7 ~52.58 ms ~40.48 ms ~158.4 ms ~120.2 ms ~279.4 ms ~196.8 ms
1.3.8 ~47.11 ms ~41.60 ms ~160.1 ms ~119.8 ms ~284.2 ms ~198.2 ms
1.3.9 ~52.27 ms ~40.93 ms ~162.0 ms ~119.4 ms ~296.0 ms ~201.7 ms
1.4.0 ~49.41 ms ~40.98 ms ~157.4 ms ~116.1 ms ~292.3 ms ~207.0 ms
1.4.1 ~52.24 ms ~41.51 ms ~152.0 ms ~114.8 ms ~282.7 ms ~206.5 ms
1.4.2 ~51.86 ms ~39.78 ms ~158.6 ms ~119.0 ms ~281.2 ms ~212.9 ms
1.4.3 ~35.09 ms ~22.89 ms ~131.8 ms ~75.28 ms ~245.5 ms ~145.2 ms
1.4.4 ~32.75 ms ~24.37 ms ~134.8 ms ~77.10 ms ~244.3 ms ~152.4 ms

The useful read here is the version window, not a single cell:

  • 1.1.8 ~ 1.2.7 stays in the same full-parse speed class as 1.1.7, which matches the changelog story: 1.1.8 itself was stack-safety work for walkTokens plus docs, so it should not drift far from 1.1.7
  • 1.2.1 ~ 1.2.7 is still the speed sweet spot for this mixed-document full-parse run
  • 1.3.0 ~ 1.3.3 lands shorthand without yet falling off a cliff on the headline benchmark
  • 1.3.4+ is not a tiny wobble; it is a real constant-factor regression. On parseStructural, 200 KB moves from the ~26 ms class to about 40 ms (~50% higher), and 2 MB moves from about 160 ~ 164 ms to about 200 ~ 213 ms (~25% ~ 33% higher)
  • 1.4.0 ~ 1.4.2 is much more about stable incremental integration and clearer diff contracts than about pulling headline full-parse back into the 1.2.x low band
  • 1.4.3 is not just a partial recovery anymore. On the structural path it sets a new low for this post-1.1.7 window: parseStructural 200 KB drops from ~39.78 ms to ~22.89 ms (~42% lower than 1.4.2), 1 MB drops from ~119.0 ms to ~75.28 ms (~37% lower), and 2 MB drops from ~212.9 ms to ~145.2 ms (~32% lower)
  • 1.4.4 still belongs to the fast full-parse tier that 1.4.3 pulled back down, but the shape is now more "partly sustained, partly rebounded": parseRichText stays very strong at 200 KB / 2 MB with a mild uptick at 1 MB, while parseStructural is higher than 1.4.3 at all three sizes, though still well ahead of 1.4.0 ~ 1.4.2

onError version differences (1.1.x)

This section now lives on Version Semantics Notes.

Version memory profile (parseStructural)

Below is the structural-memory comparison across the recent parser rewrites.

This section is about the memory shape right after parse, so the tables below use afterParse.heapUsed, not the post-GC settled value.
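
As a sketch of where that number comes from (same hypothetical yumedsl import as above), the cell is read straight off process.memoryUsage() once the parse returns, before any settling GC:

    import { readFileSync } from "node:fs";
    import { parseStructural } from "yumedsl"; // hypothetical package name

    const denseInline = readFileSync("bench/fixtures/dense-inline-200kb.txt", "utf8"); // stand-in fixture
    const tree = parseStructural(denseInline);
    const afterParse = process.memoryUsage(); // sampled immediately after parse, no GC first
    console.log(`heapUsed after parse: ${(afterParse.heapUsed / 2 ** 20).toFixed(2)} MB`);
    void tree; // keep the parsed tree alive through the sample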

Case 1.1.4 1.1.5 1.1.6 1.1.7
200 KB dense inline, heapUsed after parse 27.37 MB 22.73 MB 21.44 MB 21.60 MB
2 MB dense inline, heapUsed after parse 206.96 MB 137.83 MB 138.54 MB 138.51 MB
20k nested inline, heapUsed after parse 24.48 MB 17.84 MB 16.53 MB 16.53 MB

sampledPeakRss from the same run:

Case 1.1.4 1.1.5 1.1.6 1.1.7
200 KB dense inline 104.23 MB 92.22 MB 97.09 MB 96.69 MB
2 MB dense inline 372.37 MB 290.79 MB 294.80 MB 295.41 MB
20k nested inline 83.82 MB 75.59 MB 77.48 MB 78.53 MB

Earlier runs that put multiple versions and cases into the same process tended to inflate the 200 KB and 20k nested rows in particular, because residual objects, JIT state, and GC timing leaked across cases. The isolated-process re-runs brought the numbers back into a more stable band.

Extended memory table (1.1.8 ~ 1.4.4)

Version 200 KB heapUsed 1 MB heapUsed 2 MB heapUsed 20k nested heapUsed
1.1.8 21.46 MB 85.35 MB 138.93 MB 16.31 MB
1.1.9 21.33 MB 84.94 MB 138.90 MB 16.34 MB
1.2.0 21.46 MB 85.14 MB 138.78 MB 16.36 MB
1.2.1 21.50 MB 85.33 MB 138.74 MB 16.33 MB
1.2.2 21.42 MB 85.52 MB 138.91 MB 16.33 MB
1.2.3 21.51 MB 85.57 MB 138.72 MB 16.33 MB
1.2.4 21.31 MB 85.34 MB 138.62 MB 16.47 MB
1.2.5 21.64 MB 85.30 MB 138.62 MB 16.43 MB
1.2.6 21.26 MB 85.52 MB 138.64 MB 16.45 MB
1.2.7 21.47 MB 85.63 MB 138.60 MB 16.51 MB
1.3.0 20.85 MB 85.73 MB 138.87 MB 17.27 MB
1.3.1 21.47 MB 85.70 MB 138.94 MB 17.32 MB
1.3.2 21.48 MB 86.97 MB 142.03 MB 17.90 MB
1.3.3 21.64 MB 90.99 MB 141.94 MB 17.82 MB
1.3.4 17.66 MB 84.39 MB 141.10 MB 16.89 MB
1.3.5 17.59 MB 84.33 MB 141.07 MB 16.03 MB
1.3.6 21.32 MB 76.87 MB 159.64 MB 16.94 MB
1.3.7 21.40 MB 76.87 MB 159.69 MB 17.33 MB
1.3.8 21.45 MB 77.07 MB 159.75 MB 16.99 MB
1.3.9 21.33 MB 77.12 MB 159.77 MB 17.41 MB
1.4.0 21.36 MB 77.05 MB 159.82 MB 17.42 MB
1.4.1 21.48 MB 77.02 MB 159.78 MB 17.39 MB
1.4.2 21.50 MB 77.07 MB 159.80 MB 17.39 MB
1.4.3 20.58 MB 71.53 MB 162.32 MB 18.67 MB
1.4.4 20.75 MB 71.81 MB 162.63 MB 19.52 MB

This table should be read in tiers, not as a single-column race:

  • 1.1.8 ~ 1.2.7 mostly stay in the same structural-memory band as 1.1.6 / 1.1.7
  • 1.3.4 / 1.3.5 are the prettiest small-document and 20k-nested heapUsed values in this run
  • 1.3.6+ drops 1 MB from about 85 MB to about 77 MB (~9% lower), but raises 2 MB from about 141 MB to about 160 MB (~13% higher). That strongly suggests an allocation-shape change: medium documents keep a tighter live-object window, while larger documents retain more structural objects / boundary metadata at once
  • 1.4.3 improves the small and medium tiers: 200 KB drops from 21.50 MB to 20.58 MB, and 1 MB drops from 77.07 MB to 71.53 MB. But large-document and deep-nesting go the other way: 2 MB rises from 159.80 MB to 162.32 MB (+1.6%), and 20k nested rises from 17.39 MB to 18.67 MB (+7.4%). So 1.4.3's memory profile is small / medium improved, large-doc and deep-nesting worse; not a uniform win
  • 1.4.4 stays very close to 1.4.3 on all four rows, but the direction is still "small / medium / large documents slightly higher, and 20k nested still high". So the better read is "same tier, still sitting in the heavier retained-shape band introduced by 1.4.3"
  • so if the priority is small / medium document footprint, 1.3.4 / 1.3.5 stand out; if the priority is 2 MB-scale stability, 1.2.x is steadier

Version Public parseStructural memory shape Read
1.1.1 Highest overhead among the post-two-phase releases. Public structural parsing still built the indexed tree first, then converted to a public tree through a separate strip phase This part of the line still had the highest public-tree peak
1.1.2 Same public-tree allocation shape as 1.1.1, but stack-safe and fully iterative The change here is stack safety, not peak-memory reduction
1.1.3 Lower than 1.1.1 / 1.1.2 after stripMeta stopped building a whole-tree Map<IndexedStructuralNode, StructuralNode> Public-tree memory starts to tighten here
1.1.4 Same structural memory class as 1.1.3 Same tier as 1.1.3
1.1.5 First release in this line to remove the public API's "second tree": after scanning, the public path no longer duplicates the indexed tree into a public tree The main memory step-down lands here
1.1.6 Same single-public-tree architecture as 1.1.5, but with a tighter scan-phase allocation strategy: text buffering now accumulates by ranges/segments instead of repeated string concatenation, and raw / block child frames keep sharing source ranges instead of eagerly slicing substrings 200 KB is lower; 2 MB and nested are not lower than 1.1.5
1.1.7 Same architecture as 1.1.6; optimizations focus on the render layer and scan-phase constants: render-layer trimBlockBoundaryTokens no longer deep-clones the entire children array; flushBuffer uses direct string concatenation for the common 1–2 segment-pair case, avoiding temporary array allocation 200 KB is lower; 2 MB and nested are not lower than 1.1.5

Position tracking overhead

Note: The version used for this data was not recorded. Treat these numbers as order-of-magnitude reference only. Future reruns should tag the version and align with the extended full-parse table.

Measured on ~200 KB input (204,840 bytes), 20 samples per case.

API Without tracking With tracking Overhead
parseRichText ~22.45 ms ~34.07 ms ~51.8%
parseStructural ~14.88 ms ~18.49 ms ~24.3%

This run says three practical things:

  • trackPositions is still well within normal editor-budget territory, but it is not free
  • the visible overhead is much higher on parseRichText than on parseStructural
  • if a pipeline needs tighter budgeting, parseRichText + trackPositions is the first place to inspect
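
For reference, a hedged sketch of the two configurations being compared; trackPositions is the option name used on this page, but passing it as a parse option this way is an assumption:

    import { parseRichText } from "yumedsl"; // hypothetical package name

    const source = "$$bold(hello)$$ world"; // tiny stand-in document
    const plain = parseRichText(source);                             // "without tracking" row
    const tracked = parseRichText(source, { trackPositions: true }); // "with tracking" row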

Substring parse: baseOffset / tracker

Note: The version used for this data was not recorded. Treat these numbers as order-of-magnitude reference only.

This run uses a fixed 53-character slice, matching the substring scenario from Source Position Tracking. 800 samples per case.

API baseOffset tracker Read
parseRichText slice ~23.78 µs ~20.62 µs same performance class
parseStructural slice ~14.26 µs ~13.47 µs same performance class

The measured read is straightforward:

  • both baseOffset and tracker stay in the tens-of-microseconds range
  • tracker does not turn substring parsing into a millisecond-scale path
  • parseStructural remains lighter than parseRichText on substring work
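
A hedged sketch of the two substring strategies; baseOffset and tracker are the names used on this page, while the exact option shapes and the createPositionTracker helper are illustrative assumptions:

    import { parseRichText, createPositionTracker } from "yumedsl"; // hypothetical imports

    const doc = "...";  // stands in for the full source document
    const start = 1024; // where the slice begins in the full document
    const slice = doc.slice(start, start + 53); // the fixed 53-character slice

    // Strategy 1: shift every reported position by a constant base offset.
    const viaOffset = parseRichText(slice, { trackPositions: true, baseOffset: start });

    // Strategy 2: hand the parser a tracker that maps slice positions back to the document.
    const viaTracker = parseRichText(slice, {
      trackPositions: true,
      tracker: createPositionTracker(doc, start),
    });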

Incremental Parsing

For incremental structural caching (avoid full parseStructural scans across edits), see Incremental Parsing.

The old micro-benchmark ("edit one 36-character tag in the middle of a ~200 KB document") is no longer the right lead-in for this page. The main incremental section now uses a shape much closer to real editor work:

  • Baseline document: ~1 MB
  • Single edit size: about 5%
  • Edit classes: inline / deep-inline / raw / block
  • Methodology: one case per process, --expose-gc, forced GC before and after each measured round, 20 measured rounds per case

Incremental grouping overview

Group Versions How to read it
First public incremental surface 1.2.0 Still low-level parseIncremental/updateIncremental; usable, but clearly slower than the later session-first line
Bare incremental speed advantage 1.2.4 / 1.2.5 / 1.2.6 / 1.2.7 / 1.3.0 / 1.4.3 / 1.4.4 1.2.4 ~ 1.3.0 is the most stable low-20-ms band; 1.4.3 / 1.4.4 are the late-line recovery window that pulls bare applyEdit back down sharply again
Deep-inline speed advantage 1.2.6 / 1.3.0 / 1.3.4 / 1.3.5 1.2.6 / 1.3.0 are the lowest points; 1.3.4 / 1.3.5 represent the post-shorthand recovery window
Raw / block speed advantage 1.3.6 / 1.3.7 / 1.4.0 / 1.4.1 / 1.4.3 After context-aware escaping and more stable boundary handling, raw / block become the easiest late-line bare-update paths to keep fast; 1.4.3 pushes both even lower again
Diff-cost advantage 1.4.0 / 1.4.1 These are still the steadiest structured-diff releases: much cheaper than 1.3.8 / 1.3.9, and more balanced overall than 1.4.2 / 1.4.3
Retained-heap advantage 1.2.1 ~ 1.3.7, especially 1.2.4 ~ 1.3.7 Build-time retained heap stays roughly in the 29.5 ~ 30.7 MB band; inline-edit retained heap is especially good in the 1.2.4 ~ 1.3.7 window
Best integration picks 1.4.0 / 1.4.1 If you care about stable session semantics, consumable diff output, and not pushing retained heap or diff cost further to the worse side, these are still the easiest versions to recommend

Incremental benchmark table (1 MB baseline, ~5% edits)

Version Engine inline deep-inline raw block Notes
1.2.0 low-level ~39.60 ms ~38.87 ms ~33.40 ms ~37.57 ms parseIncremental/updateIncremental
1.2.1 session ~41.16 ms ~40.32 ms ~36.65 ms ~40.95 ms first createIncrementalSession(...) release
1.2.2 session ~41.69 ms ~41.66 ms ~49.66 ms ~39.48 ms still early-session behavior
1.2.3 session ~46.14 ms ~47.97 ms ~44.50 ms ~48.09 ms pre-zone-splitting speed band
1.2.4 session ~22.88 ms ~26.54 ms ~22.87 ms ~28.83 ms first obvious speed step after softZoneNodeCap / internal zone splitting
1.2.5 session ~22.58 ms ~26.56 ms ~22.61 ms ~29.54 ms same class as 1.2.4
1.2.6 session ~21.70 ms ~20.88 ms ~22.86 ms ~28.31 ms one of the nicest deep-inline results; matches the stack-safe signature-path fixes
1.2.7 session ~22.31 ms ~26.67 ms ~22.88 ms ~29.32 ms still in the low-20-ms band
1.3.0 session ~22.25 ms ~20.61 ms ~22.76 ms ~28.88 ms shorthand lands without blowing up the main incremental band
1.3.1 session ~22.81 ms ~37.11 ms ~22.14 ms ~29.04 ms deep-inline gets noticeably heavier after shorthand boundary / depth-limit fixes
1.3.2 session ~22.36 ms ~36.79 ms ~22.09 ms ~28.91 ms deep-inline still high
1.3.3 session ~21.91 ms ~38.00 ms ~22.20 ms ~29.64 ms deep-inline still high
1.3.4 session ~26.85 ms ~28.19 ms ~21.83 ms ~28.08 ms inline gets heavier, but deep-inline drops back down
1.3.5 session ~27.08 ms ~28.17 ms ~21.75 ms ~28.93 ms same class as 1.3.4
1.3.6 session ~29.71 ms ~28.77 ms ~15.86 ms ~32.25 ms raw path drops sharply; aligns with context-aware escaping
1.3.7 session ~29.80 ms ~28.99 ms ~21.69 ms ~32.28 ms same general band after the endTag prefix-consumption fix
1.3.8 session ~26.37 ms ~23.10 ms ~15.90 ms ~32.55 ms deep-inline drops again after structured diff work
1.3.9 session ~26.65 ms ~22.70 ms ~15.90 ms ~26.39 ms steady low band after isNoop and conservative-diff contract clarification
1.4.0 session ~26.48 ms ~22.27 ms ~16.01 ms ~26.14 ms speed holds after the stable-integration surface is declared
1.4.1 session ~27.13 ms ~22.47 ms ~16.00 ms ~25.62 ms corrupted deep-inline edits no longer hang; normal edits stay in band
1.4.2 session ~29.61 ms ~22.63 ms ~21.57 ms ~25.81 ms latest measured version; semantics and type cleanup matter more than chasing a new low
1.4.3 session ~17.96 ms ~29.81 ms ~15.76 ms ~17.73 ms current release candidate; inline / raw / block drop sharply again, but deep-inline is still above the 1.4.0 ~ 1.4.2 band
1.4.4 session ~17.72 ms ~29.59 ms ~15.90 ms ~17.98 ms broadly keeps the 1.4.3 bare-incremental tier; inline is slightly better, deep-inline is a touch lower, but raw / block are slightly higher

How to read the incremental table

  • 1.2.0 ~ 1.2.3 is the first public incremental surface: usable, but still a full band slower than the later line
  • 1.2.4 is the first real performance inflection point on this page: the changelog's pure-inline zone splitting immediately shows up as low-20-ms 1 MB / 5% edits
  • 1.2.6 / 1.3.0 are the easiest all-around recommendations in this run: inline, deep-inline, and raw are all steady
  • 1.3.1 ~ 1.3.3 clearly raise deep-inline cost; that lines up with the shorthand-boundary, parent-close ownership, and ambiguity-guard fixes, where correctness is taking priority
  • 1.3.6+ makes raw edits noticeably faster, in the same direction as the changelog's context-aware escaping and raw / block boundary stabilization
  • 1.4.2 raises raw applyEdit itself from ~16.00 ms to ~21.57 ms (~35% higher), so that cleanup window was not free
  • 1.4.3 is mostly a success on bare applyEdit: inline returns to ~17.96 ms, raw to ~15.76 ms, and block to ~17.73 ms; but deep-inline still sits at ~29.81 ms, clearly above the low-20-ms 1.4.0 ~ 1.4.2 band
  • 1.4.4 stays in the same bare-incremental tier as 1.4.3: inline is a little better and deep-inline only edges down slightly, but raw / block are slightly higher. So this is better read as "same band" rather than a fresh across-the-board drop

Shared methodology for this incremental extension

  • Baseline document: about 1 MB
  • Each scenario keeps the single edit at about 5%
  • inline: edit normal inline-tag content
  • deep-inline: edit deeply nested inline content
  • raw: edit raw-tag body content
  • block: edit block-tag body content
  • 1.2.0 uses low-level parseIncremental/updateIncremental
  • 1.2.1+ uses createIncrementalSession(...).applyEdit(...)
  • Every measured version stayed at 20/20 expected-mode hits in this benchmark, so there was no extra fallback note to preserve in the table
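
As a sketch of the 1.2.1+ session path these rows exercise (the function names come from this page; the edit-descriptor shape and the offsets are assumptions):

    import { createIncrementalSession } from "yumedsl"; // hypothetical package name

    const oneMbSource = "$$bold(x)$$ ".repeat(90_000); // ~1 MB of inline units (stand-in)
    const session = createIncrementalSession(oneMbSource);
    // One ~5% inline edit in the middle of the document (illustrative offsets).
    const edit = { start: 500_000, end: 550_000, text: "$$bold(y)$$ ".repeat(4_200) };
    session.applyEdit(edit);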

Incremental diff cost (1.3.8+)

From 1.3.8 onward, the session API also has the applyEditWithDiff(...) path. For editor integrations, this is usually closer to the real end-to-end cost than bare applyEdit(...).
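
A hedged sketch of consuming that path; applyEditWithDiff, ops, patches, and isNoop are named on this page, but the exact result shape is an assumption:

    import { createIncrementalSession } from "yumedsl"; // hypothetical package name

    declare function forwardToEditorModel(op: unknown): void; // hypothetical downstream consumer

    const session = createIncrementalSession("$$bold(x)$$ ".repeat(90_000)); // ~1 MB stand-in
    const edit = { start: 500_000, end: 550_000, text: "$$bold(y)$$ ".repeat(4_200) }; // ~5% edit
    const { diff } = session.applyEditWithDiff(edit); // result shape assumed
    if (!diff.isNoop) {
      diff.ops.forEach(forwardToEditorModel); // apply structured ops instead of re-rendering
    }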

Version inline applyEdit inline applyEditWithDiff Ratio deep-inline applyEdit deep-inline applyEditWithDiff Ratio raw applyEdit raw applyEditWithDiff Ratio block applyEdit block applyEditWithDiff Ratio
1.3.8 ~26.37 ms ~35.17 ms 1.33x ~23.10 ms ~193.7 ms 8.39x ~15.90 ms ~87.61 ms 5.51x ~32.55 ms ~101.0 ms 3.10x
1.3.9 ~26.65 ms ~35.88 ms 1.35x ~22.70 ms ~195.2 ms 8.60x ~15.90 ms ~92.27 ms 5.80x ~26.39 ms ~101.5 ms 3.85x
1.4.0 ~26.48 ms ~39.75 ms 1.50x ~22.27 ms ~63.63 ms 2.86x ~16.01 ms ~52.93 ms 3.31x ~26.14 ms ~66.18 ms 2.53x
1.4.1 ~27.13 ms ~41.12 ms 1.52x ~22.47 ms ~61.68 ms 2.75x ~16.00 ms ~51.30 ms 3.21x ~25.62 ms ~62.94 ms 2.46x
1.4.2 ~29.61 ms ~41.44 ms 1.40x ~22.63 ms ~79.89 ms 3.53x ~21.57 ms ~51.52 ms 2.39x ~25.81 ms ~64.54 ms 2.50x
1.4.3 ~17.96 ms ~38.40 ms 2.14x ~29.81 ms ~85.96 ms 2.88x ~15.76 ms ~76.20 ms 4.84x ~17.73 ms ~79.69 ms 4.50x
1.4.4 ~17.72 ms ~40.55 ms 2.29x ~29.59 ms ~87.81 ms 2.97x ~15.90 ms ~77.36 ms 4.86x ~17.98 ms ~82.30 ms 4.58x

The key read here is not just "diff is slower" (that is expected) but that 1.4.0 pulls diff back into a much more integration-friendly band:

  • 1.3.8 / 1.3.9: diff is heavy on deep-inline, raw, and block, with deep-inline close to 8.4x ~ 8.6x
  • 1.4.0+: deep-inline is still the most expensive case, but it drops from nearly 200 ms into the 60 ~ 80 ms band; raw / block also fall from 80 ~ 100 ms into about 50 ~ 66 ms
  • inline stays comparatively stable the whole time, which strongly suggests the issue is not "diff is always explosive", but "some structural cases refine far too deeply"
  • 1.4.3 also improves inline diff (41.44 → 38.40 ms), but deep-inline / raw / block are still heavy: raw sits at ~76.20 ms, block at ~79.69 ms, and deep-inline at ~85.96 ms. So this current workspace is much more of a full-parse + bare-incremental optimization than a broad diff optimization
  • 1.4.4 does not keep the previous "a little lower again" story in this rerun: inline diff is back at ~40.55 ms, and raw / block are also slightly higher than 1.4.3; deep-inline remains in the same expensive band too, so this is still not a broad diff-cost reset

One more qualifier matters here: the current raw benchmark is effectively a high-repetition worst-case. The script repeats many same-shape $$code(...)%...%end$$ units and then replaces a middle slice, which is deliberately unfriendly to unique diff anchors. A separate quick bench with unique raw units (about 1 MB source, 5% edit) lands closer to ~45 ms for raw applyEditWithDiff, with a dirty span much closer to the edited window itself instead of nearly the whole document. So the ~76 ms row here is better read as a pressure-case upper bound for repetitive input, not as the typical cost of everyday non-repetitive raw edits.
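
A sketch of the two raw shapes being contrasted; the $$code(...)%...%end$$ unit comes from this page, while the generator names and the concrete body text are illustrative:

    // Pressure case: many identical raw units, deliberately hostile to unique diff anchors.
    function repetitiveRaw(units: number): string {
      return "$$code(js)%const x = 1;%end$$\n".repeat(units);
    }

    // Friendlier case: every raw unit is unique, so the differ gets distinct anchors.
    function uniqueRaw(units: number): string {
      let out = "";
      for (let i = 0; i < units; i++) out += `$$code(js)%const x${i} = ${i};%end$$\n`;
      return out;
    }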

That lines up with the changelog:

  • 1.3.8 upgrades applyEditWithDiff(...) from a basic range diff to structured ops / patches, increasing expressive power and cost at the same time
  • 1.3.9 adds isNoop and clarifies conservative-diff semantics, but does not fundamentally reduce heavy-case cost
  • 1.4.0 adds global refinement budgets (comparisons / anchors / ops / subtree size / wall-clock), so when refinement is not worth it the code drops to coarse splice or conservative whole-tree diff much earlier

Diff result-shape summary

This helps explain why 1.4.0+ gets much faster.

Version inline deep-inline raw block
1.3.8 incremental=20; ops~1064, patches~1064 incremental=20; ops~98, patches~98 incremental=20; ops~625, patches~625 incremental=20; ops~610, patches~610
1.3.9 incremental=20; ops~1064, patches~1064 incremental=20; ops~98, patches~98 incremental=20; ops~625, patches~625 incremental=20; ops~610, patches~610
1.4.0 incremental=20; ops~1, patches~1 incremental=20; ops~98, patches~98 incremental=20; ops~1, patches~1 incremental=20; ops~1, patches~1
1.4.1 incremental=20; ops~1, patches~1 incremental=20; ops~98, patches~98 incremental=20; ops~1, patches~1 incremental=20; ops~1, patches~1
1.4.2 incremental=20; ops~1, patches~1 incremental=20; ops~98, patches~98 incremental=20; ops~1, patches~1 incremental=20; ops~1, patches~1
1.4.3 incremental=20; ops~1, patches~1 incremental=20; ops~98, patches~98 incremental=20; ops~1, patches~1 incremental=20; ops~1, patches~1
1.4.4 incremental=20; ops~1, patches~1 incremental=20; ops~98, patches~98 incremental=20; ops~1, patches~1 incremental=20; ops~1, patches~1

This strongly suggests:

  • 1.3.8 / 1.3.9 still emit large, fine-grained diffs for many scenarios
  • 1.4.0+ is much more willing to collapse inline / raw / block into very small diffs when the refinement budget says further work is not worth it
  • deep-inline still keeps many ops, which is why it remains the most expensive diff case in 1.4.0 / 1.4.1 / 1.4.2 / 1.4.3 / 1.4.4
  • 1.4.3 shows again that a very small diff shape does not automatically mean low diff cost: inline gets the win this time, but raw / block / deep-inline still stay expensive

So the version recommendation changes a bit depending on what your integration actually consumes:

  • If you only care about incremental update cost itself: 1.2.4 ~ 1.3.0 still looks great, and 1.4.3 / 1.4.4 are now the stronger bare-incremental options inside the 1.4.x window
  • If you care about incremental update + structured diff cost: 1.4.0 / 1.4.1 are still the steadiest picks in that window; after this rerun, 1.4.4 is not prettier than 1.4.3 here: inline / raw / block diff are all a bit higher, and deep-inline is still expensive

Incremental snapshot memory (1.2.0+)

This benchmark answers another practical question: once the incremental snapshot is actually built and tree / zones are materialized, what does the retained heap look like?

This is not a transient "right after parse" peak. The measurement flow is:

  • build the incremental snapshot
  • deliberately touch doc.tree / doc.zones so lazy getters do not under-report memory
  • force GC
  • then read the still-retained heapUsed

That makes it much closer to the memory shape of a real editor holding an incremental document in memory.
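
A sketch of that flow under the same --expose-gc assumption as the full-parse harness; the snapshot-building step and the doc.tree / doc.zones getters follow the wording above, not a confirmed API:

    interface SnapshotDoc { tree: unknown; zones: unknown } // assumed shape

    declare function buildIncrementalSnapshot(source: string): SnapshotDoc; // stand-in build step

    const doc = buildIncrementalSnapshot("$$bold(x)$$ ".repeat(90_000)); // ~1 MB stand-in source
    void doc.tree;  // touch lazy getters so they materialize and get counted
    void doc.zones;
    (globalThis as any).gc?.(); // force GC so only genuinely retained objects remain
    console.log(`retained heap: ${(process.memoryUsage().heapUsed / 2 ** 20).toFixed(2)} MB`);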

Version inline retained heap after build deep-inline retained heap after build raw retained heap after build block retained heap after build
1.2.0 28.74 MB 29.68 MB 28.57 MB 28.80 MB
1.2.1 29.76 MB 30.69 MB 29.58 MB 29.80 MB
1.2.2 29.78 MB 30.70 MB 29.60 MB 29.81 MB
1.2.3 30.39 MB 31.31 MB 30.20 MB 30.43 MB
1.2.4 29.70 MB 30.64 MB 29.53 MB 29.76 MB
1.2.5 29.70 MB 30.64 MB 29.53 MB 29.76 MB
1.2.6 29.72 MB 30.65 MB 29.59 MB 29.77 MB
1.2.7 29.72 MB 30.66 MB 29.59 MB 29.78 MB
1.3.0 29.68 MB 30.62 MB 29.51 MB 29.73 MB
1.3.1 29.68 MB 30.62 MB 29.51 MB 29.74 MB
1.3.2 29.69 MB 30.62 MB 29.51 MB 29.74 MB
1.3.3 29.69 MB 30.62 MB 29.51 MB 29.74 MB
1.3.4 29.72 MB 30.66 MB 29.55 MB 29.78 MB
1.3.5 29.73 MB 30.66 MB 29.55 MB 29.78 MB
1.3.6 29.75 MB 30.68 MB 29.57 MB 29.79 MB
1.3.7 29.74 MB 30.68 MB 29.57 MB 29.80 MB
1.3.8 31.13 MB 32.05 MB 30.95 MB 31.18 MB
1.3.9 31.13 MB 32.06 MB 30.95 MB 31.18 MB
1.4.0 31.18 MB 32.10 MB 31.00 MB 31.23 MB
1.4.1 31.18 MB 32.10 MB 31.00 MB 31.23 MB
1.4.2 31.18 MB 32.10 MB 31.00 MB 31.23 MB
1.4.3 31.19 MB 32.11 MB 31.01 MB 31.24 MB
1.4.4 31.19 MB 32.12 MB 31.01 MB 31.24 MB

There are two clear tiers here:

  • 1.2.1 ~ 1.3.7: retained heap after build is basically stable in the 29.5 ~ 30.7 MB band
  • 1.3.8+: the whole line shifts upward by about 1.3 ~ 1.5 MB
  • 1.4.3 / 1.4.4: these constant-factor iterations barely change incremental retained heap at all; they stay in the same heavier retained-heap tier as 1.4.0 ~ 1.4.2

That matches the diff story above: from 1.3.8 onward, the line is not just paying more in diff time; the incremental snapshot itself also gets heavier in order to support richer structured diff behavior.

Retained heap after one edit

Now the same question after actually applying one ~5% edit and keeping the updated document alive.

Version inline retained heap after edit deep-inline retained heap after edit raw retained heap after edit block retained heap after edit
1.2.0 38.60 MB 40.39 MB 38.28 MB 38.71 MB
1.2.1 39.61 MB 41.90 MB 39.78 MB 40.22 MB
1.2.2 40.16 MB 41.94 MB 39.83 MB 40.26 MB
1.2.3 40.76 MB 43.00 MB 40.44 MB 40.87 MB
1.2.4 33.04 MB 44.00 MB 41.98 MB 42.35 MB
1.2.5 33.05 MB 44.00 MB 41.92 MB 42.36 MB
1.2.6 33.04 MB 44.00 MB 41.92 MB 42.36 MB
1.2.7 33.05 MB 44.01 MB 41.93 MB 42.37 MB
1.3.0 33.01 MB 43.97 MB 41.88 MB 42.32 MB
1.3.1 33.01 MB 43.97 MB 41.90 MB 42.32 MB
1.3.2 33.01 MB 43.97 MB 41.88 MB 42.33 MB
1.3.3 33.01 MB 43.97 MB 41.90 MB 42.33 MB
1.3.4 33.00 MB 44.03 MB 41.93 MB 42.37 MB
1.3.5 33.01 MB 44.03 MB 41.94 MB 42.37 MB
1.3.6 33.02 MB 44.05 MB 41.96 MB 42.40 MB
1.3.7 33.02 MB 44.05 MB 41.96 MB 42.41 MB
1.3.8 34.41 MB 45.42 MB 43.34 MB 43.79 MB
1.3.9 34.41 MB 45.43 MB 43.34 MB 43.78 MB
1.4.0 34.46 MB 45.47 MB 43.39 MB 43.83 MB
1.4.1 34.46 MB 45.48 MB 43.39 MB 43.83 MB
1.4.2 34.46 MB 45.48 MB 43.41 MB 43.85 MB
1.4.3 34.48 MB 45.49 MB 43.42 MB 43.87 MB
1.4.4 34.48 MB 45.49 MB 43.42 MB 43.87 MB

This table is even more revealing:

  • inline edits drop sharply at 1.2.4 and stay around 33 MB through 1.3.7
  • deep-inline / raw / block do not get the same drop, which strongly suggests the pure-inline zone-splitting work mostly improved the retained shape of inline-style edits
  • 1.3.8+ lifts all four scenarios together by roughly 1.3 ~ 1.5 MB

So if you read incremental speed + structured diff + retained memory together, the line becomes clearer:

  • 1.2.4 ~ 1.3.7: fast bare incremental updates, and very nice retained heap for inline edits
  • 1.3.8 / 1.3.9: richer diff output, but both latency and retained heap step upward
  • 1.4.0+: diff latency comes back down a lot, but retained heap stays in the heavier 1.3.8+ memory tier
  • 1.4.3 / 1.4.4: these late iterations are almost entirely time-constant optimizations, not retained-heap optimizations

If the only question is "is holding an incremental snapshot in memory prohibitively expensive?", the answer is still fairly calm: on a 1 MB baseline, most versions sit around 30 MB after build, and around 33 ~ 45 MB after one retained edit, depending on scenario.


Pathological Deep-Nesting Stress

Input shape: a single-chain inline nest, $$bold($$bold(...x...)$$)$$. Each table row below uses the same shape with the listed layer count (5,000 / 20,000 / 200,000 / 1,000,000 / 10,000,000 / 20,000,000 / 30,000,000 / 40,000,000 / 50,000,000).

Measured on:

  • historical columns: Kunpeng 920 aarch64 / Node v24.14.0
  • 1.4.3 current reruns: Kunpeng 920 aarch64 / Node v24.15.0

depthLimit is set to each layer count + 100 (no degradation). Large-scale high-layer runs in this section use an expanded heap budget; exact memory notes and run conditions are listed below the table.
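
The input for each row can be generated mechanically; the nest shape is from this section, and passing depthLimit as a parse option is an assumption about how the setting is supplied:

    import { parseStructural } from "yumedsl"; // hypothetical package name

    // Single-chain inline nest: $$bold($$bold(...x...)$$)$$ with the given layer count.
    function buildNest(layers: number): string {
      return "$$bold(".repeat(layers) + "x" + ")$$".repeat(layers);
    }

    const layers = 20_000;
    parseStructural(buildNest(layers), { depthLimit: layers + 100 }); // layer count + 100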

API 1.1.0 1.1.1 1.1.2-1.1.4 1.1.5 1.1.6 1.4.3 current 1.4.4 current
parseStructural(5000) Stack overflow ~7119 ms ~19 ms ~33.73 ms ~36.30 ms ~20.73 ms Same class, not re-measured separately
parseRichText(5000) ~9731 ms ~17216 ms ~23 ms ~41.12 ms ~34.58 ms ~29.05 ms ~35.50 ms (mean of 3 reruns)
parseStructural(20000) — — ~61 ms ~44.79 ms ~37.89 ms ~50.95 ms Same class, not re-measured separately
parseRichText(20000) — — ~61 ms ~134.35 ms ~53.58 ms ~87.53 ms ~125.85 ms (mean of 3 reruns)
parseStructural(200000) — — ~667 ms ~361.20 ms ~421.45 ms ~457.05 ms Same class, not re-measured separately
parseRichText(200000) — — ~844 ms ~1096.81 ms ~1209.64 ms ~1.15 s ~1.13 s (mean of 3 reruns)
parseStructural(1000000) — — ~2.8 s ~2.39 s ~2.50 s ~2.39 s Same class, not re-measured separately
parseRichText(1000000) — — ~6.4 s ~6.20 s ~6.00 s ~5.46 s ~5.49 s (mean of 3 reruns)
parseStructural(10000000) — — ~28.6 s ~17.3 s (separately re-measured) ~18.7 s (separately re-measured) ~21.15 s (separately re-measured) Same class, not re-measured separately
parseRichText(10000000) — — ~50 s Same class, not re-measured separately Same class, not re-measured separately ~64.65 s (mean of 3 reruns) ~65.50 s (mean of 3 reruns)
parseStructural(20000000) — — ~89.9 s (--max-old-space-size=12288) Same class, not re-measured separately Same class, not re-measured separately Same class, not re-measured separately Same class, not re-measured separately
parseStructural(30000000) — — ~163.1 s (--max-old-space-size=18432) Same class, not re-measured separately Same class, not re-measured separately Same class, not re-measured separately Same class, not re-measured separately
parseStructural(40000000) — — ~210.0 s (--max-old-space-size=24576) Same class, not re-measured separately Same class, not re-measured separately Same class, not re-measured separately Same class, not re-measured separately
parseStructural(50000000) — — ~224.1 s (--max-old-space-size=32768) Same class, not re-measured separately Same class, not re-measured separately Same class, not re-measured separately Same class, not re-measured separately

Note: 1.1.5 / 1.1.6 have been separately re-measured through 1000000, and parseStructural(10000000) has also been measured separately. For 1.4.3, this update separately re-measured every row at <= 10000000. For 1.4.4, this follow-up separately reran only parseRichText(5000 / 20000 / 200000 / 1000000 / 10000000); the remaining 1.4.4 rows are still best read as the same class without page-specific reruns. Only the 20000000+ rows remain in the same fully iterative 1.1.2+ class without page-specific reruns.

The 1.4.3 reread is best split into two bands:

  • 5000 ~ 1000000: the current workspace still sits in the "deep chains are safe and the small-to-mid constant is still decent" zone. parseStructural(5000) is already close to the 1.1.2 ~ 1.1.4 band again, and parseRichText(1000000) lands around ~5.46 s
  • 10000000: 1.4.3 / 1.4.4 still finish under the fully iterative path, but the constant is no longer among the best historical points on this line. parseStructural(10000000) is about ~21.15 s, somewhat slower than the dedicated 1.1.5 / 1.1.6 reruns; 1.4.3 parseRichText(10000000) landed at 64.12 s / 64.87 s / 64.96 s (mean ~64.65 s), while 1.4.4 landed at 68.00 s / 62.24 s / 66.25 s (mean ~65.50 s). So this is not a one-off spike but a stable heavier path under the current semantics.

Note: the public parseStructural(20000000) path now completes after stripMeta was rewritten to build the public tree directly without the intermediate Map<IndexedStructuralNode, StructuralNode>. The 20 M / 30 M / 40 M / 50 M runs above used NODE_OPTIONS=--max-old-space-size=12288 / 18432 / 24576 / 32768 respectively, and finished at roughly:

  • 1.4.3 parseStructural(10 M): this rerun used --max-old-space-size=12288, ending around 5.59 GB heap / 5.83 GB RSS
  • 1.4.3 parseRichText(10 M): this rerun used --max-old-space-size=13312, ending around 7.65 GB heap / 9.86 GB RSS; the three reruns stayed in essentially the same RSS / heap band
  • 1.4.4 parseRichText(10 M): this rerun also used --max-old-space-size=13312, ending around 7.85 GB heap / 9.87 GB RSS on the 3-run mean
  • 20 M: 10.7 GB heap / 11.9 GB RSS
  • 30 M: 16.1 GB heap / 17.0 GB RSS
  • 40 M: 21.4 GB heap / 22.7 GB RSS
  • 50 M: 26.8 GB heap / 28.0 GB RSS

Under Node's default heap limit, high-layer public-tree runs may still OOM depending on machine limits.

Stack overflow thresholds per version (same environment, binary search, ยฑ50 layers):

API 1.1.0 1.1.1 1.1.2+
parseStructural ~3172 layers ~1611 layers No limit (fully iterative)
parseRichText ~3269 layers ~3221 layers No limit (fully iterative)

1.1.2 eliminated three independent deep-nesting bottlenecks:

  1. Stack overflow: parseNodes, renderNodes, stripMeta, extractText, and materializeTextTokens converted from recursion to explicit stack iteration (see the sketch after this list). Nesting depth is now bounded only by heap memory
  2. materializeTextTokens O(n²) re-traversal (render layer): each handler invocation recursed into the full subtree. A materializedArrays WeakSet now marks processed subtrees so subsequent calls skip them, giving O(n)
  3. findInlineClose / findTagArgClose O(n²) forward scan (structural layer): each inline nesting level scanned forward for the matching close. Now uses lazy close: inline child frames continue scanning on the parent's text and complete when they encounter )$$; gating-confirmed inline-only tags skip getTagCloserType entirely, giving O(n)
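
An illustrative sketch of the pattern behind fix 1 (not the library's actual code): recursive descent becomes an explicit stack, so depth is bounded by heap rather than the call stack.

    interface TreeNode { children: TreeNode[] }

    function visitIterative(root: TreeNode, visit: (node: TreeNode) => void): void {
      const stack: TreeNode[] = [root];
      while (stack.length > 0) {
        const node = stack.pop()!;
        visit(node);
        // Push children in reverse so they are visited in document order.
        for (let i = node.children.length - 1; i >= 0; i--) stack.push(node.children[i]);
      }
    }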

5000-layer parseRichText: 1.1.1 ~17 s → 1.1.2 ~23 ms (~740x faster).

From 1.1.2 onward, this version line has effectively crossed a generation boundary on stack safety:

  • 1.1.0 / 1.1.1 still had to worry about call-stack limits
  • 1.1.2+ mostly turns the problem into a heap-budget question instead

So this section should be read as a long-term note about stack safety and worst-case-path complexity, not as a day-to-day patch-by-patch benchmark that needs constant refreshing.


Lightweight Utility Ops

Measured on ~200 KB document (200,067 chars, 6,321 structural nodes, 9,165 rendered tokens). Data represents the average of 5 independent process runs.

Operation Time Notes
printStructural ~2.00 ms Lossless round-trip
buildZones ~0.74 ms 1,897 zones
walkTokens ~0.50 ms 9,165 visits
mapTokens identity ~1.21 ms Wrapper-only cost
mapTokens transform ~1.55 ms Rename every bold to strong

These utility operations are effectively free compared with the parser itself in editor pipelines.
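
As a closing illustration, a hedged sketch of the "mapTokens transform" row; mapTokens is named on this page, but the callback signature and token shape are assumptions:

    import { parseRichText, mapTokens } from "yumedsl"; // hypothetical package name

    const tokens = parseRichText("$$bold(hi)$$ plain text"); // assumed to yield a token stream
    // Rename every bold token to strong, leaving everything else untouched.
    const renamed = mapTokens(tokens, (token: any) =>
      token.name === "bold" ? { ...token, name: "strong" } : token
    );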