Performance - chiba233/yumeDSL GitHub Wiki
Performance
This page collects the parser performance notes that used to be scattered across multiple wiki pages.
Daily Document Benchmark
This section is now organized as one continuous story instead of a separate "historical" block and a separate "follow-up" block. The tables are still split by what they answer, but they should be read together.
First, the measurement framing:
- The 200 KB headline benchmark uses a mixed document shaped like a normal blog / CMS / editor document
- The memory profile still uses dense inline and 20k nested inline cases
- The expanded run methodology is one case per process, `--expose-gc`, forced GC before and after each measured round, and at most 20 measured rounds per case
Environment:
- Existing tables: Kunpeng 920 aarch64 / Node v24.14.0
- Expanded run: Kunpeng 920 aarch64 / Node v24.15.0
Node minor version changed between runs, so cross-table reads should focus on band and trend, not on tiny 1-2 ms differences.
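The per-case run loop described above can be sketched as a minimal harness. This is illustrative only (the actual benchmark scripts are not published on this page); `workload` stands in for the API call under test, e.g. `parseStructural` on the 200 KB document.

```javascript
// Minimal sketch of the per-case methodology: forced GC around each
// measured round (requires `node --expose-gc`; guarded so the sketch
// still runs without the flag) and the median of up to 20 rounds.
function measureMs(workload, rounds = 20) {
  const samples = [];
  for (let i = 0; i < rounds; i++) {
    if (typeof global.gc === 'function') global.gc(); // settle the heap first
    const t0 = process.hrtime.bigint();
    workload();
    const t1 = process.hrtime.bigint();
    if (typeof global.gc === 'function') global.gc(); // and after the round
    samples.push(Number(t1 - t0) / 1e6); // ns -> ms
  }
  samples.sort((a, b) => a - b);
  return samples[Math.floor(samples.length / 2)]; // median, robust to outliers
}
```

Because each case runs in its own process, a script like this is invoked once per version / case pair, which is what keeps residual heap and JIT state from leaking between cases.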
Version grouping overview (1.1.8+)
This is not a replacement for the raw tables below. It is just the reading guide.
| Group | Versions | How to read it |
|---|---|---|
| Speed-advantage versions | 1.2.1 / 1.2.2 / 1.2.4 / 1.2.6 / 1.2.7 / 1.4.3 / 1.4.4 | This group captures the stronger full-parse speed line: 1.2.6 / 1.2.7 remain the best broad-speed representatives, while 1.4.3 / 1.4.4 are the rare late-line versions that pull the structural hot path back down sharply |
| Memory-advantage versions | 1.1.9 / 1.2.4 / 1.2.6 / 1.3.4 / 1.3.5 | Lower heapUsed after parseStructural; 1.3.4 / 1.3.5 are especially strong at 200 KB and 20k nested |
| Balanced versions | 1.1.9 / 1.2.1 / 1.2.2 / 1.2.4 / 1.2.6 | No obvious weakness in either speed or memory |
| Function-first phase | 1.3.0 ~ 1.4.2, especially 1.3.6+ / 1.4.0+ | Much of this line is paying for shorthand semantics, context-aware escaping, incremental diff semantics, and stable integration; as a group it is still not a memory-first line. 1.4.3 is the first version after this phase to shift focus back to constant-factor optimization |
Full-document parse (~200 KB, version-line baseline)
This round was re-measured version-by-version, process-by-process: 1.1.0 / 1.1.1 / 1.1.2 /
1.1.3 / 1.1.4 / 1.1.5 / 1.1.6 / 1.1.7 each ran in its own Node process, so heap and JIT
state from one version did not pollute the next. Test input was 204,803 bytes.
| API | 1.1.0 | 1.1.1 | 1.1.2 | 1.1.3 | 1.1.4 | 1.1.5 | 1.1.6 | 1.1.7 |
|---|---|---|---|---|---|---|---|---|
| parseRichText | ~4514.2 ms | ~40.5 ms | ~37.4 ms | ~38.1 ms | ~34.0 ms | ~42.3 ms | ~29.9 ms | ~30.6 ms |
| parseStructural | ~36.1 ms | ~35.0 ms | ~33.4 ms | ~29.7 ms | ~26.2 ms | ~34.7 ms | ~29.0 ms | ~23.3 ms |
This is the headline benchmark for normal blog / CMS / editor documents.
Three immediate reads:

- `1.1.0` has an obviously abnormal `parseRichText` path and should no longer be used as the performance reference for the current line
- within `1.1.1` to `1.1.5`, the fastest `parseRichText` build is `1.1.3`
- `parseStructural` keeps tightening across the line; in the current data both `1.1.6` and `1.1.7` are in the ~20 ms class, and the added `1.1.7` re-run measured ~20 ms
Extended full-parse table (1.1.8 ~ 1.4.4)
This extension keeps the mixed-document headline benchmark and widens the size ladder to 200 KB /
1 MB / 2 MB, so day-to-day document speed and medium / large document scaling can be read from
the same table.
| Version | parseRichText 200 KB | parseStructural 200 KB | parseRichText 1 MB | parseStructural 1 MB | parseRichText 2 MB | parseStructural 2 MB |
|---|---|---|---|---|---|---|
| 1.1.8 | ~36.83 ms | ~25.86 ms | ~141.5 ms | ~96.28 ms | ~244.9 ms | ~163.9 ms |
| 1.1.9 | ~34.59 ms | ~27.96 ms | ~143.2 ms | ~97.97 ms | ~241.7 ms | ~161.0 ms |
| 1.2.0 | ~34.22 ms | ~25.79 ms | ~142.0 ms | ~100.8 ms | ~257.2 ms | ~162.9 ms |
| 1.2.1 | ~32.75 ms | ~25.53 ms | ~142.4 ms | ~97.18 ms | ~251.4 ms | ~163.7 ms |
| 1.2.2 | ~32.74 ms | ~26.07 ms | ~139.3 ms | ~97.68 ms | ~256.3 ms | ~161.0 ms |
| 1.2.3 | ~33.24 ms | ~27.50 ms | ~141.5 ms | ~97.45 ms | ~253.7 ms | ~163.8 ms |
| 1.2.4 | ~31.70 ms | ~26.56 ms | ~141.6 ms | ~97.24 ms | ~254.5 ms | ~164.1 ms |
| 1.2.5 | ~33.40 ms | ~25.83 ms | ~139.9 ms | ~98.54 ms | ~255.4 ms | ~163.2 ms |
| 1.2.6 | ~32.22 ms | ~27.29 ms | ~140.7 ms | ~96.57 ms | ~251.6 ms | ~159.5 ms |
| 1.2.7 | ~32.58 ms | ~25.39 ms | ~142.1 ms | ~100.2 ms | ~243.5 ms | ~162.1 ms |
| 1.3.0 | ~34.46 ms | ~26.06 ms | ~140.0 ms | ~97.68 ms | ~248.9 ms | ~163.3 ms |
| 1.3.1 | ~34.28 ms | ~27.01 ms | ~144.8 ms | ~100.1 ms | ~253.2 ms | ~163.6 ms |
| 1.3.2 | ~34.82 ms | ~27.10 ms | ~145.3 ms | ~97.08 ms | ~245.1 ms | ~162.3 ms |
| 1.3.3 | ~34.19 ms | ~27.27 ms | ~142.6 ms | ~96.29 ms | ~253.6 ms | ~162.1 ms |
| 1.3.4 | ~45.70 ms | ~37.64 ms | ~153.8 ms | ~103.7 ms | ~254.3 ms | ~178.2 ms |
| 1.3.5 | ~44.33 ms | ~37.47 ms | ~153.6 ms | ~100.6 ms | ~252.6 ms | ~183.7 ms |
| 1.3.6 | ~51.10 ms | ~40.27 ms | ~151.1 ms | ~112.0 ms | ~292.5 ms | ~192.7 ms |
| 1.3.7 | ~52.58 ms | ~40.48 ms | ~158.4 ms | ~120.2 ms | ~279.4 ms | ~196.8 ms |
| 1.3.8 | ~47.11 ms | ~41.60 ms | ~160.1 ms | ~119.8 ms | ~284.2 ms | ~198.2 ms |
| 1.3.9 | ~52.27 ms | ~40.93 ms | ~162.0 ms | ~119.4 ms | ~296.0 ms | ~201.7 ms |
| 1.4.0 | ~49.41 ms | ~40.98 ms | ~157.4 ms | ~116.1 ms | ~292.3 ms | ~207.0 ms |
| 1.4.1 | ~52.24 ms | ~41.51 ms | ~152.0 ms | ~114.8 ms | ~282.7 ms | ~206.5 ms |
| 1.4.2 | ~51.86 ms | ~39.78 ms | ~158.6 ms | ~119.0 ms | ~281.2 ms | ~212.9 ms |
| 1.4.3 | ~35.09 ms | ~22.89 ms | ~131.8 ms | ~75.28 ms | ~245.5 ms | ~145.2 ms |
| 1.4.4 | ~32.75 ms | ~24.37 ms | ~134.8 ms | ~77.10 ms | ~244.3 ms | ~152.4 ms |
The useful read here is the version window, not a single cell:

- `1.1.8` ~ `1.2.7` stays in the same full-parse speed class as `1.1.7`, which matches the changelog story: `1.1.8` itself was stack-safety work for `walkTokens` plus docs, so it should not drift far from `1.1.7`
- `1.2.1` ~ `1.2.7` is still the speed sweet spot for this mixed-document full-parse run
- `1.3.0` ~ `1.3.3` lands shorthand without yet falling off a cliff on the headline benchmark
- `1.3.4+` is not a tiny wobble; it is a real constant-factor regression. On `parseStructural`, 200 KB moves from the ~26 ms class to about ~40 ms (~50% higher), and 2 MB moves from about ~160 ~ 164 ms to about ~200 ~ 213 ms (~25% ~ 33% higher)
- `1.4.0` ~ `1.4.2` is much more about stable incremental integration and clearer diff contracts than about pulling headline full-parse back into the `1.2.x` low band
- `1.4.3` is not just a partial recovery anymore. On the structural path it sets a new low for this post-`1.1.7` window: `parseStructural` 200 KB drops from ~39.78 ms to ~22.89 ms (~42% lower than `1.4.2`), 1 MB drops from ~119.0 ms to ~75.28 ms (~37% lower), and 2 MB drops from ~212.9 ms to ~145.2 ms (~32% lower)
- `1.4.4` still belongs to the fast full-parse tier that `1.4.3` pulled back down, but the shape is now more "partly sustained, partly rebounded": `parseRichText` stays very strong at 200 KB / 2 MB with a mild uptick at 1 MB, while `parseStructural` is higher than `1.4.3` at all three sizes, though still well ahead of `1.4.0` ~ `1.4.2`
onError version differences (1.1.x)
This section has moved to Version Semantics Notes.
Version memory profile (parseStructural)
Below is the structural-memory comparison across the recent parser rewrites.
This section is about the memory shape right after parse, so the tables below use
afterParse.heapUsed, not the post-GC settled value.
| Case | 1.1.4 | 1.1.5 | 1.1.6 | 1.1.7 |
|---|---|---|---|---|
| 200 KB dense inline, heapUsed after parse | 27.37 MB | 22.73 MB | 21.44 MB | 21.60 MB |
| 2 MB dense inline, heapUsed after parse | 206.96 MB | 137.83 MB | 138.54 MB | 138.51 MB |
| 20k nested inline, heapUsed after parse | 24.48 MB | 17.84 MB | 16.53 MB | 16.53 MB |
sampledPeakRss from the same run:
| Case | 1.1.4 | 1.1.5 | 1.1.6 | 1.1.7 |
|---|---|---|---|---|
| 200 KB dense inline | 104.23 MB | 92.22 MB | 97.09 MB | 96.69 MB |
| 2 MB dense inline | 372.37 MB | 290.79 MB | 294.80 MB | 295.41 MB |
| 20k nested inline | 83.82 MB | 75.59 MB | 77.48 MB | 78.53 MB |
Earlier runs that put multiple versions and cases into the same process tended to inflate
especially the 200 KB and 20k nested rows because residual objects, JIT state, and GC timing
leaked across cases. The isolated-process re-runs brought the numbers back into a more stable band.
Extended memory table (1.1.8 ~ 1.4.4)
| Version | 200 KB heapUsed | 1 MB heapUsed | 2 MB heapUsed | 20k nested heapUsed |
|---|---|---|---|---|
| 1.1.8 | 21.46 MB | 85.35 MB | 138.93 MB | 16.31 MB |
| 1.1.9 | 21.33 MB | 84.94 MB | 138.90 MB | 16.34 MB |
| 1.2.0 | 21.46 MB | 85.14 MB | 138.78 MB | 16.36 MB |
| 1.2.1 | 21.50 MB | 85.33 MB | 138.74 MB | 16.33 MB |
| 1.2.2 | 21.42 MB | 85.52 MB | 138.91 MB | 16.33 MB |
| 1.2.3 | 21.51 MB | 85.57 MB | 138.72 MB | 16.33 MB |
| 1.2.4 | 21.31 MB | 85.34 MB | 138.62 MB | 16.47 MB |
| 1.2.5 | 21.64 MB | 85.30 MB | 138.62 MB | 16.43 MB |
| 1.2.6 | 21.26 MB | 85.52 MB | 138.64 MB | 16.45 MB |
| 1.2.7 | 21.47 MB | 85.63 MB | 138.60 MB | 16.51 MB |
| 1.3.0 | 20.85 MB | 85.73 MB | 138.87 MB | 17.27 MB |
| 1.3.1 | 21.47 MB | 85.70 MB | 138.94 MB | 17.32 MB |
| 1.3.2 | 21.48 MB | 86.97 MB | 142.03 MB | 17.90 MB |
| 1.3.3 | 21.64 MB | 90.99 MB | 141.94 MB | 17.82 MB |
| 1.3.4 | 17.66 MB | 84.39 MB | 141.10 MB | 16.89 MB |
| 1.3.5 | 17.59 MB | 84.33 MB | 141.07 MB | 16.03 MB |
| 1.3.6 | 21.32 MB | 76.87 MB | 159.64 MB | 16.94 MB |
| 1.3.7 | 21.40 MB | 76.87 MB | 159.69 MB | 17.33 MB |
| 1.3.8 | 21.45 MB | 77.07 MB | 159.75 MB | 16.99 MB |
| 1.3.9 | 21.33 MB | 77.12 MB | 159.77 MB | 17.41 MB |
| 1.4.0 | 21.36 MB | 77.05 MB | 159.82 MB | 17.42 MB |
| 1.4.1 | 21.48 MB | 77.02 MB | 159.78 MB | 17.39 MB |
| 1.4.2 | 21.50 MB | 77.07 MB | 159.80 MB | 17.39 MB |
| 1.4.3 | 20.58 MB | 71.53 MB | 162.32 MB | 18.67 MB |
| 1.4.4 | 20.75 MB | 71.81 MB | 162.63 MB | 19.52 MB |
This table should be read in tiers, not as a single-column race:

- `1.1.8` ~ `1.2.7` mostly stay in the same structural-memory band as `1.1.6` / `1.1.7`
- `1.3.4` / `1.3.5` are the prettiest small-document and 20k-nested `heapUsed` values in this run
- `1.3.6+` drops 1 MB from about 85 MB to about 77 MB (~9% lower), but raises 2 MB from about 141 MB to about 160 MB (~13% higher). That strongly suggests an allocation-shape change: medium documents keep a tighter live-object window, while larger documents retain more structural objects / boundary metadata at once
- `1.4.3` improves the small and medium tiers: 200 KB drops from 21.50 MB to 20.58 MB, and 1 MB drops from 77.07 MB to 71.53 MB. But large-document and deep-nesting numbers go the other way: 2 MB rises from 159.80 MB to 162.32 MB (+1.6%), and 20k nested rises from 17.39 MB to 18.67 MB (+7.4%). So 1.4.3's memory profile is small/medium improved, large-doc and deep-nesting worse; not a uniform win
- `1.4.4` stays very close to `1.4.3` on all four rows, but the direction is still "small / medium / large documents slightly higher, and 20k nested still high". So the better read is "same tier, still sitting in the heavier retained-shape band introduced by `1.4.3`"
- so if the priority is small / medium document footprint, `1.3.4` / `1.3.5` stand out; if the priority is 2 MB-scale stability, `1.2.x` is steadier
| Version | Public parseStructural memory shape | Read |
|---|---|---|
| 1.1.1 | Highest overhead among the post-two-phase releases. Public structural parsing still built the indexed tree first, then converted to a public tree through a separate strip phase | This part of the line still had the highest public-tree peak |
| 1.1.2 | Same public-tree allocation shape as 1.1.1, but stack-safe and fully iterative | The change here is stack safety, not peak-memory reduction |
| 1.1.3 | Lower than 1.1.1 / 1.1.2 after stripMeta stopped building a whole-tree `Map<IndexedStructuralNode, StructuralNode>` | Public-tree memory starts to tighten here |
| 1.1.4 | Same structural memory class as 1.1.3 | Same tier as 1.1.3 |
| 1.1.5 | First release in this line to remove the public API's "second tree": after scanning, the public path no longer duplicates the indexed tree into a public tree | The main memory step-down lands here |
| 1.1.6 | Same single-public-tree architecture as 1.1.5, but with a tighter scan-phase allocation strategy: text buffering now accumulates by ranges/segments instead of repeated string concatenation, and raw / block child frames keep sharing source ranges instead of eagerly slicing substrings | 200 KB is lower; 2 MB and nested are not lower than 1.1.5 |
| 1.1.7 | Same architecture as 1.1.6; optimizations focus on the render layer and scan-phase constants: render-layer trimBlockBoundaryTokens no longer deep-clones the entire children array; flushBuffer uses direct string concatenation for the common 1-2 segment-pair case, avoiding temporary array allocation | 200 KB is lower; 2 MB and nested are not lower than 1.1.5 |
Position tracking overhead
Note: The version used for this data was not recorded. Treat these numbers as order-of-magnitude reference only. Future reruns should tag the version and align with the extended full-parse table.
Measured on ~200 KB input (204,840 bytes), 20 samples per case.
| API | Without tracking | With tracking | Overhead |
|---|---|---|---|
| parseRichText | ~22.45 ms | ~34.07 ms | ~51.8% |
| parseStructural | ~14.88 ms | ~18.49 ms | ~24.3% |
This run says three practical things:

- `trackPositions` is still well within normal editor-budget territory, but it is not free
- the visible overhead is much higher on `parseRichText` than on `parseStructural`
- if a pipeline needs tighter budgeting, `parseRichText + trackPositions` is the first place to inspect
Substring parse: baseOffset / tracker
Note: The version used for this data was not recorded. Treat these numbers as order-of-magnitude reference only.
This run uses a fixed 53-character slice, matching the substring scenario from Source Position Tracking. 800 samples per case.
| API | baseOffset | tracker | Read |
|---|---|---|---|
| parseRichText slice | ~23.78 µs | ~20.62 µs | same performance class |
| parseStructural slice | ~14.26 µs | ~13.47 µs | same performance class |
The measured read is straightforward:

- both `baseOffset` and `tracker` stay in the tens-of-microseconds range
- `tracker` does not turn substring parsing into a millisecond-scale path
- `parseStructural` remains lighter than `parseRichText` on substring work
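As a reminder of what the `baseOffset` route amounts to (a hedged sketch; the real option names and call shapes are documented on Source Position Tracking): positions reported for a parsed slice are mapped back to full-document coordinates by a constant shift.

```javascript
// Illustrative only: remap a position reported inside a parsed slice
// back into full-document coordinates by adding the slice's offset.
function toDocumentOffset(baseOffset, posInSlice) {
  return baseOffset + posInSlice;
}

const doc = 'prefix $$bold(x)$$ suffix';
const sliceStart = doc.indexOf('$$');                 // 7
const slice = doc.slice(sliceStart, sliceStart + 11); // "$$bold(x)$$"
// a position 2 characters into the slice is position 9 in the document
```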
Incremental Parsing
For incremental structural caching (avoid full parseStructural scans across edits), see
Incremental Parsing.
The old micro-benchmark ("edit one 36-character tag in the middle of a ~200 KB document") is no longer the right lead-in for this page. The main incremental section now uses a shape much closer to real editor work:
- Baseline document: ~1 MB
- Single edit size: about 5%
- Edit classes: inline / deep-inline / raw / block
- Methodology: one case per process, `--expose-gc`, forced GC before and after each measured round, 20 measured rounds per case
Incremental grouping overview
| Group | Versions | How to read it |
|---|---|---|
| First public incremental surface | 1.2.0 | Still low-level parseIncremental / updateIncremental; usable, but clearly slower than the later session-first line |
| Bare incremental speed advantage | 1.2.4 / 1.2.5 / 1.2.6 / 1.2.7 / 1.3.0 / 1.4.3 / 1.4.4 | 1.2.4 ~ 1.3.0 is the most stable low-20-ms band; 1.4.3 / 1.4.4 are the late-line recovery window that pulls bare applyEdit back down sharply again |
| Deep-inline speed advantage | 1.2.6 / 1.3.0 / 1.3.4 / 1.3.5 | 1.2.6 / 1.3.0 are the lowest points; 1.3.4 / 1.3.5 represent the post-shorthand recovery window |
| Raw / block speed advantage | 1.3.6 / 1.3.7 / 1.4.0 / 1.4.1 / 1.4.3 | After context-aware escaping and more stable boundary handling, raw / block become the easiest late-line bare-update paths to keep fast; 1.4.3 pushes both even lower again |
| Diff-cost advantage | 1.4.0 / 1.4.1 | These are still the steadiest structured-diff releases: much cheaper than 1.3.8 / 1.3.9, and more balanced overall than 1.4.2 / 1.4.3 |
| Retained-heap advantage | 1.2.1 ~ 1.3.7, especially 1.2.4 ~ 1.3.7 | Build-time retained heap stays roughly in the 29.5 ~ 30.7 MB band; inline-edit retained heap is especially good in the 1.2.4 ~ 1.3.7 window |
| Best integration picks | 1.4.0 / 1.4.1 | If you care about stable session semantics, consumable diff output, and not pushing retained heap or diff cost further to the worse side, these are still the easiest versions to recommend |
Incremental benchmark table (1 MB baseline, ~5% edits)
| Version | Engine | inline | deep-inline | raw | block | Notes |
|---|---|---|---|---|---|---|
| 1.2.0 | low-level | ~39.60 ms | ~38.87 ms | ~33.40 ms | ~37.57 ms | parseIncremental / updateIncremental |
| 1.2.1 | session | ~41.16 ms | ~40.32 ms | ~36.65 ms | ~40.95 ms | first createIncrementalSession(...) release |
| 1.2.2 | session | ~41.69 ms | ~41.66 ms | ~49.66 ms | ~39.48 ms | still early-session behavior |
| 1.2.3 | session | ~46.14 ms | ~47.97 ms | ~44.50 ms | ~48.09 ms | pre-zone-splitting speed band |
| 1.2.4 | session | ~22.88 ms | ~26.54 ms | ~22.87 ms | ~28.83 ms | first obvious speed step after softZoneNodeCap / internal zone splitting |
| 1.2.5 | session | ~22.58 ms | ~26.56 ms | ~22.61 ms | ~29.54 ms | same class as 1.2.4 |
| 1.2.6 | session | ~21.70 ms | ~20.88 ms | ~22.86 ms | ~28.31 ms | one of the nicest deep-inline results; matches the stack-safe signature-path fixes |
| 1.2.7 | session | ~22.31 ms | ~26.67 ms | ~22.88 ms | ~29.32 ms | still in the low-20-ms band |
| 1.3.0 | session | ~22.25 ms | ~20.61 ms | ~22.76 ms | ~28.88 ms | shorthand lands without blowing up the main incremental band |
| 1.3.1 | session | ~22.81 ms | ~37.11 ms | ~22.14 ms | ~29.04 ms | deep-inline gets noticeably heavier after shorthand boundary / depth-limit fixes |
| 1.3.2 | session | ~22.36 ms | ~36.79 ms | ~22.09 ms | ~28.91 ms | deep-inline still high |
| 1.3.3 | session | ~21.91 ms | ~38.00 ms | ~22.20 ms | ~29.64 ms | deep-inline still high |
| 1.3.4 | session | ~26.85 ms | ~28.19 ms | ~21.83 ms | ~28.08 ms | inline gets heavier, but deep-inline drops back down |
| 1.3.5 | session | ~27.08 ms | ~28.17 ms | ~21.75 ms | ~28.93 ms | same class as 1.3.4 |
| 1.3.6 | session | ~29.71 ms | ~28.77 ms | ~15.86 ms | ~32.25 ms | raw path drops sharply; aligns with context-aware escaping |
| 1.3.7 | session | ~29.80 ms | ~28.99 ms | ~21.69 ms | ~32.28 ms | same general band after the endTag prefix-consumption fix |
| 1.3.8 | session | ~26.37 ms | ~23.10 ms | ~15.90 ms | ~32.55 ms | deep-inline drops again after structured diff work |
| 1.3.9 | session | ~26.65 ms | ~22.70 ms | ~15.90 ms | ~26.39 ms | steady low band after isNoop and conservative-diff contract clarification |
| 1.4.0 | session | ~26.48 ms | ~22.27 ms | ~16.01 ms | ~26.14 ms | speed holds after the stable-integration surface is declared |
| 1.4.1 | session | ~27.13 ms | ~22.47 ms | ~16.00 ms | ~25.62 ms | corrupted deep-inline edits no longer hang; normal edits stay in band |
| 1.4.2 | session | ~29.61 ms | ~22.63 ms | ~21.57 ms | ~25.81 ms | semantics and type cleanup matter more than chasing a new low |
| 1.4.3 | session | ~17.96 ms | ~29.81 ms | ~15.76 ms | ~17.73 ms | inline / raw / block drop sharply again, but deep-inline is still above the 1.4.0 ~ 1.4.2 band |
| 1.4.4 | session | ~17.72 ms | ~29.59 ms | ~15.90 ms | ~17.98 ms | broadly keeps the 1.4.3 bare-incremental tier; inline is slightly better, deep-inline is a touch lower, but raw / block are slightly higher |
How to read the incremental table
- `1.2.0` ~ `1.2.3` is the first public incremental surface: usable, but still a full band slower than the later line
- `1.2.4` is the first real performance inflection point on this page: the changelog's pure-inline zone splitting immediately shows up as low-20-ms 1 MB / 5% edits
- `1.2.6` / `1.3.0` are the easiest all-around recommendations in this run: inline, deep-inline, and raw are all steady
- `1.3.1` ~ `1.3.3` clearly raise deep-inline cost; that lines up with the shorthand-boundary, parent-close ownership, and ambiguity-guard fixes, where correctness is taking priority
- `1.3.6+` makes raw edits noticeably faster, in the same direction as the changelog's context-aware escaping and raw / block boundary stabilization
- `1.4.2` raises raw `applyEdit` itself from ~16.00 ms to ~21.57 ms (~35% higher), so that cleanup window was not free
- `1.4.3` is mostly a success on bare `applyEdit`: inline returns to ~17.96 ms, raw to ~15.76 ms, and block to ~17.73 ms; but deep-inline still sits at ~29.81 ms, clearly above the low-20-ms `1.4.0` ~ `1.4.2` band
- `1.4.4` stays in the same bare-incremental tier as `1.4.3`: inline is a little better and deep-inline only edges down slightly, but raw / block are slightly higher. So this is better read as "same band" rather than a fresh across-the-board drop
Shared methodology for this incremental extension
- Baseline document: about 1 MB
- Each scenario keeps the single edit at about 5%
- inline: edit normal inline-tag content
- deep-inline: edit deeply nested inline content
- raw: edit raw-tag body content
- block: edit block-tag body content
- `1.2.0` uses low-level `parseIncremental` / `updateIncremental`
- `1.2.1+` uses `createIncrementalSession(...).applyEdit(...)`
- Every measured version stayed at 20/20 expected-mode hits in this benchmark, so there was no extra fallback note to preserve in the table
Incremental diff cost (1.3.8+)
From 1.3.8 onward, the session API also has the applyEditWithDiff(...) path. For editor
integrations, this is usually closer to the real end-to-end cost than bare applyEdit(...).
| Version | inline applyEdit | inline applyEditWithDiff | Ratio | deep-inline applyEdit | deep-inline applyEditWithDiff | Ratio | raw applyEdit | raw applyEditWithDiff | Ratio | block applyEdit | block applyEditWithDiff | Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.3.8 | ~26.37 ms | ~35.17 ms | 1.33x | ~23.10 ms | ~193.7 ms | 8.39x | ~15.90 ms | ~87.61 ms | 5.51x | ~32.55 ms | ~101.0 ms | 3.10x |
| 1.3.9 | ~26.65 ms | ~35.88 ms | 1.35x | ~22.70 ms | ~195.2 ms | 8.60x | ~15.90 ms | ~92.27 ms | 5.80x | ~26.39 ms | ~101.5 ms | 3.85x |
| 1.4.0 | ~26.48 ms | ~39.75 ms | 1.50x | ~22.27 ms | ~63.63 ms | 2.86x | ~16.01 ms | ~52.93 ms | 3.31x | ~26.14 ms | ~66.18 ms | 2.53x |
| 1.4.1 | ~27.13 ms | ~41.12 ms | 1.52x | ~22.47 ms | ~61.68 ms | 2.75x | ~16.00 ms | ~51.30 ms | 3.21x | ~25.62 ms | ~62.94 ms | 2.46x |
| 1.4.2 | ~29.61 ms | ~41.44 ms | 1.40x | ~22.63 ms | ~79.89 ms | 3.53x | ~21.57 ms | ~51.52 ms | 2.39x | ~25.81 ms | ~64.54 ms | 2.50x |
| 1.4.3 | ~17.96 ms | ~38.40 ms | 2.14x | ~29.81 ms | ~85.96 ms | 2.88x | ~15.76 ms | ~76.20 ms | 4.84x | ~17.73 ms | ~79.69 ms | 4.50x |
| 1.4.4 | ~17.72 ms | ~40.55 ms | 2.29x | ~29.59 ms | ~87.81 ms | 2.97x | ~15.90 ms | ~77.36 ms | 4.86x | ~17.98 ms | ~82.30 ms | 4.58x |
The key read here is not just "diff is slower" (that much is expected) but that 1.4.0 pulls diff back into a much more integration-friendly band:

- `1.3.8` / `1.3.9`: diff is heavy on deep-inline, raw, and block, with deep-inline close to 8.4x ~ 8.6x
- `1.4.0+`: deep-inline is still the most expensive case, but it drops from nearly 200 ms into the 60 ~ 80 ms band; raw / block also fall from 80 ~ 100 ms into about 50 ~ 66 ms
- inline stays comparatively stable the whole time, which strongly suggests the issue is not "diff is always explosive", but "some structural cases refine far too deeply"
- `1.4.3` also improves inline diff (~41.44 ms down to ~38.40 ms), but deep-inline / raw / block are still heavy: raw sits at ~76.20 ms, block at ~79.69 ms, and deep-inline at ~85.96 ms. So this current workspace is much more of a full-parse + bare-incremental optimization than a broad diff optimization
- `1.4.4` does not keep the previous "a little lower again" story in this rerun: inline diff is back at ~40.55 ms, and raw / block are also slightly higher than `1.4.3`; deep-inline remains in the same expensive band too, so this is still not a broad diff-cost reset
One more qualifier matters here: the current raw benchmark is effectively a high-repetition worst-case.
The script repeats many same-shape $$code(...)%...%end$$ units and then replaces a middle slice, which is
deliberately unfriendly to unique diff anchors. A separate quick bench with unique raw units (about 1 MB
source, 5% edit) lands closer to ~45 ms for raw applyEditWithDiff, with a dirty span much closer to
the edited window itself instead of nearly the whole document. So the ~76 ms row here is better read as a
pressure-case upper bound for repetitive input, not as the typical cost of everyday non-repetitive raw edits.
That lines up with the changelog:

- `1.3.8` upgrades `applyEditWithDiff(...)` from a basic range diff to structured ops / patches, increasing expressive power and cost at the same time
- `1.3.9` adds `isNoop` and clarifies conservative-diff semantics, but does not fundamentally reduce heavy-case cost
- `1.4.0` adds global refinement budgets (comparisons / anchors / ops / subtree size / wall-clock), so when refinement is not worth it the code drops to coarse splice or conservative whole-tree diff much earlier
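The budget idea can be illustrated generically. This is not the library's implementation, just a toy trim-from-both-ends diff that gives up and emits a single coarse replace once its comparison budget is spent, which is the general shape of "refine only while it is still worth it":

```javascript
// Toy refinement budget: localize a change by trimming matching items
// from both ends, but once the comparison budget is exhausted, fall
// back to one coarse whole-range replace instead of refining further.
function budgetedDiff(oldItems, newItems, maxComparisons = 1000) {
  const coarse = () =>
    [{ op: 'replace', start: 0, end: oldItems.length, items: newItems }];
  let comparisons = 0;
  let start = 0;
  while (start < oldItems.length && start < newItems.length) {
    if (++comparisons > maxComparisons) return coarse(); // budget spent
    if (oldItems[start] !== newItems[start]) break;
    start++;
  }
  let oldEnd = oldItems.length;
  let newEnd = newItems.length;
  while (oldEnd > start && newEnd > start) {
    if (++comparisons > maxComparisons) return coarse(); // budget spent
    if (oldItems[oldEnd - 1] !== newItems[newEnd - 1]) break;
    oldEnd--; newEnd--;
  }
  if (start === oldEnd && start === newEnd) return []; // no-op edit
  return [{ op: 'replace', start, end: oldEnd, items: newItems.slice(start, newEnd) }];
}
```

The real budgets are richer (anchors, op counts, subtree size, wall-clock), but the trade-off is the same: a cheaper, coarser diff beats an expensive, perfectly minimal one.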
Diff result-shape summary
This helps explain why 1.4.0+ gets much faster.
| Version | inline | deep-inline | raw | block |
|---|---|---|---|---|
| 1.3.8 | incremental=20; ops~1064, patches~1064 | incremental=20; ops~98, patches~98 | incremental=20; ops~625, patches~625 | incremental=20; ops~610, patches~610 |
| 1.3.9 | incremental=20; ops~1064, patches~1064 | incremental=20; ops~98, patches~98 | incremental=20; ops~625, patches~625 | incremental=20; ops~610, patches~610 |
| 1.4.0 | incremental=20; ops~1, patches~1 | incremental=20; ops~98, patches~98 | incremental=20; ops~1, patches~1 | incremental=20; ops~1, patches~1 |
| 1.4.1 | incremental=20; ops~1, patches~1 | incremental=20; ops~98, patches~98 | incremental=20; ops~1, patches~1 | incremental=20; ops~1, patches~1 |
| 1.4.2 | incremental=20; ops~1, patches~1 | incremental=20; ops~98, patches~98 | incremental=20; ops~1, patches~1 | incremental=20; ops~1, patches~1 |
| 1.4.3 | incremental=20; ops~1, patches~1 | incremental=20; ops~98, patches~98 | incremental=20; ops~1, patches~1 | incremental=20; ops~1, patches~1 |
| 1.4.4 | incremental=20; ops~1, patches~1 | incremental=20; ops~98, patches~98 | incremental=20; ops~1, patches~1 | incremental=20; ops~1, patches~1 |
This strongly suggests:

- `1.3.8` / `1.3.9` still emit large, fine-grained diffs for many scenarios
- `1.4.0+` is much more willing to collapse inline / raw / block into very small diffs when the refinement budget says further work is not worth it
- deep-inline still keeps many ops, which is why it remains the most expensive diff case across `1.4.0` ~ `1.4.4`
- `1.4.3` shows again that a very small diff shape does not automatically mean low diff cost: inline gets the win this time, but raw / block / deep-inline still stay expensive
So the version recommendation changes a bit depending on what your integration actually consumes:

- If you only care about incremental update cost itself: `1.2.4` ~ `1.3.0` still looks great, and `1.4.3` / `1.4.4` are now the stronger bare-incremental options inside the `1.4.0` ~ `1.4.4` window
- If you care about incremental update + structured diff cost: `1.4.0` / `1.4.1` are still the steadiest picks inside the `1.4.0` ~ `1.4.4` window; after this rerun, `1.4.4` is not prettier than `1.4.3` here: inline / raw / block diff are all a bit higher, and deep-inline is still expensive
Incremental snapshot memory (1.2.0+)
This benchmark answers another practical question: once the incremental snapshot is actually built
and tree / zones are materialized, what does the retained heap look like?
This is not a transient "right after parse" peak. The measurement flow is:

- build the incremental snapshot
- deliberately touch `doc.tree` / `doc.zones` so lazy getters do not under-report memory
- force GC
- then read the still-retained `heapUsed`
That makes it much closer to the memory shape of a real editor holding an incremental document in memory.
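The retained-heap read itself is plain Node and can be sketched directly (the snapshot-building and `doc.tree` / `doc.zones` touches happen before this call; `global.gc` needs `node --expose-gc`, so the sketch guards it):

```javascript
// Read the still-retained heap after a forced collection. Without
// --expose-gc the guard skips the GC and the reading is less settled.
function retainedHeapMB() {
  if (typeof global.gc === 'function') global.gc();
  return process.memoryUsage().heapUsed / (1024 * 1024);
}

// flow above: build the snapshot, touch doc.tree / doc.zones,
// then record retainedHeapMB() as the retained-heap value.
```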
| Version | inline retained heap after build | deep-inline retained heap after build | raw retained heap after build | block retained heap after build |
|---|---|---|---|---|
| 1.2.0 | 28.74 MB | 29.68 MB | 28.57 MB | 28.80 MB |
| 1.2.1 | 29.76 MB | 30.69 MB | 29.58 MB | 29.80 MB |
| 1.2.2 | 29.78 MB | 30.70 MB | 29.60 MB | 29.81 MB |
| 1.2.3 | 30.39 MB | 31.31 MB | 30.20 MB | 30.43 MB |
| 1.2.4 | 29.70 MB | 30.64 MB | 29.53 MB | 29.76 MB |
| 1.2.5 | 29.70 MB | 30.64 MB | 29.53 MB | 29.76 MB |
| 1.2.6 | 29.72 MB | 30.65 MB | 29.59 MB | 29.77 MB |
| 1.2.7 | 29.72 MB | 30.66 MB | 29.59 MB | 29.78 MB |
| 1.3.0 | 29.68 MB | 30.62 MB | 29.51 MB | 29.73 MB |
| 1.3.1 | 29.68 MB | 30.62 MB | 29.51 MB | 29.74 MB |
| 1.3.2 | 29.69 MB | 30.62 MB | 29.51 MB | 29.74 MB |
| 1.3.3 | 29.69 MB | 30.62 MB | 29.51 MB | 29.74 MB |
| 1.3.4 | 29.72 MB | 30.66 MB | 29.55 MB | 29.78 MB |
| 1.3.5 | 29.73 MB | 30.66 MB | 29.55 MB | 29.78 MB |
| 1.3.6 | 29.75 MB | 30.68 MB | 29.57 MB | 29.79 MB |
| 1.3.7 | 29.74 MB | 30.68 MB | 29.57 MB | 29.80 MB |
| 1.3.8 | 31.13 MB | 32.05 MB | 30.95 MB | 31.18 MB |
| 1.3.9 | 31.13 MB | 32.06 MB | 30.95 MB | 31.18 MB |
| 1.4.0 | 31.18 MB | 32.10 MB | 31.00 MB | 31.23 MB |
| 1.4.1 | 31.18 MB | 32.10 MB | 31.00 MB | 31.23 MB |
| 1.4.2 | 31.18 MB | 32.10 MB | 31.00 MB | 31.23 MB |
| 1.4.3 | 31.19 MB | 32.11 MB | 31.01 MB | 31.24 MB |
| 1.4.4 | 31.19 MB | 32.12 MB | 31.01 MB | 31.24 MB |
There are two clear tiers here:

- `1.2.1` ~ `1.3.7`: retained heap after build is basically stable in the 29.5 ~ 30.7 MB band
- `1.3.8+`: the whole line shifts upward by about 1.3 ~ 1.5 MB
- `1.4.3` / `1.4.4`: these constant-factor iterations barely change incremental retained heap at all; they stay in the same heavier retained-heap tier as the rest of the `1.4.x` line
That matches the diff story above: from 1.3.8 onward, the line is not just paying more in diff
time; the incremental snapshot itself also gets heavier in order to support richer structured diff
behavior.
Retained heap after one edit
Now the same question after actually applying one ~5% edit and keeping the updated document alive.
| Version | inline retained heap after edit | deep-inline retained heap after edit | raw retained heap after edit | block retained heap after edit |
|---|---|---|---|---|
| 1.2.0 | 38.60 MB | 40.39 MB | 38.28 MB | 38.71 MB |
| 1.2.1 | 39.61 MB | 41.90 MB | 39.78 MB | 40.22 MB |
| 1.2.2 | 40.16 MB | 41.94 MB | 39.83 MB | 40.26 MB |
| 1.2.3 | 40.76 MB | 43.00 MB | 40.44 MB | 40.87 MB |
| 1.2.4 | 33.04 MB | 44.00 MB | 41.98 MB | 42.35 MB |
| 1.2.5 | 33.05 MB | 44.00 MB | 41.92 MB | 42.36 MB |
| 1.2.6 | 33.04 MB | 44.00 MB | 41.92 MB | 42.36 MB |
| 1.2.7 | 33.05 MB | 44.01 MB | 41.93 MB | 42.37 MB |
| 1.3.0 | 33.01 MB | 43.97 MB | 41.88 MB | 42.32 MB |
| 1.3.1 | 33.01 MB | 43.97 MB | 41.90 MB | 42.32 MB |
| 1.3.2 | 33.01 MB | 43.97 MB | 41.88 MB | 42.33 MB |
| 1.3.3 | 33.01 MB | 43.97 MB | 41.90 MB | 42.33 MB |
| 1.3.4 | 33.00 MB | 44.03 MB | 41.93 MB | 42.37 MB |
| 1.3.5 | 33.01 MB | 44.03 MB | 41.94 MB | 42.37 MB |
| 1.3.6 | 33.02 MB | 44.05 MB | 41.96 MB | 42.40 MB |
| 1.3.7 | 33.02 MB | 44.05 MB | 41.96 MB | 42.41 MB |
| 1.3.8 | 34.41 MB | 45.42 MB | 43.34 MB | 43.79 MB |
| 1.3.9 | 34.41 MB | 45.43 MB | 43.34 MB | 43.78 MB |
| 1.4.0 | 34.46 MB | 45.47 MB | 43.39 MB | 43.83 MB |
| 1.4.1 | 34.46 MB | 45.48 MB | 43.39 MB | 43.83 MB |
| 1.4.2 | 34.46 MB | 45.48 MB | 43.41 MB | 43.85 MB |
| 1.4.3 | 34.48 MB | 45.49 MB | 43.42 MB | 43.87 MB |
| 1.4.4 | 34.48 MB | 45.49 MB | 43.42 MB | 43.87 MB |
This table is even more revealing:
- inline edits drop sharply at `1.2.4` and stay around 33 MB through `1.3.7`
- deep-inline / raw / block do not get the same drop, which strongly suggests the pure-inline zone-splitting work mostly improved the retained shape of inline-style edits
- `1.3.8+` lifts all four scenarios together by roughly 1.3 ~ 1.5 MB
So if you read incremental speed + structured diff + retained memory together, the line becomes clearer:
- `1.2.4` ~ `1.3.7`: fast bare incremental updates, and very nice retained heap for inline edits
- `1.3.8` / `1.3.9`: richer diff output, but both latency and retained heap step upward
- `1.4.0+`: diff latency comes back down a lot, but retained heap stays in the heavier `1.3.8+` memory tier
- `1.4.3` / `1.4.4`: these late iterations are almost entirely time-constant optimizations, not retained-heap optimizations
If the only question is "is holding an incremental snapshot in memory prohibitively expensive?", the
answer is still fairly calm: on a 1 MB baseline, most versions sit around 30 MB after build, and
around 33 ~ 45 MB after one retained edit, depending on scenario.
Pathological Deep-Nesting Stress
Input shape: a single-chain inline nest, $$bold($$bold(...x...)$$)$$.
Each table row below uses the same shape with the listed layer count (5,000 / 20,000 / 200,000 / 1,000,000 /
10,000,000 / 20,000,000 / 30,000,000 / 40,000,000 / 50,000,000).
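The single-chain input can be generated without O(n²) string rebuilding by repeating the opener and closer once each. A sketch, using the `$$bold(...)$$` shape described above (`buildDeepNest` is an illustrative helper name):

```typescript
// Build the single-chain inline nest used in this section:
// $$bold($$bold(...x...)$$)$$ with `layers` levels around a single "x".
// repeat() + concat keeps generation linear even at tens of millions of layers.
function buildDeepNest(layers: number): string {
  const open = "$$bold(";  // 7 chars per layer
  const close = ")$$";     // 3 chars per layer
  return open.repeat(layers) + "x" + close.repeat(layers);
}

console.log(buildDeepNest(2)); // $$bold($$bold(x)$$)$$
```

Each layer contributes 10 characters, so the 50,000,000-layer case is a ~500 MB input string before parsing even starts, which is part of why the high-layer rows need expanded heap budgets.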
Measured on:
- historical columns: Kunpeng 920 aarch64 / Node v24.14.0
- `1.4.3 current` reruns: Kunpeng 920 aarch64 / Node v24.15.0
depthLimit is set to each layer count + 100 (no degradation). Large-scale high-layer runs in this
section use an expanded heap budget; exact memory notes and run conditions are listed below the table.
| API | 1.1.0 | 1.1.1 | 1.1.2-1.1.4 | 1.1.5 | 1.1.6 | 1.4.3 current | 1.4.4 current |
|---|---|---|---|---|---|---|---|
| `parseStructural(5000)` | Stack overflow | ~7119 ms | ~19 ms | ~33.73 ms | ~36.30 ms | ~20.73 ms | Same class, not re-measured separately |
| `parseRichText(5000)` | ~9731 ms | ~17216 ms | ~23 ms | ~41.12 ms | ~34.58 ms | ~29.05 ms | ~35.50 ms (mean of 3 reruns) |
| `parseStructural(20000)` | – | – | ~61 ms | ~44.79 ms | ~37.89 ms | ~50.95 ms | Same class, not re-measured separately |
| `parseRichText(20000)` | – | – | ~61 ms | ~134.35 ms | ~53.58 ms | ~87.53 ms | ~125.85 ms (mean of 3 reruns) |
| `parseStructural(200000)` | – | – | ~667 ms | ~361.20 ms | ~421.45 ms | ~457.05 ms | Same class, not re-measured separately |
| `parseRichText(200000)` | – | – | ~844 ms | ~1096.81 ms | ~1209.64 ms | ~1.15 s | ~1.13 s (mean of 3 reruns) |
| `parseStructural(1000000)` | – | – | ~2.8 s | ~2.39 s | ~2.50 s | ~2.39 s | Same class, not re-measured separately |
| `parseRichText(1000000)` | – | – | ~6.4 s | ~6.20 s | ~6.00 s | ~5.46 s | ~5.49 s (mean of 3 reruns) |
| `parseStructural(10000000)` | – | – | ~28.6 s | ~17.3 s (separately re-measured) | ~18.7 s (separately re-measured) | ~21.15 s (separately re-measured) | Same class, not re-measured separately |
| `parseRichText(10000000)` | – | – | ~50 s | Same class, not re-measured separately | Same class, not re-measured separately | ~64.65 s (mean of 3 reruns) | ~65.50 s (mean of 3 reruns) |
| `parseStructural(20000000)` | – | – | ~89.9 s (`--max-old-space-size=12288`) | Same class, not re-measured separately | Same class, not re-measured separately | Same class, not re-measured separately | Same class, not re-measured separately |
| `parseStructural(30000000)` | – | – | ~163.1 s (`--max-old-space-size=18432`) | Same class, not re-measured separately | Same class, not re-measured separately | Same class, not re-measured separately | Same class, not re-measured separately |
| `parseStructural(40000000)` | – | – | ~210.0 s (`--max-old-space-size=24576`) | Same class, not re-measured separately | Same class, not re-measured separately | Same class, not re-measured separately | Same class, not re-measured separately |
| `parseStructural(50000000)` | – | – | ~224.1 s (`--max-old-space-size=32768`) | Same class, not re-measured separately | Same class, not re-measured separately | Same class, not re-measured separately | Same class, not re-measured separately |
Note: 1.1.5 / 1.1.6 have been separately re-measured through 1000000, and
`parseStructural(10000000)` has also been measured separately. For 1.4.3, this update separately
re-measured every row at ≤ 10000000. For 1.4.4, this follow-up reran only
`parseRichText(5000 / 20000 / 200000 / 1000000 / 10000000)`; the remaining 1.4.4 rows, and all
20000000+ rows, are best read as the same fully iterative 1.1.2+ class without page-specific reruns.
The 1.4.3 reread is best split into two bands:
- `5000 ~ 1000000`: the current workspace still sits in the "deep chains are safe and the small-to-mid constant is still decent" zone. `parseStructural(5000)` is already close to the `1.1.2 ~ 1.1.4` band again, and `parseRichText(1000000)` lands around ~5.46 s
- `10000000`: `1.4.3` / `1.4.4` still finish under the fully iterative path, but the constant is no longer among the best historical points on this line. `parseStructural(10000000)` is about ~21.15 s, somewhat slower than the dedicated `1.1.5` / `1.1.6` reruns; `1.4.3` `parseRichText(10000000)` landed at 64.12 s / 64.87 s / 64.96 s (mean ~64.65 s), while `1.4.4` landed at 68.00 s / 62.24 s / 66.25 s (mean ~65.50 s). So this is not a one-off spike but a stable heavier path under the current semantics
Note: the public `parseStructural(20000000)` path now completes after `stripMeta` was rewritten to
build the public tree directly without the intermediate `Map<IndexedStructuralNode, StructuralNode>`.
The 20 M / 30 M / 40 M / 50 M runs above used `NODE_OPTIONS=--max-old-space-size=12288 / 18432 / 24576 / 32768`
respectively, and finished at roughly:
- 20 M: 10.7 GB heap / 11.9 GB RSS
- 30 M: 16.1 GB heap / 17.0 GB RSS
- 40 M: 21.4 GB heap / 22.7 GB RSS
- 50 M: 26.8 GB heap / 28.0 GB RSS

The dedicated 10 M reruns used their own heap budgets:
- `1.4.3` `parseStructural(10 M)`: this rerun used `--max-old-space-size=12288`, ending around 5.59 GB heap / 5.83 GB RSS
- `1.4.3` `parseRichText(10 M)`: this rerun used `--max-old-space-size=13312`, ending around 7.65 GB heap / 9.86 GB RSS; the three reruns stayed in essentially the same RSS / heap band
- `1.4.4` `parseRichText(10 M)`: this rerun also used `--max-old-space-size=13312`, ending around 7.85 GB heap / 9.87 GB RSS on the 3-run mean
Under Node's default heap limit, high-layer public-tree runs may still OOM depending on machine limits.
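A heap-budgeted run like the ones above can also be launched programmatically rather than via `NODE_OPTIONS` in the shell. A minimal sketch; the inline `-e` script is a trivial stand-in for the real deep-nest benchmark:

```typescript
import { spawnSync } from "node:child_process";

// Launch a child Node process with an expanded old-space budget,
// mirroring the --max-old-space-size runs above.
const result = spawnSync(
  process.execPath,
  [
    "--max-old-space-size=12288", // 12 GB old-space ceiling
    "-e",
    // Stand-in workload; the real run would load and drive the parser.
    "console.log(process.memoryUsage().rss > 0)",
  ],
  { encoding: "utf8" }
);

console.log(result.stdout.trim()); // true
```

Running the workload in a child process also keeps each measurement isolated, matching the one-case-per-process methodology described at the top of the page.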
Stack overflow thresholds per version (same environment, binary search, ±50 layers):
| API | 1.1.0 | 1.1.1 | 1.1.2+ |
|---|---|---|---|
| `parseStructural` | ~3172 layers | ~1611 layers | No limit (fully iterative) |
| `parseRichText` | ~3269 layers | ~3221 layers | No limit (fully iterative) |
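The ±50-layer thresholds above were found by binary search over the nesting depth. A sketch of that search, assuming a plain recursive probe in place of the pre-1.1.2 recursive parser (`survives` and `overflowThreshold` are illustrative names):

```typescript
// Probe whether a recursive routine survives a given depth. Plain
// recursion stands in for the old recursive parse path; a blown call
// stack surfaces as a catchable RangeError in Node.
function survives(depth: number): boolean {
  const recurse = (n: number): number => (n === 0 ? 0 : 1 + recurse(n - 1));
  try {
    recurse(depth);
    return true;
  } catch (e) {
    if (e instanceof RangeError) return false; // call stack exceeded
    throw e;
  }
}

// Binary-search the first failing depth to within `tol` layers.
// Invariant: `lo` survives, `hi` overflows.
function overflowThreshold(lo: number, hi: number, tol = 50): number {
  while (hi - lo > tol) {
    const mid = Math.floor((lo + hi) / 2);
    if (survives(mid)) lo = mid;
    else hi = mid;
  }
  return hi;
}

const limit = overflowThreshold(1, 1_000_000);
console.log(`stack overflow near ~${limit} frames`);
```

The exact number depends on frame size and stack configuration (`--stack-size`), which is why the page reports per-version thresholds rather than a single constant.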
1.1.2 eliminated three independent deep-nesting bottlenecks:
- Stack overflow → `parseNodes`, `renderNodes`, `stripMeta`, `extractText`, and `materializeTextTokens` converted from recursion to explicit stack iteration. Nesting depth is now bounded only by heap memory
- `materializeTextTokens` O(n²) re-traversal (render layer) → each handler invocation recursed into the full subtree. A `materializedArrays` WeakSet now marks processed subtrees so subsequent calls skip them → O(n)
- `findInlineClose` / `findTagArgClose` O(n²) forward scan (structural layer) → each inline nesting level scanned forward for the matching close. Now uses lazy close: inline child frames continue scanning on the parent's text and complete when they encounter `)$$`; gating-confirmed inline-only tags skip `getTagCloserType` entirely → O(n)
5000-layer `parseRichText`: 1.1.1 ~17 s → 1.1.2 ~23 ms (~740x faster).
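The recursion-to-explicit-stack rewrite behind the first bullet can be sketched as follows. The `Node` shape and `extractTextIterative` name are illustrative, not yumeDSL's real types:

```typescript
// Sketch of converting a recursive tree walk to explicit stack iteration.
// Deep chains push frames onto a heap-allocated array instead of the
// call stack, so depth is bounded only by heap memory.
interface Node {
  text: string;
  children: Node[];
}

function extractTextIterative(root: Node): string {
  const out: string[] = [];
  const stack: Node[] = [root];
  while (stack.length > 0) {
    const node = stack.pop()!;
    out.push(node.text);
    // Push children in reverse so they are visited left-to-right.
    for (let i = node.children.length - 1; i >= 0; i--) {
      stack.push(node.children[i]);
    }
  }
  return out.join("");
}

// A 100k-deep single chain would overflow a naive recursive version.
let chain: Node = { text: "x", children: [] };
for (let i = 0; i < 100_000; i++) chain = { text: "", children: [chain] };
console.log(extractTextIterative(chain)); // x
```

The same transformation applies to any of the listed walkers; only the per-node action differs.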
From 1.1.2 onward, this version line has effectively crossed a generation boundary on stack
safety:
- `1.1.0` / `1.1.1` still had to worry about call-stack limits
- `1.1.2+` mostly turns the problem into a heap-budget question instead
So this section should be read as a long-term note about stack safety and worst-case-path complexity, not as a day-to-day patch-by-patch benchmark that needs constant refreshing.
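The WeakSet marking described in the second bottleneck bullet can be pictured as below. Names (`Item`, `materialize`) are illustrative stand-ins, not yumeDSL's API:

```typescript
// Sketch of WeakSet-based de-duplication: mark each subtree's array the
// first time it is processed so repeated handler calls skip it in O(1)
// instead of re-walking it, turning O(n^2) total work into O(n).
interface Item {
  value: string;
  children: Item[];
}

const materialized = new WeakSet<Item[]>();
let walkCount = 0; // counts actual per-item work for demonstration

function materialize(items: Item[]): void {
  if (materialized.has(items)) return; // already processed: skip subtree
  materialized.add(items);
  for (const item of items) {
    walkCount++;
    materialize(item.children);
  }
}

const shared: Item[] = [{ value: "a", children: [] }];
materialize(shared);
materialize(shared); // second call is a no-op
console.log(walkCount); // 1
```

A WeakSet (rather than a plain Set) keeps the marks from pinning subtrees in memory: once a subtree is garbage-collected, its mark goes with it.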
Lightweight Utility Ops
Measured on ~200 KB document (200,067 chars, 6,321 structural nodes, 9,165 rendered tokens). Data represents the average of 5 independent process runs.
| Operation | Time | Notes |
|---|---|---|
| `printStructural` | ~2.00 ms | Lossless round-trip |
| `buildZones` | ~0.74 ms | 1,897 zones |
| `walkTokens` | ~0.50 ms | 9,165 visits |
| `mapTokens` identity | ~1.21 ms | Wrapper-only cost |
| `mapTokens` transform | ~1.55 ms | Rename every `bold` to `strong` |
These utility operations are effectively free compared with the parser itself in editor pipelines.
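The "rename every bold to strong" transform row can be pictured as a simple token-mapping pass. The token shape and `mapTokens` signature here are hypothetical stand-ins for illustration only, not yumeDSL's actual API:

```typescript
// Illustrative token-mapping pass in the spirit of the mapTokens rows
// above: an identity call measures wrapper-only cost, a transform call
// rewrites tags. Shapes are hypothetical, not the library's real types.
interface Token {
  tag: string;
  text: string;
}

function mapTokens(tokens: Token[], fn: (t: Token) => Token): Token[] {
  return tokens.map(fn);
}

const tokens: Token[] = [
  { tag: "bold", text: "hi" },
  { tag: "italic", text: "there" },
];

const identity = mapTokens(tokens, (t) => t);           // wrapper-only cost
const renamed = mapTokens(tokens, (t) =>
  t.tag === "bold" ? { ...t, tag: "strong" } : t        // rename bold -> strong
);
console.log(renamed[0].tag); // strong
```

The gap between the identity and transform rows (~1.21 ms vs ~1.55 ms) is then just the per-token cost of the rewrite callback itself.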