Performance Report: Stream vs Chunked vs Full
May 14, 2026
This report summarizes measured performance across five configurations and four document sizes. Benchmarks were run on Node.js using synthetic paragraph-heavy content.
Treat these numbers as local harness data, not as a blanket claim that every workload is faster. The latest generated snapshot in docs/perf-latest.json records benchmarkVersion, generatedAt, Node version, platform, CPU, CPU count, and commit SHA so results can be reproduced or compared against later runs.
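Before you compare a fresh run against the committed snapshot, check that the environment matches. This is a minimal sketch. It assumes hypothetical key names (`nodeVersion`, `platform`, `cpu`, `cpuCount`, `commitSha`) and uses illustrative values, not real measurements. Check the actual keys in `docs/perf-latest.json` before relying on it.

```javascript
// Environment fields that should match before two runs are compared.
// Key names are hypothetical; only benchmarkVersion is named in the report.
const ENV_KEYS = ['benchmarkVersion', 'nodeVersion', 'platform', 'cpu', 'cpuCount']

// Return the environment keys whose values differ between two snapshots.
// generatedAt and commitSha are excluded on purpose: they are expected to change.
function environmentDiff(a, b) {
  return ENV_KEYS.filter((key) => a[key] !== b[key])
}

// Illustrative values, not real measurements or real commits.
const previous = {
  benchmarkVersion: 1,
  generatedAt: '2026-01-01T00:00:00.000Z',
  nodeVersion: 'v20.0.0',
  platform: 'linux-x64',
  cpu: 'example-cpu',
  cpuCount: 8,
  commitSha: 'aaaaaaa',
}
const latest = { ...previous, nodeVersion: 'v22.0.0', commitSha: 'bbbbbbb' }

console.log(environmentDiff(previous, latest)) // [ 'nodeVersion' ]
```

If the result is non-empty, the timing differences may reflect the environment rather than the code change.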
Default API note:
- Normal callers should keep using `md.parse(src)`/`md.render(src)`.
- The default API handles large finite strings through internal large-input optimizations on stock parser instances. Instances with plugins or custom rules keep plain full-parse behavior unless chunking is explicitly enabled.
- Explicit chunk-stream APIs such as `parseIterable`/`UnboundedBuffer` are advanced tools for sources that already arrive as chunks. You do not need them to benefit from the default large-text path.
Scenarios:
- S1: stream ON, cache OFF, chunk ON (stream + chunked, but reset cache each step)
- S2: stream ON, cache ON, chunk OFF (stream append fast-path only)
- S3: stream ON, cache ON, chunk ON (hybrid: chunked allowed and append fast-path)
- S4: stream OFF, chunk ON (full parse with chunked fallback)
- S5: stream OFF, chunk OFF (plain full parse)
Workloads measured per size:
- one-shot: single full parse of the entire document
- append workload: 1 initial parse + 5 append steps (growing to the target size)
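The append workload can be sketched as a small timing harness. This is an illustration, not the project's `scripts/perf-matrix.mjs`. It makes three assumptions:
- The initial document and each of the 5 appends cover equal shares of the target size.
- The paragraph generator is made up for this sketch.
- The parse function is passed in as a parameter, so a stub stands in for the real parser.

```javascript
// Illustrative generator for paragraph-heavy Markdown of exactly `targetChars`
// characters (the real harness's content source may differ).
function makeParagraphDoc(targetChars) {
  const para = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit. '.repeat(4).trim()
  let doc = ''
  while (doc.length < targetChars) doc += (doc ? '\n\n' : '') + para
  return doc.slice(0, targetChars)
}

// 1 initial parse + `appends` append steps. Assumption: every step adds an
// equal share of the final document, and each source extends the previous one.
function appendSteps(doc, appends = 5) {
  const total = appends + 1
  const steps = []
  for (let i = 1; i <= total; i++) {
    steps.push(doc.slice(0, Math.ceil((doc.length * i) / total)))
  }
  return steps
}

// Time the whole append workload. One env object is reused across steps,
// matching the "stable env object" condition noted in this report.
function timeAppendWorkload(parse, doc, appends = 5) {
  const steps = appendSteps(doc, appends)
  const env = {}
  const start = performance.now()
  for (const src of steps) parse(src, env)
  return performance.now() - start
}

// Usage with a stub parser so the sketch runs without dependencies.
const doc = makeParagraphDoc(20_000)
const ms = timeAppendWorkload((src) => src.split('\n\n').length, doc)
console.log(`append workload (stub parser): ${ms.toFixed(2)}ms`)
```

With markdown-it you would pass something like `(src, env) => md.parse(src, env)`. Whether that call path triggers the stream append fast path depends on how stream mode is configured, which this sketch does not cover.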
Raw results (ms):
- size=5k chars
  - one-shot best: S5 0.65ms
  - append best: S3 0.79ms (S2: 1.28ms)
- size=20k chars
  - one-shot best: S5 0.94ms
  - append best: S2 2.03ms (S3: 2.58ms)
- size=50k chars
  - one-shot best: S5 2.72ms
  - append best: S3 2.57ms (S2: 2.81ms)
- size=100k chars
  - one-shot best: S5 5.25ms
  - append best: S3 6.57ms (S2: 6.91ms)
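The gaps between S2 and S3 are easier to judge as relative savings than as raw milliseconds. This small script only re-derives percentages from the append numbers reported above. It does not run any benchmark.

```javascript
// Append-workload timings (ms) copied from the raw results above.
const appendResults = {
  '5k': { S2: 1.28, S3: 0.79 },
  '20k': { S2: 2.03, S3: 2.58 },
  '50k': { S2: 2.81, S3: 2.57 },
  '100k': { S2: 6.91, S3: 6.57 },
}

// For each size, report the faster scenario and the percentage of time it
// saves relative to the slower one (rounded to one decimal place).
function compareStreamModes(results) {
  return Object.entries(results).map(([size, { S2, S3 }]) => {
    const [winner, fast, slow] = S2 <= S3 ? ['S2', S2, S3] : ['S3', S3, S2]
    return { size, winner, savedPct: Number((((slow - fast) / slow) * 100).toFixed(1)) }
  })
}

console.table(compareStreamModes(appendResults))
```

S3 comes out ahead at 5k, 50k, and 100k (saving about 38%, 8.5%, and 4.9%). S2 comes out ahead at 20k (about 21%).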
Append fast-path confirmation: With a stable env object, appendHits reached 5 (one per append) for S2/S3 across sizes.
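A toy model of an append fast path shows why a stable `env` object matters. This is a hypothetical illustration, not the library's implementation. It keys its cache on `env` identity and treats any source that extends the previous one as an append. A real Markdown parser has to re-parse from a safe block boundary, because appended text can change earlier blocks.

```javascript
// Toy model of an append fast path (hypothetical, NOT the library's code).
// The cache is keyed by env object identity; a source that strictly extends
// the previously seen source for the same env counts as an append.
function createToyStreamParser(parseFull) {
  const cache = new WeakMap() // env object -> { src, result }
  const stats = { appendHits: 0, fullParses: 0 }

  function parse(src, env) {
    const prev = cache.get(env)
    let result
    if (prev && src.length > prev.src.length && src.startsWith(prev.src)) {
      // Append: only the new suffix is parsed and added to the cached result.
      stats.appendHits++
      result = prev.result.concat(parseFull(src.slice(prev.src.length)))
    } else {
      stats.fullParses++
      result = parseFull(src)
    }
    cache.set(env, { src, result })
    return result
  }

  return { parse, stats }
}

// Splitting on blank lines stands in for a real block parser.
const toy = createToyStreamParser((s) => s.split('\n\n').filter((p) => p.trim() !== ''))
const env = {} // one stable env object for the whole editing session
let src = 'Para 0'
toy.parse(src, env)
for (let i = 1; i <= 5; i++) {
  src += `\n\nPara ${i}`
  toy.parse(src, env)
}
console.log(toy.stats) // { appendHits: 5, fullParses: 1 }
```

If you pass a fresh `{}` on every call, each step gets its own cache entry. `appendHits` then stays at 0 and every step is a full parse.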
Conclusions
- One-shot parsing (no appends):
  - For the tested content, plain full parse (S5) was consistently fastest from 5k to 100k chars.
  - Chunked (S4) did not outperform full parse on these inputs. It may help on extremely large or fence/blank-line-heavy documents; tune thresholds if you enable it.
- Append-heavy editing (growing documents):
  - Stream with cache (S2/S3) clearly outperforms non-stream at medium and large sizes.
  - Hybrid (S3) is usually as fast as or slightly faster than stream-only (S2) for larger docs (≥ 50k). This is primarily because it can choose chunked parsing for the initial parse when that is beneficial.
  - For smaller docs (~5k–20k), the ordering depends on thresholds. In this run, stream-only (S2) was faster at 20k (2.03ms vs 2.58ms for S3), while hybrid (S3) was faster at 5k (0.79ms vs 1.28ms). Both beat non-stream.
Recommendations
- If you parse once (one-shot):
  - Default to full parse (S5). Enable full-chunked fallback only after testing on your workload; consider starting thresholds at ~20k chars/400 lines.
- If you support live editing with appends:
  - Enable stream mode with cache (S2): `stream: true` and leave `streamChunkedFallback: false`.
  - If initial parses are often large (tens of kB+), enable hybrid (S3): `streamChunkedFallback: true` with chunk size ~10k chars/200 lines.
- Threshold tuning:
  - Start with `streamChunkSizeChars ≈ 10k`, `streamChunkSizeLines ≈ 200`.
  - For full-parse chunked fallback, start with `fullChunkThresholdChars ≈ 20k`, `fullChunkThresholdLines ≈ 400`, and a chunk size of 8k–16k chars, 150–250 lines.
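The recommendations above can be collected into option objects. This is a sketch, and several parts are assumptions:
- The option names come from this report, but how they are passed to the parser is not shown here.
- `fullChunkSizeChars`/`fullChunkSizeLines` are inferred from the `*SizeChars`/`*SizeLines` pattern.
- The 20k cutoff in the hypothetical `pickAppendOptions` helper is assumed.
- The either-threshold rule in the hypothetical `exceedsFullChunkThreshold` check is assumed.

```javascript
// Option names are taken from this report; how they reach the parser
// (constructor options vs. per-call options) is an assumption left open.
const S2_STREAM_ONLY = { stream: true, streamChunkedFallback: false }
const S3_HYBRID = {
  stream: true,
  streamChunkedFallback: true,
  streamChunkSizeChars: 10_000,
  streamChunkSizeLines: 200,
}

// Starting point for the one-shot full-parse chunked fallback. The size
// names are inferred from the *SizeChars/*SizeLines pattern (unverified);
// the flag that turns this fallback on is not named in the report.
const FULL_CHUNK_START = {
  fullChunkThresholdChars: 20_000,
  fullChunkThresholdLines: 400,
  fullChunkSizeChars: 12_000, // inside the suggested 8k–16k range
  fullChunkSizeLines: 200, // inside the suggested 150–250 range
}

// Hypothetical helper: choose hybrid (S3) when typical initial documents are
// large. The 20k-char cutoff is an assumed stand-in for "tens of kB+".
function pickAppendOptions(typicalInitialChars, cutoffChars = 20_000) {
  return typicalInitialChars >= cutoffChars ? S3_HYBRID : S2_STREAM_ONLY
}

// Hypothetical check: assumes chunking is considered once EITHER the
// character or the line threshold is reached.
function exceedsFullChunkThreshold(src, opts = FULL_CHUNK_START) {
  const lines = src.split('\n').length
  return src.length >= opts.fullChunkThresholdChars || lines >= opts.fullChunkThresholdLines
}
```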
Adaptive chunk sizing (default)
Both full and stream chunked fallbacks choose chunk sizes adaptively by default:
- Target around 8 chunks (`fullChunkTargetChunks`/`streamChunkTargetChunks`), clamped to a practical range.
- Effective sizes: `ceil(docChars / target)` clamped to `[8k, 32k]` for characters and `[150, 350]` for lines.
- Disable with `fullChunkAdaptive: false` or `streamChunkAdaptive: false` and pass fixed `*SizeChars`/`*SizeLines`.
- Optionally cap the number of chunks with `fullChunkMaxChunks`.
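The adaptive rule above can be worked through numerically. This sketch makes three assumptions:
- 8k/32k are read as 8000/32000 characters.
- The line size uses the same `ceil(docLines / target)` rule.
- The `maxChunks` option (standing in for `fullChunkMaxChunks`) is modeled by enlarging the chunk size, which may not match the real implementation.

```javascript
const clamp = (value, min, max) => Math.min(max, Math.max(min, value))

// Sketch of the adaptive sizing rule: ceil(size / target), clamped.
// The chunk count is estimated from characters only.
function adaptiveChunkSize(docChars, docLines, { targetChunks = 8, maxChunks } = {}) {
  let chars = clamp(Math.ceil(docChars / targetChunks), 8000, 32000)
  const lines = clamp(Math.ceil(docLines / targetChunks), 150, 350)
  if (maxChunks !== undefined && Math.ceil(docChars / chars) > maxChunks) {
    // Assumed cap behavior: grow chunks so at most `maxChunks` are produced.
    chars = Math.ceil(docChars / maxChunks)
  }
  return { chars, lines, chunks: Math.ceil(docChars / chars) }
}

console.log(adaptiveChunkSize(100_000, 2_000)) // { chars: 12500, lines: 250, chunks: 8 }
console.log(adaptiveChunkSize(5_000, 100)) // { chars: 8000, lines: 150, chunks: 1 }
console.log(adaptiveChunkSize(1_000_000, 20_000)) // { chars: 32000, lines: 350, chunks: 32 }
```

At 100k characters the default target of 8 yields 12,500-character chunks. Under 64k characters the 8k floor takes over, so a 5k document ends up as a single chunk. Above 256k characters the 32k ceiling applies, so the chunk count grows past the target.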
Notes:
- Adaptive sizing reduces the chance of over-chunking at ~100k+ chars, where orchestration cost can dominate.
- Even with adaptive sizing, one-shot full parse (S5) can remain faster for some inputs. Validate on your data.
How to reproduce
- Build and run the matrix:

```bash
npm run build
node scripts/perf-matrix.mjs
```

- Optional: sweep non-stream chunked settings on your own content:

```bash
npm run build
node scripts/full-vs-chunked-sweep.mjs
```

These scripts print best-per-size summaries and can export JSON by setting `PERF_JSON=1`.
When publishing or comparing benchmark numbers, include:
- Node.js version
- CPU and OS/platform
- benchmark version
- commit SHA for this repository
- baseline package versions
- content generator or fixture source
- warmup/iteration settings from the harness
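A harness can fill in part of this checklist from the running process and flag what is still missing. In this sketch the field names are hypothetical. Only `nodeVersion` and `platform` are collected automatically (from `process.version`, `process.platform`, and `process.arch`), and `GITHUB_SHA` is only set inside GitHub Actions.

```javascript
// Fields this report asks for when publishing numbers. Key names are
// hypothetical; map them to whatever the JSON export actually uses.
const REQUIRED_FIELDS = [
  'nodeVersion',
  'cpu',
  'platform',
  'benchmarkVersion',
  'commitSha',
  'baselineVersions',
  'contentSource',
  'warmup',
  'iterations',
]

// Fill in what the running process can report about itself; CPU details would
// come from os.cpus() in node:os, left out here to keep the sketch import-free.
function collectMetadata(extra = {}) {
  return {
    nodeVersion: process.version,
    platform: `${process.platform}-${process.arch}`,
    ...extra,
  }
}

// Required fields that are still missing (undefined, null, or empty string).
function missingFields(meta) {
  return REQUIRED_FIELDS.filter((key) => meta[key] === undefined || meta[key] === null || meta[key] === '')
}

// GITHUB_SHA is set by GitHub Actions; outside CI it is undefined and reported missing.
const meta = collectMetadata({ benchmarkVersion: 1, commitSha: process.env.GITHUB_SHA })
console.log('missing metadata:', missingFields(meta))
```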
Baseline: markdown-it (JS) example
For parity, we include the upstream markdown-it as a baseline in the matrix (scenario M1):
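```javascript
import MarkdownIt from 'markdown-it'

const md = new MarkdownIt()
// parse() returns the token stream; the second argument is the env object
const tokens = md.parse('# Title\n\nHello', {})
// render() parses and renders in one call, returning an HTML string
const html = md.render('# Title\n\nHello')
```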
See the latest auto-generated numbers in docs/perf-latest.md.
Remark parse (parse-only)
We also include a Remark parser scenario (R1) to compare pure parse throughput. It exercises:
```javascript
import { unified } from 'unified'
import remarkParse from 'remark-parse'

// parse() only builds the mdast syntax tree; no compile/stringify step runs
const u = unified().use(remarkParse)
const tree = u.parse('# Title\n\nHello')
```
Notes:
- This measures parse only (no HTML render). It appears in the perf matrix as `R1` when `unified` and `remark-parse` are installed.
- Install deps once: `pnpm add -D unified remark-parse`.
- Run the matrix as usual: `npm run perf:matrix`.
Regenerate the report in CI
You can refresh docs/perf-latest.md on demand via GitHub Actions:
- Go to your repository on GitHub → Actions → “Perf Report” → “Run workflow”.
- Optional inputs:
  - `ref`: branch/tag/SHA to run against (defaults to current branch)
  - `node-version`: Node.js version (default 20)
  - `package-manager`: pnpm or npm (default pnpm)
The workflow will install deps, run perf:generate, upload the files as an artifact, and commit/push docs/perf-latest.md and docs/perf-latest.json if they changed.
Chinese version (zh-CN):
- Run the “Perf Report (zh-CN)” workflow. It executes `perf:generate:zh` and updates `docs/perf-latest.zh-CN.md` similarly.