Performance Report: Stream vs Chunked vs Full

May 14, 2026

This report summarizes measured performance across five configurations and four document sizes. Benchmarks were run on Node.js using synthetic paragraph-heavy content.

Treat these numbers as local harness data, not as a blanket claim that every workload is faster. The latest generated snapshot in docs/perf-latest.json records benchmarkVersion, generatedAt, Node version, platform, CPU, CPU count, and commit SHA so results can be reproduced or compared against later runs.

Default API note:

Normal callers should keep using md.parse(src) / md.render(src).
Large finite strings can be handled by the default API via internal large-input optimizations on stock parser instances; plugin/custom-rule instances keep plain full-parse behavior unless chunking is explicitly enabled.
Explicit chunk-stream APIs such as parseIterable / UnboundedBuffer are advanced tools for sources that already arrive as chunks; they are not required to benefit from the default large-text path.

Scenarios:

S1: stream ON, cache OFF, chunk ON (stream + chunked, with the cache reset at each step)
S2: stream ON, cache ON, chunk OFF (stream append fast-path only)
S3: stream ON, cache ON, chunk ON (hybrid: chunked allowed and append fast-path)
S4: stream OFF, chunk ON (full parse with chunked fallback)
S5: stream OFF, chunk OFF (plain full parse)

Workloads measured per size:

one-shot: single full parse of the entire document
append workload: 1 initial parse + 5 append steps (growing to the target size)
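
The append workload above can be sketched as follows. This is a minimal illustration, not the repository's harness: `buildAppendSteps` and `timeAppendWorkload` are hypothetical names, `parse` stands for whatever parser function is under test, and the even split by character count is an assumption (the real harness may cut at paragraph boundaries).

```javascript
// Hypothetical sketch: split a document into (appendCount + 1) growing prefixes.
function buildAppendSteps(doc, appendCount = 5) {
  const parts = appendCount + 1
  const steps = []
  for (let i = 1; i <= parts; i++) {
    // The last step is always the full document (the target size).
    const end = i === parts ? doc.length : Math.floor((doc.length * i) / parts)
    steps.push(doc.slice(0, end))
  }
  return steps
}

// Time one initial parse plus one parse per append step.
function timeAppendWorkload(parse, doc, appendCount = 5) {
  const steps = buildAppendSteps(doc, appendCount)
  const start = performance.now()
  for (const src of steps) parse(src)
  return { ms: performance.now() - start, parses: steps.length }
}

// Usage with a stand-in parser; swap in the real parse function when measuring.
const sampleDoc = 'Lorem ipsum dolor sit amet.\n\n'.repeat(200)
console.log(timeAppendWorkload((src) => src.split('\n\n'), sampleDoc))
```

A one-shot measurement is the same loop with a single step, the full document.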

Raw results (ms):

size=5k chars
- one-shot best: S5 0.65ms
- append best: S3 0.79ms (S2: 1.28ms)
size=20k chars
- one-shot best: S5 0.94ms
- append best: S2 2.03ms (S3: 2.58ms)
size=50k chars
- one-shot best: S5 2.72ms
- append best: S3 2.57ms (S2: 2.81ms)
size=100k chars
- one-shot best: S5 5.25ms
- append best: S3 6.57ms (S2: 6.91ms)
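
To make the S2 vs S3 comparison easier to read, the append numbers above can be put into a small script. The timings are copied from the list above; `compareAppend` is a hypothetical helper written only for this illustration.

```javascript
// Append-workload timings (ms) for S2 and S3, copied from the raw results.
const appendMs = {
  '5k': { S2: 1.28, S3: 0.79 },
  '20k': { S2: 2.03, S3: 2.58 },
  '50k': { S2: 2.81, S3: 2.57 },
  '100k': { S2: 6.91, S3: 6.57 },
}

// For each size, report the faster scenario and how much slower the other was.
function compareAppend(results) {
  return Object.entries(results).map(([size, { S2, S3 }]) => {
    const winner = S3 <= S2 ? 'S3' : 'S2'
    const slowerByPct = ((Math.max(S2, S3) / Math.min(S2, S3) - 1) * 100).toFixed(1)
    return { size, winner, slowerByPct }
  })
}

console.table(compareAppend(appendMs))
```

On these numbers S3 wins at 5k, 50k, and 100k, and S2 wins at 20k; the gap at 50k and 100k is under 10%, so it may fall within run-to-run noise.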

Append fast-path confirmation: With a stable env object, appendHits reached 5 (one per append) for S2/S3 across sizes.
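
The following toy model illustrates why a stable env object matters for the append counter. It is not the library's implementation: `createAppendCache`, `toyParse`, and the WeakMap keyed by env are assumptions made for illustration, and the model only works when each append starts at a paragraph boundary.

```javascript
// Toy block parser: one block per blank-line-separated paragraph.
const toyParse = (src) => src.split(/\n{2,}/).filter(Boolean)

// Toy append cache keyed by env identity (an assumption, not the real design).
function createAppendCache(parseFull) {
  const cache = new WeakMap() // env object -> { src, blocks }
  const stats = { appendHits: 0, fullParses: 0 }

  function parse(src, env) {
    const prev = cache.get(env)
    if (prev && src.length > prev.src.length && src.startsWith(prev.src)) {
      // Fast path: parse only the appended tail and reuse the earlier blocks.
      const blocks = prev.blocks.concat(parseFull(src.slice(prev.src.length)))
      cache.set(env, { src, blocks })
      stats.appendHits++
      return blocks
    }
    // Slow path: no usable cache entry for this env, so parse everything.
    const blocks = parseFull(src)
    cache.set(env, { src, blocks })
    stats.fullParses++
    return blocks
  }

  return { parse, stats }
}

// One initial parse plus five appends, with or without a shared env object.
function runAppendDemo(reuseEnv) {
  const { parse, stats } = createAppendCache(toyParse)
  const sharedEnv = {}
  let src = 'paragraph 0\n\n'
  parse(src, reuseEnv ? sharedEnv : {})
  for (let i = 1; i <= 5; i++) {
    src += `paragraph ${i}\n\n`
    parse(src, reuseEnv ? sharedEnv : {})
  }
  return stats
}

console.log(runAppendDemo(true)) // { appendHits: 5, fullParses: 1 }
console.log(runAppendDemo(false)) // { appendHits: 0, fullParses: 6 }
```

In this model, passing a fresh env on every call defeats the cache, which matches the report's note that the five hits were observed with a stable env.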

Conclusions

One-shot parsing (no appends):
- For the tested content, plain full parse (S5) was consistently fastest from 5k to 100k chars.
- Chunked (S4) did not outperform full parse on these inputs. It may help on extremely large or fence/blank-line-heavy documents; tune thresholds if you enable it.
Append-heavy editing (growing documents):
- Stream with cache (S2/S3) clearly outperforms non-stream at medium and large sizes.
- Hybrid (S3) is usually as fast as or slightly faster than stream-only (S2) for larger docs (≥ 50k chars), primarily because it can choose chunked parsing for the initial parse when beneficial.
- For smaller docs (~5k–20k chars) the ordering was mixed: S2 was faster at 20k (2.03ms vs 2.58ms), while S3 was faster at 5k (0.79ms vs 1.28ms). The outcome likely depends on thresholds, but both beat non-stream.

Recommendations

If you parse once (one-shot):
- Default to full parse (S5). Enable full-chunked fallback only after testing on your workload; consider starting thresholds at ~20k chars/400 lines.
If you support live editing with appends:
- Enable stream mode with cache (S2): stream: true and leave streamChunkedFallback: false.
- If initial parses are often large (tens of thousands of characters or more), enable hybrid (S3): streamChunkedFallback: true with chunk size ~10k chars/200 lines.
Threshold tuning:
- Start with streamChunkSizeChars ≈ 10k, streamChunkSizeLines ≈ 200.
- For full parse chunked fallback, start with fullChunkThresholdChars ≈ 20k, fullChunkThresholdLines ≈ 400, and chunk size 8k–16k chars, 150–250 lines.
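
As a rough sketch, the recommendations above translate into option objects like the ones below. The option names come from this report; how they are passed to the parser (constructor argument, setter, or otherwise) is not shown here and should be checked against the library's API. Setting streamChunkAdaptive: false alongside the fixed sizes is an assumption based on the adaptive sizing section.

```javascript
// S2: append fast-path only.
const streamOnlyOptions = {
  stream: true,
  streamChunkedFallback: false,
}

// S3: hybrid, with the suggested starting chunk sizes.
const hybridOptions = {
  stream: true,
  streamChunkedFallback: true,
  streamChunkAdaptive: false, // assumption: needed for the fixed sizes below to apply
  streamChunkSizeChars: 10_000,
  streamChunkSizeLines: 200,
}

// Starting thresholds for the non-stream chunked fallback (S4).
// The flag that switches this fallback on is not named in this report.
const fullChunkThresholds = {
  fullChunkThresholdChars: 20_000,
  fullChunkThresholdLines: 400,
}

console.log({ streamOnlyOptions, hybridOptions, fullChunkThresholds })
```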

Adaptive chunk sizing (default)

Both full and stream chunked fallbacks choose chunk sizes adaptively by default:

Target around 8 chunks (fullChunkTargetChunks / streamChunkTargetChunks), clamped to a practical range.
Effective sizes: ceil(docChars / target) clamped to [8k, 32k] for characters, with the line-based equivalent clamped to [150, 350] lines.
Disable with fullChunkAdaptive: false or streamChunkAdaptive: false and pass fixed *SizeChars/*SizeLines.
Optionally cap the number of chunks with fullChunkMaxChunks.
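
The sizing rule above can be sketched as a small function. This is an illustration under stated assumptions, not the library's code: `adaptiveChunkSize` is a hypothetical name, "8k" and "32k" are read as 8,000 and 32,000 characters, and the line size is assumed to be ceil(docLines / target).

```javascript
// Keep a value inside [lo, hi].
function clamp(value, lo, hi) {
  return Math.min(hi, Math.max(lo, value))
}

// Hypothetical sketch of adaptive sizing with a default target of 8 chunks.
function adaptiveChunkSize(docChars, docLines, targetChunks = 8) {
  return {
    chars: clamp(Math.ceil(docChars / targetChunks), 8000, 32000),
    lines: clamp(Math.ceil(docLines / targetChunks), 150, 350),
  }
}

console.log(adaptiveChunkSize(100_000, 2_000)) // { chars: 12500, lines: 250 }
console.log(adaptiveChunkSize(20_000, 400)) // { chars: 8000, lines: 150 }
console.log(adaptiveChunkSize(1_000_000, 20_000)) // { chars: 32000, lines: 350 }
```

Under these assumptions a 100k-char document gets 12.5k-char chunks, while anything under 64k chars hits the 8k floor and yields fewer than 8 chunks.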

Notes:

Adaptive sizing reduces the chance of over-chunking at ~100k+ chars, where orchestration cost can dominate.
Even with adaptive sizing, one-shot full parse (S5) can remain faster for some inputs. Validate on your data.

How to reproduce

Build and run the matrix:

npm run build
node scripts/perf-matrix.mjs

Optional: Sweep non-stream chunked settings on your own content:

npm run build
node scripts/full-vs-chunked-sweep.mjs

These scripts print best-per-size summaries and can export JSON when PERF_JSON=1 is set in the environment.

When publishing or comparing benchmark numbers, include:

Node.js version
CPU and OS/platform
benchmark version
commit SHA for this repository
baseline package versions
content generator or fixture source
warmup/iteration settings from the harness
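
Part of this checklist can be gathered from Node.js itself. The sketch below is a hypothetical helper, not the repository's generator; apart from generatedAt, the field names are illustrative and may not match the keys in docs/perf-latest.json. Benchmark version, commit SHA, package versions, fixture source, and harness settings have to come from elsewhere.

```javascript
// Collect the environment fields that Node.js can report about itself.
async function collectBenchmarkMetadata() {
  // Dynamic import works in both ES modules and CommonJS scripts.
  const os = await import('node:os')
  const cpus = os.cpus()
  return {
    nodeVersion: process.version,
    platform: `${process.platform}-${process.arch}`,
    cpu: cpus.length > 0 ? cpus[0].model : 'unknown',
    cpuCount: cpus.length,
    generatedAt: new Date().toISOString(),
  }
}

collectBenchmarkMetadata().then((meta) => console.log(JSON.stringify(meta, null, 2)))
```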

Baseline: markdown-it (JS) example

For parity, we include the upstream markdown-it as a baseline in the matrix (scenario M1):

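import MarkdownIt from 'markdown-it'

// Default preset, no plugins
const md = new MarkdownIt()
// Parse to a token array; the second argument is the env object
const tokens = md.parse('# Title\n\nHello', {})
// Render the same source straight to HTML
const html = md.render('# Title\n\nHello')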

See the latest auto-generated numbers in docs/perf-latest.md.

Remark parse (parse-only)

We also include a Remark parser scenario (R1) to compare pure parse throughput. It exercises:

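import { unified } from 'unified'
import remarkParse from 'remark-parse'

// Processor with only the Markdown parser attached (no compile step)
const u = unified().use(remarkParse)
// parse() returns the mdast syntax tree
const tree = u.parse('# Title\n\nHello')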

Notes:

This measures parse only (no HTML render). It appears in the perf matrix as R1 when unified and remark-parse are installed.
Install deps once: pnpm add -D unified remark-parse.
Run the matrix as usual: npm run perf:matrix.

Regenerate the report in CI

You can refresh docs/perf-latest.md on demand via GitHub Actions:

Go to your repository on GitHub → Actions → “Perf Report” → “Run workflow”.
Optional inputs:
- ref: branch/tag/SHA to run against (defaults to current branch)
- node-version: Node.js version (default 20)
- package-manager: pnpm or npm (default pnpm)

The workflow will install deps, run perf:generate, upload the files as an artifact, and commit/push docs/perf-latest.md and docs/perf-latest.json if they changed.

Chinese version (zh-CN):

Run the “Perf Report (zh-CN)” workflow. It executes perf:generate:zh and updates docs/perf-latest.zh-CN.md in the same way.