Performance Report: Stream vs Chunked vs Full

May 14, 2026

This report summarizes measured performance across five configurations and four document sizes. Benchmarks were run on Node.js using synthetic paragraph-heavy content.

Treat these numbers as local harness data, not as a blanket claim that every workload is faster. The latest generated snapshot in docs/perf-latest.json records benchmarkVersion, generatedAt, Node version, platform, CPU, CPU count, and commit SHA so results can be reproduced or compared against later runs.

Default API note:

  • Normal callers should keep using md.parse(src) / md.render(src).
  • Large finite strings can be handled by the default API via internal large-input optimizations on stock parser instances; plugin/custom-rule instances keep plain full-parse behavior unless chunking is explicitly enabled.
  • Explicit chunk-stream APIs such as parseIterable / UnboundedBuffer are advanced tools for sources that already arrive as chunks; they are not required to benefit from the default large-text path.

Scenarios:

  • S1: stream ON, cache OFF, chunk ON (stream + chunked, but reset cache each step)
  • S2: stream ON, cache ON, chunk OFF (stream append fast-path only)
  • S3: stream ON, cache ON, chunk ON (hybrid: chunked allowed and append fast-path)
  • S4: stream OFF, chunk ON (full parse with chunked fallback)
  • S5: stream OFF, chunk OFF (plain full parse)

Workloads measured per size:

  • one-shot: single full parse of the entire document
  • append workload: 1 initial parse + 5 append steps (growing to the target size)
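
The append workload above can be sketched as a small timing loop. This is a minimal illustration, not the actual harness in scripts/perf-matrix.mjs: `runAppendWorkload`, `countBlocks`, and the filler paragraph are hypothetical names and content, and `countBlocks` only stands in for a real parser call.

```javascript
// Hypothetical stand-in for a real parser call; it only counts blank-line-separated blocks.
function countBlocks(src) {
  return src.split(/\n{2,}/).filter(Boolean).length
}

// Time one append workload: 1 initial parse + `steps` appends, growing to `target` chars.
function runAppendWorkload(parse, target, steps = 5) {
  const paragraph = 'Lorem ipsum dolor sit amet.\n\n'
  const full = paragraph.repeat(Math.ceil(target / paragraph.length)).slice(0, target)
  const sliceSize = Math.ceil(target / (steps + 1))

  let parses = 0
  let lastChars = 0
  const start = performance.now()
  for (let step = 0; step <= steps; step++) {
    // Each step re-parses the document as it stands after one more appended slice.
    const end = Math.min((step + 1) * sliceSize, target)
    parse(full.slice(0, end))
    lastChars = end
    parses++
  }
  return { parses, lastChars, ms: performance.now() - start }
}

console.log(runAppendWorkload(countBlocks, 5000))
```

Passing a real parser function in place of `countBlocks` gives a rough local timing; the published numbers come from the repository's own scripts, which may use different warmup and iteration settings.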

Raw results (ms):

  • size=5k chars
    • one-shot best: S5 0.65ms
    • append best: S3 0.79ms (S2: 1.28ms)
  • size=20k chars
    • one-shot best: S5 0.94ms
    • append best: S2 2.03ms (S3: 2.58ms)
  • size=50k chars
    • one-shot best: S5 2.72ms
    • append best: S3 2.57ms (S2: 2.81ms)
  • size=100k chars
    • one-shot best: S5 5.25ms
    • append best: S3 6.57ms (S2: 6.91ms)

Append fast-path confirmation: With a stable env object, appendHits reached 5 (one per append) for S2/S3 across sizes.

Conclusions

  • One-shot parsing (no appends):

    • For the tested content, plain full parse (S5) was consistently fastest from 5k to 100k chars.
    • Chunked (S4) did not outperform full parse on these inputs. It may help on extremely large or fence/blank-line-heavy documents; tune thresholds if you enable it.
  • Append-heavy editing (growing documents):

    • Stream with cache (S2/S3) clearly outperforms non-stream at medium and large sizes.
    • Hybrid (S3) was as fast as or slightly faster than stream-only (S2) for larger docs (≥ 50k chars), primarily because it can choose chunked on the initial parse when beneficial.
    • For smaller docs the ordering was mixed: stream-only (S2) was faster at 20k (2.03ms vs 2.58ms), while hybrid (S3) was faster at 5k (0.79ms vs 1.28ms). The gap depends on thresholds, and both beat non-stream.
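
To make the S2 vs S3 comparison concrete, the append timings from the raw results can be compared per size. This sketch only restates the numbers listed in this report; the variable names are illustrative.

```javascript
// Append-workload timings (ms) copied from the "Raw results" list in this report.
const appendMs = [
  { size: '5k', S2: 1.28, S3: 0.79 },
  { size: '20k', S2: 2.03, S3: 2.58 },
  { size: '50k', S2: 2.81, S3: 2.57 },
  { size: '100k', S2: 6.91, S3: 6.57 },
]

// For each size, report which scenario was faster and by what factor (slower / faster).
const comparison = appendMs.map(({ size, S2, S3 }) => ({
  size,
  faster: S3 < S2 ? 'S3' : 'S2',
  ratio: Number((Math.max(S2, S3) / Math.min(S2, S3)).toFixed(2)),
}))

console.table(comparison)
```

The ratios at 50k and 100k are close to 1, so small run-to-run noise on a different machine could change which scenario leads.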

Recommendations

  • If you parse once (one-shot):

    • Default to full parse (S5). Enable the chunked fallback for full parse (S4) only after testing on your workload; consider starting thresholds at ~20k chars/400 lines.
  • If you support live editing with appends:

    • Enable stream mode with cache (S2): set stream: true and leave streamChunkedFallback: false.
    • If initial parses are often large (tens of thousands of chars or more), enable hybrid (S3): set streamChunkedFallback: true with chunk size ~10k chars/200 lines.
  • Threshold tuning:

    • Start with streamChunkSizeChars ≈ 10k, streamChunkSizeLines ≈ 200.
    • For full parse chunked fallback, start with fullChunkThresholdChars ≈ 20k, fullChunkThresholdLines ≈ 400, and chunk size 8k–16k chars, 150–250 lines.
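
The recommendations above can be summarized as a small helper that returns an options object per workload. This is a sketch under assumptions: `recommendedOptions` and its parameters are hypothetical, the option names are the ones mentioned in this report, and how the object is passed to the parser is not shown here.

```javascript
// Hypothetical helper: map a workload description to the option values recommended above.
function recommendedOptions({ liveEditing, largeInitialParse = false }) {
  if (!liveEditing) {
    // S5: plain full parse for one-shot use.
    return { stream: false }
  }
  if (!largeInitialParse) {
    // S2: stream append fast-path only.
    return { stream: true, streamChunkedFallback: false }
  }
  // S3: hybrid. Fixed sizes taken from the recommendation; the adaptive sizing
  // section suggests also setting streamChunkAdaptive: false when passing fixed sizes.
  return {
    stream: true,
    streamChunkedFallback: true,
    streamChunkSizeChars: 10_000,
    streamChunkSizeLines: 200,
  }
}

console.log(recommendedOptions({ liveEditing: true, largeInitialParse: true }))
```

Treat the returned values as starting points and re-measure on your own content before committing to them.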

Adaptive chunk sizing (default)

Both full and stream chunked fallbacks choose chunk sizes adaptively by default:

  • Target around 8 chunks (fullChunkTargetChunks / streamChunkTargetChunks), clamped to a practical range.
  • Effective sizes: ceil(docChars / target) clamped to [8k, 32k] for characters and [150, 350] for lines.
  • Disable with fullChunkAdaptive: false or streamChunkAdaptive: false and pass fixed *SizeChars/*SizeLines.
  • Optionally cap the number of chunks with fullChunkMaxChunks.

Notes:

  • Adaptive sizing reduces the chance of over-chunking at ~100k+ chars, where orchestration cost can dominate.
  • Even with adaptive sizing, one-shot full parse (S5) can remain faster for some inputs. Validate on your data.
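
The sizing rule in this section can be sketched as follows. It assumes "8k" and "32k" mean 8,000 and 32,000 characters and that the line-based size is derived from the document's line count in the same way; `adaptiveChunkSize` and `clamp` are illustrative names, not the library's internals.

```javascript
// Restrict a value to the inclusive range [min, max].
const clamp = (value, min, max) => Math.min(max, Math.max(min, value))

// Aim for roughly `targetChunks` chunks, then clamp to the practical ranges listed above.
function adaptiveChunkSize(docChars, docLines, targetChunks = 8) {
  return {
    chars: clamp(Math.ceil(docChars / targetChunks), 8_000, 32_000),
    lines: clamp(Math.ceil(docLines / targetChunks), 150, 350),
  }
}

console.log(adaptiveChunkSize(100_000, 2_000)) // { chars: 12500, lines: 250 }
console.log(adaptiveChunkSize(5_000, 100)) // { chars: 8000, lines: 150 } (clamped up)
console.log(adaptiveChunkSize(1_000_000, 20_000)) // { chars: 32000, lines: 350 } (clamped down)
```

Under these assumptions, documents below about 64k chars all get the 8k minimum, so they split into fewer than 8 chunks, and documents above about 256k chars split into more.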

How to reproduce

  • Build and run the matrix:
npm run build
node scripts/perf-matrix.mjs
  • Optional: Sweep non-stream chunked settings on your own content:
npm run build
node scripts/full-vs-chunked-sweep.mjs

These scripts print best-per-size summaries and can export JSON by setting PERF_JSON=1.

When publishing or comparing benchmark numbers, include:

  • Node.js version
  • CPU and OS/platform
  • benchmark version
  • commit SHA for this repository
  • baseline package versions
  • content generator or fixture source
  • warmup/iteration settings from the harness
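
Part of this checklist can be gathered automatically from Node.js. The sketch below is not the repository's generator: `collectBenchmarkMetadata` is a hypothetical helper, the key names other than benchmarkVersion and generatedAt are assumptions, and values such as the commit SHA are expected to be supplied by the caller.

```javascript
// Collect environment details to publish alongside benchmark numbers.
// The dynamic import keeps the sketch usable from both ESM and CommonJS files.
async function collectBenchmarkMetadata(extra = {}) {
  const os = await import('node:os')
  const cpus = os.cpus()
  return {
    generatedAt: new Date().toISOString(),
    nodeVersion: process.version,
    platform: `${process.platform} ${process.arch}`,
    cpu: cpus[0]?.model ?? 'unknown',
    cpuCount: cpus.length,
    // Caller-supplied fields, e.g. benchmarkVersion, commit SHA, fixture source.
    ...extra,
  }
}

collectBenchmarkMetadata({ benchmarkVersion: 1, commitSha: 'abc1234' }).then((meta) => {
  console.log(JSON.stringify(meta, null, 2))
})
```

The placeholder values for benchmarkVersion and commitSha are examples only; in practice they would come from the harness and from the checked-out revision.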

Baseline: markdown-it (JS) example

For parity, we include the upstream markdown-it as a baseline in the matrix (scenario M1):

import MarkdownIt from 'markdown-it'

const md = new MarkdownIt()
const tokens = md.parse('# Title\n\nHello', {})
const html = md.render('# Title\n\nHello')

See the latest auto-generated numbers in docs/perf-latest.md.

Remark parse (parse-only)

We also include a Remark parser scenario (R1) to compare pure parse throughput. It exercises:

import { unified } from 'unified'
import remarkParse from 'remark-parse'

const u = unified().use(remarkParse)
const tree = u.parse('# Title\n\nHello')

Notes:

  • This measures parse only (no HTML render). It appears in the perf matrix as R1 when unified and remark-parse are installed.
  • Install deps once: pnpm add -D unified remark-parse.
  • Run the matrix as usual: npm run perf:matrix.

Regenerate the report in CI

You can refresh docs/perf-latest.md on demand via GitHub Actions:

  • Go to your repository on GitHub → Actions → “Perf Report” → “Run workflow”.
  • Optional inputs:
    • ref: branch/tag/SHA to run against (defaults to current branch)
    • node-version: Node.js version (default 20)
    • package-manager: pnpm or npm (default pnpm)

The workflow will install deps, run perf:generate, upload the files as an artifact, and commit/push docs/perf-latest.md and docs/perf-latest.json if they changed.

Chinese version (zh-CN):

  • Run the “Perf Report (zh-CN)” workflow. It executes perf:generate:zh and updates docs/perf-latest.zh-CN.md in the same way.