Performance

June 14, 2026

Historical snapshot (v1.4). The numbers and suite list below are frozen as captured for v1.4 and are kept for reference. The pipeline has since changed: the core-engine, full-cv, and scalability suites were retired, and current numbers come from the current-speed / comparative / stress pipeline plus the JMH suite. See docs/operations/benchmarks.md.

All numbers below were captured from scripts/run-benchmarks.ps1 — the full local benchmark workflow that built the test classpath once and ran current-speed, comparative, core-engine, full-cv, scalability, and stress suites in sequence. They were captured on a developer laptop; CI machines are typically 1.5–2× slower. The benchmark methodology, profiles, GC stabilisation, percentile rule, and how to compare two runs locally are documented in docs/operations/benchmarks.md.

End-to-end latency

current-speed full profile — 12 warmup + 40 measurement iterations.

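| Scenario | Avg ms | p50 ms | p95 ms | Docs/sec |
| --- | --- | --- | --- | --- |
| engine-simple | 3.00 | 2.73 | 4.86 | 333.83 |
| invoice-template | 17.74 | 17.44 | 25.13 | 56.38 |
| cv-template | 10.16 | 9.91 | 14.08 | 98.46 |
| proposal-template | 18.21 | 16.93 | 23.57 | 54.91 |
| feature-rich | 36.02 | 34.18 | 41.79 | 27.76 |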
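How a warmup-then-measure loop turns raw timings into the columns above can be sketched as follows. This is a minimal illustration, not the project's CurrentSpeedBenchmark: the class name `LatencySketch`, the stand-in workload, and the nearest-rank percentile rule are all assumptions (the actual percentile rule is the one documented in docs/operations/benchmarks.md). The Docs/sec figure is computed as 1000 / avg ms, which matches the table to within rounding.

```java
import java.util.Arrays;

/** Hypothetical timing harness: warm up, then record per-iteration latency. */
final class LatencySketch {

    /** Nearest-rank percentile over an ascending-sorted array (assumed rule). */
    static double percentile(double[] sortedMs, double pct) {
        int rank = (int) Math.ceil(pct / 100.0 * sortedMs.length);
        return sortedMs[Math.max(rank, 1) - 1];
    }

    /** Runs the task warmup + measured times; returns {avg, p50, p95, docsPerSec}. */
    static double[] measure(Runnable task, int warmup, int measured) {
        for (int i = 0; i < warmup; i++) {
            task.run(); // warmup results are discarded so the JIT can settle
        }
        double[] ms = new double[measured];
        for (int i = 0; i < measured; i++) {
            long start = System.nanoTime();
            task.run();
            ms[i] = (System.nanoTime() - start) / 1_000_000.0;
        }
        double avg = Arrays.stream(ms).average().orElse(0.0);
        Arrays.sort(ms);
        return new double[] { avg, percentile(ms, 50), percentile(ms, 95), 1000.0 / avg };
    }

    public static void main(String[] args) {
        // Stand-in workload; a real run would compose and render one document here.
        Runnable fakeDocument = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 10_000; i++) {
                sb.append(i);
            }
        };
        double[] r = measure(fakeDocument, 12, 40); // 12 warmup + 40 measured, as above
        System.out.printf("avg=%.2f ms p50=%.2f ms p95=%.2f ms docs/sec=%.2f%n",
                r[0], r[1], r[2], r[3]);
    }
}
```

For serious comparisons a harness such as JMH is generally the safer choice, since a hand-rolled loop like this one does not guard against dead-code elimination or GC noise.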

Per-stage breakdown (median ms per stage)

| Scenario | Compose | Layout | Render | Total |
| --- | --- | --- | --- | --- |
| invoice-template | 0.33 | 2.55 | 5.76 | 8.63 |
| cv-template | 0.27 | 2.77 | 1.60 | 4.72 |
| proposal-template | 0.34 | 9.54 | 5.66 | 15.65 |

The Render stage is dominated by PDFBox serialization and accounts for roughly 34–67 % of the median total in the table above, so engine-side optimisations look smaller in the end-to-end average than they do in the Layout column. Page-background injection is a constant 1 fragment per page; column spans, layer stacks, and themes do not change the number of fragments emitted.
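The share each stage contributes to the total can be recomputed from the per-stage table with a few lines of arithmetic. The sketch below hard-codes the three rows from that table; the class name `StageShareSketch` is hypothetical and not part of the benchmark code.

```java
/** Hypothetical helper: share of each pipeline stage in the median total. */
final class StageShareSketch {

    /** Returns stageMs as a percentage of totalMs. */
    static double sharePct(double stageMs, double totalMs) {
        return 100.0 * stageMs / totalMs;
    }

    public static void main(String[] args) {
        String[] scenarios = { "invoice-template", "cv-template", "proposal-template" };
        // Columns: compose, layout, render, total (median ms, copied from the table).
        double[][] rows = {
            { 0.33, 2.55, 5.76, 8.63 },
            { 0.27, 2.77, 1.60, 4.72 },
            { 0.34, 9.54, 5.66, 15.65 },
        };
        for (int i = 0; i < rows.length; i++) {
            double total = rows[i][3];
            System.out.printf("%-18s layout %.0f%%  render %.0f%%%n",
                    scenarios[i],
                    sharePct(rows[i][1], total),
                    sharePct(rows[i][2], total));
        }
    }
}
```

Because each column is a median taken independently, the three stage values do not have to add up exactly to the Total column.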

Parallel throughput

Invoice template, 12 docs per thread.

| Threads | Total docs | Throughput | Avg doc ms |
| --- | --- | --- | --- |
| 1 | 12 | 89.56/s | 11.17 |
| 2 | 24 | 143.53/s | 6.97 |
| 4 | 48 | 245.26/s | 4.08 |
| 8 | 96 | 328.78/s | 3.04 |

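Throughput scales sub-linearly on this template: about 1.6× at 2 threads, 2.7× at 4 threads, and 3.7× at 8 threads on a hyper-threaded CPU. Avg doc ms here is wall-clock time divided by total docs (1000 / throughput), not per-document latency.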
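A fixed-size thread pool is one straightforward way to produce figures of this shape. The sketch below is an assumption about the general approach, not the project's benchmark code: `ParallelThroughputSketch`, `run`, and the stand-in workload are hypothetical names, and a real run would render the invoice template in place of the dummy task.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical parallel-throughput harness using a fixed thread pool. */
final class ParallelThroughputSketch {

    /** Each worker renders docsPerThread docs; returns {totalDocs, docsPerSec, avgDocMs}. */
    static double[] run(int threads, int docsPerThread, Runnable renderOneDoc) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            long start = System.nanoTime();
            for (int t = 0; t < threads; t++) {
                futures.add(pool.submit(() -> {
                    for (int i = 0; i < docsPerThread; i++) {
                        renderOneDoc.run();
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // wait for completion and surface worker failures
            }
            double elapsedMs = (System.nanoTime() - start) / 1_000_000.0;
            int total = threads * docsPerThread;
            // avgDocMs is wall-clock time per document, i.e. 1000 / docsPerSec.
            return new double[] { total, total * 1000.0 / elapsedMs, elapsedMs / total };
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("benchmark worker failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Stand-in CPU-bound workload instead of a real document render.
        Runnable fakeDocument = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 50_000; i++) {
                sb.append(i);
            }
        };
        for (int threads : new int[] { 1, 2, 4, 8 }) {
            double[] r = run(threads, 12, fakeDocument);
            System.out.printf("%d threads: %.0f docs, %.2f/s, %.2f ms/doc%n",
                    threads, r[0], r[1], r[2]);
        }
    }
}
```

Note that this version includes pool start-up in the timed region for simplicity; with only 12 docs per thread that overhead can be visible, so a more careful harness would pre-start the workers.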

Linear scalability

scalability suite, simple docs.

| Threads | Total docs | Throughput |
| --- | --- | --- |
| 1 | 100 | 807.41/s |
| 2 | 200 | 1,960.75/s |
| 4 | 400 | 3,839.64/s |
| 8 | 800 | 7,394.56/s |
| 16 | 1,600 | 11,164.76/s |

13.8× throughput at 16 threads — the engine has no global synchronisation in the hot path.
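The speedup figure is the throughput at N threads divided by the single-thread throughput, and dividing that by N gives the parallel efficiency. A small sketch using the rows of the scalability table (the class and method names are hypothetical):

```java
/** Hypothetical helper: speedup and parallel efficiency from throughput figures. */
final class SpeedupSketch {

    /** Throughput relative to the single-thread baseline. */
    static double speedup(double docsPerSec, double baselineDocsPerSec) {
        return docsPerSec / baselineDocsPerSec;
    }

    /** Speedup divided by thread count; 1.0 would be perfectly linear scaling. */
    static double efficiency(double docsPerSec, double baselineDocsPerSec, int threads) {
        return speedup(docsPerSec, baselineDocsPerSec) / threads;
    }

    public static void main(String[] args) {
        int[] threads = { 1, 2, 4, 8, 16 };
        // Throughput in docs/sec, copied from the scalability table.
        double[] docsPerSec = { 807.41, 1960.75, 3839.64, 7394.56, 11164.76 };
        for (int i = 0; i < threads.length; i++) {
            System.out.printf("%2d threads: %.2fx speedup, %.0f%% efficiency%n",
                    threads[i],
                    speedup(docsPerSec[i], docsPerSec[0]),
                    100.0 * efficiency(docsPerSec[i], docsPerSec[0], threads[i]));
        }
    }
}
```

Efficiency above 100 % at low thread counts is possible in a table like this when the single-thread baseline run is itself slowed by cold-start effects, so the baseline is worth checking before reading too much into the ratios.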

Stress test

50-thread pool, 5,000 documents, single run:

Successful: 5000
Errors:     0
Time:       2499 ms

~2,000 docs/sec sustained under contention, zero failures.
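A stress run of this kind can be sketched with a fixed pool and two atomic counters. This is an illustrative outline under assumptions, not the project's stress suite: `StressSketch`, its `run` method, and the dummy workload are hypothetical, and only the pool size, document count, and output labels are taken from the text above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntConsumer;

/** Hypothetical stress harness: many documents pushed through a fixed pool. */
final class StressSketch {

    /** Submits `documents` tasks to a pool of `poolSize`; returns {successful, errors, elapsedMs}. */
    static long[] run(int poolSize, int documents, IntConsumer renderDoc) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger ok = new AtomicInteger();
        AtomicInteger failed = new AtomicInteger();
        long start = System.nanoTime();
        for (int i = 0; i < documents; i++) {
            final int docId = i;
            pool.execute(() -> {
                try {
                    renderDoc.accept(docId);
                    ok.incrementAndGet();
                } catch (RuntimeException e) {
                    failed.incrementAndGet(); // count the failure, keep the run going
                }
            });
        }
        pool.shutdown(); // stop accepting work, let queued tasks finish
        try {
            if (!pool.awaitTermination(5, TimeUnit.MINUTES)) {
                throw new IllegalStateException("stress run timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("stress run interrupted", e);
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        return new long[] { ok.get(), failed.get(), elapsedMs };
    }

    public static void main(String[] args) {
        // Stand-in workload; a real run would render one document per call.
        long[] r = run(50, 5_000, docId -> Integer.toString(docId).hashCode());
        System.out.println("Successful: " + r[0]);
        System.out.println("Errors:     " + r[1]);
        System.out.println("Time:       " + r[2] + " ms");
    }
}
```

Catching failures per task rather than letting them kill the worker is what makes a "zero failures" line meaningful: every document is attempted and either counted as a success or as an error.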

Comparative benchmark

Simple invoice-class document, 100 measurement iterations.

| Library | Avg ms | Avg heap MB | Notes |
| --- | --- | --- | --- |
| iText 5 | 1.57 | 0.16 | low-level page primitives |
| GraphCompose v1.4 | 2.45 | 0.16 | semantic DSL + pagination |
| JasperReports | 4.45 | 0.19 | XML-template based engine |

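GraphCompose sits between low-level PDF generators (iText 5) and template engines (JasperReports): in this run it averaged about 1.6× the per-document latency of iText 5 and about 0.55× that of JasperReports, with the same average heap as iText 5, while exposing a fully semantic Java DSL with deterministic snapshots.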

Engine-only timings

The GraphComposeBenchmark and FullCvBenchmark mains below were retired after v1.4. Equivalent timings now come from the CurrentSpeedBenchmark engine-simple scenario and the JMH TemplateCvJmhBenchmark.

  • GraphComposeBenchmark (engine-only, no PDF render): avg 1.04 ms, p50 0.97 ms, p95 1.64 ms.
  • FullCvBenchmark (full CV template, including render): avg 4.14 ms, p50 3.80 ms, p95 6.37 ms.