AD Library Benchmarks

April 12, 2026

Quant-finance benchmarks for C++ automatic differentiation libraries. Four workloads (Heston Monte Carlo, SABR calibration, XVA CVA, and a LIBOR swaption) are timed under finite differences, XAD, CppAD, Adept 2, and autodiff. Reproducible from source.

[Figure: benchmark chart]

GCC 13.3, Intel Xeon Platinum 8488C, Ubuntu 24.04, -O3 -mavx2 -mfma, 10K MC paths. Canonical CSV: results/results.csv.

Results

Gradient time is the cost of computing all sensitivities; Sensis is the number of sensitivities N computed per gradient. Bold = fastest gradient time in row. The Primal column is the cost of evaluating the same workload once with raw doubles (no AD machinery), so the gap between Primal and each AAD library shows the recording-and-reverse-sweep overhead, and FD ≈ (N+1) × Primal, as expected for forward-difference bumping.

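| # | Benchmark | Sensis | Primal | FD | XAD | XAD-Codegen | CppAD | Adept | autodiff |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Heston MC | 8 | 9.3 ms | 83 ms | 40 ms | **21 ms** | 268 ms | 91 ms | DNF |
| 2 | SABR Calib | 15 | 2.1 ms | 33 ms | 8.4 ms | **4.9 ms** | 32 ms | 19 ms | 38 ms |
| 3 | XVA CVA | 40 | 591 ms | 24.1 s | 2.6 s | **0.57 s** | 8.0 s | 7.1 s | DNF |
| 4 | LIBOR Swaption | 161 | 138 ms | 21.6 s | 1.00 s | **0.31 s** | 4.57 s | 1.15 s | DNF |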

Each figure is the median of 10 measured iterations after warmup. The AAD libraries run in reverse mode, except autodiff, whose SABR result uses forward dual mode. The Primal column is the median of the AAD libraries' raw-double primal benchmarks, which all run the same underlying numerical kernel as the gradient timings (no RNG cost asymmetries). The sanity check holds: FD/Primal ≈ N+1 in all four rows (about 8.9, 15.7, 40.8, and 157 against N+1 = 9, 16, 41, and 162).
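The FD ≈ (N+1) × Primal relationship follows from how forward-difference bumping works: one base valuation plus one re-valuation per bumped input. The sketch below is a minimal, self-contained illustration and is not taken from the benchmark sources; `FdResult`, `forward_difference_gradient`, `toy_payoff`, and the default bump size are hypothetical names and choices made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Sensitivities plus the number of primal evaluations needed to produce them.
struct FdResult {
    std::vector<double> grad;
    std::size_t evals;
};

// Forward-difference gradient of a scalar function f at x.
// Cost: one base evaluation plus one bumped evaluation per input (N + 1 total).
inline FdResult forward_difference_gradient(
    const std::function<double(const std::vector<double>&)>& f,
    std::vector<double> x,  // by value, so bumps stay local to this call
    double bump = 1e-6)     // assumed bump size, not the suite's setting
{
    assert(bump > 0.0);
    FdResult r{std::vector<double>(x.size()), 0};

    const double base = f(x);  // evaluation 1: unbumped inputs
    ++r.evals;

    for (std::size_t i = 0; i < x.size(); ++i) {
        const double saved = x[i];
        x[i] = saved + bump;               // bump input i only
        r.grad[i] = (f(x) - base) / bump;  // evaluations 2 .. N+1
        ++r.evals;
        x[i] = saved;                      // restore before the next bump
    }
    return r;
}

// Hypothetical toy function standing in for a pricing kernel: sum of squares.
inline double toy_payoff(const std::vector<double>& x) {
    double s = 0.0;
    for (double v : x) s += v * v;
    return s;
}
```

Called with 8 inputs, as in the Heston row, this routine evaluates `f` 9 times, which is why the FD column is expected to sit close to 9 × Primal there.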

Observations

  • Adjoint AD vs finite differences. FD scales O(N) with input count, so its gap to the fastest AAD result (XAD-Codegen) widens from ~4× on Heston (8 inputs) to ~70× on LIBOR (161 inputs). FD is fine for spot checks but quickly becomes the bottleneck once you need more than a handful of sensitivities.
  • Tape libraries cluster within an order of magnitude. XAD's tape mode is the fastest tape library on every benchmark, ahead of the runner-up Adept by ~1.15× on LIBOR, ~2.3× on Heston and SABR, and ~2.7× on XVA CVA. CppAD is the slowest of the three on all four benchmarks, at most ~6.7× behind XAD (Heston).
  • XAD-Codegen compiles the recorded graph to AVX2 native code at runtime and is roughly 1.7×–4.6× faster than XAD's own tape mode across the four benchmarks (1.7× on SABR, 4.6× on XVA CVA). Switching from tape to Codegen requires expressing data-dependent branches via xad::less(a,b).If(then,else) so the recorded graph is branch-free; no other code changes are needed outside the per-path kernel.
  • autodiff completes only 1 of 4 benchmarks (SABR, using forward dual mode). Forward mode is O(N) in input count and the alternative var reverse mode allocates a fresh heap-based expression tree on every gradient call — neither scales to MC pricing or larger calibrations. autodiff isn't a peer for the AAD workloads benchmarked here.
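The branch-free requirement mentioned for XAD-Codegen can be illustrated with a toy recorded graph. The sketch below is not the XAD or xad-codegen API; `Graph`, `Op`, `record_with_if`, and `record_with_select` are hypothetical names. It assumes a record-once, replay-many model: a plain `if` is decided while recording and frozen into the graph, whereas a select node keeps both outcomes and chooses between them at replay time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy recorded graph for illustration only (not the XAD or xad-codegen API).
enum class Op { Input, Const, Sub, SelectLess };

struct Node {
    Op op;
    double value;            // payload for Const nodes
    std::size_t a, b, c, d;  // operand indices into Graph::nodes
};

struct Graph {
    std::vector<Node> nodes;

    std::size_t add(Node n) { nodes.push_back(n); return nodes.size() - 1; }
    std::size_t input() { return add({Op::Input, 0.0, 0, 0, 0, 0}); }
    std::size_t constant(double v) { return add({Op::Const, v, 0, 0, 0, 0}); }
    std::size_t sub(std::size_t x, std::size_t y) {
        return add({Op::Sub, 0.0, x, y, 0, 0});
    }
    // Result is (lhs < rhs) ? if_true : if_false, decided at replay time.
    std::size_t select_less(std::size_t lhs, std::size_t rhs,
                            std::size_t if_true, std::size_t if_false) {
        return add({Op::SelectLess, 0.0, lhs, rhs, if_true, if_false});
    }

    // Re-evaluates the recorded graph for a new value of the single input.
    double replay(double x, std::size_t out) const {
        assert(out < nodes.size());
        std::vector<double> v(nodes.size(), 0.0);
        for (std::size_t i = 0; i < nodes.size(); ++i) {
            const Node& n = nodes[i];
            switch (n.op) {
                case Op::Input:      v[i] = x; break;
                case Op::Const:      v[i] = n.value; break;
                case Op::Sub:        v[i] = v[n.a] - v[n.b]; break;
                case Op::SelectLess: v[i] = (v[n.a] < v[n.b]) ? v[n.c] : v[n.d]; break;
            }
        }
        return v[out];
    }
};

// Call payoff max(S - K, 0) recorded with a C++ branch: only the side taken
// at recording time ends up in the graph.
inline std::size_t record_with_if(Graph& g, double spot_at_record_time, double strike) {
    const std::size_t s = g.input();
    const std::size_t k = g.constant(strike);
    if (spot_at_record_time > strike) return g.sub(s, k);
    return g.constant(0.0);
}

// Same payoff recorded as a select node: both sides are in the graph.
inline std::size_t record_with_select(Graph& g, double strike) {
    const std::size_t s = g.input();
    const std::size_t k = g.constant(strike);
    const std::size_t zero = g.constant(0.0);
    const std::size_t diff = g.sub(s, k);
    return g.select_less(k, s, diff, zero);  // (K < S) ? S - K : 0
}
```

Recorded at a spot of 120 with strike 100, the `if` version replays correctly at 120 but returns -20 at a spot of 80, because the zero branch was never recorded. The select version returns 0 there, which is the behaviour a compiled, reusable graph needs.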

Libraries

| Library | Modes available | Recording approach |
|---|---|---|
| XAD | Forward & Adjoint, higher-order | Tape-based; optional xad-codegen backend compiles the recorded graph to AVX2 native code |
| CppAD | Forward & Reverse, higher-order | Tape-based ADFun record/replay |
| Adept 2 | Forward & Reverse | Expression templates with stack recording |
| autodiff | Forward (dual) & Reverse (var) | Compile-time dual numbers / runtime expression tree |

All four libraries support both forward and reverse modes. The suite exercises reverse mode, the standard choice for many-inputs/one-output workloads such as risk and pricing; the one exception is autodiff, which runs SABR in forward dual mode.
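To see why reverse mode suits many-inputs/one-output workloads, consider a toy tape: the function is recorded once, and a single backward sweep then yields the sensitivity to every input. The sketch below is a minimal illustration under that assumption and does not reproduce the API or internals of any of the four libraries; `Tape`, `Var`, and the overloaded operators are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Toy tape for illustration only (not the API of any benchmarked library).
struct Tape {
    // Each slot stores two parent slots and the local partial derivatives
    // of this slot's value with respect to them.
    struct Entry { std::size_t p0, p1; double d0, d1; };
    std::vector<Entry> entries;

    std::size_t push(std::size_t p0, double d0, std::size_t p1, double d1) {
        entries.push_back({p0, p1, d0, d1});
        return entries.size() - 1;
    }
    // Inputs point at themselves with zero weights, so the sweep ignores them.
    std::size_t new_input() {
        const std::size_t i = entries.size();
        return push(i, 0.0, i, 0.0);
    }
    // One reverse sweep from `output`; returns the adjoint of every slot.
    std::vector<double> gradient(std::size_t output) const {
        assert(output < entries.size());
        std::vector<double> adj(entries.size(), 0.0);
        adj[output] = 1.0;
        for (std::size_t i = output + 1; i-- > 0;) {
            const Entry& e = entries[i];
            adj[e.p0] += e.d0 * adj[i];
            adj[e.p1] += e.d1 * adj[i];
        }
        return adj;
    }
};

// Active value: a double plus its slot on the tape.
struct Var {
    Tape* tape;
    std::size_t idx;
    double val;
};

inline Var operator+(Var a, Var b) {
    return {a.tape, a.tape->push(a.idx, 1.0, b.idx, 1.0), a.val + b.val};
}
inline Var operator*(Var a, Var b) {
    return {a.tape, a.tape->push(a.idx, b.val, b.idx, a.val), a.val * b.val};
}
inline Var exp(Var a) {
    const double e = std::exp(a.val);
    return {a.tape, a.tape->push(a.idx, e, a.idx, 0.0), e};
}
```

For example, recording `y = exp(x0 * x1) + x2 * x2` and calling `gradient(y.idx)` once fills in the adjoints of `x0`, `x1`, and `x2` together. The cost of that sweep is proportional to the number of recorded operations, not to the number of inputs, which is the property the gradient timings rely on.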

Build & run

cmake -B build -GNinja -DCMAKE_BUILD_TYPE=Release
cmake --build build
./build/ad_benchmarks

To enable XAD-Codegen results (xad-codegen is commercially licensed):

cmake -B build -GNinja -DCMAKE_BUILD_TYPE=Release \
  -DXAD_DIR=/path/to/xad \
  -DXAD_CODEGEN_DIR=/path/to/xad-codegen \
  -DENABLE_XAD_JIT=ON

Run ./build/ad_benchmarks --help for CLI options (--paths, --iters, --warmup, --csv, --only, --skip).

To regenerate the chart from a CSV:

python scripts/plot_results.py results/results.csv results/chart.png

Methodology

  • Identical compiler flags across libraries: -O3 -mavx2 -mfma (GCC/Clang) or /O2 /arch:AVX2 /fp:fast (MSVC).
  • Idiomatic APIs. Each library uses its own recommended pattern: XAD reverse-mode tape (xad::adj<double>), CppAD ADFun record/replay, Adept Stack recording, autodiff dual forward mode for SABR. No micro-optimizations applied to one library that wouldn't be applied to another.
  • Median of measured iterations after warmup; warmup excluded.
  • During development, all four libraries' gradients were checked against finite differences and agreed within numerical tolerance.
  • Same machine, same run. Re-running on a different machine scales all rows by roughly the same factor.
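The "median of measured iterations after warmup" rule can be sketched as a small timing helper. This is an illustration and not the suite's harness: `median`, `median_time_ms`, the use of `std::chrono::steady_clock`, and the averaging of the two middle samples for even counts are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Median of a sample set. The copy is sorted; for an even count the two
// middle samples are averaged (an assumption, not the suite's stated rule).
inline double median(std::vector<double> samples) {
    assert(!samples.empty());
    std::sort(samples.begin(), samples.end());
    const std::size_t n = samples.size();
    return (n % 2 == 1) ? samples[n / 2]
                        : 0.5 * (samples[n / 2 - 1] + samples[n / 2]);
}

// Runs `work` for `warmup` untimed iterations, then `iters` timed ones, and
// returns the median wall-clock time of the timed iterations in milliseconds.
inline double median_time_ms(const std::function<void()>& work,
                             std::size_t warmup, std::size_t iters) {
    using clock = std::chrono::steady_clock;

    for (std::size_t i = 0; i < warmup; ++i) work();  // excluded from results

    std::vector<double> ms;
    ms.reserve(iters);
    for (std::size_t i = 0; i < iters; ++i) {
        const auto t0 = clock::now();
        work();
        const auto t1 = clock::now();
        ms.push_back(std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    return median(ms);
}
```

Reporting the median instead of the mean keeps a single slow iteration (for example, one hit by a scheduler hiccup) from shifting the reported figure.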

Contributing

PRs and issues welcome — especially:

  • More AD libraries (open an issue or PR with a wrapper following the pattern in xad/, cppad/, adept/, autodiff/).
  • More finance kernels (Bermudan/American MC, multi-curve bootstrapping, equity vol surface fitters, real PFE / ECL).
  • Methodology improvements or fairer wirings for any of the existing libraries.

Acknowledgements

The LIBOR swaption benchmark is adapted from Prof. Mike Giles' canonical adjoint LMM C++ code. Thanks to the maintainers of CppAD, Adept 2, and autodiff — a meaningful comparison is only possible because their libraries exist.

License

Copyright © 2026 Xcelerit Computing Ltd. Licensed under the MIT License. See CITATION.cff for citation metadata.