dtoa benchmark

April 30, 2026

This project is a rewrite of Milo Yip’s dtoa-benchmark with an updated set of algorithms reflecting the current state of the art and a simplified workflow.

Introduction

This benchmark measures the performance of converting double-precision IEEE-754 floating-point values (double) to ASCII strings. Each implementation exposes a function with the signature:

char* dtoa(double value, char* buffer);

that writes a textual representation of value into buffer and returns a pointer to one past the last written character. The resulting string must round-trip: parsing it back through a correct strtod must yield exactly the original double.

Note: dtoa is not a standard C or C++ function.
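As an illustration of this contract, here is a minimal sketch of a conforming (but slow) implementation together with a round-trip check. The names dtoa_baseline and round_trips are hypothetical and not part of this project, and the 32-byte buffer size is an assumption. The sketch relies on 17 significant digits always being enough to round-trip a double, and the check is meant for finite values only.

```cpp
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Hypothetical baseline, not part of the benchmark: formats with 17
// significant digits, which is always enough to round-trip a double.
// Assumes the caller provides at least 32 bytes.
char* dtoa_baseline(double value, char* buffer) {
    int length = std::snprintf(buffer, 32, "%.17g", value);
    return buffer + length;  // one past the last written character
}

// Returns true if the text produced for a finite value parses back to
// exactly the same bit pattern (so -0.0 and 0.0 are distinguished).
bool round_trips(double value) {
    char buffer[32];
    char* end = dtoa_baseline(value, buffer);
    *end = '\0';  // the dtoa contract does not promise a terminator
    double parsed = std::strtod(buffer, nullptr);
    return std::memcmp(&parsed, &value, sizeof value) == 0;
}
```

Comparing bit patterns rather than using == keeps the sign of zero significant, which is why the sketch uses memcmp.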

Procedure

The benchmark runs in two phases:

  1. Correctness verification. Every implementation is validated against a set of edge cases and 100,000 random double values (excluding ±inf and NaN) to confirm round-trip correctness.

  2. Performance measurement. For each implementation the benchmark runs:

    • 17 per-digit sub-benchmarks. Each converts a pool of 100,000 random double values reduced to a fixed precision of 1–17 significant decimal digits. These produce the time vs. digit count chart.
    • One mixed benchmark over a single shuffled pool containing all 1.7M values from the per-digit pools combined. Its mean time per conversion is reported as the headline Time (ns) in the results table; this is the metric to use for an at-a-glance comparison.

    Iteration counts and statistical stabilization are handled by Google Benchmark.
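One plausible way to build a per-digit pool is sketched below: draw random bit patterns, discard non-finite values, and round each value to N significant decimal digits by printing and re-parsing it. This is an assumption about the approach, not the benchmark's actual generator; reduce_precision and make_pool are hypothetical names.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <random>
#include <vector>

// Rounds value to `digits` significant decimal digits (1 to 17) by
// printing it in scientific notation and parsing the text back.
double reduce_precision(double value, int digits) {
    char text[40];
    std::snprintf(text, sizeof text, "%.*e", digits - 1, value);
    return std::strtod(text, nullptr);
}

// Builds a pool of `count` finite doubles with random bit patterns,
// each reduced to `digits` significant decimal digits.
std::vector<double> make_pool(std::size_t count, int digits,
                              std::uint64_t seed) {
    std::mt19937_64 rng(seed);
    std::vector<double> pool;
    pool.reserve(count);
    while (pool.size() < count) {
        std::uint64_t bits = rng();
        double value;
        std::memcpy(&value, &bits, sizeof value);
        if (!std::isfinite(value)) continue;  // skip ±inf and NaN
        double reduced = reduce_precision(value, digits);
        // Rounding up near the largest double can overflow to infinity.
        if (!std::isfinite(reduced)) continue;
        pool.push_back(reduced);
    }
    return pool;
}
```

Under this scheme the mixed pool would be the concatenation of make_pool(100000, n, seed) for n from 1 to 17, shuffled once.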

Build and Run

cmake .
make run-benchmark

Results are written in Google Benchmark's JSON format to:

results/<cpu>_<os>_<compiler>_<commit>.json

and automatically converted to a self-contained HTML report with the same base name. The JSON context block carries CPU/cache info, library version, and commit_hash/machine/os/compiler keys for downstream analysis.

Results

The following results were measured on a MacBook Pro (Apple M1 Pro) using:

  • Compiler: Apple clang version 21.0.0 (clang-2100.0.123.102)
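  • OS: macOS

| Method | Time (ns) | Speedup vs. sprintf |
|---|---|---|
| zmij | 6.45 | 115.440x |
| xjb64 | 6.99 | 106.465x |
| yy | 24.63 | 30.235x |
| dragonbox | 28.95 | 25.723x |
| fmt | 36.84 | 20.214x |
| uscale | 45.86 | 16.239x |
| ryu | 46.07 | 16.164x |
| to_chars | 51.35 | 14.503x |
| schubfach | 53.62 | 13.889x |
| double-conversion | 87.43 | 8.518x |
| sprintf | 744.72 | 1.000x |
| ostringstream | 885.30 | 0.841x |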

Figure: time per double (smaller is better).

ostringstream and sprintf omitted; they are an order of magnitude slower than the rest.

Figure: time vs. digit count (log scale).

Notes

  • null performs no conversion and measures loop + call overhead.
  • sprintf and ostringstream do not generate shortest representations (e.g. 0.1 → 0.10000000000000001).
  • ryu, dragonbox, and schubfach always emit exponential notation (e.g. 0.1 → 1E-1).
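The difference between a fixed 17-digit format and a shortest round-trip format can be seen with the standard library alone. The sketch below is illustrative only: format_17g and format_shortest are hypothetical helper names, and the second one assumes a C++17 standard library that implements the floating-point overloads of std::to_chars.

```cpp
#include <charconv>
#include <cstddef>
#include <cstdio>
#include <string>

// Always prints 17 significant digits, as the sprintf entry does.
std::string format_17g(double value) {
    char buffer[32];
    int length = std::snprintf(buffer, sizeof buffer, "%.17g", value);
    return std::string(buffer, static_cast<std::size_t>(length));
}

// Prints the shortest text that still round-trips, using std::to_chars.
std::string format_shortest(double value) {
    char buffer[32];
    auto result = std::to_chars(buffer, buffer + sizeof buffer, value);
    return std::string(buffer, result.ptr);
}
```

For 0.1, format_17g returns 0.10000000000000001 while format_shortest returns 0.1. Both strings parse back to the same double.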

Additional benchmark results are available in the results directory and can be viewed online.

Methods

| Method | Description |
|---|---|
| asteria | rocket::ascii_numput::put_DD |
| double-conversion | EcmaScriptConverter::ToShortest which implements Grisu3 with bignum fallback |
| dragonbox | jkj::dragonbox::to_chars_n with the full cache table |
| fmt | fmt::format_to with compile-time format strings (uses Dragonbox) |
| null | no-op implementation; measures benchmark loop overhead |
| ostringstream | std::ostringstream with setprecision(17) |
| ryu | d2s_buffered |
| schubfach | C++ Schubfach implementation |
| sprintf | C sprintf("%.17g", value) |
| to_chars | std::to_chars |
| yy | yy_double_to_string from yyjson |
| zmij | zmij::write |
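Each method has to be wrapped to match the dtoa signature. As a rough sketch of what such adapters might look like for the two standard-library entries (the benchmark's real wrappers may differ; the function names and the 32-byte buffer bound are assumptions):

```cpp
#include <charconv>
#include <cstring>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical adapter for the ostringstream entry.
char* dtoa_ostringstream(double value, char* buffer) {
    std::ostringstream stream;
    stream << std::setprecision(17) << value;
    std::string text = stream.str();
    std::memcpy(buffer, text.data(), text.size());
    return buffer + text.size();  // one past the last written character
}

// Hypothetical adapter for the to_chars entry. Assumes the caller's
// buffer holds at least 32 bytes, enough for any shortest double.
char* dtoa_to_chars(double value, char* buffer) {
    return std::to_chars(buffer, buffer + 32, value).ptr;
}
```

The ostringstream adapter constructs a stream and copies a temporary string on every call, which is consistent with it being the slowest entry in the results table.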

Notes

std::to_string is excluded because it does not guarantee round-trip correctness (until C++26).
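To see why, recall that before C++26 std::to_string(double) was specified to produce the same text as sprintf with "%f", that is, six digits after the decimal point. The sketch below reproduces that format with snprintf rather than calling std::to_string, so its behavior does not depend on the language mode; fixed_six_digits and survives_fixed_six_digits are hypothetical names.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <string>

// Stand-in for pre-C++26 std::to_string(double): same text as "%f".
std::string fixed_six_digits(double value) {
    char buffer[400];  // "%f" of the largest double needs about 320 bytes
    int length = std::snprintf(buffer, sizeof buffer, "%f", value);
    return std::string(buffer, static_cast<std::size_t>(length));
}

// Returns true if the "%f" text parses back to the original value.
bool survives_fixed_six_digits(double value) {
    return std::strtod(fixed_six_digits(value).c_str(), nullptr) == value;
}
```

For example, 1e-7 is formatted as 0.000000, which parses back as zero, so the original value is lost.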

Why is fast dtoa important?

Floating-point formatting is ubiquitous in text output. Standard facilities such as sprintf and std::stringstream are often slow. This benchmark originated from performance work in RapidJSON.

See Also