xjb: Fast Float to String Algorithm

May 8, 2026 · View on GitHub

xjb is a high-performance algorithm for converting floating-point numbers (float/double) to strings, designed for speed and correctness. It provides single-file implementations for both general use and JSON-specific formatting.

Background

In the field of floating-point number printing, Steele & White established several key principles in their seminal paper "How to Print Floating-Point Numbers Accurately":

Information Preservation: The conversion must produce a decimal string that, when parsed back, yields the original floating-point number or the closest possible value.
Shortest Representation: Use the minimal number of digits necessary to uniquely identify the floating-point number.
Proper Rounding: When multiple representations are equally valid, choose the one that rounds correctly.

1. Key Features

High Performance: Optimized algorithm with low branch prediction failure rate and SIMD acceleration
Correctness: Satisfies RFC8259 and ECMAScript standards for JSON number formatting
Small Footprint: Compact lookup tables for both float and double conversions
Versatility: Support for both float (IEEE754-binary32) and double (IEEE754-binary64) types
Modern Design: Low instruction dependency and high IPC (Instructions Per Cycle)

2. Implementation Files

2.1 Core Implementations

General float/double to string: src/ftoa.cpp
JSON-specific formatting: src/ftoa_json.cpp (satisfies RFC8259 and ECMAScript standards)

2.2 Algorithm Variants

Float to Decimal Conversion Algorithms

xjb32: For float (IEEE754-binary32) - bench/xjb/float_to_decimal/xjb32_i.cpp
xjb64: For double (IEEE754-binary64) - bench/xjb/float_to_decimal/xjb64_i.cpp

Float to String Conversion Algorithms

Full lookup table: bench/xjb/float_to_string/ftoa.cpp
Compressed lookup table: bench/xjb/float_to_string/ftoa_comp.cpp

2.3 Lookup Table Sizes

Type	Full Table (ftoa.cpp)	Compressed Table (ftoa_comp.cpp)
float	1352 bytes	64 bytes
double	15336 bytes	368 bytes

3. Usage Examples

Here are some examples of how the double to string algorithm handles various inputs:

Input	Output
123.45	"123.45"
1000	"1000.0"
123	"123.0"
123000	"123000.0"
1.2e23	"1.2e23"
1e100	"1e100"
0.0123	"0.0123"
0.001	"0.001"
0.000123	"0.000123"
1.2e-08	"1.2e-08"
0	"0.0"
NaN	"nan"
Inf	"inf"

4. Algorithm Characteristics

xjb is based on the schubfach algorithm and inspired by other state-of-the-art algorithms like yy, dragonbox, and grisu. Key characteristics include:

Dual Precision Support: Handles both float and double types
Optimized Process: Streamlined algorithm flow for maximum performance
Low Branch Misprediction: Designed to minimize pipeline stalls
SIMD Acceleration: Leverages SIMD instruction sets for faster processing
High Instruction Level Parallelism: Low dependency between instructions
Efficient Instruction Count: Minimizes the number of operations

5. Algorithm Principle

The following algorithmic process is derived from a variant of the Schubfach algorithm.

algorithm : The xjb Algorithm for Float-to-Decimal Conversion

Input : $c,q$ ; Floating-point components $c$ (significand) and $q$ (exponent)
Output : $d,k$ ; Decimal representation $d \cdot 10^k$ satisfying the SW principle

$c \cdot 2^q \gets v$
$\text{if } v \text{ is regular}$
$\quad k \gets \lfloor q \cdot \log_{10}(2) \rfloor$
$\text{else}$
$\quad k \gets \lfloor q \cdot \log_{10}(2) - \log_{10}(4/3) \rfloor$
$\text{end if}$
$m \gets \lfloor v \cdot 10^{-k-1} \rfloor$
$n \gets v \cdot 10^{-k-1} - m$
$ten \gets 10m$
$\delta \gets 10n - \lfloor 10n \rfloor$ // fractional part of $10n$
$\text{if } \delta = 0.5$
$\quad \text{if } \lfloor 10n \rfloor \bmod 2 = 0$
$\quad\quad one \gets \lfloor 10n \rfloor$ // round to even
$\quad \text{else}$
$\quad\quad one \gets \lfloor 10n \rfloor + 1$
$\quad \text{end if}$
$\text{elif } \delta < 0.5$
$\quad one \gets \lfloor 10n \rfloor$ // round to nearest
$\text{else}$
$\quad one \gets \lfloor 10n \rfloor + 1$ // round to nearest
$\text{end if}$
$\text{if } v \text{ is irregular}$
$\quad \text{if } \delta > 2^{q-2} \cdot 10^{-k}$
$\quad\quad one \gets \lfloor 10n \rfloor + 1$
$\quad \text{end if}$
$\quad \text{if } 2^{q-2} \cdot 10^{-k-1} \geq n$
$\quad\quad one \gets 0$
$\quad \text{end if}$
$\text{else}$
$\quad \text{if } 2^{q-1} \cdot 10^{-k-1} > n \text{ or } \left( 2^{q-1} \cdot 10^{-k-1} = n \text{ and } c \bmod 2 = 0 \right)$
$\quad\quad one \gets 0$ // minimum length
$\quad \text{end if}$
$\text{end if}$
$\text{if } 2^{q-1} \cdot 10^{-k-1} > 1-n \text{ or } \left( 2^{q-1} \cdot 10^{-k-1} = 1-n \text{ and } c \bmod 2 = 0 \right)$
$\quad one \gets 10$ // minimum length
$\text{end if}$
$d \gets ten + one$ // information preservation
$\text{return } d, k$

Algorithm Principle Introduction Document : xjb
Note : There is a mistake in Algorithm 1 on page 10 of the article. Please refer to the above implementation.

Our algorithm produced the same output results as the Schubfach algorithm. To verify the correctness of the algorithm, run the following command:

cd bench
make check

6. Benchmark Results

Benchmark tests are located in the bench directory. Build the benchmark program with make and run bench/main.cpp to test performance.

Latest benchmark results (as of 2026.4.13 v1.5.0) on various CPUs:

on Apple M5:

ramdom float on Apple M5
compiler: apple clang 21.0.0

ramdom double on Apple M5
compiler: apple clang 21.0.0

fixed-length double on Apple M5
compiler: apple clang 21.0.0

on Apple M1:

ramdom float on Apple M1
compiler: apple clang 21.0.0

ramdom double on Apple M1
compiler: apple clang 21.0.0

fixed-length double on Apple M1
compiler: apple clang 21.0.0

on AMD R7-7840H:

ramdom float on AMD R7-7840H
compiler: icpx 2025.0.4

ramdom double on AMD R7-7840H
compiler: icpx 2025.0.4

fixed-length double on AMD R7-7840H
compiler: icpx 2025.0.4

For detailed benchmark results and charts, please refer to the bench directory.

Algorithm	Description
schubfach	Raffaello Giulietti's algorithm. Original source: https://github.com/c4f7fcce9cb06515/Schubfach
schubfach_xjb	Improved version of schubfach with identical output results
ryu	Ulf Adams's Ryū algorithm
dragonbox	Junekey Jeon's C++ implementation
fmt	Victor Zverovich's formatting library (version: 12.1.0)
yy_double	yy's algorithm from https://github.com/ibireme/c_numconv_benchmark/blob/master/vendor/yy_double/yy_double.c
yy_json	yy's algorithm for JSON
teju	Algorithm with upcoming academic paper for correctness proof
xjb	XiangJunBo's algorithm (this project)
schubfach_vitaut	Vitaut's implementation of schubfach
zmij	Vitaut's algorithm
jnum	Jing Leng's algorithm
uscale	Russ Cox's algorithm ,document

8. Acknowledgements

Special thanks to the following contributors and inspirations:

Yaoyuan Guo (@ibireme) - Author of yyjson and yy_double algorithms, provided benchmark data and test code. This project was inspired by the yy algorithm.
Dougall Johnson (@dougallj) - Authored the NEON implementation used in xjb.
Daniel Lemire (@lemire) - Authored the AVX512IFMA implementation for converting integers to decimal strings, used in xjb.
Raffaello Giulietti (@rgiulietti) - Author of the Schubfach algorithm, which forms the foundation of xjb.
Victor Zverovich (@vitaut) - Author of the zmij algorithm and fmt library. Parts of this implementation are derived from zmij.

9. Projects Using xjb

ssrJSON - A SIMD-boosted high-performance Python JSON parsing library
jsoniter-scala - Scala macros for compile-time generation of fast JSON codecs