C++ Serialization Benchmark

March 2, 2026

This repository has been forked from the original to add the Lite³ format to the benchmarks.

| Name | Serialize + Deserialize | Deserialize | Serialize | Traverse | Deserialize and traverse | Message size |
|---|---|---|---|---|---|---|
| Cap’n Proto | 66.55 ms | 0 ms | 66.55 ms | 210.1 ms | 211 ms | 50.5093 MB |
| cereal | 229.16 ms | 98.76 ms | 130.4 ms | 79.17 ms | 180.7 ms | 37.829 MB |
| Cista++ (offset) | 913.2 ms | 274.1 ms | 639.1 ms | 79.59 ms | 80.02 ms | 176.378 MB |
| Cista++ (offset slim) | 3.96 ms | 0.17 ms | 3.79 ms | 79.99 ms | 80.46 ms | 25.317 MB |
| Cista++ (raw) | 947.4 ms | 289.2 ms | 658.2 ms | 81.53 ms | 113.3 ms | 176.378 MB |
| Flatbuffers | 1887.49 ms | 41.69 ms | 1845.8 ms | 90.53 ms | 90.35 ms | 62.998 MB |
| Lite³ Buffer API | 7.79 ms | 4.77 ms | 3.02 ms | 79.39 ms | 84.92 ms | 38.069 MB |
| Lite³ Context API | 7.8 ms | 4.76 ms | 3.04 ms | 79.59 ms | 84.13 ms | 38.069 MB |
| zpp::bits | 4.66 ms | 1.9 ms | 2.76 ms | 78.66 ms | 81.21 ms | 37.8066 MB |

Benchmark data:

This benchmark requires that g++-11 be installed:

sudo apt update
sudo apt install g++-11

To replicate this benchmark, run:

git clone https://github.com/fastserial/cpp-serialization-benchmark.git
cd cpp-serialization-benchmark/
git submodule update --init --recursive
mkdir build
cd build
export CXX=/usr/bin/g++-11
cmake -DCMAKE_BUILD_TYPE=Release ..
make

A single benchmark run can now be performed like so:

./cpp-serialization-benchmark

However, to produce more consistent results, first switch the CPU frequency governor to "performance" so that frequency scaling introduces less variance:

sudo apt update
sudo apt install linux-cpupower
sudo cpupower frequency-set -g performance
cpupower frequency-info

You should see:

    The governor "performance" may decide which speed to use
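A benchmark binary could also check the governor itself and warn before measuring. The following C++ sketch is not part of this repository; the function names are hypothetical, and it assumes the Linux cpufreq sysfs layout, where each CPU exposes its current governor in /sys/devices/system/cpu/cpuN/cpufreq/scaling_governor.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Extracts the governor name from the contents of a scaling_governor file,
// ignoring surrounding whitespace such as the trailing newline.
// Returns an empty string if the contents are empty.
std::string parse_governor(std::string const& contents) {
  std::istringstream in(contents);
  std::string governor;
  in >> governor;
  return governor;
}

// Returns true if CPU `cpu` currently uses the "performance" governor.
// Assumes the Linux cpufreq sysfs layout; returns false if the file cannot
// be opened (e.g. no cpufreq support, or not running on Linux).
bool cpu_uses_performance_governor(int cpu) {
  assert(cpu >= 0);
  std::ifstream file("/sys/devices/system/cpu/cpu" + std::to_string(cpu) +
                     "/cpufreq/scaling_governor");
  if (!file) {
    return false;
  }
  std::string contents;
  std::getline(file, contents);
  return parse_governor(contents) == "performance";
}
```

A benchmark's main function could call cpu_uses_performance_governor(0) before the first measurement and print a warning when it returns false.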

The OS can also introduce variance by scheduling threads inconsistently across NUMA domains. To prevent this, pin both the process and its memory allocations to a single NUMA node. In addition, performing multiple runs instead of a single one makes the results more consistent.

This command appends the CPU and NUMA topology, followed by the output of 10 benchmark runs, to output.txt:

lscpu >> output.txt && \
numactl -H >> output.txt && \
for i in $(seq 1 10); do
  numactl --cpunodebind=0 --membind=0 ./cpp-serialization-benchmark >> output.txt || break
done
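With several runs collected, a per-benchmark summary is more informative than any single run. The following C++ sketch is hypothetical and not part of the repository: it assumes the times of one benchmark have already been extracted from output.txt (that parsing step is not shown) and reports the median, mean, sample standard deviation, and coefficient of variation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct run_stats {
  double median;
  double mean;
  double stddev;  // sample standard deviation (divides by n - 1)
  double cv;      // coefficient of variation: stddev / mean
};

// Summarizes the per-run times of a single benchmark, e.g. one value from
// each run appended to output.txt. Requires at least two samples.
run_stats summarize(std::vector<double> samples) {
  assert(samples.size() >= 2);
  std::sort(samples.begin(), samples.end());
  std::size_t const n = samples.size();

  // Median: middle element, or the average of the two middle elements.
  double const median = n % 2 == 1
                            ? samples[n / 2]
                            : (samples[n / 2 - 1] + samples[n / 2]) / 2.0;

  double sum = 0.0;
  for (double s : samples) sum += s;
  double const mean = sum / static_cast<double>(n);

  double squared = 0.0;
  for (double s : samples) squared += (s - mean) * (s - mean);
  double const stddev = std::sqrt(squared / static_cast<double>(n - 1));

  return {median, mean, stddev, mean != 0.0 ? stddev / mean : 0.0};
}
```

The median is less sensitive to a single disturbed run than the mean, and a high coefficient of variation suggests the machine is still too noisy for a fair comparison.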

Original README:

C++ Serialization Benchmark

This benchmark suite accompanies the public release of the Cista++ serialization library.

This repository contains benchmarks for C++ (binary & high-performance) serialization libraries. The goal was to create a benchmark based on a non-trivial data structure. In this case, we serialize, deserialize and traverse a graph (nodes and edges). Since the goal was to have a data structure containing pointers, we chose an "object oriented" representation of a graph instead of a simple adjacency matrix. Some frameworks do not support cyclic data structures. Thus, instead of having node pointers in the edge object, we just reference the start and destination nodes by their index. Benchmarks are based on the Google Benchmark framework.
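A minimal C++ sketch of such an "object oriented" graph follows. The type and member names (graph, node, edge, add_node, add_edge, count_reachable) are hypothetical, not the benchmark's actual definitions. Nodes hold pointers to edges owned by the graph, while each edge stores its start and destination node as indices, so the objects contain no pointer cycles even when the graph itself is cyclic. The breadth-first traversal is chosen for illustration and is not necessarily the traversal the benchmark performs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <queue>
#include <string>
#include <utility>
#include <vector>

struct edge {
  std::uint32_t from_;  // index of the start node in graph::nodes_
  std::uint32_t to_;    // index of the destination node in graph::nodes_
};

struct node {
  std::uint32_t id_;
  std::string name_;
  std::vector<edge*> out_edges_;  // non-owning pointers to graph::edges_
};

struct graph {
  std::vector<std::unique_ptr<node>> nodes_;
  std::vector<std::unique_ptr<edge>> edges_;

  std::uint32_t add_node(std::string name) {
    auto const id = static_cast<std::uint32_t>(nodes_.size());
    nodes_.push_back(std::make_unique<node>(node{id, std::move(name), {}}));
    return id;
  }

  void add_edge(std::uint32_t from, std::uint32_t to) {
    assert(from < nodes_.size() && to < nodes_.size());
    edges_.push_back(std::make_unique<edge>(edge{from, to}));
    nodes_[from]->out_edges_.push_back(edges_.back().get());
  }
};

// Breadth-first traversal from `start`; returns the number of reachable
// nodes. Cycles terminate because visited nodes are tracked by index.
std::size_t count_reachable(graph const& g, std::uint32_t start) {
  assert(start < g.nodes_.size());
  std::vector<bool> visited(g.nodes_.size(), false);
  std::queue<std::uint32_t> pending;
  visited[start] = true;
  pending.push(start);
  std::size_t count = 0;
  while (!pending.empty()) {
    auto const current = pending.front();
    pending.pop();
    ++count;
    for (edge const* e : g.nodes_[current]->out_edges_) {
      if (!visited[e->to_]) {
        visited[e->to_] = true;
        pending.push(e->to_);
      }
    }
  }
  return count;
}
```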

This repository compares the following C++ binary serialization libraries:

Other Benchmarks

Build & Execute

To run the benchmarks, you need a C++17-compatible compiler and CMake. Tested on Mac OS X (Linux should work, too).

git clone --recursive https://github.com/felixguendling/cpp-serialization-benchmark.git
cd cpp-serialization-benchmark
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make
./cpp-serialization-benchmark

Results

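| Library | Serialize | Deserialize | Fast Deserialize | Traverse | Deserialize & Traverse | Size |
|---|---|---|---|---|---|---|
| Cap’n Proto | 76 ms | 0.00 ms | 0.0 ms | 216 ms | 221 ms | 50.5M |
| cereal | 216 ms | 111.00 ms | - | 67 ms | 174 ms | 37.8M |
| Cista++ offset | 4 ms | 0.16 ms | 0.0 ms | 67 ms | 66 ms | 25.3M |
| Cista++ raw | 650 ms | 24.80 ms | 24.8 ms | 66 ms | 91 ms | 176.4M |
| Flatbuffers | 1409 ms | 35.70 ms | 0.0 ms | 75 ms | 75 ms | 63.0M |
| zpp_bits | 4 ms | 6.58 ms | 6.6 ms | 65 ms | 72 ms | 37.8M |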

Cista++ offset describes the "slim" variant (where the edges use indices to reference source and target node instead of pointers).
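To illustrate the difference between the two edge layouts, here is a hypothetical C++ sketch that converts pointer-based edges into index-based ones. The names node, ptr_edge, idx_edge and to_index_edges are illustrative only and are not part of the Cista++ API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical node type for illustration only.
struct node {
  std::uint32_t id_;
};

// Pointer layout: an edge points directly at its nodes, so a serializer
// has to translate each pointer into a position-independent form.
struct ptr_edge {
  node const* from_;
  node const* to_;
};

// Index ("slim") layout: an edge stores positions in the node array, so it
// is plain integer data that stays valid after the array is copied.
struct idx_edge {
  std::uint32_t from_;
  std::uint32_t to_;
};

// Converts pointer edges into index edges. Precondition: every pointer in
// `edges` points into `nodes`.
std::vector<idx_edge> to_index_edges(std::vector<node> const& nodes,
                                     std::vector<ptr_edge> const& edges) {
  std::vector<idx_edge> out;
  out.reserve(edges.size());
  for (auto const& e : edges) {
    auto const from = static_cast<std::size_t>(e.from_ - nodes.data());
    auto const to = static_cast<std::size_t>(e.to_ - nodes.data());
    assert(from < nodes.size() && to < nodes.size());
    out.push_back({static_cast<std::uint32_t>(from),
                   static_cast<std::uint32_t>(to)});
  }
  return out;
}
```

Because idx_edge holds only integers, a copied or relocated node array can be indexed with the same edges unchanged, whereas ptr_edge values would still point into the original array.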

Exact results can be found here.

Benchmarks were run on Ubuntu 20.04 on an AMD Ryzen 9 5900X, compiled with GCC 11.

Compilation Times

Compilation times include code generation but exclude building the code generators and static libraries (Cap’n Proto, Flatbuffers).

| Library | clang-7 on Mac OS X |
|---|---|
| Cap’n Proto | 0.440s |
| cereal | 1.827s |
| Cista++ raw | 1.351s |
| Flatbuffers | 0.857s |

Contribute

Found a mistake or bug, or want to contribute new benchmarks? Feel free to open an issue or pull request! 😃