runtime-benchmarks

May 20, 2026

Benchmarks to compare the performance of async runtimes / executors.

An interactive view of the full results dataset is available at: https://fleetcode.com/runtime-benchmarks/

Results summary table (64 cores / 64 threads):

| Runtime | citor | libfork | TooManyCooks | tbb | taskflow | cppcoro | coros | HPX | concurrencpp | libcoro |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Mean Ratio to Best (lower is better) | 1.06x | 1.18x | 1.32x | 3.25x | 3.34x | 3.67x | 4.83x | 186.81x | 258.19x | 2513.96x |
| skynet | 24932 us | 36519 us | 37949 us | 142887 us | 185019 us | 144907 us | 94532 us | 11965484 us | 12514888 us | 119686148 us |
| nqueens | 80765 us | 69010 us | 70248 us | 125730 us | 114721 us | 162773 us | 741141 us | 3452638 us | 8000700 us | 33347402 us |
| fib(39) | 49966 us | 62367 us | 86731 us | 215592 us | 156590 us | 269384 us | 187136 us | 10789290 us | 20663744 us | 238019884 us |
| matmul(2048) | 48069 us | 44140 us | 44822 us | 50127 us | 50173 us | 49684 us | 46780 us | 59066 us | 57383 us | 374412 us |

| Runtime | TooManyCooks | cobalt | libcoro | cppcoro |
| --- | --- | --- | --- | --- |
| Mean Ratio to Best (lower is better) | 1.00x | 1.16x | 1.42x | 1.55x |
| io_socket_st | 339340 us | 393036 us | 483074 us | 524717 us |

| Runtime | TooManyCooks_mt | TooManyCooks_st_asio | libcoro_mt | cobalt_st_asio |
| --- | --- | --- | --- | --- |
| Mean Ratio to Best (lower is better) | 1.00x | 1.00x | 1.05x | 2.33x |
| channel | 390771 us | 391661 us | 409967 us | 910778 us |
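
The "Mean Ratio to Best" row appears to be the arithmetic mean, over the benchmarks in a table, of each runtime's time divided by the fastest time for that benchmark. This formula is inferred from the published numbers rather than taken from the benchmark scripts, so treat the following as a minimal sketch. It uses only the first three columns of the fork-join table, which happen to contain the fastest time for every benchmark; the function name `mean_ratio_to_best` is hypothetical.

```python
# Times in microseconds, copied from the first three columns of the
# fork-join summary table. The fastest time for each benchmark is in
# one of these columns, so the ratios match the full table.
times_us = {
    "skynet":       {"citor": 24932, "libfork": 36519, "TooManyCooks": 37949},
    "nqueens":      {"citor": 80765, "libfork": 69010, "TooManyCooks": 70248},
    "fib(39)":      {"citor": 49966, "libfork": 62367, "TooManyCooks": 86731},
    "matmul(2048)": {"citor": 48069, "libfork": 44140, "TooManyCooks": 44822},
}


def mean_ratio_to_best(times):
    """Average, over benchmarks, of (runtime's time / best time for that benchmark)."""
    runtimes = next(iter(times.values())).keys()
    totals = {runtime: 0.0 for runtime in runtimes}
    for per_runtime in times.values():
        best = min(per_runtime.values())  # fastest runtime on this benchmark
        for runtime, elapsed in per_runtime.items():
            totals[runtime] += elapsed / best
    return {runtime: total / len(times) for runtime, total in totals.items()}


ratios = mean_ratio_to_best(times_us)
for runtime, ratio in ratios.items():
    print(f"{runtime}: {ratio:.2f}x")
```

Under this assumption the sketch prints 1.06x, 1.18x, and 1.32x for citor, libfork, and TooManyCooks, matching the table.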

Peak Memory Usage (Max RSS) (64 cores / 64 threads):

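| Runtime | citor | libfork | TooManyCooks | TooManyCooks_st_asio | TooManyCooks_mt | tbb | taskflow | cppcoro | coros | concurrencpp | HPX | libcoro | libcoro_mt | cobalt_st_asio | cobalt |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| skynet | 20.74 MB | 11.36 MB | 13.99 MB | N/A | N/A | 14.29 MB | 10.91 MB | 134.09 MB | 10.94 MB | 11.38 MB | 24.87 GB | 14.66 GB | N/A | N/A | N/A |
| nqueens | 51.01 MB | 10.51 MB | 14.12 MB | N/A | N/A | 12.66 MB | 10.88 MB | 134.09 MB | 9.18 MB | 13.04 MB | 11.23 GB | 5.03 GB | N/A | N/A | N/A |
| fib(39) | 44.5 MB | 11.52 MB | 12.62 MB | N/A | N/A | 11.69 MB | 9.1 MB | 134.13 MB | 9.44 MB | 11.29 MB | 15.59 GB | 15.81 GB | N/A | N/A | N/A |
| matmul(2048) | 68.8 MB | 60.43 MB | 60.07 MB | N/A | N/A | 59.35 MB | 56.17 MB | 186.36 MB | 58.55 MB | 61.16 MB | 103.89 MB | 56.37 MB | N/A | N/A | N/A |
| io_socket_st | N/A | N/A | 10.62 MB | N/A | N/A | N/A | N/A | 9.36 MB | N/A | N/A | N/A | 10.91 MB | N/A | N/A | 10.95 MB |
| channel | N/A | N/A | N/A | 25.76 MB | 32.99 MB | N/A | N/A | N/A | N/A | N/A | N/A | N/A | 9.95 MB | 7.68 MB | N/A |
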
Machine configuration used in the summary tables:
  • Processor: EPYC 7742 64-core processor
  • Worker Thread Count: 64 (no SMT)
  • OS: Debian 13 Server
  • Compiler: Clang 21.1.7 Release (-O3 -march=native)
  • CPU boost enabled / schedutil governor
  • Linked against libtcmalloc_minimal.so.4
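
For readers who want to reproduce a Max RSS figure for a single benchmark binary, one common approach on Unix-like systems is to run the binary as a child process and read `ru_maxrss` from `getrusage`. This is only a sketch of that general technique; it is not necessarily how the benchmark scripts collect their numbers, and the function name `child_peak_rss_mb` is hypothetical.

```python
import resource
import subprocess
import sys


def child_peak_rss_mb(cmd):
    """Run cmd to completion and return the peak RSS of child processes in MB.

    RUSAGE_CHILDREN reports the largest peak among all children this
    process has waited for so far, so use a fresh Python process for
    each measurement.
    """
    subprocess.run(cmd, check=True)
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    # ru_maxrss is in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


if __name__ == "__main__":
    # Stand-in command; replace it with the path to a benchmark binary.
    print(f"{child_peak_rss_mb([sys.executable, '-c', 'pass']):.2f} MB")
```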

What's covered?

The suite currently includes only C++ frameworks and several recursive fork-join benchmarks:

  • recursive fibonacci (forks x2)
  • skynet, increased from the original benchmark to 100M tasks (forks x10)
  • nqueens (forks up to x14)
  • matmul (forks x4)

It also includes some miscellaneous benchmarks:

  • channel - tests the performance of the library's async MPMC queue
  • io_socket_st - tests TCP ping-pong between a single-threaded client and single-threaded server
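
To make the "forks xN" notation of the fork-join benchmarks concrete, here is a sequential Python stand-in for two of them. It only illustrates the shape of the recursion: the real benchmarks are written in C++ against each runtime's own task API, and the parameter names and the small problem sizes used here are assumptions chosen for illustration.

```python
def fib(n: int) -> int:
    """Naive recursive Fibonacci: each call above the base case forks 2 children."""
    if n < 2:
        return n
    # A task-based runtime would fork these two calls and join on both results.
    return fib(n - 1) + fib(n - 2)


def skynet(first_leaf: int, leaf_count: int, fanout: int = 10) -> int:
    """Sum the leaf indices in [first_leaf, first_leaf + leaf_count).

    leaf_count must be a power of fanout. Each non-leaf node forks
    `fanout` children and sums their results.
    """
    if leaf_count == 1:
        return first_leaf  # a leaf simply returns its own index
    child_size = leaf_count // fanout
    # A task-based runtime would fork all children, then join and sum.
    return sum(
        skynet(first_leaf + i * child_size, child_size, fanout)
        for i in range(fanout)
    )


print(fib(10))          # 55
print(skynet(0, 1000))  # 499500, the sum of 0..999
```

Because the per-task work is a single addition, these workloads mostly measure the cost of creating, scheduling, and joining tasks rather than the computation itself.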

Benchmark problem sizes were chosen to balance two goals: keeping the total runtime of a full sweep tolerable (especially on weaker hardware with slower runtimes), and being large enough to show meaningful differences between the faster runtimes.

How to build and run the benchmarks yourself

Note: if you have issues with a particular runtime, you can simply remove it from line 17 of build_and_bench_all.py to skip it.

Install Dependencies:

  • The build+bench script uses python3. The only Python dependency is libyaml.
  • CMake + Clang 18 or newer
  • libfork and TooManyCooks depend on the hwloc library.
  • TBB benchmarks depend on a system-installed TBB. See the TBB installation guide for the newest version; alternatively, your system package manager may offer the older 'libtbb-dev' package.
  • HPX and boost::cobalt require Boost 1.82 or newer. You may need to build Boost from source, since cobalt is currently not included in distro packages.
  • A high performance allocator (tcmalloc, jemalloc, or mimalloc) is also recommended. The build script will dynamically link to any of these if they are available.

On Debian/Ubuntu: sudo apt-get install cmake hwloc libhwloc-dev intel-oneapi-tbb-devel libtcmalloc-minimal4

On macOS: brew install cmake gperftools hwloc libyaml tbb
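
Before starting a long run, it can help to confirm that the basic tools are on your PATH and that Clang meets the "18 or newer" requirement. The following pre-flight check is a hypothetical helper, not part of the repository. The binary names (`cmake`, `clang`, `python3`) are assumptions; your system may use a versioned name such as `clang-21`, and Apple's Clang uses its own version numbering.

```python
import re
import shutil
from typing import List, Optional, Sequence

MIN_CLANG_MAJOR = 18  # from the dependency list above


def clang_major_version(version_output: str) -> Optional[int]:
    """Extract the major version from the text printed by `clang --version`."""
    match = re.search(r"clang version (\d+)\.", version_output)
    return int(match.group(1)) if match else None


def missing_tools(tools: Sequence[str] = ("cmake", "clang", "python3")) -> List[str]:
    """Return the tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    print("Missing tools:", missing_tools() or "none")
    sample = "Debian clang version 21.1.7"  # sample text, not real output
    major = clang_major_version(sample)
    print("Clang OK:", major is not None and major >= MIN_CLANG_MAJOR)
```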

Get Quick Results (uses threads = #CPUs):

NOTE: If a particular library or benchmark fails to build or run, don't worry - its output will simply be ignored.

python3 ./build_and_bench_all.py

Results will appear in RESULTS.md and RESULTS.csv files.

Get Full Results (sweeps threads from 1 to #CPUs):

python3 ./build_and_bench_all.py full

Results will also appear in a RESULTS.json file, which the interactive benchmarks site can parse. A locally viewable HTML version of the chart is generated as well.

Benchmark a Single Runtime (sweeps threads from 1 to #CPUs):

git-ref can be a SHA, tag, or branch:

  ./build_and_bench_all.py <runtime> [git-ref]
  ./build_and_bench_all.py compare <runtime> <new-git-ref> [baseline-git-ref]
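
If you want to drive these two invocations from another Python script, the argument lists can be assembled as below. This is a small sketch that mirrors the usage lines above and nothing more: the helper names are hypothetical, and the runtime name and git refs in the example are placeholders that may not match what the script accepts.

```python
from typing import List, Optional

SCRIPT = "./build_and_bench_all.py"


def single_runtime_cmd(runtime: str, git_ref: Optional[str] = None) -> List[str]:
    """Build: ./build_and_bench_all.py <runtime> [git-ref]"""
    cmd = [SCRIPT, runtime]
    if git_ref is not None:
        cmd.append(git_ref)
    return cmd


def compare_cmd(
    runtime: str, new_ref: str, baseline_ref: Optional[str] = None
) -> List[str]:
    """Build: ./build_and_bench_all.py compare <runtime> <new-git-ref> [baseline-git-ref]"""
    cmd = [SCRIPT, "compare", runtime, new_ref]
    if baseline_ref is not None:
        cmd.append(baseline_ref)
    return cmd


# Placeholder runtime name and refs, for illustration only.
print(single_runtime_cmd("libfork"))
print(compare_cmd("libfork", "my-branch", "main"))
```

Each list can then be passed to `subprocess.run(cmd, check=True)` from the repository root.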

Future Plans

Frameworks to come:

Benchmarks to come:

  • Some inspiration here