runtime-benchmarks

March 9, 2026

Benchmarks to compare the performance of async runtimes / executors.

An interactive view of the full results dataset is available at: https://fleetcode.com/runtime-benchmarks/

Results summary table (64 cores / 64 threads):

| Runtime | libfork | TooManyCooks | tbb | taskflow | cppcoro | coros | HPX | concurrencpp | libcoro |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Mean Ratio to Best (lower is better) | 1.00x | 1.11x | 2.82x | 2.98x | 3.53x | 4.45x | 160.93x | 170.80x | 2238.69x |
| skynet | 39639 us | 42512 us | 146884 us | 200196 us | 156739 us | 110734 us | 14654199 us | 12085877 us | 153184034 us |
| nqueens | 78579 us | 83539 us | 161880 us | 183805 us | 186797 us | 883579 us | 4498900 us | 8252158 us | 43830994 us |
| fib(39) | 67668 us | 84565 us | 272178 us | 203514 us | 438185 us | 171781 us | 14550913 us | 18381070 us | 305949459 us |
| matmul(2048) | 41733 us | 43626 us | 62264 us | 62783 us | 54275 us | 50580 us | 72222 us | 68116 us | 465260 us |

| Runtime | TooManyCooks_st_asio | TooManyCooks_mt | libcoro_mt | cobalt_st_asio |
| --- | --- | --- | --- | --- |
| Mean Ratio to Best (lower is better) | 1.00x | 1.02x | 1.55x | 3.77x |
| channel | 365842 us | 374115 us | 565826 us | 1379967 us |

| Runtime | TooManyCooks | cobalt | cppcoro | libcoro |
| --- | --- | --- | --- | --- |
| Mean Ratio to Best (lower is better) | 1.00x | 1.12x | 1.45x | 1.48x |
| io_socket_st | 393705 us | 441244 us | 569703 us | 582490 us |
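The "Mean Ratio to Best" row appears to be the arithmetic mean, across benchmarks, of each runtime's time divided by the fastest time for that benchmark. This reading is inferred from the published numbers, not taken from the benchmark script. The sketch below recomputes the row for three of the runtimes using timings copied from the summary table; the names `TIMES_US` and `mean_ratio_to_best` are illustrative only.

```python
from statistics import mean

# Timings in microseconds, copied from the 64-thread summary table
# (only three runtimes are included to keep the example short).
TIMES_US = {
    "skynet": {"libfork": 39639, "TooManyCooks": 42512, "tbb": 146884},
    "nqueens": {"libfork": 78579, "TooManyCooks": 83539, "tbb": 161880},
    "fib(39)": {"libfork": 67668, "TooManyCooks": 84565, "tbb": 272178},
    "matmul(2048)": {"libfork": 41733, "TooManyCooks": 43626, "tbb": 62264},
}


def mean_ratio_to_best(times_us):
    """Average, per runtime, of (time / fastest time) over all benchmarks.

    Assumption: this mirrors how the summary row is derived; the real
    script may differ in details such as which runtimes count as "best".
    """
    ratios = {}
    for per_runtime in times_us.values():
        best = min(per_runtime.values())
        for runtime, elapsed in per_runtime.items():
            ratios.setdefault(runtime, []).append(elapsed / best)
    return {runtime: mean(values) for runtime, values in ratios.items()}


if __name__ == "__main__":
    for runtime, ratio in mean_ratio_to_best(TIMES_US).items():
        print(f"{runtime}: {ratio:.2f}x")
```

For these three runtimes the result is 1.00x, 1.11x and 2.82x, which matches the corresponding entries in the table.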

Peak Memory Usage (Max RSS) (64 cores / 64 threads):

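| Runtime | libfork | TooManyCooks | tbb | taskflow | cppcoro | coros | concurrencpp | HPX | libcoro | cobalt |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| skynet | 10.14 MB | 11.4 MB | 12.66 MB | 7.62 MB | 134.03 MB | 9.43 MB | 11.03 MB | 24.81 GB | 16.19 GB | N/A |
| nqueens | 13.15 MB | 14.11 MB | 11.08 MB | 8.44 MB | 134.07 MB | 9.31 MB | 11.02 MB | 11.18 GB | 4.98 GB | N/A |
| fib(39) | 10.02 MB | 11.49 MB | 11.13 MB | 11.93 MB | 134.07 MB | 11.02 MB | 11.32 MB | 16.27 GB | 16.47 GB | N/A |
| matmul(2048) | 60.88 MB | 63.63 MB | 58.52 MB | 59.4 MB | 186.33 MB | 58.47 MB | 61.1 MB | 109.14 MB | 56.14 MB | N/A |
| io_socket_st | N/A | 13.14 MB | N/A | N/A | 9.3 MB | N/A | N/A | N/A | 7.8 MB | 9.06 MB |
| channel | N/A | 33.0 MB | N/A | N/A | N/A | N/A | N/A | N/A | 11.01 MB | 10.92 MB |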

Machine configuration used in the summary tables:
  • Processor: EPYC 7742 64-core processor
  • Worker Thread Count: 64 (no SMT)
  • OS: Debian 13 Server
  • Compiler: Clang 21.1.7 Release (-O3 -march=native)
  • CPU boost enabled / schedutil governor
  • Linked against libtcmalloc_minimal.so.4

What's covered?

Currently, only C++ frameworks are covered. The core of the suite is a set of recursive fork-join benchmarks:

  • recursive fibonacci (forks x2)
  • skynet (based on the original benchmark), but increased to 100M tasks (forks x10)
  • nqueens (forks up to x14)
  • matmul (forks x4)
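To give a sense of how much work these fork factors generate, here is a rough sketch that counts the nodes in the recursion trees. It assumes a naive recursive fib with base case n < 2, and it assumes the skynet "100M tasks" figure refers to the leaves of a uniform 10-way tree (depth 8). Neither detail is confirmed by the benchmark sources, and the helper names are hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def fib_calls(n):
    """Invocations made by a naive recursive fib(n) that forks x2.

    Assumption: the recursion bottoms out at n < 2 with no serial cutoff.
    """
    if n < 2:
        return 1
    return 1 + fib_calls(n - 1) + fib_calls(n - 2)


def uniform_tree_nodes(fanout, depth):
    """Total nodes in a tree where every non-leaf node forks `fanout` children."""
    return sum(fanout ** level for level in range(depth + 1))


if __name__ == "__main__":
    # fib(39): roughly 2 * 10^8 invocations under the assumption above
    print("fib(39) invocations:", fib_calls(39))
    # skynet: 10^8 leaves at depth 8, plus all interior nodes (assumed shape)
    print("skynet tree nodes:", uniform_tree_nodes(10, 8))
```

Under these assumptions each run creates on the order of a hundred million or more very small tasks, so the measured time is dominated by per-task scheduling overhead, not by the arithmetic itself.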

It also includes some miscellaneous benchmarks:

  • channel - tests the performance of the library's async MPMC queue
  • io_socket_st - tests TCP ping-pong between a single-threaded client and single-threaded server
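For readers unfamiliar with the term, an MPMC (multi-producer, multi-consumer) queue is one that several tasks push into while several other tasks pop from it concurrently. The sketch below shows that traffic pattern using Python's `asyncio.Queue` on a single thread. It is only an illustration of the pattern, not the channel benchmark itself, and the function names, task counts, and capacity are all assumptions.

```python
import asyncio


async def producer(queue, start, count):
    # Push `count` consecutive integers into the shared queue.
    for value in range(start, start + count):
        await queue.put(value)


async def consumer(queue, totals, index):
    # Pop items until a None sentinel arrives, summing what was received.
    while True:
        item = await queue.get()
        if item is None:
            return
        totals[index] += item


async def run_channel(producers=4, consumers=4, per_producer=1000, capacity=64):
    queue = asyncio.Queue(maxsize=capacity)  # bounded, so producers can block
    totals = [0] * consumers
    consumer_tasks = [
        asyncio.create_task(consumer(queue, totals, i)) for i in range(consumers)
    ]
    await asyncio.gather(
        *(producer(queue, p * per_producer, per_producer) for p in range(producers))
    )
    # One sentinel per consumer shuts the consumers down after all real items.
    for _ in range(consumers):
        await queue.put(None)
    await asyncio.gather(*consumer_tasks)
    return sum(totals)


if __name__ == "__main__":
    print("sum of all received items:", asyncio.run(run_channel()))
```

Checking that the sum of received items equals the sum of sent items is a simple way to confirm that no element was lost or duplicated, independent of which consumer received which item.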

Benchmark problem sizes were chosen to balance two goals: keeping the total runtime of a full sweep tolerable (especially on weaker hardware with slower runtimes), and being large enough to show meaningful differentiation between the faster runtimes.

How to build and run the benchmarks yourself

Note: if you have issues with a particular runtime, you can simply remove it from line 17 of build_and_bench_all.py to skip it.

Install Dependencies:

  • The build+bench script uses python3. The only Python dependency is libyaml.
  • CMake + Clang 18 or newer
  • libfork and TooManyCooks depend on the hwloc library.
  • TBB benchmarks depend on a system-installed TBB. See the TBB installation guide for the newest version, or you may be able to find the older 'libtbb-dev' package in your system package manager.
  • HPX and boost::cobalt require Boost 1.82 or newer. You may need to build Boost from source, since cobalt is currently not included in distro packages.
  • A high performance allocator (tcmalloc, jemalloc, or mimalloc) is also recommended. The build script will dynamically link to any of these if they are available.

On Debian/Ubuntu: sudo apt-get install cmake hwloc libhwloc-dev intel-oneapi-tbb-devel libtcmalloc-minimal4

On macOS: brew install cmake gperftools hwloc libyaml tbb

Get Quick Results (uses threads = #CPUs):

NOTE: If a particular library or benchmark fails to build or run, don't worry - its output will simply be ignored.

python3 ./build_and_bench_all.py

Results will appear in the RESULTS.md and RESULTS.csv files.

Get Full Results (sweeps threads from 1 to #CPUs):

python3 ./build_and_bench_all.py full

Results will also appear in the RESULTS.json file, which can be parsed by the interactive benchmarks site. A locally viewable version of the HTML chart will be generated as well.

Future Plans

Frameworks to come:

Benchmarks to come:

  • Some inspiration here