PROGRESS64

September 11, 2025 ยท View on GitHub

Progress by Design

Purpose

PROGRESS64 is a C library of scalable functions for parallel and concurrent programs. It provides functionality which is often required in multithreaded networking applications. The general goal is to provide primitives which enable scalable and preferably non-blocking concurrent applications.

A secondary purpose is to inform and inspire the use of the C11-based memory model (especially Release Consistency i.e. using load-acquire/store-release) for multithreaded programming.

Functionality

Non-blocking or concurrent functions

NameDescriptionProperties
antireplayreplay protectionlock-free/wait-free
blkringring buffer with blocking enqueue & dequeueblocking
buckringring buffer using pass-the-buck algorithmnon-blocking (1)
buckrobreorder buffer using pass-the-buck algorithmnon-blocking (1)
countershared countersreader obstruction-free, writer wait-free
cuckoohthash table - cuckoo with cellar, one-level movenon-blocking (1)
dequeMichael double ended queuelock-free
hashtablehash table - separate chaining with linked listslock-free
hazardptrsafe object reclamation using hazard pointersreader lock-free, writer blocking/non-blocking
hopscotchhash table - hopscotch with cellarnon-blocking (1)
mcasHarris/Fraser/Pratt multi-word CASlock-free
mcqueueMellor-Crummey concurrent queueblocking
msqueueMichael & Scott queue with configurable ABA workaround (lock/tag/smr)blocking/lock-free
laxrob'lax' reorder buffernon-blocking (1)
lfringring bufferlock-free
lfstacklock-free stack (using tagged pointers) with backoff and progress-in-update flaglock-free
linklistHarris single linked listlock-free
mbtriemulti-bit triereader lock-free/wait-free, writer non-blocking (1)
qsbrsafe object reclamation using quiescent state based reclamationreader wait-free, writer blocking
reassembleIP reassemblylock-free, resizeable
reorder'strict' reorder buffernon-blocking (1)
ringbufclassic ring buffer, support for user-defined element typeblocking & non-blocking (2), lock-free dequeue
stackTreiber stack with configurable ABA workaround (lock/tag/smr/llsc)blocking/lock-free
timertimerslock-free

"Obstruction-free", "lock-free" and "wait-free" have the standard definitions from computer science.

(1) Non-blocking but not (always) linearizable (2) Non-blocking within a limited window (currently 32 elements)

Locks & other blocking functions

NameDescriptionProperties
barrierthread barrierblocking
clhlockCLH queue lockmutex, fcfs, queue
hemlockhemlock queue lockmutex, fcfs, queue
mcslockMCS queue lockmutex, fcfs, queue
pfrwlockphase fair reader/writer lockrw, fcfs
rplockreciprocating lockmutex, queue
rwclhlockreader/writer CLH queue lockrw, fcfs, queue, sleep
rwlockreader/writer lock (writer preference)rw
rwlock_rrecursive version of rwlockrw, recursive
rwsynclightweight reader/writer synchronisation aka 'seqlock' (writer preference)rw
rwsync_rrecursive version of rwsyncrw, recursive
semaphorecounting semaphorerw, fcfs
skiplockskippable ticket-like lockmutex
spinlockbasic CAS-based spin lockmutex
tfrwlocktask fair reader/writer lockrw, fcfs
tfrwlock_rrecursive version of tfrw lockrw, fcfs, recursive
tktlockticket lockmutex, fcfs

"mutex" - mutual exclusion, only one thread at a time can acquire lock.
"rw" - multiple threads may concurrently acquire lock in reader (shared) mode.
"fcfs" - first come, first served. FCFS locks can be considered fair.
"queue" - each waiting thread spins on a separate location. Queue locks generally scale better with high lock contention.
"recursive" - the same thread can re-acquire the lock when it is already acquired.
"sleep" - waiting thread will sleep after spinning has timed out.

Concurrency support

NameDescriptionProperties
coroutinecoroutines using "crosscall"aarch64 only
fiberfibers using "crosscall"aarch64 only

Requirements

  • A C11 compiler (e.g. GCC or Clang) which supports the '__atomic' builtins and inline assembler. Several other GCC'isms are used as well.
  • Several functions require 64-bit and 128-bit atomics (e.g. CAS) support in the hardware.

HW/SW Support

  • Architectures
    • ARMv8/AArch64 (aka ARM64)
    • x86-64
  • Operating Systems
    • Linux
    • Windows
    • macOS (Darwin)

Usage

Use library through the provided C header files. Or copy source files into your own project. Use the 'verify' command to verify all or specific permutation(s) (interleavings) of two threads accessing a datatype.

Notes

  • Hazardptr and QSBR support one reclamation domain only. This is a trade-off that simplifies the API and usage. For QSBR, multiple reclamation domains do make sense...
  • The hazard pointer implementation is non-blocking (wait-free) when a thread has space for more retired objects than the total number of hazard pointers (for all threads).
  • The hazard pointer API will actually use the QSBR implementation when 'nrefs' (number of hazard pointers per thread) is set to 0 when the hazard pointer domain is allocated.
  • The resizeable reassembly function is experimental and has not yet endured stress testing.
  • The mbtrie is experimental and has not yet endured stress testing.
  • The hopscotch hash table is experimental and has not yet endured stress testing.
  • The cuckooht hash table is experimental and has not yet endured stress testing.
  • When using Safe Memory Reclamation as ABA workaround with the Treiber stack, LIFO order is not guaranteed (so not really a LIFO stack...)
  • The skiplock is a simplified version of a ring buffer based ticket-like lock.

TODO

  • Some missing examples
  • Multithreaded stress test programs for e.g. hash tables, reassembly, mbtrie
  • Use C11 features instead of GNU extensions and other non-standard features

License

SPDX BSD-3-Clause

Design

TODO

Author

Ola Liljedahl ola.liljedahl@arm.com

Background

Many of the solutions in PROGRESS64 were created when solving scalability problems in the 3GPP IP Transport function and when contributing to the OpenDataPlane project.