Real-Time Tuning Guide

May 28, 2026 · View on GitHub

Dora achieves 10-17x lower latency than ROS2 out of the box via zero-copy shared memory and Apache Arrow. For latency-sensitive robotics deployments (1-10ms control loops), additional OS-level tuning can reduce jitter further.

Quick Summary

TuningJitter ReductionDifficulty
CPU frequency governor -> performanceEliminates 50-200us transitionsEasy
CPU affinity (taskset)2-5x (avoids CPU migration)Easy
Memory locking (mlockall / --rt flag)Eliminates 100us+ page faultsEasy
Thread priority (SCHED_FIFO / --rt flag)Deterministic schedulingMedium
PREEMPT_RT kernelSub-100us worst-caseMedium
Disable transparent huge pagesAvoids THP compaction stallsEasy

Daemon --rt Flag

Dora provides an optional --rt flag that enables memory locking and real-time thread priority:

# Start daemon with real-time profile
dora daemon --rt

# Requires CAP_SYS_NICE + CAP_IPC_LOCK (or run as root)
sudo dora daemon --rt

This sets:

  • mlockall(MCL_CURRENT | MCL_FUTURE) — pin all memory, prevent page faults
  • SCHED_FIFO priority 50 — real-time scheduling for the main daemon thread

The setup writes RT: ... success/failure diagnostics to stderr unconditionally — even with --quiet. The flag affects structured log filtering, but a failed mlockall or SCHED_FIFO promotion is an operational warning that must always be visible.

Important: SCHED_FIFO applies only to the main thread (the one calling block_on). Tokio worker threads remain at SCHED_OTHER. For full RT coverage, use taskset + chrt to promote the entire process, or use isolcpus to dedicate cores.

Warning: SCHED_FIFO without guardrails can starve other processes. Ensure /proc/sys/kernel/sched_rt_runtime_us is set (default 950000 = 95%) to prevent total system hang.

Linux Kernel Tuning

CPU Frequency Governor

# Set all CPUs to performance mode (no frequency scaling)
echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

# Verify
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

Disable Transparent Huge Pages

echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
echo never | sudo tee /sys/kernel/mm/transparent_hugepage/defrag

PREEMPT_RT Kernel (Optional)

For sub-100us worst-case latency, use a PREEMPT_RT patched kernel:

# Ubuntu 22.04+
sudo apt install linux-lowlatency

# Debian
sudo apt install linux-image-rt-amd64

# Or build from source with CONFIG_PREEMPT_RT=y

Boot Parameters

# /etc/default/grub — add to GRUB_CMDLINE_LINUX:
clocksource=tsc tsc=reliable nohz_full=2-7 isolcpus=2-7 rcu_nocbs=2-7
  • nohz_full=2-7 — disable timer ticks on cores 2-7 (for real-time tasks)
  • isolcpus=2-7 — isolate cores from general scheduler
  • rcu_nocbs=2-7 — offload RCU callbacks from isolated cores

Process-Level Tuning

CPU Affinity

# Pin daemon to cores 2-3 (avoid core 0 for IRQs)
taskset -c 2,3 dora daemon

# Pin a specific node to core 4
taskset -c 4 ./my-node

# Verify affinity
taskset -p <pid>

Real-Time Priority

# Set SCHED_FIFO priority 50 (requires CAP_SYS_NICE)
chrt -f 50 dora daemon

# Set nice level (less aggressive than SCHED_FIFO)
nice -n -20 dora daemon

Memory Locking

# Increase memlock limit (required for SHM + mlockall)
ulimit -l unlimited

# Or in /etc/security/limits.conf:
dora-user  soft  memlock  unlimited
dora-user  hard  memlock  unlimited

Systemd Service Configuration

[Unit]
Description=Dora Daemon
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/bin/dora daemon --rt
User=dora
Group=dora

# CPU affinity: pin to cores 2-3
CPUAffinity=2 3

# Memory locking
LimitMEMLOCK=infinity

# Real-time scheduling (use instead of Nice when --rt is enabled)
LimitRTPRIO=99
# Note: Nice=-20 is ignored under SCHED_FIFO; omit when using --rt.

# Capabilities (instead of running as root)
AmbientCapabilities=CAP_SYS_NICE CAP_IPC_LOCK

# Restart on failure
Restart=always
RestartSec=1

[Install]
WantedBy=multi-user.target

Container Deployment (Docker / Kubernetes)

Docker

docker run \
  --cap-add SYS_NICE \
  --cap-add IPC_LOCK \
  --ulimit memlock=-1:-1 \
  --cpuset-cpus="2,3" \
  dora-image dora daemon --rt

Kubernetes

apiVersion: v1
kind: Pod
spec:
  containers:
    - name: dora-daemon
      command: ["dora", "daemon", "--rt"]
      resources:
        limits:
          cpu: "2"
          memory: "4Gi"
        requests:
          cpu: "2"     # Guaranteed QoS (no CPU throttling)
          memory: "4Gi"
      securityContext:
        capabilities:
          add: ["SYS_NICE", "IPC_LOCK"]

Tokio Runtime Tuning

# Control worker thread count (default: num_cpus)
dora daemon --worker-threads 4

Zenoh Transport Tuning

For distributed deployments, Zenoh transport settings affect latency:

// zenoh-config.json5
{
  "transport": {
    "unicast": {
      "lowlatency": true     // Disable batching for lower latency
    }
  }
}
ZENOH_CONFIG=zenoh-config.json5 dora daemon

Benchmarking Tips

# Pin benchmark to specific cores for reproducible results
taskset -c 2,3 dora run examples/benchmark/dataflow.yml --release

# Disable CPU frequency scaling before benchmarking
echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

# Use --stop-after for fixed-duration runs
dora run examples/benchmark/dataflow.yml --stop-after 30s

What Dora Cannot Guarantee

  • Hard real-time: Dora runs on Linux (not an RTOS). Even with PREEMPT_RT, worst-case latency is microseconds, not nanoseconds.
  • Deterministic allocation: The Rust standard allocator is not real-time safe. Consider jemalloc or mimalloc for more predictable allocation patterns.
  • Interrupt-free execution: Network interrupts, disk I/O, and kernel scheduler decisions can always cause jitter. Use isolcpus and IRQ affinity to minimize.

CI Coverage

Minimum gate (automated): tests/example-smoke.rs::smoke_daemon_rt_emits_setup_messages and smoke_daemon_rt_quiet_still_emits_setup_messages (both #[cfg(unix)]) assert that the RT branch fires and writes its diagnostics to stderr — under both default and --quiet flags. They do not require CAP_IPC_LOCK / CAP_SYS_NICE: the fallback RT: mlockall failed: … / RT: sched_setscheduler failed: … messages are accepted, which is exactly what an unprivileged GHA runner produces. This catches regressions where the RT block is bypassed, silenced, or reordered (the #1701 silent-fallback class of bug).

Full validation (manual Tier 2): Verifying that mlockall actually locked memory and SCHED_FIFO actually promoted the thread is still not automated, for two reasons:

  1. Privilege. --rt needs CAP_SYS_NICE and CAP_IPC_LOCK. GHA runners are unprivileged and refuse sudo for anything that touches scheduling or memory locking.
  2. Kernel config. SCHED_FIFO needs CONFIG_RT_GROUP_SCHED (and ideally a PREEMPT_RT kernel for meaningful guarantees). GHA runners use stock Ubuntu kernels.

A self-hosted runner with an RT-patched kernel could run this, but the cost-benefit is poor until there's evidence of regressions in the RT path. Tracked as #256.

Maintainer Validation Runbook

Run this end-to-end before a release when code in binaries/daemon/src/ or the scheduler/priority setup has changed. Requires root or CAP_SYS_NICE + CAP_IPC_LOCK.

Prerequisites

  • Linux with a reasonably modern kernel (5.x+ fine for soft-RT)
  • sudo or capabilities granted
  • dora CLI installed from the release candidate

Smoke the --rt flag

# 1. Start a coordinator and --rt daemon
dora coordinator &
sudo dora daemon --rt &

# 2. Wait for registration
sleep 3
dora list  # should show no dataflows, daemon registered

# 3. Verify the daemon has mlockall applied.
# VmLck should be non-zero and close to VmRSS if mlockall succeeded.
grep -E '^(VmRSS|VmLck):' /proc/$(pgrep -f 'dora daemon')/status

# 4. Verify SCHED_FIFO on the daemon's main thread.
chrt -p $(pgrep -f 'dora daemon')
# Expected: "scheduling policy: SCHED_FIFO" (not SCHED_OTHER)

Pass criteria:

  • VmLck > 0 (memory actually locked)
  • scheduling policy: SCHED_FIFO (priority class applied)

Smoke cpu_affinity (covered by Tier 1)

Already automated in the cpu-affinity-smoke nightly job — it runs the cpu-affinity-probe fixture and asserts the mask lands on the spawned process. If cpu-affinity-smoke is green, this part is covered. No manual step required.

Scope of --rt: daemon process only

--rt applies mlockall + SCHED_FIFO to the daemon process itself (see binaries/cli/src/command/daemon.rs). It does not propagate SCHED_FIFO to spawned node processes — the spawn path at binaries/daemon/src/spawn/prepared.rs:422 only applies cpu_affinity in the pre_exec hook; there is no sched_setscheduler call on the child.

This is by design: real-time scheduling on arbitrary user code is risky (a tight non-yielding loop under SCHED_FIFO can lock up a core until reboot). Nodes that need SCHED_FIFO must opt in explicitly, for example via thread-priority or a direct libc::sched_setscheduler call inside the node's own init code.

Verify node processes inherit cpu_affinity under --rt

The daemon-level --rt profile is orthogonal to cpu_affinity — both should still apply correctly. Tier 1 cpu-affinity-smoke already covers this for a default daemon; the manual step is to confirm the same under --rt.

pkill -f "dora (coordinator|daemon)" 2>/dev/null; sleep 1
dora coordinator &
sudo dora daemon --rt &
sleep 2
dora start examples/cpu-affinity-probe/dataflow.yml --name rt-affinity --detach
sleep 3
dora logs rt-affinity 2>&1 | grep AFFINITY_MASK
# Expected: AFFINITY_MASK:0,1 (matches cpu_affinity in the fixture)
dora stop --name rt-affinity 2>/dev/null || true
pkill -f "dora (coordinator|daemon)" 2>/dev/null

Pass criterion: AFFINITY_MASK:0,1. If the mask differs or is empty, --rt is breaking the pre_exec affinity hook — regression.

Jitter regression test

Only if the PR touches code that could affect jitter (daemon hot path, send/recv loops, allocator changes).

Both runs must go through an externally-started daemon so the --rt profile actually applies. dora run spawns its own in-process daemon (see binaries/cli/src/command/run.rs) and will silently ignore the background --rt daemon, so use dora start instead.

# Stock run — plain daemon, no --rt
pkill -f "dora (coordinator|daemon)" 2>/dev/null; sleep 1
dora coordinator &
dora daemon &
sleep 2
dora start examples/benchmark/dataflow.yml --name stock-bench --detach
sleep 32   # 30s --stop-after not supported on `dora start`; poll to completion
dora logs stock-bench > /tmp/stock.log 2>&1
dora stop --name stock-bench 2>/dev/null || true
pkill -f "dora (coordinator|daemon)" 2>/dev/null; sleep 1

# --rt run — daemon with the RT profile
dora coordinator &
sudo dora daemon --rt &
sleep 2
dora start examples/benchmark/dataflow.yml --name rt-bench --detach
sleep 32
dora logs rt-bench > /tmp/rt.log 2>&1
dora stop --name rt-bench 2>/dev/null || true
pkill -f "dora (coordinator|daemon)" 2>/dev/null; sleep 1

# Compare p99 latencies. `--rt` should be at least as low, typically 2-5x better.
grep -E 'p99|max' /tmp/stock.log /tmp/rt.log

Flag in the release notes if p99 regressed versus the previous release.

When to skip this runbook

  • Docs-only PR
  • No changes under binaries/daemon/src/spawn/, binaries/daemon/src/lib.rs around the --rt flag, or dora-message descriptor fields that feed into the scheduler

When this becomes automated

This runbook is the fallback until one of:

  1. A self-hosted RT runner lands and runs a weekly RT smoke job
  2. GHA provides a privileged RT-kernel runner class
  3. The --rt feature is removed or superseded

At that point, update this section to reflect the new coverage.