ADR-087: RuVix Cognition Kernel

March 14, 2026 · View on GitHub

Status

Accepted — Phase A Implemented

Date

2026-03-08 (Proposed) 2026-03-14 (Phase A Implemented)

Implementation Status

Phase A: Linux-Hosted Nucleus ✅ COMPLETE

CrateStatusTestsDescription
ruvix-types63Core types: 6 primitives, handles, proof tokens, capabilities
ruvix-region51Memory regions: Immutable, AppendOnly, Slab policies
ruvix-queue47io_uring-style ring buffers, zero-copy IPC
ruvix-cap54seL4-inspired capability manager, derivation trees
ruvix-proof733-tier proof engine: Reflex <100ns, Standard <100μs, Deep <10ms
ruvix-sched39Coherence-aware scheduler with novelty boosting
ruvix-boot595-stage RVF boot loader with ML-DSA-65 signatures
ruvix-vecgraph55Kernel-resident vector/graph stores with HNSW
ruvix-nucleus319Unified kernel: 12 syscalls, checkpoint/replay

Total: 760 tests passing

Security Invariants Implemented

  • SEC-001: Boot signature failure → PANIC (no fallback)
  • SEC-002: Proof cache with 100ms TTL, single-use nonces, max 64 entries
  • SEC-003: Capability delegation depth limit (max 8)
  • SEC-004: TOCTOU protection for zero-copy IPC

Phase B: Bare Metal AArch64 — Pending

Target: QEMU virt, Raspberry Pi 4/5


Phase B: Bare Metal AArch64 Port

B.1 Objectives

Phase B transforms the Linux-hosted nucleus into a true bare metal microkernel running natively on AArch64 hardware. Key objectives:

  1. Remove all Linux/std dependencies — Eliminate libc, pthreads, mmap, and all Linux syscalls
  2. Native AArch64 boot — Boot directly on QEMU virt machine and Raspberry Pi 4/5 hardware
  3. Hardware capability enforcement — Use MMU page tables for region protection and capability boundaries
  4. Real interrupt handling — GIC-400 interrupt controller integration for device drivers
  5. Hardware timer integration — ARM Generic Timer for scheduling and timer_wait syscall

B.2 New Crates

CratePurposeDependenciesLines of Code (Est.)
ruvix-halHardware Abstraction Layer traitsruvix-types~500
ruvix-aarch64AArch64 boot, MMU, exceptionsruvix-hal, ruvix-types~2,000
ruvix-driversPL011 UART, GIC-400, ARM Timerruvix-hal, ruvix-aarch64~1,500
ruvix-physmemPhysical memory allocatorruvix-types, ruvix-aarch64~800

These join the Phase A primitives table:

CrateStatusTestsDescription
ruvix-hal⏳ Phase BTBDHAL traits for UART, timer, interrupt controller, MMU
ruvix-aarch64⏳ Phase BTBDException vectors, MMU tables, boot sequence
ruvix-drivers⏳ Phase BTBDPL011 UART, GIC-400, ARM Timer drivers
ruvix-physmem⏳ Phase BTBDBuddy allocator for physical page frames

B.3 Memory Map (QEMU virt)

The QEMU AArch64 virt machine provides the following memory layout:

RegionStart AddressEnd AddressSizePurpose
Flash0x0000_00000x0800_0000128 MBBoot ROM, RVF package location
GIC Distributor0x0800_00000x0801_000064 KBGIC-400 distributor registers
GIC CPU Interface0x0801_00000x0802_000064 KBGIC-400 CPU interface registers
UART0x0900_00000x0900_10004 KBPL011 UART registers
RTC0x0901_00000x0901_10004 KBPL031 Real-Time Clock
GPIO0x0903_00000x0903_10004 KBPL061 GPIO controller
PCIe Config0x4000_00000x4010_0000256 MBPCIe ECAM configuration space
RAM0x4000_00000x8000_00001 GBMain system memory (default)

Raspberry Pi 4/5 Differences:

  • Base RAM starts at 0x0000_0000 (not 0x4000_0000)
  • GIC base at 0xFF84_0000 (BCM2711 interrupt controller)
  • UART0 at 0xFE20_1000 (mini UART) or UART2 at 0xFE20_1400 (PL011)
  • Device tree required for full peripheral discovery

B.4 Boot Sequence

The bare metal boot sequence consists of five stages:

Stage 1: Assembly Entry (_start)

_start:
    // Executed at EL1 (kernel mode)
    // x0 = DTB address (from bootloader)

    1. Check execution level (ensure EL1)
    2. Set up exception vector table (VBAR_EL1)
    3. Initialize stack pointer (SP_EL1 = kernel_stack_top)
    4. Clear BSS segment (zero-fill static data)
    5. Save DTB address to known location
    6. Jump to Rust entry point (kernel_entry)

Registers on entry:

  • x0 — Device Tree Blob (DTB) address
  • x1-x3 — Reserved (bootloader-specific)
  • PC — Entry point (_start in boot ROM or RAM)

Assembly file: ruvix-aarch64/src/boot.S (~150 lines)

Stage 2: MMU Initialization

Before Rust code can execute, the kernel must set up virtual memory:

// ruvix-aarch64/src/mmu.rs
pub unsafe fn init_mmu() {
    // 1. Identity map kernel code/data (VA = PA)
    //    0x4000_0000 - 0x4010_0000 (16 MB)
    //    Read-only for code, RW for data, no user access

    // 2. Map kernel heap region
    //    VA: 0xFFFF_FF00_0000_0000 (canonical high)
    //    PA: (allocated physical pages)
    //    RW, no user access

    // 3. Map device MMIO regions
    //    GIC, UART, timers — uncacheable, device memory

    // 4. Enable MMU (SCTLR_EL1.M = 1)
}

Page table structure:

  • 4 KB pages, 4-level translation (TTBR0_EL1 for user, TTBR1_EL1 for kernel)
  • Kernel mappings in top canonical half (0xFFFF_FF00_0000_0000+)
  • Physical memory allocator initialized from DTB memory nodes

Stage 3: Exception Vectors and GIC Initialization

// ruvix-aarch64/src/exceptions.rs
#[repr(C, align(2048))]
pub struct ExceptionVectorTable {
    sync_el1_sp0:  extern "C" fn(),  // Synchronous from EL1 using SP_EL0
    irq_el1_sp0:   extern "C" fn(),  // IRQ from EL1 using SP_EL0
    fiq_el1_sp0:   extern "C" fn(),  // FIQ from EL1 using SP_EL0
    serror_el1_sp0: extern "C" fn(), // SError from EL1 using SP_EL0
    // ... (16 total vectors for all EL/SP combinations)
}

pub unsafe fn init_exceptions() {
    let vector_table = &EXCEPTION_VECTORS as *const _ as u64;
    asm!("msr VBAR_EL1, {}", in(reg) vector_table);
}

GIC-400 initialization:

// ruvix-drivers/src/gic400.rs
pub unsafe fn init_gic() {
    // 1. Enable distributor (GICD_CTLR)
    // 2. Configure interrupt priorities (GICD_IPRIORITYR)
    // 3. Set interrupt targets (GICD_ITARGETSR) — all to CPU0
    // 4. Enable CPU interface (GICC_CTLR)
    // 5. Set priority mask (GICC_PMR) — allow all priorities
    // 6. Enable specific interrupts (GICD_ISENABLER)
    //    - UART RX interrupt (IRQ 33)
    //    - Timer interrupt (IRQ 27 for secure physical timer)
}

Stage 4: Kernel Entry and Capability Table Initialization

// ruvix-nucleus/src/main.rs
#[no_mangle]
pub extern "C" fn kernel_entry(dtb_addr: usize) -> ! {
    // 1. Parse device tree to discover memory layout
    let dtb = unsafe { DeviceTree::from_addr(dtb_addr) };
    let memory_nodes = dtb.memory_nodes();

    // 2. Initialize physical memory allocator
    let mut phys_allocator = PhysicalAllocator::new(memory_nodes);

    // 3. Initialize region manager (convert mmap to physical pages)
    let region_mgr = RegionManager::new(&mut phys_allocator);

    // 4. Initialize capability manager
    let cap_mgr = CapabilityManager::new();

    // 5. Create root task with all initial capabilities
    let root_task = Task::new_root(&cap_mgr, &region_mgr);

    // 6. Initialize queue manager
    let queue_mgr = QueueManager::new(&mut phys_allocator);

    // 7. Initialize proof engine
    let proof_engine = ProofEngine::new(&cap_mgr);

    // 8. Initialize vector/graph kernel objects
    let vecgraph_mgr = VecGraphManager::new(&mut phys_allocator, &proof_engine);

    // 9. Initialize scheduler
    let scheduler = Scheduler::new();
    scheduler.add_task(root_task);

    // 10. Load boot RVF and jump to first component
    load_boot_rvf_and_start(&cap_mgr, &region_mgr, &queue_mgr);

    // 11. Enter scheduler loop (never returns)
    scheduler.run()
}

Stage 5: First RVF Component Load

// ruvix-boot/src/loader.rs
pub fn load_boot_rvf_and_start(
    cap_mgr: &CapabilityManager,
    region_mgr: &RegionManager,
    queue_mgr: &QueueManager,
) {
    // 1. Read RVF from flash at 0x0000_0000
    let rvf_bytes = unsafe { read_flash(0x0000_0000, RVF_MAX_SIZE) };

    // 2. Verify ML-DSA-65 signature (Stage 1 from Phase A)
    let manifest = rvf::verify_and_parse(&rvf_bytes)
        .expect("Boot RVF signature verification failed");

    // 3. Create regions per memory schema
    for region_spec in manifest.memory_schema {
        region_mgr.create_region(region_spec.size, region_spec.policy);
    }

    // 4. Load WASM components into executable regions
    for component in manifest.components {
        let code_region = region_mgr.map_immutable(&component.wasm_bytes);
        // WASM runtime initialization happens here (Wasmtime or wasm-micro-runtime)
    }

    // 5. Wire queues per manifest
    for queue_spec in manifest.queue_wiring {
        queue_mgr.create_queue(queue_spec.size, queue_spec.schema);
    }

    // 6. Spawn initial tasks
    for entry_point in manifest.entry_points {
        let task = Task::spawn(
            entry_point.component_id,
            entry_point.caps,
            TaskPriority::Normal,
            None, // no deadline
        );
        scheduler.add_task(task);
    }

    // 7. Emit boot attestation
    kernel.attest_emit(
        &AttestPayload::Boot {
            rvf_hash: manifest.hash,
            timestamp: timer.now(),
        },
        &ProofToken::trusted_boot(),
    );
}

B.5 Security Model Enhancements

MMU-Enforced Capability Boundaries

In Phase A (Linux-hosted), region protection relied on mprotect(). In Phase B, the kernel directly controls page table entries:

// ruvix-region/src/region.rs (Phase B version)
impl RegionManager {
    pub fn map_region(&mut self, policy: RegionPolicy, size: usize) -> Result<RegionHandle> {
        let phys_pages = self.phys_allocator.allocate_pages(pages_for(size))?;

        let pte_flags = match policy {
            RegionPolicy::Immutable => {
                // User-accessible, read-only, cacheable
                PTE_USER | PTE_RO | PTE_CACHEABLE
            }
            RegionPolicy::AppendOnly { .. } => {
                // Kernel-only, write-append enforced via capability checks
                PTE_KERNEL_RW | PTE_CACHEABLE
            }
            RegionPolicy::Slab { .. } => {
                // Kernel-only, full RW
                PTE_KERNEL_RW | PTE_CACHEABLE
            }
        };

        // Map into kernel address space with appropriate permissions
        let virt_addr = self.mmu.map_pages(phys_pages, pte_flags)?;

        Ok(RegionHandle {
            virt_addr,
            phys_pages,
            policy,
            size,
        })
    }
}

Page table entry flags:

const PTE_VALID:      u64 = 1 << 0;   // Valid entry
const PTE_USER:       u64 = 1 << 6;   // User-accessible (EL0)
const PTE_RO:         u64 = 1 << 7;   // Read-only (AP[2])
const PTE_KERNEL_RW:  u64 = 0 << 6;   // Kernel-only, RW
const PTE_CACHEABLE:  u64 = 0b11 << 2; // Normal memory, write-back
const PTE_DEVICE:     u64 = 0b00 << 2; // Device memory, strongly-ordered

EL1/EL0 Separation for Kernel/User

  • EL1 (Kernel mode): All kernel code, syscall handlers, interrupt handlers, scheduler
  • EL0 (User mode): All RVF components, WASM runtime, AgentDB, RuView

Syscalls trigger synchronous exceptions (SVC instruction) and transition EL0 → EL1. The exception handler validates capabilities before dispatching to syscall implementation.

// ruvix-aarch64/src/syscall.rs
#[no_mangle]
pub extern "C" fn svc_handler(syscall_num: u64, args: &SyscallArgs) -> SyscallResult {
    let current_task = scheduler::current_task();

    match syscall_num {
        0 => task_spawn(current_task, args),
        1 => cap_grant(current_task, args),
        2 => region_map(current_task, args),
        // ... (12 total syscalls)
        _ => Err(KernelError::InvalidSyscall),
    }
}

Secure Monitor Calls for Attestation

On hardware with Arm TrustZone (Raspberry Pi 4/5), the kernel can invoke Secure Monitor calls (SMC instruction) to request cryptographic operations from the Trusted Execution Environment (TEE):

// ruvix-aarch64/src/smc.rs
pub fn secure_hash(data: &[u8]) -> [u8; 32] {
    // SMC call to Secure World for hardware-backed SHA-256
    let result = unsafe {
        smc_call(SMC_HASH_SHA256, data.as_ptr(), data.len())
    };
    result.hash
}

pub fn secure_sign_attestation(attestation: &ProofAttestation) -> Signature {
    // SMC call to Secure World for ML-DSA-65 signing using device key
    unsafe {
        smc_call(SMC_SIGN_MLDSA65, attestation as *const _, size_of::<ProofAttestation>())
    }.signature
}

Use cases:

  • Boot attestation signed by device-unique key (burned into eFUSE)
  • Witness log entries signed by TEE to prevent kernel compromise from tampering
  • Remote attestation for distributed RuVix mesh (Demo 5)

B.6 Build Path (Weeks 19-42)

WeekMilestoneDeliverablesTests
19-20AArch64 bootstrapboot.S, exception vectors, minimal UART outputQEMU boots, prints "RuVix"
21-22MMU setup4-level page tables, identity + kernel mappingsVirtual memory functional
23-24Physical allocatorBuddy allocator for 4KB pages from DTBUnit tests, fuzz tests
25-26Region → MMURegionManager using page tables, not mmapPhase A region tests pass
27-28GIC-400 driverInterrupt enable/disable, IRQ routingTimer IRQ delivered to kernel
29-30UART driverPL011 TX/RX with interrupt supportConsole I/O functional
31-32ARM Timer driverGeneric Timer for timer_wait, scheduler ticksDeadline scheduling works
33-34Queue → interruptsDevice interrupts delivered as queue messagessensor_subscribe functional
35-36WASM runtime portWasmtime or wasm-micro-runtime on bare metal"Hello World" WASM executes
37-38Scheduler on hardwareCoherence-aware scheduler using timer IRQsMulti-task preemption works
39-40RVF boot on QEMUFull boot sequence from flash RVFPhase A acceptance test (QEMU)
41-42Raspberry Pi 4 portDevice tree parsing, BCM2711 peripheralsAcceptance test (real hardware)

B.7 Testing Strategy

QEMU Integration Tests

# ruvix-test/qemu_integration.sh
qemu-system-aarch64 \
  -machine virt,gic-version=3 \
  -cpu cortex-a72 \
  -m 1G \
  -kernel target/aarch64-unknown-none/release/ruvix-kernel \
  -drive file=boot.rvf,format=raw,if=pflash \
  -nographic \
  -semihosting-config enable=on,target=native

Test cases:

  1. Boot sequence completes without panic
  2. MMU maps kernel and user regions correctly
  3. Timer interrupt fires at 100 Hz
  4. UART TX/RX works (loopback test)
  5. GIC delivers interrupts to correct handlers
  6. RVF signature verification passes/fails as expected
  7. Phase A acceptance test (vector mutation + replay)

Raspberry Pi 4 Hardware Tests

# Copy kernel to SD card FAT32 partition
cp target/aarch64-unknown-none/release/ruvix-kernel /media/boot/kernel8.img
cp boot.rvf /media/boot/

# config.txt
arm_64bit=1
kernel=kernel8.img
enable_uart=1

Test cases:

  1. UART console output appears on GPIO 14/15
  2. LED blink test using GPIO driver
  3. USB keyboard input via queue subscription
  4. Network packet reception via queue (if Ethernet driver implemented)
  5. Thermal monitoring via RPi-specific sensors

B.8 Known Limitations

  1. No symmetric multiprocessing (SMP) — Phase B targets single-core. Multi-core support requires per-CPU interrupt handling and scheduler affinity (future ADR).

  2. No DMA — Device drivers use programmed I/O. DMA requires IOMMU configuration and physical address constraints (future enhancement).

  3. No floating-point in kernel — AArch64 NEON/SVE disabled in kernel mode. Vector distance computations may use integer quantized formats or userspace WASM components (decision pending).

  4. Fixed memory layout — No dynamic memory expansion. All regions declared at boot in RVF manifest.

  5. No power management — No CPU idle states, frequency scaling, or suspend/resume. Target: always-on edge devices.


Deciders

ruv

  • ADR-029 RVF canonical binary format
  • ADR-030 RVF cognitive container / self-booting vector files
  • ADR-042 Security RVF AIDefence TEE
  • ADR-047 Proof-gated mutation protocol
  • ADR-014 Coherence engine architecture
  • ADR-061 Reasoning kernel architecture
  • ADR-006 Unified memory pool and paging strategy
  • ADR-005 WASM runtime integration
  • ADR-032 RVF WASM integration
  • crates/cognitum-gate-kernel/ — no_std WASM coherence kernel
  • crates/ruvector-verified/ — ProofGate, ProofEnvironment, attestation chain
  • crates/rvf/ — RVF format implementation
  • crates/ruvector-core/ — HNSW vector database
  • crates/ruvector-graph-transformer/ — graph mutation substrate

1. Context

1.1 The Problem with Conventional Operating Systems

Every major operating system today — Linux, Windows, macOS, seL4, Zephyr — was designed for a world where the primary compute actor is a human being operating through a process abstraction. The process model assumes:

  1. A single sequential instruction stream per thread
  2. File-based persistent state (byte streams with names)
  3. POSIX IPC semantics (pipes, sockets, signals)
  4. Discretionary or mandatory access control based on user identity
  5. A scheduler optimized for interactive latency or batch throughput

None of these assumptions hold for agentic workloads. An AI agent does not think in files. It thinks in vectors, graphs, proofs, and causal event streams. It does not need fork/exec. It needs capability-gated task spawning with proof-of-intent. It does not communicate through byte pipes. It communicates through typed semantic queues where every message carries a coherence score and a witness hash.

Running agentic workloads on Linux is like running a modern web application on a mainframe batch scheduler — technically possible, structurally wrong.

1.2 What RuVector Already Provides

The RuVector ecosystem (107 crates, 50+ npm packages) has incrementally built every primitive needed for a cognition kernel, but scattered across userspace libraries:

  • RVF (ADR-029): A self-describing binary format with segments for vectors, graphs, WASM microkernels, cryptographic witnesses, and TEE attestation quotes.
  • Cognitum Gate Kernel (cognitum-gate-kernel): A 256-tile no_std WASM coherence fabric operating on mincut partitions.
  • Proof-Gated Mutation (ADR-047): ProofGate<T> enforcing "no proof, no mutation" at the type level with 82-byte attestation witnesses.
  • Coherence Engine (ADR-014): Structural consistency scoring that replaces probabilistic confidence with graph-theoretic guarantees.
  • Cognitive Containers (ADR-030): Self-booting RVF files that carry their own execution kernel, enabling single-file microservices.
  • Reasoning Kernel (ADR-061): A brain-augmented reasoning protocol with witnessable artifacts at every step.
  • RVF Security Hardening (ADR-042): TEE attestation, AIDefence layers, EBPF policy enforcement, and witness chain audit.

These primitives exist. They work. But they run on top of Linux, mediated by POSIX, paying the abstraction tax at every boundary. RuVix promotes them to first-class kernel resources.

1.3 Why Not Just Use seL4/Zephyr/Unikernel

seL4 proves that a capability kernel with formal verification is viable (8,700 lines of C, fully verified). But seL4 has no concept of vectors, graphs, coherence, or proofs. Adding these would require reimplementing the entire RuVector stack as userspace servers communicating through IPC — reintroducing the overhead we want to eliminate.

Zephyr/FreeRTOS target microcontrollers with cooperative/preemptive scheduling. They have no memory protection, no capability model, and no concept of attestation.

Unikernels (Hermit, Unikraft) eliminate the OS/application boundary but retain POSIX semantics. They make Linux faster, not different.

RuVix is different: it has six kernel primitives, twelve syscalls, and every mutation is proof-gated. Everything else — including the entire AgentDB intelligence runtime, Claude Code adapters, and RuView perception pipeline — lives above the kernel in RVF component space.


2. Decision

2.1 Core Thesis

RuVix is a cognition kernel. It is not a general-purpose operating system. It has exactly six kernel primitives:

PrimitivePurposeAnalog
TaskUnit of concurrent execution with capability setseL4 TCB
CapabilityUnforgeable typed token granting access to a resourceseL4 capability
RegionContiguous memory with access policy (immutable, append-only, slab)seL4 Untyped + frame
QueueTyped ring buffer for inter-task communicationio_uring SQ/CQ
TimerDeadline-driven scheduling primitivePOSIX timer_create
ProofCryptographic attestation gating state mutationNovel (from ADR-047)

Everything else — file systems, networking, device drivers, vector indexes, graph engines, AI inference — is an RVF component running in user space, communicating through queues, accessing resources through capabilities.

2.2 Architecture Overview

┌─────────────────────────────────────────────────────────────────────┐
│                      AGENT CONTROL PLANE                           │
│  Claude Code │ Codex │ Custom Agents │ AgentDB Planner Runtime     │
├─────────────────────────────────────────────────────────────────────┤
│                      RVF COMPONENT SPACE                           │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐              │
│  │ RuView   │ │ AgentDB  │ │ RuVLLM   │ │ Network  │ ...          │
│  │ Percep.  │ │ Intelli. │ │ Infer.   │ │ Stack    │              │
│  └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘              │
│       │queue        │queue       │queue        │queue               │
├───────┴─────────────┴────────────┴─────────────┴───────────────────┤
│                      RUVIX COGNITION KERNEL                        │
│                                                                     │
│  ┌────────────────┐  ┌────────────────┐  ┌────────────────────┐    │
│  │ Capability Mgr │  │ Queue IPC      │  │ Coherence-Aware    │    │
│  │ (cap_grant,    │  │ (queue_send,   │  │ Scheduler          │    │
│  │  cap_revoke)   │  │  queue_recv)   │  │ (deadline+novelty  │    │
│  │                │  │  io_uring ring │  │  +structural risk) │    │
│  └────────────────┘  └────────────────┘  └────────────────────┘    │
│  ┌────────────────┐  ┌────────────────┐  ┌────────────────────┐    │
│  │ Region Memory  │  │ Proof Engine   │  │ Vector/Graph       │    │
│  │ (slabs, immut, │  │ (attest_emit,  │  │ Kernel Objects     │    │
│  │  append-only)  │  │  proof_verify) │  │ (vector_get/put,   │    │
│  │                │  │                │  │  graph_apply)      │    │
│  └────────────────┘  └────────────────┘  └────────────────────┘    │
│                                                                     │
│  ┌────────────────────────────────────────────────────────────┐    │
│  │ RVF Boot Loader — mounts signed RVF packages as root      │    │
│  └────────────────────────────────────────────────────────────┘    │
├─────────────────────────────────────────────────────────────────────┤
│                      HARDWARE / HYPERVISOR                          │
│  AArch64 (primary) │ x86_64 (secondary) │ WASM (hosted)           │
└─────────────────────────────────────────────────────────────────────┘

3. Syscall Surface

RuVix exposes exactly 12 syscalls. This is a hard architectural constraint. New functionality is added through RVF components, not new syscalls.

3.1 Syscall Table

/// The complete RuVix syscall interface.
/// No syscall may be added without an ADR amendment and ABI version bump.

// --- Task Management ---

/// Spawn a new task with an explicit capability set.
/// The caller must hold Cap<TaskFactory> to invoke this.
/// Returns a handle to the new task.
fn task_spawn(
    entry: RvfComponentId,      // RVF component containing the entry point
    caps: &[CapHandle],         // capabilities granted to the new task
    priority: TaskPriority,     // base scheduling priority
    deadline: Option<Duration>, // optional hard deadline
) -> Result<TaskHandle, KernelError>;

// --- Capability Management ---

/// Grant a capability to another task.
/// The granting task must hold the capability with the Grant right.
/// Capabilities are unforgeable kernel objects.
fn cap_grant(
    target: TaskHandle,
    cap: CapHandle,
    rights: CapRights,          // subset of caller's rights on this cap
) -> Result<CapHandle, KernelError>;

// --- Region Memory ---

/// Map a memory region into the calling task's address space.
/// Region policy (immutable, append-only, slab) is set at creation.
fn region_map(
    size: usize,
    policy: RegionPolicy,       // Immutable | AppendOnly | Slab { slot_size }
    cap: CapHandle,             // capability authorizing the mapping
) -> Result<RegionHandle, KernelError>;

// --- Queue IPC ---

/// Send a typed message to a queue.
/// The message is zero-copy if sender and receiver share a region.
fn queue_send(
    queue: QueueHandle,
    msg: &[u8],                 // serialized message (RVF wire format)
    priority: MsgPriority,
) -> Result<(), KernelError>;

/// Receive a message from a queue.
/// Blocks until a message is available or the timeout expires.
fn queue_recv(
    queue: QueueHandle,
    buf: &mut [u8],
    timeout: Duration,
) -> Result<usize, KernelError>;

// --- Timer ---

/// Wait until a deadline or duration elapses.
/// The scheduler may preempt the task and resume it when the timer fires.
fn timer_wait(
    deadline: TimerSpec,        // Absolute(Instant) | Relative(Duration)
) -> Result<(), KernelError>;

// --- RVF Boot ---

/// Mount a signed RVF package into the component namespace.
/// The kernel verifies the package signature, proof policy, and
/// witness log policy before making components available.
fn rvf_mount(
    rvf_data: &[u8],           // raw RVF bytes (or region handle)
    mount_point: &str,          // namespace path (e.g., "/agents/planner")
    cap: CapHandle,             // capability authorizing the mount
) -> Result<RvfMountHandle, KernelError>;

// --- Attestation ---

/// Emit a cryptographic attestation for a completed operation.
/// The attestation is appended to the kernel's witness log.
/// Returns the 82-byte attestation (compatible with ADR-047 ProofAttestation).
fn attest_emit(
    operation: &AttestPayload,  // what was done
    proof: &ProofToken,         // the proof that authorized it
) -> Result<ProofAttestation, KernelError>;

// --- Vector/Graph Kernel Objects ---

/// Read a vector from a kernel-resident vector store.
/// Returns the vector data and its coherence metadata.
fn vector_get(
    store: VectorStoreHandle,
    key: VectorKey,
) -> Result<(Vec<f32>, CoherenceMeta), KernelError>;

/// Write a vector to a kernel-resident vector store.
/// Requires a valid proof token — no proof, no mutation.
fn vector_put_proved(
    store: VectorStoreHandle,
    key: VectorKey,
    data: &[f32],
    proof: ProofToken,
) -> Result<ProofAttestation, KernelError>;

/// Apply a graph mutation (add/remove node/edge, update weight).
/// Requires a valid proof token — no proof, no mutation.
fn graph_apply_proved(
    graph: GraphHandle,
    mutation: &GraphMutation,
    proof: ProofToken,
) -> Result<ProofAttestation, KernelError>;

// --- Sensor / Perception ---

/// Subscribe to a sensor stream (RuView perception events).
/// Events are delivered to the specified queue.
fn sensor_subscribe(
    sensor: SensorDescriptor,   // identifies the sensor (type, device, filter)
    target_queue: QueueHandle,
    cap: CapHandle,
) -> Result<SubscriptionHandle, KernelError>;

3.2 Syscall Properties

Every syscall satisfies these invariants:

  1. Capability-gated: No syscall succeeds without an appropriate capability handle. There is no ambient authority.
  2. Proof-required for mutation: vector_put_proved, graph_apply_proved, and rvf_mount require cryptographic proof tokens. Read-only operations do not.
  3. Bounded latency: Every syscall has a worst-case execution time expressible in cycles. The kernel contains no unbounded loops.
  4. Witness-logged: Every successful syscall that mutates state emits a witness record to the kernel's append-only log.
  5. No allocation in syscall path: The kernel pre-allocates all internal structures. Syscalls operate on pre-mapped regions and pre-created queues.

4. Memory Model

4.1 Region-Based Memory

RuVix replaces virtual memory with regions. A region is a contiguous, capability-protected memory object with one of three policies:

#[derive(Clone, Copy, Debug)]
pub enum RegionPolicy {
    /// Contents are set once at creation and never modified.
    /// The kernel may deduplicate identical immutable regions.
    /// Ideal for: RVF component code, trained model weights, lookup tables.
    Immutable,

    /// Contents can only be appended, never overwritten or truncated.
    /// A monotonic write cursor tracks the append position.
    /// Ideal for: witness logs, event streams, time-series vectors.
    AppendOnly {
        max_size: usize,
    },

    /// Fixed-size slots allocated from a free list.
    /// Slots can be freed and reused. No fragmentation by construction.
    /// Ideal for: task control blocks, capability tables, queue ring buffers.
    Slab {
        slot_size: usize,
        slot_count: usize,
    },
}

4.2 No Virtual Memory, No Page Faults

RuVix does not implement demand paging. All regions are physically backed at region_map time. This eliminates:

  • Page fault handlers (a major source of kernel complexity and timing jitter)
  • Swap — if memory is exhausted, region_map returns Err(OutOfMemory)
  • Copy-on-write — immutable regions are shared by reference; mutable regions are explicitly copied through region_map with a source handle

This design follows seL4's philosophy: the kernel provides the mechanism (regions), and policy (which regions to create, when to reclaim) is handled by a user-space resource manager running as an RVF component.

4.3 Vector Store as Kernel Memory Object

Unlike conventional kernels where all data structures are userspace constructs, RuVix makes vector stores and graph stores kernel-resident objects. Vector data lives in kernel-managed regions with the same protection as capability tables. HNSW index nodes are slab-allocated (fixed-size slots, zero allocator overhead during search). Coherence metadata is co-located with each vector (coherence score, last-mutation epoch, proof attestation hash). On AArch64 with SVE/SME, the kernel performs distance computations in-kernel without context switches.

pub struct KernelVectorStore {
    hnsw_region: RegionHandle,       // slab region for HNSW graph nodes
    data_region: RegionHandle,       // slab region for vector data (f32 or quantized)
    witness_region: RegionHandle,    // append-only mutation witness log
    coherence_config: CoherenceConfig,
    proof_policy: ProofPolicy,
    dimensions: u32,
    capacity: u32,
}

pub struct KernelGraphStore {
    node_region: RegionHandle,       // slab region for graph nodes
    edge_region: RegionHandle,       // slab region for adjacency lists
    witness_region: RegionHandle,    // append-only mutation witness log
    partition_meta: PartitionMeta,   // MinCut partition metadata
    proof_policy: ProofPolicy,
}

5. Scheduling Model

5.1 Coherence-Aware Scheduler

The RuVix scheduler is not a conventional priority scheduler. It combines three signals:

  1. Deadline pressure: Hard real-time tasks with deadline set in task_spawn get earliest-deadline-first (EDF) scheduling within their capability partition.
  2. Novelty signal: Tasks processing genuinely new information (measured by vector distance from recent inputs) get a priority boost. This prevents the system from starving exploration in favor of exploitation.
  3. Structural risk: Tasks whose pending mutations would increase graph incoherence (lowering the coherence score below a threshold) get deprioritized until a proof-verified coherence restoration is scheduled.
fn compute_priority(task: &TaskControlBlock) -> SchedulerScore {
    let deadline_urgency = task.deadline.map_or(0.0, |d| {
        1.0 / (d.saturating_duration_since(now()).as_micros() as f64 + 1.0)
    });
    let novelty_boost = task.pending_input_novelty; // 0.0..1.0
    let risk_penalty = task.pending_coherence_delta.min(0.0).abs() * RISK_WEIGHT;
    SchedulerScore { score: deadline_urgency + novelty_boost - risk_penalty }
}

5.2 Scheduling Guarantees

  • No priority inversion: Capability-based access means tasks cannot block on resources they do not hold capabilities for. The kernel never needs priority inheritance protocols.
  • Bounded preemption: The kernel preempts at queue boundaries (after a queue_send or queue_recv completes), not at arbitrary instruction boundaries. This eliminates the need for kernel-level spinlocks.
  • Partition scheduling: Tasks are grouped by their RVF mount origin. Each partition gets a guaranteed time slice, preventing a misbehaving RVF component from starving others.

6. Capability Manager

6.1 seL4-Inspired Explicit Object Access

Every kernel object (task, region, queue, timer, vector store, graph store, RVF mount) is accessed exclusively through capabilities. A capability is an unforgeable kernel-managed token comprising:

/// A capability is a kernel-managed, unforgeable access token.
#[derive(Clone)]
pub struct Capability {
    /// Unique identifier for the kernel object.
    object_id: ObjectId,
    /// The type of kernel object (Task, Region, Queue, Timer, VectorStore, GraphStore, RvfMount).
    object_type: ObjectType,
    /// Rights bitmap: Read, Write, Grant, Revoke, Execute, Prove.
    rights: CapRights,
    /// Capability badge — caller-visible identifier for demultiplexing.
    badge: u64,
    /// Epoch — invalidated if the object is destroyed or the capability is revoked.
    epoch: u64,
}

bitflags::bitflags! {
    pub struct CapRights: u32 {
        const READ    = 0b0000_0001;
        const WRITE   = 0b0000_0010;
        const GRANT   = 0b0000_0100;
        const REVOKE  = 0b0000_1000;
        const EXECUTE = 0b0001_0000;
        const PROVE   = 0b0010_0000; // right to generate proof tokens for this object
    }
}

6.2 Capability Derivation Rules

A task can only grant capabilities it holds, with equal or fewer rights. PROVE is required for vector_put_proved/graph_apply_proved. GRANT is required for cap_grant. Revoking a capability invalidates all derived capabilities (propagation through the derivation tree).

6.3 Initial Capability Set

At boot, the kernel creates a root task holding capabilities for all physical memory (as untyped regions), the boot RVF package, the kernel witness log (append-only), and a root queue for hardware interrupts. The root task creates all other kernel objects and distributes capabilities. Following seL4's principle: the kernel creates nothing after boot.


7. Queue-First IPC

7.1 Shared Ring Queues

All inter-task communication in RuVix goes through queues. There are no synchronous IPC calls, no shared memory without explicit region grants, and no signals.

pub struct KernelQueue {
    ring_region: RegionHandle,   // shared region containing the ring buffer
    ring_size: u32,              // power of 2
    sq_head: AtomicU32,          // submission queue head (sender writes)
    sq_tail: AtomicU32,          // submission queue tail (kernel advances)
    cq_head: AtomicU32,          // completion queue head (receiver writes)
    cq_tail: AtomicU32,          // completion queue tail (kernel advances)
    schema: WitTypeId,           // RVF WIT type for message validation
    max_msg_size: u32,
}

7.2 Zero-Copy Semantics

When sender and receiver share a region, queue_send places a descriptor (offset + length) in the ring rather than copying bytes. The receiver reads directly from the shared region. This is critical for high-throughput vector streaming where copying 768-dimensional f32 vectors would be prohibitive.

7.3 Queue-Based Device Drivers

Hardware interrupts are delivered as messages to designated queues. A device driver is an RVF component that:

  1. Holds capabilities for device MMIO regions
  2. Subscribes to interrupt queues
  3. Translates hardware events into typed messages on application queues

This means all device drivers run in user space with no kernel privileges beyond their capability set.


8. Proof-Gated Mutation Protocol

8.1 Kernel-Enforced Invariant

In RuVix, proof-gated mutation (ADR-047) is not a library convention. It is a kernel invariant. The kernel physically prevents state mutation without a valid proof token.

/// A proof token authorizing a specific mutation.
/// Generated by the Proof Engine and consumed by a mutating syscall.
pub struct ProofToken {
    /// Hash of the mutation being authorized.
    mutation_hash: [u8; 32],
    /// Proof tier (Reflex, Standard, Deep) — from ADR-047.
    tier: ProofTier,
    /// The proof payload (Merkle witness, ZK proof, or coherence certificate).
    payload: ProofPayload,
    /// Expiry — proofs are time-bounded to prevent replay.
    valid_until: Instant,
    /// Nonce — prevents proof reuse.
    nonce: u64,
}

#[derive(Clone, Copy)]
pub enum ProofTier {
    /// Sub-microsecond hash check. For high-frequency vector updates.
    Reflex,
    /// Merkle witness verification. For graph mutations.
    Standard,
    /// Full coherence verification with mincut analysis. For structural changes.
    Deep,
}

8.2 Proof Lifecycle

  1. A task prepares a mutation (e.g., GraphMutation::AddEdge { from, to, weight }).
  2. The task computes the mutation hash and requests a proof from the Proof Engine (an RVF component, not in-kernel).
  3. The Proof Engine evaluates the mutation against the current coherence state, proof policy, and attestation chain.
  4. If approved, the Proof Engine issues a ProofToken with a bounded validity window.
  5. The task calls graph_apply_proved(graph, &mutation, proof).
  6. The kernel verifies: (a) the proof token matches the mutation hash, (b) the token has not expired, (c) the nonce has not been used, (d) the calling task holds PROVE rights on the graph.
  7. If all checks pass, the mutation is applied and an attestation is emitted to the witness log.
  8. If any check fails, the syscall returns Err(ProofRejected) and no state changes.

8.3 Proof Composition

Regional proofs compose. When multiple mutations within a mincut partition are all proved individually, a partition-level proof can be derived (see ADR-047 Section 4). The kernel maintains partition coherence scores and can fast-path mutations within a coherent partition using Reflex tier proofs.


9. RVF Boot Sequence

9.1 Boot Protocol

RuVix boots from a single signed RVF file. The boot sequence is:

Power On / Hypervisor Start


┌──────────────────────────────────────────────┐
│ Stage 0: Hardware Init (AArch64)             │
│  - Initialize MMU with identity mapping      │
│  - Initialize UART for early console         │
│  - Detect available memory, cache topology   │
│  - If TEE: initialize realm/enclave          │
└──────────────────────┬───────────────────────┘


┌──────────────────────────────────────────────┐
│ Stage 1: RVF Manifest Parse                  │
│  - Read 4 KB boot manifest from RVF header   │
│  - Verify ML-DSA-65 signature (post-quantum) │
│  - Parse component graph                     │
│  - Parse memory schema (region requirements) │
│  - Parse proof policy                        │
│  - Parse witness log policy                  │
│  - Parse rollback hooks                      │
└──────────────────────┬───────────────────────┘


┌──────────────────────────────────────────────┐
│ Stage 2: Kernel Object Creation              │
│  - Create root task with all capabilities    │
│  - Create initial regions from memory schema │
│  - Create boot queue for init messages       │
│  - Initialize kernel witness log (append)    │
│  - Initialize kernel vector store (if spec.) │
│  - Initialize kernel graph store (if spec.)  │
└──────────────────────┬───────────────────────┘


┌──────────────────────────────────────────────┐
│ Stage 3: Component Mount                     │
│  - Mount RVF components per component graph  │
│  - Distribute capabilities per manifest      │
│  - Spawn initial tasks per WIT entry points  │
│  - Connect queues per manifest wiring        │
└──────────────────────┬───────────────────────┘


┌──────────────────────────────────────────────┐
│ Stage 4: First Attestation                   │
│  - Emit boot attestation to witness log      │
│  - Record: RVF hash, capability table hash,  │
│    region layout hash, timestamp             │
│  - System is now live                        │
└──────────────────────────────────────────────┘

9.2 RVF as Boot Object

An RVF boot package is a complete cognitive unit containing:

SectionPurposeSize Budget
ManifestComponent graph, memory schema, proof policy, rollback hooks, witness log policy4 KB
SignaturesML-DSA-65 package signature + per-component signatures2-8 KB
WIT/ABIComponent interface types (WASM Interface Types)1-16 KB
Component GraphDAG of components with queue wiring and capability grants1-4 KB
Memory SchemaRegion declarations (size, policy, initial content)1-4 KB
Proof PolicyPer-component proof tier requirements512 B - 2 KB
Rollback HooksWASM functions for state rollback on proof failure1-8 KB
Witness Log PolicyRetention, compression, export rules for attestations256 B - 1 KB
WASM ComponentsCompiled WASM component binaries8 KB - 16 MB
Initial DataPre-loaded vectors, graph state, model weights0 - 1 GB

9.3 Deterministic Replay

Because every mutation is witnessed and every input is queued, a RuVix system can replay from any checkpoint:

  1. Load a checkpoint RVF (containing region snapshots + witness log prefix)
  2. Replay queued messages in witness-log order
  3. Re-verify proofs at each mutation
  4. The resulting state must be identical (bit-for-bit) to the original

This is the foundation of the acceptance test (Section 12).


10. RuView as Perception Plane

10.1 Position in Architecture

RuView sits outside the kernel but close — it is the first RVF component layer. Its job is to normalize external signals into typed, coherence-scored events and publish them into kernel queues.

10.2 Sensor Abstraction

/// A sensor descriptor identifies a data source for RuView.
pub struct SensorDescriptor {
    /// Sensor type: Camera, Microphone, NetworkTap, MarketFeed, GitStream, etc.
    sensor_type: SensorType,
    /// Device identifier (hardware address, URL, stream ID).
    device_id: DeviceId,
    /// Filter expression (e.g., "symbol=AAPL" or "file_ext=.rs").
    filter: Option<FilterExpr>,
    /// Requested sampling rate (events per second, 0 = all).
    sample_rate: u32,
}

10.3 Event Normalization

RuView transforms raw sensor data into PerceptionEvent structs carrying: a vector embedding (matching kernel store dimensionality), a coherence score relative to recent context, a causal hash linking to the previous event from the same sensor, and a nanosecond timestamp. Events flow through sensor_subscribe into kernel queues where agent tasks consume them.


11. AgentDB as Planner/Intelligence Runtime

11.1 Explicit Non-Kernel Placement

AgentDB, Claude Code, Codex, and all AI reasoning systems are NOT in the trusted kernel. They are RVF components running in user space with:

  • Capability-restricted access to vector stores (read + proved write)
  • Queue-based communication (no direct kernel memory access)
  • Proof-gated mutation (the intelligence runtime cannot modify state without passing through the proof engine)

This is a deliberate security boundary. The kernel trusts mathematics (proofs, hashes, capabilities). It does not trust neural networks.

11.2 Control Plane Adapters

Claude Code and Codex connect to RuVix as control plane adapters:

┌──────────────┐    queue     ┌──────────────┐    syscall    ┌────────┐
│ Claude Code  │◀────────────▶│ AgentDB      │──────────────▶│ RuVix  │
│ (external)   │  WebSocket   │ (RVF comp.)  │  cap-gated    │ Kernel │
└──────────────┘              └──────────────┘               └────────┘

The adapter translates natural language intent into typed mutations with proof requests. The kernel neither knows nor cares that the mutation originated from an LLM.


12. Build Path

Phase A: Linux-Hosted Nucleus (Days 1-60)

Goal: Implement all 12 syscalls as a Rust library running in Linux userspace. Freeze the ABI.

WeekMilestoneDeliverable
1-2Core typesruvix-types crate: all kernel object types, capability types, proof types. No_std compatible.
3-4Region managerruvix-region crate: slab allocator, append-only regions, immutable regions. Uses mmap on Linux.
5-6Queue IPCruvix-queue crate: io_uring-style ring buffers in shared memory. Lock-free send/recv.
7-8Capability managerruvix-cap crate: capability table, derivation tree, revocation propagation.
9-10Proof engineruvix-proof crate: proof token generation/verification, witness log. Integrates ruvector-verified.
11-12Vector/Graph kernel objectsruvix-vecgraph crate: kernel-resident vector and graph stores using regions. Integrates ruvector-core and ruvector-graph-transformer.
13-14Schedulerruvix-sched crate: coherence-aware task scheduler. Runs as a Linux thread scheduler.
15-16RVF boot loaderruvix-boot crate: RVF package parsing, component mounting, capability distribution.
17-18Integration + ABI freezeruvix-nucleus crate: all subsystems integrated. ABI frozen. Acceptance test passes.

Phase A Acceptance Test: A signed RVF boots in the Linux-hosted nucleus, consumes a simulated RuView event from a queue, performs one proof-gated vector mutation, emits an attestation to the witness log, shuts down, restarts from checkpoint, replays to the same state, and the final vector store contents are bit-identical.

Phase B: Bare Metal AArch64 Microkernel (Days 60-120)

Goal: Run the same ABI on bare metal AArch64 (Raspberry Pi 4/5, QEMU virt).

WeekMilestoneDeliverable
19-22AArch64 bootstrapException vectors, MMU setup, UART driver, physical memory manager.
23-26Region → physical memoryReplace mmap with direct physical page allocation. Implement region policies on hardware page tables.
27-30Interrupt → queueGIC (Generic Interrupt Controller) driver delivering interrupts as queue messages. Timer driver for timer_wait.
31-34WASM component runtimeEmbedded Wasmtime or wasm-micro-runtime for executing RVF WASM components on bare metal.
35-38Scheduler on hardwareCoherence-aware scheduler using AArch64 timer interrupts for preemption.
39-42Acceptance test on hardwareSame acceptance test as Phase A, running on QEMU AArch64 virt machine.

No Phase C

There is no POSIX compatibility layer. RuVix does not implement open(), read(), write(), fork(), exec(), or any POSIX syscall. Applications that need POSIX run on Linux. RuVix runs cognitive workloads.


13. Demo Applications

13.1 Application Matrix

#ApplicationCategoryKernel Features ExercisedComplexity
1Proof-Gated Vector JournalFoundationalvector_put_proved, attest_emit, deterministic replayLow
2Edge ML Inference PipelinePracticalrvf_mount, queue IPC, sensor_subscribe, vector_get, timer_waitMedium
3Autonomous Drone Swarm CoordinatorPracticaltask_spawn (per-drone), cap_grant (dynamic trust), queue mesh, coherence schedulerHigh
4Self-Healing Knowledge GraphPracticalgraph_apply_proved, coherence scoring, proof composition, rollback hooksHigh
5Collective Intelligence MeshExoticMulti-kernel federation via queue bridges, cross-kernel cap_grant, distributed proof compositionVery High
6Quantum-Coherent Memory ReplayExoticSuperposition-tagged vectors, deferred proof resolution, probabilistic region policiesVery High
7Biological Signal ProcessorExoticsensor_subscribe (EEG/EMG), real-time coherence scoring, deadline scheduling, witness-backed diagnosticsHigh
8Adversarial Reasoning ArenaExoticCompeting agent tasks with conflicting proof policies, capability isolation, coherence arbitrationVery High

13.2 Demo 1: Proof-Gated Vector Journal

The simplest demonstration of the kernel's core invariant.

RVF Package: vector_journal.rvf
Components:
  - writer: Generates random vectors, requests proofs, calls vector_put_proved
  - reader: Periodically calls vector_get, verifies coherence metadata
  - auditor: Reads witness log, verifies attestation chain integrity

Test scenario:
  1. Writer stores 1000 vectors with valid proofs → all succeed
  2. Writer attempts store without proof → kernel rejects (Err(ProofRejected))
  3. Writer attempts store with expired proof → kernel rejects
  4. System checkpoints, restarts, replays → final state identical
  5. Auditor verifies: witness log contains exactly 1000 attestations

13.3 Demo 2: Edge ML Inference Pipeline

An RVF package that runs a complete ML inference pipeline on an edge device.

RVF Package: edge_inference.rvf
Components:
  - sensor_adapter: Subscribes to camera sensor, emits frame embeddings to queue
  - feature_store: Kernel vector store holding reference embeddings
  - classifier: Receives embeddings, queries feature_store, emits classifications
  - model_updater: Periodically receives new model weights via queue,
                   performs proof-gated update of feature_store

Kernel features:
  - sensor_subscribe for camera frames
  - queue_send/recv for pipeline stages
  - vector_get for nearest-neighbor lookup
  - vector_put_proved for model updates (proof-gated)
  - timer_wait for periodic model refresh
  - Coherence scheduler prioritizes inference over model updates

13.4 Demo 3: Autonomous Drone Swarm Coordinator

RVF Package: drone_swarm.rvf
Components:
  - coordinator: Maintains global mission graph, assigns waypoints
  - drone_agent[N]: Per-drone task with position vectors, local planning
  - trust_manager: Dynamically adjusts capabilities based on drone behavior
  - coherence_monitor: Watches for swarm fragmentation (graph mincut)

Kernel features:
  - task_spawn per drone agent (dynamic fleet scaling)
  - cap_grant/revoke for dynamic trust (misbehaving drone loses capabilities)
  - graph_apply_proved for mission graph updates
  - Coherence scheduler penalizes plans that fragment the swarm graph
  - Deterministic replay for post-mission analysis

13.5 Demo 4: Self-Healing Knowledge Graph

RVF Package: knowledge_graph.rvf
Components:
  - ingestor: Consumes knowledge events, proposes graph mutations
  - coherence_checker: Evaluates proposed mutations against graph invariants
  - healer: Detects coherence drops, proposes compensating mutations
  - checkpoint_manager: Periodic state snapshots with rollback hooks

Kernel features:
  - graph_apply_proved for every knowledge mutation
  - Proof composition across mincut partitions
  - Rollback hooks triggered when coherence drops below threshold
  - Witness log provides complete audit trail for knowledge provenance

13.6 Demo 5: Collective Intelligence Mesh

Multiple RuVix instances forming a distributed cognitive fabric. Each node runs its own kernel with local vector/graph stores. Queue bridges (network-backed queues) connect nodes. Cross-kernel capability delegation works via attested queue messages. Distributed proof composition allows node A to prove locally while node B verifies. The mesh self-organizes via coherence gradients with no central coordinator. Knowledge migrates toward nodes that use it (vector locality). Proof chains span multiple kernels (federated attestation via ruvector-raft).

13.7 Demo 6: Quantum-Coherent Memory Replay

A speculative demonstration exploring quantum-inspired memory semantics. Vectors are stored in superposition states (multiple weighted values). Proof resolution collapses superposition to a definite value. Until observed via vector_get, mutations accumulate as unresolved proofs. Replay can explore alternative proof resolution paths by checkpointing, resolving proofs differently, and comparing outcomes. Requires an experimental Superposition region policy not in the initial kernel.

13.8 Demo 7: Biological Signal Processor

EEG and EMG sensor adapters emit neural/muscle signal vectors via sensor_subscribe. A fusion engine combines them into intent vectors. Hard deadline scheduling (256 Hz = 3.9ms deadline) ensures real-time processing. Coherence scoring detects anomalous signals (seizure detection). Witness-backed diagnostics provide regulatory compliance audit trails. Proof-gated model updates prevent untested parameter changes in clinical settings.

13.9 Demo 8: Adversarial Reasoning Arena

Two competing agent tasks (red/blue) propose mutations to maximize conflicting objectives on a shared graph. An arbiter evaluates competing proofs and grants mutations to the stronger proof. Capability isolation prevents agents from accessing each other's state. The coherence-aware scheduler penalizes agents that lower coherence. The full witness log enables post-hoc analysis of adversarial dynamics and emergent strategies.


14. Failure Modes and Mitigations

Failure ModeImpactMitigation
Proof engine unavailableAll mutations blocked (no proof tokens issued)Kernel maintains a Reflex-tier proof cache for critical paths. Cache entries have short TTL (100ms). If proof engine is down for >1s, kernel emits a diagnostic attestation and suspends non-critical tasks.
Witness log fullNew attestations cannot be written; mutations blockedAppend-only regions have configurable max size. When 90% full, kernel emits a compaction request to the witness manager component. At 100%, kernel rejects mutations until space is freed. Witness log is never truncated — only checkpointed and archived.
Coherence score collapseScheduler deprioritizes all mutation tasks; system stallsCoherence floor threshold triggers automatic rollback to last checkpoint where coherence was above threshold. Rollback hooks in the RVF manifest execute compensating logic.
Capability leak (over-granting)Task gains access to resources beyond its intended scopeRevocation propagates through derivation tree. Periodic capability audit compares held capabilities against manifest-declared permissions. Discrepancies trigger automatic revocation.
Vector store capacity exhaustedvector_put_proved returns OutOfMemoryPre-allocated capacity is declared in RVF manifest. Resource manager component is responsible for eviction policy (LRU by coherence score, quantization-based compression). Kernel enforces capacity limits.
Queue overflowqueue_send returns QueueFull; producer back-pressureRing buffer size is declared at queue creation. Producers must handle QueueFull by retrying or dropping low-priority messages. Kernel never silently drops messages.
Malicious RVF packageArbitrary code execution, capability theftRVF signature verification at rvf_mount time. WASM component sandboxing. Component capabilities are limited to what the manifest declares and the mounting task grants. No ambient authority.
AArch64 hardware faultKernel panic, data corruptionRegion checksums enable corruption detection. Append-only witness log survives if storage is intact. Replay from last checkpoint recovers state. TEE attestation detects hardware tampering.

15. Consequences

15.1 Positive

  1. Proof-gated mutation as kernel invariant: Every state change is auditable, replayable, and formally justified. This eliminates entire categories of bugs (unauthorized writes, silent corruption, untracked state drift).

  2. Zero-overhead vector/graph operations: Vector and graph stores as kernel objects eliminate the syscall-per-query overhead of running a vector database as a userspace service. Distance computations can use kernel-privileged SIMD/SVE instructions.

  3. Deterministic replay: Because every input is queued, every mutation is proved, and every effect is witnessed, the system can replay from any checkpoint to reproduce any state. This is invaluable for debugging, auditing, and regulatory compliance.

  4. Minimal attack surface: 12 syscalls, 6 primitives, capability-only access. The kernel TCB is orders of magnitude smaller than Linux (~30M LOC) or even seL4 (~8.7K LOC verified C + ~600 LOC assembly). Target: <15K LOC Rust.

  5. RVF as universal deployment unit: A single signed file contains code, data, capabilities, proof policies, and rollback hooks. No package managers, no container runtimes, no dependency hell.

  6. Coherence-aware scheduling: The scheduler understands semantic content, not just priority numbers. This enables intelligent resource allocation in cognitive workloads.

  7. Natural integration with RuVector: All 107 existing crates can be compiled as RVF components. The kernel's vector/graph objects use the same formats and algorithms as ruvector-core and ruvector-graph-transformer.

15.2 Negative

  1. No POSIX compatibility: Existing software cannot run on RuVix without rewriting. This limits the ecosystem to purpose-built RVF components.

  2. Hardware support initially limited: AArch64-first means no x86_64 bare metal in Phase B. x86_64 support requires a separate BSP effort.

  3. Proof overhead on hot paths: Even Reflex tier proofs add ~100ns per mutation. For workloads with millions of mutations per second, this is measurable. Mitigation: batch mutations under a single partition proof.

  4. No dynamic memory allocation: Pre-allocated regions mean the system must know its memory requirements at boot time. Dynamic workloads require a resource manager component with eviction/compaction policies.

  5. Unfamiliar programming model: Developers accustomed to POSIX, threads, and mutexes must learn capabilities, queues, and proof-gated mutation. Documentation and tooling investment is required.

15.3 Risks

RiskLikelihoodImpactMitigation
ABI instability during Phase AMediumHigh — breaks all existing RVF componentsFreeze ABI at week 18. No changes without ADR amendment.
WASM component model immaturityMediumMedium — limits component interoperabilityPin to WASI Preview 2. Maintain a WIT type registry.
Performance regression vs. Linux-hosted RuVectorLowHigh — undermines the motivationBenchmark suite comparing Linux-hosted vs. RuVix-native for vector ops, graph mutations, queue throughput.
Formal verification infeasibilityHighMedium — limits trust claimsDo not claim formal verification in v1. Focus on testing and deterministic replay as the verification mechanism.
Single-developer dependencyHighHigh — project stalls if key contributor leavesDocument everything in ADRs. Keep kernel small enough for one person to understand fully.

16. Integration with Existing RuVector Crates

16.1 Crate Mapping

Existing CrateRuVix RoleIntegration Path
ruvector-coreKernel vector store implementationExtract HNSW algorithm into ruvix-vecgraph. Use slab regions instead of Vec<T>.
ruvector-graph-transformerKernel graph store implementationExtract graph mutation logic into ruvix-vecgraph. Proof-gate via kernel syscalls.
ruvector-verifiedProof engine foundationProofGate<T>, ProofEnvironment, ProofAttestation become kernel types. ruvix-proof wraps these.
cognitum-gate-kernelCoherence scoring referencePort 256-tile coherence fabric to operate on kernel graph store regions.
ruvector-coherenceScheduler coherence signalCoherence score computation feeds into compute_priority().
ruvector-mincutGraph partitioning for proof compositionMinCut partitions define proof composition boundaries.
rvfBoot loader format parserruvix-boot depends on rvf for manifest parsing and signature verification.
ruvector-raftMulti-kernel consensus (Demo 5)Queue-bridge Raft for distributed coherence in collective intelligence mesh.
ruvector-snapshotCheckpoint/restoreRegion snapshots for deterministic replay and rollback.
sonaAgentDB intelligence runtimeRuns as RVF component in user space. Communicates via queues.
ruvllmML inference componentRuns as RVF component. Uses vector_get for retrieval, proof-gated weight updates.
ruvector-temporal-tensorTime-series vector storageAppend-only regions are a natural fit for temporal tensor data.

16.2 Shared No_std Types

A new ruvix-types crate (no_std, no alloc) defines all kernel interface types. This crate is depended on by both kernel code and RVF component code, ensuring type-level compatibility across the boundary.

[package]
name = "ruvix-types"
version = "0.1.0"
edition = "2021"

[features]
default = []
std = []
alloc = []

[dependencies]
# Zero external dependencies for the kernel type crate

17. Acceptance Test Specification

The acceptance test is the single gate for Phase A completion. It is not a unit test — it is a system-level integration test that exercises every kernel subsystem.

17.1 Test Procedure

GIVEN:
  - A signed RVF package "acceptance.rvf" containing:
    - A sensor_adapter component (simulated)
    - A vector_store component (kernel-resident, capacity=100)
    - A proof_engine component
    - A writer component
    - A reader component
    - Proof policy: Standard tier for all mutations
    - Witness log policy: retain all, no compression

WHEN:
  Step 1: rvf_mount("acceptance.rvf", "/test", root_cap)
  Step 2: sensor_adapter emits one PerceptionEvent to queue
  Step 3: writer receives event, computes embedding vector
  Step 4: writer requests proof from proof_engine
  Step 5: writer calls vector_put_proved(store, key, vector, proof)
  Step 6: kernel verifies proof, applies mutation, emits attestation
  Step 7: reader calls vector_get(store, key)
  Step 8: System checkpoints (region snapshots + witness log)
  Step 9: System shuts down
  Step 10: System restarts from checkpoint
  Step 11: System replays witness log
  Step 12: reader calls vector_get(store, key) again

THEN:
  - Step 5 returns Ok(attestation) with 82-byte witness
  - Step 7 returns the exact vector and coherence metadata
  - Step 12 returns the EXACT SAME vector and coherence metadata as Step 7
  - Witness log contains exactly: 1 boot attestation + 1 mount attestation + 1 mutation attestation
  - No proof-less mutation was accepted at any point
  - Total replay time < 2x original execution time

18. Comparison with Prior Art

PropertyLinuxseL4ZephyrHermitRuVix
Kernel LOC~30M~8.7K~200K~50K<15K (target)
Primitivesprocess, file, socket, signal, pipeTCB, CNode, endpoint, notification, untypedthread, semaphore, FIFO, timerprocess (unikernel)task, capability, region, queue, timer, proof
Syscalls~450~12~150POSIX subset12
Memory modelvirtual memory + demand paginguntyped + retypeflat or MPU regionssingle address spaceregions (immutable/append/slab)
IPCpipe, socket, signal, shmemsynchronous endpointFIFO, mailbox, pipePOSIXqueue (io_uring-style)
Access controlDAC/MAC (users, groups, SELinux)capabilitiesnone (trusted code)none (unikernel)capabilities + proof
Vector/graph nativenonononoyes (kernel objects)
Proof-gated mutationnonononoyes (kernel invariant)
Formal verificationnoyes (functional correctness)nonono (v1), replay-based
POSIX compatibleyesno (but has CAmkES)partialyesno
Hardware targetallARM, RISC-V, x86MCUsx86_64, aarch64AArch64 (primary)

19. Open Questions

  1. Maximum proof verification time? Should Deep tier proofs >10ms be rejected? Position: yes, configurable per-partition timeout.

  2. Vector quantization: in-kernel or in-component? Position: in-component; kernel stores pre-quantized data.

  3. Multi-kernel conflicting proofs? Position: Raft consensus (ruvector-raft) with coherence score tiebreaker.

  4. Hot-swapping RVF components? Position: add rvf_unmount in a future ABI revision, not in initial 12 syscalls.

  5. Minimum viable Phase B hardware? Position: Raspberry Pi 4 (Cortex-A72) for accessibility, validate on Pi 5 (Cortex-A76).

  6. Boot signature failure behavior? Position: boot halts (kernel panic) on signature verification failure — no fallback, diagnostic to UART.

  7. Slab slot use-after-free? Position: slab allocations return capability-protected handles with per-slot generation counters; stale handles are detected and rejected.

  8. Queue schema validation boundary? Position: kernel validates message size and WIT type tag at queue_send time; deep structural validation is receiver-side. Document this as a trust boundary.


20. Security Hardening Notes

Identified during pre-release security audit. All are specification clarifications, not structural flaws.

20.1 Root Task Privilege Attenuation

The root task holds capabilities for all physical memory at boot. After Stage 3 component mounting completes, the root task MUST drop all capabilities except its own task handle and the kernel witness log (read-only). If the root task is implemented as a persistent init component, it retains only the minimum capability set declared in its RVF manifest. This prevents a compromised root task from owning the entire system post-boot.

20.2 Capability Delegation Depth Limit

cap_grant with GRANT right enables transitive delegation chains. To prevent unbounded delegation: maximum delegation depth is 8 (configurable per-RVF). A GRANT_ONCE right (non-transitive) is available for cases where a task should delegate access but not the ability to further delegate. The periodic capability audit (Section 14) flags delegation chains deeper than 4.

20.3 Boot RVF Proof Bootstrap

Mounting the Proof Engine itself cannot require a user-space proof (circular dependency). Resolution: Stage 3 component mounting from the cryptographically-signed boot RVF is the single kernel-trusted path that bypasses proof token requirements. Post-boot rvf_mount calls always require proof tokens from the now-running Proof Engine. The boot RVF's signature verification at Stage 1 provides equivalent assurance.

20.4 Reflex Proof Cache Scoping

The Reflex-tier proof cache (Section 14, proof engine unavailable mitigation) caches proof tokens scoped to a specific (mutation_hash, nonce) pair, not to operation classes. Cached proofs are single-use (nonce consumed on first verification). Cache entries have 100ms TTL. The cache size is bounded (default: 64 entries). This prevents window-of-opportunity attacks where coherence degrades between cache insertion and consumption.

20.5 Zero-Copy IPC TOCTOU Mitigation

When queue_send places a descriptor referencing a shared region, the referenced data segment must be in an Immutable or AppendOnly region. The kernel rejects queue_send descriptors pointing into Slab regions (which permit overwrites). This eliminates time-of-check-to-time-of-use attacks where a sender modifies shared data after the receiver reads the descriptor but before it processes the content.

20.6 Boot Signature Failure

If ML-DSA-65 signature verification fails at Stage 1, the kernel panics immediately. There is no fallback boot path, no recovery mode, and no unsigned boot option. A diagnostic message is written to UART (if initialized). This is a deliberate design choice: a system that boots unsigned code provides no security guarantees.


21. References

  1. seL4 Microkernel — Klein et al., "seL4: Formal Verification of an OS Kernel," SOSP 2009. The capability model and "kernel creates nothing after boot" principle directly inspire RuVix's capability manager.

  2. io_uring — Axboe, "Efficient IO with io_uring," 2019. The submission/completion ring design inspires RuVix's queue IPC.

  3. WASM Component Model — W3C WebAssembly CG, "Component Model," 2024. WIT (WASM Interface Types) provides the type system for RVF component interfaces.

  4. RVF Specification — ADR-029, RuVector project. The canonical binary format that becomes the RuVix boot object.

  5. Proof-Gated Mutation — ADR-047, RuVector project. The ProofGate<T> type and three-tier proof routing that becomes a kernel invariant.

  6. Hermit OS — Lankes et al., "A Rust-Based Unikernel," VEE 2023. Demonstrates Rust-native kernel development and links against application at build time.

  7. Theseus OS — Boos et al., "Theseus: an Experiment in Operating System Structure and State Management," OSDI 2020. Safe-language OS using Rust ownership for isolation without hardware privilege rings.

  8. Capability Hardware Enhanced RISC Instructions (CHERI) — Watson et al., "CHERI: A Hybrid Capability-System Architecture," IEEE S&P 2015. Hardware-enforced capabilities that could accelerate RuVix's capability checks.

  9. Coherence Engine — ADR-014, RuVector project. Graph-theoretic consistency scoring replacing probabilistic confidence.

  10. Cognitum Gate Kernelcrates/cognitum-gate-kernel/, RuVector project. No_std WASM kernel for 256-tile coherence fabric.


22. Decision Record

This ADR proposes RuVix as a new architectural layer in the RuVector ecosystem. It does not replace any existing crate or ADR. It promotes existing primitives (RVF, proof-gated mutation, coherence scoring, capability-based access) from library conventions to kernel-enforced invariants.

The build path is intentionally conservative: Phase A delivers a Linux-hosted prototype with the full syscall surface. Phase B delivers bare metal. There is no Phase C because POSIX compatibility would compromise every design principle.

The acceptance test is the single measure of success: a signed RVF boots, consumes an event, performs a proof-gated mutation, emits an attestation, and replays deterministically.


Phase C: Multi-Core and DMA Support

C.1 Objectives

Phase C extends the bare metal kernel with symmetric multi-processing (SMP) and DMA capabilities, enabling parallel execution across multiple CPU cores and zero-copy I/O operations.

  1. Symmetric Multi-Processing (SMP) — Support up to 256 cores with per-CPU data structures, spinlocks, and inter-processor interrupts (IPIs)
  2. DMA Controller Abstraction — Zero-copy I/O for high-bandwidth peripherals (network, storage, sensors)
  3. Device Tree Parsing — Runtime hardware discovery for portable multi-platform support
  4. Memory Coherence — Proper cache and memory barrier usage for SMP correctness
  5. Per-CPU Scheduling — Load balancing with coherence-aware task migration

C.2 New Crates

CratePurposeDependenciesLines of Code (Est.)
ruvix-smpMulti-core boot, per-CPU data, spinlocks, IPIsruvix-aarch64, ruvix-types~1,000
ruvix-dmaDMA controller abstraction for zero-copy I/Oruvix-hal, ruvix-physmem~500
ruvix-dtbDevice tree blob (DTB) parser for hardware discoveryruvix-types~600

Updated primitives table:

CrateStatusTestsDescription
ruvix-smpPhase CTBDSMP boot, per-CPU data, spinlocks, IPIs
ruvix-dmaPhase CTBDDMA controller abstraction, scatter-gather lists
ruvix-dtbPhase CTBDFlattened Device Tree parser, memory/peripheral discovery

C.3 SMP Boot Sequence

Secondary CPU bring-up follows the ARM PSCI (Power State Coordination Interface) v0.2 specification, with spin-table fallback for platforms without PSCI.

┌─────────────────────────────────────────────────────────────────────────┐
│                         SMP BOOT SEQUENCE                                │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  PRIMARY CPU (CPU 0)                    SECONDARY CPUs (1..N)            │
│  ═════════════════                      ═══════════════════════          │
│                                                                          │
│  1. Execute _start                      1. Held in firmware/spin-table   │
│     │                                                                    │
│  2. Initialize MMU, GIC                                                  │
│     │                                                                    │
│  3. Parse DTB → discover CPUs                                            │
│     │                                                                    │
│  4. Allocate per-CPU data                                                │
│     │                                                                    │
│  5. For each secondary CPU:             2. PSCI: CPU_ON or spin-release  │
│     │   ┌──────────────────────────────────────────────────────────┐     │
│     └──▶│ smc #0 (PSCI CPU_ON)  OR  write spin-table release addr │     │
│         └──────────────────────────────────────────────────────────┘     │
│                                              │                           │
│                                         3. Jump to secondary_entry       │
│                                              │                           │
│                                         4. Initialize per-CPU state      │
│                                              │                           │
│                                         5. Enable local GIC interface    │
│                                              │                           │
│                                         6. Signal ready via IPI          │
│  6. Wait for all CPUs ready                  │                           │
│     ◀────────────────────────────────────────┘                           │
│     │                                                                    │
│  7. Enter scheduler                     7. Enter scheduler               │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘

PSCI-Based Boot (Preferred)

// ruvix-smp/src/boot.rs
pub unsafe fn boot_secondary_psci(cpu_id: usize, entry_point: usize) -> Result<(), PsciError> {
    let result = smc_call(
        PSCI_CPU_ON,           // Function ID
        cpu_id as u64,         // Target CPU MPIDR
        entry_point as u64,    // Entry point address
        0,                     // Context ID (passed in x0)
    );

    match result {
        0 => Ok(()),                          // SUCCESS
        PSCI_E_ALREADY_ON => Ok(()),          // Already running
        PSCI_E_ON_PENDING => Ok(()),          // Boot in progress
        err => Err(PsciError::from(err)),
    }
}

Spin-Table Boot (Fallback)

// ruvix-smp/src/boot.rs
pub unsafe fn boot_secondary_spintable(
    cpu_id: usize,
    spin_table_addr: *mut u64,
    entry_point: usize,
) {
    // Write entry point to spin-table release address
    core::ptr::write_volatile(spin_table_addr, entry_point as u64);

    // Data synchronization barrier ensures write is visible
    asm!("dsb sy");

    // Send event to wake sleeping CPUs
    asm!("sev");
}

Per-CPU Data Structure

// ruvix-smp/src/percpu.rs
#[repr(C, align(64))]  // Cache-line aligned to prevent false sharing
pub struct PerCpuData {
    /// CPU identifier (MPIDR_EL1 affinity)
    pub cpu_id: u64,

    /// Current running task (if any)
    pub current_task: Option<TaskHandle>,

    /// Per-CPU scheduler run queue
    pub run_queue: RunQueue,

    /// Per-CPU timer state
    pub timer: TimerState,

    /// Per-CPU GIC interface state
    pub gic_cpu: GicCpuState,

    /// Per-CPU exception stack pointer
    pub exception_stack: *mut u8,

    /// Boot synchronization flag
    pub ready: AtomicBool,

    /// Padding to fill cache line
    _pad: [u8; 16],
}

impl PerCpuData {
    /// Access current CPU's data via TPIDR_EL1 register
    pub fn current() -> &'static mut PerCpuData {
        let ptr: *mut PerCpuData;
        unsafe {
            asm!("mrs {}, TPIDR_EL1", out(reg) ptr);
            &mut *ptr
        }
    }
}

C.4 Memory Barriers and Synchronization

SMP correctness requires careful use of AArch64 memory barriers. The kernel enforces these invariants:

┌─────────────────────────────────────────────────────────────────────────┐
│                    MEMORY BARRIER USAGE GUIDE                            │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  BARRIER        │ INSTRUCTION │ USE CASE                                 │
│  ═══════════════│═════════════│══════════════════════════════════════    │
│                 │             │                                          │
│  DMB (Data      │ dmb sy      │ Before reading shared data after         │
│  Memory         │ dmb ish     │ acquiring a lock. After writing          │
│  Barrier)       │ dmb ishst   │ shared data before releasing a lock.     │
│                 │             │                                          │
│  DSB (Data      │ dsb sy      │ Before executing WFI/WFE (wait for       │
│  Synchronization│ dsb ish     │ interrupt/event). After memory-mapped    │
│  Barrier)       │ dsb ishst   │ I/O writes to ensure completion.         │
│                 │             │                                          │
│  ISB (Instr.    │ isb         │ After modifying system registers         │
│  Synchronization│             │ (TTBR, VBAR, etc.). After self-          │
│  Barrier)       │             │ modifying code or cache maintenance.     │
│                 │             │                                          │
│  Shareability   │             │                                          │
│  Domains:       │             │                                          │
│    sy  = Full   │             │ All observers in the system              │
│    ish = Inner  │             │ Inner-shareable (all CPUs)               │
│    osh = Outer  │             │ Outer-shareable (clusters + DMA)         │
│                 │             │                                          │
└─────────────────────────────────────────────────────────────────────────┘

Spinlock Implementation

// ruvix-smp/src/spinlock.rs

/// Ticket spinlock with proper memory ordering
pub struct SpinLock<T> {
    next_ticket: AtomicU32,
    now_serving: AtomicU32,
    data: UnsafeCell<T>,
}

impl<T> SpinLock<T> {
    pub fn lock(&self) -> SpinLockGuard<'_, T> {
        // Acquire ticket atomically
        let ticket = self.next_ticket.fetch_add(1, Ordering::Relaxed);

        // Spin until our ticket is served
        while self.now_serving.load(Ordering::Acquire) != ticket {
            // WFE: wait for event, reduces power consumption
            unsafe { asm!("wfe") };
        }

        // Acquired — DMB ensures all prior writes are visible
        SpinLockGuard { lock: self }
    }
}

impl<T> Drop for SpinLockGuard<'_, T> {
    fn drop(&mut self) {
        // Release: increment now_serving with Release ordering
        self.lock.now_serving.fetch_add(1, Ordering::Release);

        // SEV: signal event to wake waiting CPUs
        unsafe { asm!("sev") };
    }
}

Inter-Processor Interrupts (IPIs)

// ruvix-smp/src/ipi.rs

/// IPI types used by the kernel
#[repr(u8)]
pub enum IpiType {
    /// Reschedule request — target CPU should check run queue
    Reschedule = 0,

    /// TLB shootdown — invalidate TLB entries for an address range
    TlbInvalidate = 1,

    /// Stop CPU — for kernel panic or shutdown
    Stop = 2,

    /// Function call — execute a closure on target CPU
    Call = 3,
}

pub fn send_ipi(target_cpu: usize, ipi_type: IpiType) {
    let gic = GicDistributor::get();

    // Use SGI (Software Generated Interrupt) for IPIs
    // SGI ID 0-15 are available for software use
    let sgi_id = ipi_type as u8;

    gic.send_sgi(target_cpu, sgi_id);
}

pub fn broadcast_ipi(ipi_type: IpiType, include_self: bool) {
    let gic = GicDistributor::get();
    let target = if include_self {
        SgiTarget::AllIncludingSelf
    } else {
        SgiTarget::AllExcludingSelf
    };

    gic.send_sgi_broadcast(target, ipi_type as u8);
}

C.5 DMA Controller Abstraction

// ruvix-dma/src/lib.rs

/// DMA transfer descriptor
pub struct DmaDescriptor {
    /// Source physical address (for mem-to-dev)
    pub src_addr: PhysAddr,

    /// Destination physical address (for dev-to-mem)
    pub dst_addr: PhysAddr,

    /// Transfer length in bytes
    pub length: usize,

    /// Transfer direction
    pub direction: DmaDirection,

    /// Completion callback (optional)
    pub callback: Option<fn(DmaResult)>,
}

#[derive(Clone, Copy)]
pub enum DmaDirection {
    MemToDevice,
    DeviceToMem,
    MemToMem,
}

/// DMA controller trait implemented by platform-specific drivers
pub trait DmaController {
    /// Allocate a DMA channel
    fn allocate_channel(&self) -> Result<DmaChannel, DmaError>;

    /// Submit a scatter-gather list for transfer
    fn submit_sg(
        &self,
        channel: DmaChannel,
        descriptors: &[DmaDescriptor],
    ) -> Result<DmaTransferId, DmaError>;

    /// Poll for completion (non-blocking)
    fn poll(&self, transfer_id: DmaTransferId) -> DmaStatus;

    /// Wait for completion (blocking)
    fn wait(&self, transfer_id: DmaTransferId) -> DmaResult;

    /// Cancel a pending transfer
    fn cancel(&self, transfer_id: DmaTransferId);
}

Scatter-Gather DMA for Zero-Copy Queue IPC

// Integration with ruvix-queue for zero-copy network packet reception

pub fn receive_packet_zero_copy(
    dma: &dyn DmaController,
    queue: &KernelQueue,
    device: &NetworkDevice,
) -> Result<(), DmaError> {
    // Allocate receive buffer from queue's shared region
    let buffer = queue.ring_region.allocate_slot()?;

    // Build DMA descriptor pointing directly to queue buffer
    let desc = DmaDescriptor {
        src_addr: device.rx_fifo_addr(),  // Device FIFO address
        dst_addr: buffer.phys_addr(),      // Queue buffer physical address
        length: MTU,
        direction: DmaDirection::DeviceToMem,
        callback: Some(|result| {
            // On completion, advance queue tail
            queue.advance_cq_tail(result.bytes_transferred);
        }),
    };

    // Submit DMA transfer — no CPU copying involved
    dma.submit_sg(channel, &[desc])?;

    Ok(())
}

C.6 Device Tree Parser

// ruvix-dtb/src/lib.rs

/// Parsed device tree structure
pub struct DeviceTree<'a> {
    root: DtNode<'a>,
    strings_block: &'a [u8],
    reserved_memory: Vec<MemoryRange>,
}

impl<'a> DeviceTree<'a> {
    /// Parse DTB from a raw pointer (passed by bootloader in x0)
    pub unsafe fn from_ptr(ptr: *const u8) -> Result<Self, DtbError> {
        let header = &*(ptr as *const FdtHeader);

        // Validate magic number
        if u32::from_be(header.magic) != FDT_MAGIC {
            return Err(DtbError::InvalidMagic);
        }

        // Parse structure block, strings block, reserved memory
        // ...
    }

    /// Enumerate all CPUs from /cpus node
    pub fn cpus(&self) -> impl Iterator<Item = CpuNode> + '_ {
        self.root
            .find_node("/cpus")
            .into_iter()
            .flat_map(|cpus| cpus.children())
            .filter(|n| n.name().starts_with("cpu@"))
            .map(CpuNode::from)
    }

    /// Enumerate memory regions from /memory nodes
    pub fn memory_regions(&self) -> impl Iterator<Item = MemoryRange> + '_ {
        self.root
            .find_nodes_by_type("memory")
            .flat_map(|n| n.reg_property())
    }

    /// Find a device by compatible string
    pub fn find_compatible(&self, compat: &str) -> Option<DtNode<'a>> {
        self.root.find_by_compatible(compat)
    }
}

/// CPU node information
pub struct CpuNode {
    pub cpu_id: u64,           // reg property
    pub enable_method: EnableMethod,
    pub spin_table_addr: Option<PhysAddr>,
    pub psci_method: Option<PsciMethod>,
}

pub enum EnableMethod {
    Psci,
    SpinTable,
}

C.7 Build Path (Weeks 43-54)

WeekMilestoneDeliverablesTests
43-44DTB parserruvix-dtb crate, CPU/memory enumerationUnit tests with real Pi4/Pi5 DTBs
45-46Per-CPU dataTPIDR_EL1 setup, per-CPU allocatorEach CPU accesses correct data
47-48SpinlocksTicket spinlock, reader-writer locksStress test across all CPUs
49-50Secondary bootPSCI CPU_ON, spin-table fallbackAll CPUs reach scheduler
51-52IPIsSGI-based IPIs, TLB shootdownCross-CPU reschedule, TLB invalidate
53-54DMA controllerGICv3 DMA, scatter-gatherZero-copy packet reception

Phase D: Raspberry Pi 4/5 Support

D.1 Hardware Differences

Phase D provides dedicated support for Raspberry Pi 4 (BCM2711) and Raspberry Pi 5 (BCM2712) hardware. These platforms differ significantly from the QEMU virt machine.

┌─────────────────────────────────────────────────────────────────────────┐
│                    RASPBERRY PI 4 vs PI 5 COMPARISON                     │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  FEATURE            │ Raspberry Pi 4 (BCM2711)  │ Raspberry Pi 5 (BCM2712)│
│  ═══════════════════│═══════════════════════════│═════════════════════════│
│  CPU                │ 4x Cortex-A72 @ 1.8GHz    │ 4x Cortex-A76 @ 2.4GHz   │
│  RAM                │ 1/2/4/8 GB LPDDR4         │ 4/8 GB LPDDR4X           │
│  GPU                │ VideoCore VI              │ VideoCore VII            │
│                     │                           │                          │
│  PERIPHERAL BASE    │ 0xFE00_0000               │ 0x1F00_0000_0000         │
│  RAM BASE           │ 0x0000_0000               │ 0x0000_0000              │
│                     │                           │                          │
│  INTERRUPT          │ GIC-400 (GICv2)           │ GIC-500 (GICv3)          │
│  CONTROLLER         │ @ 0xFF84_0000             │ @ 0x10_7FFF_0000         │
│                     │                           │                          │
│  UART               │ PL011 @ 0xFE20_1000       │ PL011 @ 0x1F00_0003_0000 │
│                     │ Mini @ 0xFE21_5000        │                          │
│                     │                           │                          │
│  GPIO               │ @ 0xFE20_0000             │ @ 0x1F00_00D0_0000       │
│                     │                           │                          │
│  MAILBOX            │ @ 0xFE00_B880             │ @ 0x1F00_0000_0880       │
│  (VideoCore IPC)    │                           │                          │
│                     │                           │                          │
│  PCIe               │ 1x Gen 2.0 lane           │ 1x Gen 3.0 lane          │
│                     │                           │ (M.2 NVMe support)       │
│                     │                           │                          │
│  BOOT               │ Start4.elf → kernel8.img │ RP1 chip → kernel_2712.img│
│                     │                           │                          │
└─────────────────────────────────────────────────────────────────────────┘

Memory Maps

Raspberry Pi 4 (BCM2711):

RegionStart AddressEnd AddressSizePurpose
RAM (Low)0x0000_00000x3C00_0000960 MBMain memory (below GPU split)
Mailbox0xFE00_B8800xFE00_B8BF64 BVideoCore mailbox registers
GPIO0xFE20_00000xFE20_00F4244 BGPIO registers
UART00xFE20_10000xFE20_1FFF4 KBPL011 UART
Mini UART0xFE21_50000xFE21_507F128 BAux mini UART
GIC Dist0xFF84_10000xFF84_1FFF4 KBGIC-400 distributor
GIC CPU0xFF84_20000xFF84_2FFF4 KBGIC-400 CPU interface
Local Periph0xFF80_00000xFF80_0FFF4 KBPer-core timers, mailboxes

Raspberry Pi 5 (BCM2712):

RegionStart AddressEnd AddressSizePurpose
RAM0x0000_00000x1_0000_00004 GBMain memory
High Periph0x1F00_0000_00000x1F00_FFFF_FFFF4 GBPeripheral space
GIC-5000x10_7FFF_00000x10_7FFF_FFFF64 KBGICv3
RP1 Periph0x1F00_0000_0000variesvariesRP1 south bridge (USB, Ethernet)

D.2 New Crates

CratePurposeDependenciesLines of Code (Est.)
ruvix-bcm2711BCM2711/2712 SoC drivers (GPIO, mailbox, UART)ruvix-hal, ruvix-aarch64~1,200
ruvix-rpi-bootRPi-specific boot support (config.txt parsing, stub)ruvix-aarch64, ruvix-dtb~400

D.3 Boot Process

Raspberry Pi boot differs significantly from QEMU:

┌─────────────────────────────────────────────────────────────────────────┐
│                    RASPBERRY PI BOOT SEQUENCE                            │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  STAGE 0: On-Chip ROM (Raspberry Pi 4/5)                                 │
│  ════════════════════════════════════════                                │
│  1. Power-on reset                                                       │
│  2. Load second-stage bootloader from SD/eMMC/USB                        │
│     - Pi 4: bootcode.bin (from boot partition)                           │
│     - Pi 5: RP1 chip handles initial boot                                │
│                                                                          │
│  STAGE 1: VideoCore Firmware                                             │
│  ════════════════════════════════════════                                │
│  1. Load start4.elf (or start.elf) from boot partition                   │
│  2. Parse config.txt for boot configuration                              │
│  3. Initialize DRAM, clocks, voltage                                     │
│  4. Load kernel8.img (AArch64 kernel) to 0x80000                         │
│  5. Load device tree (bcm2711-rpi-4-b.dtb) to memory                     │
│  6. Release ARM cores with DTB address in x0                             │
│                                                                          │
│  STAGE 2: RuVix Entry (kernel8.img)                                      │
│  ════════════════════════════════════════                                │
│  1. _start at 0x80000 (or address from config.txt)                       │
│  2. Parse DTB to discover hardware                                       │
│  3. Initialize UART for early console                                    │
│  4. Continue standard RuVix boot sequence                                │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘

config.txt Configuration

# /boot/config.txt for RuVix on Raspberry Pi 4/5

# Enable 64-bit mode
arm_64bit=1

# Kernel filename
kernel=ruvix-kernel8.img

# Enable UART for early console
enable_uart=1

# Disable Bluetooth to free up PL011 UART
dtoverlay=disable-bt

# Fixed kernel load address
kernel_address=0x80000

# Pass DTB to kernel
device_tree=bcm2711-rpi-4-b.dtb

# Disable GPU boot splash
disable_splash=1

# Minimum GPU memory (more for kernel)
gpu_mem=16

# For Pi 5: use correct kernel name
[pi5]
kernel=ruvix-kernel_2712.img
device_tree=bcm2712-rpi-5-b.dtb

D.4 BCM2711/2712 Drivers

GPIO Driver

// ruvix-bcm2711/src/gpio.rs

const GPIO_BASE_PI4: usize = 0xFE20_0000;
const GPIO_BASE_PI5: usize = 0x1F00_00D0_0000;

pub struct BcmGpio {
    base: *mut u32,
}

impl BcmGpio {
    pub fn new(platform: Platform) -> Self {
        let base = match platform {
            Platform::Pi4 => GPIO_BASE_PI4,
            Platform::Pi5 => GPIO_BASE_PI5,
        } as *mut u32;

        Self { base }
    }

    /// Set pin function (input, output, alt0-5)
    pub fn set_function(&self, pin: u8, function: GpioFunction) {
        let reg = (pin / 10) as usize;
        let shift = ((pin % 10) * 3) as u32;

        unsafe {
            let ptr = self.base.add(reg);  // GPFSEL0-5
            let mut val = ptr.read_volatile();
            val &= !(0b111 << shift);
            val |= (function as u32) << shift;
            ptr.write_volatile(val);
        }
    }

    /// Set output pin high/low
    pub fn set(&self, pin: u8, high: bool) {
        let reg = if high { 7 } else { 10 };  // GPSET0 or GPCLR0
        let offset = (pin / 32) as usize;
        let bit = 1u32 << (pin % 32);

        unsafe {
            self.base.add(reg + offset).write_volatile(bit);
        }
    }

    /// Read input pin level
    pub fn read(&self, pin: u8) -> bool {
        let offset = (pin / 32) as usize;
        let bit = 1u32 << (pin % 32);

        unsafe {
            (self.base.add(13 + offset).read_volatile() & bit) != 0
        }
    }
}

#[repr(u32)]
pub enum GpioFunction {
    Input = 0b000,
    Output = 0b001,
    Alt0 = 0b100,
    Alt1 = 0b101,
    Alt2 = 0b110,
    Alt3 = 0b111,
    Alt4 = 0b011,
    Alt5 = 0b010,
}

VideoCore Mailbox

// ruvix-bcm2711/src/mailbox.rs

const MAILBOX_BASE_PI4: usize = 0xFE00_B880;

/// Property tag IDs for VideoCore communication
pub mod tags {
    pub const GET_BOARD_REVISION: u32 = 0x0001_0002;
    pub const GET_ARM_MEMORY: u32 = 0x0001_0005;
    pub const GET_VC_MEMORY: u32 = 0x0001_0006;
    pub const SET_POWER_STATE: u32 = 0x0002_8001;
    pub const GET_CLOCK_RATE: u32 = 0x0003_0002;
    pub const SET_CLOCK_RATE: u32 = 0x0003_8002;
}

pub struct Mailbox {
    base: *mut u32,
}

impl Mailbox {
    /// Send a property tag request to VideoCore
    pub fn call(&self, channel: u8, message: &mut PropertyMessage) -> Result<(), MailboxError> {
        // Ensure 16-byte alignment
        let ptr = message as *mut _ as usize;
        assert!(ptr & 0xF == 0, "Mailbox message must be 16-byte aligned");

        // Wait for mailbox to be ready
        while self.status() & MAILBOX_FULL != 0 {
            core::hint::spin_loop();
        }

        // Write message address (with channel in low 4 bits)
        let value = (ptr as u32) | (channel as u32);
        unsafe {
            self.base.add(8).write_volatile(value);  // WRITE register
        }

        // Wait for response
        loop {
            while self.status() & MAILBOX_EMPTY != 0 {
                core::hint::spin_loop();
            }

            let response = unsafe { self.base.read_volatile() };  // READ register
            if response & 0xF == channel as u32 {
                break;
            }
        }

        // Check response code
        if message.code == MAILBOX_RESPONSE_SUCCESS {
            Ok(())
        } else {
            Err(MailboxError::ResponseFailed)
        }
    }
}

D.5 Build Path (Weeks 55-62)

WeekMilestoneDeliverablesTests
55-56RPi boot stubLinker script, kernel8.img generation, UART outputBoot to console on Pi 4
57-58GPIO driverBCM GPIO, LED blink, button inputGPIO functional tests
59-60VideoCore mailboxProperty tags, memory query, clock controlQuery board info
61-62Pi 5 portBCM2712 addresses, GICv3, RP1Boot on Pi 5 hardware

Phase E: Networking and Filesystem

E.1 Network Stack Architecture

Phase E implements a minimal network stack optimized for agentic workloads: high-throughput, low-latency vector streaming and RVF package distribution.

┌─────────────────────────────────────────────────────────────────────────┐
│                      NETWORK STACK ARCHITECTURE                          │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌───────────────────────────────────────────────────────────────────┐   │
│  │                    RVF COMPONENT SPACE                             │   │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐    │   │
│  │  │ AgentDB     │  │ RVF Package │  │ Mesh Coordinator        │    │   │
│  │  │ Replication │  │ Distributor │  │ (Demo 5)                │    │   │
│  │  └──────┬──────┘  └──────┬──────┘  └───────────┬─────────────┘    │   │
│  │         │                │                     │                  │   │
│  │         └────────────────┼─────────────────────┘                  │   │
│  │                          │ queue                                   │   │
│  └──────────────────────────┼────────────────────────────────────────┘   │
│                             │                                            │
│  ┌──────────────────────────┼────────────────────────────────────────┐   │
│  │  ruvix-net (RVF component)                                        │   │
│  │  ═════════════════════════                                        │   │
│  │  ┌─────────────────────────────────────────────────────────────┐  │   │
│  │  │                     SOCKET LAYER                             │  │   │
│  │  │   UdpSocket   │   TcpListener (future)   │   RawSocket      │  │   │
│  │  └─────────────────────────────────────────────────────────────┘  │   │
│  │                             │                                      │   │
│  │  ┌─────────────────────────────────────────────────────────────┐  │   │
│  │  │                    TRANSPORT LAYER                           │  │   │
│  │  │        UDP         │        ICMP        │     (TCP future)   │  │   │
│  │  └─────────────────────────────────────────────────────────────┘  │   │
│  │                             │                                      │   │
│  │  ┌─────────────────────────────────────────────────────────────┐  │   │
│  │  │                    NETWORK LAYER (IPv4)                      │  │   │
│  │  │   IP Header   │   Routing Table   │   ICMP Echo/Reply       │  │   │
│  │  └─────────────────────────────────────────────────────────────┘  │   │
│  │                             │                                      │   │
│  │  ┌─────────────────────────────────────────────────────────────┐  │   │
│  │  │                    LINK LAYER                                │  │   │
│  │  │   ARP Cache   │   Ethernet Frame   │   MAC Address          │  │   │
│  │  └─────────────────────────────────────────────────────────────┘  │   │
│  │                             │                                      │   │
│  └─────────────────────────────┼─────────────────────────────────────┘   │
│                                │ queue_send/recv                         │
│  ┌─────────────────────────────┼─────────────────────────────────────┐   │
│  │  KERNEL                     │                                      │   │
│  │  ═══════                    │                                      │   │
│  │  ┌─────────────────────────────────────────────────────────────┐  │   │
│  │  │                 NETWORK DEVICE DRIVER                        │  │   │
│  │  │   virtio-net (QEMU)  │   bcmgenet (Pi4)  │   RP1 (Pi5)      │  │   │
│  │  └─────────────────────────────────────────────────────────────┘  │   │
│  │                                                                    │   │
│  └────────────────────────────────────────────────────────────────────┘   │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘

virtio-net Driver

// ruvix-drivers/src/virtio_net.rs

const VIRTIO_NET_F_MAC: u64 = 1 << 5;
const VIRTIO_NET_F_MRG_RXBUF: u64 = 1 << 15;

pub struct VirtioNet {
    common: VirtioCommon,
    rx_queue: VirtQueue,
    tx_queue: VirtQueue,
    mac_address: [u8; 6],
}

impl VirtioNet {
    pub fn init(base: *mut u8) -> Result<Self, VirtioError> {
        let common = VirtioCommon::new(base)?;

        // Negotiate features
        let features = common.read_features();
        let negotiated = features & (VIRTIO_NET_F_MAC | VIRTIO_NET_F_MRG_RXBUF);
        common.write_features(negotiated);

        // Initialize virtqueues
        let rx_queue = VirtQueue::new(&common, 0, 256)?;  // Queue 0 = RX
        let tx_queue = VirtQueue::new(&common, 1, 256)?;  // Queue 1 = TX

        // Read MAC address
        let mut mac = [0u8; 6];
        for i in 0..6 {
            mac[i] = common.read_config(i);
        }

        common.set_driver_ok();

        Ok(Self { common, rx_queue, tx_queue, mac_address: mac })
    }

    /// Receive a packet (DMA zero-copy into queue buffer)
    pub fn receive(&mut self, buffer: &mut [u8]) -> Result<usize, NetError> {
        let desc_id = self.rx_queue.pop_used()?;
        let len = self.rx_queue.get_used_len(desc_id);

        // Copy from virtqueue buffer to caller's buffer
        self.rx_queue.read_buffer(desc_id, buffer)?;

        // Recycle the descriptor
        self.rx_queue.add_available(desc_id);

        Ok(len)
    }

    /// Transmit a packet
    pub fn transmit(&mut self, packet: &[u8]) -> Result<(), NetError> {
        let desc_id = self.tx_queue.alloc_descriptor()?;

        self.tx_queue.write_buffer(desc_id, packet)?;
        self.tx_queue.add_available(desc_id);
        self.tx_queue.notify();

        Ok(())
    }
}

ARP Cache

// ruvix-net/src/arp.rs

pub struct ArpCache {
    entries: [Option<ArpEntry>; 64],
    pending: [Option<ArpPending>; 16],
}

pub struct ArpEntry {
    ip: Ipv4Addr,
    mac: [u8; 6],
    expires: Instant,
}

impl ArpCache {
    pub fn lookup(&self, ip: Ipv4Addr) -> Option<[u8; 6]> {
        self.entries.iter()
            .flatten()
            .find(|e| e.ip == ip && e.expires > Instant::now())
            .map(|e| e.mac)
    }

    pub fn insert(&mut self, ip: Ipv4Addr, mac: [u8; 6]) {
        let entry = ArpEntry {
            ip,
            mac,
            expires: Instant::now() + Duration::from_secs(300),
        };

        // Find empty slot or oldest entry
        let slot = self.entries.iter_mut()
            .enumerate()
            .min_by_key(|(_, e)| e.as_ref().map(|e| e.expires))
            .map(|(i, _)| i)
            .unwrap();

        self.entries[slot] = Some(entry);
    }
}

E.2 Filesystem Architecture

RuVix implements a minimal VFS layer optimized for RVF package storage and retrieval, not general-purpose file operations.

┌─────────────────────────────────────────────────────────────────────────┐
│                     FILESYSTEM ARCHITECTURE                              │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌───────────────────────────────────────────────────────────────────┐   │
│  │                           VFS LAYER                                │   │
│  │  ════════════════════════════════════════════════════════════════ │   │
│  │                                                                    │   │
│  │  open(path) → FileHandle       read(fh, buf) → usize              │   │
│  │  close(fh)                     write(fh, buf) → usize (limited)   │   │
│  │  stat(path) → Metadata         readdir(path) → [DirEntry]         │   │
│  │                                                                    │   │
│  └────────────────────────────┬──────────────────────────────────────┘   │
│                               │                                          │
│         ┌─────────────────────┼─────────────────────┐                    │
│         │                     │                     │                    │
│         ▼                     ▼                     ▼                    │
│  ┌─────────────┐      ┌─────────────┐      ┌─────────────┐               │
│  │   FAT32     │      │   RamFS     │      │  (Future)   │               │
│  │  (read-only)│      │    /tmp     │      │   ext4      │               │
│  ├─────────────┤      ├─────────────┤      ├─────────────┤               │
│  │ SD card     │      │ Kernel heap │      │ NVMe/USB    │               │
│  │ boot part.  │      │ regions     │      │             │               │
│  └─────────────┘      └─────────────┘      └─────────────┘               │
│                                                                          │
│  Mount points:                                                           │
│    /boot  → FAT32 (SD card boot partition, RVF packages)                │
│    /tmp   → RamFS (temporary files, checkpoints)                        │
│    /rvf   → Virtual (mounted RVF component namespaces)                  │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘

VFS Layer

// ruvix-fs/src/vfs.rs

pub trait Filesystem: Send + Sync {
    fn mount(&mut self, device: &dyn BlockDevice) -> Result<(), FsError>;
    fn open(&self, path: &str, flags: OpenFlags) -> Result<FileHandle, FsError>;
    fn read(&self, handle: FileHandle, buf: &mut [u8]) -> Result<usize, FsError>;
    fn write(&self, handle: FileHandle, buf: &[u8]) -> Result<usize, FsError>;
    fn close(&self, handle: FileHandle);
    fn stat(&self, path: &str) -> Result<Metadata, FsError>;
    fn readdir(&self, path: &str) -> Result<Vec<DirEntry>, FsError>;
}

pub struct Vfs {
    mounts: BTreeMap<PathBuf, Box<dyn Filesystem>>,
}

impl Vfs {
    pub fn mount(&mut self, path: &str, fs: Box<dyn Filesystem>) -> Result<(), FsError> {
        self.mounts.insert(PathBuf::from(path), fs);
        Ok(())
    }

    pub fn open(&self, path: &str, flags: OpenFlags) -> Result<FileHandle, FsError> {
        let (mount_path, relative_path) = self.resolve_mount(path)?;
        let fs = self.mounts.get(mount_path).ok_or(FsError::NotFound)?;
        fs.open(relative_path, flags)
    }
}

FAT32 (Read-Only)

// ruvix-fs/src/fat32.rs

pub struct Fat32 {
    device: Box<dyn BlockDevice>,
    bpb: BiosParameterBlock,
    fat_start: u64,
    data_start: u64,
    root_cluster: u32,
}

impl Fat32 {
    pub fn mount(device: Box<dyn BlockDevice>) -> Result<Self, FsError> {
        let mut sector = [0u8; 512];
        device.read_block(0, &mut sector)?;

        let bpb = BiosParameterBlock::parse(&sector)?;

        // Validate FAT32 signature
        if bpb.sectors_per_fat_16 != 0 {
            return Err(FsError::InvalidFilesystem);
        }

        let fat_start = bpb.reserved_sectors as u64;
        let data_start = fat_start + (bpb.num_fats as u64 * bpb.sectors_per_fat_32 as u64);

        Ok(Self {
            device,
            bpb,
            fat_start,
            data_start,
            root_cluster: bpb.root_cluster,
        })
    }

    fn read_cluster(&self, cluster: u32, buf: &mut [u8]) -> Result<(), FsError> {
        let sector = self.cluster_to_sector(cluster);
        for i in 0..self.bpb.sectors_per_cluster {
            let offset = i as usize * 512;
            self.device.read_block(sector + i as u64, &mut buf[offset..offset + 512])?;
        }
        Ok(())
    }

    fn cluster_to_sector(&self, cluster: u32) -> u64 {
        self.data_start + ((cluster - 2) as u64 * self.bpb.sectors_per_cluster as u64)
    }
}

RamFS

// ruvix-fs/src/ramfs.rs

pub struct RamFs {
    root: RamFsNode,
    allocator: RegionAllocator,
}

enum RamFsNode {
    Directory {
        name: String,
        children: Vec<RamFsNode>,
    },
    File {
        name: String,
        data: RegionHandle,
        size: usize,
    },
}

impl Filesystem for RamFs {
    fn open(&self, path: &str, flags: OpenFlags) -> Result<FileHandle, FsError> {
        let node = self.resolve_path(path)?;
        match node {
            RamFsNode::File { data, size, .. } => {
                Ok(FileHandle::new(data.clone(), *size, flags))
            }
            RamFsNode::Directory { .. } => Err(FsError::IsDirectory),
        }
    }

    fn write(&self, handle: FileHandle, buf: &[u8]) -> Result<usize, FsError> {
        // RamFS supports writes for checkpoints
        let region = handle.region();
        region.write(handle.position(), buf)?;
        Ok(buf.len())
    }
}

E.3 New Crates

CratePurposeDependenciesLines of Code (Est.)
ruvix-netEthernet/IP/UDP/ICMP network stackruvix-types, ruvix-queue~1,500
ruvix-fsVFS layer, FAT32, RamFSruvix-types, ruvix-region~1,800

E.4 Build Path (Weeks 63-74)

WeekMilestoneDeliverablesTests
63-64Block device layerSD card driver, virtio-blkRead/write sectors
65-66FAT32 read-onlyBPB parsing, cluster chain, directory traversalRead files from SD
67-68VFS layerMount points, path resolution, file handlesMount FAT32 at /boot
69-70RamFSIn-memory filesystem for /tmpCheckpoint read/write
71-72Ethernet drivervirtio-net, bcmgenet (Pi4)Packet TX/RX
73-74IP/UDP/ICMPARP, IP routing, UDP sockets, pingPing reply, UDP echo

QEMU Swarm Simulation

Swarm.1 Architecture

The QEMU Swarm Simulation system enables testing of distributed RuVix deployments without physical hardware. Multiple QEMU instances run in parallel, connected via virtual networking.

┌─────────────────────────────────────────────────────────────────────────┐
│                    QEMU SWARM SIMULATION ARCHITECTURE                    │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌───────────────────────────────────────────────────────────────────┐   │
│  │                       SWARM ORCHESTRATOR                           │   │
│  │  ═══════════════════════════════════════════════════════════════  │   │
│  │                                                                    │   │
│  │   swarm-ctl                                                        │   │
│  │   ├── launch     : Start N QEMU nodes                              │   │
│  │   ├── connect    : Establish virtual network                       │   │
│  │   ├── deploy     : Push RVF packages to nodes                      │   │
│  │   ├── monitor    : Aggregate console output                        │   │
│  │   ├── inject     : Inject faults (network, disk, CPU)             │   │
│  │   └── teardown   : Graceful shutdown all nodes                     │   │
│  │                                                                    │   │
│  └──────────────────────────────┬────────────────────────────────────┘   │
│                                 │                                        │
│                     ┌───────────┼───────────┐                            │
│                     │           │           │                            │
│              ┌──────▼──────┐ ┌──▼───┐ ┌─────▼─────┐                      │
│              │   Node 0    │ │Node 1│ │  Node N   │                      │
│              │  (QEMU)     │ │(QEMU)│ │  (QEMU)   │                      │
│              │             │ │      │ │           │                      │
│              │ ┌─────────┐ │ │ ...  │ │ ┌───────┐ │                      │
│              │ │ RuVix   │ │ │      │ │ │RuVix  │ │                      │
│              │ │ Kernel  │ │ │      │ │ │Kernel │ │                      │
│              │ └────┬────┘ │ │      │ │ └───┬───┘ │                      │
│              │      │      │ │      │ │     │     │                      │
│              │ ┌────▼────┐ │ │      │ │ ┌───▼───┐ │                      │
│              │ │ virtio  │ │ │      │ │ │virtio │ │                      │
│              │ │   net   │ │ │      │ │ │ net   │ │                      │
│              │ └────┬────┘ │ │      │ │ └───┬───┘ │                      │
│              └──────┼──────┘ └──────┘ └─────┼─────┘                      │
│                     │                       │                            │
│              ┌──────▼───────────────────────▼──────┐                     │
│              │         VIRTUAL NETWORK              │                     │
│              │  ════════════════════════════════   │                     │
│              │                                      │                     │
│              │   TAP interfaces + bridge            │                     │
│              │   OR                                 │                     │
│              │   QEMU user-mode networking          │                     │
│              │   OR                                 │                     │
│              │   vde_switch for complex topologies  │                     │
│              │                                      │                     │
│              └──────────────────────────────────────┘                     │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘

Orchestrator Implementation

// tools/swarm-ctl/src/main.rs

use std::process::{Command, Stdio};
use std::collections::HashMap;

pub struct SwarmOrchestrator {
    nodes: Vec<QemuNode>,
    network: VirtualNetwork,
    console_mux: ConsoleMux,
}

pub struct QemuNode {
    id: usize,
    process: std::process::Child,
    console: UnixStream,
    monitor: UnixStream,
    mac_address: [u8; 6],
    ip_address: Ipv4Addr,
}

impl SwarmOrchestrator {
    pub fn launch(&mut self, config: &SwarmConfig) -> Result<(), SwarmError> {
        // Create virtual network
        self.network = VirtualNetwork::create(&config.topology)?;

        for i in 0..config.node_count {
            let node = self.spawn_node(i, &config)?;
            self.nodes.push(node);
        }

        // Wait for all nodes to boot
        self.wait_for_boot_complete()?;

        Ok(())
    }

    fn spawn_node(&self, id: usize, config: &SwarmConfig) -> Result<QemuNode, SwarmError> {
        let mac = generate_mac(id);
        let console_sock = format!("/tmp/ruvix-swarm-{}-console.sock", id);
        let monitor_sock = format!("/tmp/ruvix-swarm-{}-monitor.sock", id);

        let mut cmd = Command::new("qemu-system-aarch64");
        cmd.args([
            "-machine", "virt,gic-version=3",
            "-cpu", "cortex-a72",
            "-m", &format!("{}M", config.memory_mb),
            "-smp", &format!("{}", config.cpus),
            "-kernel", &config.kernel_path,
            "-drive", &format!("file={},format=raw,if=pflash", config.rvf_path),
            // Networking
            "-netdev", &format!("tap,id=net0,ifname=tap{},script=no,downscript=no", id),
            "-device", &format!("virtio-net-pci,netdev=net0,mac={}", mac_to_string(&mac)),
            // Console
            "-chardev", &format!("socket,id=console,path={},server=on,wait=off", console_sock),
            "-serial", "chardev:console",
            // Monitor
            "-chardev", &format!("socket,id=monitor,path={},server=on,wait=off", monitor_sock),
            "-monitor", "chardev:monitor",
            // No graphics
            "-nographic",
        ]);

        let process = cmd
            .stdin(Stdio::null())
            .stdout(Stdio::null())
            .stderr(Stdio::null())
            .spawn()?;

        // Connect to console and monitor sockets
        std::thread::sleep(Duration::from_millis(500));
        let console = UnixStream::connect(&console_sock)?;
        let monitor = UnixStream::connect(&monitor_sock)?;

        Ok(QemuNode {
            id,
            process,
            console,
            monitor,
            mac_address: mac,
            ip_address: Ipv4Addr::new(10, 0, 0, (id + 1) as u8),
        })
    }
}

Swarm.2 Configuration

Node Resource Allocation

# swarm-config.yaml

cluster:
  name: "ruvix-test-cluster"
  node_count: 5

nodes:
  default:
    cpus: 2
    memory_mb: 512
    kernel: "target/aarch64-unknown-none/release/ruvix-kernel"
    rvf: "boot.rvf"

  # Override for specific nodes
  coordinator:
    index: 0
    cpus: 4
    memory_mb: 1024
    rvf: "coordinator.rvf"

  workers:
    indices: [1, 2, 3, 4]
    cpus: 2
    memory_mb: 512
    rvf: "worker.rvf"

Network Topologies

┌─────────────────────────────────────────────────────────────────────────┐
│                      SUPPORTED NETWORK TOPOLOGIES                        │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  RING TOPOLOGY                    MESH TOPOLOGY                          │
│  ══════════════                   ══════════════                         │
│                                                                          │
│      ┌───┐                            ┌───┐                              │
│      │ 0 │◄──────┐                    │ 0 │                              │
│      └─┬─┘      │                    └─┬─┘                              │
│        │        │                  ╱   │   ╲                            │
│        ▼        │                ╱     │     ╲                          │
│      ┌───┐      │           ┌───┐    ┌─┴─┐    ┌───┐                      │
│      │ 1 │      │           │ 1 │◄──▶│   │◄──▶│ 2 │                      │
│      └─┬─┘      │           └─┬─┘    └───┘    └─┬─┘                      │
│        │        │             │  ╲           ╱  │                        │
│        ▼        │             │    ╲       ╱    │                        │
│      ┌───┐      │             │      ╲   ╱      │                        │
│      │ 2 │      │           ┌─┴─┐    ┌─┴─┐    ┌─┴─┐                      │
│      └─┬─┘      │           │ 3 │◄──▶│ 4 │◄──▶│   │                      │
│        │        │           └───┘    └───┘    └───┘                      │
│        ▼        │                                                        │
│      ┌───┐      │                                                        │
│      │ 3 │──────┘                                                        │
│      └───┘                                                               │
│                                                                          │
│  STAR TOPOLOGY                    HIERARCHICAL TOPOLOGY                  │
│  ══════════════                   ═════════════════════                  │
│                                                                          │
│      ┌───┐  ┌───┐                        ┌───┐                           │
│      │ 1 │  │ 2 │                        │ 0 │  (Coordinator)            │
│      └─┬─┘  └─┬─┘                        └─┬─┘                           │
│        │      │                        ╱   │   ╲                         │
│        └──┬───┘                      ╱     │     ╲                       │
│           ▼                        ╱       │       ╲                     │
│         ┌───┐                  ┌───┐     ┌───┐     ┌───┐                 │
│         │ 0 │  (Hub)           │ 1 │     │ 2 │     │ 3 │  (Leaders)      │
│         └─┬─┘                  └─┬─┘     └─┬─┘     └─┬─┘                 │
│        ╱  │  ╲                   │         │         │                   │
│      ╱    │    ╲              ┌──┴──┐   ┌──┴──┐   ┌──┴──┐                │
│  ┌───┐  ┌───┐  ┌───┐         │4│5│  │6│7│  │8│9│  (Workers)              │
│  │ 3 │  │ 4 │  │ 5 │         └─┴─┘  └─┴─┘  └─┴─┘                         │
│  └───┘  └───┘  └───┘                                                     │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
# Network topology configuration

network:
  topology: "mesh"  # ring | mesh | star | hierarchical

  # Mesh-specific: which nodes connect to which
  mesh:
    connectivity: 0.7  # 70% of possible edges exist

  # Star-specific: hub node index
  star:
    hub: 0

  # Hierarchical-specific
  hierarchical:
    coordinator: 0
    leaders: [1, 2, 3]
    workers_per_leader: 3

  # Network parameters
  latency_ms: 1
  bandwidth_mbps: 1000
  packet_loss_percent: 0.0  # For fault injection

Fault Injection Scenarios

# fault-injection.yaml

scenarios:
  network_partition:
    description: "Partition cluster into two halves"
    trigger: "manual"
    action:
      type: "network_partition"
      groups:
        - [0, 1, 2]
        - [3, 4]
    duration_s: 30

  node_crash:
    description: "Kill a random worker node"
    trigger: "random"
    probability: 0.01  # per second
    action:
      type: "kill_node"
      target: "random_worker"
    recovery_s: 10  # restart after

  network_delay:
    description: "Add latency to specific link"
    trigger: "manual"
    action:
      type: "add_latency"
      from: 0
      to: 1
      latency_ms: 100
      jitter_ms: 20
    duration_s: 60

  disk_failure:
    description: "Simulate disk I/O errors"
    trigger: "manual"
    action:
      type: "disk_error"
      target: 2
      error_rate: 0.1  # 10% of reads/writes fail
    duration_s: 30

Swarm.3 Use Cases

Distributed Consensus Testing

# Test Raft consensus under network partition
./swarm-ctl launch --config swarm-5-node.yaml
./swarm-ctl deploy --rvf consensus-test.rvf

# Wait for cluster formation
./swarm-ctl wait --condition "consensus_formed"

# Inject network partition
./swarm-ctl inject --scenario network_partition

# Verify:
# - Majority partition elects new leader
# - Minority partition becomes read-only
# - After healing, cluster converges

./swarm-ctl verify --invariant "no_split_brain"
./swarm-ctl inject --scenario heal_network

./swarm-ctl verify --invariant "single_leader"
./swarm-ctl verify --invariant "log_consistency"

RVF Package Deployment Across Cluster

# Deploy an updated RVF package to all nodes
./swarm-ctl launch --config swarm-10-node.yaml

# Deploy initial package
./swarm-ctl deploy --rvf agent-v1.rvf --all

# Rolling update to v2
./swarm-ctl deploy --rvf agent-v2.rvf --strategy rolling --batch-size 2

# Verify no service disruption
./swarm-ctl verify --invariant "service_available" --throughout-deploy

# Rollback if needed
./swarm-ctl rollback --to agent-v1.rvf --all

Performance Benchmarking Under Load

# Launch benchmark cluster
./swarm-ctl launch --config swarm-benchmark.yaml

# Deploy benchmark workload
./swarm-ctl deploy --rvf vector-benchmark.rvf

# Generate load
./swarm-ctl benchmark \
  --workload "vector_insert" \
  --rate 10000  \  # ops/sec
  --duration 60 \
  --collect-metrics

# Results:
# - Throughput: ops/sec achieved
# - Latency: p50, p95, p99 in microseconds
# - CPU utilization per node
# - Memory usage per node
# - Network bandwidth utilization

Swarm.4 Console Multiplexing

// tools/swarm-ctl/src/console.rs

pub struct ConsoleMux {
    nodes: Vec<NodeConsole>,
    output_mode: OutputMode,
}

pub enum OutputMode {
    /// All output interleaved with node prefixes
    Interleaved,
    /// Each node to separate file
    SeparateFiles,
    /// TUI with panes per node
    Tui,
}

impl ConsoleMux {
    pub fn run(&mut self) {
        match self.output_mode {
            OutputMode::Interleaved => {
                // Poll all consoles, prefix output with node ID
                loop {
                    for node in &mut self.nodes {
                        if let Some(line) = node.read_line() {
                            println!("[node-{}] {}", node.id, line);
                        }
                    }
                    std::thread::sleep(Duration::from_millis(10));
                }
            }
            OutputMode::Tui => {
                // Use crossterm/tui-rs for multi-pane display
                self.run_tui();
            }
            OutputMode::SeparateFiles => {
                // Write each node's output to node-N.log
                self.run_file_mode();
            }
        }
    }
}

Swarm.5 Build Path (Weeks 75-82)

WeekMilestoneDeliverablesTests
75-76QEMU launcherNode spawning, console sockets, monitor controlLaunch/teardown 5 nodes
77-78Virtual networkingTAP interfaces, bridge setup, vde_switchPing between nodes
79-80Orchestrator CLIswarm-ctl commands, config parsing, statusFull CLI functional
81-82Fault injectionNetwork partition, node kill, latency injectionConsensus survives faults

Phase Timeline Summary

┌─────────────────────────────────────────────────────────────────────────┐
│                    RUVIX DEVELOPMENT TIMELINE                            │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  PHASE         │ WEEKS   │ DURATION │ KEY DELIVERABLES                  │
│  ══════════════│═════════│══════════│═══════════════════════════════    │
│                │         │          │                                    │
│  Phase A       │  1-18   │ 18 weeks │ Linux-hosted nucleus, 12 syscalls │
│  (Complete)    │         │          │ 760 tests passing                  │
│                │         │          │                                    │
│  Phase B       │ 19-42   │ 24 weeks │ Bare metal AArch64, QEMU virt      │
│                │         │          │ MMU, GIC, timer, WASM runtime      │
│                │         │          │                                    │
│  Phase C       │ 43-54   │ 12 weeks │ SMP (256 cores), DMA, DTB parser   │
│                │         │          │ Spinlocks, IPIs, per-CPU data      │
│                │         │          │                                    │
│  Phase D       │ 55-62   │  8 weeks │ Raspberry Pi 4/5 support           │
│                │         │          │ BCM2711/2712 drivers               │
│                │         │          │                                    │
│  Phase E       │ 63-74   │ 12 weeks │ Networking (UDP/ICMP), Filesystem  │
│                │         │          │ virtio-net, FAT32, RamFS           │
│                │         │          │                                    │
│  QEMU Swarm    │ 75-82   │  8 weeks │ Multi-QEMU orchestration           │
│                │         │          │ Fault injection, benchmarking      │
│                │         │          │                                    │
│  ══════════════│═════════│══════════│═══════════════════════════════    │
│  TOTAL         │  1-82   │ 82 weeks │ ~19 months                         │
│                │         │          │                                    │
└─────────────────────────────────────────────────────────────────────────┘