ADR-005: WASM Runtime Integration

February 14, 2026

Field         | Value
------------- | ------------------------
Status        | Proposed
Date          | 2026-01-18
Authors       | RuvLLM Architecture Team
Reviewers     | -
Supersedes    | -
Superseded by | -

Note: The WASM runtime approach described here is complemented by ADR-029. The RVF WASM microkernel (rvf-wasm) provides a <8 KB Cognitum tile target that replaces ad-hoc WASM builds for vector operations.

1. Context

1.1 Problem Statement

RuvLLM requires a mechanism for executing user-provided and community-contributed compute kernels in a secure, sandboxed environment. These kernels implement performance-critical operations such as:

  • Rotary Position Embeddings (RoPE)
  • RMS Normalization (RMSNorm)
  • SwiGLU activation functions
  • KV cache quantization/dequantization
  • LoRA delta application

Without proper isolation, malicious or buggy kernels could:

  • Access unauthorized memory regions
  • Consume unbounded compute resources
  • Compromise the host system
  • Corrupt model state

1.2 Requirements

Requirement           | Priority | Rationale
--------------------- | -------- | -------------------------------------------------
Sandboxed execution   | Critical | Prevent kernel code from accessing host resources
Execution budgets     | Critical | Prevent runaway code and DoS conditions
Low overhead          | High     | Kernels are in the inference hot path
Cross-platform        | High     | Support x86, ARM, embedded devices
Framework agnostic    | Medium   | Enable ML inference without vendor lock-in
Hot-swappable kernels | Medium   | Update kernels without service restart

1.3 Constraints

  • Memory: Embedded targets have as little as 256 KB RAM
  • Latency: Kernel invocation overhead must be <10 µs for small tensors
  • Compatibility: Must support existing Rust/C kernel implementations
  • Security: Kernel supply chain must be verifiable

2. Decision

We will adopt WebAssembly (WASM) as the sandboxed execution environment for compute kernels, with the following architecture:

2.1 Runtime Selection

Device Class             | Runtime          | Rationale
------------------------ | ---------------- | ------------------------------------------
Edge servers (x86/ARM64) | Wasmtime         | Mature, well-optimized, excellent tooling
Embedded/MCU (<1MB RAM)  | WAMR             | <85 KB footprint, AOT compilation support
Browser/WASI Preview 2   | wasmtime/browser | Future consideration

2.2 Interruption Strategy: Epoch-Based (Not Fuel)

We choose epoch-based interruption over fuel-based metering:

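Aspect         | Epoch                                      | Fuel
-------------- | ------------------------------------------ | ---------------------------
Overhead       | ~2-5%                                      | ~15-30%
Granularity    | Coarse (polling points)                    | Fine (per instruction)
Determinism    | Non-deterministic                          | Deterministic
Implementation | Engine-wide epoch counter, per-Store deadline | Instruction instrumentation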

Rationale: For inference workloads, coarse-grained interruption is acceptable. Avoiding fuel metering removes roughly 10-25 percentage points of overhead (~15-30% for fuel vs. ~2-5% for epochs), which is significant for latency-sensitive operations.

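use wasmtime::{Config, Engine, Store};

// Epoch configuration example
fn epoch_store() -> wasmtime::Result<(Engine, Store<()>)> {
    let mut config = Config::new();
    config.epoch_interruption(true);

    let engine = Engine::new(&config)?;
    let mut store = Store::new(&engine, ());

    // The deadline is counted in epoch ticks beyond the current epoch, not in
    // milliseconds: 100 ticks is a ~100ms budget only if the epoch is
    // incremented every 1ms
    store.set_epoch_deadline(100);

    // Increment epoch from a periodic timer (one call = one tick); once the
    // deadline passes, the running guest traps with Trap::Interrupt
    engine.increment_epoch();

    Ok((engine, store))
}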
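The snippet above advances the epoch only once. For `set_epoch_deadline` to represent a wall-clock budget, something must call `increment_epoch()` at a fixed interval. Below is a minimal std-only sketch that uses a dedicated background thread. `EpochTicker` and `budget_for` are hypothetical names. The tick action is a closure: in the runtime it would capture a cloned `Engine` and call `engine.increment_epoch()`, and here a counter stands in so the example runs without Wasmtime.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical helper: calls `tick` every `interval` on a background thread
/// until dropped. In the runtime, `tick` would be `move || engine.increment_epoch()`.
pub struct EpochTicker {
    stop: Arc<AtomicBool>,
    handle: Option<JoinHandle<()>>,
}

impl EpochTicker {
    pub fn start<F>(interval: Duration, tick: F) -> Self
    where
        F: Fn() + Send + 'static,
    {
        let stop = Arc::new(AtomicBool::new(false));
        let flag = Arc::clone(&stop);
        let handle = thread::spawn(move || {
            while !flag.load(Ordering::Relaxed) {
                thread::sleep(interval);
                tick();
            }
        });
        EpochTicker { stop, handle: Some(handle) }
    }
}

impl Drop for EpochTicker {
    fn drop(&mut self) {
        // Signal the thread and join it, so no tick fires after drop returns
        self.stop.store(true, Ordering::Relaxed);
        if let Some(handle) = self.handle.take() {
            let _ = handle.join();
        }
    }
}

/// Wall-clock budget implied by a deadline of `ticks` at the given tick interval.
pub fn budget_for(ticks: u64, interval: Duration) -> Duration {
    interval.saturating_mul(u32::try_from(ticks).unwrap_or(u32::MAX))
}
```

Because the epoch counter is engine-wide, a single ticker serves every `Store` created from that engine. Each store then chooses its own deadline in ticks.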

2.3 WASI-NN Integration

WASI-NN provides framework-agnostic ML inference capabilities:

+-------------------+
|   RuvLLM Host     |
+-------------------+
         |
         v
+-------------------+
|   WASI-NN API     |
+-------------------+
         |
    +----+----+
    |         |
    v         v
+-------+ +--------+
| ONNX  | | Custom |
| RT    | | Kernel |
+-------+ +--------+

WASI-NN Backends:

  • ONNX Runtime (portable)
  • Native kernels (performance-critical paths)
  • Custom quantized formats (memory efficiency)

3. WASM Boundary Design

3.1 ABI Strategy: Raw ABI (Not Component Model)

We use raw WASM ABI rather than the Component Model:

Aspect   | Raw ABI   | Component Model
-------- | --------- | ----------------------
Maturity | Stable    | Evolving (Preview 2)
Overhead | Minimal   | Higher (canonical ABI)
Tooling  | Excellent | Improving
Adoption | Universal | Growing

Migration Path: Design interfaces to be Component Model-compatible for future migration.

3.2 Memory Layout

Host Linear Memory
+--------------------------------------------------+
| Tensor A    | Tensor B    | Output    | Scratch  |
| (read-only) | (read-only) | (write)   | (r/w)    |
+--------------------------------------------------+
     ^              ^            ^           ^
     |              |            |           |
   offset_a     offset_b    offset_out   offset_scratch
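The layout above can be computed by packing the regions back to back. This is a minimal sketch under two assumptions. First, each region starts on a 16-byte boundary, chosen here so `v128` SIMD loads are naturally aligned. Second, an unused region gets offset 0 and size 0, matching the "0 if unused" convention used by `KernelDescriptor`. The alignment value and the `plan_layout`, `Region`, and `pages_needed` names are illustrative, not part of the ABI.

```rust
/// One contiguous region of linear memory: [offset, offset + size).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Region {
    pub offset: u32,
    pub size: u32,
}

/// Round `value` up to the next multiple of `align` (a power of two).
fn align_up(value: u32, align: u32) -> Option<u32> {
    debug_assert!(align.is_power_of_two());
    value.checked_add(align - 1).map(|v| v & !(align - 1))
}

/// Pack regions in order (A, B, output, scratch) starting at `base`.
/// Returns None if the layout would overflow the 32-bit address space.
pub fn plan_layout(base: u32, sizes: [u32; 4], align: u32) -> Option<[Region; 4]> {
    let mut cursor = align_up(base, align)?;
    let mut regions = [Region { offset: 0, size: 0 }; 4];
    for (region, &size) in regions.iter_mut().zip(sizes.iter()) {
        if size == 0 {
            // Unused region stays at offset 0, size 0
            continue;
        }
        *region = Region { offset: cursor, size };
        cursor = align_up(cursor.checked_add(size)?, align)?;
    }
    Some(regions)
}

/// Number of 64 KiB WASM pages needed to cover bytes [0, end).
pub fn pages_needed(end: u32) -> u32 {
    const PAGE: u64 = 64 * 1024;
    ((end as u64 + PAGE - 1) / PAGE) as u32
}
```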

Shared Memory Protocol:

/// Kernel invocation descriptor passed to WASM
#[repr(C)]
pub struct KernelDescriptor {
    /// Input tensor A offset in linear memory
    pub input_a_offset: u32,
    /// Input tensor A size in bytes
    pub input_a_size: u32,
    /// Input tensor B offset (0 if unused)
    pub input_b_offset: u32,
    /// Input tensor B size in bytes
    pub input_b_size: u32,
    /// Output tensor offset
    pub output_offset: u32,
    /// Output tensor size in bytes
    pub output_size: u32,
    /// Scratch space offset
    pub scratch_offset: u32,
    /// Scratch space size in bytes
    pub scratch_size: u32,
    /// Kernel-specific parameters offset
    pub params_offset: u32,
    /// Kernel-specific parameters size
    pub params_size: u32,
}
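The descriptor is just a set of offsets and sizes, so the host should check it before every call rather than trust it. Below is a sketch of that check, assuming the host knows the current linear-memory length in bytes. The `validate_descriptor` function and the `DescriptorError` type are illustrative. The check enforces two rules:

- Every region must fit inside memory.
- The writable regions (output and scratch) must not overlap any other region, so kernel writes cannot clobber inputs or parameters.

```rust
#[repr(C)]
#[derive(Debug, Clone, Copy, Default)]
pub struct KernelDescriptor {
    pub input_a_offset: u32,
    pub input_a_size: u32,
    pub input_b_offset: u32,
    pub input_b_size: u32,
    pub output_offset: u32,
    pub output_size: u32,
    pub scratch_offset: u32,
    pub scratch_size: u32,
    pub params_offset: u32,
    pub params_size: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum DescriptorError {
    OutOfBounds { region: &'static str },
    Overlap { a: &'static str, b: &'static str },
}

pub fn validate_descriptor(d: &KernelDescriptor, memory_len: u64) -> Result<(), DescriptorError> {
    let regions = [
        ("input_a", d.input_a_offset, d.input_a_size),
        ("input_b", d.input_b_offset, d.input_b_size),
        ("output", d.output_offset, d.output_size),
        ("scratch", d.scratch_offset, d.scratch_size),
        ("params", d.params_offset, d.params_size),
    ];
    // End offsets are computed in u64 so offset + size cannot wrap around
    let end = |off: u32, size: u32| off as u64 + size as u64;

    for &(name, off, size) in &regions {
        if end(off, size) > memory_len {
            return Err(DescriptorError::OutOfBounds { region: name });
        }
    }

    // Writable regions (indices 2 = output, 3 = scratch) must be disjoint from all others
    for w in [2usize, 3] {
        let (wname, woff, wsize) = regions[w];
        if wsize == 0 {
            continue;
        }
        for (i, &(name, off, size)) in regions.iter().enumerate() {
            if i == w || size == 0 {
                continue;
            }
            let disjoint = end(woff, wsize) <= off as u64 || end(off, size) <= woff as u64;
            if !disjoint {
                return Err(DescriptorError::Overlap { a: wname, b: name });
            }
        }
    }
    Ok(())
}
```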

3.3 Trap Handling

WASM traps are handled as non-fatal errors:

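/// Errors surfaced to the inference engine when a kernel invocation fails
#[derive(Debug)]
pub enum KernelError {
    /// Execution budget exceeded (epoch deadline reached)
    EpochDeadline,
    /// Out of bounds memory access
    MemoryAccessViolation {
        offset: u32,
        size: u32,
    },
    /// Integer overflow/underflow
    IntegerOverflow,
    /// Unreachable code executed
    Unreachable,
    /// Stack overflow
    StackOverflow,
    /// Indirect call through a mismatched function signature
    IndirectCallTypeMismatch,
    /// Custom trap from kernel
    KernelTrap {
        code: u32,
        message: Option<String>,
    },
    /// Any other Wasmtime trap (`wasmtime::Trap` is non-exhaustive)
    OtherTrap(String),
}

// Assumes Wasmtime 15+, where `wasmtime::Trap` is an enum of trap codes.
// Host calls return `anyhow::Error`; get the trap with
// `err.downcast_ref::<wasmtime::Trap>()` before converting.
impl From<wasmtime::Trap> for KernelError {
    fn from(trap: wasmtime::Trap) -> Self {
        use wasmtime::Trap;
        match trap {
            Trap::Interrupt => KernelError::EpochDeadline,
            // Trap does not carry the faulting address, so these stay 0
            // unless the host recovers them elsewhere
            Trap::MemoryOutOfBounds => KernelError::MemoryAccessViolation {
                offset: 0,
                size: 0,
            },
            Trap::IntegerOverflow => KernelError::IntegerOverflow,
            Trap::UnreachableCodeReached => KernelError::Unreachable,
            Trap::StackOverflow => KernelError::StackOverflow,
            Trap::BadSignature => KernelError::IndirectCallTypeMismatch,
            other => KernelError::OtherTrap(other.to_string()),
        }
    }
}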

Recovery Strategy:

  1. Log trap with full context
  2. Release kernel resources
  3. Fall back to reference implementation (if available)
  4. Report degraded performance to metrics
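The recovery steps can be wrapped around each kernel call. This is a minimal sketch under three assumptions:

- The WASM call and the reference implementation are both available as closures.
- `eprintln!` stands in for the real logging.
- An atomic counter stands in for the real metrics.

`invoke_with_fallback` and `FallbackMetrics` are hypothetical names. Releasing kernel resources (step 2) falls out of ownership: whatever the `FnOnce` closure owned, such as its `Store`, has been dropped by the time it returns.

```rust
use std::fmt::Debug;
use std::sync::atomic::{AtomicU64, Ordering};

/// Stand-in for the metrics sink: counts degraded (fallback) invocations.
#[derive(Default)]
pub struct FallbackMetrics {
    pub degraded_calls: AtomicU64,
}

/// Run the sandboxed kernel; on a trap, log it, fall back to the reference
/// implementation if one exists, and record the degraded call.
pub fn invoke_with_fallback<T, E: Debug>(
    kernel_id: &str,
    wasm_call: impl FnOnce() -> Result<T, E>,
    reference: Option<&dyn Fn() -> T>,
    metrics: &FallbackMetrics,
) -> Result<T, E> {
    match wasm_call() {
        Ok(out) => Ok(out),
        Err(err) => {
            // 1. Log the trap with context
            eprintln!("kernel {kernel_id} trapped: {err:?}");
            // 2. State owned by `wasm_call` was dropped when it returned
            match reference {
                Some(fallback) => {
                    // 4. Report degraded performance
                    metrics.degraded_calls.fetch_add(1, Ordering::Relaxed);
                    // 3. Serve the request from the reference implementation
                    Ok(fallback())
                }
                None => Err(err),
            }
        }
    }
}
```

Kernels without a reference implementation propagate the original error unchanged, so the caller still sees the trap.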

4. Kernel Pack System

4.1 Kernel Pack Structure

kernel-pack-v1.0.0/
├── kernels.json          # Manifest
├── kernels.json.sig      # Ed25519 signature
├── rope/
│   ├── rope_f32.wasm
│   ├── rope_f16.wasm
│   └── rope_q8.wasm
├── rmsnorm/
│   ├── rmsnorm_f32.wasm
│   └── rmsnorm_f16.wasm
├── swiglu/
│   ├── swiglu_f32.wasm
│   └── swiglu_f16.wasm
├── kv/
│   ├── kv_pack_q4.wasm
│   ├── kv_pack_q8.wasm
│   ├── kv_unpack_q4.wasm
│   └── kv_unpack_q8.wasm
└── lora/
    ├── lora_apply_f32.wasm
    └── lora_apply_f16.wasm

4.2 Manifest Schema (kernels.json)

{
  "$schema": "https://ruvllm.dev/schemas/kernel-pack-v1.json",
  "version": "1.0.0",
  "name": "ruvllm-core-kernels",
  "description": "Core compute kernels for RuvLLM inference",
  "min_runtime_version": "0.5.0",
  "max_runtime_version": "1.0.0",
  "created_at": "2026-01-18T00:00:00Z",
  "author": {
    "name": "RuvLLM Team",
    "email": "kernels@ruvllm.dev",
    "signing_key": "ed25519:AAAA..."
  },
  "kernels": [
    {
      "id": "rope_f32",
      "name": "Rotary Position Embedding (FP32)",
      "category": "positional_encoding",
      "path": "rope/rope_f32.wasm",
      "hash": "sha256:abc123...",
      "entry_point": "rope_forward",
      "inputs": [
        {
          "name": "x",
          "dtype": "f32",
          "shape": ["batch", "seq", "heads", "dim"]
        },
        {
          "name": "freqs",
          "dtype": "f32",
          "shape": ["seq", "dim_half"]
        }
      ],
      "outputs": [
        {
          "name": "y",
          "dtype": "f32",
          "shape": ["batch", "seq", "heads", "dim"]
        }
      ],
      "params": {
        "theta": {
          "type": "f32",
          "default": 10000.0
        }
      },
      "resource_limits": {
        "max_memory_pages": 256,
        "max_epoch_ticks": 1000,
        "max_table_elements": 1024
      },
      "platforms": {
        "wasmtime": {
          "min_version": "15.0.0",
          "features": ["simd", "bulk-memory"]
        },
        "wamr": {
          "min_version": "1.3.0",
          "aot_available": true
        }
      },
      "benchmarks": {
        "seq_512_dim_128": {
          "latency_us": 45,
          "throughput_gflops": 2.1
        }
      }
    }
  ],
  "fallbacks": {
    "rope_f32": "rope_reference",
    "rmsnorm_f32": "rmsnorm_reference"
  }
}
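The `resource_limits` fields become concrete caps once the page size and the epoch tick are fixed:

- WASM pages are 64 KiB, so `max_memory_pages: 256` caps a kernel at 16 MiB of linear memory.
- With the 10 ms tick used in the edge-server configuration (Section 6.1), `max_epoch_ticks: 1000` comes to about 10 seconds.

Below is a small sketch of that conversion. The `ResourceLimits` struct mirrors the manifest fields, and its methods are illustrative. In Wasmtime, byte and table caps like these could be enforced through `StoreLimitsBuilder` (`memory_size`, `table_elements`).

```rust
use std::time::Duration;

/// Size of one WebAssembly linear-memory page.
pub const WASM_PAGE_BYTES: u64 = 64 * 1024;

/// Mirrors the manifest's `resource_limits` object.
#[derive(Debug, Clone, Copy)]
pub struct ResourceLimits {
    pub max_memory_pages: u32,
    pub max_epoch_ticks: u64,
    pub max_table_elements: u32,
}

impl ResourceLimits {
    /// Upper bound on linear memory in bytes.
    pub fn max_memory_bytes(&self) -> u64 {
        self.max_memory_pages as u64 * WASM_PAGE_BYTES
    }

    /// Approximate wall-clock budget for a given epoch tick interval. The real
    /// cutoff can differ by about one tick, since epochs are only sampled at
    /// polling points.
    pub fn time_budget(&self, tick: Duration) -> Duration {
        Duration::from_nanos((tick.as_nanos() as u64).saturating_mul(self.max_epoch_ticks))
    }
}
```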

4.3 Included Kernel Packs

Category      | Kernels                                | Notes
------------- | -------------------------------------- | ---------------------------
Positional    | RoPE (f32, f16, q8)                    | Rotary embeddings
Normalization | RMSNorm (f32, f16)                     | Pre-attention normalization
Activation    | SwiGLU (f32, f16)                      | Gated activation
KV Cache      | pack_q4, pack_q8, unpack_q4, unpack_q8 | Quantize/dequantize
Adapter       | LoRA apply (f32, f16)                  | Delta weight application

Attention Note: Attention kernels remain native initially due to:

  • Complex memory access patterns
  • Heavy reliance on hardware-specific optimizations (Flash Attention, xformers)
  • Significant overhead from WASM boundary crossing for large tensors

5. Supply Chain Security

5.1 Signature Verification

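use ed25519_dalek::{Signature, SignatureError, Verifier, VerifyingKey};
use sha2::{Digest, Sha256};

#[derive(Debug)]
pub enum VerifyError {
    /// Signature bytes are not a well-formed 64-byte Ed25519 signature
    MalformedSignature(SignatureError),
    /// No trusted key validates the manifest signature
    NoTrustedKey,
    /// Kernel bytes do not match the hash recorded in the manifest
    HashMismatch { expected: String, actual: String },
}

impl From<SignatureError> for VerifyError {
    fn from(err: SignatureError) -> Self {
        VerifyError::MalformedSignature(err)
    }
}

pub struct KernelPackVerifier {
    trusted_keys: Vec<VerifyingKey>,
}

impl KernelPackVerifier {
    /// Verify the detached signature (kernels.json.sig) over the raw manifest bytes.
    /// `VerifyingKey::verify_strict` is a stricter alternative that also rejects
    /// weak keys and malleable signatures.
    pub fn verify(&self, manifest: &[u8], signature: &[u8]) -> Result<(), VerifyError> {
        let sig = Signature::try_from(signature)?;

        for key in &self.trusted_keys {
            if key.verify(manifest, &sig).is_ok() {
                return Ok(());
            }
        }

        Err(VerifyError::NoTrustedKey)
    }

    /// Verify an individual kernel against the hash listed in the verified manifest
    pub fn verify_kernel(&self, kernel_bytes: &[u8], expected_hash: &str) -> Result<(), VerifyError> {
        // Lowercase hex, matching the "sha256:<hex>" form used in kernels.json
        let hash = format!("sha256:{:x}", Sha256::digest(kernel_bytes));

        if hash == expected_hash {
            Ok(())
        } else {
            Err(VerifyError::HashMismatch {
                expected: expected_hash.to_string(),
                actual: hash,
            })
        }
    }
}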
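`verify_kernel` compares formatted strings. A manifest that writes its digest in uppercase hex would therefore be rejected even though the bytes match. One option is to parse the `sha256:` tag into raw bytes once and compare bytes instead, sketched here with std only. `parse_sha256_tag` and `TagError` are illustrative names, not an existing API.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum TagError {
    /// Tag does not start with "sha256:"
    UnsupportedAlgorithm,
    /// Digest part is not exactly 64 hex digits
    BadDigest,
}

/// Parse "sha256:<64 hex digits>" (either case) into a 32-byte digest.
pub fn parse_sha256_tag(tag: &str) -> Result<[u8; 32], TagError> {
    let hex = tag.strip_prefix("sha256:").ok_or(TagError::UnsupportedAlgorithm)?;
    // Check every character explicitly: from_str_radix alone would accept a '+' sign
    if hex.len() != 64 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return Err(TagError::BadDigest);
    }
    let mut out = [0u8; 32];
    for (i, byte) in out.iter_mut().enumerate() {
        let pair = &hex[2 * i..2 * i + 2];
        *byte = u8::from_str_radix(pair, 16).map_err(|_| TagError::BadDigest)?;
    }
    Ok(out)
}
```

With `sha2`, the computed digest could then be checked with `Sha256::digest(kernel_bytes).as_slice() == expected`, independent of hex case.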

5.2 Version Compatibility Gates

use std::collections::HashSet;
use semver::Version; // assumed version type; any totally ordered version works

pub struct CompatibilityChecker {
    runtime_version: Version,
    /// WASM features enabled in the engine Config (e.g. "simd", "bulk-memory")
    enabled_features: HashSet<String>,
}

impl CompatibilityChecker {
    fn has_feature(&self, feature: &str) -> bool {
        self.enabled_features.contains(feature)
    }

    pub fn check(&self, manifest: &KernelManifest) -> CompatibilityResult {
        // Check runtime version bounds (both ends inclusive)
        if self.runtime_version < manifest.min_runtime_version {
            return CompatibilityResult::RuntimeTooOld {
                required: manifest.min_runtime_version.clone(),
                actual: self.runtime_version.clone(),
            };
        }

        if self.runtime_version > manifest.max_runtime_version {
            return CompatibilityResult::RuntimeTooNew {
                max_supported: manifest.max_runtime_version.clone(),
                actual: self.runtime_version.clone(),
            };
        }

        // Check WASM feature requirements
        for kernel in &manifest.kernels {
            if let Some(platform) = kernel.platforms.get("wasmtime") {
                for feature in &platform.features {
                    if !self.has_feature(feature) {
                        return CompatibilityResult::MissingFeature {
                            kernel: kernel.id.clone(),
                            feature: feature.clone(),
                        };
                    }
                }
            }
        }

        CompatibilityResult::Compatible
    }
}
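The runtime-version gate reduces to an inclusive range check on `(major, minor, patch)` triples. Below is a std-only sketch, assuming plain `MAJOR.MINOR.PATCH` strings without pre-release or build metadata. A real implementation would more likely use the `semver` crate's `Version`, whose ordering also handles pre-releases. `parse_version`, `Gate`, and `check_runtime` are hypothetical names.

```rust
/// Parse "MAJOR.MINOR.PATCH" into a tuple that compares lexicographically.
pub fn parse_version(s: &str) -> Option<(u64, u64, u64)> {
    let mut parts = s.split('.').map(|p| p.parse::<u64>().ok());
    let v = (parts.next()??, parts.next()??, parts.next()??);
    // Reject extra components such as "1.0.0.1"
    if parts.next().is_some() {
        return None;
    }
    Some(v)
}

#[derive(Debug, PartialEq, Eq)]
pub enum Gate {
    Compatible,
    RuntimeTooOld,
    RuntimeTooNew,
}

/// Inclusive check against the manifest's min/max runtime versions.
/// Returns None if any version string is malformed.
pub fn check_runtime(runtime: &str, min: &str, max: &str) -> Option<Gate> {
    let (rt, lo, hi) = (parse_version(runtime)?, parse_version(min)?, parse_version(max)?);
    Some(if rt < lo {
        Gate::RuntimeTooOld
    } else if rt > hi {
        Gate::RuntimeTooNew
    } else {
        Gate::Compatible
    })
}
```

The numeric comparison matters: compared as strings, "0.10.0" would sort before "0.5.0".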

5.3 Safe Rollback Protocol

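use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::time::Duration;
use tokio::sync::RwLock;

pub struct KernelManager {
    active_pack: Arc<RwLock<KernelPack>>,
    previous_pack: Arc<RwLock<Option<KernelPack>>>,
    /// Incremented on every swap so a stale grace-period task never clears
    /// the rollback target of a newer upgrade
    generation: Arc<AtomicU64>,
    verifier: KernelPackVerifier,
    compatibility: CompatibilityChecker,
    metrics: KernelMetrics,
}

impl KernelManager {
    /// Upgrade to new kernel pack with automatic rollback on failure.
    /// Assumes UpgradeError implements From<VerifyError> and From<RollbackError>.
    pub async fn upgrade(&self, new_pack: KernelPack) -> Result<(), UpgradeError> {
        // Step 1: Verify signature and compatibility before touching the active pack
        // (manifest_bytes / signature are assumed KernelPack fields)
        self.verifier.verify(&new_pack.manifest_bytes, &new_pack.signature)?;
        match self.compatibility.check(&new_pack.manifest) {
            CompatibilityResult::Compatible => {}
            incompatible => return Err(UpgradeError::Incompatible(incompatible)),
        }

        // Step 2: Compile kernels (AOT if supported)
        let compiled = self.compile_pack(&new_pack).await?;

        // Step 3: Atomic swap with rollback capability. Locks are taken in the
        // same order as rollback() (active, then previous) to avoid deadlock.
        let generation = {
            let mut active = self.active_pack.write().await;
            let mut previous = self.previous_pack.write().await;

            // Store current as rollback target
            *previous = Some(std::mem::replace(&mut *active, compiled));
            self.generation.fetch_add(1, Ordering::SeqCst) + 1
        };

        // Step 4: Health check with new kernels
        if let Err(e) = self.health_check().await {
            tracing::error!("Kernel health check failed: {}", e);
            self.rollback().await?;
            return Err(UpgradeError::HealthCheckFailed(e));
        }

        // Step 5: Clear rollback target after grace period, unless a newer
        // upgrade has replaced it in the meantime
        tokio::spawn({
            let previous = Arc::clone(&self.previous_pack);
            let current = Arc::clone(&self.generation);
            async move {
                tokio::time::sleep(Duration::from_secs(300)).await;
                if current.load(Ordering::SeqCst) == generation {
                    *previous.write().await = None;
                }
            }
        });

        Ok(())
    }

    /// Rollback to previous kernel pack
    pub async fn rollback(&self) -> Result<(), RollbackError> {
        let mut active = self.active_pack.write().await;
        let mut previous = self.previous_pack.write().await;

        if let Some(prev) = previous.take() {
            *active = prev;
            tracing::info!("Rolled back to previous kernel pack");
            Ok(())
        } else {
            Err(RollbackError::NoPreviousPack)
        }
    }
}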

6. Device Class Configurations

6.1 Edge Server Configuration (Wasmtime + Epoch)

use std::time::Duration;
use wasmtime::{Config, Engine, OptLevel};

pub fn create_server_runtime() -> Result<WasmRuntime, RuntimeError> {
    let mut config = Config::new();

    // Performance optimizations
    config.cranelift_opt_level(OptLevel::Speed);
    // Skip NaN canonicalization: faster, but NaN bit patterns may vary by host
    config.cranelift_nan_canonicalization(false);
    config.parallel_compilation(true);

    // SIMD support for vectorized operations
    config.wasm_simd(true);
    config.wasm_bulk_memory(true);
    config.wasm_multi_value(true);

    // Memory configuration: treat linear memories up to 4 GiB (the full wasm32
    // range) as static, reserving virtual address space up front so most
    // bounds checks can be elided; per-kernel caps come from resource_limits
    config.static_memory_maximum_size(1 << 32);
    config.dynamic_memory_guard_size(1 << 16);  // 64 KiB guard

    // Epoch-based interruption
    config.epoch_interruption(true);

    let engine = Engine::new(&config)?;

    Ok(WasmRuntime {
        engine,
        epoch_tick_interval: Duration::from_millis(10),
        default_epoch_budget: 1000, // 1000 ticks x 10 ms = 10 s max
    })
}

6.2 Embedded Configuration (WAMR AOT)

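// WamrConfig / WamrRuntime are assumed to be thin wrapper types over the WAMR
// embedding API; the setter names below are illustrative, not WAMR's own.
pub fn create_embedded_runtime() -> Result<WamrRuntime, RuntimeError> {
    let mut config = WamrConfig::new();

    // Minimal footprint configuration: 32 KB stack + 128 KB heap = 160 KB,
    // leaving about 96 KB of a 256 KB target for the runtime and host
    config.set_stack_size(32 * 1024);        // 32KB stack
    config.set_heap_size(128 * 1024);        // 128KB heap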
    config.enable_aot(true);                  // Pre-compiled modules
    config.enable_simd(false);                // Often unavailable on MCU
    config.enable_bulk_memory(true);

    // Interpreter fallback for debugging
    config.enable_interp(cfg!(debug_assertions));

    // Execution limits
    config.set_exec_timeout_ms(100);          // 100ms max per invocation

    Ok(WamrRuntime::new(config)?)
}

6.3 WASI Threads (Optional)

For platforms supporting WASI threads:

pub fn create_threaded_runtime() -> Result<WasmRuntime, RuntimeError> {
    let mut config = Config::new();

    // Enable the threads proposal: shared linear memory and atomic instructions.
    // Wasmtime's Config has no separate shared-memory or thread-count setting.
    config.wasm_threads(true);

    let engine = Engine::new(&config)?;

    // Guest threads are spawned through the host's wasi-threads integration,
    // so the pool size is enforced by the runtime wrapper, not the Engine
    Ok(WasmRuntime {
        engine,
        thread_pool_size: 4,
    })
}

Platform Support Matrix:

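Platform     | WASI Threads | Notes
------------ | ------------ | ---------------------------------------------------------
Linux x86_64 | Yes          | Full support
Linux ARM64  | Yes          | Full support
macOS        | Yes          | Full support
Windows      | Yes          | Full support
WAMR         | No           | Single-threaded only
Browser      | Yes          | Via SharedArrayBuffer (requires a cross-origin-isolated page)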

7. Performance Considerations

7.1 Invocation Overhead

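Operation         | Latency | Notes
----------------- | ------- | ----------------------------------------
Kernel lookup     | ~100 ns | Hash table lookup
Instance creation | ~1 µs   | Pre-compiled module
Memory setup      | ~500 ns | Shared memory mapping
Epoch check       | ~2 ns   | Single atomic read
Return value      | ~100 ns | Register transfer
Total             | ~2 µs   | Per invocation (~1.7 µs summed, rounded up)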

7.2 Optimization Strategies

  1. Module Caching: Pre-compile and cache WASM modules
  2. Instance Pooling: Reuse instances across invocations
  3. Memory Sharing: Map host tensors directly into WASM linear memory
  4. Batch Invocations: Process multiple requests per kernel call
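Instance pooling (strategy 2) means keeping ready-to-run instances and handing them out instead of instantiating a new one for each call. Wasmtime also offers a pooling instance allocator (`InstanceAllocationStrategy::Pooling`) that pre-reserves memory slots. The sketch below is the simpler application-level variant. It is written against a generic `T` so it runs without Wasmtime, and `InstancePool` is a hypothetical name. A real pool would also reset linear memory between uses so that one request's data cannot leak into the next.

```rust
use std::sync::Mutex;

/// Minimal object pool: reuses idle instances, creating new ones on demand.
pub struct InstancePool<T> {
    idle: Mutex<Vec<T>>,
    max_idle: usize,
    create: Box<dyn Fn() -> T + Send + Sync>,
}

impl<T> InstancePool<T> {
    pub fn new(max_idle: usize, create: impl Fn() -> T + Send + Sync + 'static) -> Self {
        InstancePool { idle: Mutex::new(Vec::new()), max_idle, create: Box::new(create) }
    }

    /// Take an idle instance, or build a fresh one if none is available.
    /// The lock is released before `create` runs.
    pub fn acquire(&self) -> T {
        let reused = self.idle.lock().unwrap().pop();
        reused.unwrap_or_else(|| (self.create)())
    }

    /// Return an instance; drop it instead if the pool is already full.
    pub fn release(&self, instance: T) {
        let mut idle = self.idle.lock().unwrap();
        if idle.len() < self.max_idle {
            idle.push(instance);
        }
    }

    pub fn idle_count(&self) -> usize {
        self.idle.lock().unwrap().len()
    }
}
```

An instance that trapped should be dropped rather than released, since its internal state may be inconsistent.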

7.3 When to Bypass WASM

WASM sandboxing should be bypassed (with explicit opt-in) for:

  • Attention kernels (complex memory patterns)
  • Large matrix multiplications (>1000x1000)
  • Operations with <1ms latency requirements
  • Trusted, verified native kernels
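The bypass rules can be expressed as an explicit routing decision. Native execution then happens only when the caller has opted in and the operation falls into one of the listed categories. This sketch reads "larger than 1000x1000" as both matrix dimensions exceeding 1000, which is an assumption. The `OpProfile` fields and the `choose_backend` function are hypothetical.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Wasm,
    Native,
}

/// What the router knows about an operation (hypothetical fields).
pub struct OpProfile {
    pub is_attention: bool,
    /// (rows, cols) for matrix multiplications, None otherwise
    pub matmul_dims: Option<(usize, usize)>,
    /// End-to-end latency requirement, if any
    pub latency_budget: Option<Duration>,
    /// A trusted, verified native kernel exists for this op
    pub trusted_native: bool,
}

/// WASM is the default; native requires explicit opt-in AND a qualifying op.
pub fn choose_backend(op: &OpProfile, native_opt_in: bool) -> Backend {
    if !native_opt_in {
        return Backend::Wasm;
    }
    let large_matmul = op.matmul_dims.map_or(false, |(r, c)| r > 1000 && c > 1000);
    let tight_latency = op.latency_budget.map_or(false, |b| b < Duration::from_millis(1));
    if op.is_attention || large_matmul || tight_latency || op.trusted_native {
        Backend::Native
    } else {
        Backend::Wasm
    }
}
```

The opt-in flag is checked first, so a misconfigured profile can never silently leave the sandbox.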

8. Alternatives Considered

8.1 eBPF

Aspect       | eBPF           | WASM
------------ | -------------- | -----------------
Platform     | Linux only     | Cross-platform
Verification | Static, strict | Dynamic, flexible
Memory model | Constrained    | Linear memory
Tooling      | Improving      | Mature

Decision: WASM chosen for cross-platform support.

8.2 Lua/LuaJIT

Aspect      | Lua           | WASM
----------- | ------------- | ---------------
Performance | Good (JIT)    | Excellent (AOT)
Sandboxing  | Manual effort | Built-in
Type safety | Dynamic       | Static
Ecosystem   | Large         | Growing

Decision: WASM chosen for type safety and native compilation.

8.3 Native Plugins with seccomp

Aspect      | seccomp       | WASM
----------- | ------------- | --------------
Isolation   | Process-level | In-process
Overhead    | IPC cost      | Minimal
Portability | Linux only    | Cross-platform
Complexity  | High          | Moderate

Decision: WASM chosen for in-process efficiency and portability.

9. Consequences

9.1 Positive

  • Security: Strong isolation prevents kernel code from compromising host
  • Portability: Same kernels run on servers and embedded devices
  • Hot Updates: Kernels can be updated without service restart
  • Ecosystem: Large WASM toolchain and community support
  • Auditability: WASM modules can be inspected and verified

9.2 Negative

  • Overhead: ~2 µs per invocation vs. native direct call
  • Complexity: Additional abstraction layer to maintain
  • Tooling: WASM debugging tools less mature than native
  • Learning Curve: Team needs WASM expertise

9.3 Risks

Risk                     | Likelihood | Impact   | Mitigation
------------------------ | ---------- | -------- | ------------------------------------
Performance regression   | Medium     | High     | Benchmark suite, native fallbacks
WASI-NN instability      | Low        | Medium   | Abstract behind internal API
Supply chain attack      | Low        | Critical | Signature verification, trusted keys
Epoch timing variability | Medium     | Low      | Generous budgets, monitoring

10. Implementation Plan

Phase 1: Foundation (Weeks 1-2)

  • Set up Wasmtime integration
  • Implement kernel descriptor ABI
  • Create basic kernel loader

Phase 2: Core Kernels (Weeks 3-4)

  • Implement RoPE kernel
  • Implement RMSNorm kernel
  • Implement SwiGLU kernel

Phase 3: KV Cache (Weeks 5-6)

  • Implement quantization kernels
  • Implement dequantization kernels
  • Integration with cache manager

Phase 4: Security (Weeks 7-8)

  • Implement signature verification
  • Create version compatibility checker
  • Build rollback system

Phase 5: Embedded (Weeks 9-10)

  • WAMR integration
  • AOT compilation pipeline
  • Resource-constrained testing

11. References

The related ADRs are listed at the end of this document, after the Appendix.

12. Appendix

A. Kernel Interface Definition

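/// Standard kernel interface: functions EXPORTED by each kernel module
/// (built for wasm32). The host looks them up by name after instantiation.
/// Pointer arguments are offsets into the module's linear memory; return
/// values use the error codes in Appendix B. Bodies are placeholders.

/// Initialize kernel with parameters
#[no_mangle]
pub extern "C" fn kernel_init(params_ptr: *const u8, params_len: u32) -> i32 {
    todo!("parse params_len bytes at params_ptr")
}

/// Execute kernel forward pass
#[no_mangle]
pub extern "C" fn kernel_forward(desc_ptr: *const KernelDescriptor) -> i32 {
    todo!("read the descriptor and write the output region")
}

/// Execute kernel backward pass (optional)
#[no_mangle]
pub extern "C" fn kernel_backward(_desc_ptr: *const KernelDescriptor) -> i32 {
    5 // NOT_IMPLEMENTED for kernels without a backward pass
}

/// Get kernel metadata
#[no_mangle]
pub extern "C" fn kernel_info(info_ptr: *mut KernelInfo) -> i32 {
    todo!("write KernelInfo to info_ptr")
}

/// Cleanup kernel resources
#[no_mangle]
pub extern "C" fn kernel_cleanup() -> i32 {
    0 // OK
}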

B. Error Codes

Code | Name            | Description
---- | --------------- | ---------------------------
0    | OK              | Success
1    | INVALID_INPUT   | Invalid input tensor
2    | INVALID_OUTPUT  | Invalid output tensor
3    | INVALID_PARAMS  | Invalid kernel parameters
4    | OUT_OF_MEMORY   | Insufficient memory
5    | NOT_IMPLEMENTED | Operation not supported
6    | INTERNAL_ERROR  | Internal kernel error
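On the host side, the `i32` returned by `kernel_forward` and the other exports can be decoded into a typed result. Below is a sketch that maps the table above. The `KernelStatus` enum, the `StatusError` type, and `decode_status` are illustrative. Codes outside the table are kept as `Unknown` rather than guessed at.

```rust
/// Error codes from Appendix B (0 = OK is represented as Ok(())).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum KernelStatus {
    InvalidInput = 1,
    InvalidOutput = 2,
    InvalidParams = 3,
    OutOfMemory = 4,
    NotImplemented = 5,
    InternalError = 6,
}

#[derive(Debug, PartialEq, Eq)]
pub enum StatusError {
    Known(KernelStatus),
    Unknown(i32),
}

/// Convert a raw return code into Ok(()) or a typed error.
pub fn decode_status(code: i32) -> Result<(), StatusError> {
    let status = match code {
        0 => return Ok(()),
        1 => KernelStatus::InvalidInput,
        2 => KernelStatus::InvalidOutput,
        3 => KernelStatus::InvalidParams,
        4 => KernelStatus::OutOfMemory,
        5 => KernelStatus::NotImplemented,
        6 => KernelStatus::InternalError,
        other => return Err(StatusError::Unknown(other)),
    };
    Err(StatusError::Known(status))
}
```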

C. Benchmark Template

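// benches/kernels.rs
// Criterion benchmarks live under benches/ with `harness = false` set for this
// bench target in Cargo.toml; create_server_runtime, Tensor, and DType come
// from the crate under test.
use std::hint::black_box;
use criterion::{criterion_group, criterion_main, Criterion};

fn bench_rope_f32(c: &mut Criterion) {
    let runtime = create_server_runtime().unwrap();
    let kernel = runtime.load_kernel("rope_f32").unwrap();

    // Shapes follow the manifest: x = [batch, seq, heads, dim], freqs = [seq, dim_half]
    let input = Tensor::random([1, 512, 32, 128], DType::F32);
    let freqs = Tensor::random([512, 64], DType::F32);

    c.bench_function("rope_f32_seq512", |b| {
        b.iter(|| kernel.forward(black_box(&input), black_box(&freqs)).unwrap())
    });
}

criterion_group!(benches, bench_rope_f32);
criterion_main!(benches);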

Related decisions (Section 11, References):

  • ADR-001: Ruvector Core Architecture
  • ADR-002: RuvLLM Integration
  • ADR-003: SIMD Optimization Strategy
  • ADR-007: Security Review & Technical Debt

Security Status (v2.1)

Component         | Status     | Notes
----------------- | ---------- | ----------------------------------------
SharedArrayBuffer | ✅ Secure  | Safety documentation for race conditions
WASM Memory       | ✅ Secure  | Bounds checking via WASM sandbox
Kernel Loading    | ⚠️ Planned | Signature verification pending

Fixes Applied:

  • Added comprehensive safety comments documenting race condition prevention in shared.rs
  • JavaScript/WASM coordination patterns documented

Outstanding Items:

  • TD-007 (P2): Embedded JavaScript should be extracted to separate files

See ADR-007 for full security audit trail.


Revision History

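Version | Date       | Author                     | Changes
------- | ---------- | -------------------------- | ----------------------------------------
1.0     | 2026-01-18 | RuVector Architecture Team | Initial version
1.1     | 2026-01-19 | Security Review Agent      | Added security status, related decisions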