NumKong Precision Tests
April 13, 2026 · View on GitHub
Custom test framework comparing NumKong kernels against high-precision f118_t double-double references.
Every kernel family has its own precision model with ULP-based error analysis.
C++
No GTest dependency — the framework is self-contained in test.hpp.
Building
cmake -B build_release -D CMAKE_BUILD_TYPE=Release -D NK_BUILD_TEST=1
cmake --build build_release --config Release --parallel
build_release/nk_test
To compile with BLAS cross-validation:
cmake -B build_release -D CMAKE_BUILD_TYPE=Release -D NK_BUILD_TEST=1 -D NK_COMPARE_TO_BLAS=1
Compiler requirements vary by ISA target — see CONTRIBUTING.md for the full table.
Running
build_release/nk_test --filter=dot # run only tests matching "dot"
build_release/nk_test --filter="dot|spatial" # regex filter
build_release/nk_test --assert # abort on first accuracy failure
build_release/nk_test --verbose # per-dimension ULP breakdown
build_release/nk_test --time-budget=5000 # 5 seconds per kernel (milliseconds)
Foreign flag mapping for muscle-memory compatibility:
| Foreign Flag | Maps To |
|---|---|
--gtest_filter=<regex> | forwards to --filter=<regex> with warning |
--benchmark_filter=<regex> | forwards to --filter=<regex> with warning |
--benchmark_min_time=<N>s | converts to ms and maps to --time-budget |
Environment Variables
| Variable | Default | Description |
|---|---|---|
NK_FILTER | .* | Regex to filter tests by name |
NK_SEED | 42 | RNG seed for reproducible inputs |
NK_BUDGET_SECS | 1 | Time budget per kernel in seconds |
NK_DENSE_DIMENSIONS | 1536 | Vector dimension for dot/spatial tests |
NK_CURVED_DIMENSIONS | 64 | Vector dimension for curved tests |
NK_SPARSE_DIMENSIONS | 256 | Vector dimension for sparse tests |
NK_MESH_POINTS | 1000 | Point count for mesh tests |
NK_MATRIX_HEIGHT | 1024 | GEMM M dimension |
NK_MATRIX_WIDTH | 128 | GEMM N dimension |
NK_MATRIX_DEPTH | 1536 | GEMM K dimension |
NK_MAX_COORD_ANGLE | 180 | Maximum angle in degrees for geospatial tests |
NK_IN_QEMU | unset | Relax accuracy thresholds for QEMU emulation |
NK_TEST_ASSERT | 0 | Assert (abort) on failed accuracy checks |
NK_TEST_VERBOSE | 0 | Show per-dimension ULP breakdown |
NK_ULP_THRESHOLD_F32 | 4 | Max allowed ULP distance for f32 |
NK_ULP_THRESHOLD_F16 | 32 | Max allowed ULP distance for f16 |
NK_ULP_THRESHOLD_BF16 | 256 | Max allowed ULP distance for bf16 |
NK_RANDOM_DISTRIBUTION | lognormal_k | Distribution: uniform_k, lognormal_k, cauchy_k |
NO_COLOR | unset | Disable colored output |
FORCE_COLOR | unset | Force colored output even without TTY |
Precision Families
Each kernel is assigned a comparison family that determines which error metrics are reported and what constitutes failure.
Families are defined in test.hpp as comparison_family_t.
All floating-point families report max_abs, max_rel, mean_ulp, max_ulp, and exact match counts; some also add mean_abs or mean_rel.
exact_k— integer and binary metrics: Hamming, Jaccard, set intersections, integer min/max. Reportsmax_dist,mean_dist,mismatch,exact. Fails on anymax_dist > 0.narrow_arithmetic_k— elementwise float ops: sum, scale, blend, fma, sin, cos, atan, cast. Fails onmax_ulp > NK_ULP_THRESHOLD_{F32,F16,BF16}.mixed_precision_reduction_k— reductions and dot products: dot, angular, euclidean, sqeuclidean, reduce_moments, reduce_minmax. Wider tolerance thannarrow_arithmetic_kdue to accumulation error.probability_k— probability divergences: KL, Jensen-Shannon. Also reportsmean_absandmean_rel.geospatial_k— geographic distances: Haversine, Vincenty. Also reportsmean_abs.external_baseline_k— cross-backend comparison against system BLAS — OpenBLAS, MKL, Accelerate.
Reference Baselines
C++ tests compare SIMD kernels against high-precision serial references.
The baseline type depends on the input dtype, selected by the reference_for<input, result> template in test.hpp.
- f32 and f64 inputs use
f118_t— a double-double type with ~103-bit mantissa, defined intypes.hpp. Twodoublevalues track a high and low component, capturing rounding errors that a singledoublewould lose. This is critical because f32 kernels use f64 accumulators internally — testing against plain f64 would not catch accumulation drift. - Complex f32c and f64c inputs use
f118c_t— a pair off118_tfor real and imaginary parts. - Half-precision, mini-floats, and integers — f16, bf16, e4m3, e5m2, i8, u8, etc. — use plain
f64_t. These types have at most 10-bit mantissas, so f64's 52-bit mantissa already provides >40 bits of headroom. - Complex halfs — f16c, bf16c — use
f64c_tfor the same reason.
WASM
Emscripten
source ~/emsdk/emsdk_env.sh
cmake -B build-wasm -DCMAKE_TOOLCHAIN_FILE=cmake/toolchain-wasm.cmake -DNK_BUILD_TEST=1
cmake --build build-wasm --parallel
For wasm64 — Memory64:
cmake -B build-wasm64 -DCMAKE_TOOLCHAIN_FILE=cmake/toolchain-wasm64.cmake -DNK_BUILD_TEST=1
cmake --build build-wasm64 --parallel
WASI
export WASI_SDK_PATH=~/wasi-sdk-24.0-x86_64-linux
cmake -B build-wasi -DCMAKE_TOOLCHAIN_FILE=cmake/toolchain-wasi.cmake -DNK_BUILD_TEST=1
cmake --build build-wasi --parallel
Running WASM Tests
wasmtime run -W simd=y,relaxed-simd=y,threads=y,shared-memory=y -S threads=y,inherit-env=y ./build-wasi/nk_test.wasm
wasmer run --enable-simd --enable-relaxed-simd ./build-wasi/nk_test.wasm
node ./build-wasm/nk_test.js
Memory Model
The toolchain files configure memory limits appropriate for each target:
| Target | Initial Memory | Max Memory | Pointer Width | Emscripten Version |
|---|---|---|---|---|
| wasm32 | 64 MB | 2 GB | 32-bit | 3.1.27+ |
| wasm64 | 256 MB | 16 GB | 64-bit | 3.1.35+ |
| WASI | — | 256 MB | 32-bit | — |
All three targets enable -msimd128 and -mrelaxed-simd automatically.
Stack size is 5 MB across all Emscripten targets.
WASI builds use wasm32-wasip1-threads with shared memory and -pthread for threading support.
SIMD and Relaxed SIMD Support
All WASM builds require fixed-width SIMD — 128-bit v128.
Relaxed SIMD instructions like f32x4.relaxed_madd and fused dot-product are used when available — the toolchain enables them unconditionally and the WASI test runner probes for support at startup via WebAssembly.validate().
Known-good minimum versions: Chrome 114+, Firefox 120+, Node.js 20+, Wasmtime 14+.
Safari supports v128 SIMD from 16.4 but has incomplete Relaxed SIMD coverage; Memory64 is not yet available in Safari or Wasmer.
For up-to-date engine support, see WebAssembly Roadmap and caniuse Relaxed SIMD.
Cross-Compilation
NumKong ships 8 toolchain files in cmake/ for cross-compiling to non-native targets.
Tests run transparently under QEMU via CMAKE_CROSSCOMPILING_EMULATOR.
Set NK_IN_QEMU=1 to relax half-precision accuracy thresholds under emulation.
ARM64 Linux
cmake -B build_arm64 -DCMAKE_TOOLCHAIN_FILE=cmake/toolchain-aarch64-gnu.cmake \
-DNK_BUILD_TEST=1
cmake --build build_arm64 --parallel
NK_IN_QEMU=1 ctest --test-dir build_arm64 # runs under qemu-aarch64 -cpu max
Default arch: armv9-a+sve2+fp16+bf16+i8mm+dotprod+fp16fml.
RISC-V 64 with GCC
export RISCV_TOOLCHAIN_PATH=/path/to/riscv-gnu-toolchain # optional
# export RISCV_SYSROOT=/path/to/riscv-sysroot # optional override
cmake -B build_riscv -DCMAKE_TOOLCHAIN_FILE=cmake/toolchain-riscv64-gnu.cmake \
-DNK_BUILD_TEST=1
cmake --build build_riscv --parallel
NK_IN_QEMU=1 ctest --test-dir build_riscv # runs under qemu-riscv64 -cpu max
Default arch: rv64gcv_zvfh_zvfbfwma_zvbb.
RISC-V 64 with LLVM
export RISCV_SYSROOT=/path/to/riscv-sysroot
export LLVM_ROOT=/path/to/llvm # optional
cmake -B build_riscv_llvm -DCMAKE_TOOLCHAIN_FILE=cmake/toolchain-riscv64-llvm.cmake \
-DNK_BUILD_TEST=1
cmake --build build_riscv_llvm --parallel
NK_IN_QEMU=1 ctest --test-dir build_riscv_llvm
Android ARM64
cmake -B build_android -DCMAKE_TOOLCHAIN_FILE=cmake/toolchain-android-arm64.cmake \
-DNK_BUILD_TEST=1
cmake --build build_android --parallel
adb push build_android/nk_test /data/local/tmp/
adb shell /data/local/tmp/nk_test
Rust
cargo test -p numkong
cargo test -p numkong -- --nocapture # with output
cargo test -p numkong --all-features # all optional features
cargo check -p numkong --no-default-features # no-std compatibility
WASM via Wasmtime
The wasm-runtime feature embeds a Wasmtime runtime to test WASM modules from within cargo test.
cargo test -p numkong --features wasm-runtime -- wasm_runtime
Python
pip install -e .
pip install pytest pytest-repeat pytest-randomly numpy scipy ml_dtypes tabulate
pytest test/ -s -x -Wd
Optional dependencies for extended test coverage:
| Package | What it unlocks |
|---|---|
numpy | Array interop, cdist, custom dtype registration |
scipy | Cross-validation against scipy.spatial.distance |
ml_dtypes | __array_interface__ fallback for bfloat16 / fp8 / fp6 |
tabulate | Formatted precision report tables |
Tests that require a missing optional dependency are skipped automatically.
pytest test/ -s -x -Wd -k dot # filter by name
pytest test/ -s -x -Wd -k "dot or spatial"
Environment Variables
| Variable | Default | Description |
|---|---|---|
NK_DENSE_DIMENSIONS | 1,2,3,...,1536 | Comma-separated vector dimensions |
NK_CURVED_DIMENSIONS | 11,97 | Dimensions for curved-space tests |
NK_MATRIX_HEIGHT | 1024 | GEMM M dimension |
NK_MATRIX_WIDTH | 128 | GEMM N dimension |
NK_MATRIX_DEPTH | 1536 | GEMM K dimension |
NK_SEED | OS entropy | Deterministic seed for np.random |
NK_REPETITIONS | 10 | Randomized test repeat count |
NK_IN_QEMU | unset | Relax accuracy thresholds |
NK_SPARSE_DIMENSIONS | 256 | Universe size for sparse tests |
NK_MESH_POINTS | 100 | Point count for mesh alignment tests |
NK_MAX_COORD_ANGLE | 180 | Maximum angle in degrees for geospatial |
The pytest-repeat plugin re-runs each test NK_REPETITIONS times with auto-seeding — each iteration gets a unique seed derived from the base NK_SEED, ensuring broader input coverage without sacrificing reproducibility.
Reference Baselines
Python tests use decimal.Decimal at 120-digit precision as ground truth for assertions.
Functions like precise_inner(), precise_sqeuclidean(), and precise_angular() convert each element to Decimal before accumulation, exceeding even f118_t accuracy.
A secondary NumPy baseline at native precision — f64 for floats, i64 for integers — is used for error statistics collection.
JavaScript
npm test # Node.js native addon
WASM Runtimes
JavaScript tests support multiple WASM runtimes via the NK_RUNTIME environment variable.
NK_RUNTIME=emscripten node --test test/test-wasm.mjs # Emscripten 32-bit
NK_RUNTIME=emscripten64 node --test test/test-wasm.mjs # Emscripten 64-bit, Memory64
NK_RUNTIME=wasi-node node --test test/test-wasm.mjs # WASI via Node.js
npx playwright test --config test/playwright.config.ts # Browser via Playwright
Environment Variables
| Variable | Default | Description |
|---|---|---|
NK_RUNTIME | native | Runtime: emscripten, emscripten64, wasi-node |
NK_SEED | 42 | Random seed for reproducible test data |
NK_DENSE_DIMENSIONS | 3,16,128,1536 | Comma-separated vector dimensions |
Swift
swift build && swift test -v
For iOS simulator testing:
xcodebuild test -scheme NumKong -destination 'platform=iOS Simulator,name=iPhone 16'
On Linux without a native Swift installation, use the official Docker image:
sudo docker run --rm -v "$PWD:/workspace" -w /workspace swift:5.9 \
/bin/bash -cl "swift build -c release --static-swift-stdlib && swift test -c release --enable-test-discovery"