MaxSim Benchmarks

March 23, 2026

ColBERT-style late-interaction (MaxSim) scoring benchmarks comparing NumKong against ndarray in Rust and NumPy in Python.
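For readers new to the operation being timed, here is a minimal pure-Python sketch of late-interaction scoring: for each query vector, take the maximum dot product against any document vector, then sum those maxima. The function name `maxsim_reference` is hypothetical and not part of NumKong; dot-product similarity is an assumption, chosen to match the "Q @ Dᵀ max-reduce" baseline named in the tables. It is meant only to show the arithmetic, not how any benchmarked library implements it.

```python
def maxsim_reference(queries, documents):
    """Illustrative MaxSim: sum over query vectors of the best dot product
    against any document vector. Hypothetical helper, not a NumKong API."""
    total = 0.0
    for q in queries:
        # Similarity of this query vector to every document vector.
        sims = [sum(a * b for a, b in zip(q, d)) for d in documents]
        # Late interaction keeps only the best match per query vector.
        total += max(sims)
    return total


queries = [[1.0, 0.0], [0.0, 1.0]]
documents = [[0.5, 0.5], [1.0, 0.0], [0.0, 2.0]]
print(maxsim_reference(queries, documents))  # 3.0
```

With NumPy arrays `Q` and `D`, the same quantity can be written as `(Q @ D.T).max(axis=1).sum()`, which is presumably close to what the matmul baselines measure.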

Rust

| Library | Precision | GSO/s |
| --- | --- | --- |
| numkong::MaxSimPackedMatrix::score | f32 → f64 | 1483.41 |
| numkong::MaxSimPackedMatrix::score | bf16 → f32 | 983.57 |
| numkong::MaxSimPackedMatrix::score | f16 → f32 | 980.33 |
| ndarray Q @ Dᵀ max-reduce | f32 → f32 | 58.37 |

Python

| Library | Precision | GSO/s |
| --- | --- | --- |
| numkong.maxsim_packed | f32 → f64 | 2425.72 |
| numpy matmul | f32 → f32 | 1525.56 |
| numkong.maxsim_packed | bf16 → f32 | 1236.30 |
| numkong.maxsim_packed | f16 → f32 | 696.78 |

Run It

Rust

```bash
# Default 2048 × 2048 × 2048 workload
cargo bench --bench bench_maxsim --features bench_maxsim

# Smaller 128 × 128 × 256 workload
NUMWARS_DIMS_HEIGHT=128 NUMWARS_DIMS_WIDTH=128 NUMWARS_DIMS_DEPTH=256 \
    cargo bench --bench bench_maxsim --features bench_maxsim

# Focus on one dtype
NUMWARS_FILTER="maxsim/f32" \
    cargo bench --bench bench_maxsim --features bench_maxsim
```

Python

uv run --with numkong,numpy,tabulate,ml_dtypes python maxsim/bench.py