MaxSim Benchmarks

March 23, 2026

ColBERT-style late-interaction (MaxSim) scoring benchmarks comparing NumKong against ndarray in Rust and NumPy in Python.
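For readers new to MaxSim, here is a minimal NumPy sketch of the scoring rule: each query token takes its best dot-product match among the document tokens, and those maxima are summed. The function name `maxsim_reference` and the toy shapes are illustrative assumptions, not part of NumKong or the benchmark harness, and the sketch assumes embeddings are already normalized if cosine similarity is wanted.

```python
import numpy as np


def maxsim_reference(queries: np.ndarray, documents: np.ndarray) -> float:
    """Sum over query tokens of the best dot-product match among document tokens.

    queries:   (n_query_tokens, dim)
    documents: (n_doc_tokens, dim)
    """
    similarities = queries @ documents.T       # (n_query_tokens, n_doc_tokens)
    best_per_query = similarities.max(axis=1)  # best document token per query token
    return float(best_per_query.sum())


# Toy inputs; the sizes here are arbitrary and much smaller than the benchmark workload.
rng = np.random.default_rng(0)
queries = rng.standard_normal((4, 8)).astype(np.float32)
documents = rng.standard_normal((6, 8)).astype(np.float32)
score = maxsim_reference(queries, documents)
```

This is the same "matmul, then max-reduce" shape as the ndarray and NumPy baselines in the tables below; it materializes the full similarity matrix, which a fused kernel would not necessarily need to do.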

Rust

| Library | Precision | GSO/s |
| --- | --- | ---: |
| `numkong::MaxSimPackedMatrix::score` | f32 → f64 | 1483.41 |
| `numkong::MaxSimPackedMatrix::score` | bf16 → f32 | 983.57 |
| `numkong::MaxSimPackedMatrix::score` | f16 → f32 | 980.33 |
| ndarray Q @ Dᵀ max-reduce | f32 → f32 | 58.37 |

Python

| Library | Precision | GSO/s |
| --- | --- | ---: |
| `numkong.maxsim_packed` | f32 → f64 | 2425.72 |
| `numpy` matmul | f32 → f32 | 1525.56 |
| `numkong.maxsim_packed` | bf16 → f32 | 1236.30 |
| `numkong.maxsim_packed` | f16 → f32 | 696.78 |
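The Precision column appears to read as input type → result type. Assuming that interpretation, the sketch below shows what widening means for a plain NumPy baseline: half-precision inputs are cast up before the matmul so products accumulate in the wider type. The helper `maxsim_widened` is hypothetical and says nothing about how NumKong implements its kernels.

```python
import numpy as np


def maxsim_widened(queries: np.ndarray, documents: np.ndarray, accumulate_dtype):
    """MaxSim with inputs cast to a wider dtype before the matmul (illustrative only)."""
    q = queries.astype(accumulate_dtype)
    d = documents.astype(accumulate_dtype)
    # Matmul, max over document tokens, sum over query tokens, all in accumulate_dtype.
    return (q @ d.T).max(axis=1).sum()


rng = np.random.default_rng(0)
queries_f16 = rng.standard_normal((32, 128)).astype(np.float16)
documents_f16 = rng.standard_normal((64, 128)).astype(np.float16)

score_f32 = maxsim_widened(queries_f16, documents_f16, np.float32)  # "f16 → f32"
score_f64 = maxsim_widened(queries_f16, documents_f16, np.float64)  # f64 reference
```

Comparing `score_f32` against `score_f64` gives a rough sense of how much accuracy the narrower accumulator gives up on a given workload.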

Run It

Rust

```bash
# Default 2048 × 2048 × 2048 workload
cargo bench --bench bench_maxsim --features bench_maxsim

# Smaller 128 × 128 × 256 workload
NUMWARS_DIMS_HEIGHT=128 NUMWARS_DIMS_WIDTH=128 NUMWARS_DIMS_DEPTH=256 \
cargo bench --bench bench_maxsim --features bench_maxsim

# Focus on one dtype
NUMWARS_FILTER="maxsim/f32" \
cargo bench --bench bench_maxsim --features bench_maxsim
```

Python

uv run --with numkong,numpy,tabulate,ml_dtypes python maxsim/bench.py