MaxSim Benchmarks
March 23, 2026
ColBERT-style late-interaction (MaxSim) scoring benchmarks comparing NumKong against ndarray in Rust and NumPy in Python.
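For readers unfamiliar with the operation being timed, here is a minimal pure-Python sketch of late-interaction scoring: each query vector is matched with its best-scoring document vector by dot product, and those maxima are summed. The function name `maxsim` and the toy inputs are illustrative assumptions, not part of NumKong's API.

```python
def maxsim(query, document):
    """Reference MaxSim: sum over query vectors of the max dot product
    against any document vector. Hypothetical helper for illustration."""
    total = 0.0
    for q in query:
        # Best similarity of this query vector across all document vectors.
        best = max(sum(qi * di for qi, di in zip(q, d)) for d in document)
        total += best
    return total


# Toy example: 2 query vectors and 3 document vectors, each of depth 2.
query = [[1.0, 0.0], [0.0, 1.0]]
document = [[0.5, 0.5], [1.0, 0.0], [0.0, 2.0]]
score = maxsim(query, document)  # 1.0 + 2.0 = 3.0
```

The cost grows with the product of the query count, document count, and vector depth, which is why the workloads below are described by three dimensions.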
Rust
| Library | Precision | GSO/s |
|---|---|---|
| numkong::MaxSimPackedMatrix::score | f32 → f64 | 1483.41 |
| numkong::MaxSimPackedMatrix::score | bf16 → f32 | 983.57 |
| numkong::MaxSimPackedMatrix::score | f16 → f32 | 980.33 |
| ndarray Q @ Dᵀ max-reduce | f32 → f32 | 58.37 |
Python
| Library | Precision | GSO/s |
|---|---|---|
| numkong.maxsim_packed | f32 → f64 | 2425.72 |
| numpy matmul | f32 → f32 | 1525.56 |
| numkong.maxsim_packed | bf16 → f32 | 1236.30 |
| numkong.maxsim_packed | f16 → f32 | 696.78 |
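The baseline rows are labeled `Q @ Dᵀ max-reduce` and `numpy matmul`. A minimal sketch of what such a baseline might look like in NumPy follows; the function name `maxsim_matmul` and the small shapes are assumptions for illustration and are not taken from `maxsim/bench.py`.

```python
import numpy as np


def maxsim_matmul(queries: np.ndarray, documents: np.ndarray) -> np.float32:
    """Hypothetical matmul baseline: one dense similarity matrix, then a
    row-wise max and a sum. Inputs and accumulation stay in f32."""
    similarities = queries @ documents.T  # shape: (n_queries, n_documents)
    return similarities.max(axis=1).sum()


rng = np.random.default_rng(0)
Q = rng.standard_normal((8, 16), dtype=np.float32)   # 8 query vectors, depth 16
D = rng.standard_normal((32, 16), dtype=np.float32)  # 32 document vectors, depth 16
score = maxsim_matmul(Q, D)
```

This formulation materializes the full similarity matrix before reducing it, which is simple to write but costs extra memory traffic compared with a kernel that fuses the dot products with the max-reduce.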
Run It
Rust
```bash
# Default 2048×2048×2048 workload
cargo bench --bench bench_maxsim --features bench_maxsim

# Smaller 128×128×256 workload
NUMWARS_DIMS_HEIGHT=128 NUMWARS_DIMS_WIDTH=128 NUMWARS_DIMS_DEPTH=256 \
    cargo bench --bench bench_maxsim --features bench_maxsim

# Focus on one dtype
NUMWARS_FILTER="maxsim/f32" \
    cargo bench --bench bench_maxsim --features bench_maxsim
```
Python
uv run --with numkong,numpy,tabulate,ml_dtypes python maxsim/bench.py