rf-detr.cpp Model Manifest

May 27, 2026

Generated 2026-05-27. All models were converted from rfdetr 1.7.0 PyTorch checkpoints to F32 GGUF with scripts/convert_rfdetr_to_gguf.py and then re-quantized in-place by the C++ quantizer (build/bin/rfdetr-cli quantize).

The .gguf files themselves are gitignored; this manifest is the canonical record of what should exist and at what size. To rebuild from scratch:

```sh
# 1. Convert PyTorch -> F32 GGUF (one per variant)
for v in nano small base medium large \
         seg-nano seg-small seg-medium seg-large seg-xlarge seg-2xlarge; do
    scripts/convert_rfdetr_to_gguf.py --variant "$v" --dtype f32 \
        --output "models/rfdetr-${v}-f32.gguf"
done

# 2. Materialize F16, Q8_0, Q4_K for every F32 source
scripts/build_all_quants.sh
```

All 44 models have been verified to load and run rfdetr-cli detect on /tmp/coco_sample.jpg without error.
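Because the .gguf files are gitignored, it can help to check a local models/ directory against this manifest before running anything. The Python sketch below builds the 44 expected paths (11 variants x 4 precisions) and reports which are missing. Only the `-f32` filename pattern comes from the conversion loop above; the `f16`, `q8_0` and `q4_k` suffixes are assumptions about what scripts/build_all_quants.sh writes, and `expected_files` / `missing_files` are hypothetical helpers, not part of the repository.

```python
from pathlib import Path

# Variant names exactly as used by the conversion loop in this manifest.
DETECTION = ["nano", "small", "base", "medium", "large"]
SEGMENTATION = ["seg-nano", "seg-small", "seg-medium", "seg-large",
                "seg-xlarge", "seg-2xlarge"]

# ASSUMPTION: only "f32" is documented; the other suffixes are guesses
# and should be adjusted to match what build_all_quants.sh produces.
QUANT_SUFFIXES = ["f32", "f16", "q8_0", "q4_k"]


def expected_files(models_dir="models"):
    """Return every .gguf path the manifest says should exist."""
    root = Path(models_dir)
    return [root / f"rfdetr-{v}-{q}.gguf"
            for v in DETECTION + SEGMENTATION
            for q in QUANT_SUFFIXES]


def missing_files(models_dir="models"):
    """Return the expected paths that are not regular files on disk."""
    return [p for p in expected_files(models_dir) if not p.is_file()]


if __name__ == "__main__":
    expected = expected_files()
    missing = missing_files()
    print(f"{len(expected) - len(missing)}/{len(expected)} models present")
    for path in missing:
        print("missing:", path)
```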

Detection variants

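| Variant | F32 (MB) | F16 (MB) | Q8_0 (MB) | Q4_K (MB) |
|---------|---------:|---------:|----------:|----------:|
| Nano    | 112.7    | 60.5     | 36.0      | 29.7      |
| Small   | 119.0    | 64.0     | 38.2      | 31.2      |
| Base    | 119.2    | 64.2     | 38.5      | 31.5      |
| Medium  | 125.0    | 67.2     | 40.2      | 32.5      |
| Large   | 125.9    | 68.2     | 41.1      | 33.4      |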

Segmentation variants

| Variant     | F32 (MB) | F16 (MB) | Q8_0 (MB) | Q4_K (MB) |
|-------------|---------:|---------:|----------:|----------:|
| Seg-Nano    | 127.1    | 67.8     | 39.9      | 31.8      |
| Seg-Small   | 127.6    | 68.3     | 40.4      | 32.3      |
| Seg-Medium  | 133.7    | 71.5     | 42.4      | 33.6      |
| Seg-Large   | 134.3    | 72.2     | 43.1      | 34.3      |
| Seg-XLarge  | 141.3    | 76.4     | 46.0      | 36.5      |
| Seg-2XLarge | 143.4    | 78.4     | 48.0      | 38.5      |
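The compression ratios quoted in the quantization notes can be recomputed directly from these two tables. This Python sketch copies the table values verbatim (in MB, as listed) and computes F32-to-F16, F32-to-Q8_0 and F32-to-Q4_K ratios for each variant, then prints the range for each precision. The helper name `compression_ratios` is illustrative only.

```python
# Sizes in MB copied from the tables above: (F32, F16, Q8_0, Q4_K).
SIZES_MB = {
    "Nano":        (112.7, 60.5, 36.0, 29.7),
    "Small":       (119.0, 64.0, 38.2, 31.2),
    "Base":        (119.2, 64.2, 38.5, 31.5),
    "Medium":      (125.0, 67.2, 40.2, 32.5),
    "Large":       (125.9, 68.2, 41.1, 33.4),
    "Seg-Nano":    (127.1, 67.8, 39.9, 31.8),
    "Seg-Small":   (127.6, 68.3, 40.4, 32.3),
    "Seg-Medium":  (133.7, 71.5, 42.4, 33.6),
    "Seg-Large":   (134.3, 72.2, 43.1, 34.3),
    "Seg-XLarge":  (141.3, 76.4, 46.0, 36.5),
    "Seg-2XLarge": (143.4, 78.4, 48.0, 38.5),
}


def compression_ratios(sizes):
    """Map each variant to (F32/F16, F32/Q8_0, F32/Q4_K)."""
    return {name: (f32 / f16, f32 / q8, f32 / q4)
            for name, (f32, f16, q8, q4) in sizes.items()}


if __name__ == "__main__":
    ratios = compression_ratios(SIZES_MB)
    for name, (r16, r8, r4) in ratios.items():
        print(f"{name:12s} F16 {r16:.2f}x  Q8_0 {r8:.2f}x  Q4_K {r4:.2f}x")
    for i, label in enumerate(("F16", "Q8_0", "Q4_K")):
        values = [r[i] for r in ratios.values()]
        print(f"{label}: {min(values):.2f}x to {max(values):.2f}x")
```

With the current tables, Q8_0 comes out at roughly 3.0 to 3.2x and Q4_K at roughly 3.7 to 4.0x, with Seg-2XLarge at the low end of both.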

Quant choice notes

  • F32: full-precision reference, roughly 113 to 143 MB per variant.
  • F16: only matmul-multiplicand weights are converted to F16; non-matmul tensors (norms, conv kernels, embeddings) stay F32. The loader handles an F16 pos_embed by doing the bicubic resample in F32 (see commit 2145c7d).
  • Q8_0: best accuracy/size trade-off for production; about 3x smaller than F32.
  • Q4_K: smallest practical quant. Tensors whose row length is not a multiple of 256 (ne[0] % 256 != 0, the Q4_K super-block size) silently fall back to Q8_0 per the C++ quantizer's logic; in these models that affects the decoder's 128-dim MLP halves, 60 tensors in total. The net result is still about 3.7 to 4.0x compression.
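The per-tensor rules in these notes can be summarized as a small decision function. This Python sketch is a hypothetical helper, not the C++ quantizer's API: `choose_tensor_type` and its parameters are made up for illustration, `ne` follows the ggml convention where `ne[0]` is the row length, and the example shapes are invented. The notes state the "non-matmul tensors stay F32" rule only for F16; applying it to Q8_0 and Q4_K as well is an assumption.

```python
QK_K = 256  # weights per super-block in ggml K-quants such as Q4_K


def choose_tensor_type(target, ne, is_matmul_weight):
    """Pick the storage type for one tensor (hypothetical helper).

    target: "f32", "f16", "q8_0" or "q4_k"
    ne: ggml-style shape tuple; ne[0] is the row length
    is_matmul_weight: True for weights that feed a matrix multiplication
    """
    if target == "f32":
        return "f32"
    # Norms, conv kernels and embeddings stay F32. Stated for F16 in the
    # notes; extending it to Q8_0 and Q4_K is an ASSUMPTION.
    if not is_matmul_weight:
        return "f32"
    # Q4_K packs each row into 256-weight super-blocks, so rows that do
    # not divide evenly (e.g. 128-wide MLP halves) fall back to Q8_0.
    if target == "q4_k" and ne[0] % QK_K != 0:
        return "q8_0"
    return target


if __name__ == "__main__":
    print(choose_tensor_type("q4_k", (128, 1024), True))  # q8_0
    print(choose_tensor_type("q4_k", (256, 1024), True))  # q4_k
    print(choose_tensor_type("f16", (256,), False))       # f32
```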

Heavier K-quants (Q5_K, Q6_K) and the older legacy quants (Q4_0/Q4_1/Q5_0/Q5_1) are supported by rfdetr-cli quantize but are not part of the standard matrix; generate on demand if needed.
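When deciding whether an on-demand quant is worth generating, the nominal storage cost of each ggml type is a useful first filter. This Python sketch encodes the standard ggml block layouts (weights per block, bytes per block) for the types mentioned here and derives bits per weight plus the best-case size ratio against F32. Real files land well below that best case, presumably because some tensors are kept at F32 or fall back to Q8_0 as described above; the helper names are illustrative.

```python
# Standard ggml block layouts: (weights per block, bytes per block).
BLOCK_LAYOUT = {
    "F32":  (1, 4),
    "F16":  (1, 2),
    "Q8_0": (32, 34),    # fp16 scale + 32 x int8
    "Q4_0": (32, 18),    # fp16 scale + 32 x 4-bit
    "Q4_1": (32, 20),    # fp16 scale + fp16 min + 32 x 4-bit
    "Q5_0": (32, 22),    # fp16 scale + 4 bytes of high bits + 32 x 4-bit
    "Q5_1": (32, 24),    # as Q5_0 plus an fp16 min
    "Q4_K": (256, 144),  # 2 x fp16 + 12 bytes of scales + 256 x 4-bit
    "Q5_K": (256, 176),  # as Q4_K plus 32 bytes of high bits
    "Q6_K": (256, 210),  # 128 low + 64 high bytes + 16 scales + fp16
}


def bits_per_weight(qtype):
    """Average storage bits per weight for a ggml tensor type."""
    block, nbytes = BLOCK_LAYOUT[qtype]
    return nbytes * 8 / block


def ideal_ratio_vs_f32(qtype):
    """Best-case F32/qtype size ratio if every tensor used qtype."""
    return 32 / bits_per_weight(qtype)


if __name__ == "__main__":
    for qtype in BLOCK_LAYOUT:
        print(f"{qtype:5s} {bits_per_weight(qtype):7.4f} bpw  "
              f"best case {ideal_ratio_vs_f32(qtype):5.2f}x vs F32")
```

For example, Q8_0 works out to 8.5 bits per weight (an ideal 3.76x versus F32) and Q4_K to 4.5 bits per weight (7.11x), both above the ratios measured in the tables.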