# NNCF Compressed Model Zoo
March 12, 2026
Ready-to-use compressed LLMs are available on the OpenVINO Hugging Face page. Each model card lists the NNCF parameters that were used to compress the model.
INT8 Post-Training Quantization (PTQ) results for public Vision, NLP, and GenAI models can be found on the OpenVINO Performance Benchmarks page. PTQ results for ONNX models are available in the ONNX section below.
## PyTorch

### PyTorch NLP (HuggingFace Transformers-powered models)
| PyTorch Model | Compression algorithm | Dataset | Accuracy (drop) % |
|---|---|---|---|
| BERT-base-cased | • QAT: INT8 | CoNLL2003 | 99.18 (-0.01) |
| BERT-base-cased | • QAT: INT8 | MRPC | 84.8 (-0.24) |
| BERT-base-chinese | • QAT: INT8 | XNLI | 77.22 (0.46) |
| BERT-large (Whole Word Masking) | • QAT: INT8 | SQuAD v1.1 | F1: 92.68 (0.53) |
| DistilBERT-base | • QAT: INT8 | SST-2 | 90.3 (0.8) |
| GPT-2 | • QAT: INT8 | WikiText-2 (raw) | perplexity: 20.9 (-1.17) |
| MobileBERT | • QAT: INT8 | SQuAD v1.1 | F1: 89.4 (0.58) |
| RoBERTa-large | • QAT: INT8 | MNLI | matched: 89.25 (1.35) |
## ONNX

### ONNX Classification
| ONNX Model | Compression algorithm | Dataset | Accuracy (drop) % |
|---|---|---|---|
| DenseNet-121 | PTQ | ImageNet | 60.16 (0.8) |
| GoogleNet | PTQ | ImageNet | 66.36 (0.3) |
| MobileNet V2 | PTQ | ImageNet | 71.38 (0.49) |
| ResNet-50 | PTQ | ImageNet | 74.63 (0.21) |
| ShuffleNet | PTQ | ImageNet | 47.25 (0.18) |
| SqueezeNet V1.0 | PTQ | ImageNet | 54.3 (0.54) |
| VGG-16 | PTQ | ImageNet | 72.02 (0.0) |
### ONNX Object Detection
| ONNX Model | Compression algorithm | Dataset | mAP (drop) % |
|---|---|---|---|
| SSD1200 | PTQ | COCO2017 | 20.17 (0.17) |
| Tiny-YOLOv2 | PTQ | VOC12 | 29.03 (0.23) |