First-Order Error Matters: Accurate Compensation for Quantized Large Language Models.

April 16, 2026

FOEM has been accepted at AAAI 2026.

We have completed the integration with GPTQModel.

Parts of this repository are now outdated, but we keep it available for developers who wish to debug or experiment with the algorithm.

All code snippets and results below were obtained with GPTQModel.

Quant

from datasets import load_dataset
from gptqmodel import GPTQModel, QuantizeConfig, FOEMConfig

size = "8B"
model_id = f"Qwen/Qwen3-{size}"
quant_path = f"models/gptqmodel/Qwen3-{size}-foem-4bit"

calibration_dataset = load_dataset(
    "allenai/c4",
    data_files="en/c4-train.00001-of-01024.json.gz",
    split="train",
).select(range(256))["text"]

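# alpha=0, beta=0.2 is the "FOEM (w/o GPTAQ)" setting in the results table below;
# alpha=0.25, beta=0.2 is the "FOEM (w/ GPTAQ)" setting.
quant_config = QuantizeConfig(
    bits=4,
    group_size=128,
    foem=FOEMConfig(alpha=0, beta=0.2, device="auto"),
)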

model = GPTQModel.load(model_id, quant_config)

model.quantize(calibration_dataset, batch_size=4)

model.save(quant_path)

Eval

lm-eval --model vllm --model_args pretrained=models/gptqmodel/Qwen3-8B-foem-4bit,tensor_parallel_size=1,dtype=auto,gpu_memory_utilization=0.6 --tasks wikitext --batch_size auto
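To evaluate several quantized checkpoints with the same settings, the command above can be assembled programmatically. The following is a minimal sketch, not part of this repository: `build_lm_eval_command` is a hypothetical helper name, and it only builds the argument list from the flags already shown above. It does not run the evaluation.

```python
import shlex


def build_lm_eval_command(quant_path, tasks=("wikitext",),
                          tensor_parallel_size=1,
                          gpu_memory_utilization=0.6):
    """Build the lm-eval argument list for a quantized checkpoint.

    Hypothetical helper: it mirrors the flags of the command shown above
    and introduces no additional lm-eval options.
    """
    model_args = ",".join([
        f"pretrained={quant_path}",
        f"tensor_parallel_size={tensor_parallel_size}",
        "dtype=auto",
        f"gpu_memory_utilization={gpu_memory_utilization}",
    ])
    return [
        "lm-eval",
        "--model", "vllm",
        "--model_args", model_args,
        "--tasks", ",".join(tasks),
        "--batch_size", "auto",
    ]


# Same path as quant_path in the Quant section with size = "8B".
cmd = build_lm_eval_command("models/gptqmodel/Qwen3-8B-foem-4bit")
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` in an environment where `lm-eval` and vLLM are installed.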

Result

Note: The PPL evaluation on WikiText using lm-eval differs from that reported in our original paper.

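| Model | Method | Bits | Hyperparameters | Wikitext PPL |
|---|---|---|---|---|
| Qwen3-0.6B | GPTQ | 4 | \ | 30.0372 |
| Qwen3-0.6B | GPTAQ | 4 | alpha=0.25 | 30.5776 |
| Qwen3-0.6B | FOEM (w/o GPTAQ) | 4 | alpha=0, beta=0.2 | 29.6199 |
| Qwen3-0.6B | FOEM (w/ GPTAQ) | 4 | alpha=0.25, beta=0.2 | 29.3823 |
| Qwen3-8B | GPTQ | 4 | \ | 12.5488 |
| Qwen3-8B | GPTAQ | 4 | alpha=0.25 | 12.7152 |
| Qwen3-8B | FOEM (w/o GPTAQ) | 4 | alpha=0, beta=0.2 | 12.5128 |
| Qwen3-8B | FOEM (w/ GPTAQ) | 4 | alpha=0.25, beta=0.2 | 12.6172 |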
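
To make the results table above easier to compare, the short sketch below computes each method's perplexity difference from the GPTQ baseline. The numbers are copied from the table; the `results` dictionary and the `delta_vs_baseline` helper are illustrative names, not part of GPTQModel or this repository.

```python
# Wikitext PPL values copied from the results table (lower is better).
results = {
    "Qwen3-0.6B": {
        "GPTQ": 30.0372,
        "GPTAQ": 30.5776,
        "FOEM (w/o GPTAQ)": 29.6199,
        "FOEM (w/ GPTAQ)": 29.3823,
    },
    "Qwen3-8B": {
        "GPTQ": 12.5488,
        "GPTAQ": 12.7152,
        "FOEM (w/o GPTAQ)": 12.5128,
        "FOEM (w/ GPTAQ)": 12.6172,
    },
}


def delta_vs_baseline(ppl_by_method, baseline="GPTQ"):
    """Return PPL minus the baseline PPL for every non-baseline method."""
    base = ppl_by_method[baseline]
    return {
        method: ppl - base
        for method, ppl in ppl_by_method.items()
        if method != baseline
    }


for model, ppl_by_method in results.items():
    best = min(ppl_by_method, key=ppl_by_method.get)
    print(f"{model}: lowest PPL = {best} ({ppl_by_method[best]:.4f})")
    for method, delta in delta_vs_baseline(ppl_by_method).items():
        # A negative delta means the method improves on GPTQ.
        print(f"  {method}: {delta:+.4f} vs GPTQ")
```

On these numbers, FOEM without GPTAQ is below the GPTQ baseline for both model sizes, while combining it with GPTAQ gives the lowest perplexity for Qwen3-0.6B but not for Qwen3-8B.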

Citation

If you find this work useful, please cite:

@inproceedings{zheng2026first,
  title={First-order error matters: Accurate compensation for quantized large language models},
  author={Zheng, Xingyu and Qin, Haotong and Li, Yuye and Chu, Haoran and Wang, Jiakai and Guo, Jinyang and Magno, Michele and Liu, Xianglong},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={40},
  number={34},
  pages={28883--28891},
  year={2026}
}