MoEQuant: Enhancing Quantization for Mixture-of-Experts Large Language Models via Expert-Balanced Sampling and Affinity Guidance
January 24, 2025

Overview
MoEQuant is a post-training quantization (PTQ) framework tailored to Mixture-of-Experts (MoE) large language models. It integrates Expert-Balanced Self-Sampling (EBSS) to improve calibration and Affinity-Guided Quantization (AGQ) to improve the quantization process itself. MoEQuant quantizes MoE-based LLMs to low-bit precision with minimal accuracy loss, achieving near-floating-point performance and improved generalization across a range of models. To our knowledge, this is the first comprehensive PTQ solution designed specifically for MoE architectures.
This repository accompanies our ICML 2025 paper "MoEQuant: Enhancing Quantization for Mixture-of-Experts Large Language Models via Expert-Balanced Sampling and Affinity Guidance".
Table of Contents
- Features
- Installation
- Usage
- Contributing
- Citation
Features
- Expert-Balanced Self-Sampling (EBSS): self-samples a calibration set whose tokens activate the experts in a balanced way, so that no expert is under-represented during calibration.
- Affinity-Guided Quantization (AGQ): uses token-to-expert routing affinity to guide the quantization process for each expert (see the conceptual sketch below this list).
- Performance Optimization: quantizes MoE-based LLMs to low-bit precision with minimal accuracy loss, achieving near-floating-point performance and improved generalization across models.
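The released code will contain the full implementations; until then, the minimal PyTorch sketch below only illustrates the two ideas at a conceptual level. All function names, tensor shapes, and scoring choices here (e.g., scoring sentence balance by routing entropy, or using a simple affinity-weighted reconstruction loss) are our own simplifications for exposition, not the paper's exact algorithms.

```python
import torch

def expert_affinity(router_logits: torch.Tensor) -> torch.Tensor:
    # Softmax routing scores: how strongly each token is routed to each expert.
    return torch.softmax(router_logits, dim=-1)            # (tokens, experts)

def ebss_select(sentences, router_logits_per_sentence, num_keep):
    # Expert-Balanced Self-Sampling, conceptually: keep the candidate sentences
    # whose tokens spread routing probability most evenly across experts
    # (scored here by the entropy of the average expert usage).
    scores = []
    for logits in router_logits_per_sentence:               # (tokens, experts)
        usage = expert_affinity(logits).mean(dim=0)         # average usage per expert
        entropy = -(usage * (usage + 1e-12).log()).sum()    # higher = more balanced
        scores.append(entropy)
    order = torch.argsort(torch.stack(scores), descending=True)
    return [sentences[int(i)] for i in order[:num_keep]]

def agq_weighted_error(x, w_fp, w_q, affinity):
    # Affinity-Guided Quantization, conceptually: weight each calibration token's
    # reconstruction error by its routing affinity to this expert, so tokens the
    # expert barely sees do not dominate the quantization objective.
    err = ((x @ w_fp.T) - (x @ w_q.T)).pow(2).sum(dim=-1)   # per-token error
    return (affinity * err).sum() / affinity.sum().clamp_min(1e-12)
```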
Installation
Coming Soon.
Detailed instructions for setting up the MoEQuant framework will be provided once the code is released. Stay tuned for updates!
Usage
Coming Soon.
Comprehensive usage examples and tutorials will be available with the code release to help you get started with MoEQuant effortlessly.
Contributing
We welcome contributions from the research and development community! Whether you want to improve existing features, add new functionality, or report issues, your input is invaluable.
Citation
If you find MoEQuant useful in your research, please consider citing our paper:
@inproceedings{moequant2025,
  title={MoEQuant: Enhancing Quantization for Mixture-of-Experts Large Language Models via Expert-Balanced Sampling and Affinity Guidance},
  ...
}