Allowing Oscillation Quantization: Overcoming Solution Space Limitation in Low Bit-Width Quantization
June 26, 2025
This repository contains the code for AOQ, introduced in our work "Allowing Oscillation Quantization: Overcoming Solution Space Limitation in Low Bit-Width Quantization".
In this work, we found that:
- In traditional quantization-aware training (QAT) methods, weights infrequently cross quantization thresholds during training. This restricts the model to searching for quantized solutions only within a local region, making it prone to sub-optimal solutions.
- Moderate oscillations can actually serve as an effective exploration mechanism. By encouraging such oscillations in the early training phase, the model can escape local optima and explore better quantization configurations.
- By decoupling quantization thresholds and levels, the learnable quantization parameters exhibit greater stability during training.
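As a rough illustration of what "crossing a quantization threshold" means in the findings above, the sketch below counts how often a single weight changes quantization bin across training steps. It is not taken from the repository: the helper names, the threshold values, and the weight trajectories are all hypothetical.

```python
from bisect import bisect_right


def bin_index(w, thresholds):
    """Index of the quantization bin that w falls into (thresholds sorted ascending)."""
    return bisect_right(thresholds, w)


def count_crossings(trajectory, thresholds):
    """Count how often a weight's bin changes between consecutive training steps."""
    bins = [bin_index(w, thresholds) for w in trajectory]
    return sum(a != b for a, b in zip(bins, bins[1:]))


# Hypothetical thresholds for a 2-bit quantizer (assumption, not from the repo).
thresholds = [-0.5, 0.0, 0.5]

# Two made-up weight trajectories over five training steps.
stable = [0.20, 0.22, 0.25, 0.24, 0.21]       # stays inside one bin
oscillating = [0.48, 0.52, 0.49, 0.51, 0.47]  # hovers around the 0.5 threshold

print(count_crossings(stable, thresholds))       # 0
print(count_crossings(oscillating, thresholds))  # 4
```

A weight that never changes bin never changes its quantized value, which is the sense in which the search stays local. A weight with many crossings keeps switching between neighbouring quantized values.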
Training Pipeline
- For weight quantization, we define one interval between quantization thresholds and another between quantization levels. Both are initialized to the same value, which depends on the standard deviation of the weights in a layer and on the quantization bit-width. In the first stage, we manually reduce both intervals to appropriate values to encourage more weights to oscillate around the quantization thresholds.
- In the second stage, we fix the threshold interval and make the level interval a learnable parameter. As the level interval increases during training, the gaps between quantization levels widen, which raises the cost of mapping errors from floating-point weights to quantized values and thereby gradually reduces oscillation. The quantizer maps each floating-point weight to one of the levels according to the thresholds, where thresh[i] and level[i] are the (i+1)-th threshold and level, respectively, in ascending numerical order. The output of the quantizer gives the optimized quantized weights.
- In the third stage, we employ the Oscillation Dampening method outlined in [Nagel et al., 2022]. This method adds a penalty term, weighted by a hyperparameter, to the loss function. The penalty pulls weights that oscillate near the quantization thresholds away from them to suppress oscillation.
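The three stages can be illustrated with a minimal plain-Python sketch of a quantizer whose thresholds and levels are spaced independently, plus a dampening-style penalty. This is not the repository's implementation. The symmetric, uniformly spaced grid, the function names, and the numeric values are assumptions, and the penalty follows the general form described by Nagel et al. (squared distance between a clipped weight and its quantized value) rather than the exact term used in AOQ.

```python
from bisect import bisect_right


def make_grid(n_bits, thresh_interval, level_interval):
    """Build symmetric, uniformly spaced thresholds and levels (assumed layout)."""
    n_levels = 2 ** n_bits
    center = (n_levels - 1) / 2
    levels = [(k - center) * level_interval for k in range(n_levels)]
    # One threshold between each pair of neighbouring levels.
    thresholds = [(k - center + 0.5) * thresh_interval for k in range(n_levels - 1)]
    return thresholds, levels


def quantize(w, thresholds, levels):
    """Return levels[i], where i is the number of thresholds that are <= w."""
    return levels[bisect_right(thresholds, w)]


def dampening_penalty(weights, thresholds, levels):
    """Sum of squared distances between each clipped weight and its quantized value."""
    lo, hi = levels[0], levels[-1]
    total = 0.0
    for w in weights:
        clipped = min(max(w, lo), hi)
        total += (quantize(w, thresholds, levels) - clipped) ** 2
    return total


# Shrinking only the threshold interval changes which level a weight maps to.
coupled = make_grid(2, thresh_interval=1.0, level_interval=1.0)
decoupled = make_grid(2, thresh_interval=0.5, level_interval=1.0)
print(quantize(0.6, *coupled))    # 0.5
print(quantize(0.6, *decoupled))  # 1.5

# On the coupled grid, weights near a threshold are penalized; weights on a level are not.
print(round(dampening_penalty([0.05, -0.05], *coupled), 4))  # 0.405
print(dampening_penalty([0.5, -0.5], *coupled))              # 0.0
```

The penalty is evaluated on the grid with equal intervals, where each level sits midway between its neighbouring thresholds, so moving a weight toward its level also moves it away from the thresholds. In actual training the quantizer would additionally need a gradient path for the weights and the learnable level interval, which this forward-only sketch leaves out.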
Run
1. Requirements:
- python 3.9, pytorch>=2.0
- torchvision
- numpy
- timm=0.4.12
- pillow
- matplotlib
2. Run Command
bash run.sh architecture n_bits quantize_downsampling
Toy Example
Run Command
bash run.sh