opt_rtn.md

October 22, 2025

🧮 Evaluation Results (LM-Eval)

For 2-bit and 3-bit quantization, we strongly recommend against using `--iters 0`, except for GGUF:Q2_K_S, which uses a different quantization algorithm.

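Notation: 4BIT = W4A16, 3BIT = W3A16, 2BIT = W2A16G64 (WxA16 means x-bit weights with 16-bit activations; G64 means a group size of 64).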

RTN mode

auto-round --model xxx --disable_opt_rtn --iters 0 
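
RTN stands for round-to-nearest. As a rough illustration of what plain group-wise RTN does to weights at 4, 3, and 2 bits, here is a minimal pure-Python sketch. It is a simplified asymmetric scheme written for this note, not AutoRound's implementation: the function names, the floating-point offset, and the toy weights are all assumptions.

```python
import math


def rtn_quantize_group(group, bits):
    """Round-to-nearest (RTN) quantization of one weight group.

    Simplified asymmetric scheme (an assumption, not AutoRound's code):
    the group's minimum maps to code 0 and its maximum to 2**bits - 1.
    Returns (integer codes, dequantized floats).
    """
    qmax = (1 << bits) - 1
    lo, hi = min(group), max(group)
    scale = (hi - lo) / qmax
    if scale == 0.0:  # constant group: any positive scale works
        scale = 1.0
    codes = [min(qmax, max(0, round((w - lo) / scale))) for w in group]
    dequant = [lo + c * scale for c in codes]
    return codes, dequant


def rtn_quantize(weights, bits, group_size):
    """Quantize a flat list group by group and return the dequantized values."""
    restored = []
    for start in range(0, len(weights), group_size):
        _, dequant = rtn_quantize_group(weights[start:start + group_size], bits)
        restored.extend(dequant)
    return restored


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Deterministic toy "weights" (hypothetical data, not taken from any model).
weights = [math.sin(0.37 * i) * (1 + 0.01 * i) for i in range(256)]

for bits in (4, 3, 2):
    error = mse(weights, rtn_quantize(weights, bits, group_size=64))
    print(f"{bits}-bit, group size 64: MSE = {error:.6f}")
```

Each group of 64 weights gets its own scale, and every weight snaps to the nearest of `2**bits` levels. At 2 bits only four levels remain per group, so the reconstruction error grows sharply.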

OPT RTN mode

auto-round --model xxx --iters 0
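
| Model | RTN/OPT | AVG | HellaSwag | LAMBADA | MMLU | PIQA | WinoGrande |
|---|---|---|---|---|---|---|---|
| Meta-Llama-3.1-8B-Instruct | RTN-4BIT | 0.69328 | 0.5896 | 0.7013 | 0.6538 | 0.7987 | 0.7230 |
| | OPT-4BIT | 0.69560 | 0.5882 | 0.7074 | 0.6631 | 0.7916 | 0.7277 |
| | RTN-3BIT | 0.64562 | 0.5410 | 0.6695 | 0.5449 | 0.7742 | 0.6985 |
| | OPT-3BIT | 0.65970 | 0.5490 | 0.6893 | 0.5711 | 0.7677 | 0.7214 |
| | RTN-2BIT | 0.33008 | 0.2918 | 0.0474 | 0.2321 | 0.5740 | 0.5051 |
| | OPT-2BIT | 0.38908 | 0.3241 | 0.1560 | 0.2822 | 0.6235 | 0.5596 |
| Qwen2.5-7B-Instruct | RTN-4BIT | 0.69560 | 0.6114 | 0.6713 | 0.7011 | 0.7878 | 0.7064 |
| | OPT-4BIT | 0.70034 | 0.6143 | 0.6945 | 0.7115 | 0.7845 | 0.6969 |
| | RTN-3BIT | 0.64144 | 0.5585 | 0.6092 | 0.6455 | 0.7476 | 0.6464 |
| | OPT-3BIT | 0.66764 | 0.5756 | 0.7013 | 0.6597 | 0.7481 | 0.6535 |
| | RTN-2BIT | 0.31856 | 0.2804 | 0.0351 | 0.2379 | 0.5256 | 0.5138 |
| | OPT-2BIT | 0.45146 | 0.3645 | 0.2992 | 0.4043 | 0.6415 | 0.5478 |
| Qwen3-8B | RTN-4BIT | 0.66240 | 0.5619 | 0.6150 | 0.7077 | 0.7573 | 0.6701 |
| | OPT-4BIT | 0.66992 | 0.5619 | 0.6346 | 0.7102 | 0.7633 | 0.6796 |
| | RTN-3BIT | 0.57322 | 0.4992 | 0.4260 | 0.6002 | 0.7361 | 0.6046 |
| | OPT-3BIT | 0.63698 | 0.5226 | 0.5814 | 0.6718 | 0.7437 | 0.6654 |
| | RTN-2BIT | 0.31150 | 0.2679 | 0.0041 | 0.2536 | 0.5283 | 0.5036 |
| | OPT-2BIT | 0.44254 | 0.3749 | 0.2005 | 0.4202 | 0.6670 | 0.5501 |
| Qwen3-14B | RTN-4BIT | 0.70448 | 0.5999 | 0.6511 | 0.7565 | 0.7998 | 0.7151 |
| | OPT-4BIT | 0.70798 | 0.6031 | 0.6627 | 0.7534 | 0.8009 | 0.7198 |
| | RTN-3BIT | 0.65876 | 0.5746 | 0.5467 | 0.7065 | 0.7628 | 0.7032 |
| | OPT-3BIT | 0.68610 | 0.5683 | 0.6633 | 0.7258 | 0.7699 | 0.7032 |
| | RTN-2BIT | 0.39398 | 0.3764 | 0.0607 | 0.3836 | 0.6480 | 0.5012 |
| | OPT-2BIT | 0.50080 | 0.4554 | 0.2451 | 0.4899 | 0.7138 | 0.5998 |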
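
The AVG column appears to be the unweighted mean of the five task scores; the reported rows are consistent with that reading. The short sketch below recomputes AVG for the two Qwen3-8B 2BIT rows and prints the per-task gain of OPT over RTN. The scores are copied from the table above, while the variable and function names are illustrative assumptions.

```python
from statistics import mean

TASKS = ("HellaSwag", "LAMBADA", "MMLU", "PIQA", "WinoGrande")

# Qwen3-8B at 2BIT (W2A16G64), copied from the results table.
rtn_2bit = dict(zip(TASKS, (0.2679, 0.0041, 0.2536, 0.5283, 0.5036)))
opt_2bit = dict(zip(TASKS, (0.3749, 0.2005, 0.4202, 0.6670, 0.5501)))


def avg(scores):
    """Unweighted mean over the five tasks, rounded to 5 decimals like the table."""
    return round(mean(list(scores.values())), 5)


rtn_avg = avg(rtn_2bit)
opt_avg = avg(opt_2bit)
print(f"RTN-2BIT AVG: {rtn_avg:.5f}")
print(f"OPT-2BIT AVG: {opt_avg:.5f}")

# Per-task difference (OPT minus RTN); positive means OPT scored higher.
for task in TASKS:
    print(f"{task:<11} {opt_2bit[task] - rtn_2bit[task]:+.4f}")
```

The same check can be repeated for any other pair of rows by swapping in their scores.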