# PT²-LLM: Post-Training Ternarization for Large Language Models
ICLR 2026 | Xianglong Yan, Chengzhu Bao, Zhiteng Li, Tianao Zhang, Haotong Qin, Ruobing Xie, Xingwu Sun, Yulun Zhang
## News
- 2026-01-26: PT²-LLM is accepted at ICLR 2026.
- 2025-09-27: This repository is released.
## Abstract
Large Language Models (LLMs) have shown impressive capabilities across diverse tasks, but their large memory and compute demands hinder deployment. Ternarization has gained attention as a promising compression technique, delivering substantial size reduction and high computational efficiency. However, its potential in the post-training quantization (PTQ) setting remains underexplored, due to the challenge of training-free parameter optimization and the quantization difficulty posed by outliers and dispersed weights.
To address these issues, we propose PT²-LLM, a post-training ternarization framework tailored for LLMs. At its core is an Asymmetric Ternary Quantizer equipped with a two-stage refinement pipeline (an illustrative sketch follows the list below):
- Iterative Ternary Fitting (ITF): alternates between optimal ternary grid construction and flexible rounding to minimize quantization error.
- Activation-aware Grid Alignment (AGA): further refines the ternary grid to better match full-precision outputs.
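As a rough illustration of the alternation idea only (not the paper's exact quantizer), the sketch below fits each weight group to an assumed asymmetric ternary grid of the form `s * t + z` with `t ∈ {-1, 0, +1}`, alternating nearest-level assignment with a closed-form least-squares grid update. The actual ITF/AGA procedures, grid parameterization, and rounding rule are defined in the paper and the released code.

```python
# Hypothetical, simplified sketch of alternating ternary fitting.
import torch

def ternary_fit(w: torch.Tensor, iters: int = 10):
    """Fit one weight group w (1-D tensor) to s * t + z with t in {-1, 0, +1}."""
    # Initialise the grid from simple statistics of the group.
    z = w.mean()
    s = (w - z).abs().mean().clamp_min(1e-8)
    t = torch.zeros_like(w)
    for _ in range(iters):
        # Step 1: assign each weight to the nearest of the three grid levels
        # {z - s, z, z + s}.
        levels = torch.stack([z - s, z, z + s])                 # shape (3,)
        idx = (w.unsqueeze(-1) - levels).abs().argmin(dim=-1)   # shape (n,)
        t = idx.float() - 1.0                                   # {0,1,2} -> {-1,0,+1}
        # Step 2: closed-form least-squares update of (s, z) for fixed t,
        # i.e. regress w on [t, 1] to minimise ||w - (s*t + z)||^2.
        n = w.numel()
        st, sw = t.sum(), w.sum()
        stt, stw = (t * t).sum(), (t * w).sum()
        denom = (n * stt - st * st).clamp_min(1e-8)
        s = (n * stw - st * sw) / denom
        z = (sw - s * st) / n
    return s, z, t

# Toy usage on a random 128-weight group.
w = torch.randn(128)
s, z, t = ternary_fit(w)
rel_err = (torch.norm(w - (s * t + z)) / torch.norm(w)).item()
print(f"relative quantization error: {rel_err:.4f}")
```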
In addition, we propose a plug-and-play Structural Similarity-based Reordering (SSR) strategy that leverages inter-column structural similarity to ease quantization and mitigate outlier effects, further enhancing overall performance.
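The snippet below is a hypothetical illustration of column reordering by similarity, not the paper's SSR criterion: columns of a weight matrix are greedily chained by cosine similarity so that similar columns land in the same quantization group. The function name `ssr_permutation` and the greedy cosine rule are assumptions made here for illustration, and the matching permutation of the layer's inputs (or of the preceding layer's outputs) is omitted.

```python
# Hypothetical similarity-based column reordering sketch (not the paper's SSR rule).
import torch

def ssr_permutation(weight: torch.Tensor) -> torch.Tensor:
    """Return a column permutation for `weight` of shape (out_features, in_features)."""
    cols = torch.nn.functional.normalize(weight, dim=0)  # unit-norm columns
    sim = cols.t() @ cols                                # (in, in) cosine similarities
    n = sim.shape[0]
    visited = torch.zeros(n, dtype=torch.bool)
    order = [0]                                          # arbitrary start column
    visited[0] = True
    for _ in range(n - 1):
        row = sim[order[-1]].clone()
        row[visited] = float("-inf")                     # exclude already-placed columns
        nxt = int(row.argmax())
        order.append(nxt)
        visited[nxt] = True
    return torch.tensor(order)

W = torch.randn(256, 512)
perm = ssr_permutation(W)
W_reordered = W[:, perm]   # quantize W_reordered group-wise
# At inference, the same permutation must be applied to the layer's inputs.
```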
Extensive experiments demonstrate that PT²-LLM delivers competitive performance against state-of-the-art (SOTA) 2-bit PTQ methods with lower memory cost, while also accelerating both prefill and decoding to achieve end-to-end speedup.
## Results
LLaMA performance on 7 zero-shot Question Answering (QA) datasets. PT²-LLM yields the best accuracy at equal memory cost.
Detailed comparison against SOTA 2-bit PTQ methods.
## TODO
- Release post-training ternarization code
- Release quantized models
- Results
- Citation
## Contents
- Post-training ternarization code
- Pre-quantized models
- Results
- Citation
- Acknowledgements
## Citation
If you find this work helpful in your research, please cite:
```bibtex
@article{yan2025pt2llmposttrainingternarizationlarge,
  title         = {PT$^2$-LLM: Post-Training Ternarization for Large Language Models},
  author        = {Xianglong Yan and Chengzhu Bao and Zhiteng Li and Tianao Zhang and Kaicheng Yang and Haotong Qin and Ruobing Xie and Xingwu Sun and Yulun Zhang},
  year          = {2025},
  eprint        = {2510.03267},
  archivePrefix = {arXiv},
  primaryClass  = {cs.LG},
  url           = {https://arxiv.org/abs/2510.03267},
}
```
## Acknowledgements
This work is released under the Apache 2.0 License.