

PT2-LLM: Post-Training Ternarization for Large Language Models


ICLR 2026 | Xianglong Yan, Chengzhu Bao, Zhiteng Li, Tianao Zhang, Haotong Qin, Ruobing Xie, Xingwu Sun, Yulun Zhang


🔥 News

  • 2026-01-26: PT²-LLM is accepted at ICLR 2026. 🎉
  • 2025-09-27: This repository is released.

📖 Abstract

Large Language Models (LLMs) have shown impressive capabilities across diverse tasks, but their large memory and compute demands hinder deployment. Ternarization has gained attention as a promising compression technique, delivering substantial size reduction and high computational efficiency. However, its potential in the post-training quantization (PTQ) setting remains underexplored, due to the challenge of training-free parameter optimization and the quantization difficulty posed by outliers and dispersed weights.

To address these issues, we propose PT2-LLM, a post-training ternarization framework tailored for LLMs. At its core is an Asymmetric Ternary Quantizer equipped with a two-stage refinement pipeline:

  1. Iterative Ternary Fitting (ITF): alternates between optimal ternary grid construction and flexible rounding to minimize quantization error (see the sketch after this list).
  2. Activation-aware Grid Alignment (AGA): further refines the ternary grid to better match full-precision outputs.
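
For intuition, the sketch below shows a generic asymmetric ternary quantizer refined by an ITF-style alternating loop in NumPy. The three-level grid {-s_neg, 0, +s_pos}, the closed-form scale updates, and the iteration count are illustrative assumptions made here, not the exact formulation from the paper.

```python
import numpy as np

def asymmetric_ternary_fit(w: np.ndarray, n_iters: int = 10):
    """Fit a weight vector to three levels {-s_neg, 0, +s_pos} (illustrative ITF-style loop)."""
    # Initialize the two scales from the mean magnitude of each sign group.
    s_pos = np.abs(w[w > 0]).mean() if np.any(w > 0) else 0.0
    s_neg = np.abs(w[w < 0]).mean() if np.any(w < 0) else 0.0

    for _ in range(n_iters):
        levels = np.array([-s_neg, 0.0, s_pos])
        # Flexible rounding: assign each weight to the nearest of the three current levels.
        idx = np.abs(w[:, None] - levels[None, :]).argmin(axis=1)  # 0 -> -s_neg, 1 -> 0, 2 -> +s_pos
        # Grid construction: least-squares optimal scale for each nonzero group.
        if (idx == 0).any():
            s_neg = -w[idx == 0].mean()
        if (idx == 2).any():
            s_pos = w[idx == 2].mean()

    # Final assignment with the refined grid; each alternation step cannot increase the L2 error.
    levels = np.array([-s_neg, 0.0, s_pos])
    idx = np.abs(w[:, None] - levels[None, :]).argmin(axis=1)
    return levels[idx], (s_neg, s_pos)

# Toy usage on slightly skewed synthetic weights.
w = np.random.default_rng(0).standard_normal(4096) * 0.02 + 0.003
q, scales = asymmetric_ternary_fit(w)
print("scales:", scales, "rel. error:", np.linalg.norm(w - q) / np.linalg.norm(w))
```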

In addition, we propose a plug-and-play Structural Similarity-based Reordering (SSR) strategy that leverages inter-column structural similarity to ease quantization and mitigate outlier effects, further enhancing overall performance.
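
The column-reordering idea can be pictured with the toy sketch below: columns that look alike are placed next to each other so that each quantization group is more homogeneous. The cosine-similarity metric and the greedy nearest-neighbor ordering are illustrative stand-ins for the paper's structural-similarity criterion.

```python
import numpy as np

def reorder_columns_by_similarity(W: np.ndarray) -> np.ndarray:
    """Return a column permutation that keeps similar columns adjacent (toy SSR-style reordering)."""
    # Normalize columns so that a dot product equals cosine similarity.
    cols = W / (np.linalg.norm(W, axis=0, keepdims=True) + 1e-12)
    n = W.shape[1]
    visited = np.zeros(n, dtype=bool)
    order = [0]
    visited[0] = True
    for _ in range(n - 1):
        sims = cols[:, order[-1]] @ cols   # similarity of every column to the last placed one
        sims[visited] = -np.inf            # never revisit a column
        nxt = int(np.argmax(sims))
        order.append(nxt)
        visited[nxt] = True
    return np.array(order)

# Toy usage: quantize W[:, perm] group by group; the permutation is undone
# (or folded into the surrounding computation) at inference time.
W = np.random.default_rng(0).standard_normal((256, 64)) * 0.02
perm = reorder_columns_by_similarity(W)
```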

Extensive experiments demonstrate that PT2-LLM delivers competitive performance against state-of-the-art (SOTA) 2-bit PTQ methods with lower memory cost, while also accelerating both prefill and decoding to achieve end-to-end speedup.

Method Overview


📊 Results

LLaMA performance on 7 zero-shot Question Answering (QA) datasets. PT2-LLM yields the best accuracy at equal memory cost.

Teaser Results

Detailed comparison against SOTA 2-bit PTQ methods (click to expand)

Full Results Table


โš’๏ธ TODO

  • Release post-training ternarization code
  • Release quantized models
  • Results
  • Citation

๐Ÿ—‚๏ธ Contents


๐Ÿ“ Citation

If you find this work helpful in your research, please cite:

@article{yan2025pt2llmposttrainingternarizationlarge,
  title     = {PT$^2$-LLM: Post-Training Ternarization for Large Language Models},
  author    = {Xianglong Yan and Chengzhu Bao and Zhiteng Li and Tianao Zhang and Kaicheng Yang and Haotong Qin and Ruobing Xie and Xingwu Sun and Yulun Zhang},
  year      = {2025},
  eprint    = {2510.03267},
  archivePrefix = {arXiv},
  primaryClass  = {cs.LG},
  url       = {https://arxiv.org/abs/2510.03267},
}

💡 Acknowledgements

This work is released under the Apache 2.0 License.