March 14, 2025

IPAD

IPAD iteratively prunes and distills a model to shrink its size.
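The idea of alternating pruning with distillation can be sketched on a toy problem. The snippet below is a minimal illustration, not this repository's implementation: the linear "model", the magnitude-based pruning criterion, the MSE distillation loss, and every function name (`prune_smallest`, `distill`, `iterative_prune_and_distill`) are assumptions chosen only to show the shape of the loop.

```python
import random


def predict(weights, x):
    """Output of a toy linear model."""
    return sum(w * xi for w, xi in zip(weights, x))


def prune_smallest(weights, mask, fraction):
    """Zero out `fraction` of the still-active weights with the smallest magnitude."""
    active = [i for i, keep in enumerate(mask) if keep]
    n_drop = int(len(active) * fraction)
    for i in sorted(active, key=lambda i: abs(weights[i]))[:n_drop]:
        mask[i] = False
        weights[i] = 0.0


def distill(student, mask, teacher, inputs, lr=0.05, epochs=200):
    """Fit the student's remaining weights to the teacher's outputs (MSE loss, SGD)."""
    for _ in range(epochs):
        for x in inputs:
            err = predict(student, x) - predict(teacher, x)
            for i, keep in enumerate(mask):
                if keep:  # pruned weights stay at zero
                    student[i] -= lr * err * x[i]


def iterative_prune_and_distill(teacher, inputs, rounds=3, fraction=0.25):
    """Alternate a small pruning step with a distillation step for several rounds."""
    student = list(teacher)  # the student starts as a copy of the teacher
    mask = [True] * len(teacher)
    for _ in range(rounds):
        prune_smallest(student, mask, fraction)
        distill(student, mask, teacher, inputs)
    return student, mask


random.seed(0)
teacher = [random.uniform(-1, 1) for _ in range(16)]
inputs = [[random.uniform(-1, 1) for _ in range(16)] for _ in range(64)]
student, mask = iterative_prune_and_distill(teacher, inputs)
print(f"kept {sum(mask)} of {len(mask)} weights")
```

Pruning a small fraction per round and recovering with distillation in between is the design point of such a loop: each round removes only a little capacity, and the student gets a chance to re-fit the teacher before the next cut.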

News or Update 🔥

  • [2024/05] We release our code for IPAD.

Models we support

  • LLAMA
  • GLM
  • OPT

Introduction

Installation

  1. Clone this repository and navigate to the ipad directory
git clone https://github.com/alipay/PainlessInferenceAcceleration.git
cd PainlessInferenceAcceleration/ipad
  2. Install the package
python setup.py install

Quick Start

Examples can be found in the examples directory.

Citations

@inproceedings{10.1145/3589335.3648321,
  author = {Wang, Maolin and Zhao, Yao and Liu, Jiajia and Chen, Jingdong and Zhuang, Chenyi and Gu, Jinjie and Guo, Ruocheng and Zhao, Xiangyu},
  title = {Large Multimodal Model Compression via Iterative Efficient Pruning and Distillation},
  year = {2024},
  isbn = {9798400701726},
  publisher = {Association for Computing Machinery},
  doi = {10.1145/3589335.3648321},
  booktitle = {Companion Proceedings of the ACM Web Conference 2024},
  pages = {235–244},
  series = {WWW '24}
}