README.md

August 10, 2024

Introduction

LoRAPrune: Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning [arXiv]

Mingyang Zhang¹,², Hao Chen¹, Chunhua Shen¹,³, Zhen Yang¹, Linlin Ou², Xinyi Yu², Bohan Zhuang¹
Zhejiang University¹, Zhejiang University of Technology², Ant Group³

This repository contains code for reproducing LoRAPrune. LoRAPrune iteratively prunes large pre-trained models (LPMs) in a memory-efficient manner. Specifically, it uses a LoRA-guided pruning criterion that estimates importance from the weights and gradients of LoRA rather than from the gradients of the pre-trained weights.
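To make the idea of a LoRA-guided criterion concrete, here is a minimal pure-Python sketch. It is not taken from this repository: the function names, the gradient approximation `grad_B @ A + B @ grad_A - grad_B @ grad_A`, and the squared first-order score are assumptions made for illustration. It also scores individual weights and omits any grouping into heads or channels. The point it shows is that every quantity in the score comes from the LoRA factors and their gradients, so the gradient of the frozen pre-trained weight is never needed.

```python
# Illustrative sketch only: names and the exact formula are assumptions,
# not code from this repository.

def matmul(x, y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def lora_guided_importance(w0, lora_b, lora_a, grad_b, grad_a):
    """Score each entry of the merged weight W0 + B A from LoRA quantities only.

    The gradient of the frozen weight W0 is approximated (assumption) as
        g_hat = grad_B @ A + B @ grad_A - grad_B @ grad_A
    and the score is the squared first-order term (g_hat * (W0 + B A)) ** 2.
    """
    ba = matmul(lora_b, lora_a)
    t1 = matmul(grad_b, lora_a)
    t2 = matmul(lora_b, grad_a)
    t3 = matmul(grad_b, grad_a)
    return [
        [((t1[i][j] + t2[i][j] - t3[i][j]) * (w + ba[i][j])) ** 2
         for j, w in enumerate(row)]
        for i, row in enumerate(w0)
    ]


def prune_mask(scores, sparsity):
    """Return a 0/1 mask that zeroes the `sparsity` fraction of lowest scores."""
    cols = len(scores[0])
    order = sorted(range(len(scores) * cols),
                   key=lambda idx: scores[idx // cols][idx % cols])
    n_prune = int(len(order) * sparsity)
    mask = [[1] * cols for _ in scores]
    for idx in order[:n_prune]:
        mask[idx // cols][idx % cols] = 0
    return mask


# Toy 2x2 weight with rank-1 LoRA factors (B starts at zero, as is usual for LoRA).
w0 = [[1.0, -2.0], [0.5, 3.0]]
lora_b = [[0.0], [0.0]]
lora_a = [[1.0, 1.0]]
grad_b = [[0.1], [-0.2]]
grad_a = [[0.0, 0.0]]

scores = lora_guided_importance(w0, lora_b, lora_a, grad_b, grad_a)
print(prune_mask(scores, sparsity=0.5))
```

In an iterative setting, a score like this would be recomputed (or accumulated) as the LoRA factors are updated, with the mask tightened gradually toward the target sparsity. Refer to the repository code for the criterion actually used.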

Updates:

June 20, 2024: Code is released!
May 20, 2024: LoRAPrune is accepted to ACL 2024 Findings!

TODO List:

Support more LLMs.

Quick Start

Installation

pip install -r requirement.txt

Prune LPMs

sh script/prune.sh

This script compresses the LLaMA-7B model. You need to download the LLaMA-7B pre-trained weights first. The dataset is downloaded and sampled automatically. You can also prune larger LPMs, e.g., LLaMA-13B, LLaMA-30B, and LLaMA-65B. To save GPU memory, you can optionally quantize the pre-trained weights to 8 bits by adding --load_in_8bit.

Evaluate results

sh script/evaluate.sh

After pruning, you can evaluate the pruning results on the WikiText2 and PTB datasets.
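Language-modeling benchmarks such as these are commonly scored with perplexity. Assuming that is the metric of interest (this README does not state what the script reports), the sketch below shows how perplexity follows from per-token log-probabilities. The function name and its input format are hypothetical and are not part of this repository.

```python
import math


def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood per token), natural log."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# A model that assigns probability 0.25 to every token behaves like a
# uniform choice among four options.
print(perplexity([math.log(0.25)] * 4))
```

Lower is better: a pruned model whose perplexity stays close to that of the dense model has lost little language-modeling quality.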

License

For non-commercial academic use, this project is licensed under the 2-clause BSD License. For commercial use, please contact Chunhua Shen.

Citation

If you find this project useful, please cite:

@misc{zhang2023pruning,
      title={Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning}, 
      author={Mingyang Zhang and Hao Chen and Chunhua Shen and Zhen Yang and Linlin Ou and Xinyi Yu and Bohan Zhuang},
      year={2023},
      eprint={2305.18403},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}