August 10, 2024
Introduction
LoRAPrune: Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning [arXiv]
Mingyang Zhang1,2, Hao Chen1, Chunhua Shen1,3, Zhen Yang1, Linlin Ou2, Xinyi Yu2, Bohan Zhuang1
Zhejiang University1, Zhejiang University of Technology2, Ant Group3
This repository contains code for reproducing LoRAPrune, which iteratively prunes large pre-trained models (LPMs) in a memory-efficient manner. Specifically, LoRAPrune uses a LoRA-guided pruning criterion that estimates importance from the weights and gradients of LoRA rather than from the gradients of the pre-trained weights.
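The criterion described above can be sketched in a few lines. This is a minimal illustration, not the repository's implementation: it assumes a first-order (product-rule) surrogate for the gradient of the frozen weight built from the LoRA factors, scores each weight by the magnitude of weight times estimated gradient, and prunes individual weights for simplicity. All function names, shapes, and the random inputs are hypothetical.

```python
import numpy as np


def lora_guided_importance(W, A, B, grad_A, grad_B):
    """Score each weight using only LoRA factors and their gradients.

    W: frozen pre-trained weight, shape (d_out, d_in)
    B: LoRA up-projection (d_out, r); A: LoRA down-projection (r, d_in)
    grad_A, grad_B: loss gradients with respect to A and B
    """
    # Assumption: product-rule surrogate for dL/dW, so the gradient of the
    # frozen weight W never has to be computed or stored.
    grad_W_est = grad_B @ A + B @ grad_A
    # Taylor-style saliency: |merged weight * estimated gradient|.
    return np.abs((W + B @ A) * grad_W_est)


def keep_mask(importance, sparsity):
    """Boolean mask that drops the `sparsity` fraction of lowest scores."""
    k = int(importance.size * sparsity)  # number of weights to prune
    if k == 0:
        return np.ones(importance.shape, dtype=bool)
    threshold = np.partition(importance.ravel(), k - 1)[k - 1]
    return importance > threshold


# Toy example with random values standing in for a real layer.
rng = np.random.default_rng(0)
d_out, d_in, r = 4, 6, 2
W = rng.normal(size=(d_out, d_in))
A, B = rng.normal(size=(r, d_in)), rng.normal(size=(d_out, r))
grad_A, grad_B = rng.normal(size=A.shape), rng.normal(size=B.shape)

importance = lora_guided_importance(W, A, B, grad_A, grad_B)
mask = keep_mask(importance, sparsity=0.5)
W_pruned = W * mask  # pruned weights are zeroed, the rest are unchanged
```

Only `A`, `B`, and their gradients are needed at scoring time, which is where the memory saving comes from: no gradient buffer of the same size as `W` is kept.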
Updates:
- June 20, 2024: Code is released!
- May 20, 2024: LoRAPrune is accepted to Findings of ACL 2024!
TODO List:
- Support more LLMs.
Quick Start
Installation
pip install -r requirement.txt
Prune LPMs
sh script/prune.sh
This script compresses the LLaMA-7B model. You need to download the LLaMA-7B pretrained weights first. The dataset is downloaded and sampled automatically. You can also prune larger LPMs, e.g., LLaMA-13B, LLaMA-30B, and LLaMA-65B.
To save GPU memory, you can optionally quantize the pre-trained weights to 8 bits by adding --load_in_8bit.
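As a rough back-of-the-envelope illustration of why 8-bit loading helps, the snippet below compares the memory needed to hold the weights alone at 16-bit and 8-bit precision. It assumes roughly 7 billion parameters for LLaMA-7B and a 16-bit baseline, and it ignores activations, LoRA parameters, optimizer state, and quantization overhead. The helper name is hypothetical.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Memory needed to store the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


NUM_PARAMS = 7e9  # assumption: approximate parameter count of LLaMA-7B

mem_16bit = weight_memory_gib(NUM_PARAMS, 16)  # assumed half-precision baseline
mem_8bit = weight_memory_gib(NUM_PARAMS, 8)    # with 8-bit quantized weights

print(f"16-bit weights: {mem_16bit:.1f} GiB")
print(f" 8-bit weights: {mem_8bit:.1f} GiB")
```

Under these assumptions, quantizing to 8 bits halves the memory taken by the pre-trained weights; actual savings depend on the hardware and the quantization backend.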
Evaluate results
sh script/evaluate.sh
After pruning, you can evaluate the pruning results on the WikiText2 and PTB datasets.
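Language-modeling results on datasets such as WikiText2 and PTB are commonly reported as perplexity. Assuming that is the metric of interest here, the sketch below shows how perplexity follows from per-token log-probabilities. The function name and the toy inputs are hypothetical and are not taken from the evaluation script.

```python
import math


def perplexity(token_log_probs):
    """Perplexity = exp of the mean negative log-likelihood per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Toy check: a model that assigns probability 1/4 to every token
# is as uncertain as a uniform choice among 4 options.
uniform_log_probs = [math.log(0.25)] * 10
ppl = perplexity(uniform_log_probs)
```

Lower is better; comparing the value before and after pruning indicates how much language-modeling quality was lost.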
License
For non-commercial academic use, this project is licensed under the 2-clause BSD License. For commercial use, please contact Chunhua Shen.
Citation
If you find this project useful, please cite:
@misc{zhang2023pruning,
title={Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning},
author={Mingyang Zhang and Hao Chen and Chunhua Shen and Zhen Yang and Linlin Ou and Xinyi Yu and Bohan Zhuang},
year={2023},
eprint={2305.18403},
archivePrefix={arXiv},
primaryClass={cs.LG}
}