README.md
May 28, 2024
FineInfer
| Paper |
FineInfer is a research prototype for fine-tuning and serving large language models.
FineInfer supports concurrent parameter-efficient fine-tuning and inference through the following features:
- Deferred continuous batching
- Hybrid system architecture
- Heterogeneous batching
Get Started
The current version removes some features and functionality that earlier versions provided. If you need them, please download an earlier version.
Citation
@inproceedings{FineInfer,
author = {He, Yongjun and Lu, Yao and Alonso, Gustavo},
title = {Deferred Continuous Batching in Resource-Efficient Large Language Model Serving},
year = {2024},
booktitle = {Proceedings of the 4th Workshop on Machine Learning and Systems},
pages = {98--106},
series = {EuroMLSys '24}
}