HuggingFace Transformers inference for Stanford Alpaca (fine-tuned LLaMA)

March 19, 2023

Stanford Alpaca is a model fine-tuned from LLaMA-7B.

The inference code uses the Alpaca Native model, which was fine-tuned using the original tatsu-lab/stanford_alpaca repository. Unlike tloen/alpaca-lora, the fine-tuning process does not use LoRA.

Requirements for Alpaca-7B:

Linux, MacOS
1x GPU with 24GB of memory in fp16, or 1x GPU with 12GB in int8
PyTorch with CUDA (not the CPU version)
HuggingFace Transformers library
```
pip install git+https://github.com/huggingface/transformers.git
```
Currently, the Transformers library supports LLaMA only when installed from the latest GitHub repository, not through the Python package.
To run in 8-bit (quantized model), install bitsandbytes and set load_in_8bit=True.
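The two precision modes above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual inference script. `MODEL_PATH` is a hypothetical placeholder, the helper names are made up for this example, `device_map="auto"` assumes the accelerate package is installed, and the size estimate assumes roughly 7 billion parameters and counts weights only.

```python
"""Sketch: load a LLaMA-based checkpoint in fp16 or 8-bit with Transformers."""

# Hypothetical placeholder: point this at your own copy of the
# Alpaca Native weights (local directory or Hub id).
MODEL_PATH = "path/to/alpaca-native"


def estimate_weight_gb(n_params: float, bytes_per_param: int) -> float:
    """Rough size of the weights alone, ignoring activations and KV cache."""
    return n_params * bytes_per_param / 1e9


def build_load_kwargs(use_8bit: bool) -> dict:
    """Keyword arguments for from_pretrained in either precision mode."""
    # Assumption: device_map="auto" needs the accelerate package.
    kwargs = {"device_map": "auto"}
    if use_8bit:
        kwargs["load_in_8bit"] = True  # requires bitsandbytes
    else:
        kwargs["torch_dtype"] = "float16"  # resolved to torch.float16 on load
    return kwargs


def load_model(model_path: str, use_8bit: bool):
    """Load tokenizer and model.

    Heavy imports are kept local so the helpers above can be used
    without torch or transformers installed.
    """
    import torch
    from transformers import LlamaForCausalLM, LlamaTokenizer

    kwargs = build_load_kwargs(use_8bit)
    if "torch_dtype" in kwargs:
        # Turn the dtype name into the actual torch dtype object.
        kwargs["torch_dtype"] = getattr(torch, kwargs["torch_dtype"])
    tokenizer = LlamaTokenizer.from_pretrained(model_path)
    model = LlamaForCausalLM.from_pretrained(model_path, **kwargs)
    return tokenizer, model
```

Under these assumptions, `estimate_weight_gb(7e9, 2)` gives about 14 GB of weights in fp16 and `estimate_weight_gb(7e9, 1)` about 7 GB in int8, which is consistent with the 24GB and 12GB GPU figures once runtime overhead is included. Calling `load_model(MODEL_PATH, use_8bit=True)` would then load the quantized model on a CUDA machine with bitsandbytes installed.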