HuggingFace Transformers inference for Stanford Alpaca (fine-tuned LLaMA)

March 19, 2023

Stanford Alpaca is a model fine-tuned from LLaMA-7B.

The inference code uses the Alpaca Native model, which was fine-tuned using the original tatsu-lab/stanford_alpaca repository. Unlike tloen/alpaca-lora, the fine-tuning process does not use LoRA.

Hardware and software requirements

For the Alpaca-7B:

  • Linux or macOS

  • 1x GPU with 24GB of memory in fp16, or 1x GPU with 12GB of memory in int8

  • PyTorch with CUDA (not the CPU version)

  • HuggingFace Transformers library

    pip install git+https://github.com/huggingface/transformers.git
    

    Currently, the Transformers library supports LLaMA only when installed from the latest GitHub repository, not from the released Python package.

  • To run in 8-bit (quantized model), install bitsandbytes and set load_in_8bit=True
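
The requirements above can be sketched in Python as follows. This is a minimal sketch, not the repository's actual inference script. MODEL_PATH is a hypothetical placeholder for wherever the Alpaca Native weights are stored, and the helper names (estimate_weight_gb, build_load_kwargs, load_alpaca) are made up for illustration. It assumes that passing load_in_8bit=True to from_pretrained is supported by the installed Transformers version (newer releases may prefer a quantization config object instead) and that device_map="auto" has the accelerate package available. The estimate covers the weights only, using a nominal 7 billion parameters, so the listed GPU sizes leave headroom on top of it.

```python
MODEL_PATH = "path/to/alpaca-native"  # hypothetical placeholder, not from the original text


def estimate_weight_gb(n_params: int, bytes_per_param: int) -> float:
    """Rough size of the weights alone in GB (10**9 bytes); ignores activations and cache."""
    return n_params * bytes_per_param / 1e9


def build_load_kwargs(use_8bit: bool) -> dict:
    """Keyword arguments for from_pretrained, depending on the precision mode."""
    if use_8bit:
        # Needs bitsandbytes installed; device_map="auto" is assumed to rely on accelerate.
        return {"load_in_8bit": True, "device_map": "auto"}
    import torch  # imported lazily so the 8-bit branch can be inspected without PyTorch

    return {"torch_dtype": torch.float16}


def load_alpaca(model_path: str = MODEL_PATH, use_8bit: bool = False):
    """Load tokenizer and model on a CUDA GPU in fp16 or int8 (sketch under the assumptions above)."""
    import torch
    from transformers import LlamaForCausalLM, LlamaTokenizer

    if not torch.cuda.is_available():
        raise RuntimeError("A CUDA-enabled PyTorch build and a GPU are required")

    tokenizer = LlamaTokenizer.from_pretrained(model_path)
    model = LlamaForCausalLM.from_pretrained(model_path, **build_load_kwargs(use_8bit))
    if not use_8bit:
        # In 8-bit mode device_map already places the weights; in fp16 move them explicitly.
        model = model.cuda()
    return tokenizer, model


if __name__ == "__main__":
    n_params = 7_000_000_000  # nominal parameter count for a "7B" model
    print("fp16 weights (GB):", estimate_weight_gb(n_params, 2))
    print("int8 weights (GB):", estimate_weight_gb(n_params, 1))
```

Calling load_alpaca(use_8bit=True) would target the 12GB configuration, and load_alpaca() the 24GB fp16 one. Keeping the keyword arguments in a separate function makes the difference between the two modes easy to see at a glance.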