README.md

May 26, 2025

Guide on Inference Performance

1. First, convert the Lightning checkpoint to a gpt-fast compatible checkpoint.

python scripts/convert_hamburger_checkpoint.py \
--lightning_checkpoint_path /data/data_persistent1/jingyu/hamburger/ckpts/hamburger-llama-1B-0506-finish.ckpt \
--save_checkpoint_path ./checkpoints/hamburger
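
Before moving on to generation, it can help to confirm that the conversion actually produced a checkpoint file. The sketch below assumes the converter writes `model.pth` into the `--save_checkpoint_path` directory, which is inferred from the path used in step 2 rather than documented here. The helper name `check_checkpoint` is hypothetical and not part of this repository.

```shell
# Hypothetical helper (not part of this repo): verify a converted checkpoint exists.
# Assumption: the converter writes model.pth into the save directory,
# as the --checkpoint_path used in step 2 suggests.
check_checkpoint() {
    ckpt_dir="${1:-./checkpoints/hamburger}"
    ckpt_file="${ckpt_dir}/model.pth"
    # -s is true only when the file exists and is non-empty
    if [ -s "${ckpt_file}" ]; then
        printf '%s\n' "found: ${ckpt_file}"
    else
        printf '%s\n' "missing or empty: ${ckpt_file}" >&2
        return 1
    fi
}
```

Calling `check_checkpoint ./checkpoints/hamburger` right after the conversion returns a non-zero status when the file is absent, so it can gate the generation commands in a script.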

2. Generate with Prompts

python generate.py \
--checkpoint_path checkpoints/hamburger/model.pth \
--is_hamburger \
--prompt "Who is Magnus Carlsen?"

python generate.py \
--checkpoint_path checkpoints/meta-llama/Llama-3.2-1B-Instruct/model.pth \
--prompt "Who is Magnus Carlsen?"

python generate.py \
--checkpoint_path checkpoints/meta-llama/Llama-3.2-1B-Instruct/model.pth \
--draft_checkpoint_path checkpoints/meta-llama/Llama-3.2-1B-Instruct/model_int8.pth \
--prompt "Who is Magnus Carlsen?"
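
The three commands above differ only in their checkpoint flags: the Hamburger model, the plain Llama-3.2-1B-Instruct baseline, and the baseline paired with an int8 draft checkpoint (in upstream gpt-fast, `--draft_checkpoint_path` enables speculative decoding; this fork is assumed to behave the same). As a rough sketch, a small dry-run helper can print the command for each configuration so the same prompt is reused consistently. The function name `generate_cmd` and the mode names are hypothetical; only flags that already appear in this README are used.

```shell
# Hypothetical dry-run helper (not part of this repo): print the generate.py
# command for one of the three configurations shown above.
# It only prints the command; it does not execute it.
generate_cmd() {
    mode="$1"
    prompt="$2"
    base="checkpoints/meta-llama/Llama-3.2-1B-Instruct"
    case "${mode}" in
        hamburger)
            printf '%s\n' "python generate.py --checkpoint_path checkpoints/hamburger/model.pth --is_hamburger --prompt \"${prompt}\"" ;;
        baseline)
            printf '%s\n' "python generate.py --checkpoint_path ${base}/model.pth --prompt \"${prompt}\"" ;;
        draft)
            printf '%s\n' "python generate.py --checkpoint_path ${base}/model.pth --draft_checkpoint_path ${base}/model_int8.pth --prompt \"${prompt}\"" ;;
        *)
            printf '%s\n' "unknown mode: ${mode}" >&2
            return 1 ;;
    esac
}
```

For example, `for m in hamburger baseline draft; do generate_cmd "$m" "Who is Magnus Carlsen?"; done` prints the three commands for inspection before running them by hand. Prompts that themselves contain double quotes would need extra escaping, which this sketch does not handle.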

3. Benchmarking with Different Tasks

bash run_all.sh