Instruction Fine-Tuning Guide
July 28, 2024
Data Downloading
The data will be uploaded to 🤗 VLM-SFT soon.
# Assumes git-lfs is installed. If it is not, run: conda install git-lfs
git clone https://huggingface.co/datasets/YangyiYY/VLM-SFT
Model Training
First, set train_data, img_dir, proj_dir, and checkpoint in the config/SFT.yml file:
- train_data: the path to the training data all_data.jsonl (downloaded in the previous step).
- img_dir: the path to the image directory. The images are downloaded in the previous step.
- proj_dir: the name for the wandb logger. You can set it to your own project name.
- checkpoint: the path to the pre-trained model. See PRETRAIN_GUIDE.md for the pre-training instructions.
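Before launching a run, it can help to confirm that the four settings point at things that exist. The following is a minimal sketch, not part of this repository: the function name check_sft_settings is hypothetical, and it assumes the four values have already been read from config/SFT.yml into a flat Python dict (the real file layout may differ). The paths in the usage example are placeholders.

```python
from pathlib import Path


def check_sft_settings(settings: dict) -> list:
    """Return a list of problems found; an empty list means the settings look usable.

    `settings` is assumed to be a flat dict holding the four keys described
    above. This helper is illustrative and not provided by the repository.
    """
    problems = []

    # All four settings must be present and non-empty.
    for key in ("train_data", "img_dir", "proj_dir", "checkpoint"):
        if not settings.get(key):
            problems.append(f"missing value for {key}")
    if problems:
        return problems

    # train_data should be the downloaded all_data.jsonl file.
    if not Path(settings["train_data"]).is_file():
        problems.append(f"train_data is not a file: {settings['train_data']}")

    # img_dir should be the directory holding the downloaded images.
    if not Path(settings["img_dir"]).is_dir():
        problems.append(f"img_dir is not a directory: {settings['img_dir']}")

    # checkpoint may be a file or a directory, so only check that it exists.
    if not Path(settings["checkpoint"]).exists():
        problems.append(f"checkpoint not found: {settings['checkpoint']}")

    return problems


if __name__ == "__main__":
    # Placeholder values; replace them with your own paths and project name.
    example = {
        "train_data": "VLM-SFT/all_data.jsonl",
        "img_dir": "VLM-SFT/images",
        "proj_dir": "my-sft-project",
        "checkpoint": "path/to/pretrained/checkpoint",
    }
    for problem in check_sft_settings(example):
        print(problem)
```

proj_dir is only checked for presence, since it is a wandb project name rather than a path.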
Then run:
scripts/sft/run.sh
The output will be saved to data/ckpts/SFT/.