Installation Guide
December 25, 2025
Environment Setup
# Clone the repository
git clone https://github.com/hustvl/DiffusionVL.git
cd DiffusionVL
# Create and activate a virtual environment
conda create -n diffusionvl python=3.10 -y
conda activate diffusionvl
# Install dependencies
bash init_env.sh
This will install:
- eval/lmms-eval: Evaluation framework with metrics
- train: Training framework with all dependencies
Data Preparation
We use the LLaVA pretraining and finetuning datasets for training.
1. Pretrain Data (LLaVA-Pretrain)
Download the LLaVA pretraining dataset from Hugging Face:
https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain/tree/main
Set up the directory structure:
mkdir -p train/data/llava_pretrain/images
# Extract images.zip into the images folder
# Place blip_laion_cc_sbu_558k.json in llava_pretrain folder
Your directory should look like:
train/data/llava_pretrain/
├── images/
│ └── ... (extracted images)
└── blip_laion_cc_sbu_558k.json
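Before starting training, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the repository: `check_pretrain_layout` is a hypothetical helper name, and it only checks for the `images/` folder and the JSON file named above.

```shell
# Hypothetical helper (not shipped with DiffusionVL): checks that the
# LLaVA-Pretrain folder contains the images/ directory and the JSON file.
check_pretrain_layout() {
  root="${1:-train/data/llava_pretrain}"
  status=0
  if [ ! -d "$root/images" ]; then
    echo "missing directory: $root/images"
    status=1
  fi
  if [ ! -f "$root/blip_laion_cc_sbu_558k.json" ]; then
    echo "missing file: $root/blip_laion_cc_sbu_558k.json"
    status=1
  fi
  if [ "$status" -eq 0 ]; then
    echo "layout OK: $root"
  fi
  return "$status"
}
```

Calling `check_pretrain_layout` with no argument checks the default `train/data/llava_pretrain` path. A non-zero return code means at least one item is missing.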
2. Finetune Data (LLaVA-NeXT)
Download the LLaVA-NeXT dataset from Hugging Face:
https://huggingface.co/datasets/lmms-lab/LLaVA-NeXT-Data
Set up the directory structure:
mkdir -p train/data/llava_next/images
# Extract all tar.gz files (llava_next_raw_format_images_1.tar.gz to llava_next_raw_format_images_11.tar.gz)
# from llava_next_raw_format folder into train/data/llava_next/images
# Move llava_next_raw_format_processed.json to train/data/llava_next/
Your directory should look like:
train/data/llava_next/
├── images/
│ └── ... (extracted images from all tar.gz files)
└── llava_next_raw_format_processed.json
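The extraction step for the eleven archives could be scripted along these lines. This is a sketch under assumptions: `extract_llava_next_images` is a hypothetical helper, the first argument is wherever you downloaded the `llava_next_raw_format` folder, and the internal folder structure of the archives is not verified here, so check the result against the layout above.

```shell
# Hypothetical helper: extracts llava_next_raw_format_images_1.tar.gz through
# llava_next_raw_format_images_11.tar.gz into the images folder.
#   $1: folder holding the downloaded archives (assumed location)
#   $2: destination folder (defaults to train/data/llava_next/images)
extract_llava_next_images() {
  src="$1"
  dest="${2:-train/data/llava_next/images}"
  mkdir -p "$dest"
  for i in 1 2 3 4 5 6 7 8 9 10 11; do
    archive="$src/llava_next_raw_format_images_${i}.tar.gz"
    if [ -f "$archive" ]; then
      tar -xzf "$archive" -C "$dest"
    else
      # Report missing archives instead of aborting the whole loop.
      echo "skipping missing archive: $archive" >&2
    fi
  done
}
```

For example, `extract_llava_next_images /path/to/llava_next_raw_format` extracts into the default destination. Missing archives are reported on stderr, so an incomplete download is easy to spot.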
Model Preparation
For DiffusionVL-QwenVL
- Download Qwen2.5-VL-7B-Instruct from Hugging Face: https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct
- Convert to DiffusionVL format:
python scripts/diffusionvl_prepare/convert_qwen2.5vl_to_diffusionvl.py \
  --source_path /path/to/Qwen2.5-VL-7B-Instruct \
  --dest_path /path/to/Qwen2.5-VL-7B-Instruct-DiffusionVL
This script converts the Qwen2.5-VL checkpoint to a DiffusionVL-compatible format by reorganizing the model weights.
For DiffusionVL-Qwen / LLaVA-Qwen
Download the following models:
- Qwen2.5-7B-Instruct: https://huggingface.co/Qwen/Qwen2.5-7B-Instruct
- SigLIP2-so400m-patch14-384: https://huggingface.co/google/siglip2-so400m-patch14-384
For LLaVA-LLaDA-BD3LM
Download the following models:
- LLaDA-8B-Instruct: https://huggingface.co/GSAI-ML/LLaDA-8B-Instruct
- SigLIP2-so400m-patch14-384: https://huggingface.co/google/siglip2-so400m-patch14-384
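The guide does not prescribe a download method for the checkpoints in this section. One option is the Hugging Face CLI, sketched below. It assumes `huggingface-cli` (from the `huggingface_hub` package) is installed. `MODEL_DIR`, `DRY_RUN`, and `download_model` are hypothetical names introduced for illustration only.

```shell
# Sketch: fetch a checkpoint with the Hugging Face CLI.
# MODEL_DIR and DRY_RUN are hypothetical variables, not part of the repository.
MODEL_DIR="${MODEL_DIR:-./models}"

download_model() {
  repo_id="$1"
  # Store each model under MODEL_DIR, named after the last part of the repo id.
  dest="$MODEL_DIR/$(basename "$repo_id")"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command instead of running it.
    echo "huggingface-cli download $repo_id --local-dir $dest"
  else
    huggingface-cli download "$repo_id" --local-dir "$dest"
  fi
}
```

Running `DRY_RUN=1 download_model GSAI-ML/LLaDA-8B-Instruct` prints the command without touching the network, which is a cheap way to confirm the target path before a multi-gigabyte download. The same function can be called with `google/siglip2-so400m-patch14-384` or either Qwen repository id listed above.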