Harmonizing Visual and Textual Embeddings for Zero-Shot Text-to-Image Customization (AAAI 2025)
January 23, 2026
This repository contains the official implementation of Harmonizing Visual and Textual Embeddings for Zero-Shot Text-to-Image Customization.
Run BLIP-Diffusion
- Run the following command:
python run_blip_diffusion.py \
--data_dir='./example/01.jpg' \
--data_class='cat' \
--prompt='jumping' \
--output_dir='./outputs'
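To try several prompts on the same image, the command above can be generated from a short script. The following is a minimal sketch and not part of the repository: it reuses only the four flags shown above, while the helper names (`blip_diffusion_command`, `run_prompts`), the extra prompt `sleeping`, and the one-subfolder-per-prompt output layout are assumptions made for illustration. By default it only prints the commands; pass `dry_run=False` to execute them.

```python
import shlex
import subprocess
from pathlib import Path


def blip_diffusion_command(image, data_class, prompt, output_dir):
    """Build the argv list for run_blip_diffusion.py from the flags shown above."""
    return [
        "python", "run_blip_diffusion.py",
        f"--data_dir={image}",
        f"--data_class={data_class}",
        f"--prompt={prompt}",
        f"--output_dir={output_dir}",
    ]


def run_prompts(image, data_class, prompts, output_root="./outputs", dry_run=True):
    """Build (and optionally run) one command per prompt."""
    commands = []
    for prompt in prompts:
        # Assumption: one output subfolder per prompt, named after the prompt.
        out_dir = Path(output_root) / prompt.replace(" ", "_")
        cmd = blip_diffusion_command(image, data_class, prompt, str(out_dir))
        commands.append(cmd)
        if dry_run:
            # Print a copy-pasteable, shell-quoted command instead of running it.
            print(shlex.join(cmd))
        else:
            out_dir.mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, check=True)
    return commands


# "sleeping" is an illustrative prompt, not taken from the repository.
commands = run_prompts("./example/01.jpg", "cat", ["jumping", "sleeping"])
```

Passing the arguments as a list avoids shell-quoting issues when a prompt contains spaces.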
Run ELITE
- Download the global and local mappers from the ELITE homepage.
- Run the baseline (run_elite.py) or ours (run_elite_ours.py):
python run_elite.py \
--global_mapper_path="path-to-global_mapper.pt" \
--local_mapper_path="path-to-local_mapper.pt" \
--test_data_dir='./example/01.jpg' \
--template='a * riding a bike' \
--output_dir='./outputs'
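To compare the two ELITE scripts on identical inputs, both commands can be built from one set of arguments. This is a rough sketch under one loud assumption: that run_elite_ours.py accepts the same flags as run_elite.py, which the instructions above suggest but do not state. The helper `elite_command`, the `ELITE_SCRIPTS` mapping, and the per-variant output folders are hypothetical names introduced here. The mapper paths are the same placeholders used above and must be replaced with real files.

```python
import shlex

# Script names as given in this README.
ELITE_SCRIPTS = {"baseline": "run_elite.py", "ours": "run_elite_ours.py"}


def elite_command(variant, global_mapper, local_mapper, image, template, output_dir):
    """Build the argv list for one ELITE variant.

    Assumption: both scripts take the same flags as the run_elite.py command above.
    """
    return [
        "python", ELITE_SCRIPTS[variant],
        f"--global_mapper_path={global_mapper}",
        f"--local_mapper_path={local_mapper}",
        f"--test_data_dir={image}",
        f"--template={template}",
        f"--output_dir={output_dir}",
    ]


# Same inputs for both variants; only the script and output folder differ.
pair = {
    variant: elite_command(
        variant,
        "path-to-global_mapper.pt",   # placeholder, as in the command above
        "path-to-local_mapper.pt",    # placeholder, as in the command above
        "./example/01.jpg",
        "a * riding a bike",
        f"./outputs/{variant}",       # hypothetical per-variant folder
    )
    for variant in ELITE_SCRIPTS
}

for variant, cmd in pair.items():
    # shlex.join quotes the template so the shell does not expand the "*".
    print(f"{variant}: {shlex.join(cmd)}")
```

The printed commands keep the template quoted, which matters when pasting into a shell: an unquoted `*` would be expanded as a glob.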
Computing Infrastructure
- GPU model: TITAN RTX
- Memory: 13 GB (BLIP-Diffusion) / 9 GB (ELITE)
- Operating system: Ubuntu 18.04.5 LTS