ColorizeDiffusion XL

April 9, 2026


[CVPR 2026 Highlight] Official Implementation of "Towards High-resolution and Disentangled Reference-based Sketch Colorization"

SDXL-based implementation of ColorizeDiffusion, a reference-based sketch colorization framework built on Stable Diffusion. This repository contains the XL architecture (1024px) with enhanced embedding guidance for character colorization and geometry disentanglement. For the base SD2.1 implementation (512/768px), refer to the original repository.

Getting Started


conda env create -f environment.yml
conda activate hf

User Interface


python -u app.py

The default server address is http://localhost:7860.

Inference options

| Option | Description |
| --- | --- |
| Low-level injection | Enable low-level feature injection for backgrounds. |
| Attention injection | Noised low-level feature injection; roughly 2x inference time. |
| Reference guidance scale | Classifier-free guidance scale for the reference image. |
| Reference strength | Decrease to increase semantic fidelity to sketch inputs. |
| Foreground strength | Reference strength for the foreground region. |
| Background strength | Reference strength for the background region. |
| Sketch guidance scale | Classifier-free guidance scale for the sketch image; 1 is suggested. |
| Sketch strength | Control scale of the sketch condition. |
| Background factor | Controls how the background region is blended. |
| Merging scale | Scale for merging foreground and background. |
| Preprocessor | Sketch preprocessing; "Extract" is suggested for complicated pencil drawings. |
| Line extractor | Line extractor used when the preprocessor is "Extract". |
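To build intuition for how the two guidance scales above interact, here is a hedged sketch of two-condition classifier-free guidance. The function name `dual_cfg` and the exact composition order are assumptions for illustration, not the repository's actual formula:

```python
def dual_cfg(eps_uncond, eps_sketch, eps_full, sketch_gs, ref_gs):
    """Illustrative two-condition CFG (an assumption, not the repo's
    exact implementation): the sketch term and the reference term are
    scaled independently, then added to the unconditional prediction."""
    return (eps_uncond
            + sketch_gs * (eps_sketch - eps_uncond)
            + ref_gs * (eps_full - eps_sketch))
```

With both scales at 1 this collapses to the fully conditioned prediction; raising the reference guidance scale amplifies only the reference contribution.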

Text manipulation is deactivated by default. To activate:

python -u app.py -manipulate

Use --full to expose additional advanced controls (per-level sketch strengths, cross-attention scales, injection fidelity, etc.). Refer to the base repository for details on manipulation options.

Training


Our implementation is based on Accelerate and DeepSpeed. Before starting training, collect your data and organize the training dataset as follows:

[dataset_path]
├── image_list.json    # Optional, for image indexing
├── color/             # Color images (zip archives)
│   ├── 0001.zip
│   │   ├── 10001.png
│   │   ├── 100001.jpg
│   │   └── ...
│   ├── 0002.zip
│   └── ...
├── sketch/            # Sketch images (zip archives)
│   ├── 0001.zip
│   │   ├── 10001.png
│   │   ├── 100001.jpg
│   │   └── ...
│   ├── 0002.zip
│   └── ...
└── mask/              # Mask images (required for adapter training)
    ├── 0001.zip
    │   ├── 10001.png
    │   ├── 100001.jpg
    │   └── ...
    ├── 0002.zip
    └── ...

For details of dataset organization, see data/dataloader.py.
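The zip-sharded layout above can be traversed with the standard `zipfile` module. The helper below is an illustrative sketch, not the repository's dataloader (`paired_samples` is a hypothetical name); it pairs color and sketch entries by matching shard name and internal filename:

```python
import zipfile
from pathlib import Path

def paired_samples(dataset_path):
    """Illustrative only: list (shard, filename) pairs that exist in both
    the color/ and sketch/ zip shards of the dataset layout above."""
    root = Path(dataset_path)
    pairs = []
    for color_zip in sorted((root / "color").glob("*.zip")):
        sketch_zip = root / "sketch" / color_zip.name
        if not sketch_zip.exists():
            continue  # skip shards without a matching sketch archive
        with zipfile.ZipFile(color_zip) as cz, zipfile.ZipFile(sketch_zip) as sz:
            common = sorted(set(cz.namelist()) & set(sz.namelist()))
            pairs += [(color_zip.name, name) for name in common]
    return pairs
```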

Training command:

accelerate launch --config_file [accelerate_config] \
    train.py \
    -n [experiment_name] \
    -d [dataset_path] \
    -bs 16 \
    -nt 4 \
    -cfg configs/training/sdxl-base.yaml \
    -pt [pretrained_model_path] \
    -lr 1e-5 \
    -fm

Note that -bs specifies the micro batch size per GPU; running the command above on 8 GPUs gives a total batch size of 128. Use -fm to fit pretrained weights to the new model architecture. Refer to options.py for full arguments.
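The effective batch size arithmetic can be written out explicitly (the gradient-accumulation factor is an assumption here; the commands above do not set one):

```python
def effective_batch_size(micro_bs, num_gpus, grad_accum=1):
    """Effective batch = per-GPU micro batch x GPU count x accumulation steps."""
    return micro_bs * num_gpus * grad_accum
```

For example, -bs 16 on 8 GPUs with no gradient accumulation gives 16 * 8 = 128.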

Inference & Validation


# Inference
python inference.py \
    --name inf \
    --dataroot [dataset_path] \
    --batch_size 64 \
    -cfg configs/inference/sdxl.yaml \
    -pt [pretrained_model_path] \
    -gs 5

# Validation (uses random reference images)
python inference.py \
    --name val \
    --dataroot [dataset_path] \
    --batch_size 64 \
    -cfg configs/inference/xl-val.yaml \
    -pt [pretrained_model_path] \
    -gs 5 \
    -val

The difference between inference and validation modes is that validation mode uses randomly selected images as reference inputs. Refer to options.py for full arguments.
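One plausible way to draw a random reference that differs from the target, as validation mode does, is the standard exclude-the-index trick. This is an illustrative sketch (`pick_validation_reference` is a hypothetical name, not the repository's code):

```python
import random

def pick_validation_reference(index, num_images, rng=random):
    """Illustrative: sample a reference index uniformly from all images
    except the target itself, by drawing from a range one smaller and
    shifting past the excluded index."""
    ref = rng.randrange(num_images - 1)
    if ref >= index:
        ref += 1
    return ref
```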

Code Reference


  1. Stable Diffusion XL
  2. SD-webui-ControlNet
  3. Stable-Diffusion-webui
  4. K-diffusion
  5. DeepSpeed
  6. sketchKeras-PyTorch

Citation


@InProceedings{Yan_2025_CVPR,
    author    = {Yan, Dingkun and Wang, Xinrui and Li, Zhuoru and Saito, Suguru and Iwasawa, Yusuke and Matsuo, Yutaka and Guo, Jiaxian},
    title     = {Image Referenced Sketch Colorization Based on Animation Creation Workflow},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {23391-23400}
}

@article{2026arXiv260305971Y,
    author = {{Yan}, Dingkun and {Wang}, Xinrui and {Iwasawa}, Yusuke and {Matsuo}, Yutaka and {Saito}, Suguru and {Guo}, Jiaxian},
    title = "{ColorizeDiffusion XL: Enhancing Embedding Guidance for Character Colorization and Geometry Disentanglement}",
    journal = {arXiv e-prints},
    year = {2026},
    doi = {10.48550/arXiv.2603.05971},
}