ColorizeDiffusion XL

April 9, 2026


[CVPR 2026 Highlight] Official Implementation of "Towards High-resolution and Disentangled Reference-based Sketch Colorization"

SDXL-based implementation of ColorizeDiffusion, a reference-based sketch colorization framework built on Stable Diffusion. This repository contains the XL architecture (1024px) with enhanced embedding guidance for character colorization and geometry disentanglement. For the base SD2.1 implementation (512/768px), refer to the original repository.

Getting Started


conda env create -f environment.yml
conda activate hf

User Interface


python -u app.py

The default server address is http://localhost:7860.

Inference options

| Option | Description |
| --- | --- |
| Low-level injection | Enable low-level feature injection for backgrounds. |
| Attention injection | Noised low-level feature injection; roughly 2x inference time. |
| Reference guidance scale | Classifier-free guidance scale for the reference image. |
| Reference strength | Decrease to increase semantic fidelity to sketch inputs. |
| Foreground strength | Reference strength for the foreground region. |
| Background strength | Reference strength for the background region. |
| Sketch guidance scale | Classifier-free guidance scale for the sketch image; 1 is suggested. |
| Sketch strength | Control scale of the sketch condition. |
| Background factor | Controls how the background region is blended. |
| Merging scale | Scale for merging foreground and background. |
| Preprocessor | Sketch preprocessing; "Extract" is suggested for complicated pencil drawings. |
| Line extractor | Line extractor used when the preprocessor is "Extract". |
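To build intuition for how the two guidance scales above interact, here is a hedged sketch of two-condition classifier-free guidance. The function name `dual_cfg` and the exact composition order are assumptions for illustration, not the repository's actual formula:

```python
def dual_cfg(eps_uncond, eps_sketch, eps_full, sketch_gs, ref_gs):
    """Illustrative two-condition CFG (an assumption, not the repo's
    exact implementation): the sketch term and the reference term are
    scaled independently, then added to the unconditional prediction."""
    return (eps_uncond
            + sketch_gs * (eps_sketch - eps_uncond)
            + ref_gs * (eps_full - eps_sketch))
```

With both scales at 1 this collapses to the fully conditioned prediction; raising the reference guidance scale amplifies only the reference contribution.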

Text manipulation is deactivated by default. To activate:

python -u app.py -manipulate

Use --full to expose additional advanced controls (per-level sketch strengths, cross-attention scales, injection fidelity, etc.). Refer to the base repository for details on manipulation options.

Training


Our implementation is based on Accelerate and DeepSpeed. Before starting training, collect your data and organize the training dataset as follows:

[dataset_path]
├── image_list.json    # Optional, for image indexing
├── color/             # Color images (zip archives)
│   ├── 0001.zip
│   │   ├── 10001.png
│   │   ├── 100001.jpg
│   │   └── ...
│   ├── 0002.zip
│   └── ...
├── sketch/            # Sketch images (zip archives)
│   ├── 0001.zip
│   │   ├── 10001.png
│   │   ├── 100001.jpg
│   │   └── ...
│   ├── 0002.zip
│   └── ...
└── mask/              # Mask images (required for adapter training)
    ├── 0001.zip
    │   ├── 10001.png
    │   ├── 100001.jpg
    │   └── ...
    ├── 0002.zip
    └── ...

For details of dataset organization, see data/dataloader.py.
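The zip-sharded layout above can be traversed with the standard `zipfile` module. The helper below is an illustrative sketch, not the repository's dataloader (`paired_samples` is a hypothetical name); it pairs color and sketch entries by matching shard name and internal filename:

```python
import zipfile
from pathlib import Path

def paired_samples(dataset_path):
    """Illustrative only: list (shard, filename) pairs that exist in both
    the color/ and sketch/ zip shards of the dataset layout above."""
    root = Path(dataset_path)
    pairs = []
    for color_zip in sorted((root / "color").glob("*.zip")):
        sketch_zip = root / "sketch" / color_zip.name
        if not sketch_zip.exists():
            continue  # skip shards without a matching sketch archive
        with zipfile.ZipFile(color_zip) as cz, zipfile.ZipFile(sketch_zip) as sz:
            common = sorted(set(cz.namelist()) & set(sz.namelist()))
            pairs += [(color_zip.name, name) for name in common]
    return pairs
```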

Training command:

accelerate launch --config_file [accelerate_config] \
    train.py \
    -n [experiment_name] \
    -d [dataset_path] \
    -bs 16 \
    -nt 4 \
    -cfg configs/training/sdxl-base.yaml \
    -pt [pretrained_model_path] \
    -lr 1e-5 \
    -fm

Note that -bs specifies the micro batch size per GPU; running the command above on 8 GPUs gives a total batch size of 128. Use -fm to fit pretrained weights to the new model architecture. Refer to options.py for full arguments.
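The effective batch size arithmetic can be written out explicitly (the gradient-accumulation factor is an assumption here; the commands above do not set one):

```python
def effective_batch_size(micro_bs, num_gpus, grad_accum=1):
    """Effective batch = per-GPU micro batch x GPU count x accumulation steps."""
    return micro_bs * num_gpus * grad_accum
```

For example, -bs 16 on 8 GPUs with no gradient accumulation gives 16 * 8 = 128.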

Inference & Validation


# Inference
python inference.py \
    --name inf \
    --dataroot [dataset_path] \
    --batch_size 64 \
    -cfg configs/inference/sdxl.yaml \
    -pt [pretrained_model_path] \
    -gs 5

# Validation (uses random reference images)
python inference.py \
    --name val \
    --dataroot [dataset_path] \
    --batch_size 64 \
    -cfg configs/inference/xl-val.yaml \
    -pt [pretrained_model_path] \
    -gs 5 \
    -val

The difference between inference and validation modes is that validation mode uses randomly selected images as reference inputs. Refer to options.py for full arguments.
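One plausible way to draw a random reference that differs from the target, as validation mode does, is the standard exclude-the-index trick. This is an illustrative sketch (`pick_validation_reference` is a hypothetical name, not the repository's code):

```python
import random

def pick_validation_reference(index, num_images, rng=random):
    """Illustrative: sample a reference index uniformly from all images
    except the target itself, by drawing from a range one smaller and
    shifting past the excluded index."""
    ref = rng.randrange(num_images - 1)
    if ref >= index:
        ref += 1
    return ref
```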

Code Reference


  1. Stable Diffusion XL
  2. SD-webui-ControlNet
  3. Stable-Diffusion-webui
  4. K-diffusion
  5. DeepSpeed
  6. sketchKeras-PyTorch

Citation


@InProceedings{Yan_2025_CVPR,
    author    = {Yan, Dingkun and Wang, Xinrui and Li, Zhuoru and Saito, Suguru and Iwasawa, Yusuke and Matsuo, Yutaka and Guo, Jiaxian},
    title     = {Image Referenced Sketch Colorization Based on Animation Creation Workflow},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {23391-23400}
}

@article{2026arXiv260305971Y,
    author = {{Yan}, Dingkun and {Wang}, Xinrui and {Iwasawa}, Yusuke and {Matsuo}, Yutaka and {Saito}, Suguru and {Guo}, Jiaxian},
    title = "{ColorizeDiffusion XL: Enhancing Embedding Guidance for Character Colorization and Geometry Disentanglement}",
    journal = {arXiv e-prints},
    year = {2026},
    doi = {10.48550/arXiv.2603.05971},
}