Dual Recursive Feedback on Generation and Appearance Latents for Pose-Robust Text-to-Image Diffusion (ICCV 2025)

November 27, 2025

This is the official implementation of the paper "Dual Recursive Feedback on Generation and Appearance Latents for Pose-Robust Text-to-Image Diffusion" (ICCV 2025).

Jiwon Kim¹, Pureum Kim¹, SeonHwa Kim¹, Soobin Park², Eunju Cha², Kyong Hwan Jin¹

¹ Korea University ² Sookmyung Women's University

Algorithm

Environment Setup

To install the environment, please run the following.

conda env create -f environment.yaml
conda activate drf

Run

To run DRF, use a notebook or run the command below.

We use a single NVIDIA RTX 3090 GPU for our experiments.

python run.py \
    --structure_image dataset/structure/person_mesh.jpg \
    --appearance_image dataset/appearance/tiger.jpg \
    --prompt "a photo of a tiger standing on the snow field" \
    --structure_prompt "a mesh of a standing human" \
    --appearance_prompt "a photo of a tiger walking on the snow field"

Optional arguments

disable_refiner: If enabled, disables the refiner (and does not load it), reducing memory usage and inference time.
model (str): When given a path to a .safetensors checkpoint, loads that checkpoint as the base model instead of the default one.
benchmark: If enabled, reports the inference time and peak memory usage for the current run.
structure_schedule (float, default 0.6): Ratio of diffusion steps during which structure control is active.
For example, with 50 sampling steps:
- 0.6 → control is used for the first 60% (first 30 steps), then turned off for the remaining 40%.
- 0.7 → control is used for the first 70% (first 35 steps), then turned off for the remaining 30%.
appearance_schedule (float, default 0.6): Same as structure_schedule, but for appearance control.
e.g., 0.6 with 50 steps → appearance control is applied for the first 30 steps and disabled for the last 20.
seed (int, default 90095): Random seed for sampling. Use the same value to reproduce results across runs; change it to obtain different random outputs.
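
The step arithmetic described for structure_schedule and appearance_schedule can be checked with a short Python sketch. This is only an illustration, not code from the repository: the helper name active_control_steps is hypothetical, and rounding to the nearest whole step is an assumption, since the actual script may truncate or handle boundaries differently.

```python
def active_control_steps(schedule: float, num_steps: int) -> int:
    """Return how many leading sampling steps have control switched on.

    Hypothetical helper for illustration; not part of the DRF codebase.
    """
    if not 0.0 <= schedule <= 1.0:
        raise ValueError("schedule must lie in [0, 1]")
    # Assumption: the ratio is rounded to the nearest whole step.
    return round(schedule * num_steps)


# Reproduce the two examples given for 50 sampling steps.
for ratio in (0.6, 0.7):
    on = active_control_steps(ratio, 50)
    print(f"schedule={ratio}: control on for the first {on} steps, off for the last {50 - on}")
```

With 50 steps, a schedule of 0.6 gives 30 controlled steps and 0.7 gives 35, matching the examples above. A larger ratio keeps control active for more of the sampling process.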

Reference

If you find our work useful for your research, please cite our paper.

@InProceedings{Kim_2025_ICCV,
    author    = {Kim, Jiwon and Kim, Pureum and Kim, SeonHwa and Park, Soobin and Cha, Eunju and Jin, Kyong Hwan},
    title     = {Dual Recursive Feedback on Generation and Appearance Latents for Pose-Robust Text-to-Image Diffusion},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {15491-15500}
}

Acknowledgements

Our code is based on Ctrl-X. We thank the authors for sharing their work.