Mitigating Occlusions in Virtual Try-On via A Simple-Yet-Effective Mask-Free Framework
Chenghu Du1 · Shengwu Xiong2,3 · Junyin Wang1 · Yi Rong1✉ · Shili Xiong1✉
1 School of Computer Science and Artificial Intelligence, Wuhan University of Technology
2 Interdisciplinary Artificial Intelligence Research Institute, Wuhan College
3 Shanghai Artificial Intelligence Laboratory
✉ Corresponding authors
📄 Abstract
This work tackles occlusion issues in Virtual Try-On (VTON).
We classify occlusion failures into two types:
- Inherent Occlusions – “ghost” garments from the reference image that remain in the result.
- Acquired Occlusions – distorted human anatomy that visually blocks the new outfit.
To remove both, we propose a mask-free VTON framework with two plug-and-play operations:
- Background Pre-Replacement – swaps the background before generation so the model never confuses clothes with body/background, suppressing inherent occlusions.
- Covering-and-Eliminating – enforces human-aware semantics, yielding anatomically plausible shapes and thus fewer acquired occlusions.
The operations are architecture-agnostic: drop them into GANs or diffusion models without re-design.
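To make the idea of swapping the background concrete, here is a minimal sketch of compositing a person onto a new background with NumPy. It is an illustration only, not the implementation used in this repository: the function name `replace_background` and the foreground mask `fg_mask` are hypothetical, and this section does not say how the person region is obtained or at which stage the swap is applied.

```python
import numpy as np


def replace_background(person: np.ndarray, fg_mask: np.ndarray, new_bg: np.ndarray) -> np.ndarray:
    """Composite the person region onto a new background.

    person, new_bg: float arrays of shape (H, W, 3) with values in [0, 1].
    fg_mask: float array of shape (H, W), 1 where the person is and 0 elsewhere
             (hypothetical input for illustration; soft values are allowed).
    """
    if person.shape != new_bg.shape:
        raise ValueError("person and new_bg must have the same shape")
    if fg_mask.shape != person.shape[:2]:
        raise ValueError("fg_mask must match the image height and width")
    # Add a channel axis so the mask broadcasts over the RGB channels.
    alpha = np.clip(fg_mask, 0.0, 1.0)[..., None]
    # Standard alpha compositing: keep the person, replace everything else.
    return alpha * person + (1.0 - alpha) * new_bg


if __name__ == "__main__":
    h, w = 4, 3
    person = np.full((h, w, 3), 0.8)   # toy "person" image
    new_bg = np.zeros((h, w, 3))       # toy replacement background
    mask = np.zeros((h, w))
    mask[1:3, 1] = 1.0                 # pretend the person occupies two pixels
    out = replace_background(person, mask, new_bg)
    print(out[1, 1], out[0, 0])
```

Pixels inside the mask keep the person's values and all other pixels take the new background, so the original scene no longer appears in the composited image.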

🆕 News
All dates are UTC.
- 2025-09-20 🚀 Project page & teaser image live.
- 2025-09-19 🔥 Paper accepted at NeurIPS 2025.
🚧 TODO List
The test code has been released; the training code will be released soon.
- [2025-00-00] Release the training script.
- [2025-00-00] Release the pretrained model.
- [2025-09-21] Release the testing script.
- [2025-09-21] Release the manuscript.
🏗 Model Architecture

🔧 Installation
Option 1, install with pip:
pip3 install -r requirements.txt
Option 2, create a conda environment:
conda create -n uscpfn python=3.6
conda activate uscpfn  # or: source activate uscpfn
conda install pytorch=1.6.0 torchvision=0.7.0 cudatoolkit=11.7 -c pytorch
conda install cupy  # or: pip install cupy==8.3.0
pip install opencv-python
git clone https://github.com/du-chenghu/OccFree-VTON.git
cd ./OccFree-VTON/
📋 Requirements
- Python>=3.9
- torch==2.3.0
- torchvision==0.18.0
- torchaudio==2.3.0
- cuda==12.1
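A quick way to confirm that an environment matches the pins above is to compare them against the installed package metadata. The sketch below is a convenience script, not part of this repository; the helper names (`check_pins`, `installed_versions`, `base_version`) are made up for illustration, and the CUDA version is not checked because it is not a pip package.

```python
import sys
from importlib import metadata

# Pins copied from the Requirements list above.
PINNED = {"torch": "2.3.0", "torchvision": "0.18.0", "torchaudio": "2.3.0"}


def base_version(version: str) -> str:
    """Drop a local build suffix such as '+cu121' from a version string."""
    return version.split("+", 1)[0]


def check_pins(pinned: dict, installed: dict) -> list:
    """Return human-readable mismatches between pinned and installed versions."""
    problems = []
    for name, wanted in pinned.items():
        found = installed.get(name)
        if found is None:
            problems.append(f"{name}: not installed (want {wanted})")
        elif base_version(found) != wanted:
            problems.append(f"{name}: found {found}, want {wanted}")
    return problems


def installed_versions(names) -> dict:
    """Look up installed versions without importing the packages."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # leave the package out; check_pins reports it as missing
    return found


if __name__ == "__main__":
    if sys.version_info < (3, 9):
        print(f"python: found {sys.version.split()[0]}, want >=3.9")
    problems = check_pins(PINNED, installed_versions(PINNED))
    for line in problems or ["all pinned packages match"]:
        print(line)
```

Reading versions through `importlib.metadata` avoids importing `torch` itself, so the check still runs in an environment where the install is broken or incomplete.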
⚙️ Setup
git clone https://github.com/HVision-NKU/DepthAnythingAC.git
cd DepthAnythingAC
conda create -n depth_anything_ac python=3.9
conda activate depth_anything_ac
pip install -r requirements.txt
📦 Dataset
We train and evaluate on two standard VTON datasets:
| Dataset | Images | Resolution | Download | Annotation |
|---|---|---|---|---|
| VITON-HD | ~13k | 1024×768 | Google Drive | keypoints, parse, cloth |
| DressCode | ~52k | 512×384 | Official | keypoints, parse, cloth |
🚀 Usage
Get the Model
Download the pre-trained checkpoints from Hugging Face:
mkdir checkpoints
cd checkpoints
# (Optional) Use a Hugging Face mirror
export HF_ENDPOINT=https://hf-mirror.com
# Download the OccFree-VTON model from Hugging Face
huggingface-cli download --resume-download ghost233lism/OccFree-VTON --local-dir ghost233lism/OccFree-VTON
We also provide the OccFree-VTON model on Google Drive: Download
Inference
- Change into the repository directory:
cd OccFree-VTON
(or cd OccFree-VTON-main, depending on how the code was downloaded)
- Download the Checkpoints for Test and place them in the folder checkpoints/. The folder checkpoints/ should contain ngd_model_final.pth and sig_model_final.pth.
- Place the test set of the dataset in dataset/, so that the dataset/ folder contains test_pairs.txt and test/.
- To generate virtual try-on images, run:
python test.py
- The results will be saved in the folder results/.
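Before running test.py, it can help to verify that the files named in the steps above are in place. The following is a minimal sketch, assuming it is run from the repository root; the script and the `missing_paths` helper are not part of this repository, and the file names are taken as read from the steps above.

```python
from pathlib import Path

# Paths named in the inference steps, relative to the repository root.
REQUIRED_FILES = [
    "checkpoints/ngd_model_final.pth",
    "checkpoints/sig_model_final.pth",
    "dataset/test_pairs.txt",
]
REQUIRED_DIRS = ["dataset/test"]


def missing_paths(root) -> list:
    """Return the required files and folders that do not exist under root."""
    root = Path(root)
    missing = [p for p in REQUIRED_FILES if not (root / p).is_file()]
    missing += [p for p in REQUIRED_DIRS if not (root / p).is_dir()]
    return missing


if __name__ == "__main__":
    missing = missing_paths(".")
    if missing:
        raise SystemExit("Missing before running test.py:\n  " + "\n  ".join(missing))
    print("Layout looks complete; run: python test.py")
```

The check only looks at whether the paths exist; it does not validate the checkpoint contents or the dataset images.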
During inference, the network takes only a person image and a garment image as input to generate the try-on image. No human parsing or pose estimation results are needed.
To reproduce our results with the released checkpoints, use a test environment that matches ours.
- To test on paired data (paired clothing-person images), download the replacement list here (test_pairs.txt) and use it in place of the default one.
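To see which kind of list is currently in place, the sketch below parses a pairs file and counts the entries where the person and garment share a file name. It rests on two assumptions that this README does not state: that each line of test_pairs.txt has the form `<person_image> <garment_image>`, and that a paired entry uses the same file name for both (the convention used by VITON-HD). The helper names are hypothetical.

```python
def parse_pairs(text: str) -> list:
    """Parse lines of '<person_image> <garment_image>' into tuples (assumed format)."""
    pairs = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"line {line_no}: expected 2 fields, got {len(fields)}")
        pairs.append((fields[0], fields[1]))
    return pairs


def count_paired(pairs: list) -> int:
    """Count entries whose person and garment file names match (assumed convention)."""
    return sum(1 for person, garment in pairs if person == garment)


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    with open("dataset/test_pairs.txt", encoding="utf-8") as f:
        pairs = parse_pairs(f.read())
    print(f"{count_paired(pairs)} of {len(pairs)} entries look paired")
```

If every entry is counted as paired, the list is most likely the paired variant; if none are, it is most likely the unpaired one.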
📊 Visualization Results
📄 Citation
If you find this work useful, please consider citing:
@article{du2025mitigating,
title={Mitigating Occlusions in Virtual Try-On via A Simple-Yet-Effective Mask-Free Framework},
author={Du, Chenghu and Xiong, Shengwu and Wang, Junyin and Rong, Yi and Xiong, Shili},
journal={Advances in Neural Information Processing Systems},
year={2025}
}
📬 Contact
For technical questions or commercial licensing, please contact duch@whut.edu.cn
🤝 Acknowledgements
Our code is based on the official implementation of CatVTON. We thank the authors of CatVTON for their foundational work. We also thank the authors of the VITON-HD and DressCode datasets for their excellent benchmarks, and the open-source communities of PyTorch, Hugging Face Diffusers and xformers.
📜 License
This code is licensed under the Creative Commons Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0) for non-commercial use only. Any commercial use of this code requires formal permission in advance.