September 30, 2025

Mitigating Occlusions in Virtual Try-On via A Simple-Yet-Effective Mask-Free Framework

Chenghu Du¹ · Shengwu Xiong²,³ · Junyin Wang¹ · Yi Rong¹,✉ · Shili Xiong¹,✉

¹ School of Computer Science and Artificial Intelligence, Wuhan University of Technology
² Interdisciplinary Artificial Intelligence Research Institute, Wuhan College
³ Shanghai Artificial Intelligence Laboratory
✉ Corresponding authors


📄 Abstract

This work tackles occlusion problems in Virtual Try-On (VTON).
We classify the failures into two types:

Inherent Occlusions – “ghost” garments from the reference image that remain in the result.
Acquired Occlusions – distorted human anatomy that visually blocks the new outfit.

To remove both, we propose a mask-free VTON framework with two plug-and-play operations:

Background Pre-Replacement – swaps the background before generation so the model never confuses clothes with body/background, suppressing inherent occlusions.
Covering-and-Eliminating – enforces human-aware semantics, yielding anatomically plausible shapes and thus fewer acquired occlusions.

The operations are architecture-agnostic: drop them into GANs or diffusion models without re-design.
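
As a rough illustration of what swapping the background before generation can look like at the data level, the sketch below composites a person image onto a new background with a foreground mask. It is not the authors' implementation: the function name `replace_background`, the NumPy array layout, and the availability of a person mask at training time (for example, one derived from the parse annotations that ship with the datasets) are all assumptions made for this example.

```python
import numpy as np

def replace_background(person, fg_mask, new_background):
    """Composite the person onto a new background (hypothetical helper).

    person, new_background: float arrays of shape (H, W, 3) in [0, 1].
    fg_mask: array of shape (H, W), 1 where the person is, 0 elsewhere.
    """
    if person.shape != new_background.shape:
        raise ValueError("person and new_background must have the same shape")
    # Add a channel axis so the mask broadcasts over the RGB channels.
    mask = fg_mask.astype(person.dtype)[..., None]
    return mask * person + (1.0 - mask) * new_background

# Toy data: a uniform "person" image, a black background, a 2x2 foreground.
person = np.full((4, 4, 3), 0.8)
background = np.zeros((4, 4, 3))
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1
out = replace_background(person, mask, background)
```

Because an operation of this kind only touches the input images, it fits the architecture-agnostic claim above: the generator itself does not need to change.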

[Figure: teaser]

🆕 News

All dates are UTC.

2025-09-20 🚀 Project page & teaser image live.
2025-09-19 🔥 Paper accepted at NeurIPS 2025.

🚧 TODO List

The test code has been released; the training code will be released soon.

[2025-00-00] Release the training script.
[2025-00-00] Release the pretrained model.
[2025-09-21] Release the testing script.
[2025-09-21] Release the manuscript.

🏗 Model Architecture

[Figure: architecture]

🔧 Installation

git clone https://github.com/du-chenghu/OccFree-VTON.git
cd ./OccFree-VTON/

conda create -n uscpfn python=3.6
conda activate uscpfn    # or: source activate uscpfn

conda install pytorch=1.6.0 torchvision=0.7.0 cudatoolkit=11.7 -c pytorch
conda install cupy       # or: pip install cupy==8.3.0
pip install opencv-python
pip3 install -r requirements.txt

📋 Requirements

Python>=3.9
torch==2.3.0
torchvision==0.18.0
torchaudio==2.3.0
cuda==12.1
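
A quick way to compare an environment against the pins above is to query the installed package metadata. The sketch below is only a convenience script, not part of the repository: the names `REQUIRED`, `installed_version`, and `find_mismatches` are made up for this example, and the CUDA version is not checked because it is not a pip package.

```python
import sys
from importlib import metadata

# Pins copied from the requirements list above.
REQUIRED = {"torch": "2.3.0", "torchvision": "0.18.0", "torchaudio": "2.3.0"}

def installed_version(package):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

def find_mismatches(required, lookup=installed_version):
    """Return {package: (wanted, found)} for every pin that is not satisfied."""
    problems = {}
    for package, wanted in required.items():
        found = lookup(package)
        # Ignore local build tags such as "+cu121" when comparing.
        if found is None or found.split("+")[0] != wanted:
            problems[package] = (wanted, found)
    return problems

if __name__ == "__main__":
    if sys.version_info < (3, 9):
        print("Python >= 3.9 is required, found", sys.version.split()[0])
    for package, (wanted, found) in find_mismatches(REQUIRED).items():
        print(f"{package}: expected {wanted}, found {found}")
```

The `lookup` parameter exists only so the comparison logic can be exercised without the packages installed.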

⚙️ Setup

git clone https://github.com/HVision-NKU/DepthAnythingAC.git
cd DepthAnythingAC
conda create -n depth_anything_ac python=3.9
conda activate depth_anything_ac
pip install -r requirements.txt

📦 Dataset

We train and evaluate on two standard VTON datasets:

Dataset	Images	Resolution	Download	Annotation
VITON-HD	~13k	1024×768	Google Drive	keypoints, parse, cloth
DressCode	~52k	512×384	Official	keypoints, parse, cloth

🚀 Usage

Get the Model

Download the pre-trained checkpoints from Hugging Face:

mkdir checkpoints
cd checkpoints

# (Optional) Using huggingface mirrors
export HF_ENDPOINT=https://hf-mirror.com

# download OccFree-VTON model from huggingface
huggingface-cli download --resume-download ghost233lism/OccFree-VTON --local-dir ghost233lism/OccFree-VTON

We also provide the OccFree-VTON model on Google Drive: Download
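
For scripted setups, the same download can probably be done from Python with `snapshot_download` from the `huggingface_hub` package. This is a sketch under assumptions: the helper names `target_dir` and `download_checkpoints` are invented here, the script is assumed to run from the repository root rather than from inside checkpoints/, and the repository id is copied unchanged from the CLI command above.

```python
from pathlib import Path

REPO_ID = "ghost233lism/OccFree-VTON"  # repository id taken from the CLI command above

def target_dir(root="checkpoints", repo_id=REPO_ID):
    """Mirror the directory layout produced by the CLI's --local-dir argument."""
    return Path(root) / repo_id

def download_checkpoints(root="checkpoints", repo_id=REPO_ID):
    """Download the whole model repository and return the local path."""
    # Imported lazily so this file can be loaded without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    local_dir = target_dir(root, repo_id)
    local_dir.mkdir(parents=True, exist_ok=True)
    return snapshot_download(repo_id=repo_id, local_dir=str(local_dir))

# download_checkpoints()  # uncomment to start the download
```

To use a mirror, set HF_ENDPOINT in the shell before starting Python, as in the CLI example.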

Inference

cd OccFree-VTON    # or: cd OccFree-VTON-main, depending on the name of your local copy

First, download the [Checkpoints for Test] and put them in the checkpoints/ folder, which should then contain ngd_model_final.pth and sig_model_final.pth.
Next, put the test set of the dataset in dataset/, so that the dataset/ folder contains test_pairs.txt and test/.
To generate virtual try-on images, just run:

python test.py

The results will be saved in the folder results/.
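
A missing checkpoint or dataset file is an easy mistake to make before running test.py. The layout described above can be checked with a few lines of Python; this is a minimal sketch that assumes it is run from the repository root, and `missing_paths` is a hypothetical helper, not a function from the repository.

```python
from pathlib import Path

# Paths named in the instructions above, relative to the repository root.
EXPECTED = [
    "checkpoints/ngd_model_final.pth",
    "checkpoints/sig_model_final.pth",
    "dataset/test_pairs.txt",
    "dataset/test",
]

def missing_paths(root="."):
    """Return the expected files and folders that do not exist under root."""
    root = Path(root)
    return [p for p in EXPECTED if not (root / p).exists()]

if __name__ == "__main__":
    for path in missing_paths():
        print("missing:", path)
```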

During inference, only a person image and a clothes image are fed into the network to generate the try-on image. No human parsing results or human pose estimation results are needed for inference.

To reproduce our results from the saved model, your test environment should match ours.

To test on paired data (matching clothing-person images), download the replacement list here (test_pairs.txt) and use it in place of the default one.
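
When switching between pair lists, it can help to inspect the file before running the test script. The sketch below assumes the convention common to VTON datasets, where each line of test_pairs.txt holds a person image name and a clothing image name separated by whitespace; that format and the helper name `read_pairs` are assumptions, so check them against the actual file.

```python
def read_pairs(path):
    """Read (person_image, cloth_image) name pairs from a pair list file."""
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            if len(fields) != 2:
                raise ValueError(f"line {line_no}: expected 2 names, got {len(fields)}")
            pairs.append((fields[0], fields[1]))
    return pairs

# Example usage, assuming the dataset layout described above:
# pairs = read_pairs("dataset/test_pairs.txt")
# print(len(pairs), "pairs; first:", pairs[0])
```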

📊 Visualization Results

📄 Citation

If you find this work useful, please consider citing:

@article{du2025mitigating,
  title={Mitigating Occlusions in Virtual Try-On via A Simple-Yet-Effective Mask-Free Framework},
  author={Du, Chenghu and Xiong, Shengwu and Wang, Junyin and Rong, Yi and Xiong, Shili},
  journal={Advances in Neural Information Processing Systems},
  year={2025}
}

📬 Contact

For technical questions or commercial licensing, please contact duch@whut.edu.cn

🤝 Acknowledgements

Our code is based on the official implementation of [CatVTON]. We thank the authors of [CatVTON] for the foundational work. We also thank the authors of VITON-HD and DressCode datasets for their excellent benchmarks, and the open-source communities of PyTorch, HuggingFace Diffusers and xformers.

📜 License

This code is licensed under the Creative Commons Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0) and may be used for non-commercial purposes only. Any commercial use requires formal permission in advance.