README.md

March 30, 2026

[EMNLP 2025] Gradient-Attention Guided Dual-Masking Synergetic Framework for Robust Text-based Person Retrieval

Tianlu Zheng, Yifan Zhang, Xiang An, Ziyong Feng, Kaicheng Yang, Qichuan Ding

🎺 News

[2026/03/11]: ✨We updated the code of GA-DMS.
[2025/09/12]: ✨We released the GA-DMS paper.
[2025/09/10]: ✨We released the WebPerson dataset on 🤗 Hugging Face.
[2025/08/21]: ✨GA-DMS was accepted to the EMNLP 2025 Main Conference.

💡 Highlights

This work advances CLIP for person representation learning through synergistic improvements in data curation and model architecture. First, we develop a noise-resistant data construction pipeline that leverages the in-context learning capabilities of MLLMs to automatically filter and caption web-sourced images. This yields WebPerson, a large-scale dataset of 5M high-quality person-centric image-text pairs. Second, we introduce the GA-DMS (Gradient-Attention Guided Dual-Masking Synergetic) framework, which improves cross-modal alignment by adaptively masking noisy textual tokens based on the gradient-attention similarity score. Additionally, we incorporate masked token prediction objectives that compel the model to predict informative text tokens, enhancing fine-grained semantic representation learning. Extensive experiments show that GA-DMS achieves state-of-the-art performance across multiple benchmarks.
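To make the dual-masking idea concrete, here is a toy sketch in plain Python. It assumes that a per-token score in [0, 1] is already available; how GA-DMS actually computes the gradient-attention similarity score and turns it into masking decisions is not described in this README. The function name `dual_mask`, the `[MASK]` placeholder, and both thresholds are hypothetical choices made only for illustration.

```python
from typing import List, Tuple

MASK_TOKEN = "[MASK]"  # hypothetical placeholder, not taken from the GA-DMS code


def dual_mask(
    tokens: List[str],
    scores: List[float],
    noisy_threshold: float = 0.2,        # assumption: below this, a token counts as noise
    informative_threshold: float = 0.8,  # assumption: at or above this, a token is informative
) -> Tuple[List[str], List[Tuple[int, str]]]:
    """Split tokens into a noise-masked sequence and a list of prediction targets.

    Returns:
        noise_masked: the input tokens with low-score tokens replaced by MASK_TOKEN.
        targets: (position, token) pairs for high-score tokens, i.e. candidates
                 for a masked token prediction objective.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")

    noise_masked = [
        MASK_TOKEN if score < noisy_threshold else token
        for token, score in zip(tokens, scores)
    ]
    targets = [
        (index, token)
        for index, (token, score) in enumerate(zip(tokens, scores))
        if score >= informative_threshold
    ]
    return noise_masked, targets


# Made-up caption and scores, purely for illustration.
tokens = ["a", "woman", "in", "a", "red", "coat", "click", "here"]
scores = [0.3, 0.9, 0.3, 0.3, 0.85, 0.9, 0.05, 0.1]
noise_masked, targets = dual_mask(tokens, scores)
print(noise_masked)
print(targets)
```

In this sketch the low-score tokens ("click", "here") are hidden from the alignment input, while the high-score tokens ("woman", "red", "coat") are the ones a prediction objective would ask the model to recover.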

We use the COYO700M dataset, a large-scale collection of 747M image-text pairs gathered from CommonCrawl, as our source of web-crawled images. The WebPerson dataset is constructed from these images through a pipeline of person-centric image filtering and synthetic caption generation.

WebPerson Dataset

The WebPerson dataset, available at both the 5M and 1M scales, can be downloaded here. Both the images and their corresponding textual descriptions are included.

Prepare Downstream Datasets

Download the CUHK-PEDES dataset from here, the ICFG-PEDES dataset from here, and the RSTPReid dataset from here.

Environment installation

conda create -n ga_dms python=3.10 -y
conda activate ga_dms

pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
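
After installation, it can help to confirm that the pinned PyTorch packages were actually resolved. The following is a minimal sketch using only the Python standard library; the helper name `check_versions` is hypothetical and not part of this repository. It compares installed versions against the pins from the commands above and ignores local version suffixes such as `+cu118`.

```python
from importlib import metadata

# Versions pinned by the install commands above.
EXPECTED = {"torch": "2.1.2", "torchvision": "0.16.2", "torchaudio": "2.1.2"}


def check_versions(expected=EXPECTED):
    """Return a {package: status} report for the given version pins."""
    report = {}
    for name, wanted in expected.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        # Wheels from the PyTorch index may carry a local suffix like "+cu118".
        base_version = found.split("+")[0]
        report[name] = "ok" if base_version == wanted else f"mismatch ({found})"
    return report


if __name__ == "__main__":
    for name, status in check_versions().items():
        print(f"{name}: {status}")
```

Run it inside the activated `ga_dms` environment; any entry reported as `missing` or `mismatch` suggests the install step should be repeated.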

📖 Citation

@misc{zheng2025gradientattentionguideddualmaskingsynergetic,
      title={Gradient-Attention Guided Dual-Masking Synergetic Framework for Robust Text-based Person Retrieval}, 
      author={Tianlu Zheng and Yifan Zhang and Xiang An and Ziyong Feng and Kaicheng Yang and Qichuan Ding},
      year={2025},
      eprint={2509.09118},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2509.09118}, 
}
