
October 17, 2025

Spatial Preference Rewarding for MLLMs Spatial Understanding

Han Qiu, Peng Gao, Lewei Lu, Xiaoqin Zhang, Ling Shao, Shijian Lu

We propose Spatial Preference Rewarding (SPR), an approach that enhances MLLMs' spatial capabilities by rewarding detailed responses with precise object localization over vague or inaccurate ones. Using randomly selected image regions and MLLM-generated descriptions of those regions, SPR introduces semantic and localization scores that evaluate, respectively, the text quality and the localization quality of each description. We also refine the MLLM descriptions to improve their localization accuracy, then pair the best-scored refinement with the lowest-scored initial description for direct preference optimization (DPO), thereby enhancing fine-grained alignment with the visual input.
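The scoring and pairing step can be illustrated with a minimal sketch. This is not the SPR implementation: the localization score is assumed here to be the mean best-match IoU between predicted and reference boxes, the semantic score is assumed to be precomputed in [0, 1], and the two are combined with a hypothetical weight `alpha`. The names `Description`, `localization_score`, `total_score`, and `build_preference_pair` are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Description:
    text: str
    boxes: List[Box]       # boxes grounded by the MLLM in its description
    semantic_score: float  # assumption: text-quality score precomputed in [0, 1]


def localization_score(pred: List[Box], ref: List[Box]) -> float:
    """Assumed metric: mean over predicted boxes of the best IoU with any reference box."""
    if not pred or not ref:
        return 0.0
    return sum(max(iou(p, r) for r in ref) for p in pred) / len(pred)


def total_score(d: Description, ref: List[Box], alpha: float = 0.5) -> float:
    # Hypothetical weighted combination of text quality and localization quality.
    return alpha * d.semantic_score + (1 - alpha) * localization_score(d.boxes, ref)


def build_preference_pair(initial: List[Description], refined: List[Description],
                          ref: List[Box], alpha: float = 0.5) -> Tuple[Description, Description]:
    """Chosen = best-scored refinement; rejected = lowest-scored initial description."""
    chosen = max(refined, key=lambda d: total_score(d, ref, alpha))
    rejected = min(initial, key=lambda d: total_score(d, ref, alpha))
    return chosen, rejected


# Toy example: one reference box, two initial and two refined descriptions.
ref = [(10.0, 10.0, 50.0, 50.0)]
initial = [
    Description("a cup on the table", [(0.0, 0.0, 20.0, 20.0)], semantic_score=0.6),
    Description("something", [(60.0, 60.0, 90.0, 90.0)], semantic_score=0.3),
]
refined = [
    Description("a white cup on the table", [(10.0, 10.0, 50.0, 50.0)], semantic_score=0.8),
    Description("a cup near the edge", [(12.0, 12.0, 48.0, 52.0)], semantic_score=0.7),
]
chosen, rejected = build_preference_pair(initial, refined, ref)
print(chosen.text)    # a white cup on the table
print(rejected.text)  # something
```

Choosing the extremes (best refinement versus worst initial response) gives the widest quality gap per pair, which is the contrast the preference objective learns from.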


Installation

  1. Install Ferret first.
  2. Clone this repository.
  3. Install the additional packages:
pip install -r requirements.txt

Train

  1. Generate grounded region descriptions. Download the images, object annotations, and region queries, then run the following command to generate the MLLM's grounded region descriptions:
bash scripts/generate_region_description.sh
  2. Score and rank the generated descriptions by running the following command:
bash scripts/generate_region_description.sh
  3. Train the model:
bash scripts/run_train_ferret.sh
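The training step optimizes the model on the chosen/rejected pairs with direct preference optimization. For reference, the standard DPO loss for a single pair can be sketched as below, assuming the summed log-probabilities of each full response under the trainable policy and a frozen reference model are already available. The function name and the default `beta` are illustrative; this sketch does not claim to reproduce what `scripts/run_train_ferret.sh` runs.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a full response under
    the trainable policy or the frozen reference model.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)) == softplus(-margin), written in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Equal log-probabilities give zero margin, so the loss is log(2).
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
# The policy favors the chosen (refined) description more than the reference does,
# so the loss drops below log(2).
print(dpo_loss(-5.0, -12.0, -10.0, -10.0))
```

In a batched PyTorch setup the same quantity is usually computed as `-torch.nn.functional.logsigmoid(margin)` over a tensor of margins.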

Citation

If you find this project helpful, please cite our paper:

@misc{qiu2025spatialpreferencerewardingmllms,
      title={Spatial Preference Rewarding for MLLMs Spatial Understanding}, 
      author={Han Qiu and Peng Gao and Lewei Lu and Xiaoqin Zhang and Ling Shao and Shijian Lu},
      year={2025},
      eprint={2510.14374},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2510.14374}, 
}