ReVSeg: Incentivizing the Reasoning Chain for Video Segmentation with Reinforcement Learning

December 3, 2025 · View on GitHub

Yifan Li1,2, Yingda Yin3, Lingting Zhu3, Weikai Chen3, Shengju Qian3, Xin Wang3, Yanwei Fu1,2
1Fudan University, 2Shanghai Innovation Institute, 3LIGHTSPEED


Teaser Image

Project Page | arXiv Paper | Code

📢 Latest Updates

  • [2025/12/2] Our paper is now available on arXiv.

Abstract

Reasoning-centric video object segmentation is an inherently complex task: the query often refers to dynamics, causality, and temporal interactions rather than static appearances. Yet existing solutions generally collapse these factors into simplified reasoning with latent embeddings, rendering the reasoning chain opaque and essentially intractable. We therefore adopt an explicit decomposition perspective and introduce ReVSeg, which executes reasoning as sequential decisions in the native interface of pretrained vision language models (VLMs). Rather than folding all reasoning into a single-step prediction, ReVSeg executes three explicit operations -- semantics interpretation, temporal evidence selection, and spatial grounding -- aligned with the pretrained capabilities of the VLM. We further employ reinforcement learning to optimize the multi-step reasoning chain, enabling the model to self-refine its decision quality from outcome-driven signals. Experimental results demonstrate that ReVSeg attains state-of-the-art performance on standard video object segmentation benchmarks and yields interpretable reasoning trajectories.

Method Overview

Framework

ReVSeg runs a two-turn reasoning chain over the input video and query.

Key Features

  • 🧠 Explicit Reasoning Chain: Decomposes complex reasoning into interpretable sequential decisions
  • 🎯 Reinforcement Learning: Optimizes reasoning quality through outcome-driven signals
  • 📊 State-of-the-Art Performance: Achieves top results on video object segmentation benchmarks
  • 🔍 Interpretable Trajectories: Provides transparent reasoning process visualization

Example Cases

qualitative cases

Citation

If you find our work useful, please consider citing:

@article{li2025revseg,
    title={ReVSeg: Incentivizing the Reasoning Chain for Video Segmentation with Reinforcement Learning},
    author={Li, Yifan and Yin, Yingda and Zhu, Lingting and Chen, Weikai and Qian, Shengju and Wang, Xin and Fu, Yanwei},
    journal={arXiv preprint arXiv:2512.02835},
    year={2025}
}

✨ Code will be released soon. Stay tuned for updates!