GUI-Shift: Enhancing VLM-based GUI Agents through Self-supervised Reinforcement Learning

March 2, 2026

[📖 Paper] [🤗 Models (Coming Soon)]

📝 Overview

K-step GUI Transition. To unlock the potential of unlabeled GUI trajectories, we introduce K-step GUI Transition, a self-supervised inverse dynamics task in which VLMs learn GUI dynamics by predicting the initial action that causes a transition between two GUI states. Specifically, each training sample consists of two screenshots, $S_t$ and $S_{t+k}$, where $S_{t+k}$ results from executing $K$ actions starting from state $S_t$. The VLM is trained to predict the first action, the one that transforms $S_t$ into $S_{t+1}$.
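As a rough illustration of how such samples could be assembled, the sketch below pairs each screenshot with the one `k` steps later and uses the first recorded action as the prediction target. It assumes a trajectory is stored as a list of screenshot paths plus a list of action strings; the names `TransitionSample` and `build_k_step_samples` are hypothetical and are not taken from the GUI-Shift code.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class TransitionSample:
    """One K-step GUI Transition sample (hypothetical container)."""
    start_screenshot: str  # S_t
    end_screenshot: str    # S_{t+k}
    first_action: str      # target: the action that takes S_t to S_{t+1}


def build_k_step_samples(
    screenshots: Sequence[str],
    actions: Sequence[str],
    k: int,
) -> List[TransitionSample]:
    """Pair every S_t with S_{t+k} and label it with the action taken at step t.

    Assumes actions[t] is the action that transforms screenshots[t] into
    screenshots[t + 1], so a trajectory with n actions has n + 1 screenshots.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(screenshots) != len(actions) + 1:
        raise ValueError("expected exactly one more screenshot than actions")

    samples = []
    # The last valid start index is len(screenshots) - k - 1.
    for t in range(len(screenshots) - k):
        samples.append(
            TransitionSample(
                start_screenshot=screenshots[t],
                end_screenshot=screenshots[t + k],
                first_action=actions[t],
            )
        )
    return samples
```

For example, a trajectory with four screenshots and three actions yields two samples for `k = 2`. The label comes from the trajectory itself, so no human annotation is needed under these assumptions.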

GUI-Shift. We introduce GUI-Shift, a self-supervised RL framework that applies GRPO to K-step GUI Transition. We use it to train four VLMs: Qwen2.5-VL-7B, InternVL3-8B, MimoVL-7B-SFT, and MimoVL-7B-RL, each using 2K samples for four K-step GUI Transition variants ($K \in \{1, 2, 3, 4\}$). Experiments across these four VLMs and five benchmarks show that VLMs enhanced with GUI-Shift generalize to both GUI automation and grounding tasks, with accuracy gains of up to 11.2%.
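To make the GRPO step more concrete, here is a minimal sketch of its group-relative advantage: several responses are sampled for the same prompt, each is scored, and the scores are normalized within the group. The exact-match reward below is only a placeholder assumption, since this README does not describe GUI-Shift's actual reward design. The function names are hypothetical, and the use of the population standard deviation is likewise a choice made for this sketch.

```python
import math
from typing import List, Sequence


def exact_match_reward(predicted_action: str, reference_action: str) -> float:
    """Placeholder reward (assumption): 1.0 if the predicted action string
    matches the recorded first action exactly, else 0.0."""
    return 1.0 if predicted_action.strip() == reference_action.strip() else 0.0


def group_relative_advantages(
    rewards: Sequence[float], eps: float = 1e-6
) -> List[float]:
    """Normalize rewards within one group of sampled responses.

    Each advantage is (reward - group mean) / (group std + eps), so responses
    that beat the group average get a positive advantage.
    """
    if not rewards:
        raise ValueError("rewards must not be empty")
    mean = sum(rewards) / len(rewards)
    variance = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(variance)
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled predictions for one (S_t, S_{t+k}) pair, scored against the
# recorded first action "click(120, 300)" (an illustrative action string).
predictions = ["click(120, 300)", "scroll(down)", "click(10, 10)", "click(120, 300)"]
rewards = [exact_match_reward(p, "click(120, 300)") for p in predictions]
advantages = group_relative_advantages(rewards)
```

When every response in a group receives the same reward, all advantages are zero and that group contributes no learning signal.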

Figure: GUI-Shift overview

🔥 News

  • 2026/1/26 🎉 GUI-Shift was accepted to ICLR 2026. We will release the latest version of the GUI-Shift code and models soon. Stay tuned!

  • 2025/6/18 We released UIShift-7B, the initial version of GUI-Shift, including model and code (here).

  • 2025/5/18 UIShift, the initial version of GUI-Shift, was released on arXiv.

📈 Results

Figure: GUI task automation performance on AndroidControl and GUI Odyssey

Figure: GUI grounding performance on ScreenSpot-V2 and ScreenSpot-Pro

Additional results on AndroidWorld and detailed ablation studies can be found in the GUI-Shift Paper.

⚙️ Code and Usage

GUI-Shift has evolved from the initial UIShift version (v1) to the improved GUI-Shift framework (v2) presented in our ICLR 2026 paper.

The UIShift-7B (v1) training pipeline and scripts are available in legacy_readme.md and can be used as a reference implementation.

The GUI-Shift (v2) training code, improved reward design, and updated models will be released soon. Stay tuned.

🌟 Citation

If you find this work useful, please consider citing our paper.

@article{gao2025gui,
  title={GUI-Shift: Enhancing VLM-Based GUI Agents through Self-supervised Reinforcement Learning},
  author={Gao, Longxi and Zhang, Li and Gao, Pengzhi and Liu, Wei and Luan, Jian and Xu, Mengwei},
  journal={arXiv preprint arXiv:2505.12493},
  year={2025}
}