GUI-Shift: Enhancing VLM-based GUI Agents through Self-supervised Reinforcement Learning

March 2, 2026

[📖 Paper] [🤗 Models (Coming Soon)]

📝 Overview

K-step GUI Transition: To unlock the potential of unlabeled GUI trajectories, we introduce K-step GUI Transition, a self-supervised inverse dynamics task in which VLMs learn GUI dynamics by predicting the initial action that causes a transition between two GUI states. Specifically, each training sample consists of two screenshots, $S_t$ and $S_{t+K}$, where $S_{t+K}$ results from executing $K$ actions starting from state $S_t$. Given these two screenshots, the VLM is trained to predict the first action, i.e., the one that transforms $S_t$ into $S_{t+1}$.
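To make the sample construction concrete, here is a minimal sketch of how (S_t, S_{t+K}, first action) triples could be cut from a single recorded trajectory. It assumes each trajectory is stored as a list of screenshot paths plus the list of actions executed between them, with no task instruction attached. The names `TransitionSample` and `build_k_step_samples` and the action dictionary format are hypothetical illustrations, not the released GUI-Shift code.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class TransitionSample:
    start_screenshot: str          # S_t
    end_screenshot: str            # S_{t+K}
    k: int                         # number of actions between the two screenshots
    target_action: Dict[str, Any]  # a_t: the first action, which leads S_t -> S_{t+1}


def build_k_step_samples(
    screenshots: List[str], actions: List[Dict[str, Any]], k: int
) -> List[TransitionSample]:
    """Slide a window of K actions over one trajectory (hypothetical helper).

    Assumes screenshots[i] is the state observed before actions[i], so a
    trajectory with n actions has n + 1 screenshots.
    """
    if len(screenshots) != len(actions) + 1:
        raise ValueError("expected exactly one more screenshot than actions")
    if k < 1:
        raise ValueError("k must be >= 1")
    samples = []
    # t + k must still index a valid screenshot, i.e. t <= len(actions) - k.
    for t in range(len(actions) - k + 1):
        samples.append(
            TransitionSample(
                start_screenshot=screenshots[t],
                end_screenshot=screenshots[t + k],
                k=k,
                target_action=actions[t],  # only the first action is the label
            )
        )
    return samples


if __name__ == "__main__":
    shots = [f"step_{i}.png" for i in range(5)]
    acts = [{"action_type": "click", "x": 100 + i, "y": 200} for i in range(4)]
    for sample in build_k_step_samples(shots, acts, k=2):
        print(sample.start_screenshot, sample.end_screenshot, sample.target_action)
```

Because only the first action in each window is used as the label, a trajectory with n actions yields n - K + 1 samples for a given K, and no human annotation beyond the recorded actions is needed.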

GUI-Shift: We introduce GUI-Shift, a self-supervised RL framework that applies GRPO to K-step GUI Transition. We use it to train four VLMs (Qwen2.5-VL-7B, InternVL3-8B, MimoVL-7B-SFT, and MimoVL-7B-RL), each trained on 2K samples for four K-step GUI Transition variants ($K \in \{1, 2, 3, 4\}$). Experiments across four VLMs and five benchmarks show that VLMs enhanced with GUI-Shift generalize to both GUI automation and GUI grounding tasks, with accuracy gains of up to 11.2%.
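The core of GRPO is a group-relative advantage: several responses are sampled for the same K-step transition, each is scored, and each score is normalized by the mean and standard deviation of its group. The sketch below shows that normalization together with a simple binary, rule-based action reward. The reward function, its name `action_reward`, and the action fields (`action_type`, `x`, `y`, `text`) are assumptions made for illustration only; the README states that GUI-Shift (v2) uses an improved reward design that has not yet been released, so this should not be read as the authors' reward.

```python
import statistics
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical action schema: {"action_type": str, "x": float, "y": float, "text": str}
BBox = Tuple[float, float, float, float]  # (left, top, right, bottom) of the target element


def action_reward(
    pred: Dict[str, Any], gold: Dict[str, Any], bbox: Optional[BBox] = None
) -> float:
    """Binary rule-based reward (an illustrative assumption, not the paper's reward)."""
    kind = gold.get("action_type")
    if pred.get("action_type") != kind:
        return 0.0  # wrong action type gets no reward
    if kind == "click" and bbox is not None:
        x, y = pred.get("x"), pred.get("y")
        if x is None or y is None:
            return 0.0
        left, top, right, bottom = bbox
        # Reward the click only if it lands inside the target element.
        return 1.0 if left <= x <= right and top <= y <= bottom else 0.0
    if kind == "input_text":
        return 1.0 if pred.get("text", "").strip() == gold.get("text", "").strip() else 0.0
    return 1.0  # e.g. navigation actions: matching the type is enough here


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: normalize each reward within its sampled group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    gold = {"action_type": "click"}
    target = (10.0, 10.0, 50.0, 50.0)
    group = [
        {"action_type": "click", "x": 20, "y": 30},  # inside the target element
        {"action_type": "click", "x": 80, "y": 30},  # outside the target element
        {"action_type": "scroll"},                   # wrong action type
        {"action_type": "click", "x": 45, "y": 12},  # inside the target element
    ]
    rewards = [action_reward(p, gold, target) for p in group]
    print(rewards, group_relative_advantages(rewards))
```

When every response in a group receives the same reward, all advantages are zero, so such groups contribute no policy-gradient signal; the epsilon only guards against division by zero in that case.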

🔥 News

2026/1/26 🎉 GUI-Shift was accepted to ICLR 2026. We will release the latest version of the GUI-Shift code and models soon. Stay tuned!
2025/6/18 We released UIShift-7B, the initial version of GUI-Shift, including its model and code.
2025/5/18 UIShift, the initial version of GUI-Shift, was released on arXiv.

📈 Results

GUI task automation performance on AndroidControl and GUI Odyssey

GUI grounding performance on ScreenSpot-V2 and ScreenSpot-Pro

Additional results on AndroidWorld and detailed ablation studies are available in the GUI-Shift paper.

⚙️ Code and Usage

GUI-Shift has evolved from the initial UIShift version (v1) to the improved GUI-Shift framework (v2) presented in our ICLR 2026 paper.

The UIShift-7B (v1) training pipeline and scripts are available in legacy_readme.md and can be used as a reference implementation.

The GUI-Shift (v2) training code, improved reward design, and updated models will be released soon. Stay tuned.

🌟 Citation

If you find this work useful, please consider citing our paper.

@article{gao2025gui,
  title={GUI-Shift: Enhancing VLM-Based GUI Agents through Self-supervised Reinforcement Learning},
  author={Gao, Longxi and Zhang, Li and Gao, Pengzhi and Liu, Wei and Luan, Jian and Xu, Mengwei},
  journal={arXiv preprint arXiv:2505.12493},
  year={2025}
}