
April 8, 2025

GUI Action Narrator: Where and When Did That Action Take Place?

Qinchen Wu, Difei Gao, Kevin Qinghong Lin, Zhuoyu Wu, Xiangwu Guo, Peiran Li, Weichen Zhang, Hengxu Wang, Mike Zheng Shou

🤖: Introduction

We introduce Act2Cap, a GUI action dataset, together with GUI Narrator, an effective framework for GUI video captioning. GUI Narrator uses cursor detection to improve both the interpretation of high-resolution screenshots and the extraction of keyframes in GUI actions.
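To give an intuition for why a cursor location helps with high-resolution screenshots, the sketch below computes a fixed-size crop window centered on a detected cursor position and clamped to the screen bounds. It is an illustration only, not the GUI Narrator implementation: the function name `crop_box_around_cursor`, the 512x512 window size, and the assumption that the cursor position is already known in pixel coordinates are all hypothetical.

```python
def crop_box_around_cursor(cursor_xy, screen_size, window=(512, 512)):
    """Return (left, top, right, bottom) of a window centered on the cursor.

    Hypothetical helper for illustration; not part of the released code.
    The window is shifted as needed so it stays inside the screenshot.
    """
    cx, cy = cursor_xy
    sw, sh = screen_size
    # The window can never be larger than the screenshot itself.
    ww, wh = min(window[0], sw), min(window[1], sh)
    # Center on the cursor, then clamp so the box stays in bounds.
    left = min(max(cx - ww // 2, 0), sw - ww)
    top = min(max(cy - wh // 2, 0), sh - wh)
    return left, top, left + ww, top + wh


if __name__ == "__main__":
    # Cursor in the middle of a 1920x1080 screenshot.
    print(crop_box_around_cursor((960, 540), (1920, 1080)))
    # Cursor near the top-left corner: the window is pushed back on screen.
    print(crop_box_around_cursor((10, 10), (1920, 1080)))
```

The returned tuple uses the (left, upper, right, lower) order, so it could be passed directly to an image library's crop routine, letting a captioning model look at a small region around the action instead of the full frame.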

📋 ToDo List

  • Model for Cursor detector and Narrator
  • Code of conduct

-- Our model and test benchmark are available on Hugging Face.