April 8, 2025
GUI Action Narrator: Where and When Did That Action Take Place?
Qinchen Wu, Difei Gao, Kevin Qinghong Lin, Zhuoyu Wu, Xiangwu Guo, Peiran Li, Weichen Zhang, Hengxu Wang, Mike Zheng Shou
Introduction
We introduce Act2Cap, a GUI action dataset, and GUI Narrator, an effective framework for GUI video captioning. GUI Narrator uses cursor detection to improve both the interpretation of high-resolution screenshots and the extraction of keyframes for GUI actions.
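As a rough illustration of why a detected cursor position is useful on high-resolution screenshots, the sketch below computes a fixed-size crop window centered on the cursor and clamped to the screen bounds, so a downstream model could look at a small region instead of the full frame. This is a minimal sketch under stated assumptions, not the GUI Narrator implementation: the function name `crop_box_around_cursor`, its parameters, and the 512x512 crop size are all hypothetical and do not come from this repository.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def crop_box_around_cursor(
    cursor_xy: Tuple[int, int],
    screen_size: Tuple[int, int],
    crop_size: Tuple[int, int] = (512, 512),  # hypothetical default, not from the repo
) -> Box:
    """Return a crop window centered on the cursor, clamped to the screen.

    Hypothetical helper for illustration only.
    """
    cx, cy = cursor_xy
    width, height = screen_size

    # The crop can never be larger than the screenshot itself.
    crop_w = min(crop_size[0], width)
    crop_h = min(crop_size[1], height)

    # Center on the cursor, then shift the window back inside the screen
    # if the cursor sits close to an edge.
    left = min(max(cx - crop_w // 2, 0), width - crop_w)
    top = min(max(cy - crop_h // 2, 0), height - crop_h)

    return left, top, left + crop_w, top + crop_h


if __name__ == "__main__":
    # Cursor in the middle of a 1920x1080 screenshot.
    print(crop_box_around_cursor((960, 540), (1920, 1080)))
    # Cursor near the bottom-right corner: the window is shifted inward.
    print(crop_box_around_cursor((1915, 1075), (1920, 1080)))
```

The returned box uses the `(left, top, right, bottom)` convention, so it could be passed to an image library's crop routine. The cursor coordinates themselves would come from whatever cursor detector is used.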
ToDo List
- Model for Cursor detector and Narrator
- Code of conduct