Web-CogReasoner Overview

📑 ICLR 2026 | 🤗 Models (Coming soon) | 🤗 Dataset (Coming soon) | 🤗 Bench (Coming soon)

🌐 Homepage | 💬 Blog

Web-CogReasoner shifts the focus from incrementally enhancing web agents to systematically building their cognitive abilities from the ground up. Inspired by Bloom's Taxonomy, we decompose agent capabilities into knowledge content learning (Factual, Conceptual) and cognitive processes (Procedural), enabling interpretable and goal-directed behavior. Built on large multimodal models, the agent performs knowledge-driven Chain-of-Thought (CoT) reasoning across complex web tasks. Each reasoning step is explicitly grounded in a specific knowledge type, which supports both interpretability and robust generalization.

To support this, we introduce:

  • Web-CogKnowledge Framework: A Bloom's Taxonomy-inspired two-stage training paradigm (Knowledge Content Learning → Cognitive Reasoning) for enhancing web agents' cognitive abilities.

  • Web-CogReasoner: A knowledge-driven multimodal agent trained via imitation learning on our Web-CogDataset.

  • Web-CogDataset: A curriculum-style dataset with 12 fine-grained tasks across 3 knowledge levels (Factual, Conceptual, Procedural), enabling stepwise skill acquisition.

  • Web-CogBench: A dedicated benchmark for evaluating whether a web agent possesses the requisite prior knowledge and cognitive capabilities for effective web navigation.
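The two-stage curriculum described above can be pictured with a small sketch. This is only an illustration under stated assumptions, not the released training code: the class and function names (`KnowledgeLevel`, `Task`, `training_stage`, `curriculum_order`) and the placeholder task names are hypothetical, and the only facts taken from this README are the three knowledge levels and the ordering Knowledge Content Learning → Cognitive Reasoning.

```python
from dataclasses import dataclass
from enum import IntEnum


class KnowledgeLevel(IntEnum):
    """The three knowledge levels named in this README, in curriculum order."""
    FACTUAL = 1
    CONCEPTUAL = 2
    PROCEDURAL = 3


def training_stage(level: KnowledgeLevel) -> str:
    """Map a knowledge level to one of the two training stages (assumed labels)."""
    if level in (KnowledgeLevel.FACTUAL, KnowledgeLevel.CONCEPTUAL):
        return "knowledge_content_learning"
    return "cognitive_reasoning"


@dataclass(frozen=True)
class Task:
    name: str  # placeholder name, NOT an actual Web-CogDataset task
    level: KnowledgeLevel


def curriculum_order(tasks):
    """Sort tasks so that lower knowledge levels come first (stable sort)."""
    return sorted(tasks, key=lambda t: t.level)


# Hypothetical tasks, deliberately listed out of order.
tasks = [
    Task("placeholder_procedural_task", KnowledgeLevel.PROCEDURAL),
    Task("placeholder_factual_task", KnowledgeLevel.FACTUAL),
    Task("placeholder_conceptual_task", KnowledgeLevel.CONCEPTUAL),
]

for task in curriculum_order(tasks):
    print(f"{training_stage(task.level):27s} {task.level.name:11s} {task.name}")
```

Using an ordered enum keeps the curriculum a plain sort, so tasks at a new level could be slotted in without changing the scheduling logic.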

News

[2025-08-05] Released the full research paper on arXiv.
[2026-01-26] 🎉 Our paper has been accepted to ICLR 2026!

To-Do List

Last Updated: 2025-08-05 13:08 UTC+8

  • Paper: Release the full research paper on arXiv.
  • Code: Open-source the complete code for training and inference.
  • Model: Publish the official Web-CogReasoner model weights.
  • Dataset: Make the Web-CogDataset publicly available for community research.
  • Benchmark: Launch a public online evaluation server for Web-CogBench to ensure fair comparisons.

Performance

Cognitive & Visual Benchmarks

This comparison highlights our model's strength in reasoning, a crucial capability that visual-centric models may lack.

| Model | Web-CogBench (Cognition) | VisualWebBench (Vision) |
|---|---|---|
| Proprietary Models | | |
| Claude Sonnet 4 | 76.8% | 85.9% |
| Gemini 2.5 Pro | 80.2% | 86.6% |
| Open-Source Models | | |
| Qwen2.5-VL-7B | 69.8% | 76.0% |
| UI-TARS-7B-SFT | 46.4% | 86.0% |
| Web-CogReasoner (Ours) | 82.9% | 86.3% |

Key Insight: Some models, such as UI-TARS, excel at visual tasks (VisualWebBench: 86.0%) but struggle with reasoning-intensive tasks (Web-CogBench: 46.4%). Strong visual perception therefore does not guarantee advanced cognitive capabilities, a gap our work aims to fill.
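As a rough worked example of this gap, the snippet below subtracts each model's Web-CogBench score from its VisualWebBench score, using the values from the table above. The "perception minus reasoning" gap is only an illustrative quantity introduced here, not a metric defined by the paper, and the names `scores` and `perception_reasoning_gap` are hypothetical.

```python
# (Web-CogBench, VisualWebBench) scores in percent, copied from the table above.
scores = {
    "Claude Sonnet 4": (76.8, 85.9),
    "Gemini 2.5 Pro": (80.2, 86.6),
    "Qwen2.5-VL-7B": (69.8, 76.0),
    "UI-TARS-7B-SFT": (46.4, 86.0),
    "Web-CogReasoner (Ours)": (82.9, 86.3),
}


def perception_reasoning_gap(cognition: float, vision: float) -> float:
    """Vision score minus cognition score, in percentage points."""
    return round(vision - cognition, 1)


gaps = {model: perception_reasoning_gap(*pair) for model, pair in scores.items()}

# Print models from the largest gap to the smallest.
for model, gap in sorted(gaps.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model:25s} {gap:5.1f}")
```

With the table values, UI-TARS-7B-SFT shows the largest gap (39.6 points) and Web-CogReasoner the smallest (3.4 points).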

Online Web Task

This section evaluates the models' ability to perform complex, multi-step tasks in live web environments.

| Model | WebVoyager (Generalization) | Mind2Web (Cross-task) | Mind2Web (Cross-web) |
|---|---|---|---|
| Proprietary Models | | | |
| Claude Sonnet 4 | 47.7% | 40.2% | 21.7% |
| Gemini 2.5 Pro | 54.9% | 37.5% | 25.5% |
| Open-Source Models | | | |
| Qwen2.5-VL-7B | 2.2% | 1.0% | 1.0% |
| OpenWebVoyager-IL | 18.1% | 6.3% | 6.6% |
| Web-CogReasoner (Ours) | 30.2% | 17.0% | 10.1% |

Quickstart

Coming soon

Citation

@article{guo2025web,
title={Web-CogReasoner: Towards Knowledge-Induced Cognitive Reasoning for Web Agents},
author={Guo, Yuhan and Guo, Cong and Sun, Aiwen and He, Hongliang and Yang, Xinyu and Lu, Yue and Zhang, Yingji and Guo, Xuntao and Zhang, Dong and Liu, Jianzhuang and others},
journal={arXiv preprint arXiv:2508.01858},
year={2025}
}