July 23, 2024
AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?
AssistantBench evaluates the ability of AI agents to solve realistic and time-consuming web tasks, for example: "Which gyms near me have fitness classes on the weekend, before 7AM?"
Dataset and leaderboard
To start working on AssistantBench, please check out our HuggingFace dataset and leaderboard, where you can also make new submissions.
SPA
We also introduce SeePlanAct (SPA), a new web agent built to tackle tasks in AssistantBench. Code to run SPA and additional resources will be released soon!
Citation
@misc{yoran2024assistantbenchwebagentssolve,
      title={AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?},
      author={Ori Yoran and Samuel Joseph Amouyal and Chaitanya Malaviya and Ben Bogin and Ofir Press and Jonathan Berant},
      year={2024},
      eprint={2407.15711},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2407.15711},
}