
February 26, 2026

โš”๏ธ Computer Agent Arena

Toward Human-Centric Evaluation and Analysis of Computer-Use Agents

๐ŸŒ Website ย |ย  ๐Ÿ“‘ Paper (ICLR 2026) ย |ย  ๐Ÿ† Leaderboard ย |ย  ๐Ÿ“ Blog ย |ย  ๐Ÿค Contributing



Introduction

Computer Agent Arena is an open, crowdsourced evaluation platform for benchmarking computer-use agents (CUAs) on real-world tasks. Users interact with two AI agents side by side on live desktop environments (Ubuntu / Windows) and vote for the better one. These votes produce human preference data at scale, which powers a continuously updated Elo leaderboard.
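To make the link between pairwise votes and a leaderboard concrete, here is a minimal sketch of a textbook Elo update for a single vote. It is an illustration only: the README does not describe how the platform actually computes its ratings, and the starting rating of 1000, the K-factor of 32, and the names `expected_score` and `update_elo` are assumptions.

```python
from typing import Tuple


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that agent A is preferred over agent B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(
    rating_a: float, rating_b: float, score_a: float, k: float = 32.0
) -> Tuple[float, float]:
    """Apply one vote. score_a is 1.0 if A wins, 0.0 if B wins, 0.5 for a tie.

    k (the K-factor) is an assumed value, not taken from the platform.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Elo is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


# Two agents start level; a user votes for agent_a.
ratings = {"agent_a": 1000.0, "agent_b": 1000.0}
ratings["agent_a"], ratings["agent_b"] = update_elo(
    ratings["agent_a"], ratings["agent_b"], score_a=1.0
)
print(ratings)  # {'agent_a': 1016.0, 'agent_b': 984.0}
```

Because the update is zero-sum, the sum of the two ratings is unchanged by a vote; a win against a higher-rated opponent moves the ratings more than a win against a lower-rated one.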

This repository releases the full platform stack: backend server, frontend UI, agent hub, and deployment infrastructure.


Platform Overview

  • Frontend (React 18 + TypeScript): Dual-agent chat panel, live VNC desktop viewer, leaderboard
  • Backend (Flask + Socket.IO): User sessions, VM pool orchestration, agent execution, and evaluation
  • Agent Hub: Pluggable implementations for 10+ frontier models
  • Infrastructure: AWS EC2 multi-region VM pool (Ubuntu / Windows) with adaptive auto-scaling

| Layer | Technology |
| --- | --- |
| Frontend | React 18, TypeScript, Ant Design, Tailwind CSS, Socket.IO |
| Backend | Python, Flask, Flask-SocketIO |
| Database | PostgreSQL, Redis |
| Infrastructure | AWS EC2 (multi-region), S3 |
| Auth | Google OAuth 2.0, JWT, Anonymous access, Prolific |

Supported Agents

| Model | Organization |
| --- | --- |
| GPT-4.1, GPT-5 | OpenAI |
| Computer-Use-Preview | OpenAI |
| Claude 3.7 / 4 Sonnet, Claude Sonnet 4.5 | Anthropic |
| Gemini 2.5 Pro | Google |
| Qwen2.5-VL-72B | Alibaba |
| UI-TARS-1.5 | ByteDance |
| OpenCUA | XLang Lab |
| CoAct | - |

🚀 Quick Start

Prerequisites

  • Python 3.8+, Node.js 16+
  • PostgreSQL, Redis
  • AWS account (for VM pool) or local VMware / VirtualBox

1. Clone

git clone https://github.com/xlang-ai/computer-agent-arena.git
cd computer-agent-arena

2. Backend

pip install -r backend/requirements.txt

Create a .env file containing your database, Redis, and AWS settings, your API keys, and your auth credentials. Then set the deployment mode in config.yaml:

deployment: local   # or 'aws'
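Before starting the server, it can help to confirm that the settings from .env are actually visible to the process. The sketch below is a hypothetical pre-flight check, not part of the repository: `DATABASE_URL` and `REDIS_URL` are placeholder names (use whatever keys your .env defines), and it assumes the values have already been exported or loaded into the environment.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Placeholder names for illustration; replace with the keys your .env really uses.
REQUIRED_VARS = (
    "DATABASE_URL",           # hypothetical
    "REDIS_URL",              # hypothetical
    "AWS_ACCESS_KEY_ID",      # standard AWS credential variable
    "AWS_SECRET_ACCESS_KEY",  # standard AWS credential variable
)


def missing_vars(
    required: Iterable[str], env: Optional[Mapping[str, str]] = None
) -> List[str]:
    """Return the names in `required` that are unset or blank in `env`.

    `env` defaults to the current process environment.
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_vars(REQUIRED_VARS)
    if missing:
        print("Missing settings:", ", ".join(missing))
    else:
        print("All required settings are present.")
```

Passing the environment in as a mapping keeps the check easy to test without touching the real process environment.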

Start the server:

python -m backend.main   # listens on :8181

3. Frontend

cd frontend
npm install
npm start   # dev server on :3000
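Once both processes are running, a quick way to confirm they are reachable is to try a TCP connection to each port. This is a small sketch using only the standard library; the ports come from the steps above, while the helper name `is_port_open` and the use of localhost are assumptions about a local setup.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    for name, port in (("backend", 8181), ("frontend", 3000)):
        state = "reachable" if is_port_open("localhost", port) else "not reachable"
        print(f"{name} on :{port} is {state}")
```

A successful connection only shows that something is listening on the port; it does not verify that the application behind it is healthy.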

Adding a New Agent

  1. Create backend/agents/hub/YourAgent/ and implement a class extending BaseAgent.
  2. Register the model and method name in config.yaml under AVAILABLE_AGENT_OPTIONS.
  3. Test with python backend/agents/test/test_agents.py.

See existing implementations in backend/agents/hub/ (Anthropic, OpenAICUA, UI_TARS, OpenCUA, coact) for reference.
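The overall shape of step 1 might look like the sketch below. It is deliberately self-contained: the `BaseAgent` defined here is a stand-in written for this example, and its constructor, the `predict` method, the observation dictionary, and the action strings are all assumptions. Check the real base class in backend/agents/ and the existing hub implementations for the actual interface before copying anything.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class BaseAgent(ABC):
    """Stand-in for the repository's BaseAgent; the real interface may differ."""

    def __init__(self, model: str) -> None:
        self.model = model

    @abstractmethod
    def predict(self, instruction: str, observation: Dict[str, Any]) -> List[str]:
        """Return the next actions for the task, given the latest observation."""


class YourAgent(BaseAgent):
    """Toy agent: clicks the screen centre once, then reports completion."""

    def __init__(self, model: str = "your-model") -> None:
        super().__init__(model)
        self.steps_taken = 0

    def predict(self, instruction: str, observation: Dict[str, Any]) -> List[str]:
        self.steps_taken += 1
        if self.steps_taken > 1:
            return ["DONE"]
        # "screen_size" is a hypothetical observation key used for illustration.
        width, height = observation.get("screen_size", (1920, 1080))
        return [f"click({width // 2}, {height // 2})"]


agent = YourAgent()
print(agent.predict("Open the file manager", {"screen_size": (1280, 720)}))
# ['click(640, 360)']
print(agent.predict("Open the file manager", {}))
# ['DONE']
```

A real agent would replace the body of `predict` with a call to its model and translate the response into whatever action format the platform's desktop environment expects.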


Repository Structure

computer-agent-arena/
├── backend/
│   ├── main.py              # Entry point
│   ├── agents/              # Agent hub + base classes
│   │   └── hub/             # Per-model implementations
│   ├── api/                 # WebSocket / REST handlers
│   ├── desktop_env/         # VM abstraction (AWS, VMware, ...)
│   └── utils/               # DB, S3, socket utilities
├── frontend/                # React 18 + TypeScript UI
│   └── src/
└── config.yaml              # Deployment configuration

License

MIT License. See LICENSE for details.