February 26, 2026
Computer Agent Arena
Toward Human-Centric Evaluation and Analysis of Computer-Use Agents
Website | Paper (ICLR 2026) | Leaderboard | Blog | Contributing
Introduction
Computer Agent Arena is an open, crowdsourced evaluation platform for benchmarking computer-use agents (CUAs) on real-world tasks. Users interact with two AI agents side by side on live desktop environments (Ubuntu / Windows) and vote for the better one, producing human preference data at scale that powers a continuously updated Elo leaderboard.
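To make the link between votes and rankings concrete, here is a minimal sketch of a textbook Elo update applied to a single pairwise vote. It is an illustration only: the K-factor of 32, the starting rating of 1000, and the function names are assumptions, and the platform's actual rating procedure may differ.

```python
from typing import Tuple


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(
    rating_a: float, rating_b: float, outcome_a: float, k: float = 32.0
) -> Tuple[float, float]:
    """Return updated (rating_a, rating_b) after one vote.

    outcome_a is 1.0 if the user preferred agent A, 0.0 if they preferred
    agent B, and 0.5 for a tie. k=32 is an assumed K-factor.
    """
    delta = k * (outcome_a - expected_score(rating_a, rating_b))
    # The update is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


# Two hypothetical agents start level; the user votes for agent_a.
ratings = {"agent_a": 1000.0, "agent_b": 1000.0}
ratings["agent_a"], ratings["agent_b"] = update_elo(
    ratings["agent_a"], ratings["agent_b"], outcome_a=1.0
)
print(ratings)  # {'agent_a': 1016.0, 'agent_b': 984.0}
```

Because the expected score depends on the rating gap, a win over a much higher-rated agent moves the ratings more than a win over an equal one.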
This repository releases the full platform stack: backend server, frontend UI, agent hub, and deployment infrastructure.
Platform Overview
- Frontend (React 18 + TypeScript): Dual-agent chat panel, live VNC desktop viewer, leaderboard
- Backend (Flask + Socket.IO): User sessions, VM pool orchestration, agent execution, and evaluation
- Agent Hub: Pluggable implementations for 10+ frontier models
- Infrastructure: AWS EC2 multi-region VM pool (Ubuntu / Windows) with adaptive auto-scaling
| Layer | Technology |
|---|---|
| Frontend | React 18, TypeScript, Ant Design, Tailwind CSS, Socket.IO |
| Backend | Python, Flask, Flask-SocketIO |
| Database | PostgreSQL, Redis |
| Infrastructure | AWS EC2 (multi-region), S3 |
| Auth | Google OAuth 2.0, JWT, Anonymous access, Prolific |
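The adaptive auto-scaling mentioned in the overview can be pictured as a sizing rule that tracks demand and keeps a few warm VMs ready. The sketch below is a hypothetical policy, not the repository's scheduler: the function name, the assumption of two VMs per session (one per agent), the 25% warm buffer, and the size bounds are all illustrative.

```python
import math


def desired_pool_size(
    active_sessions: int,
    vms_per_session: int = 2,        # assumption: one desktop VM per agent
    warm_buffer_ratio: float = 0.25,  # assumption: 25% spare capacity
    min_size: int = 2,
    max_size: int = 50,
) -> int:
    """Return how many VMs the pool should hold for the current load."""
    in_use = active_sessions * vms_per_session
    # Keep extra warm VMs so a new session does not wait for a cold boot.
    target = in_use + math.ceil(in_use * warm_buffer_ratio)
    # Clamp to the configured bounds to cap cost and guarantee a floor.
    return max(min_size, min(max_size, target))


for sessions in (0, 3, 4, 100):
    print(sessions, desired_pool_size(sessions))
```

A real policy would likely also smooth the signal over time so the pool does not thrash when sessions start and stop in bursts.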
Supported Agents
| Model | Organization |
|---|---|
| GPT-4.1, GPT-5 | OpenAI |
| Computer-Use-Preview | OpenAI |
| Claude 3.7 / 4 Sonnet, Claude Sonnet 4.5 | Anthropic |
| Gemini 2.5 Pro | Google |
| Qwen2.5-VL-72B | Alibaba |
| UI-TARS-1.5 | ByteDance |
| OpenCUA | XLang Lab |
| CoAct | - |
Quick Start
Prerequisites
- Python 3.8+, Node.js 16+
- PostgreSQL, Redis
- AWS account (for VM pool) or local VMware / VirtualBox
1. Clone
git clone https://github.com/xlang-ai/computer-agent-arena.git
cd computer-agent-arena
2. Backend
pip install -r backend/requirements.txt
Create a .env file containing your database, Redis, and AWS settings, your model API keys, and your auth credentials. Then set the deployment mode in config.yaml:
deployment: local # or 'aws'
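Since the README does not list the exact variables the backend reads, a quick pre-flight check can catch an incomplete .env before the server starts. The following is a minimal sketch under stated assumptions: the key names in REQUIRED_KEYS are hypothetical placeholders, and the parser handles only simple KEY=VALUE lines.

```python
from typing import Dict, Iterable, List

# Hypothetical key names; replace with the variables your deployment needs.
REQUIRED_KEYS = ["DATABASE_URL", "REDIS_URL", "AWS_ACCESS_KEY_ID", "OPENAI_API_KEY"]


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: Dict[str, str], required: Iterable[str]) -> List[str]:
    """Return required keys that are absent or set to an empty string."""
    return [key for key in required if not values.get(key)]


sample = """
# example .env contents (placeholder values)
DATABASE_URL=postgresql://localhost/arena
REDIS_URL="redis://localhost:6379"
OPENAI_API_KEY=
"""
print(missing_keys(parse_env(sample), REQUIRED_KEYS))
# ['AWS_ACCESS_KEY_ID', 'OPENAI_API_KEY']
```

To check a real file, pass its contents instead of the sample, for example `parse_env(open(".env", encoding="utf-8").read())`.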
Start the server:
python -m backend.main # listens on :8181
3. Frontend
cd frontend
npm install
npm start # dev server on :3000
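Once both processes are started, a small script can confirm that something is listening on the ports named above (:8181 for the backend, :3000 for the frontend dev server). This is a generic TCP reachability check using only the standard library; the helper name is an assumption, and it does not verify that the service behind the port is healthy.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    for name, port in (("backend", 8181), ("frontend", 3000)):
        status = "up" if is_port_open("127.0.0.1", port) else "not reachable"
        print(f"{name} on :{port} is {status}")
```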
Adding a New Agent
- Create backend/agents/hub/YourAgent/ and implement a class extending BaseAgent.
- Register the model and method name in config.yaml under AVAILABLE_AGENT_OPTIONS.
- Test with python backend/agents/test/test_agents.py.
See existing implementations in backend/agents/hub/ (Anthropic, OpenAICUA, UI_TARS, OpenCUA, coact) for reference.
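As a rough picture of what the first step involves, the sketch below shows the general shape of a subclass. The BaseAgent defined here is a hypothetical stand-in so the example runs on its own: the real base class in backend/agents/ may use different method names, arguments, and action formats, so check an existing implementation before copying this.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class BaseAgent(ABC):
    """Hypothetical stand-in for the repository's BaseAgent (assumed API)."""

    def __init__(self, model: str) -> None:
        self.model = model

    @abstractmethod
    def predict(self, instruction: str, observation: Dict[str, Any]) -> List[str]:
        """Return a list of actions for the current observation."""


class YourAgent(BaseAgent):
    """Toy agent: waits until a screenshot arrives, then reports completion."""

    def predict(self, instruction: str, observation: Dict[str, Any]) -> List[str]:
        # A real agent would send the instruction and screenshot to its model
        # here and translate the response into desktop actions.
        if not observation.get("screenshot"):
            return ["WAIT"]
        return ["DONE"]


agent = YourAgent(model="your-model-name")  # placeholder model name
print(agent.predict("Open the terminal", {"screenshot": b""}))  # ['WAIT']
```

Keeping the model call behind a single method like this makes it easier to test the agent with canned observations before running it against a live VM.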
Repository Structure
computer-agent-arena/
├── backend/
│   ├── main.py              # Entry point
│   ├── agents/              # Agent hub + base classes
│   │   └── hub/             # Per-model implementations
│   ├── api/                 # WebSocket / REST handlers
│   ├── desktop_env/         # VM abstraction (AWS, VMware, ...)
│   └── utils/               # DB, S3, socket utilities
├── frontend/                # React 18 + TypeScript UI
│   └── src/
└── config.yaml              # Deployment configuration
License
MIT License. See LICENSE for details.