Running CUA-RAG in Windows Agent Arena
January 28, 2026
Prerequisites
- Ensure cua_skill is already set up on your system
Setup Instructions
1. Prepare WindowsAgentArena
- Pull the latest WindowsAgentArena code in the WAA submodule
- Follow the instructions in WindowsAgentArena/internal/LOCALDEV.md to:
  - Prepare and build the Docker image
  - Create the winarena conda environment
- After building the image, create a clean backup copy:
    cp -rf cua_skill/WindowsAgentArena/src/win-arena-container/vm/storage \
      cua_skill/WindowsAgentArena/src/win-arena-container/vm/storage_gold
Alternatively, if you already have a downloaded storage image, name it storage_gold and place it in the same vm/ directory instead of making the copy.
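Before relying on the backup, it can help to confirm that the copy matches the original. The following is a minimal sketch, not part of the repository: verify_gold_copy is a hypothetical helper, and it assumes both directories are readable by the current user.

```shell
# Hypothetical helper: check that the gold copy exists and matches the source.
# Usage: verify_gold_copy <storage_dir> <storage_gold_dir>
verify_gold_copy() {
  src="$1"
  gold="$2"
  if [ ! -d "$gold" ]; then
    echo "missing backup directory: $gold" >&2
    return 1
  fi
  # diff -rq reads every file, so this can be slow on large VM disk images.
  diff -rq "$src" "$gold" >/dev/null
}
```

For example, `verify_gold_copy vm/storage vm/storage_gold` returns a zero exit status only when the two trees are identical.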
2. Configure CUA-RAG
Switch to the rag branch of cua_skill to get the latest features.
3. Set Up Environment Variables
Create a .env file in the ./agent directory with the following content:
UITARS_V1_BEARER_KEY="your_uitars_key"
AZURE_AD_TOKEN=""
Note: Leave AZURE_AD_TOKEN empty initially
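The file can also be created from the shell. This is a small sketch under the assumption that the agent directory already exists. write_env_file is a hypothetical helper name, and restricting the file permissions is a suggestion rather than a requirement stated here.

```shell
# Hypothetical helper: write the initial .env file into the given directory.
# Usage: write_env_file ./agent
write_env_file() {
  agent_dir="$1"
  # Quoted heredoc delimiter: the content is written literally, no expansion.
  cat > "$agent_dir/.env" <<'EOF'
UITARS_V1_BEARER_KEY="your_uitars_key"
AZURE_AD_TOKEN=""
EOF
  # The file will hold credentials, so limit access to the owner.
  chmod 600 "$agent_dir/.env"
}
```

Replace your_uitars_key with the real key after the file is written.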
4. Configure Azure Authentication
- Create a screen session named "token":
    screen -S token
- Log in to Azure:
    az login --scope https://cognitiveservices.azure.com/.default --use-device-code
- Navigate to the evaluation directory and run the token refresh script:
    cd cua_skill/evaluation/WindowsAgentArena
    ./refresh_token.sh
- Keep this screen session running in the background (detach with Ctrl-A, then D)
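The contents of refresh_token.sh are not reproduced here. As a rough illustration of the kind of step a token refresh involves, the sketch below rewrites the AZURE_AD_TOKEN line of a .env file with a new value. set_env_token is a hypothetical helper, and the .env path, the loop, and the refresh interval in the commented example are all assumptions, not the script's actual behavior.

```shell
# Hypothetical helper: replace (or add) the AZURE_AD_TOKEN line in a .env file.
# Usage: set_env_token <env_file> <token>
set_env_token() {
  env_file="$1"
  token="$2"
  tmp_file="$(mktemp)"
  # Keep every line except an existing AZURE_AD_TOKEN entry.
  # grep exits non-zero when nothing is left, which is fine here.
  grep -v '^AZURE_AD_TOKEN=' "$env_file" > "$tmp_file" || true
  printf 'AZURE_AD_TOKEN="%s"\n' "$token" >> "$tmp_file"
  mv "$tmp_file" "$env_file"
}

# Example use inside the "token" screen session (assumed path and interval):
# while true; do
#   token="$(az account get-access-token \
#     --scope https://cognitiveservices.azure.com/.default \
#     --query accessToken -o tsv)"
#   set_env_token cua_skill/agent/.env "$token"
#   sleep 1800
# done
```

Writing to a temporary file and moving it into place means a reader of .env never sees a half-written file.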
5. File Synchronization
Changes in cua_skill/agent/ are automatically synced to:
cua_skill/WindowsAgentArena/src/win-arena-container/client/mm_agents/rag_cua
File mappings:
- requirements_waa.txt → requirements.txt
- agent_rag_waa.py → agent_waa.py
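If a change does not seem to take effect, one quick check is whether the two renamed files in the target directory still match their sources. The sketch below assumes only the mapping listed above. check_mapped_files is a hypothetical helper and says nothing about how the sync itself is implemented.

```shell
# Hypothetical helper: compare the two renamed files byte for byte.
# Usage: check_mapped_files <agent_dir> <rag_cua_dir>
check_mapped_files() {
  src="$1"
  dst="$2"
  # cmp -s is silent and exits non-zero on any difference or missing file.
  cmp -s "$src/requirements_waa.txt" "$dst/requirements.txt" &&
    cmp -s "$src/agent_rag_waa.py" "$dst/agent_waa.py"
}
```

A non-zero exit status suggests the sync has not run since the last edit in cua_skill/agent/.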
Model Configuration
Configure model settings in agent/config_rag.json:
| Setting | Description |
|---|---|
| planner.model_class | Planner model to use: "gpt" or "qwen" |
| rag.rel_action_sample_path | Path to the action sample file that sets the action sampling percentage (e.g., "mm_agents/rag_cua/sample_actions/0percent.json"). Leave empty ("") to allow all actions. Available options are in cua_skill/agent/sample_actions |
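To see which values are available for rag.rel_action_sample_path, the files in the sample_actions directory can be listed in the form the setting expects. This sketch assumes every JSON file there uses the same mm_agents/rag_cua/sample_actions/ prefix as the example value. list_sample_paths is a hypothetical helper.

```shell
# Hypothetical helper: print one candidate setting value per sample file.
# Usage: list_sample_paths cua_skill/agent/sample_actions
list_sample_paths() {
  sample_dir="$1"
  for f in "$sample_dir"/*.json; do
    # Skip the unexpanded pattern when the directory has no JSON files.
    [ -e "$f" ] || continue
    printf 'mm_agents/rag_cua/sample_actions/%s\n' "$(basename "$f")"
  done
}
```

Copy one of the printed lines into agent/config_rag.json as the value of the setting.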
Running Tests
Test Configuration
- Place test JSON files in cua_skill/evaluation/WindowsAgentArena/test_jsons
- Run tests using the run_cua_rag.sh script
Command Syntax
sudo bash ./run_cua_rag.sh <test_json_filename> [options]
Available Options
- --use_gold_image: Use the clean backup copy of the storage image
- --clean_mode: Reset environment between each test case (recommended)
- --reset_image: Remove current storage and regenerate from setup.iso by running:
    sudo "./run-local.sh" --prepare-image true
Examples
# Run with clean environment for each test case (recommended)
sudo bash ./run_cua_rag.sh "test_one.json" --use_gold_image --clean_mode
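To run several test files back to back, the same command can be wrapped in a loop. The following is a sketch only: run_all_tests and the RUNNER variable are hypothetical names introduced here, and it assumes the script accepts a bare file name from test_jsons, as in the example above.

```shell
# Hypothetical helper: run every JSON file in a directory with the
# recommended flags. Set RUNNER=echo for a dry run that prints the arguments.
# Usage: run_all_tests ./test_jsons
run_all_tests() {
  json_dir="$1"
  runner="${RUNNER:-sudo bash ./run_cua_rag.sh}"
  for f in "$json_dir"/*.json; do
    [ -e "$f" ] || continue
    name="$(basename "$f")"
    # $runner is left unquoted on purpose so it splits into command and args.
    $runner "$name" --use_gold_image --clean_mode ||
      echo "run failed: $name" >&2
  done
}
```

The loop continues after a failed run and reports it on stderr. Stopping at the first failure instead would be an equally reasonable choice.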
Tips:
- You can select different tasks within test_one.json
- If using a downloaded storage image, rename it to storage_gold and use --use_gold_image
- Using --clean_mode is recommended to avoid display errors and ensure test isolation