SyncMind: Measuring Agent Out-of-Sync Recovery in Collaborative Software Engineering

March 3, 2025

SyncMind

🍀1. Environment Setup

To use SyncMind and SyncBench for agent out-of-sync recovery, clone the repository:

git clone https://github.com/xhguo7/SyncMind.git

Set up the environment for SyncMind:

We are using OpenHands to implement interactive codebase environments for agent out-of-sync recovery.

Miniconda environment setup (refer to Development.md for further details):

# Download and install Mamba (a faster version of conda)
curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash Miniforge3-$(uname)-$(uname -m).sh

# Install Python 3.12, nodejs, and poetry
mamba install python=3.12
mamba install conda-forge::nodejs
mamba install conda-forge::poetry
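
After the installs above, it can help to confirm that the tools are actually on your PATH before moving on. The snippet below is a minimal sketch, not part of SyncMind: the helper name `missing_tools` is hypothetical, and it assumes the installed packages expose the executables `python`, `node`, and `poetry`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SyncMind): print the name of every tool
# passed as an argument that cannot be found on PATH, one per line.
# Prints nothing when all tools are available.
missing_tools() {
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
    fi
  done
}

# Assumed executable names for Python, nodejs, and poetry.
missing_tools python node poetry
```

An empty output suggests that all three tools were found; any name that is printed still needs to be installed or added to PATH.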

📝2. SyncBench

Dataset construction
- Customize your SyncBench: README.md
Data preparation
- Make sure you have SyncBench instances prepared before running
  - By default, SyncMind loads SyncBench directly from Hugging Face
  - If you would like to download SyncBench from Hugging Face and then use it locally, set --data-path to your local data path (i.e., [evaluation data path] in the example below)
  - If you would like to run only a few instances, define --eval-n-limit (i.e., [evaluation limit] in the example below) to set the dataset size for evaluation

📊3. SyncMind

Run SyncMind for agent out-of-sync recovery

cd SyncMind/syncmind/framework/OpenHands
bash ./evaluation/benchmarks/syncmind/scripts/run_infer.sh [llm configuration] [git version] [agent] [evaluation limit] [out-of-sync recovery method] [if using remote run] [max-turn limit] [num-workers] [evaluation data path] [resource-budget] [resource-coding cost] [resource-asking cost]

For example, run SyncMind with GPT-4o as the agent tackling out-of-sync recovery:

[llm configuration]: llm.gpt_4o
[git version]: HEAD
[agent]: CodeActAgent
[evaluation limit]: 10
[out-of-sync recovery method]: independent
[if using remote run]: false
[max-turn limit]: 30
[num-workers]: 1

[evaluation data path]: set this field only if you have downloaded SyncBench locally

If loading SyncBench directly from Hugging Face, skip [evaluation data path]:

bash ./evaluation/benchmarks/syncmind/scripts/run_infer.sh llm.gpt_4o HEAD CodeActAgent 10 independent false 30 1

Or, if you have already downloaded SyncBench locally, run SyncMind on the local dataset ./data/callee_11_whisper_instance.csv:

bash ./evaluation/benchmarks/syncmind/scripts/run_infer.sh llm.gpt_4o HEAD CodeActAgent 10 independent false 30 1 ./data/callee_11_whisper_instance.csv
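
Because run_infer.sh takes its arguments positionally, it is easy to misplace a value. The following is a minimal sketch of a wrapper that names each field and prints the resulting command line without executing it. The function name `build_infer_cmd` and its variable names are hypothetical; only the script path and the example values come from this README.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SyncMind): name each positional field of
# run_infer.sh and print the assembled command line. Nothing is executed.
build_infer_cmd() {
  llm_config="$1"    # [llm configuration]
  git_version="$2"   # [git version]
  agent="$3"         # [agent]
  eval_limit="$4"    # [evaluation limit]
  method="$5"        # [out-of-sync recovery method]
  remote_run="$6"    # [if using remote run]
  max_turns="$7"     # [max-turn limit]
  num_workers="$8"   # [num-workers]
  data_path="${9:-}" # [evaluation data path], optional

  cmd="bash ./evaluation/benchmarks/syncmind/scripts/run_infer.sh"
  cmd="$cmd $llm_config $git_version $agent $eval_limit"
  cmd="$cmd $method $remote_run $max_turns $num_workers"
  # Append the data path only when a local dataset is given.
  if [ -n "$data_path" ]; then
    cmd="$cmd $data_path"
  fi
  echo "$cmd"
}

# Values taken from the example above (loading SyncBench from Hugging Face).
build_infer_cmd llm.gpt_4o HEAD CodeActAgent 10 independent false 30 1
```

Passing a ninth argument such as ./data/callee_11_whisper_instance.csv appends the local dataset path, matching the second example above.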

Run SyncMind for resource-aware agent out-of-sync recovery:
- [max-turn limit]: 30
- [resource-budget]: 1000 (default)
- [resource-coding cost]: 100 (default)
- [resource-asking cost]: 100 (default)
- Continuing with our example:

  If you would like to define a different resource setting, e.g.:
  - [max-turn limit]: 20
  - [resource-budget]: 3000
  - [resource-coding cost]: 50
  - [resource-asking cost]: 200
- If loading SyncBench directly from Hugging Face, skip [evaluation data path]:
```
bash ./evaluation/benchmarks/syncmind/scripts/run_infer.sh llm.gpt_4o HEAD CodeActAgent 10 independent false 20 1 3000 50 200
```
- Or, if you have already downloaded SyncBench locally, run SyncMind on the local dataset ./data/callee_11_whisper_instance.csv:
```
bash ./evaluation/benchmarks/syncmind/scripts/run_infer.sh llm.gpt_4o HEAD CodeActAgent 10 independent false 20 1 3000 50 200 ./data/callee_11_whisper_instance.csv
```
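
To get a rough feel for how the resource settings interact, the sketch below computes how many actions of a single type a budget could cover. It rests on an assumption that this README does not confirm: that each coding action deducts [resource-coding cost] and each asking action deducts [resource-asking cost] from [resource-budget]. The helper name `max_actions` is hypothetical.

```shell
#!/usr/bin/env bash
# Assumption (not confirmed by this README): every coding action deducts
# coding_cost from the budget and every asking action deducts asking_cost.
budget=3000
coding_cost=50
asking_cost=200

# Hypothetical helper: upper bound on actions of one type that the budget
# alone would allow (integer division).
max_actions() {
  # $1 = budget, $2 = cost per action
  echo $(( $1 / $2 ))
}

echo "coding-only actions: $(max_actions "$budget" "$coding_cost")"
echo "asking-only actions: $(max_actions "$budget" "$asking_cost")"
```

Under that assumption, the custom setting above would allow at most 60 coding actions or 15 asking actions, while the default setting (budget 1000, both costs 100) would allow 10 of either.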
Evaluation
- Results of agent out-of-sync recovery are automatically saved to OpenHands/evaluation/benchmarks/syncmind/tmps
- Run evaluation on the agent out-of-sync recovery results
```
cd ./SyncMind/syncmind/framework/OpenHands
bash ./evaluation/benchmarks/syncmind/scripts/run_eval.sh [path to eval data]
```
  The evaluation result will be saved to the same directory as your eval data, with the file name eval_summary_{timestamp}.json.
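
Since each evaluation run writes a new eval_summary_{timestamp}.json, a directory can accumulate several summaries. The sketch below picks the most recently modified one. It relies on file modification time rather than on the timestamp in the file name, because the timestamp format is not specified here. The function name `latest_eval_summary` and the `EVAL_DIR` variable are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SyncMind): print the most recently modified
# eval_summary_*.json in a directory, or nothing if no such file exists.
latest_eval_summary() {
  # $1 = directory that holds the eval data
  ls -t "$1"/eval_summary_*.json 2>/dev/null | head -n 1
}

# EVAL_DIR is an assumed variable; it defaults to the current directory.
latest_eval_summary "${EVAL_DIR:-.}"
```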
The experiments in our paper were conducted on OpenHands 0.10.0
- You can use SyncMind directly on OpenHands 0.10.0:
  - Quick use: use the entire framework as provided
```
cd SyncMind/syncmind/framework/OpenHands
```
  - Or clone OpenHands 0.10.0 to your desired local path and copy SyncMind into it
```
git clone https://github.com/xhguo7/OpenHands10.git
cp -rp SyncMind/syncmind/framework/syncmind OpenHands10/evaluation/
```
- You can also use our updated SyncMind on the latest OpenHands
  - We will do our best to maintain a synchronized version of SyncMind that is compatible with the latest OpenHands
  - Check our recent updates at SyncMind.md
    - Our latest version syncs with OpenHands 0.27.0
  - We will save updated versions of SyncMind to the following directory:
```
cd SyncMind/syncmind/updates
```

📋4. Version Archives

March 1st, 2025
- SyncMind: [SyncMind] [SyncMind with OpenHands]
- OpenHands Version: 0.27.0
All Updates
- V1: January 30th, 2025
  - SyncMind: [SyncMind]
  - OpenHands Version: 0.10.0
- V2: March 1st, 2025
  - SyncMind: [SyncMind]
  - OpenHands Version: 0.27.0