SyncMind: Measuring Agent Out-of-Sync Recovery in Collaborative Software Engineering

March 3, 2025 ยท View on GitHub

SyncMind

Alt text

๐Ÿ€1. Environment Setup

To use SyncMind and SyncBench for agent out-of-sync recovery

git clone https://github.com/xhguo7/SyncMind.git

Setup environment for SyncMind:

  • We are using OpenHands to implement interactive codebase environments for agent out-of-sync recovery.
    • Miniconda env setup: may refer to Development.md for further details
      # Download and install Mamba (a faster version of conda)
      curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
      bash Miniforge3-$(uname)-$(uname -m).sh
      
      # Install Python 3.12, nodejs, and poetry
      mamba install python=3.12
      mamba install conda-forge::nodejs
      mamba install conda-forge::poetry
      

More details can be found at README.md.

๐Ÿ“2. SyncBench

  • Dataset construction

  • Data preparation

    • Make sure you have SyncBench instances prepared before running
      • By default, SyncMind will directly load SyncBench from Hugging Face
      • If you would like to download SyncBench from Hugging Face, and then use SyncBench locally: set --data-path to your local data path (i.e., [evaluation data path] in the example below)
      • If would like to run a few instances: define --eval-n-limit (i.e., [evaluation limit] in the example below) to set dataset size for evaluation

๐Ÿ“Š3. SyncMind

  • Run SyncMind for agent out-of-sync recovery

    cd SyncMind/syncmind/framework/OpenHands
    bash ./evaluation/benchmarks/syncmind/scripts/run_infer.sh [llm configuration] [git version] [agent] [evaluation limit] [out-of-sync recovery method] [if using remote run] [max-turn limit] [num-workers] [evaluation data path] [resource-budget] [resource-coding cost] [resource-asking cost]
    

    For example: Run SyncMind with GPT-4o as the agent tackling out-of-sync

    • [llm configuration]: llm.gpt_4o

    • [git version]: HEAD

    • [agent]: CodeActAgent

    • [evaluation limit]: 10

    • [out-of-sync recovery method]: independent

    • [if using remote run]: false

    • [max-turn limit]: 30

    • [num-workers]: 1

    • [evaluation data path]: set this field only if you have downloaded SyncBench locally

      • If loading SyncBench directly from Hugging Face, skip [evaluation data path]:

        bash ./evaluation/benchmarks/syncmind/scripts/run_infer.sh llm.gpt_4o HEAD CodeActAgent 10 independent false 30 1
        
      • Or have already downloaded SyncBench locally: Run SyncMind on local dataset ./data/callee_11_whisper_instance.csv:

        bash ./evaluation/benchmarks/syncmind/scripts/run_infer.sh llm.gpt_4o HEAD CodeActAgent 10 independent false 30 1 ./data/callee_11_whisper_instance.csv 
        
  • Run SyncMind for resource-aware agent out-of-sync recovery:

    • [max-turn limit]: 30

    • [resource-budget]: 1000 (default)

    • [resource-coding cost]: 100 (default)

    • [resource-asking cost]: 100 (default)

    • Continue with our example:

      If would like to define a different setting of resources, e.g.:

      • [max-turn limit]: 20
      • [resource-budget]: 3000
      • [resource-coding cost]: 50
      • [resource-asking cost]: 200
    • If loading SyncBench directly from Hugging Face, skip [evaluation data path]:

      bash ./evaluation/benchmarks/syncmind/scripts/run_infer.sh llm.gpt_4o HEAD CodeActAgent 10 independent false 20 1 3000 50 200
      
    • Or have already downloaded SyncBench locally: Run SyncMind on local dataset ./data/callee_11_whisper_instance.csv:

      bash ./evaluation/benchmarks/syncmind/scripts/run_infer.sh llm.gpt_4o HEAD CodeActAgent 10 independent false 20 1 3000 50 200 ./data/callee_11_whisper_instance.csv
      
  • Evaluation

    • Results of agent out-of-sync will be automatically saved to OpenHands/evaluation/benchmarks/syncmind/tmps
    • Run evaluation on agent out-of-sync results
      cd ./SyncMind/syncmind/framework/OpenHands
      bash ./evaluation/benchmarks/syncmind/scripts/run_eval.sh [path to eval data]
      
      The evaluation result will be saved to the same directory as your eval data, with the file name eval_summary_{timestamp}.json.
  • Our experiments in our paper are conducted on OpenHands 0.10.0

    • Can directly use SyncMind on OpenHands 0.10.0:
      • Quick Use: May directly use the entire framework
        cd SyncMind/syncmind/framework/OpenHands
        
      • OR: May clone OpenHands 0.10.0 to your desired local path
        git clone https://github.com/xhguo7/OpenHands10.git
        cp -rp SyncMind/syncmind/framework/syncmind OpenHands10/evaluation/
        
    • Can also leverage our updated SyncMind on latest OpenHands
      • We will do our best to maintain the synchronized version of SyncMind that can be compatible with the latest OpenHands
      • Check our recent updates at SyncMind.md
      • We will save updated versions of SyncMind to the following directory:
        cd SyncMind/syncmind/updates
        

๐Ÿ“‹4. Version Archives

  • March 1st, 2025

  • All Updates

    • V1: January 30th, 2025

      • SyncMind: [SyncMind]
      • OpenHands Version: 0.10.0
    • V2: March 1st, 2025

      • SyncMind: [SyncMind]
      • OpenHands Version: 0.27.0