RoboTwin Evaluation Code for Vidar/Vidarc

December 18, 2025

Overview

We use a client-server architecture for evaluation. This repository is the client: it launches and manages the server, sends requests to it, and executes the actions it returns to carry out the evaluation. Before proceeding, make sure the server-side environment and code are set up.
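Because the client owns the server's lifecycle, the server has to be torn down even when an evaluation crashes. The sketch below shows one way to do that, assuming the server is started with `subprocess.Popen` in its own process group on a POSIX system. `managed_server` and `_signal_group` are hypothetical names used for illustration, not this repository's API.

```python
import contextlib
import os
import signal
import subprocess
import sys
from typing import Iterator, Optional, Sequence


def _signal_group(pgid: int, sig: int) -> None:
    """Send sig to a whole process group, ignoring groups that are already gone."""
    try:
        os.killpg(pgid, sig)
    except ProcessLookupError:
        pass


@contextlib.contextmanager
def managed_server(cmd: Sequence[str], cwd: Optional[str] = None,
                   timeout: float = 10.0) -> Iterator[subprocess.Popen]:
    """Hypothetical helper: run `cmd` as a server and always tear down its process group."""
    # start_new_session=True makes the server a session and process-group leader,
    # so its group id equals its pid and subprocesses it spawns inherit that group.
    proc = subprocess.Popen(list(cmd), cwd=cwd, start_new_session=True)
    try:
        yield proc
    finally:
        # Runs on normal exit, on exceptions, and on KeyboardInterrupt.
        _signal_group(proc.pid, signal.SIGTERM)
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            # Escalate if the server ignores SIGTERM.
            _signal_group(proc.pid, signal.SIGKILL)
            proc.wait()


if __name__ == "__main__":
    # Stand-in server: a child Python process that only sleeps.
    with managed_server([sys.executable, "-c", "import time; time.sleep(30)"]) as server:
        print("server running:", server.poll() is None)
    print("server exit code:", server.returncode)
```

Signalling the process group rather than only the server's pid is what lets the cleanup reach any workers the server itself started.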

Env Setup

Please refer to READMEEnv.md.

Evaluation

This module provides a unified evaluation script based on torch.distributed (DDP), designed to simplify the multi-GPU/multi-task evaluation workflow in Client-Server mode.
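torchrun starts one process per GPU and exports `RANK`, `WORLD_SIZE` and `LOCAL_RANK` to each of them. A minimal sketch of reading these values, assuming the script should fall back to a single process when started without torchrun. `DistInfo` and `get_dist_info` are hypothetical names, and the sketch only reads the environment; the actual script may additionally call `torch.distributed.init_process_group`.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DistInfo:
    rank: int        # global index of this process
    world_size: int  # total number of processes
    local_rank: int  # index on this node, typically the GPU id


def get_dist_info(env: Optional[Mapping[str, str]] = None) -> DistInfo:
    """Read torchrun's per-process variables, defaulting to a single process."""
    env = os.environ if env is None else env
    info = DistInfo(
        rank=int(env.get("RANK", "0")),
        world_size=int(env.get("WORLD_SIZE", "1")),
        local_rank=int(env.get("LOCAL_RANK", "0")),
    )
    if not 0 <= info.rank < info.world_size:
        raise ValueError(f"invalid rank {info.rank} for world size {info.world_size}")
    return info


if __name__ == "__main__":
    print(get_dist_info())
```

Launched with `torchrun --nproc_per_node=4 ...`, the four processes would see ranks 0 to 3; run directly with `python`, the function reports rank 0 of 1.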

Features

  1. Unified Architecture: Uses torchrun to launch a single Python script.
  2. Automatic Task Distribution: Splits the task list across processes using each process's DDP rank and world_size, so no manual assignment is needed.
  3. Robust Process Management: Uses a Python context manager to manage the server lifecycle, so the server and its subprocesses are terminated cleanly whether the run completes normally or exits abnormally.
  4. Decoupled Design: All paths (Server script, model, task descriptions) are passed via arguments rather than being hardcoded.
  5. Resumable Execution: Automatically skips tasks that already have logs.
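Automatic task distribution and resumable execution can be combined into one step: each rank takes a strided slice of a sorted task list, then drops tasks whose log already exists. This is a minimal sketch under several assumptions: task names are the stems of the description files in `--task_dir` (matched here with `*.json`), and a finished task leaves a log at `<log_dir>/<task>.log`. The file pattern, log layout and function names are hypothetical, and the real script may split the list differently (for example, into contiguous chunks).

```python
from pathlib import Path
from typing import List, Sequence


def list_tasks(task_dir: str, pattern: str = "*.json") -> List[str]:
    """Task names from description file stems, sorted so every rank sees the same order."""
    return sorted(p.stem for p in Path(task_dir).glob(pattern))


def shard_tasks(tasks: Sequence[str], rank: int, world_size: int) -> List[str]:
    """Round-robin split: rank r gets tasks r, r + world_size, r + 2 * world_size, ..."""
    return list(tasks[rank::world_size])


def pending_tasks(tasks: Sequence[str], log_dir: str) -> List[str]:
    """Drop tasks that already have a log file, so a rerun resumes where it stopped."""
    logs = Path(log_dir)
    return [t for t in tasks if not (logs / f"{t}.log").exists()]
```

A rank would then evaluate `pending_tasks(shard_tasks(list_tasks(task_dir), rank, world_size), log_dir)`. Sharding before filtering keeps each rank's assignment fixed even while other ranks are still writing logs.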

Dependencies

  • PyTorch (for torch.distributed)
  • Existing vidar Server script
  • Existing script/eval_policy.py Client script

Usage

```bash
conda activate RoboTwin-hb

# Evaluate with Vidarc
bash run_eval_ddp_causal.sh

# Evaluate with Vidar
bash run_eval_ddp.sh
```

Parameters

| Argument | Description | Default |
|---|---|---|
| --server_script | Path to the Server startup script (required) | - |
| --model | Path to the model (required) | - |
| --idm | Path to the Inverse Dynamics Model | - |
| --prefix | Prefix for the output directory (required) | "debug" |
| --task_dir | Directory containing task description files | "./description/task_instruction" |
| --server_cwd | Working directory for the Server script | "../cosmos-predict2" |
| --base_port | Starting port number (rank 0 uses base_port, rank 1 uses base_port + 1, and so on) | 25400 |
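The parameter table maps directly onto `argparse`. The sketch below uses the flag names and defaults exactly as listed above. Where the table marks `--prefix` as required but also gives a default, this sketch keeps the default. `--idm` is treated as optional with no default, and `port_for_rank` is a hypothetical helper that encodes the base_port + rank rule.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(description="DDP client-server evaluation (sketch)")
    p.add_argument("--server_script", required=True, help="Path to the server startup script")
    p.add_argument("--model", required=True, help="Path to the model")
    p.add_argument("--idm", default=None, help="Path to the inverse dynamics model")
    p.add_argument("--prefix", default="debug", help="Prefix for the output directory")
    p.add_argument("--task_dir", default="./description/task_instruction",
                   help="Directory containing task description files")
    p.add_argument("--server_cwd", default="../cosmos-predict2",
                   help="Working directory for the server script")
    p.add_argument("--base_port", type=int, default=25400,
                   help="Port used by rank 0; rank r uses base_port + r")
    return p


def port_for_rank(base_port: int, rank: int) -> int:
    """Assumed: each rank runs its own server, so every rank needs a distinct port."""
    return base_port + rank


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args, port_for_rank(args.base_port, 0))
```

With the default `--base_port`, rank 3 would connect on port 25403.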