ARX Inference

February 14, 2026

Deploy trained OpenPi policies on ARX-X5 dual-arm robots (RealSense cameras, ROS2). The inference client sends observations to a policy server running on a GPU host and executes the returned action chunks on the robot, optionally with temporal smoothing [2] or RTC (real-time chunking) [1].
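As a rough illustration of the observation-in, action-chunk-out loop described above, here is a minimal sketch of the simplest (blocking) variant. All names (`run_chunked_control`, `get_observation`, `infer_chunk`, `apply_action`) are hypothetical placeholders and not the API of the scripts in this repo; in a real deployment `infer_chunk` would wrap the `openpi_client` call and `apply_action` would publish a robot command.

```python
from typing import Callable, Dict, List, Sequence

Observation = Dict[str, object]
Action = Sequence[float]


def run_chunked_control(
    get_observation: Callable[[], Observation],
    infer_chunk: Callable[[Observation], List[Action]],
    apply_action: Callable[[Action], None],
    num_steps: int,
) -> int:
    """Execute num_steps actions, requesting a new chunk when the current one runs out.

    Returns the number of inference requests that were made.
    """
    chunk: List[Action] = []
    requests = 0
    for _ in range(num_steps):
        if not chunk:
            # Blocking call: the robot waits while the server computes a chunk.
            chunk = list(infer_chunk(get_observation()))
            requests += 1
            if not chunk:
                raise RuntimeError("policy returned an empty action chunk")
        apply_action(chunk.pop(0))
    return requests


if __name__ == "__main__":
    executed: List[Action] = []
    n = run_chunked_control(
        get_observation=lambda: {"state": [0.0, 0.0]},  # stub observation
        infer_chunk=lambda obs: [[float(i), float(i)] for i in range(4)],  # stub policy
        apply_action=executed.append,
        num_steps=10,
    )
    print(f"{n} inference requests for {len(executed)} executed actions")
```

The asynchronous modes (temporal smoothing, RTC) differ mainly in that the next chunk is requested while the current one is still executing, so the robot does not pause during inference.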

Prerequisites: ARX-X5 and ROS2 setup

Follow the official ARX-X5 setup so arms and cameras work correctly:

Clone the ARX_X5 repository (ARX Robotics) and follow its docs for:
- ROS2 workspace (e.g. ROS2/X5_ws), building packages
- Arm control and feedback topics
- RealSense camera setup

Once the ARX-X5 software and robot hardware are set up, install the Python 3.10 inference environment (below) and build the bimanual package in this repo.

Python 3.10 inference environment (one-time setup)

On the machine that runs the inference client (IPC or same host as ROS2), use a dedicated Python 3.10 environment.

1. Create and activate conda env

```
conda create -n kai0_inference python=3.10
conda activate kai0_inference
```

2. Install dependencies

Install PyTorch (if needed for local use), OpenCV, NumPy, and other deps. From the repository root you can reuse the Agilex IPC requirements if present:

```
pip install -r train_deploy_alignment/inference/agilex/requirements_inference_ipc.txt
```

Otherwise, install opencv-python, numpy, pyrealsense2, etc. as needed.

3. Install OpenPi client (editable)

From the repository root:

```
cd packages/openpi-client
pip install -e .
cd ../..
```

The inference scripts use openpi_client to talk to the policy server.

4. Build the bimanual package (ARX)

From the ARX inference directory:

```
cd train_deploy_alignment/inference/arx
./build.sh
```

This builds the bimanual package (e.g. cd bimanual && ./build.sh). Ensure your environment can load the built libraries (e.g. LD_LIBRARY_PATH as in setup.sh).
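To confirm that the library directory is actually visible before launching a script, a small check like the following can help. This is a sketch only: `on_ld_library_path` is a hypothetical helper and the directory in the usage line is a placeholder, since the actual output location of the build is defined by `setup.sh`.

```python
import os
from typing import Mapping, Optional


def on_ld_library_path(directory: str, env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if `directory` is one of the entries in LD_LIBRARY_PATH."""
    env = os.environ if env is None else env
    target = os.path.normpath(directory)
    entries = env.get("LD_LIBRARY_PATH", "").split(os.pathsep)
    # Empty entries (e.g. from a trailing ':') are ignored.
    return any(os.path.normpath(entry) == target for entry in entries if entry)


if __name__ == "__main__":
    lib_dir = "/path/to/bimanual/lib"  # placeholder, not the real build output path
    print(lib_dir, "on LD_LIBRARY_PATH:", on_ld_library_path(lib_dir))
```

The dynamic loader reads `LD_LIBRARY_PATH` when the process starts, so the variable has to be set in the shell (by sourcing `setup.sh`) before Python is launched; setting it from inside a running script is generally too late.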

5. ROS2 and ARX messages

Source your ROS2 workspace and ensure the arx5_arm_msg (or equivalent) package is built and sourced so that RobotStatus / RobotCmd are available. The inference scripts import arx5_arm_msg.msg.
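A quick way to verify that the environment is sourced correctly is to check that the required modules can be located before starting a script. The helper below is a hypothetical sketch using only the standard library; the module names in the usage block are the ones mentioned in this document.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the subset of `names` that cannot be found on the current Python path."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec raises if a parent package of a dotted name is missing.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_modules(["arx5_arm_msg.msg", "openpi_client"])
    if absent:
        print("Not importable (is the workspace sourced?):", ", ".join(absent))
    else:
        print("All required modules found.")
```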

Inference setup (startup sequence)

Inference uses two machines: a GPU host (policy server) and the client machine (ROS2 + inference script). Follow the order below.

On the GPU host — start the policy server

From the repository root on the GPU machine:

```
uv run scripts/serve_policy.py policy:checkpoint --policy.config=<train_config> --policy.dir=<checkpoint_dir> [--port=8000]
```

Use the same training config and checkpoint as your trained model. For RTC inference, use an RTC config (e.g. pi0_rtc_aloha_sim or pi05_rtc_flatten_fold_inference); see RTC mode below.

On the client — full startup steps

Before running inference (or DAGGER), CAN must be configured and up, and both the master and slave arms must be enabled. This repo does not provide CAN scripts; use the ones from the official ARX repo. Follow this order:

Ensure CAN is configured and up (follow ARX official ARX_X5 / ARX_CAN setup).
Enable both master and slave arms. Start the master and slave controller nodes (e.g. in separate terminals, or use the DAGGER arx_start.sh in dagger/arx to start both):
```
ros2 launch arx_x5_controller open_remote_master.launch.py
ros2 launch arx_x5_controller open_remote_slave.launch.py
```
Wait until nodes are up (e.g. ros2 node list shows the arm nodes).
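The wait can also be scripted by polling `ros2 node list`. The sketch below assumes a sourced ROS2 environment; `wait_for_nodes` is a hypothetical helper, and the node names in the usage block are placeholders because the real names depend on the ARX launch files.

```python
import subprocess
import time
from typing import Callable, Iterable, List, Set


def list_ros2_nodes() -> List[str]:
    """Return the node names printed by `ros2 node list`."""
    result = subprocess.run(
        ["ros2", "node", "list"], capture_output=True, text=True, check=True
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def wait_for_nodes(
    expected: Iterable[str],
    list_nodes: Callable[[], List[str]] = list_ros2_nodes,
    timeout_s: float = 30.0,
    poll_s: float = 1.0,
) -> bool:
    """Poll until every expected node is listed; return False on timeout."""
    wanted: Set[str] = set(expected)
    deadline = time.monotonic() + timeout_s
    while True:
        if wanted.issubset(list_nodes()):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


if __name__ == "__main__":
    # Placeholder node names; check `ros2 node list` for the actual ones.
    ok = wait_for_nodes(["/master_arm_node", "/slave_arm_node"])
    print("arm nodes up" if ok else "timed out waiting for arm nodes")
```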

Source ROS2 and (if needed) your conda env:

```
source /path/to/ros2_ws/install/setup.bash
conda activate <your_inference_env>
```

Source ARX setup (for LD_LIBRARY_PATH so bimanual libs load). From the arx directory:
```
cd train_deploy_alignment/inference/arx
source setup.sh
```

Run the inference script from the arx directory, with --host set to the GPU host IP:

```
cd train_deploy_alignment/inference/arx
source setup.sh
cd inference
python arx_openpi_inference_rtc.py --host <policy_server_ip> --port 8000 --rtc_mode --chunk_size 50
```

Or run another script (see Inference scripts below). Replace <policy_server_ip> with your policy server IP.
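Because the client and the policy server run on different machines, it can save time to confirm that the server port is reachable before launching an inference script. A minimal sketch using only the standard library (`server_reachable` is a hypothetical helper; 8000 is the default port stated in this document):

```python
import socket


def server_reachable(host: str, port: int = 8000, timeout_s: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names.
        return False


if __name__ == "__main__":
    import sys

    host = sys.argv[1] if len(sys.argv) > 1 else "127.0.0.1"
    print(f"{host}:8000 reachable:", server_reachable(host))
```

This only shows that something is listening on the port; it does not confirm that the checkpoint has finished loading or that the config matches the client.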

RTC (real-time chunking) mode

RTC [1] requires the policy server to load the RTC model (Pi0RTC). Start the server with an RTC config, e.g.:

```
uv run scripts/serve_policy.py policy:checkpoint --policy.config=pi0_rtc_aloha_sim --policy.dir=<path_to_jax_checkpoint> [--port=8000]
```

Then run the ARX inference script with --rtc_mode (e.g. arx_openpi_inference_rtc.py --rtc_mode). RTC uses JAX checkpoints only.
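For intuition about the client-side bookkeeping that asynchronous chunking implies, consider that a chunk requested at one control step only arrives some steps later. In RTC [1], the actions that will be executed during that delay are treated as frozen, and the new chunk is generated to be consistent with them. The sketch below illustrates only this index arithmetic; the function names are hypothetical, and it is not taken from `arx_openpi_inference_rtc.py`, whose implementation may differ.

```python
from typing import List, Sequence, Tuple

Action = Sequence[float]


def split_for_rtc(
    prev_chunk: List[Action], cursor: int, delay_steps: int
) -> Tuple[List[Action], List[Action]]:
    """Split the unexecuted part of prev_chunk at the expected inference delay.

    Returns (frozen, revisable): `frozen` actions will run before the new chunk
    can arrive; `revisable` actions may still be replaced by the new chunk.
    """
    if cursor < 0 or delay_steps < 0:
        raise ValueError("cursor and delay_steps must be non-negative")
    remaining = prev_chunk[cursor:]
    return remaining[:delay_steps], remaining[delay_steps:]


def drop_stale_prefix(new_chunk: List[Action], steps_elapsed: int) -> List[Action]:
    """Discard the leading actions of a new chunk whose timesteps already passed."""
    if steps_elapsed < 0:
        raise ValueError("steps_elapsed must be non-negative")
    return new_chunk[steps_elapsed:]


if __name__ == "__main__":
    prev = [[float(i)] for i in range(10)]
    frozen, revisable = split_for_rtc(prev, cursor=4, delay_steps=3)
    print("frozen:", frozen)
    print("revisable:", revisable)
```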

Prompt and AWBC

Set the language prompt in the inference script to match training. In the scripts, set the global lang_embeddings at the top (e.g. lang_embeddings = "hang the cloth"). For AWBC-trained models, use the same advantage format as in stage_advantage (e.g. "<task>, Advantage: positive").
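The prompt string can be assembled with a small helper so that the advantage suffix is formatted consistently. This is a sketch: `build_prompt` is a hypothetical function, and the exact label values accepted for the advantage should be taken from your stage_advantage setup.

```python
from typing import Optional


def build_prompt(task: str, advantage: Optional[str] = None) -> str:
    """Return the task prompt, with an AWBC-style advantage suffix if given."""
    task = task.strip()
    if not task:
        raise ValueError("task prompt must not be empty")
    if advantage is None:
        return task
    return f"{task}, Advantage: {advantage}"


# Value assigned to the global that the scripts read.
lang_embeddings = build_prompt("hang the cloth", advantage="positive")
```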

Inference scripts

Run the scripts from train_deploy_alignment/inference/arx: first source setup.sh, then cd inference.

| Script | Description | Example command |
| --- | --- | --- |
| `inference/arx_openpi_inference_rtc.py` | RTC [1] with `--rtc_mode`; without it, same as temporal smoothing. Server must use an RTC config for RTC. | `python arx_openpi_inference_rtc.py --host <IP> --port 8000 --rtc_mode --chunk_size 50` |
| `inference/arx_openpi_inference_temporal_smooth.py` | Temporal smoothing; async inference + stream buffer. | `python arx_openpi_inference_temporal_smooth.py --host <IP> --port 8000` |
| `inference/arx_openpi_inference_sync.py` | Sync: blocking infer every chunk, then execute step-by-step (like Agilex sync). | `python arx_openpi_inference_sync.py --host <IP> --port 8000` |
| `inference/arx_openpi_inference_temporal_ensembling.py` | Temporal ensembling [2]: `--smooth_method naive_async` or `temporal_ensembling`, `--exp_weight_m` for aggregation. | `python arx_openpi_inference_temporal_ensembling.py --host <IP> --smooth_method temporal_ensembling --exp_weight_m 0.01` |

- --host: GPU host IP.
- --port: server port (default 8000).
- lang_embeddings: set in the script (or in arx_openpi_inference_rtc for scripts that import it) to match training.
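To make the role of `--exp_weight_m` concrete, here is a sketch of temporal ensembling as described in [2]: every overlapping chunk contributes a prediction for the current timestep, and these are averaged with weights w_i = exp(-m * i), where i = 0 is the oldest prediction. The function name is hypothetical, and whether `arx_openpi_inference_temporal_ensembling.py` uses exactly this weighting is an assumption.

```python
import math
from typing import List, Sequence


def ensemble_action(predictions: Sequence[Sequence[float]], m: float = 0.01) -> List[float]:
    """Exponentially weighted average of all predictions for the current timestep.

    predictions[0] comes from the oldest chunk, predictions[-1] from the newest.
    A smaller m weights new predictions almost as much as old ones.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    weights = [math.exp(-m * i) for i in range(len(predictions))]
    total = sum(weights)
    dim = len(predictions[0])
    return [
        sum(w * p[d] for w, p in zip(weights, predictions)) / total
        for d in range(dim)
    ]


if __name__ == "__main__":
    # Three chunks each predicted a 2-D action for the same timestep.
    preds = [[0.10, 0.50], [0.12, 0.48], [0.20, 0.40]]
    print(ensemble_action(preds, m=0.01))
```

With m = 0 this reduces to a plain average; larger values of m shift the result toward the oldest prediction.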

References

[1] Black, K., Galliker, M. Y., & Levine, S. (2025). Real-Time Execution of Action Chunking Flow Policies. arXiv preprint arXiv:2506.07339.
[2] Zhao, T. Z., Kumar, V., Levine, S., & Finn, C. (2023). Learning fine-grained bimanual manipulation with low-cost hardware. arXiv preprint arXiv:2304.13705.

BibTeX:

```
@misc{black2025realtime,
  author    = {Black, Kevin and Galliker, Manuel Y. and Levine, Sergey},
  title     = {Real-Time Execution of Action Chunking Flow Policies},
  year      = {2025},
  eprint    = {2506.07339},
  archivePrefix = {arXiv},
  primaryClass  = {cs}
}

@misc{zhao2023learning,
  author    = {Zhao, Tony Z. and Kumar, V. and Levine, Sergey and Finn, Chelsea},
  title     = {Learning fine-grained bimanual manipulation with low-cost hardware},
  year      = {2023},
  eprint    = {2304.13705},
  archivePrefix = {arXiv},
  primaryClass  = {cs}
}
```