
June 25, 2026

Dexbotic

One-Stop VLA Development Toolbox for Embodied Intelligence


Pretraining · Fine-tuning · Inference · Evaluation
Supporting mainstream policies such as π0, CogACT, OFT, MemVLA, and more

Introduction

Dexbotic is a VLA (Vision-Language-Action) development toolbox built on PyTorch, designed to provide a unified and efficient solution for embodied intelligence research. It ships with built-in environment configurations for various mainstream VLA models, allowing users to reproduce, fine-tune, and run inference with cutting-edge VLA algorithms through a simple setup.

  • Ready-to-Use VLA Framework: Centered on VLA models, it integrates embodied manipulation and navigation capabilities and supports multiple cutting-edge algorithms.
  • High-Performance Pre-trained Foundation Models: For mainstream VLA algorithms such as π0 and CogACT, Dexbotic provides multiple optimized pre-trained models.
  • Modular Development Architecture: Dexbotic adopts a "layered configuration + factory registration + entry dispatch" architecture, so users can modify configurations, swap models, or add tasks simply by editing experiment scripts.
  • Unified Cloud and Local Training: Dexbotic supports both cloud and local training, working with cloud platforms such as Alibaba Cloud and Volcano Engine while also accommodating consumer-grade GPUs for local training.
  • Extensive Robot Compatibility: For mainstream robots such as UR5, Franka, and ALOHA, Dexbotic provides a unified training data format and deployment scripts.

🔥 News

  • [2026-06-25] Added Co-training capability for the DM0 model. See Hybrid DM0 Co-Training for details.
  • [2026-06-18] Released a DOS-W1 inference tutorial, covering how to integrate DOS-W1 with Dexbotic.
  • [2026-06-18] Added a DM0 realtime inference guide for the Triton-backed realtime backend, including 5x core inference speedup results.
  • [2026-06-12] Added a unified v1 inference API for consistent VLA/VLM serving across model wrappers.
  • [2026-05-15] Supported the FSDP2 training backend for faster distributed training.
  • [2026-05-09] Supported Uni-NaVid, with an accompanying guide.
  • [2026-04-27] Supported RLinf as an RL backend for RL post-training.
  • [2026-03-30] Supported GR00TN1.
  • [2026-03-30] Added Co-training capability for the Pi05 model.
  • [2026-03-30] Released a tutorial on integrating XLeRobot with Dexbotic.
  • [2026-02-10] DM0 released! See the technical report for more information.
  • [2026-02-10] Partnership Announcement: We are excited to announce a strategic collaboration with RLinf. Together, our teams will advance VLA + RL research and applications.
  • [2026-01-15] Released a tutorial on integrating SO-101 with Dexbotic.
  • [2026-01-15] Supported GRPO.
  • [2026-01-15] Supported NaVILA.
  • [2026-01-08] Added Co-training capability, enabling joint optimization of action experts and LLMs for the CogACT model.
  • [2026-01-08] Released a specialized image compatible with Blackwell GPUs.
  • [2025-12-29] Supported OFT and Pi0.5 models.
  • [2025-10-20] Dexbotic officially released! Check out the technical report and official documentation for details.

Quick Start

We strongly recommend using Docker for development or deployment to get the best experience.

1. Installation and Environment Setup

# 1. Clone the repository
git clone https://github.com/dexmal/dexbotic.git

# 2. Start Docker container
docker run -it --rm --gpus all --network host \
  -v "$(pwd)/dexbotic":/dexbotic \
  dexmal/dexbotic \
  bash

# 3. Activate environment and install dependencies
cd /dexbotic
conda activate dexbotic
pip install -e .

System requirements: Ubuntu 20.04 or 22.04. Recommended GPUs: RTX 4090, A100, or H100 (8 GPUs recommended for training, 1 GPU for deployment).
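The recommended GPU counts can be checked before launching a job. The sketch below is a hypothetical helper, not something shipped with Dexbotic: it counts the `GPU N:` lines printed by `nvidia-smi -L` and compares the total with the 8-GPU (training) and 1-GPU (deployment) recommendations. The function names `count_gpus` and `gpu_verdict` are illustrative only.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not part of Dexbotic). Thresholds follow the
# README recommendation: 8 GPUs for training, 1 GPU for deployment.

# Count GPUs in `nvidia-smi -L` output read from stdin, where each GPU is
# listed on a line such as "GPU 0: NVIDIA A100-SXM4-80GB (UUID: GPU-...)".
count_gpus() {
  awk '/^GPU [0-9]+:/ { n++ } END { print n + 0 }'
}

# Print a verdict for a mode ("train" or "deploy") and a GPU count.
gpu_verdict() {
  local mode=$1 count=$2 want
  case $mode in
    train)  want=8 ;;
    deploy) want=1 ;;
    *) echo "unknown mode: $mode" >&2; return 2 ;;
  esac
  if (( count >= want )); then
    echo "ok: $count GPU(s) for $mode (recommended: $want)"
  else
    echo "warning: $count GPU(s) for $mode (recommended: $want)"
  fi
}

# Usage inside the container:
#   gpu_verdict train "$(nvidia-smi -L | count_gpus)"
```

Fewer GPUs only produce a warning here, since the recommendation is not a hard requirement (consumer-grade GPUs are also supported for local training).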

Using on Blackwell GPUs

For users with Blackwell architecture GPUs (e.g., B100, RTX 5090), please use the specialized Docker image dexmal/dexbotic:c130t28.

# 1. Start Docker with Blackwell image
docker run -it --rm --gpus all --network host \
  -v /path/to/dexbotic:/dexbotic \
  dexmal/dexbotic:c130t28 \
  bash

# 2. Enter the repository and install dependencies
cd /dexbotic
pip install -e .
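To pick between the default and the Blackwell image automatically, one option is to key off the GPU's CUDA compute capability. This is a sketch under stated assumptions: the helper name `pick_dexbotic_image` is hypothetical, and the rule that Blackwell GPUs report a compute capability major version of 10 or higher (10.0 for B100/B200, 12.0 for RTX 5090) is an assumption, not something the Dexbotic documentation specifies.

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose a Dexbotic Docker image from a CUDA compute
# capability string such as "12.0". Assumption: Blackwell GPUs have a major
# version >= 10; Ampere (8.x), Ada (8.9) and Hopper (9.0) use the default image.
pick_dexbotic_image() {
  local cap=$1
  local major=${cap%%.*}   # "12.0" -> "12"
  if [[ ! $major =~ ^[0-9]+$ ]]; then
    echo "unrecognized compute capability: $cap" >&2
    return 1
  fi
  if (( major >= 10 )); then
    echo "dexmal/dexbotic:c130t28"
  else
    echo "dexmal/dexbotic"
  fi
}

# Usage (the compute_cap query field needs a reasonably recent NVIDIA driver):
#   cap=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n 1)
#   docker run -it --rm --gpus all --network host \
#     -v /path/to/dexbotic:/dexbotic "$(pick_dexbotic_image "$cap")" bash
```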

2. Usage Guide

Benchmark Results

The tables below compare evaluation results of models trained with Dexbotic (prefixed "DB-") against the original models on mainstream simulation benchmarks. For more detailed results, see Benchmark Results.

Libero

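| Model | Average | Libero-Spatial | Libero-Object | Libero-Goal | Libero-10 |
|---|---|---|---|---|---|
| CogACT | 93.6 | 97.2 | 98.0 | 90.2 | 88.8 |
| DB-CogACT | 94.9 | 93.8 | 97.8 | 96.2 | 91.8 |
| π0 | 94.2 | 96.8 | 98.8 | 95.8 | 85.2 |
| DB-π0 | 93.9 | 97 | 98.2 | 94 | 86.4 |
| MemVLA | 96.7 | 98.4 | 98.4 | 96.4 | 93.4 |
| DB-MemVLA | 97.0 | 97.2 | 99.2 | 98.4 | 93.2 |
| DB-GR00TN1 | 94.8 | 93.0 | 99.6 | 95.2 | 91.4 |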
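The Libero Average column appears to be the plain arithmetic mean of the four suite scores; recomputing it for a row is a quick way to sanity-check your own evaluation logs against the table. A minimal sketch, where `libero_average` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Recompute a Libero "Average" from the four suite success rates (%).
# Assumption: Average is the arithmetic mean of Libero-Spatial, Libero-Object,
# Libero-Goal and Libero-10, printed with one decimal place.
libero_average() {
  awk -v s="$1" -v o="$2" -v g="$3" -v t="$4" \
    'BEGIN { printf "%.1f\n", (s + o + g + t) / 4 }'
}

libero_average 93.8 97.8 96.2 91.8   # DB-CogACT -> 94.9
libero_average 97 98.2 94 86.4       # DB-π0     -> 93.9
```

Rows whose exact mean ends in 5 at the second decimal (for example CogACT, 93.55) may differ from the table in the last digit because of binary floating-point rounding.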

CALVIN

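| Model | Average Length | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| CogACT | 3.246 | 83.8 | 72.9 | 64 | 55.9 | 48 |
| DB-CogACT | 4.063 | 93.5 | 86.7 | 80.3 | 76 | 69.8 |
| OFT | 3.472 | 89.1 | 79.4 | 67.4 | 59.8 | 51.5 |
| DB-OFT | 3.540 | 92.8 | 80.7 | 69.2 | 60.2 | 51.1 |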
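In CALVIN, columns 1 to 5 give the success rate (%) of completing that many tasks in a row, and Average Length is the expected number of consecutive tasks completed. Because E[L] = P(L ≥ 1) + P(L ≥ 2) + ... + P(L ≥ 5), it equals the sum of the five rates divided by 100, which agrees with every row of the CALVIN table (for DB-CogACT, 93.5 + 86.7 + 80.3 + 76 + 69.8 = 406.3, giving 4.063). A small sketch with a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Recompute CALVIN "Average Length" from the five chain success rates (%).
# Column i is P(L >= i), the share of rollouts completing at least i tasks in
# a row, so E[L] = sum_i P(L >= i) = (r1 + r2 + r3 + r4 + r5) / 100.
calvin_avg_len() {
  printf '%s\n' "$@" | awk '{ sum += $1 } END { printf "%.3f\n", sum / 100 }'
}

calvin_avg_len 93.5 86.7 80.3 76 69.8   # DB-CogACT -> 4.063
calvin_avg_len 83.8 72.9 64 55.9 48     # CogACT    -> 3.246
```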

SimplerEnv

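| Model | Average | Spoon | Carrot | Stack Blocks | Eggplant |
|---|---|---|---|---|---|
| CogACT | 51.25 | 71.7 | 50.8 | 15 | 67.5 |
| DB-CogACT | 69.45 | 87.5 | 65.28 | 29.17 | 95.83 |
| OFT | 30.23 | 12.5 | 4.2 | 4.2 | 100 |
| DB-OFT | 76.39 | 91.67 | 76.39 | 43.06 | 94.44 |
| MemVLA | 71.9 | 75.0 | 75.0 | 37.5 | 100.0 |
| DB-MemVLA | 84.4 | 100.0 | 66.7 | 70.8 | 100.0 |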

ManiSkill2

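| Model | Average | PickCube | StackCube | PickSingleYCB | PickSingleEGAD | PickClutterYCB |
|---|---|---|---|---|---|---|
| CogACT | 40 | 55 | 70 | 30 | 25 | 20 |
| DB-CogACT | 58 | 90 | 65 | 65 | 40 | 30 |
| OFT | 21 | 40 | 45 | 5 | 5 | 0 |
| DB-OFT | 63 | 90 | 75 | 55 | 65 | 30 |
| π0 | 66 | 95 | 85 | 55 | 85 | 10 |
| DB-π0 | 65 | 95 | 85 | 65 | 50 | 30 |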

RoboTwin2.0

| Model | Average | Adjust Bottle | Grab Roller | Place Empty Cup | Place Phone Stand |
|---|---|---|---|---|---|
| CogACT | 43.8 | 87 | 72 | 11 | 5 |
| DB-CogACT | 58.5 | 99 | 89 | 28 | 18 |

FAQ

Q: Failed to install Flash-Attention

A: For detailed installation instructions and troubleshooting, please refer to the official documentation at https://github.com/Dao-AILab/flash-attention.

Q: How do I convert RLDS/LeRobot data to Dexdata?

A: A general guide is available in data conversion. An example of LeRobot data conversion can be found in convert_lerobot_to_dexdata, and an example of RLDS data conversion in convert_rlds_to_dexdata.

Q: Is the RTX 5090 supported?

A: Yes, please refer to Using on Blackwell GPUs.

Support Us

We are continuously improving, with more features coming soon. If you like this project, please give us a star on GitHub. Your support is our motivation to keep moving forward!

If Dexbotic has been helpful in your research work, please consider citing our technical report:

@article{dexbotic,
  title={Dexbotic: Open-Source Vision-Language-Action Toolbox},
  author={Dexbotic Contributors},
  journal={arXiv preprint arXiv:2510.23511},
  year={2025}
}

License

This project is licensed under the MIT License.