README.md

June 25, 2026

One-Stop VLA Development Toolbox for Embodied Intelligence

Pretraining · Fine-tuning · Inference · Evaluation
Supporting mainstream policies such as π0, CogACT, OFT, MemVLA, and more

Introduction

Dexbotic is a VLA (Vision-Language-Action) development toolbox built on PyTorch, designed to provide a unified and efficient solution for embodied intelligence research. It ships with built-in environment configurations for mainstream VLA models, so users can reproduce, fine-tune, and run inference with cutting-edge VLA algorithms after a simple setup.

Ready-to-Use VLA Framework: Centered around VLA models, integrating embodied manipulation and navigation capabilities, supporting multiple cutting-edge algorithms.
High-Performance Pre-trained Foundation Models: For mainstream VLA algorithms such as π0 and CogACT, Dexbotic provides multiple optimized pre-trained models.
Modular Development Architecture: Built on a "layered configuration + factory registration + entry dispatch" architecture, so users can modify configurations, change models, or add tasks just by editing experiment scripts.
Unified Cloud and Local Training: Supports cloud training platforms such as Alibaba Cloud and Volcano Engine, and also runs on consumer-grade GPUs for local training.
Extensive Robot Compatibility: For mainstream robots such as UR5, Franka, and ALOHA, Dexbotic provides a unified training data format and deployment scripts.
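
The "layered configuration + factory registration + entry dispatch" pattern named above can be illustrated with a small, self-contained sketch. This is not Dexbotic's actual code: every name here (ModelConfig, TrainerConfig, Experiment, register_model, dispatch) is hypothetical, and the "models" are plain strings standing in for real networks. It only shows how an experiment script might override one configuration layer and let a registry and a dispatcher do the rest.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, Dict


@dataclass(frozen=True)
class ModelConfig:
    # Hypothetical model layer of the configuration.
    name: str = "cogact"
    action_dim: int = 7


@dataclass(frozen=True)
class TrainerConfig:
    # Hypothetical trainer layer of the configuration.
    learning_rate: float = 1e-4
    num_steps: int = 1000


# Factory registration: maps a model name to a function that builds it.
MODEL_REGISTRY: Dict[str, Callable[[ModelConfig], str]] = {}


def register_model(name: str):
    """Decorator that records a model factory under `name`."""
    def decorator(factory):
        MODEL_REGISTRY[name] = factory
        return factory
    return decorator


@register_model("cogact")
def build_cogact(cfg: ModelConfig) -> str:
    # Stand-in for constructing a real model.
    return f"CogACT(action_dim={cfg.action_dim})"


@register_model("pi0")
def build_pi0(cfg: ModelConfig) -> str:
    return f"Pi0(action_dim={cfg.action_dim})"


@dataclass(frozen=True)
class Experiment:
    # Layered configuration: each layer is its own small config object.
    model: ModelConfig = field(default_factory=ModelConfig)
    trainer: TrainerConfig = field(default_factory=TrainerConfig)

    def train(self) -> str:
        build = MODEL_REGISTRY[self.model.name]
        return f"train {build(self.model)} for {self.trainer.num_steps} steps"

    def inference(self) -> str:
        build = MODEL_REGISTRY[self.model.name]
        return f"serve {build(self.model)}"


def dispatch(exp: Experiment, task: str) -> str:
    """Entry dispatch: route a task name to the matching experiment method."""
    entry_points = {"train": exp.train, "inference": exp.inference}
    if task not in entry_points:
        raise ValueError(f"unknown task: {task!r}")
    return entry_points[task]()


# An experiment script overrides only the layer it needs to change.
exp = replace(Experiment(), model=ModelConfig(name="pi0", action_dim=14))
print(dispatch(exp, "train"))  # prints: train Pi0(action_dim=14) for 1000 steps
```

In this sketch, switching models means changing one field of one layer, and adding a model means registering one more factory; the dispatcher and the other layers stay untouched.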

🔥 News

[2026-06-25] Added Co-training capability for the DM0 model. See Hybrid DM0 Co-Training for details.
[2026-06-18] Released a DOS-W1 inference tutorial, covering how to integrate DOS-W1 with Dexbotic.
[2026-06-18] Added a DM0 realtime inference guide for the Triton-backed realtime backend, including 5x core inference speedup results.
[2026-06-12] Added a unified v1 inference API for consistent VLA/VLM serving across model wrappers.
[2026-05-15] Supported FSDP2 training backend for faster distributed training.
[2026-05-09] Supported Uni-NaVid with a guide.
[2026-04-27] Supported RLinf as an RL backend for RL post-training.
[2026-03-30] Supported GR00TN1.
[2026-03-30] Added Co-training capability for the Pi05 model.
[2026-03-30] Released a tutorial on integrating XLeRobot with Dexbotic.
[2026-02-10] DM0 released! See the technical report for more information.
[2026-02-10] Partnership Announcement: We are excited to announce a strategic collaboration with RLinf. Together, our teams will advance VLA + RL research and applications.
[2026-01-15] Released a tutorial on integrating SO-101 with Dexbotic.
[2026-01-15] Supported GRPO.
[2026-01-15] Supported NaVILA.
[2026-01-08] Added Co-training capability, enabling joint optimization of action experts and LLMs for the CogACT model.
[2026-01-08] Released a specialized image compatible with Blackwell GPUs.
[2025-12-29] Supported OFT and Pi0.5 models.
[2025-10-20] Dexbotic officially released! Check out the technical report and official documentation for details.

Quick Start

We strongly recommend using Docker for development or deployment to get the best experience.

1. Installation and Environment Setup

# 1. Clone the repository
git clone https://github.com/dexmal/dexbotic.git

# 2. Start Docker container
docker run -it --rm --gpus all --network host \
  -v "$(pwd)/dexbotic":/dexbotic \
  dexmal/dexbotic \
  bash

# 3. Activate environment and install dependencies
cd /dexbotic
conda activate dexbotic
pip install -e .

System Requirements: Ubuntu 20.04 or 22.04. Recommended GPUs: RTX 4090, A100, or H100 (8 GPUs recommended for training, 1 GPU for deployment).

Using on Blackwell GPUs

For users with Blackwell architecture GPUs (e.g., B100, RTX 5090), please use the specialized Docker image dexmal/dexbotic:c130t28.

# 1. Start Docker with Blackwell image
docker run -it --rm --gpus all --network host \
  -v /path/to/dexbotic:/dexbotic \
  dexmal/dexbotic:c130t28 \
  bash

# 2. Install Dexbotic inside the container
cd /dexbotic
pip install -e .
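
The choice between the default image and the Blackwell image can also be scripted. The sketch below is a hypothetical helper, not part of Dexbotic: it picks the image tag from a GPU name string (for example, a name reported by nvidia-smi) and assembles the same docker run command shown above. The marker list contains only the two Blackwell examples named in this README, so treat it as an assumption and extend it for your hardware.

```python
import shlex

DEFAULT_IMAGE = "dexmal/dexbotic"
BLACKWELL_IMAGE = "dexmal/dexbotic:c130t28"

# Assumption: matching on these substrings is enough to spot a Blackwell GPU.
# Only the two examples from this README are listed; the list is not complete.
BLACKWELL_MARKERS = ("B100", "RTX 5090")


def pick_image(gpu_name: str) -> str:
    """Return the Blackwell image if the GPU name matches a known marker."""
    name = gpu_name.upper()
    if any(marker in name for marker in BLACKWELL_MARKERS):
        return BLACKWELL_IMAGE
    return DEFAULT_IMAGE


def docker_command(gpu_name: str, repo_path: str) -> str:
    """Build the docker run command line for the given GPU and checkout path."""
    args = [
        "docker", "run", "-it", "--rm", "--gpus", "all", "--network", "host",
        "-v", f"{repo_path}:/dexbotic",
        pick_image(gpu_name),
        "bash",
    ]
    # shlex.join quotes any argument that contains spaces or shell metacharacters.
    return shlex.join(args)


print(docker_command("NVIDIA GeForce RTX 5090", "/path/to/dexbotic"))
```

The function only builds the command string; running it, or reading the GPU name from the system, is left to the caller.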

Libero

Model	Average	Libero-Spatial	Libero-Object	Libero-Goal	Libero-10
CogACT	93.6	97.2	98.0	90.2	88.8
DB-CogACT	94.9	93.8	97.8	96.2	91.8
π0	94.2	96.8	98.8	95.8	85.2
DB-π0	93.9	97	98.2	94	86.4
MemVLA	96.7	98.4	98.4	96.4	93.4
DB-MemVLA	97.0	97.2	99.2	98.4	93.2
DB-GR00TN1	94.8	93.0	99.6	95.2	91.4

CALVIN

Model	Average Length	1	2	3	4	5
CogACT	3.246	83.8	72.9	64	55.9	48
DB-CogACT	4.063	93.5	86.7	80.3	76	69.8
OFT	3.472	89.1	79.4	67.4	59.8	51.5
DB-OFT	3.540	92.8	80.7	69.2	60.2	51.1
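
To read the CALVIN table: columns 1 to 5 give the percentage of evaluation sequences in which at least that many consecutive tasks were completed, and Average Length is the mean number of tasks completed in a row. Under that standard CALVIN definition, the average length is the sum of the five rates divided by 100. The short check below recomputes it from the rows above; the function name is our own and the numbers are copied from the table.

```python
def average_length(success_rates_percent):
    """Mean number of consecutive tasks completed, from per-step success rates (%)."""
    return sum(success_rates_percent) / 100.0


# Success rates for completing at least 1, 2, 3, 4, and 5 tasks in a row.
calvin = {
    "CogACT": [83.8, 72.9, 64, 55.9, 48],
    "DB-CogACT": [93.5, 86.7, 80.3, 76, 69.8],
    "OFT": [89.1, 79.4, 67.4, 59.8, 51.5],
    "DB-OFT": [92.8, 80.7, 69.2, 60.2, 51.1],
}

for model, rates in calvin.items():
    print(f"{model}: {average_length(rates):.3f}")
```

The printed values (3.246, 4.063, 3.472, 3.540) match the Average Length column.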

SimplerEnv

Model	Average	Spoon	Carrot	Stack Blocks	Eggplant
CogACT	51.25	71.7	50.8	15	67.5
DB-CogACT	69.45	87.5	65.28	29.17	95.83
OFT	30.23	12.5	4.2	4.2	100
DB-OFT	76.39	91.67	76.39	43.06	94.44
MemVLA	71.9	75.0	75.0	37.5	100.0
DB-MemVLA	84.4	100.0	66.7	70.8	100.0

ManiSkill2

Model	Average	PickCube	StackCube	PickSingleYCB	PickSingleEGAD	PickClutterYCB
CogACT	40	55	70	30	25	20
DB-CogACT	58	90	65	65	40	30
OFT	21	40	45	5	5	0
DB-OFT	63	90	75	55	65	30
π0	66	95	85	55	85	10
DB-π0	65	95	85	65	50	30

RoboTwin2.0

Model	Average	Adjust Bottle	Grab Roller	Place Empty Cup	Place Phone Stand
CogACT	43.8	87	72	11	5
DB-CogACT	58.5	99	89	28	18

FAQ

Q: Failed to install Flash-Attention

A: For detailed installation instructions and troubleshooting, please refer to the official documentation at https://github.com/Dao-AILab/flash-attention.

Q: Converting RLDS/LeRobot to Dexdata

A: We provide a general guide in the data conversion documentation. A LeRobot conversion example can be found in convert_lerobot_to_dexdata, and an RLDS conversion example is available in convert_rlds_to_dexdata.

Q: Is the RTX 5090 supported?

A: Yes. Please refer to the Using on Blackwell GPUs section.

Support Us

We are continuously improving, with more features coming soon. If you like this project, please give us a star on GitHub. Your support is our motivation to keep moving forward!

If Dexbotic has been helpful in your research work, please consider citing our technical report:

@article{dexbotic,
  title={Dexbotic: Open-Source Vision-Language-Action Toolbox},
  author={Dexbotic Contributors},
  journal={arXiv preprint arXiv:2510.23511},
  year={2025}
}

License

This project is licensed under the MIT License.