Tutorial.md

June 22, 2026

We strongly recommend using Docker as a unified, consistent, and reproducible environment for training and deployment. This keeps workflows reliable and minimizes issues caused by CUDA version differences and Python dependency conflicts.

Please see the Dockerfile for details about the image contents.

Prerequisites

Ubuntu 20.04 or 22.04
NVIDIA GPU: RTX 4090 / RTX 5090 / A100 / H100 (8 GPUs recommended for training; 1 GPU for deployment)
NVIDIA Docker installed
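
Before starting the container, it can help to confirm that the host tools are present. The following is a minimal sketch, not part of Dexbotic: the helper name `missing_tools` is illustrative, and the check only confirms that `docker` and `nvidia-smi` are on `PATH`. It does not verify that the NVIDIA container runtime is configured for Docker.

```python
import shutil


def missing_tools(required=("docker", "nvidia-smi"), which=shutil.which):
    """Return the required executables that are not found on PATH."""
    return [tool for tool in required if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("docker and nvidia-smi found on PATH")
```

An empty result means both tools were found; GPU access from inside a container still has to be confirmed by running one with `--gpus all`.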

Step 1: Clone the Repository

git clone https://github.com/Dexmal/dexbotic.git

Step 2: Start Docker

docker run -it --rm --gpus all --network host \
  -v /path/to/dexbotic:/dexbotic \
  dexmal/dexbotic \
  bash

Step 3: Activate Dexbotic Environment

cd /dexbotic
conda activate dexbotic
pip install -e .

The image built from this repo's Dockerfile ships with torch==2.6.0 and transformers==4.57.6.
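
To confirm what is actually installed in the active environment, the versions can be read from package metadata. This is a sketch using only the Python standard library; the function names are illustrative and not part of Dexbotic. PyTorch wheels often carry a local suffix such as `+cu118`, so the comparison below ignores everything after `+`.

```python
from importlib import metadata


def installed_versions(packages, lookup=metadata.version):
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = lookup(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def mismatches(installed, expected):
    """Return {name: (installed, expected)} where the public version differs.

    A local build suffix such as "+cu118" is ignored in the comparison.
    """
    bad = {}
    for name, want in expected.items():
        have = installed.get(name)
        public = have.split("+", 1)[0] if have else None
        if public != want:
            bad[name] = (have, want)
    return bad


if __name__ == "__main__":
    for name, version in installed_versions(["torch", "transformers"]).items():
        print(f"{name}: {version or 'not installed'}")
```

Pass the versions stated above as the `expected` mapping to `mismatches` to get a dictionary of any packages that differ.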

Using on Blackwell GPUs

For users with Blackwell GPUs (e.g., B100, RTX 5090), please use the specialized Docker image dexmal/dexbotic:c130t28.

Step 1: Start Docker with Blackwell Image

docker run -it --rm --gpus all --network host \
  -v /path/to/dexbotic:/dexbotic \
  dexmal/dexbotic:c130t28 \
  bash

Step 2: Activate Environment

cd /dexbotic
pip install -e .
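
When the same launch script has to run on mixed hardware, the choice between the default image and the Blackwell image can be automated. The sketch below is an assumption-laden heuristic, not an official mapping: it matches the GPU name reported by `nvidia-smi` against the two examples named above (B100, RTX 5090) plus the literal word "Blackwell". Extend `BLACKWELL_HINTS` for other cards.

```python
import subprocess

DEFAULT_IMAGE = "dexmal/dexbotic"
BLACKWELL_IMAGE = "dexmal/dexbotic:c130t28"
# Assumption: a substring heuristic, not an exhaustive list of Blackwell GPUs.
BLACKWELL_HINTS = ("B100", "RTX 5090", "BLACKWELL")


def pick_image(gpu_name):
    """Return the image tag to use for a GPU name as printed by nvidia-smi."""
    name = gpu_name.upper()
    if any(hint in name for hint in BLACKWELL_HINTS):
        return BLACKWELL_IMAGE
    return DEFAULT_IMAGE


def first_gpu_name():
    """Query the name of the first GPU listed by nvidia-smi."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()[0].strip()


if __name__ == "__main__":
    try:
        print(pick_image(first_gpu_name()))
    except (OSError, subprocess.CalledProcessError, IndexError):
        print("Could not query nvidia-smi; pick the image manually.")
```

The printed tag can be substituted for the image name in the `docker run` command.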

Conda Installation

Prerequisites

Ubuntu 20.04 or 22.04
NVIDIA GPU: RTX 4090 / A100 / H100 (8 GPUs recommended for training; 1 GPU for deployment)
CUDA 11.8 (tested; other versions may also work)
Anaconda

Step 1: Clone the Repository

git clone https://github.com/Dexmal/dexbotic.git

Step 2: Install Dependencies

conda create -n dexbotic python=3.10 -y
conda activate dexbotic

pip install torch==2.6.0 torchvision==0.21.0 xformers --index-url https://download.pytorch.org/whl/cu118
cd dexbotic
pip install -e .
pip install transformers==4.57.6

# FlashAttention kernels (e.g. cross-entropy used in RL training) are fetched
# on demand from the Hugging Face Hub via the `kernels` library, which is
# installed as a core dependency above. No local flash-attn build is required.
#
# Optionally, to use a locally compiled flash-attn (e.g. for the
# `flash_attention_2` HF attention implementation), install it explicitly:
# pip install ninja packaging
# pip install flash-attn --no-build-isolation

Evaluation

We provide pre-trained models for both simulation benchmarks and real-robot settings. Here we use the Libero pre-trained model as an example.

First, download the pre-trained model and put it in the checkpoints folder.

mkdir -p checkpoints/libero
cd checkpoints/libero
git clone https://huggingface.co/Dexmal/libero-db-cogact libero_cogact
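
Hugging Face model repositories typically store large weight files through Git LFS. If `git-lfs` is not installed when cloning, the checkout may contain small text pointer files instead of the real weights. The following sketch (helper name is illustrative) scans a checkpoint directory for such pointer stubs by looking for the standard LFS pointer header.

```python
from pathlib import Path

# First line of every Git LFS pointer file.
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def unfetched_lfs_files(root):
    """Return files under root that are still Git LFS pointer stubs."""
    root = Path(root)
    stubs = []
    for path in sorted(root.rglob("*")):
        # Skip directories and Git's own metadata.
        if not path.is_file() or ".git" in path.relative_to(root).parts:
            continue
        with path.open("rb") as handle:
            if handle.read(len(LFS_HEADER)) == LFS_HEADER:
                stubs.append(path)
    return stubs


if __name__ == "__main__":
    checkpoint = Path("checkpoints/libero/libero_cogact")
    if checkpoint.is_dir():
        for stub in unfetched_lfs_files(checkpoint):
            print("LFS pointer, weights not downloaded:", stub)
```

If any files are reported, installing `git-lfs` and running `git lfs pull` inside the cloned directory should fetch the real contents.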

We demonstrate two ways to evaluate the model. The first runs inference on a single sample and is the quickest way to try the model. The second deploys a model server and queries it from a client, which is closer to real-world deployment.

Inference One Sample

CUDA_VISIBLE_DEVICES=0 python playground/benchmarks/libero/libero_cogact.py --task inference_single --image_path test_data/libero_test.png --prompt 'What action should the robot take to put both moka pots on the stove?'

The model should output a set of actions.

Deploy Mode

Start Inference Server

CUDA_VISIBLE_DEVICES=0 python playground/benchmarks/libero/libero_cogact.py --task inference

Test Model Inference Results

curl -X POST \
  -F "text=What action should the robot take to put both moka pots on the stove?" \
  -F "image=@test_data/libero_test.png" \
  http://localhost:7891/process_frame
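
The same request can be sent from Python, which is convenient when the client runs inside a control loop. This is a sketch using only the standard library: the field names `text` and `image`, the port, and the `/process_frame` path are taken from the curl command above, while the function names are illustrative. The response format is not documented here, so the body is returned as raw text.

```python
import mimetypes
import urllib.request
import uuid
from pathlib import Path


def encode_multipart(fields, files):
    """Encode text fields and files as multipart/form-data.

    fields: {name: str}; files: {name: (filename, bytes)}.
    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        head = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        parts.append(head.encode() + value.encode("utf-8") + b"\r\n")
    for name, (filename, data) in files.items():
        ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        head = (
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"\r\nContent-Type: {ctype}\r\n\r\n'
        )
        parts.append(head.encode() + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def process_frame(text, image_path, url="http://localhost:7891/process_frame"):
    """POST a prompt and an image to the inference server; return the raw body."""
    image = Path(image_path)
    body, content_type = encode_multipart(
        {"text": text}, {"image": (image.name, image.read_bytes())}
    )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read().decode("utf-8")
```

With the server running, `process_frame("What action should the robot take to put both moka pots on the stove?", "test_data/libero_test.png")` sends the same payload as the curl example.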

Test Libero Benchmark with Dexbotic-Benchmark

Set up the dexbotic-benchmark following its instructions and test the deployed model in the LIBERO-GOAL environment.

cd dexbotic-benchmark
docker run --gpus all --network host -v $(pwd):/workspace \
  dexmal/dexbotic_benchmark \
  bash /workspace/scripts/env_sh/libero.sh /workspace/evaluation/configs/libero/example_libero.yaml

dexbotic-benchmark also works without Docker; see its documentation for details.

Training

Before starting training, please follow the instructions in ModelZoo.md to set up the pre-trained models, and download the Libero dataset as described in docs/Data.md.

Training a Model with Provided Data

We use Libero as an example to demonstrate how to train a model with Dexbotic. The experiment configuration file for this example is located at: playground/benchmarks/libero/libero_cogact.py

Experiment Configuration

# LiberoCogActTrainerConfig
output_dir = [Path to save checkpoints]

Launch Training

torchrun --nproc_per_node=8 playground/benchmarks/libero/libero_cogact.py

We recommend using 8 × NVIDIA A100/H100 GPUs for training. If you are using 8 × RTX 4090, please use the configuration file scripts/deepspeed/zero3_offload.json to reduce GPU memory usage. For FSDP2 support, see FSDP2.md.
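
If the launch is wrapped in a script, the process count can be derived from `CUDA_VISIBLE_DEVICES` instead of being hard-coded. The sketch below only builds the `torchrun` command shown above; the helper names are illustrative, and whether the provided configuration trains well with fewer than 8 processes is an assumption that has to be checked against the experiment settings.

```python
import os
import shlex


def visible_gpu_count(env=os.environ, default=8):
    """Count GPUs listed in CUDA_VISIBLE_DEVICES, or return default if unset."""
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return default
    return len([device for device in value.split(",") if device.strip()])


def torchrun_command(script, nproc):
    """Build the single-node torchrun invocation as an argument list."""
    return ["torchrun", f"--nproc_per_node={nproc}", script]


if __name__ == "__main__":
    command = torchrun_command(
        "playground/benchmarks/libero/libero_cogact.py", visible_gpu_count()
    )
    # Print the command; pass the list to subprocess.run to execute it.
    print(shlex.join(command))
```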

Training a Model with Your Own Data

Prepare Your Own Data

Refer to docs/Data.md for detailed instructions on data preparation. Once created, register your dataset under dexbotic/data/data_source.

Experiment Configuration

Create a new experiment configuration file (based on playground/example_exp.py) and set the required keys:

# CogActTrainerConfig
output_dir = [Path to save checkpoints]

# CogActDataConfig
dataset_name = [Name of your registered dataset]

Launch Training

torchrun --nproc_per_node=8 playground/example_exp.py

After training, please refer to the Evaluation section above to evaluate your model. Update the model_name_or_path in the inference config to your trained checkpoint, and run inference or start the inference server as described.