Dynamics Model

June 3, 2026

# Navigate to the dynamics model directory before running the following commands
cd RISE/dynamics/dynamics_model

Data Format

The framework expects data in the LeRobot format. For optimal training performance, we strongly recommend pre-resizing the images of every view to [256, 192] resolution. We use three views (one head view and two wrist views) for both pretraining and task-specific finetuning.

Directory Structure

All task datasets should be placed under the dataset directory:

# copy your dataset under the dataset directory
cp -r path/to/your/dataset dataset/

Each dataset is organized as follows:

task_A/
├── data/
│   └── chunk-000/
│       ├── episode_000000.parquet
│       ├── episode_000001.parquet
│       ├── episode_000002.parquet
│       └── ...
├── meta/
│   ├── info.json              
│   ├── episodes.jsonl        
│   ├── episodes_stats.jsonl   
│   └── tasks.jsonl        
└── videos/
    └── chunk-000/
        └── [video files]
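
As a quick sanity check before training, the layout above can be verified with a few lines of Python. This is a minimal sketch, not part of the repository: the helper name `missing_entries` is hypothetical, and the required entries are simply copied from the tree shown above.

```python
from pathlib import Path

# Entries copied from the directory tree above; the check itself is illustrative.
REQUIRED_DIRS = ["data", "meta", "videos"]
REQUIRED_META = ["info.json", "episodes.jsonl", "episodes_stats.jsonl", "tasks.jsonl"]


def missing_entries(dataset_root):
    """Return the expected paths (relative to dataset_root) that do not exist."""
    root = Path(dataset_root)
    missing = [d + "/" for d in REQUIRED_DIRS if not (root / d).is_dir()]
    missing += [
        "meta/" + name for name in REQUIRED_META if not (root / "meta" / name).is_file()
    ]
    # At least one episode file is expected in some chunk directory under data/.
    if not any((root / "data").glob("chunk-*/episode_*.parquet")):
        missing.append("data/chunk-*/episode_*.parquet")
    return missing
```

Calling `missing_entries("dataset/task_A")` returns an empty list when every expected directory and metadata file is present.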

Video Preprocessing

The preprocess.sh script resizes all videos in the dataset to 256x192 resolution using ffmpeg, preserving aspect ratio with center padding. Processed videos are saved in videos_small while maintaining the original directory structure.

Usage:

# Process specific datasets
./preprocess.sh dataset1 [dataset2]  # dataset2 is optional

After preprocessing, each dataset contains an additional videos_small directory:

task_A/
├── data/
├── meta/
├── videos/
└── videos_small/
    └── chunk-000/
        └── [video files]
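
To illustrate what "preserving aspect ratio with center padding" means, here is a small Python sketch of the geometry, together with an ffmpeg filter string that performs the same operation. Both are assumptions for illustration; the filter that preprocess.sh actually uses may differ, and the function names are hypothetical.

```python
def fit_with_padding(src_w, src_h, dst_w=256, dst_h=192):
    """Scale (src_w, src_h) to fit inside (dst_w, dst_h) and center the result.

    Returns (scaled_w, scaled_h, pad_left, pad_top).
    """
    # The smaller ratio guarantees the scaled frame fits in both dimensions.
    scale = min(dst_w / src_w, dst_h / src_h)
    scaled_w = min(dst_w, round(src_w * scale))
    scaled_h = min(dst_h, round(src_h * scale))
    return scaled_w, scaled_h, (dst_w - scaled_w) // 2, (dst_h - scaled_h) // 2


def ffmpeg_resize_filter(dst_w=256, dst_h=192):
    """Build an ffmpeg -vf string: scale to fit, then pad to the exact target size."""
    return (
        f"scale={dst_w}:{dst_h}:force_original_aspect_ratio=decrease,"
        f"pad={dst_w}:{dst_h}:(ow-iw)/2:(oh-ih)/2"
    )
```

For example, a 1280x720 source is scaled to 256x144 and padded by 24 pixels at the top and bottom, while a 640x480 source already matches the 4:3 target and needs no padding.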

Model Checkpoints

Base LTX Backbone

Download the LTX-Video backbone components (Text Encoder, Tokenizer, and VAE) using the provided script:

./download.sh

This script automatically downloads all required components from the LTX-Video HuggingFace repository to the checkpoints directory.

Alternatively, you can manually download the following components:

Text Encoder: text_encoder
Tokenizer: tokenizer
VAE: vae
Pre-trained dynamics model: dynamics_model, jointly pre-trained on Galaxea Open World and AgiBot World Alpha.

Place all downloaded weights in the same directory and update the pretrained_model_name_or_path field in your configuration file.

Training

Pre-training

Pre-training is performed on large-scale robotic datasets to learn general dynamics priors. We use the following datasets:

Galaxea Open World Dataset: Galaxea-Open-World-Dataset
AgiBot World Alpha: AgiBotWorld-Alpha

Steps

1. Prepare Data: Convert your datasets to the LeRobot format as described above.
2. Configure Training: Edit configs/ltx_model/pretrain.yaml, following the comments in the file:
   - Set pretrained_model_name_or_path to your LTX backbone checkpoint directory
   - Set diffusion_model.model_path to your pre-trained diffusion checkpoint
   - Configure data.train.data_roots and data.val.data_roots to point to your dataset directories
3. Launch Training:
```
bash train_task_centric.sh
```
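
Before launching a long run, it can help to confirm that the fields listed above are actually filled in. The sketch below is illustrative only: it assumes the YAML file has already been loaded into a nested dict whose nesting follows the dotted field names, and the helper names and placeholder paths are hypothetical.

```python
# Dotted field names taken from the configuration steps above.
REQUIRED_KEYS = [
    "pretrained_model_name_or_path",
    "diffusion_model.model_path",
    "data.train.data_roots",
    "data.val.data_roots",
]


def lookup(config, dotted_key):
    """Follow a dotted key through nested dicts; return None if any part is absent."""
    node = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def unset_keys(config, required=REQUIRED_KEYS):
    """Return the required keys that are missing or left empty."""
    return [key for key in required if lookup(config, key) in (None, "", [])]


# Placeholder values for illustration; real paths depend on your setup.
example_config = {
    "pretrained_model_name_or_path": "checkpoints",
    "diffusion_model": {"model_path": ""},
    "data": {"train": {"data_roots": ["dataset/task_A"]}, "val": {}},
}
print(unset_keys(example_config))
```

Here the empty model path and the missing validation roots are reported, so the problem surfaces before any training time is spent.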

Fine-tuning

Fine-tuning adapts the pre-trained model to specific task domains using domain-specific datasets.

Steps

1. Prepare Task-Specific Data: Organize your fine-tuning dataset in the LeRobot format.
2. Compute Action Normalization Statistics: Use norm.py to compute and save normalization statistics:
```
python norm.py --datasets <your_finetune_dataset> --save-config data/utils/action_norm.json
```
This automatically computes the min and max values for each dataset and saves them to a JSON configuration file.
3. Configure Fine-tuning: Edit configs/ltx_model/finetune.yaml:
   - Set pretrained_model_name_or_path to your LTX backbone checkpoint directory
   - Set diffusion_model.model_path to your diffusion checkpoint
   - Configure data.train.data_roots and data.val.data_roots for your fine-tuning dataset
   - Add norm_config_path: data/utils/action_norm.json to both the data.train and data.val sections; the data loader then selects the normalization values for each dataset by name
4. Launch Fine-tuning:
```
bash task_finetune.sh
```
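
The min/max normalization statistics mentioned in the steps above can be sketched as follows. This is an assumption-laden illustration rather than the code in norm.py: the target range of [-1, 1], the JSON layout keyed by dataset name, and the function names are all hypothetical.

```python
import json


def min_max_stats(actions):
    """Per-dimension min and max over a list of equal-length action vectors."""
    columns = list(zip(*actions))
    return {"min": [min(c) for c in columns], "max": [max(c) for c in columns]}


def normalize(action, stats, eps=1e-8):
    """Map each dimension to [-1, 1] using the stored min/max (assumed range)."""
    return [
        2.0 * (value - low) / max(high - low, eps) - 1.0
        for value, low, high in zip(action, stats["min"], stats["max"])
    ]


# Toy 2-dimensional actions; real action vectors come from the dataset.
actions = [[0.0, -1.0], [0.5, 1.0], [1.0, 3.0]]
norm_config = {"my_task": min_max_stats(actions)}  # keyed by dataset name (assumed)
print(json.dumps(norm_config))
print(normalize([0.5, 1.0], norm_config["my_task"]))
```

The `eps` guard avoids division by zero for action dimensions that never change within a dataset.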

Inference

The inference pipeline generates future video sequences conditioned on initial observations and action sequences.

Steps

1. Configure Inference: Edit configs/ltx_model/infer.yaml:
   - Set pretrained_model_name_or_path to your LTX backbone checkpoint directory
   - Set diffusion_model.model_path to your diffusion checkpoint
2. Update Inference Script: Edit infer.sh with the appropriate paths.
3. Run Inference:
```
bash infer.sh
```

Inference Parameters

--config_file: Path to inference configuration file
--image_root: Directory containing input observation images
--output_path: Directory to save generated videos
--act_tokens_path: Path to action token file (.pt format)
--norm_constant: Normalization constant for action tokens (e.g., FINETUNE_TASK)
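
For orientation, the flags above could be declared with argparse roughly as shown below. This is a hypothetical parser that only mirrors the documented flag names; the actual entry point called by infer.sh may define types, defaults, and required flags differently, and the paths in the example call are placeholders.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented inference flags."""
    parser = argparse.ArgumentParser(description="Dynamics model inference (sketch)")
    parser.add_argument("--config_file", required=True,
                        help="Path to inference configuration file")
    parser.add_argument("--image_root", required=True,
                        help="Directory containing input observation images")
    parser.add_argument("--output_path", required=True,
                        help="Directory to save generated videos")
    parser.add_argument("--act_tokens_path", required=True,
                        help="Path to action token file (.pt format)")
    parser.add_argument("--norm_constant", default=None,
                        help="Normalization constant for action tokens, e.g. FINETUNE_TASK")
    return parser


# Placeholder arguments for illustration only.
args = build_parser().parse_args([
    "--config_file", "configs/ltx_model/infer.yaml",
    "--image_root", "path/to/images",
    "--output_path", "path/to/output",
    "--act_tokens_path", "path/to/actions.pt",
    "--norm_constant", "FINETUNE_TASK",
])
print(args.config_file, args.norm_constant)
```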
