Dynamics Model
June 3, 2026
# Navigate to the dynamics model directory before running the commands below
cd RISE/dynamics/dynamics_model
Data Format
The framework expects data in the LeRobot format. For optimal training performance, we strongly recommend pre-resizing images to [256, 192] resolution for each view. We use three views (1 head view + 2 wrist views) for both pretraining and task-specific finetuning.
Directory Structure
All task datasets should be placed under the dataset directory:
# copy your dataset under the dataset directory
cp -r path/to/your/dataset dataset/
Each dataset is organized as follows:
task_A/
├── data/
│   └── chunk-000/
│       ├── episode_000000.parquet
│       ├── episode_000001.parquet
│       ├── episode_000002.parquet
│       └── ...
├── meta/
│   ├── info.json
│   ├── episodes.jsonl
│   ├── episodes_stats.jsonl
│   └── tasks.jsonl
└── videos/
    └── chunk-000/
        └── [video files]
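A quick way to catch layout mistakes before training is to check each task directory against the tree above. The following is a minimal sketch and not part of the repository: the helper name `missing_entries` is hypothetical, and it assumes the first chunk is always named `chunk-000`, as in the example.

```python
from pathlib import Path

# Entries taken from the directory tree above.
# Assumption: the first chunk directory is always named chunk-000.
REQUIRED_DIRS = ("data/chunk-000", "videos/chunk-000")
REQUIRED_FILES = (
    "meta/info.json",
    "meta/episodes.jsonl",
    "meta/episodes_stats.jsonl",
    "meta/tasks.jsonl",
)


def missing_entries(task_dir):
    """Return the relative paths from the expected layout that are absent."""
    root = Path(task_dir)
    missing = [d for d in REQUIRED_DIRS if not (root / d).is_dir()]
    missing += [f for f in REQUIRED_FILES if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name): python check_layout.py dataset/task_A
    for task in sys.argv[1:]:
        problems = missing_entries(task)
        print(task, "OK" if not problems else f"missing: {problems}")
```

An empty list means every directory and metadata file shown in the tree was found; it does not validate the contents of the parquet or JSON files.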
Video Preprocessing
The preprocess.sh script resizes all videos in the dataset to 256x192 resolution using ffmpeg, preserving aspect ratio with center padding. Processed videos are saved in videos_small while maintaining the original directory structure.
Usage:
# Process specific datasets
./preprocess.sh dataset1 [dataset2]  # dataset2 is optional
After preprocessing, each dataset contains an additional videos_small directory:
task_A/
├── data/
├── meta/
├── videos/
└── videos_small/
    └── chunk-000/
        └── [video files]
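To make "preserving aspect ratio with center padding" concrete, the sketch below computes the scaled size and padding offsets for a 256x192 target and builds an ffmpeg filter string with the same effect. It is an illustration under assumptions: the function names are hypothetical, and preprocess.sh may use a different filter chain than the one shown here.

```python
def fit_with_padding(src_w, src_h, dst_w=256, dst_h=192):
    """Scale (src_w, src_h) to fit inside (dst_w, dst_h) and center it.

    Returns (scaled_w, scaled_h, pad_left, pad_top).
    """
    # Use the smaller ratio so the whole frame fits inside the target.
    scale = min(dst_w / src_w, dst_h / src_h)
    scaled_w = min(dst_w, max(1, round(src_w * scale)))
    scaled_h = min(dst_h, max(1, round(src_h * scale)))
    # Split the leftover space evenly; any odd pixel goes to the right/bottom.
    pad_left = (dst_w - scaled_w) // 2
    pad_top = (dst_h - scaled_h) // 2
    return scaled_w, scaled_h, pad_left, pad_top


def ffmpeg_filter(dst_w=256, dst_h=192):
    """Build a -vf filter string: shrink to fit, then pad to the exact size.

    Assumption: this mirrors the behavior described for preprocess.sh,
    not necessarily the exact filter the script uses.
    """
    return (
        f"scale={dst_w}:{dst_h}:force_original_aspect_ratio=decrease,"
        f"pad={dst_w}:{dst_h}:(ow-iw)/2:(oh-ih)/2"
    )


if __name__ == "__main__":
    print(fit_with_padding(1280, 720))  # 16:9 source gets bars above and below
    print(ffmpeg_filter())
```

A 4:3 source such as 640x480 fills the target exactly, while a 16:9 source is scaled to 256x144 and padded by 24 pixels at the top and bottom.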
Model Checkpoints
Base LTX Backbone
Download the LTX-Video backbone components (Text Encoder, Tokenizer, and VAE) using the provided script:
./download.sh
This script automatically downloads all required components from the LTX-Video HuggingFace repository to the checkpoints directory.
Alternatively, you can manually download the following components:
- Text Encoder: text_encoder
- Tokenizer: tokenizer
- VAE: vae
- Pre-trained dynamics model: dynamics_model, jointly pre-trained on Galaxea Open World and AgiBot World Alpha.
Place all downloaded weights in the same directory and update the pretrained_model_name_or_path field in your configuration file.
Training
Pre-training
Pre-training is performed on large-scale robotic datasets to learn general dynamics priors. We utilize the following datasets:
- Galaxea Open World Dataset: Galaxea-Open-World-Dataset
- AgiBot World Alpha: AgiBotWorld-Alpha
Steps
- Prepare Data: Convert your datasets to the LeRobot format as described above.
- Configure Training: Edit configs/ltx_model/pretrain.yaml according to the comments:
  - Set pretrained_model_name_or_path to your LTX backbone checkpoint directory
  - Set diffusion_model.model_path to your pre-trained diffusion checkpoint
  - Configure data.train.data_roots and data.val.data_roots to point to your dataset directories
- Launch Training:
  bash train_task_centric.sh
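Before launching a long run, it can help to confirm that the fields listed in the configuration step are actually filled in. The sketch below works on a plain dictionary and is only an illustration: the helper names `lookup` and `unset_keys` are hypothetical, and it assumes that dotted names such as data.train.data_roots correspond to nested mappings in the YAML file.

```python
# Fields named in the configuration step above.
# Assumption: a dotted name maps to nested YAML mappings.
REQUIRED_KEYS = (
    "pretrained_model_name_or_path",
    "diffusion_model.model_path",
    "data.train.data_roots",
    "data.val.data_roots",
)


def lookup(config, dotted_key):
    """Follow a dotted key through nested dicts; return None if a part is missing."""
    node = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def unset_keys(config):
    """Return the required keys that are missing or left empty."""
    return [k for k in REQUIRED_KEYS if lookup(config, k) in (None, "", [])]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    example = {
        "pretrained_model_name_or_path": "checkpoints",
        "diffusion_model": {"model_path": ""},
        "data": {"train": {"data_roots": ["dataset/task_A"]}},
    }
    print(unset_keys(example))
```

If PyYAML is installed, the dictionary can be obtained with `yaml.safe_load` on the opened config file instead of being written by hand.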
Fine-tuning
Fine-tuning adapts the pre-trained model to specific task domains using domain-specific datasets.
Steps
- Prepare Task-Specific Data: Organize your fine-tuning dataset in the LeRobot format.
- Compute Action Normalization Statistics: Use norm.py to compute and save normalization statistics:
  python norm.py --datasets <your_finetune_dataset> --save-config data/utils/action_norm.json
  This automatically computes min and max values for each dataset and saves them to a JSON configuration file.
- Configure Fine-tuning: Edit configs/ltx_model/finetune.yaml:
  - Set pretrained_model_name_or_path to your LTX backbone checkpoint directory
  - Set diffusion_model.model_path to your diffusion checkpoint
  - Configure data.train.data_roots and data.val.data_roots for your fine-tuning dataset
  - Add norm_config_path: data/utils/action_norm.json to both data.train and data.val sections
  - The data loader will automatically use the normalization values from the config file based on dataset names
- Launch Fine-tuning:
  bash task_finetune.sh
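The normalization step above stores a min and max per dataset. As a rough illustration of how such statistics are typically computed and applied, the sketch below derives per-dimension bounds and maps actions linearly to [-1, 1]. It is not the code in norm.py: the function names, the [-1, 1] target range, and the JSON layout keyed by dataset name are all assumptions made for this example.

```python
import json


def minmax_stats(actions):
    """Per-dimension min and max over a list of equal-length action vectors."""
    dims = list(zip(*actions))
    return {"min": [min(d) for d in dims], "max": [max(d) for d in dims]}


def normalize(action, stats, eps=1e-8):
    """Map each dimension linearly to [-1, 1] using the stored min and max.

    Assumption: [-1, 1] is chosen for illustration; the repository's data
    loader may use a different target range.
    """
    return [
        2.0 * (a - lo) / max(hi - lo, eps) - 1.0
        for a, lo, hi in zip(action, stats["min"], stats["max"])
    ]


if __name__ == "__main__":
    actions = [[0.0, 10.0], [2.0, 20.0], [1.0, 30.0]]
    stats = minmax_stats(actions)
    # Hypothetical layout: one entry per dataset name.
    print(json.dumps({"my_finetune_dataset": stats}, indent=2))
    print(normalize([1.0, 20.0], stats))
```

The `eps` guard avoids division by zero for action dimensions that never change in the dataset.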
Inference
The inference pipeline generates future video sequences conditioned on initial observations and action sequences.
Steps
- Configure Inference: Edit configs/ltx_model/infer.yaml:
  - Set pretrained_model_name_or_path to your LTX backbone checkpoint directory
  - Set diffusion_model.model_path to your diffusion checkpoint
- Update Inference Script: Edit infer.sh with appropriate paths
- Run Inference:
  bash infer.sh
Inference Parameters
- --config_file: Path to inference configuration file
- --image_root: Directory containing input observation images
- --output_path: Directory to save generated videos
- --act_tokens_path: Path to action token file (.pt format)
- --norm_constant: Normalization constant for action tokens (e.g., FINETUNE_TASK)
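For reference, the flags above can be mirrored with a small argparse parser, which is handy when wrapping inference in your own script. This is a sketch only: which flags are required, their defaults, and the example paths are assumptions, and the actual parser behind infer.sh may differ.

```python
import argparse


def build_parser():
    """Illustrative parser mirroring the documented inference flags.

    Assumption: required/optional status and defaults are guesses made
    for this example, not taken from the repository.
    """
    parser = argparse.ArgumentParser(description="Inference flags (illustrative)")
    parser.add_argument("--config_file", required=True,
                        help="Path to inference configuration file")
    parser.add_argument("--image_root", required=True,
                        help="Directory containing input observation images")
    parser.add_argument("--output_path", required=True,
                        help="Directory to save generated videos")
    parser.add_argument("--act_tokens_path", required=True,
                        help="Path to action token file (.pt format)")
    parser.add_argument("--norm_constant", default=None,
                        help="Normalization constant for action tokens")
    return parser


if __name__ == "__main__":
    # Placeholder paths for illustration only.
    args = build_parser().parse_args([
        "--config_file", "configs/ltx_model/infer.yaml",
        "--image_root", "path/to/images",
        "--output_path", "path/to/output",
        "--act_tokens_path", "path/to/actions.pt",
        "--norm_constant", "FINETUNE_TASK",
    ])
    print(vars(args))
```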