Fine-tune on Custom Embodiments ("NEW_EMBODIMENT")

April 18, 2026 ยท View on GitHub

This guide demonstrates how to finetune GR00T on your own robot data and configuration. We provide a complete example for the Huggingface SO-100 robot under examples/SO100, which uses demo_data/cube_to_bowl_5 as the demo dataset.

Step 1: Prepare Your Data

Prepare your data in GR00T-flavored LeRobot v2 format by following the data preparation guide.

Step 2: Prepare Your Modality Configuration

Define your own modality configuration by following the modality config guide. Below is an example configuration that corresponds to the demo data:

from gr00t.configs.data.embodiment_configs import register_modality_config
from gr00t.data.embodiment_tags import EmbodimentTag
from gr00t.data.types import (
    ActionConfig,
    ActionFormat,
    ActionRepresentation,
    ActionType,
    ModalityConfig,
)


so100_config = {
    # Video: use current frame only ([0]); list camera view names matching modality.json
    "video": ModalityConfig(
        delta_indices=[0],
        modality_keys=[
            "front",
            "wrist",
        ],
    ),
    # State: current proprioceptive reading; keys must match modality.json "state" entries
    "state": ModalityConfig(
        delta_indices=[0],
        modality_keys=[
            "single_arm",
            "gripper",
        ],
    ),
    # Action: 16-step prediction horizon; each key needs an ActionConfig
    "action": ModalityConfig(
        delta_indices=list(range(0, 16)),  # predict 16 future steps
        modality_keys=[
            "single_arm",
            "gripper",
        ],
        action_configs=[
            # single_arm: RELATIVE = delta from current state (better generalization)
            ActionConfig(
                rep=ActionRepresentation.RELATIVE,
                type=ActionType.NON_EEF,       # joint-space, not end-effector
                format=ActionFormat.DEFAULT,
            ),
            # gripper: ABSOLUTE = target position (binary open/close works better absolute)
            ActionConfig(
                rep=ActionRepresentation.ABSOLUTE,
                type=ActionType.NON_EEF,
                format=ActionFormat.DEFAULT,
            ),
        ],
    ),
    # Language: task instruction from annotation field in the dataset
    "language": ModalityConfig(
        delta_indices=[0],
        modality_keys=["annotation.human.task_description"],
    ),
}

# Important: always register under EmbodimentTag.NEW_EMBODIMENT for custom embodiments
register_modality_config(so100_config, embodiment_tag=EmbodimentTag.NEW_EMBODIMENT)

Step 3: Run Fine-tuning

We'll use gr00t/experiment/launch_finetune.py as the entry point. Ensure that the uv environment is enabled before launching. You can do this by running the command uv run bash <example_script_name>.

View Available Arguments

# Display all available arguments
uv run python gr00t/experiment/launch_finetune.py --help

Execute Fine-tuning

# Configure for single GPU
export NUM_GPUS=1
CUDA_VISIBLE_DEVICES=0 uv run python \
    gr00t/experiment/launch_finetune.py \
    --base-model-path nvidia/GR00T-N1.7-3B \
    --dataset-path ./demo_data/cube_to_bowl_5 \
    --embodiment-tag NEW_EMBODIMENT \
    --modality-config-path examples/SO100/so100_config.py \
    --num-gpus $NUM_GPUS \
    --output-dir /tmp/so100 \
    --save-total-limit 5 \
    --save-steps 2000 \
    --max-steps 2000 \
    --use-wandb \
    --global-batch-size 32 \
    --color-jitter-params brightness 0.3 contrast 0.4 saturation 0.5 hue 0.08 \
    --dataloader-num-workers 4

Key Parameters

ParameterDescription
--base-model-pathPath to the pre-trained base model checkpoint
--dataset-pathPath to your training dataset
--embodiment-tagTag to identify your robot embodiment
--modality-config-pathPath to user-specified modality config (required only for NEW_EMBODIMENT tag)
--output-dirDirectory where checkpoints will be saved
--save-stepsSave checkpoint every N steps
--max-stepsTotal number of training steps
--use-wandbEnable Weights & Biases logging for experiment tracking

Note: Validation during fine-tuning is disabled by default (eval_strategy="no" in the training config). To enable periodic validation, pass --eval-strategy steps --eval-steps 500 (runs validation every 500 steps) or --eval-strategy epoch (runs validation every epoch). You can also adjust --eval-batch-size (default: 2).

Step 4: Open Loop Evaluation

After finetuning, evaluate the model's performance using open loop evaluation:

uv run python gr00t/eval/open_loop_eval.py \
    --dataset-path ./demo_data/cube_to_bowl_5 \
    --embodiment-tag NEW_EMBODIMENT \
    --model-path /tmp/so100/checkpoint-2000 \
    --traj-ids 0 \
    --action-horizon 16 \
    --steps 400 \
    --modality-keys single_arm gripper

open_loop_eval.py Parameters

ParameterDefaultDescription
--dataset-pathdemo_data/cube_to_bowl_5/Path to LeRobot-format dataset
--embodiment-tagnew_embodimentRobot embodiment tag (case-insensitive)
--model-pathNonePath to checkpoint. If omitted, connects to a running server via --host/--port
--traj-ids[0]Episode indices to evaluate (space-separated, e.g., 0 1 2)
--action-horizon16Action steps predicted per inference call
--steps200Max steps per trajectory (capped by actual trajectory length)
--denoising-steps4Diffusion denoising iterations
--save-plot-pathNoneDirectory to save GT-vs-predicted comparison plots
--modality-keysNoneAction keys to plot. If omitted, plots all action dimensions
--host / --port127.0.0.1 / 5555Server address when --model-path is omitted

Example Evaluation Result

The evaluation generates visualizations comparing predicted actions against ground truth trajectories:

Open loop evaluation results showing predicted vs ground truth trajectories