Converting DROID from Scratch

February 16, 2026

If you want to reproduce the DreamZero DROID dataset conversion yourself (or modify the filtering), follow the steps below. This requires the raw DROID 1.0.1 dataset in RLDS format and the idle filter ranges JSON.

Most users should skip this and simply download the preprocessed dataset:

huggingface-cli download GEAR-Dreams/DreamZero-DROID-Data --repo-type dataset --local-dir ./data/droid_lerobot

Step 1: Install conversion dependencies

pip install tensorflow tensorflow-datasets polars av
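Before starting a long conversion, it can help to confirm that these packages are importable. The sketch below is a hypothetical helper, not part of the repository. It relies on the standard mapping from pip names to import names (tensorflow-datasets imports as tensorflow_datasets) and uses importlib.util.find_spec, which locates a module without importing it.

```python
import importlib.util

# pip package name -> module name used at import time
REQUIRED = {
    "tensorflow": "tensorflow",
    "tensorflow-datasets": "tensorflow_datasets",
    "polars": "polars",
    "av": "av",
}


def missing_modules(required=None):
    """Return the pip package names whose top-level modules cannot be found.

    find_spec only locates the module, so the check stays fast even for
    heavy packages such as tensorflow.
    """
    required = REQUIRED if required is None else required
    return [pkg for pkg, mod in required.items()
            if importlib.util.find_spec(mod) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages, install with: pip install " + " ".join(missing))
    else:
        print("All conversion dependencies are installed.")
```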

Step 2: Download the raw DROID 1.0.1 dataset

This requires gsutil, which ships with the Google Cloud CLI. The full dataset is about 1.7 TB, so make sure the destination disk has enough free space.

gsutil -m cp -r gs://gresearch/robotics/droid/1.0.1 ./data/droid/1.0.1
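A quick pre-flight check with Python's standard shutil.disk_usage can catch a full disk before the copy fails partway through. This is a sketch. The helper name has_enough_space is hypothetical, the threshold treats "1.7 TB" as decimal terabytes, and it does not budget for the converted output, whose size is not stated in this guide.

```python
import shutil

# Approximate raw DROID 1.0.1 size from this guide (~1.7 TB, decimal units assumed).
RAW_DROID_BYTES = int(1.7e12)


def has_enough_space(path, required_bytes=RAW_DROID_BYTES):
    """Return True if the filesystem holding `path` has at least `required_bytes` free.

    `path` must already exist; check the parent directory of the download target.
    """
    return shutil.disk_usage(path).free >= required_bytes


if __name__ == "__main__":
    free_tb = shutil.disk_usage(".").free / 1e12
    print(f"Free space: {free_tb:.2f} TB; enough for raw DROID: {has_enough_space('.')}")
```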

Important: Use version 1.0.1, not 1.0.0. Version 1.0.1 contains the complete set of language annotations (~75k episodes).
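To confirm the download is 1.0.1 rather than 1.0.0, you can read the version recorded in the dataset directory. This sketch assumes the download is a standard TFDS dataset directory whose dataset_info.json has a top-level "version" field. The helper tfds_version is hypothetical. With tensorflow_datasets installed, tfds.builder_from_directory(path).info.version reports the version through the library instead.

```python
import json
from pathlib import Path


def tfds_version(dataset_dir):
    """Read the version string from a TFDS dataset directory.

    Assumes TFDS's dataset_info.json sits alongside the data shards and has a
    top-level "version" field. Returns None if the file or field is missing.
    """
    info_path = Path(dataset_dir) / "dataset_info.json"
    if not info_path.is_file():
        return None
    with info_path.open() as f:
        return json.load(f).get("version")


if __name__ == "__main__":
    dataset_dir = "./data/droid/1.0.1"
    version = tfds_version(dataset_dir)
    if version is None:
        print(f"No dataset_info.json version found in {dataset_dir}; check the download path.")
    elif version != "1.0.1":
        print(f"Expected DROID 1.0.1, found {version}; download version 1.0.1 instead.")
    else:
        print("DROID 1.0.1 found.")
```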

Step 3: Download the idle filter ranges

This JSON file maps each episode to the frame ranges that should be kept (non-idle frames). It was originally computed by Physical Intelligence for training pi0-DROID models.

gsutil cp gs://openpi-assets/droid/droid_sample_ranges_v1_0_1.json ./data/keep_ranges.json
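The exact schema of keep_ranges.json is not documented in this guide. The sketch below assumes it maps an episode key string to a list of [start, end) frame-index pairs (half-open, like Python's range). It shows how such ranges would become a per-frame keep mask. The function names are hypothetical, and this is not the logic inside convert_droid.py, so inspect the file before relying on the assumed layout.

```python
import json
from pathlib import Path


def load_keep_ranges(path):
    """Load the keep-ranges JSON.

    Assumed layout: {episode_key: [[start, end], ...]} with half-open [start, end).
    """
    with open(path) as f:
        return json.load(f)


def kept_frame_indices(ranges):
    """Expand [start, end) ranges into a sorted list of unique frame indices."""
    kept = set()
    for start, end in ranges:
        kept.update(range(start, end))
    return sorted(kept)


def keep_mask(ranges, num_frames):
    """Boolean list of length num_frames: True where the frame lies in a kept range.

    Ranges are clipped to [0, num_frames), so out-of-bounds ends are ignored.
    """
    mask = [False] * num_frames
    for start, end in ranges:
        for i in range(max(start, 0), min(end, num_frames)):
            mask[i] = True
    return mask


if __name__ == "__main__":
    path = Path("./data/keep_ranges.json")
    if path.is_file():
        keep_ranges = load_keep_ranges(path)
        key, ranges = next(iter(keep_ranges.items()))
        print(f"{len(keep_ranges)} episodes; first ({key}) keeps "
              f"{len(kept_frame_indices(ranges))} frames")
```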

Step 4: Run the conversion

python scripts/data/convert_droid.py \
    ./data/droid/1.0.1 \
    ./data/droid_lerobot \
    --keep-ranges-path ./data/keep_ranges.json \
    --filter-failed \
    -n 16

For a quick test with a small subset:

python scripts/data/convert_droid.py \
    ./data/droid/1.0.1 \
    ./data/droid_lerobot_test \
    --keep-ranges-path ./data/keep_ranges.json \
    --filter-failed \
    --first-n 5 \
    -n 4
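After the test run finishes, a quick look at the output metadata can confirm that something sensible was written. This sketch assumes the converter produces LeRobot's dataset layout, with a meta/info.json file containing total_episodes, total_frames and fps fields. The output directory name suggests this, but the guide does not confirm the exact fields. The helper name is hypothetical. If --first-n limits the number of source episodes, as its name suggests, the episode count should be small.

```python
import json
from pathlib import Path


def summarize_lerobot_output(root):
    """Return (total_episodes, total_frames, fps) from a LeRobot-style meta/info.json.

    Missing fields come back as None; a missing file raises FileNotFoundError.
    """
    info = json.loads((Path(root) / "meta" / "info.json").read_text())
    return info.get("total_episodes"), info.get("total_frames"), info.get("fps")


if __name__ == "__main__":
    root = Path("./data/droid_lerobot_test")
    if (root / "meta" / "info.json").is_file():
        episodes, frames, fps = summarize_lerobot_output(root)
        print(f"episodes={episodes} frames={frames} fps={fps}")
```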

Script reference

For the full list of options in scripts/data/convert_droid.py, run it with --help:

python scripts/data/convert_droid.py --help