๐น Demo
July 24, 2025 ยท View on GitHub
SkillMimic-V2: Learning Robust and Generalizable Interaction Skills from Sparse and Noisy Demonstrations
Runyi Yu*, Yinhuai Wang*, Qihan Zhao*, Hok Wai Tsui, Jingbo Wang, Ping Tan and Qifeng Chen
SIGGRAPH 2025

Our framework enables physically simulated robots to learn robust and generalizable interaction skills from sparse demonstrations: (top left) Learning sustained and robust dribbling from a single, brief demonstration; (top right) acquiring robust skill transitions from fragment skill demonstrations; (bottom left) generalizing book grasping to varied poses from one demonstration; and (bottom right) learning to reorientate a cube from a single grasp pose.
๐น Demo
TODOs
-
Release models on household manipulation task.
-
Release models on locomotion task.
-
Release models on ballplay task.
-
Release training and evaluation code.
Installation ๐ฝ
Please following the Installation instructions of SkillMimic. After that you should run the following commands:
pip install six
pip install pytorch-lightning==1.9.0
Pre-Trained Models
Pre-trained models are available at models/
HOI Dataset
Run following command to view our HOI dataset by adding --play_dataset.
python skillmimic/run.py \
--play_dataset \
--task SkillMimic2BallPlay \
--test \
--num_envs 1 \
--episode_length 1000 \
--state_init 2 \
--cfg_env skillmimic/data/cfg/skillmimic.yaml \
--cfg_train skillmimic/data/cfg/train/rlg/skillmimic.yaml \
--motion_file skillmimic/data/motions/BallPlay-Pick
- If your computer has more than 1 GPU, add
DRI_PRIME=1before the above command, otherwise you may encounter display issues. - You may specify
--motion_file - You can change
--state_init {frame_number}to initialize from a specific reference state (Default: random reference state initialization).
BallPlay Skill Policy โน๏ธโโ๏ธ
The skill policy can be trained from sctrach, without the need for designing case-by-case skill rewards. Our method allows a single policy to learn a large variety of basketball skills from a dataset that contains diverse skills.
Inference
Run the following command:
python skillmimic/run.py \
--test \
--task SkillMimic2BallPlay \
--num_envs 2 \
--cfg_env skillmimic/data/cfg/skillmimic_test.yaml \
--cfg_train skillmimic/data/cfg/train/rlg/skillmimic.yaml \
--history_embedding_size 3 \
--hist_length 60 \
--hist_ckpt hist_encoder/BallPlay/hist_model.ckpt \
--motion_file skillmimic/data/motions/BallPlay \
--state_init 2 \
--episode_length 600 \
--checkpoint models/BallPlay/SkillMimic-V2/model.pth
- If
--switch_motion_file {A}is added, the policy will switch from skillAto skillBspecified by--motion_file {B} - You may control the skill switching using your keyboard. By default, the key and skill correspondence are as follows:
โ: dribble left,โ: dribble forward,โ: dribble right,W: shoot,E: layup. - To save the images, add
--save_imagesto the command, and the images will be saved inskillmimic/data/images/{timestamp}. - To transform the images into a video, run the following command, and the video can be found in
skillmimic/data/videos.
python skillmimic/utils/make_video.py --image_path skillmimic/data/images/{timestamp} --fps 60
To evaluate the baseline model trained with SkillMimic1, run the following command:
python tasks/run.py --test --task SkillMimic1BallPlay \
--num_envs 2 \
--cfg_env skillmimic/data/cfg/skillmimic_test.yaml \
--cfg_train skillmimic/data/cfg/train/rlg/skillmimic.yaml \
--motion_file skillmimic/data/motions/BallPlay \
--state_init 2 \
--episode_length 600 \
--checkpoint models/BallPlay/SkillMimic/model.pth
Training
History Encoder
Before training the skill policy, you have to train the history encoder first.
python -m skillmimic.utils.state_prediction_ballplay --motion_dir skillmimic/data/motions/BallPlay
Offline Preprocess for Stitched Trajectory Graph
python skillmimic/run.py \
--test \
--task OfflineStateSearch \
--cfg_env skillmimic/data/cfg/skillmimic.yaml \
--cfg_train skillmimic/data/cfg/train/rlg/skillmimic.yaml \
--motion_file skillmimic/data/motions/BallPlay \
--graph_save_path skillmimic/data/preprocess/ballplay.pkl \
--headless
Policy
After completing the above two steps, you can run the following command to train a ballplay skill policy:
python skillmimic/run.py \
--task SkillMimic2BallPlay \
--episode_length 60 \
--cfg_env skillmimic/data/cfg/skillmimic.yaml \
--cfg_train skillmimic/data/cfg/train/rlg/skillmimic.yaml \
--motion_file skillmimic/data/motions/BallPlay \
--reweight --reweight_alpha 1.0 \
--state_init_random_prob 0.1 \
--state_switch_prob 0.1 \
--state_search_to_align_reward \
--graph_file skillmimic/data/preprocess/ballplay.pkl \
--enable_buffernode \
--hist_length 60 \
--history_embedding_size 3 \
--hist_ckpt hist_encoder/BallPlay/hist_model.ckpt \
--headless
- During the training, the latest checkpoint model.pth will be regularly saved to output/, along with a Tensorboard log.
--cfg_envspecifies the environment configurations, such as number of environments, dataFPS, ball properties, etc.--cfg_trainspecifies the training configurations, such as learning rate, maximum number of epochs, network settings, etc.--motion_filecan be changed to train on different data, e.g.,--motion_file tasks/data/motions/BallPlay-Pick.--state_init_random_probspecifies the state randomization probability.--state_switch_probspecifies the state transition probability.--graph_filespecifies offline constructed state transition graph (STG) file.--headlessis used to disable visualization.- It is strongly encouraged to use large
--num_envswhen training on a large dataset. Meanwhile,--minibatch_sizeis recommended to be set as8รnum_envs.
Household Manipulation Policy ๐คน
Locomotion Skill Policy ๐โโ๏ธ
The commands for Locomotion are similar to those for BallPlay, with no additional considerations. Here are the examples:
Inference
python skillmimic/run.py \
--test \
--task SkillMimic2BallPlay \
--num_envs 2 \
--cfg_env skillmimic/data/cfg/skillmimic_test.yaml \
--cfg_train skillmimic/data/cfg/train/rlg/skillmimic.yaml \
--history_embedding_size 3 \
--hist_length 60 \
--hist_ckpt hist_encoder/Locomotion/hist_model.ckpt \
--motion_file skillmimic/data/motions/Locomotion \
--state_init 2 \
--episode_length 600 \
--checkpoint models/Locomotion/model.pth
Training
History Encoder
python -m skillmimic.utils.state_prediction_ballplay --motion_dir skillmimic/data/motions/Locomotion
Offline Preprocess for Stitched Trajectory Graph
python skillmimic/run.py \
--test \
--task OfflineStateSearch \
--cfg_env skillmimic/data/cfg/skillmimic.yaml \
--cfg_train skillmimic/data/cfg/train/rlg/skillmimic.yaml \
--motion_file skillmimic/data/motions/Locomotion \
--graph_save_path skillmimic/data/preprocess/locomotion.pkl \
--headless
Policy
After completing the above two steps, you can run the following command to train a ballplay skill policy:
python skillmimic/run.py \
--task SkillMimic2BallPlay \
--episode_length 60 \
--cfg_env skillmimic/data/cfg/skillmimic.yaml \
--cfg_train skillmimic/data/cfg/train/rlg/skillmimic.yaml \
--motion_file skillmimic/data/motions/Locomotion \
--reweight --reweight_alpha 1.0 \
--state_init_random_prob 0.1 \
--state_switch_prob 0.1 \
--state_search_to_align_reward \
--graph_file skillmimic/data/preprocess/locomotion.pkl \
--enable_buffernode \
--hist_length 60 \
--history_embedding_size 3 \
--hist_ckpt hist_encoder/Locomotion/hist_model.ckpt \
--headless \
BallPlay Match Policy โน๏ธโโ๏ธ
Run the following code to control a player with keyboard inputs. โ: dribble left, โ: dribble forward, โ: dribble right, W: shoot, E: layup
You can dribble to a target position and shoot
DISABLE_METRICS=1 python skillmimic/run.py \
--test \
--task HRLVirtual \
--projtype Mouse \
--num_envs 6 \
--episode_length 600 \
--cfg_env skillmimic/data/cfg/skillmimic_test.yaml \
--cfg_train skillmimic/data/cfg/train/rlg/hrl_humanoid_virtual.yaml \
--motion_file skillmimic/data/motions/BallPlay/rrun \
--llc_checkpoint models/BallPlay/SkillMimic-V2/model.pth \
--checkpoint models/Matchup/model.pth \
--hist_length 60 \
--history_embedding_size 3 \
--hist_ckpt hist_encoder/BallPlay/hist_model.ckpt
Our works on HOI Imitation Learning ๐
PhysHOI:Physics-Based Imitation of Dynamic Human-Object Interaction
SkillMimic: Learning Basketball Interaction Skills from Demonstrations
References ๐
If you find this repository useful for your research, please cite the following work.
@misc{yu2025skillmimicv2,
title={SkillMimic-V2: Learning Robust and Generalizable Interaction Skills from Sparse and Noisy Demonstrations },
author={Runyi Yu, Yinhuai Wang, Qihan Zhao, Hok Wai Tsui, Jingbo Wang, Ping Tan and Qifeng Chen},
year={2025},
eprint={2505.02094},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Acknowledgements ๐
The household manipulation dataset used in this work is derived from Parahome. If you utilize the processed motion data from our work in your research, please ensure proper citation of the original ParaHome publication.