Benchmarking 3D Pose and Shape Estimation Beyond Algorithms
February 13, 2026
[arXiv] • [Slides]
Getting started
Installation | Train | Evaluation | FLOPs
Experiments
Single-datasets | Mixed-datasets | Augmentations | Backbones | Losses | Backbone-initialisation | Algorithms | Downloads
Introduction
This repository builds upon MMHuman3D, an open source PyTorch-based codebase for the use of 3D human parametric models in computer vision and computer graphics. MMHuman3D is a part of the OpenMMLab project. The main branch works with PyTorch 1.7+.
These features will be contributed to MMHuman3D at a later date.
Major Features added to MMHuman3D
We have added multiple major features on top of MMHuman3D.
- Benchmarks on 31 datasets
- Benchmarks on 11 dataset combinations
- Benchmarks on 9 backbones and different initialisation
- Benchmarks on 9 augmentation techniques
- Provide trained models on optimal configurations for inference
- Evaluation on 5 test sets
- FLOPs calculation
Additional:
- Train annotation files for 31 datasets will be provided in the future
- Future work can easily obtain HMR baselines on its chosen dataset mixes and partitions using our provided pipeline and annotation files.
Experiments
Single-datasets
Supported datasets:
- AGORA (CVPR'2021)
- AI Challenger (ICME'2019)
- COCO (ECCV'2014)
- COCO-WholeBody (ECCV'2020)
- EFT-COCO-Part (3DV'2021)
- EFT-COCO (3DV'2021)
- EFT-LSPET (3DV'2021)
- EFT-OCHuman (3DV'2021)
- EFT-PoseTrack (3DV'2021)
- EFT-MPII (3DV'2021)
- Human3.6M (TPAMI'2014)
- InstaVariety (CVPR'2019)
- LIP (CVPR'2017)
- LSP (BMVC'2010)
- LSP-Extended (CVPR'2011)
- MPI-INF-3DHP (3DV'2017)
- MPII (CVPR'2014)
- MTP (CVPR'2021)
- MuCo-3DHP (3DV'2018)
- MuPoTs-3D (3DV'2018)
- OCHuman (CVPR'2019)
- 3DOH50K (CVPR'2020)
- Penn Action (ICCV'2013)
- 3D-People (ICCV'2019)
- PoseTrack18 (CVPR'2018)
- PROX (ICCV'2019)
- 3DPW (ECCV'2018)
- SURREAL (CVPR'2017)
- UP-3D (CVPR'2017)
- VLOG (CVPR'2019)
- CrowdPose (CVPR'2019)
Please refer to datasets.md for training configs and results.
Mixed-datasets
- Mix 1: H36M, MI, COCO
- Mix 2: H36M, MI, EFT-COCO
- Mix 3: H36M, MI, EFT-COCO, MPII
- Mix 4: H36M, MuCo, EFT-COCO
- Mix 5: H36M, MI, COCO, LSP, LSPET, MPII
- Mix 6: EFT-[COCO, MPII, LSPET], SPIN-MI, H36M
- Mix 7: EFT-[COCO, MPII, LSPET], MuCo, H36M, PROX
- Mix 8: EFT-[COCO, PT, LSPET], MI, H36M
- Mix 9: EFT-[COCO, PT, LSPET, OCH], MI, H36M
- Mix 10: PROX, MuCo, EFT-[COCO, PT, LSPET, OCH], UP-3D, MTP, Crowdpose
- Mix 11: EFT-[COCO, MPII, LSPET], MuCo, H36M
Please refer to mixed-datasets.md for training configs and results.
Backbones
- ResNet-50, -101, -152 (CVPR'2016)
- ResNeXt (CVPR'2017)
- HRNet (CVPR'2019)
- EfficientNet (ICML'2019)
- ViT (ICLR'2021)
- Swin (ICCV'2021)
- Twins (NeurIPS'2021)
Please refer to backbone.md for training configs and results.
Backbone-initialisation
We find that transferring knowledge from a pose estimation model gives more competitive performance than the default ImageNet initialisation.
Initialised backbones:
- ResNet-50 ImageNet (default)
- ResNet-50 MPII
- ResNet-50 COCO
- HRNet-W32 ImageNet
- HRNet-W32 MPII
- HRNet-W32 COCO
- Twins-SVT ImageNet
- Twins-SVT MPII
- Twins-SVT COCO
Please refer to backbone.md for training configs and results.
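In practice, pose initialisation amounts to loading only the backbone weights from a 2D pose-estimation checkpoint before HMR training, discarding the pose-specific head. A minimal sketch, assuming backbone keys carry a "backbone." prefix; the real key layout depends on the checkpoint:

```python
# Sketch: seed an HMR backbone from a pose-estimation checkpoint instead
# of ImageNet. The "backbone." / "keypoint_head." key names below are
# illustrative assumptions, not the layout of any specific checkpoint.

def filter_pose_checkpoint(state_dict, prefix="backbone."):
    """Keep only backbone weights and strip the prefix, so the
    pose-specific head never overwrites the HMR regressor."""
    return {
        key[len(prefix):]: value
        for key, value in state_dict.items()
        if key.startswith(prefix)
    }

# Toy checkpoint standing in for a COCO pose model.
ckpt = {
    "backbone.conv1.weight": "w1",
    "backbone.layer1.0.weight": "w2",
    "keypoint_head.final.weight": "w3",  # dropped by the filter
}
backbone_weights = filter_pose_checkpoint(ckpt)
```

With a real PyTorch model, the filtered dict would then be loaded with model.backbone.load_state_dict(backbone_weights, strict=False).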
Augmentations
New augmentations:
- Coarse dropout
- Grid dropout
- Photometric distortion
- Random crop
- Hard erasing
- Soft erasing
- Self-mixing
- Synthetic occlusion
- Synthetic occlusion over keypoints
Please refer to augmentation.md for training configs and results.
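To make the occlusion-style entries above concrete, here is a minimal sketch of synthetic occlusion over keypoints, assuming images as HxWx3 NumPy arrays and keypoints in pixel coordinates; the repository's actual transform (occluder textures, sampling strategy) may differ:

```python
import numpy as np

def occlude_keypoint(img, keypoint, size=8, fill=128):
    """Paste a square occluder centred on a chosen keypoint.
    Illustrative only: real pipelines often paste textured object
    patches rather than a flat grey square."""
    h, w = img.shape[:2]
    x, y = int(keypoint[0]), int(keypoint[1])
    half = size // 2
    x0, x1 = max(0, x - half), min(w, x + half)
    y0, y1 = max(0, y - half), min(h, y + half)
    out = img.copy()
    out[y0:y1, x0:x1] = fill  # patch is clipped at image borders
    return out

img = np.zeros((32, 32, 3), dtype=np.uint8)
occluded = occlude_keypoint(img, keypoint=(16, 16), size=8)
```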
Losses
We find that training with L1 loss gives more competitive performance. Please refer to mixed-datasets-l1.md for training configs and results.
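One intuition for the L1 result: L1 grows linearly with a residual while L2 grows quadratically, so a few badly annotated joints dominate the loss less. A toy comparison on made-up keypoint residuals (illustrative only, not the repository's loss code):

```python
import numpy as np

# Three 2D joints; the last one is a gross annotation outlier.
pred = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
gt   = np.array([[0.1, 0.0], [1.0, 0.9], [1.0, 1.0]])

l1 = np.abs(pred - gt).mean()    # outlier contributes linearly
l2 = ((pred - gt) ** 2).mean()   # outlier contributes quadratically
```

Here the single outlier makes the L2 value several times larger than the L1 value, which is one reason an L1 objective can be more robust on noisy pseudo-annotations.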
Downloads
We provide trained models from the optimal configurations for download and inference. Please refer to combine.md for training configs and results.
| Dataset | Backbone | 3DPW (PA-MPJPE) | Download |
|---|---|---|---|
| H36M, MI, COCO, LSP, LSPET, MPII | ResNet-50 | 51.66 | model |
| H36M, MI, COCO, LSP, LSPET, MPII | HRNet-W32 | 49.18 | model |
| H36M, MI, COCO, LSP, LSPET, MPII | Twins-SVT | 48.77 | model |
| H36M, MI, COCO, LSP, LSPET, MPII | Twins-SVT | 47.70 | model |
| EFT-[COCO, LSPET, MPII], H36M, SPIN-MI | HRNet-W32 | 47.68 | model |
| EFT-[COCO, LSPET, MPII], H36M, SPIN-MI | Twins-SVT | 47.31 | model |
| H36M, MI, EFT-COCO | HRNet-W32 | 48.08 | model |
| H36M, MI, EFT-COCO | Twins-SVT | 48.27 | model |
| H36M, MuCo, EFT-COCO | Twins-SVT | 47.92 | model |
Algorithms
We benchmarked our major findings on several algorithms and hope to add more in the future. Please refer to algorithms.md for training configs and logs.
- SPIN
- GraphCMR
- PARE
- Mesh Graphormer
Installation
General set-up instructions follow those of MMHuman3D. Please refer to install.md for installation.
Train
Training with a single / multiple GPUs
python tools/train.py ${CONFIG_FILE} ${WORK_DIR} --no-validate
Example: using 1 GPU to train HMR.
python tools/train.py ${CONFIG_FILE} ${WORK_DIR} --gpus 1 --no-validate
Training with Slurm
If you run MMHuman3D on a cluster managed with Slurm, you can use the script slurm_train.sh.
./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${WORK_DIR} ${GPU_NUM} --no-validate
Common optional arguments include:
- --resume-from ${CHECKPOINT_FILE}: Resume training from a previous checkpoint file.
- --no-validate: Do not evaluate the checkpoint during training.
Example: using 8 GPUs to train HMR on a slurm cluster.
./tools/slurm_train.sh my_partition my_job configs/hmr/resnet50_hmr_pw3d.py work_dirs/hmr 8 --no-validate
You can check slurm_train.sh for full arguments and environment variables.
Evaluation
There are five benchmarks for evaluation:
- 3DPW-test (P2)
- H36m-test (P2)
- EFT-COCO-val
- EFT-LSPET-test
- EFT-OCHuman-test
Evaluate with a single GPU / multiple GPUs
python tools/test.py ${CONFIG} --work-dir=${WORK_DIR} ${CHECKPOINT} --metrics=${METRICS}
Example:
python tools/test.py configs/hmr/resnet50_hmr_pw3d.py --work-dir=work_dirs/hmr work_dirs/hmr/latest.pth --metrics pa-mpjpe mpjpe
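For reference, PA-MPJPE is the mean per-joint position error after a Procrustes (similarity) alignment of the prediction to the ground truth. A minimal NumPy sketch of the metric; the repository's evaluation code may differ in conventions such as units (mm) or joint subsets:

```python
import numpy as np

def pa_mpjpe(pred, gt):
    """MPJPE after optimal similarity alignment (scale, rotation,
    translation) of pred (Jx3) to gt (Jx3)."""
    mu_p, mu_g = pred.mean(0), gt.mean(0)
    p, g = pred - mu_p, gt - mu_g
    u, s, vt = np.linalg.svd(p.T @ g)  # SVD of the covariance matrix
    r = vt.T @ u.T                     # optimal rotation
    if np.linalg.det(r) < 0:           # fix an improper reflection
        vt[-1] *= -1
        s[-1] *= -1
        r = vt.T @ u.T
    scale = s.sum() / (p ** 2).sum()   # optimal isotropic scale
    aligned = scale * p @ r.T + mu_g
    return np.linalg.norm(aligned - gt, axis=1).mean()
```

By construction, a prediction that differs from the ground truth only by a similarity transform scores (numerically) zero.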
Evaluate with slurm
If you run MMHuman3D on a cluster managed with Slurm, you can use the script slurm_test.sh.
./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} ${CONFIG} ${WORK_DIR} ${CHECKPOINT} ${GPU_NUM} --metrics ${METRICS}
Example:
./tools/slurm_test.sh my_partition test_hmr configs/hmr/resnet50_hmr_pw3d.py work_dirs/hmr work_dirs/hmr/latest.pth 8 --metrics pa-mpjpe mpjpe
FLOPs
tools/get_flops.py is a script adapted from flops-counter.pytorch and MMDetection to compute the FLOPs and params of a given model.
python tools/get_flops.py ${CONFIG_FILE} [--shape ${INPUT_SHAPE}]
You will get results like this:
==============================
Input shape: (3, 1280, 800)
Flops: 239.32 GFLOPs
Params: 37.74 M
==============================
Note: This tool is still experimental and we do not guarantee that the number is absolutely correct. You may use the result for simple comparisons, but double-check it before adopting it in technical reports or papers.
- FLOPs are related to the input shape while parameters are not. The default input shape is (1, 3, 224, 224).
- Some operators, such as GN and custom operators, are not counted in FLOPs. Refer to mmcv.cnn.get_model_complexity_info() for details.
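The first note can be checked by hand for a single convolution layer: parameters depend only on the kernel, while FLOPs also scale with the output resolution. Illustrative arithmetic only, counting one multiply-accumulate per kernel element per output position, which is the convention most FLOPs counters use:

```python
def conv2d_params(c_in, c_out, k):
    """Weights plus one bias per output channel."""
    return c_out * (c_in * k * k + 1)

def conv2d_flops(c_in, c_out, k, h_out, w_out):
    """One multiply-accumulate per kernel element, per output position."""
    return c_out * h_out * w_out * (c_in * k * k)

# The same 7x7 conv layer evaluated at two input resolutions:
params = conv2d_params(3, 64, 7)                # fixed: 9472
flops_small = conv2d_flops(3, 64, 7, 112, 112)
flops_large = conv2d_flops(3, 64, 7, 400, 640)
```

Growing the input resolution grows the FLOPs proportionally while leaving the parameter count unchanged, which is why the tool reports FLOPs for one specific input shape.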
Citation
If you find our work useful for your research, please consider citing the paper:
@inproceedings{pang2022benchmarking,
title={Benchmarking and Analyzing 3D Human Pose and Shape Estimation Beyond Algorithms},
author={Pang, Hui En and Cai, Zhongang and Yang, Lei and Zhang, Tianwei and Liu, Ziwei},
booktitle={NeurIPS},
year={2022}
}
License
Distributed under the S-Lab License. See LICENSE for more information.
Acknowledgements
This study is supported by NTU NAP, MOE AcRF Tier 2 (T2EP20221-0033), and under the RIE2020 Industry Alignment Fund – Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contribution from the industry partner(s).
Explore More Motrix Projects
Motion Capture
- [SMPL-X] [TPAMI'25] SMPLest-X: An extended version of SMPLer-X with stronger foundation models.
- [SMPL-X] [NeurIPS'23] SMPLer-X: Scaling up EHPS towards a family of generalist foundation models.
- [SMPL-X] [ECCV'24] WHAC: World-grounded human pose and camera estimation from monocular videos.
- [SMPL-X] [CVPR'24] AiOS: An all-in-one-stage pipeline combining detection and 3D human reconstruction.
- [SMPL-X] [NeurIPS'23] RoboSMPLX: A framework to enhance the robustness of whole-body pose and shape estimation.
- [SMPL-X] [ICML'25] ADHMR: A framework to align diffusion-based human mesh recovery methods via direct preference optimization.
- [SMPL-X] MKA: Full-body 3D mesh reconstruction from single- or multi-view RGB videos.
- [SMPL] [ICCV'23] Zolly: 3D human mesh reconstruction from perspective-distorted images.
- [SMPL] [IJCV'26] PointHPS: 3D HPS from point clouds captured in real-world settings.
- [SMPL] [NeurIPS'22] HMR-Benchmarks: A comprehensive benchmark of HPS datasets, backbones, and training strategies.
Motion Generation
- [SMPL-X] [ICLR'26] ViMoGen: A comprehensive framework that transfers knowledge from ViGen to MoGen across data, modeling, and evaluation.
- [SMPL-X] [ECCV'24] LMM: Large Motion Model for Unified Multi-Modal Motion Generation.
- [SMPL-X] [NeurIPS'23] FineMoGen: Fine-Grained Spatio-Temporal Motion Generation and Editing.
- [SMPL] InfiniteDance: A large-scale 3D dance dataset and an MLLM-based music-to-dance model designed for robust in-the-wild generalization.
- [SMPL] [NeurIPS'23] InsActor: Generating physics-based human motions from language and waypoint conditions via diffusion policies.
- [SMPL] [ICCV'23] ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model.
- [SMPL] [TPAMI'24] MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model.