Benchmarking 3D Pose and Shape Estimation Beyond Algorithms
February 13, 2026
[arXiv] • [Slides]
Getting started
Installation | Train | Evaluation | FLOPs
Experiments
Single-datasets | Mixed-datasets | Augmentations | Backbones | Losses | Backbone-initialisation | Algorithms | Downloads
Introduction
This repository builds upon MMHuman3D, an open source PyTorch-based codebase for the use of 3D human parametric models in computer vision and computer graphics. MMHuman3D is a part of the OpenMMLab project. The main branch works with PyTorch 1.7+.
These features will be contributed to MMHuman3D at a later date.
Major Features added to MMHuman3D
We have added multiple major features on top of MMHuman3D.
- Benchmarks on 31 datasets
- Benchmarks on 11 dataset combinations
- Benchmarks on 9 backbones and different initialisation
- Benchmarks on 9 augmentation techniques
- Provide trained models on optimal configurations for inference
- Evaluation on 5 test sets
- FLOPs calculation
Additional:
- Train annotation files for 31 datasets will be provided in the future
- Future work can easily obtain HMR baselines on its chosen dataset mixes and partitions using our provided pipeline and annotation files.
Experiments
Single-datasets
Supported datasets:
- AGORA (CVPR'2021)
- AI Challenger (ICME'2019)
- COCO (ECCV'2014)
- COCO-WholeBody (ECCV'2020)
- EFT-COCO-Part (3DV'2021)
- EFT-COCO (3DV'2021)
- EFT-LSPET (3DV'2021)
- EFT-OCHuman (3DV'2021)
- EFT-PoseTrack (3DV'2021)
- EFT-MPII (3DV'2021)
- Human3.6M (TPAMI'2014)
- InstaVariety (CVPR'2019)
- LIP (CVPR'2017)
- LSP (BMVC'2010)
- LSP-Extended (CVPR'2011)
- MPI-INF-3DHP (3DV'2017)
- MPII (CVPR'2014)
- MTP (CVPR'2021)
- MuCo-3DHP (3DV'2018)
- MuPoTs-3D (3DV'2018)
- OCHuman (CVPR'2019)
- 3DOH50K (CVPR'2020)
- Penn Action (ICCV'2013)
- 3D-People (ICCV'2019)
- PoseTrack18 (CVPR'2018)
- PROX (ICCV'2019)
- 3DPW (ECCV'2018)
- SURREAL (CVPR'2017)
- UP-3D (CVPR'2017)
- VLOG (CVPR'2019)
- CrowdPose (CVPR'2019)
Please refer to datasets.md for training configs and results.
Mixed-datasets
- Mix 1: H36M, MI, COCO
- Mix 2: H36M, MI, EFT-COCO
- Mix 3: H36M, MI, EFT-COCO, MPII
- Mix 4: H36M, MuCo, EFT-COCO
- Mix 5: H36M, MI, COCO, LSP, LSPET, MPII
- Mix 6: EFT-[COCO, MPII, LSPET], SPIN-MI, H36M
- Mix 7: EFT-[COCO, MPII, LSPET], MuCo, H36M, PROX
- Mix 8: EFT-[COCO, PT, LSPET], MI, H36M
- Mix 9: EFT-[COCO, PT, LSPET, OCH], MI, H36M
- Mix 10: PROX, MuCo, EFT-[COCO, PT, LSPET, OCH], UP-3D, MTP, Crowdpose
- Mix 11: EFT-[COCO, MPII, LSPET], MuCo, H36M
Please refer to mixed-datasets.md for training configs and results.
Backbones
- ResNet-50, -101, -152 (CVPR'2016)
- ResNeXt (CVPR'2017)
- HRNet (CVPR'2019)
- EfficientNet (ICML'2019)
- ViT (ICLR'2021)
- Swin (ICCV'2021)
- Twins (NeurIPS'2021)
Please refer to backbone.md for training configs and results.
Backbone-initialisation
We find that transferring knowledge from a pose estimation model gives more competitive performance than the default ImageNet initialisation.
Initialised backbones:
- ResNet-50 ImageNet (default)
- ResNet-50 MPII
- ResNet-50 COCO
- HRNet-W32 ImageNet
- HRNet-W32 MPII
- HRNet-W32 COCO
- Twins-SVT ImageNet
- Twins-SVT MPII
- Twins-SVT COCO
Please refer to backbone.md for training configs and results.
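In practice, pose initialisation amounts to loading only the backbone weights from a 2D pose-estimation checkpoint before HMR training, discarding the pose-specific head. A minimal sketch, assuming backbone keys carry a "backbone." prefix; the real key layout depends on the checkpoint:

```python
# Sketch: seed an HMR backbone from a pose-estimation checkpoint instead
# of ImageNet. The "backbone." / "keypoint_head." key names below are
# illustrative assumptions, not the layout of any specific checkpoint.

def filter_pose_checkpoint(state_dict, prefix="backbone."):
    """Keep only backbone weights and strip the prefix, so the
    pose-specific head never overwrites the HMR regressor."""
    return {
        key[len(prefix):]: value
        for key, value in state_dict.items()
        if key.startswith(prefix)
    }

# Toy checkpoint standing in for a COCO pose model.
ckpt = {
    "backbone.conv1.weight": "w1",
    "backbone.layer1.0.weight": "w2",
    "keypoint_head.final.weight": "w3",  # dropped by the filter
}
backbone_weights = filter_pose_checkpoint(ckpt)
```

With a real PyTorch model, the filtered dict would then be loaded with model.backbone.load_state_dict(backbone_weights, strict=False).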
Augmentations
New augmentations:
- Coarse dropout
- Grid dropout
- Photometric distortion
- Random crop
- Hard erasing
- Soft erasing
- Self-mixing
- Synthetic occlusion
- Synthetic occlusion over keypoints
Please refer to augmentation.md for training configs and results.
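To make the occlusion-style entries above concrete, here is a minimal sketch of synthetic occlusion over keypoints, assuming images as HxWx3 NumPy arrays and keypoints in pixel coordinates; the repository's actual transform (occluder textures, sampling strategy) may differ:

```python
import numpy as np

def occlude_keypoint(img, keypoint, size=8, fill=128):
    """Paste a square occluder centred on a chosen keypoint.
    Illustrative only: real pipelines often paste textured object
    patches rather than a flat grey square."""
    h, w = img.shape[:2]
    x, y = int(keypoint[0]), int(keypoint[1])
    half = size // 2
    x0, x1 = max(0, x - half), min(w, x + half)
    y0, y1 = max(0, y - half), min(h, y + half)
    out = img.copy()
    out[y0:y1, x0:x1] = fill  # patch is clipped at image borders
    return out

img = np.zeros((32, 32, 3), dtype=np.uint8)
occluded = occlude_keypoint(img, keypoint=(16, 16), size=8)
```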
Losses
We find that training with L1 loss gives more competitive performance. Please refer to mixed-datasets-l1.md for training configs and results.
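One intuition for the L1 result: L1 grows linearly with a residual while L2 grows quadratically, so a few badly annotated joints dominate the loss less. A toy comparison on made-up keypoint residuals (illustrative only, not the repository's loss code):

```python
import numpy as np

# Three 2D joints; the last one is a gross annotation outlier.
pred = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
gt   = np.array([[0.1, 0.0], [1.0, 0.9], [1.0, 1.0]])

l1 = np.abs(pred - gt).mean()    # outlier contributes linearly
l2 = ((pred - gt) ** 2).mean()   # outlier contributes quadratically
```

Here the single outlier makes the L2 value several times larger than the L1 value, which is one reason an L1 objective can be more robust on noisy pseudo-annotations.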
Downloads
We provide trained models from the optimal configurations for download and inference. Please refer to combine.md for training configs and results.
| Dataset | Backbone | 3DPW (PA-MPJPE) | Download |
|---|---|---|---|
| H36M, MI, COCO, LSP, LSPET, MPII | ResNet-50 | 51.66 | model |
| H36M, MI, COCO, LSP, LSPET, MPII | HRNet-W32 | 49.18 | model |
| H36M, MI, COCO, LSP, LSPET, MPII | Twins-SVT | 48.77 | model |
| H36M, MI, COCO, LSP, LSPET, MPII | Twins-SVT | 47.70 | model |
| EFT-[COCO, LSPET, MPII], H36M, SPIN-MI | HRNet-W32 | 47.68 | model |
| EFT-[COCO, LSPET, MPII], H36M, SPIN-MI | Twins-SVT | 47.31 | model |
| H36M, MI, EFT-COCO | HRNet-W32 | 48.08 | model |
| H36M, MI, EFT-COCO | Twins-SVT | 48.27 | model |
| H36M, MuCo, EFT-COCO | Twins-SVT | 47.92 | model |
Algorithms
We benchmarked our major findings on several algorithms and hope to add more in the future. Please refer to algorithms.md for training configs and logs.
- SPIN
- GraphCMR
- PARE
- Mesh Graphormer
Installation
General set-up instructions follow those of MMHuman3D. Please refer to install.md for installation.
Train
Training with a single / multiple GPUs
python tools/train.py ${CONFIG_FILE} ${WORK_DIR} --no-validate
Example: using 1 GPU to train HMR.
python tools/train.py ${CONFIG_FILE} ${WORK_DIR} --gpus 1 --no-validate
Training with Slurm
If you run MMHuman3D on a cluster managed with Slurm, you can use the script slurm_train.sh.
./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${WORK_DIR} ${GPU_NUM} --no-validate
Common optional arguments include:
- --resume-from ${CHECKPOINT_FILE}: Resume training from a previous checkpoint file.
- --no-validate: Do not evaluate the checkpoint during training.
Example: using 8 GPUs to train HMR on a slurm cluster.
./tools/slurm_train.sh my_partition my_job configs/hmr/resnet50_hmr_pw3d.py work_dirs/hmr 8 --no-validate
You can check slurm_train.sh for full arguments and environment variables.
Evaluation
There are five benchmarks for evaluation:
- 3DPW-test (P2)
- H36m-test (P2)
- EFT-COCO-val
- EFT-LSPET-test
- EFT-OCHuman-test
Evaluate with a single GPU / multiple GPUs
python tools/test.py ${CONFIG} --work-dir=${WORK_DIR} ${CHECKPOINT} --metrics=${METRICS}
Example:
python tools/test.py configs/hmr/resnet50_hmr_pw3d.py --work-dir=work_dirs/hmr work_dirs/hmr/latest.pth --metrics pa-mpjpe mpjpe
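For reference, PA-MPJPE is the mean per-joint position error after a Procrustes (similarity) alignment of the prediction to the ground truth. A minimal NumPy sketch of the metric; the repository's evaluation code may differ in conventions such as units (mm) or joint subsets:

```python
import numpy as np

def pa_mpjpe(pred, gt):
    """MPJPE after optimal similarity alignment (scale, rotation,
    translation) of pred (Jx3) to gt (Jx3)."""
    mu_p, mu_g = pred.mean(0), gt.mean(0)
    p, g = pred - mu_p, gt - mu_g
    u, s, vt = np.linalg.svd(p.T @ g)  # SVD of the covariance matrix
    r = vt.T @ u.T                     # optimal rotation
    if np.linalg.det(r) < 0:           # fix an improper reflection
        vt[-1] *= -1
        s[-1] *= -1
        r = vt.T @ u.T
    scale = s.sum() / (p ** 2).sum()   # optimal isotropic scale
    aligned = scale * p @ r.T + mu_g
    return np.linalg.norm(aligned - gt, axis=1).mean()
```

By construction, a prediction that differs from the ground truth only by a similarity transform scores (numerically) zero.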
Evaluate with slurm
If you run MMHuman3D on a cluster managed with Slurm, you can use the script slurm_test.sh.
./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} ${CONFIG} ${WORK_DIR} ${CHECKPOINT} ${GPU_NUM} --metrics ${METRICS}
Example:
./tools/slurm_test.sh my_partition test_hmr configs/hmr/resnet50_hmr_pw3d.py work_dirs/hmr work_dirs/hmr/latest.pth 8 --metrics pa-mpjpe mpjpe
FLOPs
tools/get_flops.py is a script adapted from flops-counter.pytorch and MMDetection to compute the FLOPs and params of a given model.
python tools/get_flops.py ${CONFIG_FILE} [--shape ${INPUT_SHAPE}]
You will get results like this:
==============================
Input shape: (3, 1280, 800)
Flops: 239.32 GFLOPs
Params: 37.74 M
==============================
Note: This tool is still experimental and we do not guarantee that the number is absolutely correct. You may use the result for simple comparisons, but double-check it before adopting it in technical reports or papers.
- FLOPs are related to the input shape while parameters are not. The default input shape is (1, 3, 224, 224).
- Some operators, such as GN and custom operators, are not counted in FLOPs. Refer to mmcv.cnn.get_model_complexity_info() for details.
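The first note can be checked by hand for a single convolution layer: parameters depend only on the kernel, while FLOPs also scale with the output resolution. Illustrative arithmetic only, counting one multiply-accumulate per kernel element per output position, which is the convention most FLOPs counters use:

```python
def conv2d_params(c_in, c_out, k):
    """Weights plus one bias per output channel."""
    return c_out * (c_in * k * k + 1)

def conv2d_flops(c_in, c_out, k, h_out, w_out):
    """One multiply-accumulate per kernel element, per output position."""
    return c_out * h_out * w_out * (c_in * k * k)

# The same 7x7 conv layer evaluated at two input resolutions:
params = conv2d_params(3, 64, 7)                # fixed: 9472
flops_small = conv2d_flops(3, 64, 7, 112, 112)
flops_large = conv2d_flops(3, 64, 7, 400, 640)
```

Growing the input resolution grows the FLOPs proportionally while leaving the parameter count unchanged, which is why the tool reports FLOPs for one specific input shape.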
Citation
If you find our work useful for your research, please consider citing the paper:
@inproceedings{pang2022benchmarking,
title={Benchmarking and Analyzing 3D Human Pose and Shape Estimation Beyond Algorithms},
author={Pang, Hui En and Cai, Zhongang and Yang, Lei and Zhang, Tianwei and Liu, Ziwei},
booktitle={NeurIPS},
year={2022}
}
License
Distributed under the S-Lab License. See LICENSE for more information.
Acknowledgements
This study is supported by NTU NAP, MOE AcRF Tier 2 (T2EP20221-0033), and under the RIE2020 Industry Alignment Fund – Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contribution from the industry partner(s).
Explore More Motrix Projects
Motion Capture
- [SMPL-X] [TPAMI'25] SMPLest-X: An extended version of SMPLer-X with stronger foundation models.
- [SMPL-X] [NeurIPS'23] SMPLer-X: Scaling up EHPS towards a family of generalist foundation models.
- [SMPL-X] [ECCV'24] WHAC: World-grounded human pose and camera estimation from monocular videos.
- [SMPL-X] [CVPR'24] AiOS: An all-in-one-stage pipeline combining detection and 3D human reconstruction.
- [SMPL-X] [NeurIPS'23] RoboSMPLX: A framework to enhance the robustness of whole-body pose and shape estimation.
- [SMPL-X] [ICML'25] ADHMR: A framework to align diffusion-based human mesh recovery methods via direct preference optimization.
- [SMPL-X] MKA: Full-body 3D mesh reconstruction from single- or multi-view RGB videos.
- [SMPL] [ICCV'23] Zolly: 3D human mesh reconstruction from perspective-distorted images.
- [SMPL] [IJCV'26] PointHPS: 3D HPS from point clouds captured in real-world settings.
- [SMPL] [NeurIPS'22] HMR-Benchmarks: A comprehensive benchmark of HPS datasets, backbones, and training strategies.
Motion Generation
- [SMPL-X] [ICLR'26] ViMoGen: A comprehensive framework that transfers knowledge from ViGen to MoGen across data, modeling, and evaluation.
- [SMPL-X] [ECCV'24] LMM: Large Motion Model for Unified Multi-Modal Motion Generation.
- [SMPL-X] [NeurIPS'23] FineMoGen: Fine-Grained Spatio-Temporal Motion Generation and Editing.
- [SMPL] InfiniteDance: A large-scale 3D dance dataset and an MLLM-based music-to-dance model designed for robust in-the-wild generalization.
- [SMPL] [NeurIPS'23] InsActor: Generating physics-based human motions from language and waypoint conditions via diffusion policies.
- [SMPL] [ICCV'23] ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model.
- [SMPL] [TPAMI'24] MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model.