May 24, 2026
Scale. Semantics. Fidelity.
Rawal Khirodkar · He Wen · Julieta Martinez · Yuan Dong · Su Zhaoen · Shunsuke Saito
ICLR 2026
A family of high-resolution transformers pretrained on 1 billion human images, achieving state-of-the-art performance across diverse human-centric tasks — pose estimation, body-part segmentation, surface normals, pointmaps, and human matting.
🤗 Demos: Pose · Seg · Normal · Pointmap · Matting
📣 News
- May 15, 2026: Sapiens2-1B human matting model is released.
- April 24, 2026: Initial Sapiens2 release — pose, body-part segmentation, surface normals, and pointmaps.
⚡ Quick Start
Run a pretrained backbone forward pass — only torch and safetensors needed:
```python
import os
import torch
from safetensors.torch import load_file
from sapiens.backbones.standalone.sapiens2 import Sapiens2

# Build the model and load a pretrained checkpoint
model = Sapiens2(arch="sapiens2_1b", img_size=(1024, 768), patch_size=16).eval().cuda()  # img_size is (H, W)
ckpt = os.path.expanduser("~/sapiens2_host/pretrain/sapiens2_1b_pretrain.safetensors")
model.load_state_dict(load_file(ckpt))

# Forward pass on a single image (RGB; ImageNet normalization recommended).
# A random tensor stands in for a preprocessed 1024x768 image here.
x = torch.randn(1, 3, 1024, 768).cuda()
with torch.no_grad():
    features = model(x)[0]  # dense backbone features
```
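The snippet above feeds random noise. For a real photo, the sketch below shows one way to build the input tensor, assuming the standard ImageNet mean/std (the README recommends ImageNet normalization but does not list the exact values used in training) and an image already decoded as RGB uint8 and resized to 1024×768 with your image library of choice. `to_model_input` is a hypothetical helper, not part of the sapiens package, and it uses NumPy.

```python
import numpy as np

# Standard ImageNet channel statistics (RGB order, for pixel values in [0, 1]).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def to_model_input(image_rgb: np.ndarray, size: tuple[int, int] = (1024, 768)) -> np.ndarray:
    """Convert an HxWx3 uint8 RGB image into a normalized 1x3xHxW float32 array.

    Hypothetical helper: the image must already be resized to `size` (H, W).
    """
    expected = (size[0], size[1], 3)
    if image_rgb.shape != expected:
        raise ValueError(f"expected shape {expected}, got {image_rgb.shape}")
    x = image_rgb.astype(np.float32) / 255.0  # scale pixel values to [0, 1]
    x = (x - IMAGENET_MEAN) / IMAGENET_STD  # per-channel normalization, broadcast over H and W
    return np.ascontiguousarray(x.transpose(2, 0, 1)[None])  # HWC -> 1xCxHxW
```

With PyTorch, `torch.from_numpy(to_model_input(img)).cuda()` can then take the place of the random `x` in the Quick Start.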
🪶 Zero-Dependency Usage
The Quick Start snippet above imports from a single self-contained file — torch (plus safetensors for checkpoint loading) is all you need. Drop the file into your project and you're done:
```bash
curl -O https://raw.githubusercontent.com/facebookresearch/sapiens2/main/sapiens/backbones/standalone/sapiens2.py
```
For Sapiens v1, grab sapiens.py instead.
🧬 Model Card
| Model | Params | FLOPs | Embed dim | Layers | Heads |
|---|---|---|---|---|---|
| Sapiens2-0.1B | 0.114 B | 0.342 T | 768 | 12 | 12 |
| Sapiens2-0.4B | 0.398 B | 1.260 T | 1024 | 24 | 16 |
| Sapiens2-0.8B | 0.818 B | 2.592 T | 1280 | 32 | 16 |
| Sapiens2-1B | 1.462 B | 4.715 T | 1536 | 40 | 24 |
| Sapiens2-1B (4K) | 1.607 B | — | 1536 | 40 | 24 |
| Sapiens2-5B | 5.071 B | 15.722 T | 2432 | 56 | 32 |
All models use patch size 16 and are trained at 1024×768 (H×W) resolution, except Sapiens2-1B (4K), which is trained at 4096×3072 with use_tokenizer=True.
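A few quantities follow directly from the table. The sketch below, assuming a plain non-overlapping 16-pixel patch embedding, computes the token grid (64×48 = 3,072 tokens at 1024×768) and the per-head width `embed_dim / heads` (64 for most sizes, 80 for 0.8B, 76 for 5B). The dictionary keys are just the table labels, not `arch` strings accepted by `Sapiens2`. If the Quick Start's dense features come back at patch resolution (an assumption, not confirmed here), their spatial size would be 64×48. At 4096×3072 the same arithmetic gives 256×192 = 49,152 patches; how `use_tokenizer=True` changes the sequence length is not described in this README.

```python
# Hyperparameters copied from the model card table above (table labels as keys).
ARCHS = {
    "Sapiens2-0.1B": {"embed_dim": 768, "layers": 12, "heads": 12},
    "Sapiens2-0.4B": {"embed_dim": 1024, "layers": 24, "heads": 16},
    "Sapiens2-0.8B": {"embed_dim": 1280, "layers": 32, "heads": 16},
    "Sapiens2-1B": {"embed_dim": 1536, "layers": 40, "heads": 24},
    "Sapiens2-5B": {"embed_dim": 2432, "layers": 56, "heads": 32},
}
PATCH_SIZE = 16


def token_grid(img_size: tuple[int, int] = (1024, 768), patch_size: int = PATCH_SIZE) -> tuple[int, int]:
    """Patches along (H, W), assuming a plain non-overlapping patch embedding."""
    h, w = img_size
    if h % patch_size or w % patch_size:
        raise ValueError(f"{img_size} is not divisible by patch size {patch_size}")
    return h // patch_size, w // patch_size


def head_dim(name: str) -> int:
    """Per-head width implied by the table: embed_dim / heads."""
    cfg = ARCHS[name]
    dim, rem = divmod(cfg["embed_dim"], cfg["heads"])
    if rem:
        raise ValueError(f"{name}: embed_dim is not divisible by heads")
    return dim


if __name__ == "__main__":
    gh, gw = token_grid()
    for name in ARCHS:
        print(f"{name}: {gh}x{gw} = {gh * gw} tokens, head dim {head_dim(name)}")
```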
📦 Getting Started
Clone the repository:
```bash
git clone https://github.com/facebookresearch/sapiens2.git
cd sapiens2
export SAPIENS_ROOT=$(pwd)
```
Install (requires Python ≥3.12 and PyTorch ≥2.7):
```bash
pip install -e .
```
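To confirm an environment meets the Python ≥3.12 and PyTorch ≥2.7 minimums, a stdlib-only check along these lines may help. It is a sketch, not a script shipped with the repo; `parse_release` and `check_requirements` are hypothetical names. It reads the installed torch version from package metadata without importing torch, and strips local suffixes such as `+cu126`.

```python
import re
import sys
from importlib.metadata import PackageNotFoundError, version


def parse_release(ver: str) -> tuple[int, ...]:
    """Leading numeric release of a version string, e.g. '2.7.1+cu126' -> (2, 7, 1)."""
    m = re.match(r"\d+(?:\.\d+)*", ver)
    return tuple(int(p) for p in m.group(0).split(".")) if m else ()


def check_requirements(min_python=(3, 12), min_torch=(2, 7)) -> list[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info[:2] < min_python:
        current = ".".join(map(str, sys.version_info[:3]))
        problems.append(f"Python {current} < {'.'.join(map(str, min_python))}")
    try:
        torch_ver = version("torch")
    except PackageNotFoundError:
        problems.append("torch is not installed")
    else:
        if parse_release(torch_ver) < min_torch:
            problems.append(f"torch {torch_ver} < {'.'.join(map(str, min_torch))}")
    return problems


if __name__ == "__main__":
    issues = check_requirements()
    print("OK" if not issues else "\n".join(issues))
```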
Download checkpoints from MODEL_ZOO.md. Place downloaded files under $SAPIENS_CHECKPOINT_ROOT (default: ~/sapiens2_host):
```text
sapiens2_host/
├── pretrain/
│   ├── sapiens2_{0.1b,0.4b,0.8b,1b,5b}_pretrain.safetensors
│   └── sapiens2_1b_4k_pretrain.safetensors
├── pose/
│   └── sapiens2_{0.4b,0.8b,1b,5b}_pose.safetensors
├── seg/
│   └── sapiens2_{0.4b,0.8b,1b,5b}_seg.safetensors
├── normal/
│   └── sapiens2_{0.4b,0.8b,1b,5b}_normal.safetensors
├── pointmap/
│   └── sapiens2_{0.4b,0.8b,1b,5b}_pointmap.safetensors
├── matting/
│   └── sapiens2_1b_matting.safetensors
└── detector/            # [optional] only needed for pose inference
    └── detr-resnet-101-dc5/
```
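The layout above is regular enough to resolve paths programmatically. The sketch below is a hypothetical helper (`checkpoint_path` is not part of the sapiens package) that honors `$SAPIENS_CHECKPOINT_ROOT` with the documented default of `~/sapiens2_host` and rejects task/size combinations that the tree does not list. The optional detector directory is left out.

```python
import os
from pathlib import Path

# Available model sizes per task, transcribed from the directory tree above.
AVAILABLE = {
    "pretrain": ("0.1b", "0.4b", "0.8b", "1b", "5b", "1b_4k"),
    "pose": ("0.4b", "0.8b", "1b", "5b"),
    "seg": ("0.4b", "0.8b", "1b", "5b"),
    "normal": ("0.4b", "0.8b", "1b", "5b"),
    "pointmap": ("0.4b", "0.8b", "1b", "5b"),
    "matting": ("1b",),
}


def checkpoint_path(task: str, size: str) -> Path:
    """Hypothetical helper: expected .safetensors path for a task and model size."""
    if task not in AVAILABLE:
        raise KeyError(f"unknown task {task!r}; expected one of {sorted(AVAILABLE)}")
    if size not in AVAILABLE[task]:
        raise ValueError(f"no {size} checkpoint for {task}; available: {AVAILABLE[task]}")
    root = Path(os.environ.get("SAPIENS_CHECKPOINT_ROOT", "~/sapiens2_host")).expanduser()
    return root / task / f"sapiens2_{size}_{task}.safetensors"
```

For example, `model.load_state_dict(load_file(checkpoint_path("pretrain", "1b")))` would load the same file as the hardcoded path in the Quick Start when the environment variable is unset.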
🎯 Vision Tasks
| Task | Description | Inference | Train |
|---|---|---|---|
| Pose Estimation | 308 whole-body keypoints | docs/POSE.md | docs/train/POSE.md |
| Body-Part Segmentation | 29 body parts | docs/SEG.md | docs/train/SEG.md |
| Surface Normal Estimation | per-pixel normals | docs/NORMAL.md | docs/train/NORMAL.md |
| Pointmap Estimation | per-pixel 3D points | docs/POINTMAP.md | docs/train/POINTMAP.md |
| Human Matting | alpha matte + foreground | docs/MATTING.md | docs/train/MATTING.md |
✨ Acknowledgements
We would like to acknowledge the contributions of DINOv3, OpenMMLab, and Accelerate, which this project benefits from.
🤝 Contributing
For questions or issues, please open an issue on GitHub. See CONTRIBUTING and the Code of Conduct.
License
This project is licensed under the Sapiens2 License.
📚 Citation
If you use Sapiens2 in your research, please consider citing us.
```bibtex
@article{khirodkarsapiens2,
  title={Sapiens2},
  author={Khirodkar, Rawal and Wen, He and Martinez, Julieta and Dong, Yuan and Su, Zhaoen and Saito, Shunsuke},
  journal={arXiv preprint arXiv:2604.21681},
  year={2026}
}
```