May 24, 2026

Sapiens2

Scale. Semantics. Fidelity.

Rawal Khirodkar · He Wen · Julieta Martinez · Yuan Dong · Su Zhaoen · Shunsuke Saito

ICLR 2026

Project Page · Paper PDF · HuggingFace Hub

A family of high-resolution transformers pretrained on 1 billion human images, achieving state-of-the-art performance across diverse human-centric tasks — pose estimation, body-part segmentation, surface normals, pointmaps, and human matting.

🤗 Demos: Pose · Seg · Normal · Pointmap · Matting

📣 News

  • May 15, 2026: Sapiens2-1B human matting model is released.
  • April 24, 2026: Initial Sapiens2 release — pose, body-part segmentation, surface normals, and pointmaps.

⚡ Quick Start

Run a pretrained backbone forward pass — only torch and safetensors needed:

import os
import torch
from safetensors.torch import load_file
from sapiens.backbones.standalone.sapiens2 import Sapiens2

# Build the model and load a pretrained checkpoint
model = Sapiens2(arch="sapiens2_1b", img_size=(1024, 768), patch_size=16).eval().cuda()  # img_size is (H, W)
ckpt = os.path.expanduser("~/sapiens2_host/pretrain/sapiens2_1b_pretrain.safetensors")
model.load_state_dict(load_file(ckpt))

# Forward pass on a single image (RGB; ImageNet normalization recommended)
x = torch.randn(1, 3, 1024, 768).cuda()
with torch.no_grad():
    features = model(x)[0]  # dense backbone features
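
The snippet above feeds random noise to the model. For a real photo, the comment recommends RGB input with ImageNet normalization; the sketch below shows one way to do that with torch alone. The function name `preprocess`, the scaling to [0, 1], the bilinear resize, and the standard ImageNet mean and standard deviation values are assumptions for illustration, not taken from the Sapiens2 code, so check the task docs for the exact pipeline.

```python
import torch
import torch.nn.functional as F

# Standard ImageNet channel statistics for RGB values scaled to [0, 1]
# (assumed here; the repository's own pipeline may use different values).
IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)


def preprocess(image: torch.Tensor, size=(1024, 768)) -> torch.Tensor:
    """Convert an (H, W, 3) uint8 RGB image to a normalized (1, 3, H', W') float batch."""
    if image.ndim != 3 or image.shape[-1] != 3:
        raise ValueError("expected an (H, W, 3) RGB image")
    # Channels first, add a batch dimension, scale to [0, 1].
    x = image.permute(2, 0, 1).unsqueeze(0).float() / 255.0
    # Resize to the (H, W) the model was built with; this ignores aspect ratio.
    x = F.interpolate(x, size=size, mode="bilinear", align_corners=False)
    return (x - IMAGENET_MEAN) / IMAGENET_STD


# Hypothetical stand-in for a decoded photo; replace with real pixel data.
image = torch.randint(0, 256, (480, 360, 3), dtype=torch.uint8)
x = preprocess(image)  # shape (1, 3, 1024, 768)
```

The result can replace the `torch.randn` tensor in the Quick Start snippet after moving it to the same device as the model.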

🪶 Zero-Dependency Usage

The Quick Start snippet above imports from a single self-contained file — torch (plus safetensors for checkpoint loading) is all you need. Drop the file into your project and you're done:

curl -O https://raw.githubusercontent.com/facebookresearch/sapiens2/main/sapiens/backbones/standalone/sapiens2.py

For Sapiens v1, grab sapiens.py instead.

🧬 Model Card

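| Model | Params | FLOPs | Embed dim | Layers | Heads |
|---|---|---|---|---|---|
| Sapiens2-0.1B | 0.114 B | 0.342 T | 768 | 12 | 12 |
| Sapiens2-0.4B | 0.398 B | 1.260 T | 1024 | 24 | 16 |
| Sapiens2-0.8B | 0.818 B | 2.592 T | 1280 | 32 | 16 |
| Sapiens2-1B | 1.462 B | 4.715 T | 1536 | 40 | 24 |
| Sapiens2-1B (4K) | 1.607 B | | 1536 | 40 | 24 |
| Sapiens2-5B | 5.071 B | 15.722 T | 2432 | 56 | 32 |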

All models use patch size 16 and are trained at 1024×768 (H×W) resolution, except Sapiens2-1B (4K), which is trained at 4096×3072 with use_tokenizer=True.
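
As a quick sanity check on these numbers, the sketch below computes the patch grid implied by a patch size of 16 and the per-head width implied by the embed dim and head count in the model card. The helper `patch_grid` and the `MODEL_CARD` dictionary are illustrative names, not part of the Sapiens2 API. The counts are raw patch counts only: any extra tokens, and whatever `use_tokenizer=True` does to the 4K sequence length, are not covered here.

```python
def patch_grid(img_size, patch_size=16):
    """Return (rows, cols, total) patch counts for an (H, W) image size."""
    height, width = img_size
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by the patch size")
    rows, cols = height // patch_size, width // patch_size
    return rows, cols, rows * cols


# (embed dim, heads) transcribed from the model card.
MODEL_CARD = {
    "Sapiens2-0.1B": (768, 12),
    "Sapiens2-0.4B": (1024, 16),
    "Sapiens2-0.8B": (1280, 16),
    "Sapiens2-1B": (1536, 24),
    "Sapiens2-5B": (2432, 32),
}

# Width of each attention head, assuming the embed dim is split evenly across heads.
head_dims = {name: dim // heads for name, (dim, heads) in MODEL_CARD.items()}

print(patch_grid((1024, 768)))   # (64, 48, 3072)
print(patch_grid((4096, 3072)))  # (256, 192, 49152)
print(head_dims)
```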

📦 Getting Started

Clone the repository:

git clone https://github.com/facebookresearch/sapiens2.git
cd sapiens2
export SAPIENS_ROOT=$(pwd)

Install (requires Python ≥3.12 and PyTorch ≥2.7):

pip install -e .
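
To confirm that the environment meets the stated minimums (Python ≥3.12 and PyTorch ≥2.7), a small check like the following can be run after installation. The helper `meets_minimum` is a hypothetical name introduced here; it compares only the leading numeric components of the version string and is a rough sketch rather than a full version parser.

```python
import re
import sys


def meets_minimum(version: str, minimum: tuple) -> bool:
    """Compare the leading numeric components of a version string with a minimum tuple."""
    # Drop local build tags such as "+cu126" before extracting the numbers.
    numbers = re.findall(r"\d+", version.split("+")[0])
    parsed = tuple(int(n) for n in numbers[: len(minimum)])
    return parsed >= minimum


python_ok = sys.version_info >= (3, 12)
print("Python >= 3.12:", python_ok)

try:
    import torch

    print("PyTorch >= 2.7:", meets_minimum(torch.__version__, (2, 7)))
except ImportError:
    print("PyTorch is not installed")
```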

Download checkpoints from MODEL_ZOO.md. Place downloaded files under $SAPIENS_CHECKPOINT_ROOT (default: ~/sapiens2_host):

sapiens2_host/
├── pretrain/
│   ├── sapiens2_{0.1b,0.4b,0.8b,1b,5b}_pretrain.safetensors
│   └── sapiens2_1b_4k_pretrain.safetensors
├── pose/
│   └── sapiens2_{0.4b,0.8b,1b,5b}_pose.safetensors
├── seg/
│   └── sapiens2_{0.4b,0.8b,1b,5b}_seg.safetensors
├── normal/
│   └── sapiens2_{0.4b,0.8b,1b,5b}_normal.safetensors
├── pointmap/
│   └── sapiens2_{0.4b,0.8b,1b,5b}_pointmap.safetensors
├── matting/
│   └── sapiens2_1b_matting.safetensors
└── detector/                  # [optional] only needed for pose inference
    └── detr-resnet-101-dc5/
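
The layout above follows a regular naming pattern, so a checkpoint path can be built from a task and a model size. The sketch below assumes `SAPIENS_CHECKPOINT_ROOT` is read as an environment variable with `~/sapiens2_host` as the fallback. The `checkpoint_path` helper and the `AVAILABLE` table are illustrative and not part of the repository.

```python
import os
from pathlib import Path

# Sizes available per directory, transcribed from the layout above.
AVAILABLE = {
    "pretrain": {"0.1b", "0.4b", "0.8b", "1b", "5b", "1b_4k"},
    "pose": {"0.4b", "0.8b", "1b", "5b"},
    "seg": {"0.4b", "0.8b", "1b", "5b"},
    "normal": {"0.4b", "0.8b", "1b", "5b"},
    "pointmap": {"0.4b", "0.8b", "1b", "5b"},
    "matting": {"1b"},
}


def checkpoint_path(task: str, size: str) -> Path:
    """Build the expected checkpoint path for a task and model size."""
    if task not in AVAILABLE:
        raise ValueError(f"unknown task: {task!r}")
    if size not in AVAILABLE[task]:
        raise ValueError(f"no {size!r} checkpoint listed for task {task!r}")
    # Assumed behaviour: environment variable first, documented default otherwise.
    root = Path(os.environ.get("SAPIENS_CHECKPOINT_ROOT", "~/sapiens2_host")).expanduser()
    return root / task / f"sapiens2_{size}_{task}.safetensors"


print(checkpoint_path("pose", "1b"))
print(checkpoint_path("pretrain", "1b_4k"))
```

Validating against the listed sizes catches requests such as a 5B matting model, which the layout does not include, before any file access is attempted.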

🎯 Vision Tasks

| Task | Description | Inference | Train |
|---|---|---|---|
| Pose Estimation | 308 whole-body keypoints | docs/POSE.md | docs/train/POSE.md |
| Body-Part Segmentation | 29 body parts | docs/SEG.md | docs/train/SEG.md |
| Surface Normal Estimation | per-pixel normals | docs/NORMAL.md | docs/train/NORMAL.md |
| Pointmap Estimation | per-pixel 3D points | docs/POINTMAP.md | docs/train/POINTMAP.md |
| Human Matting | alpha matte + foreground | docs/MATTING.md | docs/train/MATTING.md |

✨ Acknowledgements

We would like to acknowledge the contributions of DINOv3, OpenMMLab, and Accelerate, which this project benefits from.

🤝 Contributing

For questions or issues, please open an issue on GitHub. See CONTRIBUTING and the Code of Conduct.

License

This project is licensed under the Sapiens2 License.

📚 Citation

If you use Sapiens2 in your research, please consider citing us.

@article{khirodkarsapiens2,
  title={Sapiens2},
  author={Khirodkar, Rawal and Wen, He and Martinez, Julieta and Dong, Yuan and Su, Zhaoen and Saito, Shunsuke},
  journal={arXiv preprint arXiv:2604.21681},
  year={2026}
}