HAP


📚 Contents

- 📋 Introduction
- 📂 Datasets
- 🛠️ Environment
- 🚀 Get Started
- 🏆 Results
- 💗 Acknowledgement
- ✅ Citation
- 🤝 Contribute & Contact

📋 Introduction

This repository contains the implementation code for the paper:

HAP: Structure-Aware Masked Image Modeling for Human-Centric Perception

Advances in Neural Information Processing Systems (NeurIPS) 2023

[arXiv] [project page]

HAP is the first masked image modeling framework for human-centric pre-training. It uses awareness of body structure to guide training and learn general human visual representations, achieving state-of-the-art performance on several human-centric benchmarks.

📂 Datasets

Pre-Training Data

We use LUPerson for pre-training. To make pre-training more efficient, we use only half of the dataset, selected with the list "CFS_list.pkl" from TransReID-SSL. To extract the keypoints that serve as masking guidance during pre-training, we run ViTPose inference on LUPerson. You can download our pose dataset here. A quick sanity check of the downloaded files is sketched below the directory layout.

Put the dataset directories outside the HAP project:

root
├── HAP
├── LUPerson-data  # LUPerson data
│   ├── xxx.jpg
│   └── ...
└── LUPerson-pose  # LUPerson with pose keypoints
    ├── xxx.npy
    └── ...
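
As a quick sanity check (a sketch, assuming each pose file shares the basename of its image, as the layout above suggests), you can compare the two directories and confirm that a pose file loads:

# Run from the root directory shown above
find LUPerson-data -name '*.jpg' | wc -l
find LUPerson-pose -name '*.npy' | wc -l

# Peek at one pose file (replace xxx.npy with a real file name);
# the array layout is whatever the ViTPose inference exported
python -c "import numpy as np; print(np.load('LUPerson-pose/xxx.npy').shape)"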

๐Ÿ› ๏ธ Environment

Conda is recommended for configuring the environment:

conda env create -f env-hap.yaml && conda activate env_hap
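
Before launching a multi-GPU job, you may want to confirm that the environment can see your GPUs:

# Should print the torch version, True, and the number of visible GPUs
python -c "import torch; print(torch.__version__, torch.cuda.is_available(), torch.cuda.device_count())"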

🚀 Get Started

The default pre-training setting is 400 epochs with a total batch size of 4096.

Pre-training may require 32 GPUs, each with more than 32 GB of memory (e.g., NVIDIA V100).

# -------------------- Pre-Training HAP on LUPerson --------------------
cd HAP/

MODEL=pose_mae_vit_base_patch16

# Download official MAE model pre-trained on ImageNet and move it here
CKPT=mae_pretrain_vit_base.pth

# Download cfs list and move it here
CFS_PATH=cfs_list.pkl
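
# Optional sanity check (assumes the pickle holds a sequence of
# selected LUPerson samples): print how many entries it contains
python -c "import pickle; print(len(pickle.load(open('${CFS_PATH}', 'rb'))))"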

OMP_NUM_THREADS=1 python -m torch.distributed.launch \
    --nnodes=${NNODES} \
    --node_rank=${RANK} \
    --master_addr=${ADDRESS} \
    --master_port=${PRETRAIN_PORT} \
    --nproc_per_node=${NPROC_PER_NODE} \
    main_pretrain.py \
    --dataset LUPersonPose \
    --data_path ../LUPerson-data \
    --pose_path ../LUPerson-pose \
    --sample_split_source ${CFS_PATH} \
    --batch_size 256 \
    --model ${MODEL} \
    --resume ${CKPT} \
    --ckpt_pos_embed 14 14 \
    --mask_ratio 0.5 \
    --align 0.05 \
    --epochs 400 \
    --blr 1.5e-4 \
    --ckpt_overwrite \
    --seed 0 \
    --tag default
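
The launch variables above (NNODES, RANK, ADDRESS, PRETRAIN_PORT, NPROC_PER_NODE) are not defined by the script and must be set for your cluster. As an illustration with assumed values, a single-node run on 8 GPUs might use:

NNODES=1            # total number of machines
RANK=0              # index of this machine, from 0 to NNODES-1
ADDRESS=127.0.0.1   # IP address of the rank-0 machine
PRETRAIN_PORT=29500 # any free port (29500 is the torch.distributed default)
NPROC_PER_NODE=8    # number of GPUs on each machine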

๐Ÿ† Results

We evaluate HAP on the following downstream tasks; see the corresponding parts of the repository for implementation instructions.

You can download the checkpoint of the pre-trained HAP model here. The results are given below.

| task | dataset | resolution | structure | result |
| --- | --- | --- | --- | --- |
| Person ReID | MSMT17 | (256, 128) | ViT | 76.4 (mAP) |
| Person ReID | MSMT17 | (384, 128) | ViT | 76.8 (mAP) |
| Person ReID | MSMT17 | (256, 128) | ViT-lem | 78.0 (mAP) |
| Person ReID | MSMT17 | (384, 128) | ViT-lem | 78.1 (mAP) |
| Person ReID | Market-1501 | (256, 128) | ViT | 91.7 (mAP) |
| Person ReID | Market-1501 | (384, 128) | ViT | 91.9 (mAP) |
| Person ReID | Market-1501 | (256, 128) | ViT-lem | 93.8 (mAP) |
| Person ReID | Market-1501 | (384, 128) | ViT-lem | 93.9 (mAP) |

| task | dataset | resolution | training | result |
| --- | --- | --- | --- | --- |
| 2D Pose Estimation | MPII | (256, 192) | single-dataset | 91.8 (PCKh) |
| 2D Pose Estimation | MPII | (384, 288) | single-dataset | 92.6 (PCKh) |
| 2D Pose Estimation | MPII | (256, 192) | multi-dataset | 93.4 (PCKh) |
| 2D Pose Estimation | MPII | (384, 288) | multi-dataset | 93.6 (PCKh) |
| 2D Pose Estimation | COCO | (256, 192) | single-dataset | 75.9 (AP) |
| 2D Pose Estimation | COCO | (384, 288) | single-dataset | 77.2 (AP) |
| 2D Pose Estimation | COCO | (256, 192) | multi-dataset | 77.0 (AP) |
| 2D Pose Estimation | COCO | (384, 288) | multi-dataset | 78.2 (AP) |
| 2D Pose Estimation | AIC | (256, 192) | single-dataset | 31.5 (AP) |
| 2D Pose Estimation | AIC | (384, 288) | single-dataset | 37.7 (AP) |
| 2D Pose Estimation | AIC | (256, 192) | multi-dataset | 32.2 (AP) |
| 2D Pose Estimation | AIC | (384, 288) | multi-dataset | 38.1 (AP) |

| task | dataset | result |
| --- | --- | --- |
| Pedestrian Attribute Recognition | PA-100K | 86.54 (mA) |
| Pedestrian Attribute Recognition | RAP | 82.91 (mA) |
| Pedestrian Attribute Recognition | PETA | 88.36 (mA) |

| task | dataset | result |
| --- | --- | --- |
| Text-to-Image Person ReID | CUHK-PEDES | 68.05 (Rank-1) |
| Text-to-Image Person ReID | ICFG-PEDES | 61.80 (Rank-1) |
| Text-to-Image Person ReID | RSTPReid | 49.35 (Rank-1) |

| task | dataset | result |
| --- | --- | --- |
| 3D Pose Estimation | 3DPW | 90.1 (MPJPE), 56.0 (PA-MPJPE), 106.3 (MPVPE) |

💗 Acknowledgement

We acknowledge the open-source projects that this work builds on, including MAE, ViTPose, and TransReID-SSL.

✅ Citation

@article{yuan2023hap,
  title={HAP: Structure-Aware Masked Image Modeling for Human-Centric Perception},
  author={Yuan, Junkun and Zhang, Xinyu and Zhou, Hao and Wang, Jian and Qiu, Zhongwei and Shao, Zhiyin and Zhang, Shaofeng and Long, Sifan and Kuang, Kun and Yao, Kun and others},
  journal={Advances in Neural Information Processing Systems},
  volume={36},
  year={2023}
}

๐Ÿค Contribute & Contact

Feel free to star and contribute to our repository.

If you have any questions or advice, contact us through GitHub issues or email (yuanjk0921@outlook.com).