
August 16, 2024

X-Pose: Detecting Any Keypoints


Online demo: Open in OpenXLab | Quick checkpoint download: Open in OpenXLab

Project Page | Paper | UniKPT Dataset | Video

Jie Yang1,2, Ailing Zeng1, Ruimao Zhang2, Lei Zhang1

1International Digital Economy Academy 2The Chinese University of Hong Kong, Shenzhen

🤩 News

  • 2024.07.12: X-Pose supports controllable animal face animation. See details here.

  • 2024.07.02: X-Pose is accepted to ECCV24 (We changed the model name from UniPose to X-Pose to avoid confusion with similarly named previous works).

  • 2024.02.14: We update a file highlighting all 1,237 classes in the UniKPT dataset.

  • 2023.11.28: We are excited to highlight the 68 face keypoints detection ability of X-Pose across any categories in this figure. The definition of face keypoints follows this dataset.

  • 2023.11.9: Thanks to OpenXLab, you can try a quick online demo. Looking forward to the feedback!

  • 2023.11.1: We release the inference code, demo, checkpoints, and the annotation of the UniKPT dataset.

  • 2023.10.13: We release the arXiv version.

In-the-wild Test via X-Pose

X-Pose has strong fine-grained localization and generalization abilities across image styles, categories, and poses.


Detecting any Face Keypoints:


🗒 TODO

  • Release inference code and demo.
  • Release checkpoints.
  • Release UniKPT annotations.
  • Release training codes.

💡 Overview

• X-Pose is the first end-to-end prompt-based keypoint detection framework.


• It supports multi-modality prompts, including textual and visual prompts to detect arbitrary keypoints (e.g., from articulated, rigid, and soft objects).

Visual Prompts as Inputs:


Textual Prompts as Inputs:


🔨 Environment Setup

  1. Clone this repo
git clone https://github.com/IDEA-Research/X-Pose.git
cd X-Pose
  2. Install the required packages
pip install -r requirements.txt
  3. Compile the CUDA operators
cd models/UniPose/ops
python setup.py build install
# unit test (the output should report all checks as True)
python test.py
cd ../../..

▶ Demo

1. Guidelines

• We have released the textual prompt-based branch for inference. As the visual prompt involves a substantial amount of user input, we are currently exploring more user-friendly platforms to support this functionality.

• Since X-Pose has learned a strong structural prior, it is best to use the predefined skeletons as the keypoint textual prompts, as shown in predefined_keypoints.py.

• If users don't provide a keypoint prompt, we try to match an appropriate skeleton based on the instance category. If no match is found, we default to the animal skeleton, which covers the widest range of categories and testing requirements.
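The fallback described above can be sketched in a few lines of Python. Note this is an illustration only: the real lookup table lives in predefined_keypoints.py, and the dictionary entries and function name below are stand-ins, not the actual prompts or API.

```python
# Illustrative stand-ins for the real entries in predefined_keypoints.py.
predefined_keypoints = {
    "person": "nose, left eye, right eye, left shoulder, right shoulder",
    "animal": "left eye, right eye, nose, neck, root of tail",
}

def resolve_keypoint_prompt(category, keypoint_prompt=None):
    """Resolve the textual keypoint prompt for an instance category.

    Priority: an explicit user prompt, then a category match in the
    predefined table, then the broad animal skeleton as a default.
    """
    if keypoint_prompt:
        return keypoint_prompt
    return predefined_keypoints.get(category, predefined_keypoints["animal"])

print(resolve_keypoint_prompt("zebra"))  # falls back to the animal skeleton
```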

2. Run

Replace {GPU ID}, image_you_want_to_test.jpg, and "dir you want to save the output" with appropriate values in the following command:

CUDA_VISIBLE_DEVICES={GPU ID} python inference_on_a_image.py \
  -c config/UniPose_SwinT.py \
  -p weights/unipose_swint.pth \
  -i image_you_want_to_test.jpg \
  -o "dir you want to save the output" \
  -t "instance categories" \
  -k "keypoint_skeleton_text"

• -t: the instance category, e.g., "person", "face", "left hand", "horse", "car", "skirt", "table".

• -k: optional; if necessary, please select an option from the predefined_keypoints.py file.

We also support inference via a Gradio app.

python app.py

Checkpoints

| # | Name   | Backbone | Keypoint AP on COCO | Checkpoint              | Config      |
|---|--------|----------|---------------------|-------------------------|-------------|
| 1 | X-Pose | Swin-T   | 74.4                | Google Drive / OpenXLab | GitHub Link |
| 2 | X-Pose | Swin-L   | 76.8                | Coming soon             | Coming soon |
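The Keypoint AP numbers above follow the standard COCO protocol, which scores each prediction by Object Keypoint Similarity (OKS). A minimal sketch of the OKS computation is below; the per-keypoint falloff constants `k` come from the COCO evaluation toolkit, and the values used here are placeholders, not the official sigmas.

```python
import numpy as np

def oks(pred, gt, visible, area, k):
    """Object Keypoint Similarity between a predicted and a ground-truth pose.

    pred, gt : (N, 2) arrays of keypoint coordinates
    visible  : (N,) boolean mask of labeled keypoints
    area     : object scale (segment area) used to normalize distances
    k        : (N,) per-keypoint falloff constants
    """
    d2 = np.sum((pred - gt) ** 2, axis=1)                # squared pixel distances
    e = d2 / (2.0 * area * k ** 2 + np.finfo(float).eps)
    return float(np.exp(-e)[visible].mean()) if visible.any() else 0.0

gt = np.array([[10.0, 20.0], [30.0, 40.0]])
vis = np.array([True, True])
k = np.array([0.025, 0.025])
print(oks(gt, gt, vis, area=1000.0, k=k))  # a perfect prediction scores 1.0
```

AP is then computed by thresholding OKS (0.50 to 0.95 in steps of 0.05) and averaging precision across thresholds, exactly as box IoU is used in COCO detection.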

The UniKPT Dataset


| Datasets       | KPT     | Class | Images | Instances | Unify Images | Unify Instances |
|----------------|---------|-------|--------|-----------|--------------|-----------------|
| COCO           | 17      | 1     | 58,945 | 156,165   | 58,945       | 156,165         |
| 300W-Face      | 68      | 1     | 3,837  | 4,437     | 3,837        | 4,437           |
| OneHand10K     | 21      | 1     | 11,703 | 11,289    | 2,000        | 2,000           |
| Human-Art      | 17      | 1     | 50,000 | 123,131   | 50,000       | 123,131         |
| AP-10K         | 17      | 54    | 10,015 | 13,028    | 10,015       | 13,028          |
| APT-36K        | 17      | 30    | 36,000 | 53,006    | 36,000       | 53,006          |
| MacaquePose    | 17      | 1     | 13,083 | 16,393    | 2,000        | 2,320           |
| Animal Kingdom | 23      | 850   | 33,099 | 33,099    | 33,099       | 33,099          |
| AnimalWeb      | 9       | 332   | 22,451 | 21,921    | 22,451       | 21,921          |
| Vinegar Fly    | 31      | 1     | 1,500  | 1,500     | 1,500        | 1,500           |
| Desert Locust  | 34      | 1     | 700    | 700       | 700          | 700             |
| Keypoint-5     | 55/31   | 5     | 8,649  | 8,649     | 2,000        | 2,000           |
| MP-100         | 561/293 | 100   | 16,943 | 18,000    | 16,943       | 18,000          |
| UniKPT         | 338     | 1237  | -      | -         | 226,547      | 418,487         |

• UniKPT is a unified dataset from 13 existing datasets, which is only for non-commercial research purposes.

• All images included in the UniKPT dataset originate from the datasets listed in the table above. To access these images, please download them from the original repositories.

• We provide annotations with precise textual descriptions of keypoints for effective training. For convenience, the text annotations are available at the link.
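Keypoint annotations in this space are commonly stored as flat `[x, y, visibility]` triplets, as in the COCO keypoint JSON layout. Assuming the released UniKPT annotations follow a similar schema (an assumption for illustration; check the downloaded files for the exact format), the triplets can be unpacked like this:

```python
import json

# Tiny inline stand-in for an annotation file in COCO keypoint format.
sample = json.loads(json.dumps({
    "images": [{"id": 1, "file_name": "000001.jpg"}],
    "annotations": [{"image_id": 1, "category_id": 1,
                     "keypoints": [10, 20, 2, 30, 40, 2],
                     "num_keypoints": 2}],
    "categories": [{"id": 1, "name": "person",
                    "keypoints": ["nose", "left eye"]}],
}))

ann = sample["annotations"][0]
xs = ann["keypoints"][0::3]   # x coordinates
ys = ann["keypoints"][1::3]   # y coordinates
vs = ann["keypoints"][2::3]   # visibility flags (2 = labeled and visible)
print(list(zip(xs, ys, vs)))  # [(10, 20, 2), (30, 40, 2)]
```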

Citing X-Pose

If you find this repository useful for your work, please consider citing it as follows:

@inproceedings{xpose,
  title={X-Pose: Detecting Any Keypoints},
  author={Yang, Jie and Zeng, Ailing and Zhang, Ruimao and Zhang, Lei},
  booktitle={European Conference on Computer Vision},
  year={2024}
}
@inproceedings{yang2023neural,
  title={Neural Interactive Keypoint Detection},
  author={Yang, Jie and Zeng, Ailing and Li, Feng and Liu, Shilong and Zhang, Ruimao and Zhang, Lei},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={15122--15132},
  year={2023}
}
@inproceedings{yang2022explicit,
  title={Explicit Box Detection Unifies End-to-End Multi-Person Pose Estimation},
  author={Yang, Jie and Zeng, Ailing and Liu, Shilong and Li, Feng and Zhang, Ruimao and Zhang, Lei},
  booktitle={The Eleventh International Conference on Learning Representations},
  year={2023}
}