August 16, 2024
# X-Pose: Detecting Any Keypoints
Online demo:
Quick Checkpoint download:
Project Page | Paper | UniKPT Dataset | Video
Jie Yang<sup>1,2</sup>, Ailing Zeng<sup>1</sup>, Ruimao Zhang<sup>2</sup>, Lei Zhang<sup>1</sup>

<sup>1</sup>International Digital Economy Academy  <sup>2</sup>The Chinese University of Hong Kong, Shenzhen
## News
- 2024.07.12: X-Pose supports controllable animal face animation. See details here.
- 2024.07.02: X-Pose was accepted to ECCV 2024. (We renamed the model from UniPose to X-Pose to avoid confusion with similarly named prior works.)
- 2024.02.14: We updated a file highlighting all 1,237 classes in the UniKPT dataset.
- 2023.11.28: We highlight X-Pose's 68-point face keypoint detection across any category in this figure. The definition of the face keypoints follows this dataset.
- 2023.11.09: Thanks to OpenXLab, you can try a quick online demo. We look forward to your feedback!
- 2023.11.01: We released the inference code, demo, checkpoints, and the annotations of the UniKPT dataset.
- 2023.10.13: We released the arXiv version.
## In-the-wild Test via X-Pose
X-Pose has strong fine-grained localization and generalization abilities across image styles, categories, and poses.
Detecting any Face Keypoints:
## TODO
- [x] Release inference code and demo.
- [x] Release checkpoints.
- [x] Release UniKPT annotations.
- [ ] Release training codes.
## Overview

- X-Pose is the first end-to-end prompt-based keypoint detection framework.
- It supports multi-modal prompts, both textual and visual, to detect arbitrary keypoints on articulated, rigid, and soft objects.
Visual Prompts as Inputs:
Textual Prompts as Inputs:
## Environment Setup
- Clone this repo

```shell
git clone https://github.com/IDEA-Research/X-Pose.git
cd X-Pose
```

- Install the required packages

```shell
pip install -r requirements.txt
```

- Compile the CUDA operators

```shell
cd models/UniPose/ops
python setup.py build install
# unit test (all checks should report True)
python test.py
cd ../../..
```
## Demo
1. Guidelines

- We have released the textual-prompt branch for inference. Because visual prompts require substantial user input, we are currently exploring more user-friendly platforms to support that functionality.
- Since X-Pose has learned strong structural priors, it is best to use the predefined skeletons in predefined_keypoints.py as the keypoint textual prompts.
- If no keypoint prompt is provided, we try to match an appropriate skeleton based on the instance category; if that fails, we default to the animal skeleton, which covers the widest range of categories and testing requirements.
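The fallback behaviour described above can be sketched as follows. This is a minimal illustration: `predefined_keypoints` here is a toy stand-in for the real predefined_keypoints.py, and `select_keypoint_prompt` is a hypothetical helper, not the repo's API.

```python
# Toy stand-in for predefined_keypoints.py: category -> keypoint/skeleton text.
predefined_keypoints = {
    "person": "nose, left eye, right eye, left ear, right ear, ...",
    "animal": "left eye, right eye, nose, neck, tail root, ...",
}

def select_keypoint_prompt(category, keypoint_prompt=None):
    """Choose the keypoint textual prompt to feed the model."""
    if keypoint_prompt:                    # a user-supplied prompt always wins
        return keypoint_prompt
    if category in predefined_keypoints:   # match skeleton by instance category
        return predefined_keypoints[category]
    return predefined_keypoints["animal"]  # broad-coverage fallback skeleton
```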
2. Run

Replace {GPU ID}, image_you_want_to_test.jpg, and "dir you want to save the output" with appropriate values in the following command:

```shell
CUDA_VISIBLE_DEVICES={GPU ID} python inference_on_a_image.py \
  -c config/UniPose_SwinT.py \
  -p weights/unipose_swint.pth \
  -i image_you_want_to_test.jpg \
  -o "dir you want to save the output" \
  -t "instance categories" \
  -k "keypoint_skeleton_text"
```

- `-t`: the instance category, e.g., "person", "face", "left hand", "horse", "car", "skirt", "table".
- `-k`: optional; if needed, select an option from the predefined_keypoints.py file.
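To run the demo over a directory of images, the same CLI invocation can be generated per file with a small wrapper. This is a convenience sketch; `build_commands` is a hypothetical helper that simply emits the command shown above for each image.

```python
import pathlib

def build_commands(image_dir, out_dir, category,
                   config="config/UniPose_SwinT.py",
                   weights="weights/unipose_swint.pth"):
    """Build one inference_on_a_image.py command per .jpg in image_dir."""
    cmds = []
    for img in sorted(pathlib.Path(image_dir).glob("*.jpg")):
        cmds.append(["python", "inference_on_a_image.py",
                     "-c", config, "-p", weights,
                     "-i", str(img), "-o", str(out_dir),
                     "-t", category])
    return cmds
```

Each command can then be passed to `subprocess.run`, with `CUDA_VISIBLE_DEVICES` set in the environment to pick the GPU.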
We also support inference via Gradio:

```shell
python app.py
```
## Checkpoints
| # | Name | Backbone | Keypoint AP on COCO | Checkpoint | Config |
|---|---|---|---|---|---|
| 1 | X-Pose | Swin-T | 74.4 | Google Drive / OpenXLab | GitHub Link |
| 2 | X-Pose | Swin-L | 76.8 | Coming Soon | Coming Soon |
## The UniKPT Dataset
| Datasets | KPT | Class | Images | Instances | Unify Images | Unify Instance |
|---|---|---|---|---|---|---|
| COCO | 17 | 1 | 58,945 | 156,165 | 58,945 | 156,165 |
| 300W-Face | 68 | 1 | 3,837 | 4,437 | 3,837 | 4,437 |
| OneHand10K | 21 | 1 | 11,703 | 11,289 | 2,000 | 2,000 |
| Human-Art | 17 | 1 | 50,000 | 123,131 | 50,000 | 123,131 |
| AP-10K | 17 | 54 | 10,015 | 13,028 | 10,015 | 13,028 |
| APT-36K | 17 | 30 | 36,000 | 53,006 | 36,000 | 53,006 |
| MacaquePose | 17 | 1 | 13,083 | 16,393 | 2,000 | 2,320 |
| Animal Kingdom | 23 | 850 | 33,099 | 33,099 | 33,099 | 33,099 |
| AnimalWeb | 9 | 332 | 22,451 | 21,921 | 22,451 | 21,921 |
| Vinegar Fly | 31 | 1 | 1,500 | 1,500 | 1,500 | 1,500 |
| Desert Locust | 34 | 1 | 700 | 700 | 700 | 700 |
| Keypoint-5 | 55/31 | 5 | 8,649 | 8,649 | 2,000 | 2,000 |
| MP-100 | 561/293 | 100 | 16,943 | 18,000 | 16,943 | 18,000 |
| UniKPT | 338 | 1237 | - | - | 226,547 | 418,487 |
- UniKPT is a unified dataset built from 13 existing datasets and is intended only for non-commercial research purposes.
- All images in the UniKPT dataset originate from the datasets listed in the table above. To access these images, please download them from the original repositories.
- We provide annotations with precise textual descriptions of the keypoints for effective training. For convenience, the text annotations are also available at the link.
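Assuming the released annotations follow the common COCO-style JSON layout (an assumption; please check the released files for the actual schema), they can be indexed by image as follows. `load_unikpt` is an illustrative helper, not part of the repo.

```python
import json

def load_unikpt(path):
    """Index a COCO-style annotation file by image id (schema assumed)."""
    with open(path) as f:
        data = json.load(f)
    # image id -> image record (file name, size, ...)
    images = {img["id"]: img for img in data.get("images", [])}
    # image id -> list of its keypoint annotations
    anns_by_image = {}
    for ann in data.get("annotations", []):
        anns_by_image.setdefault(ann["image_id"], []).append(ann)
    return images, anns_by_image
```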
## Citing X-Pose
If you find this repository useful for your work, please consider citing it as follows:
```bibtex
@inproceedings{xpose,
  title={X-Pose: Detecting Any Keypoints},
  author={Yang, Jie and Zeng, Ailing and Zhang, Ruimao and Zhang, Lei},
  booktitle={European Conference on Computer Vision},
  year={2024}
}
@inproceedings{yang2023neural,
  title={Neural Interactive Keypoint Detection},
  author={Yang, Jie and Zeng, Ailing and Li, Feng and Liu, Shilong and Zhang, Ruimao and Zhang, Lei},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={15122--15132},
  year={2023}
}
@inproceedings{yang2022explicit,
  title={Explicit Box Detection Unifies End-to-End Multi-Person Pose Estimation},
  author={Yang, Jie and Zeng, Ailing and Liu, Shilong and Li, Feng and Zhang, Ruimao and Zhang, Lei},
  booktitle={The Eleventh International Conference on Learning Representations},
  year={2023}
}
```