August 16, 2024
# X-Pose: Detecting Any Keypoints
Online demo:
Quick Checkpoint download:
Project Page | Paper | UniKPT Dataset | Video
Jie Yang<sup>1,2</sup>, Ailing Zeng<sup>1</sup>, Ruimao Zhang<sup>2</sup>, Lei Zhang<sup>1</sup>

<sup>1</sup>International Digital Economy Academy  <sup>2</sup>The Chinese University of Hong Kong, Shenzhen
## News
- 2024.07.12: X-Pose supports controllable animal face animation. See details here.
- 2024.07.02: X-Pose was accepted to ECCV 2024. (We renamed the model from UniPose to X-Pose to avoid confusion with similarly named prior works.)
- 2024.02.14: We updated a file highlighting all 1,237 classes in the UniKPT dataset.
- 2023.11.28: We highlight X-Pose's 68-point face keypoint detection across any category in this figure. The definition of the face keypoints follows this dataset.
- 2023.11.09: Thanks to OpenXLab, you can try a quick online demo. We look forward to your feedback!
- 2023.11.01: We released the inference code, demo, checkpoints, and the annotations of the UniKPT dataset.
- 2023.10.13: We released the arXiv version.
## In-the-wild Test via X-Pose
X-Pose has strong fine-grained localization and generalization abilities across image styles, categories, and poses.
Detecting any Face Keypoints:
## TODO
- [x] Release inference code and demo.
- [x] Release checkpoints.
- [x] Release UniKPT annotations.
- [ ] Release training codes.
## Overview

- X-Pose is the first end-to-end prompt-based keypoint detection framework.
- It supports multi-modal prompts, both textual and visual, to detect arbitrary keypoints on articulated, rigid, and soft objects.
Visual Prompts as Inputs:
Textual Prompts as Inputs:
## Environment Setup
- Clone this repo

```shell
git clone https://github.com/IDEA-Research/X-Pose.git
cd X-Pose
```

- Install the required packages

```shell
pip install -r requirements.txt
```

- Compile the CUDA operators

```shell
cd models/UniPose/ops
python setup.py build install
# unit test (all checks should report True)
python test.py
cd ../../..
```
## Demo
1. Guidelines

- We have released the textual-prompt branch for inference. Because visual prompts require substantial user input, we are currently exploring more user-friendly platforms to support that functionality.
- Since X-Pose has learned strong structural priors, it is best to use the predefined skeletons in predefined_keypoints.py as the keypoint textual prompts.
- If no keypoint prompt is provided, we try to match an appropriate skeleton based on the instance category; if that fails, we default to the animal skeleton, which covers the widest range of categories and testing requirements.
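The fallback behaviour described above can be sketched as follows. This is a minimal illustration: `predefined_keypoints` here is a toy stand-in for the real predefined_keypoints.py, and `select_keypoint_prompt` is a hypothetical helper, not the repo's API.

```python
# Toy stand-in for predefined_keypoints.py: category -> keypoint/skeleton text.
predefined_keypoints = {
    "person": "nose, left eye, right eye, left ear, right ear, ...",
    "animal": "left eye, right eye, nose, neck, tail root, ...",
}

def select_keypoint_prompt(category, keypoint_prompt=None):
    """Choose the keypoint textual prompt to feed the model."""
    if keypoint_prompt:                    # a user-supplied prompt always wins
        return keypoint_prompt
    if category in predefined_keypoints:   # match skeleton by instance category
        return predefined_keypoints[category]
    return predefined_keypoints["animal"]  # broad-coverage fallback skeleton
```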
2. Run

Replace {GPU ID}, image_you_want_to_test.jpg, and "dir you want to save the output" with appropriate values in the following command:

```shell
CUDA_VISIBLE_DEVICES={GPU ID} python inference_on_a_image.py \
  -c config/UniPose_SwinT.py \
  -p weights/unipose_swint.pth \
  -i image_you_want_to_test.jpg \
  -o "dir you want to save the output" \
  -t "instance categories" \
  -k "keypoint_skeleton_text"
```

- `-t`: the instance category, e.g., "person", "face", "left hand", "horse", "car", "skirt", "table".
- `-k`: optional; if needed, select an option from the predefined_keypoints.py file.
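To run the demo over a directory of images, the same CLI invocation can be generated per file with a small wrapper. This is a convenience sketch; `build_commands` is a hypothetical helper that simply emits the command shown above for each image.

```python
import pathlib

def build_commands(image_dir, out_dir, category,
                   config="config/UniPose_SwinT.py",
                   weights="weights/unipose_swint.pth"):
    """Build one inference_on_a_image.py command per .jpg in image_dir."""
    cmds = []
    for img in sorted(pathlib.Path(image_dir).glob("*.jpg")):
        cmds.append(["python", "inference_on_a_image.py",
                     "-c", config, "-p", weights,
                     "-i", str(img), "-o", str(out_dir),
                     "-t", category])
    return cmds
```

Each command can then be passed to `subprocess.run`, with `CUDA_VISIBLE_DEVICES` set in the environment to pick the GPU.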
We also support inference via Gradio:

```shell
python app.py
```
## Checkpoints
| # | Name | Backbone | Keypoint AP on COCO | Checkpoint | Config |
|---|---|---|---|---|---|
| 1 | X-Pose | Swin-T | 74.4 | Google Drive / OpenXLab | GitHub Link |
| 2 | X-Pose | Swin-L | 76.8 | Coming Soon | Coming Soon |
## The UniKPT Dataset
| Datasets | KPT | Class | Images | Instances | Unify Images | Unify Instance |
|---|---|---|---|---|---|---|
| COCO | 17 | 1 | 58,945 | 156,165 | 58,945 | 156,165 |
| 300W-Face | 68 | 1 | 3,837 | 4,437 | 3,837 | 4,437 |
| OneHand10K | 21 | 1 | 11,703 | 11,289 | 2,000 | 2,000 |
| Human-Art | 17 | 1 | 50,000 | 123,131 | 50,000 | 123,131 |
| AP-10K | 17 | 54 | 10,015 | 13,028 | 10,015 | 13,028 |
| APT-36K | 17 | 30 | 36,000 | 53,006 | 36,000 | 53,006 |
| MacaquePose | 17 | 1 | 13,083 | 16,393 | 2,000 | 2,320 |
| Animal Kingdom | 23 | 850 | 33,099 | 33,099 | 33,099 | 33,099 |
| AnimalWeb | 9 | 332 | 22,451 | 21,921 | 22,451 | 21,921 |
| Vinegar Fly | 31 | 1 | 1,500 | 1,500 | 1,500 | 1,500 |
| Desert Locust | 34 | 1 | 700 | 700 | 700 | 700 |
| Keypoint-5 | 55/31 | 5 | 8,649 | 8,649 | 2,000 | 2,000 |
| MP-100 | 561/293 | 100 | 16,943 | 18,000 | 16,943 | 18,000 |
| UniKPT | 338 | 1237 | - | - | 226,547 | 418,487 |
- UniKPT is a unified dataset built from 13 existing datasets and is intended only for non-commercial research purposes.
- All images in the UniKPT dataset originate from the datasets listed in the table above. To access these images, please download them from the original repositories.
- We provide annotations with precise textual descriptions of the keypoints for effective training. For convenience, the text annotations are also available at the link.
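Assuming the released annotations follow the common COCO-style JSON layout (an assumption; please check the released files for the actual schema), they can be indexed by image as follows. `load_unikpt` is an illustrative helper, not part of the repo.

```python
import json

def load_unikpt(path):
    """Index a COCO-style annotation file by image id (schema assumed)."""
    with open(path) as f:
        data = json.load(f)
    # image id -> image record (file name, size, ...)
    images = {img["id"]: img for img in data.get("images", [])}
    # image id -> list of its keypoint annotations
    anns_by_image = {}
    for ann in data.get("annotations", []):
        anns_by_image.setdefault(ann["image_id"], []).append(ann)
    return images, anns_by_image
```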
## Citing X-Pose
If you find this repository useful for your work, please consider citing it as follows:
```bibtex
@inproceedings{xpose,
  title={X-Pose: Detecting Any Keypoints},
  author={Yang, Jie and Zeng, Ailing and Zhang, Ruimao and Zhang, Lei},
  booktitle={European Conference on Computer Vision},
  year={2024}
}
@inproceedings{yang2023neural,
  title={Neural Interactive Keypoint Detection},
  author={Yang, Jie and Zeng, Ailing and Li, Feng and Liu, Shilong and Zhang, Ruimao and Zhang, Lei},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={15122--15132},
  year={2023}
}
@inproceedings{yang2022explicit,
  title={Explicit Box Detection Unifies End-to-End Multi-Person Pose Estimation},
  author={Yang, Jie and Zeng, Ailing and Liu, Shilong and Li, Feng and Zhang, Ruimao and Zhang, Lei},
  booktitle={The Eleventh International Conference on Learning Representations},
  year={2023}
}
```