[ICCV 2023] Efficient Adaptive Human-Object Interaction Detection with Concept-guided Memory
March 8, 2024 ยท View on GitHub
Dataset
Follow the process of UPT.
The downloaded files should be placed as follows. Otherwise, please replace the default path to your custom locations.
|- ADA-CM
| |- hicodet
| | |- hico_20160224_det
| | |- annotations
| | |- images
| |- vcoco
| | |- mscoco2014
| | |- train2014
| | |-val2014
: :
Dependencies
-
Follow the environment setup in UPT.
-
Our code is built upon CLIP. Install the local package of CLIP:
cd CLIP && python setup.py develop && cd ..
- Download the CLIP weights to
checkpoints/pretrained_clip.
|- ADA-CM
| |- checkpoints
| | |- pretrained_clip
| | |- ViT-B-16.pt
| | |- ViT-L-14-336px.pt
: :
- Download the weights of DETR and put them in
checkpoints/.
| Dataset | DETR weights |
|---|---|
| HICO-DET | weights |
| V-COCO | weights |
|- ADA-CM
| |- checkpoints
| | |- detr-r50-hicodet.pth
| | |- detr-r50-vcoco.pth
: : :
Pre-extracted Features
Download the pre-extracted features from HERE and the pre-extracted bboxes from HERE. The downloaded files have to be placed as follows.
|- ADA-CM
| |- hicodet_pkl_files
| | |- union_embeddings_cachemodel_crop_padding_zeros_vitb16.p
| | |- hicodet_union_embeddings_cachemodel_crop_padding_zeros_vit336.p
| | |- hicodet_train_bbox_R50.p
| | |- hicodet_test_bbox_R50.p
| |- vcoco_pkl_files
| | |- vcoco_union_embeddings_cachemodel_crop_padding_zeros_vit16.p
| | |- vcoco_union_embeddings_cachemodel_crop_padding_zeros_vit336.p
| | |- vcoco_train_bbox_R50.p
| | |- vcoco_test_bbox_R50.p
: :
TrainingFree Mode
HICO-DET
python main_tip_ye.py --world-size 1 --pretrained checkpoints/detr-r50-hicodet.pth --output-dir checkpoints/test --eval --post_process --use_multi_hot --logits_type HO+U+T --num_shot 8 --file1 hicodet_pkl_files/union_embeddings_cachemodel_crop_padding_zeros_vitb16.p --clip_dir_vit checkpoints/pretrained_clip/ViT-B-16.pt
V-COCO
Cache detection results for evaluation on V-COCO:
python main_tip_ye.py --world-size 1 --dataset vcoco --data-root vcoco/ --partitions trainval test --pretrained checkpoints/detr-r50-vcoco.pth --output-dir matlab/TF_vcoco/ --num-workers 4 --cache --post_process --dic_key verb --use_multi_hot --num_shot 8 --logits_type HO+U+T --file1 vcoco_pkl_files/vcoco_union_embeddings_cachemodel_crop_padding_zeros_vit16.p
For V-COCO, we did not implement evaluation utilities, and instead use the utilities provided by Gupta et al.. Refer to these instructions for more details.
FineTuning Mode
HICO-DET
Train on HICO-DET:
python main_tip_finetune.py --world-size 1 --pretrained checkpoints/detr-r50-hicodet.pth --output-dir checkpoints/hico --use_insadapter --num_classes 117 --use_multi_hot --file1 hicodet_pkl_files/union_embeddings_cachemodel_crop_padding_zeros_vitb16.p --clip_dir_vit checkpoints/pretrained_clip/ViT-B-16.pt
Test on HICO-DET:
python main_tip_finetune.py --world-size 1 --pretrained checkpoints/detr-r50-hicodet.pth --output-dir checkpoints/hico --use_insadapter --num_classes 117 --use_multi_hot --file1 hicodet_pkl_files/union_embeddings_cachemodel_crop_padding_zeros_vitb16.p --clip_dir_vit checkpoints/pretrained_clip/ViT-B-16.pt --eval --resume CKPT_PATH
V-COCO
Training on V-COCO
python main_tip_finetune.py --world-size 1 --dataset vcoco --data-root vcoco/ --partitions trainval test --pretrained checkpoints/detr-r50-vcoco.pth --output-dir checkpoints/vcoco-injector-r50 --use_insadapter --num_classes 24 --use_multi_hot --file1 vcoco_pkl_files/vcoco_union_embeddings_cachemodel_crop_padding_zeros_vit16.p --clip_dir_vit checkpoints/pretrained_clip/ViT-B-16.pt
Cache detection results for evaluation on V-COCO
python main_tip_finetune.py --world-size 1 --dataset vcoco --data-root vcoco/ --partitions trainval test --pretrained checkpoints/detr-r50-vcoco.pth --output-dir checkpoints/vcoco-injector-r50 --use_insadapter --num_classes 24 --use_multi_hot --file1 vcoco_pkl_files/vcoco_union_embeddings_cachemodel_crop_padding_zeros_vit16.p --clip_dir_vit checkpoints/pretrained_clip/ViT-B-16.pt --cache --resume CKPT_PATH
Model Zoo
| Dataset | Backbone | mAP | Rare | Non-rare | Weights |
|---|---|---|---|---|---|
| HICO-DET | ResNet-50+ViT-B | 33.80 | 31.72 | 34.42 | weights |
| HICO-DET | ResNet-50+ViT-L | 38.40 | 37.52 | 38.66 | weights |
| Dataset | Backbone | Scenario 1 | Scenario 2 | Weights |
|---|---|---|---|---|
| V-COCO | ResNet-50+ViT-B | 56.12 | 61.45 | weights |
| V-COCO | ResNet-50+ViT-L | 58.57 | 63.97 | weights |
Citation
If you find our paper and/or code helpful, please consider citing:
@article{ting2023hoi,
title={Efficient Adaptive Human-Object Interaction Detection with Concept-guided Memory},
author={Ting Lei and Fabian Caba and Qingchao Chen and Hailin Ji and Yuxin Peng and Yang Liu},
year={2023},
booktitle={ICCV},
organization={IEEE},
}
Acknowledgement
We gratefully thank the authors from UPT for open-sourcing their code.