Customized Human Object Interaction Image Generation

August 31, 2025

By Zhu Xu, Zhaowen Wang, Yuxin Peng, Yang Liu*

Accepted by ACM-MM 2025

Quick start

Our pre-trained models for iamg and mgig are stored at LINK. Download them and place them in ./ckpts.

1. Mask Generation

For mask generation, first build the virtual environment from ./iamg/environment.yml and activate it.

cd ./iamg
python main_demo.py --hoi_category 'a person is riding a bicycle' --demo_sample ./demo_data/1.jpg --position '[0.3,0.8,0.3,0.8]'
# --demo_sample and --position specify the background image and the union location of the human and object.
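Before launching the demo, it can help to sanity-check the value passed to --position. The sketch below is only an illustration: `parse_position` is a hypothetical helper that is not part of this repository, and it assumes the four values are coordinates normalized to [0, 1]. It does not assume any particular ordering of the four values.

```python
def parse_position(text):
    """Parse a string such as '[0.3,0.8,0.3,0.8]' into a list of four floats.

    Assumption: the values are normalized to [0, 1]. The README does not
    state this explicitly, so adjust the range check if needed.
    """
    values = [float(v) for v in text.strip().strip("[]").split(",")]
    if len(values) != 4:
        raise ValueError(f"expected 4 values, got {len(values)}")
    if any(not 0.0 <= v <= 1.0 for v in values):
        raise ValueError("values are assumed to lie in [0, 1]")
    return values


if __name__ == "__main__":
    print(parse_position("[0.3,0.8,0.3,0.8]"))
```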

2. HOI Image Generation

For HOI image generation, switch to a separate virtual environment, set up as follows:

pip install -r ./mgig/requirements.txt
pip install git+https://github.com/cocodataset/panopticapi.git
pip install pycocotools -i https://pypi.douban.com/simple
pip install lvis

Then generate with the demo sample:

cd ./mgig
python run_inference_demo.py

Data preparation

Our data is stored at LINK. Download it and arrange it as follows:

-data
   |--train
        |--image
        |--video
        |--video_2
   |--test
   |--annos
   |--hico_det_clip_instance
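The layout above can be checked with a short script before running evaluation or training. This is a minimal sketch: `missing_entries` is a hypothetical helper, and the root path `./data` is an assumption based on the tree shown above. It only checks that each entry exists, since the README does not say whether `annos` and `hico_det_clip_instance` are files or folders.

```python
from pathlib import Path

# Entries taken from the directory tree in this README.
EXPECTED_ENTRIES = [
    "train/image",
    "train/video",
    "train/video_2",
    "test",
    "annos",
    "hico_det_clip_instance",
]


def missing_entries(data_root):
    """Return the expected entries that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in EXPECTED_ENTRIES if not (root / rel).exists()]


if __name__ == "__main__":
    # "./data" is an assumed location; change it to wherever the data was placed.
    missing = missing_entries("./data")
    print("missing:", missing if missing else "none")
```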

Evaluation

First, use our model to generate HOI samples on our test set.

# Mask generation
cd ./iamg
python main_eval.py --OUTPUT_ROOT ./OUTPUT/eval

The generated masks will be saved in the ./OUTPUT/eval folder. Then use the masks to generate HOI images:

cd ../mgig
python run_inference_hoi_w_one_stage_mask_eval.py

Generated HOI images will be saved in the ./iamg/OUTPUT/eval folder. We then evaluate the generated images separately in terms of interaction semantic control and subject customization.

1. Interaction Semantic Control

For spatial-sensitive semantic evaluation, additionally follow FGAHOI to set up the evaluation environment.

cd ./FGAHOI
python main.py

For the Holistic Semantic metric, install LLaVA by following the LLaVA instructions.

cd ./eval/LLaVA/llava/eval
python hico_test_hoid_w_llava.py

Model     Holistic Semantic   Interaction Semantic
                              Full     Rare     Non-rare
AnyDoor   82.04               17.54    10.63    19.18
Ours      86.02               22.07    11.87    23.87

2. Subject Customization

cd ./eval
python eval_dino.py
python eval_clip.py

Model     CLIP-I   DINO-human   DINO-object   DINO-pair
AnyDoor   82.31    70.08        72.27         74.14
Ours      87.60    78.90        81.39         83.27
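For intuition about the numbers above: CLIP-I and DINO scores are commonly reported as the mean cosine similarity between embeddings of generated and reference images. The sketch below illustrates only that general idea on toy vectors. It is not taken from eval_clip.py or eval_dino.py, whose implementation may differ, and both the helper names and the scaling by 100 are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("zero-norm embedding")
    return dot / (norm_a * norm_b)


def mean_similarity(pairs):
    """Mean cosine similarity over (generated, reference) pairs, scaled by 100.

    Assumption: scaling by 100 is used here only so the result is on a
    percentage-like scale; the repository scripts may report differently.
    """
    scores = [cosine_similarity(gen, ref) for gen, ref in pairs]
    return 100.0 * sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy embeddings standing in for real CLIP or DINO image features.
    toy_pairs = [([1.0, 0.0], [1.0, 0.0]), ([1.0, 0.0], [0.0, 1.0])]
    print(mean_similarity(toy_pairs))
```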

Training

1. iamg training

cd ./iamg
python main.py --OUTPUT_ROOT ./OUTPUT/train

2. mgig training

cd ../mgig
python run_train_anydoor.py