Diagnosing Human-object Interaction Detectors

August 17, 2023

Code for "Diagnosing Human-object Interaction Detectors".

Contributed by Fangrui Zhu, Yiming Xie, Weidi Xie, Huaizu Jiang.

Installation

Install the dependencies.

pip install -r requirements.txt

Due to an inconsistency in the current evaluation protocol, we remove all "no interaction" annotations from both the ground truth annotations and the predictions. Two precomputed data files are used for HICO-DET.

data
 └─ hicodet
     |─ sum_gts_filtered.pkl # dict: {HOI triplet: number of it in the ground truth} (test set)
     └─ gt_wo_nointer.pkl # ground truth HOIs for all test images

They can be easily obtained from current HOI benchmarks, or you can download them here and put them under the data/ folder.
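As a rough illustration of how the two files relate, the sketch below drops "no interaction" entries from a toy ground-truth list and counts the remaining triplets. The record layout (`image_id`, `triplets`), the verb name `no_interaction`, and the helper names are assumptions made for this example; the actual layout of the pickles may differ.

```python
from collections import Counter

# Hypothetical ground-truth layout: one record per test image, each holding
# (human, verb, object) triplets. The real pickle layout may differ.
toy_gt = [
    {"image_id": 1, "triplets": [("person", "ride", "bicycle"),
                                 ("person", "no_interaction", "car")]},
    {"image_id": 2, "triplets": [("person", "ride", "bicycle"),
                                 ("person", "hold", "cup")]},
]

def drop_no_interaction(records, negative_verb="no_interaction"):
    """Return a copy of the records without the negative-verb triplets."""
    return [
        {"image_id": r["image_id"],
         "triplets": [t for t in r["triplets"] if t[1] != negative_verb]}
        for r in records
    ]

def count_triplets(records):
    """Count how often each triplet occurs across all images."""
    return Counter(t for r in records for t in r["triplets"])

gt_wo_nointer = drop_no_interaction(toy_gt)
sum_gts = dict(count_triplets(gt_wo_nointer))
print(sum_gts)
```

The same filter would have to be applied to the predictions as well, so that ground truth and detector output stay comparable.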

V-COCO

Data files can be obtained from the original repo here. Alternatively, you can download the files for evaluation here and put them under the data/ folder.

data
 └─ vcoco
     |─ vcoco_test.json
     |─ instances_vcoco_all_2014.json 
     └─ vcoco_test.ids

Usage

Standard Evaluation

Run the commands below to get standard $mAP$ results with the "no interaction" class excluded. You need to specify --preds_file in the config. This file is the output of the HOI detector; examples of its format are data/CDN/preds_wo_nointer.pkl and data/CDN/vcoco_r50.pickle. Alternatively, you can follow CDN to save this kind of output.

sh eval_hicodet.sh # evaluate on HICO-DET
sh eval_vcoco.sh # evaluate on V-COCO
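Before pointing --preds_file at a new detector's output, it can help to compare its top-level structure with one of the example files. The helper below is a minimal sketch that makes no assumption about the layout inside the pickle; `summarize_pickle` is a name invented for this example and is not part of the repository.

```python
import os
import pickle

def summarize_pickle(path, max_keys=3):
    """Load a pickle and report its top-level type, size and a few keys."""
    with open(path, "rb") as f:
        obj = pickle.load(f)  # only load pickle files you trust
    summary = {"type": type(obj).__name__}
    if hasattr(obj, "__len__"):
        summary["len"] = len(obj)
    if isinstance(obj, dict):
        summary["sample_keys"] = list(obj)[:max_keys]
    return summary

# Example file named in this README; skipped when it is not present.
example_path = "data/CDN/preds_wo_nointer.pkl"
if os.path.exists(example_path):
    print(summarize_pickle(example_path))
```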

mAP Improvement

Run the following commands to obtain the new mAP after fixing one type of error. Set --fix_type and --model_name to choose the error type and the HOI model.

sh map_hicodet.sh 
sh map_vcoco.sh

Pair Localization

To diagnose the model's performance on pair localization, we need to extract the intermediate (first-stage) results from HOI detectors, i.e. the detected pairs before they are passed on for interaction classification. Example results can be found here (data/CDN/hicodet_pair_preds.pkl).

Then run the commands below to obtain the average number of detected pairs, along with their recall and precision.

sh pair_loc_hicodet.sh
sh pair_loc_vcoco.sh
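The sketch below illustrates one way recall and precision over detected pairs could be computed. It assumes a pair counts as correctly localized when both the human box and the object box reach IoU 0.5 with a ground-truth pair; the threshold, the dictionary layout and the function names are assumptions for this example, and the scripts above may use a different matching rule.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def pair_matches(pred, gt, thresh=0.5):
    """A pair matches when both its human and object boxes overlap enough."""
    return (box_iou(pred["human"], gt["human"]) >= thresh
            and box_iou(pred["object"], gt["object"]) >= thresh)

def pair_recall_precision(pred_pairs, gt_pairs, thresh=0.5):
    """Recall: share of ground-truth pairs hit by any prediction.
    Precision: share of predictions that hit any ground-truth pair."""
    hit_gt = sum(any(pair_matches(p, g, thresh) for p in pred_pairs)
                 for g in gt_pairs)
    hit_pred = sum(any(pair_matches(p, g, thresh) for g in gt_pairs)
                   for p in pred_pairs)
    recall = hit_gt / len(gt_pairs) if gt_pairs else 0.0
    precision = hit_pred / len(pred_pairs) if pred_pairs else 0.0
    return recall, precision

gt_pairs = [{"human": (0, 0, 10, 10), "object": (20, 20, 30, 30)}]
pred_pairs = [
    {"human": (0, 0, 10, 10), "object": (20, 20, 30, 30)},    # exact match
    {"human": (50, 50, 60, 60), "object": (20, 20, 30, 30)},  # wrong human box
]
recall, precision = pair_recall_precision(pred_pairs, gt_pairs)
print(recall, precision)
```

Requiring both boxes to match reflects that a pair with the right object but the wrong person is still a localization error.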

Action Analysis

Binary Classification for "no_interaction" class

We compute the $AP$ of classifying negative human-object pairs to see whether the model gives low confidence scores to incorrectly localized human-object pairs. We compute this no-interaction $AP$ on HICO-DET, as it contains no-interaction annotations. To get the required output from an HOI detector, save the classification score of the negative class as $1 - \max_i (p_i)$, where $p_i$ is the classification score of the $i$-th actual action class. Examples can be found here (data/CDN/pair_pred_wscore.pkl).
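The negative-class score defined above can be sketched as follows. The function name and the toy score lists are illustrative only; how the per-class scores are read out of a particular detector is not shown here.

```python
def no_interaction_score(action_scores):
    """Negative-class score: 1 - max_i p_i over the actual action classes."""
    if not action_scores:
        raise ValueError("need at least one action score")
    return 1.0 - max(action_scores)

# A pair with one confident action gets a low negative-class score ...
confident_pair = [0.05, 0.90, 0.10]
# ... while a pair with no confident action gets a high one.
unlikely_pair = [0.02, 0.04, 0.01]

score_confident = no_interaction_score(confident_pair)
score_unlikely = no_interaction_score(unlikely_pair)
print(score_confident, score_unlikely)
```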

With the desired model output, we can obtain the $AP$ of this binary classification problem by running the command below.

sh binary_cls_hicodet.sh

Action mAP on actual classes

For correctly localized pairs, there may be multiple action labels associated with them. We therefore compute the $mAP$ score for the classification over all actual action categories.

To get the correct result, you need to save the action score for each predicted triplet, rather than the overall score (human_score * object_score * action_score). Then run the commands below.

sh action_map_hicodet.sh
sh action_map_vcoco.sh
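To make the multi-label setting concrete, the sketch below computes a per-class $AP$ over correctly localized pairs and averages it over action classes. It uses non-interpolated average precision as an assumption; the scripts above may use a different $AP$ variant, and the function names and toy data are made up for this example.

```python
def average_precision(scores, labels):
    """Non-interpolated AP: mean precision at the rank of each positive."""
    n_pos = sum(labels)
    if n_pos == 0:
        return None  # class has no positives; skipped by the caller
    ranked = sorted(zip(scores, labels), key=lambda x: -x[0])
    true_pos, ap = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            true_pos += 1
            ap += true_pos / rank
    return ap / n_pos

def action_map(action_scores, action_labels):
    """Mean AP over action classes; rows are pairs, columns are classes."""
    n_classes = len(action_scores[0])
    aps = []
    for c in range(n_classes):
        ap = average_precision([row[c] for row in action_scores],
                               [row[c] for row in action_labels])
        if ap is not None:
            aps.append(ap)
    return sum(aps) / len(aps) if aps else 0.0

# Three correctly localized pairs, two action classes, multi-label targets.
# The scores are action scores only, not human_score * object_score * action_score.
scores = [[0.9, 0.2], [0.8, 0.7], [0.1, 0.6]]
labels = [[1, 0], [0, 1], [1, 1]]
result = action_map(scores, labels)
print(result)
```

Using the action score alone keeps the ranking independent of detector confidence in the boxes, which is the point of this diagnosis.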

Acknowledgment

Some of the code is borrowed from CDN and v-coco.