Learn to Understand Negation in Video Retrieval

September 13, 2022

This is the official source code of our paper: Learn to Understand Negation in Video Retrieval.

Requirements

We used Anaconda to set up a deep learning workspace that supports PyTorch. Run the following script to install all the required packages.

conda create -n py37 python==3.7 -y
conda activate py37
git clone git@github.com:ruc-aimc-lab/nT2VR.git
cd nT2VR
pip install -r requirements.txt

Prepare Data

Download official video data

  • For MSR-VTT, the official data can be found at this link. The raw videos can be found in the sharing from Frozen in Time.

    We follow the official MSR-VTT3k split and the MSR-VTT1k split (described in the paper JSFUSION).

  • For VATEX, the official data can be found at this link.

    We follow the split of HGR

  • We extract frames from the videos at a frame rate of 0.5s before training, using the script from video-cnn-feat. Each data folder should also contain a file that lists each frame id and its image path. (See the example id.imagepath.txt. The prefix of each frame id should be consistent with its video id.)
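
The frame index file described above could be generated with a short script. This is a minimal sketch, not the repository's own tooling. It assumes that each line of id.imagepath.txt holds a frame id and its image path separated by a space, and that the frame id is the image file name without its extension (for example `video0_3` for a frame of `video0`). Both are assumptions, so check the example id.imagepath.txt for the actual layout. `build_frame_index` is a hypothetical helper name.

```python
import os


def build_frame_index(frame_dir, output_path, extensions=(".jpg", ".jpeg", ".png")):
    """Write one '<frame_id> <image_path>' line per frame image under frame_dir.

    Assumption: the frame id is the file name without its extension and
    already starts with the video id (e.g. 'video0_3' for video 'video0').
    """
    entries = []
    for root, _dirs, files in os.walk(frame_dir):
        for name in files:
            stem, ext = os.path.splitext(name)
            if ext.lower() in extensions:
                entries.append((stem, os.path.join(root, name)))
    entries.sort()  # deterministic order, independent of filesystem listing
    with open(output_path, "w", encoding="utf-8") as f:
        for frame_id, path in entries:
            f.write(f"{frame_id} {path}\n")
    return entries
```

Calling `build_frame_index("path/to/frames", "path/to/id.imagepath.txt")` (both paths are placeholders) writes the index and returns the list of `(frame_id, image_path)` pairs it found.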

Download text data for Training & Evaluation in nT2V

Download data for training & evaluation in nT2V. We use the prefix "msrvtt10k" and "msrvtt1kA" to distinguish MSR-VTT3k split and MSR-VTT1k split.

  • The training data augmented by the negator is named "**.caption.neagtion.txt". The negated and composed test query sets are named "**.negated.txt" and "**.composed.txt".

Evaluation on test queries of nT2V

We provide scripts for evaluating zero-shot CLIP, CLIP* and CLIP-bnl on nT2V.

  • CLIP: the original model, used in a zero-shot setting.
  • CLIP*: CLIP fine-tuned on text-to-video retrieval data using a retrieval loss.
  • CLIP-bnl: CLIP fine-tuned using the proposed negation learning.

Here are the checkpoints and performances of CLIP, CLIP* and CLIP-bnl:

MSR-VTT3k

| Model Checkpoint | Original                      | Negated                       | Composed                      |
|                  | R1    | R5    | R10   | MIR   | ΔR1   | ΔR5   | ΔR10  | ΔMIR  | R1    | R5    | R10   | MIR   |
| CLIP             | 20.8  | 40.3  | 49.7  | 0.305 | 1.5   | 2.5   | 2.9   | 0.020 | 6.9   | 24.2  | 35.6  | 0.160 |
| CLIP*            | 27.7  | 53.0  | 64.2  | 0.398 | 0.5   | 1.1   | 1.1   | 0.008 | 11.4  | 33.3  | 46.2  | 0.225 |
| CLIP (boolean)   | --    | --    | --    | --    | 18.8  | 37.5  | 46.2  | 5.9   | 16.7  | 23.9  | 0.118 | 0.116 |
| CLIP* (boolean)  | --    | --    | --    | --    | 25.3  | 47.1  | 56.1  | 13.5  | 33.7  | 45.5  | 0.236 | 0.243 |
| CLIP-bnl         | 28.4  | 53.7  | 64.6  | 0.404 | 5.0   | 6.9   | 6.9   | 0.057 | 15.3  | 40.0  | 53.3  | 0.274 |

MSR-VTT1k

| Model Checkpoint | Original                      | Negated                       | Composed                      |
|                  | R1    | R5    | R10   | MIR   | ΔR1   | ΔR5   | ΔR10  | ΔMIR  | R1    | R5    | R10   | MIR   |
| CLIP             | 31.6  | 54.2  | 64.2  | 0.422 | 1.4   | 1.4   | 1.5   | 0.017 | 12.9  | 35.0  | 46.2  | 0.237 |
| CLIP*            | 41.1  | 69.8  | 79.9  | 0.543 | 0.0   | 1.7   | 1.0   | 0.006 | 17.3  | 46.8  | 61.2  | 0.310 |
| CLIP (boolean)   | --    | --    | --    | --    | 26.4  | 46.2  | 56.8  | 0.354 | 6.3   | 18.4  | 25.9  | 0.129 |
| CLIP* (boolean)  | --    | --    | --    | --    | 35.9  | 59.5  | 65.2  | 0.463 | 17.6  | 42.0  | 52.0  | 0.291 |
| CLIP-bnl         | 42.1  | 68.4  | 79.6  | 0.546 | 12.2  | 11.7  | 14.4  | 0.121 | 24.8  | 57.6  | 68.8  | 0.391 |

VATEX

| Model Checkpoint | Original                      | Negated                       | Composed                      |
|                  | R1    | R5    | R10   | MIR   | ΔR1   | ΔR5   | ΔR10  | ΔMIR  | R1    | R5    | R10   | MIR   |
| CLIP             | 41.4  | 72.9  | 82.7  | 0.555 | 1.9   | 2.1   | 2.2   | 0.018 | 10.5  | 28.3  | 41.3  | 0.201 |
| CLIP*            | 56.8  | 88.4  | 94.4  | 0.703 | 0.2   | 0.4   | 0.7   | 0.004 | 14.2  | 39.2  | 53.3  | 0.266 |
| CLIP (boolean)   | --    | --    | --    | --    | 32.5  | 57.2  | 64.5  | 0.431 | 5.0   | 18.0  | 25.6  | 0.116 |
| CLIP* (boolean)  | --    | --    | --    | --    | 25.3  | 47.1  | 56.1  | 0.353 | 14.1  | 34.4  | 45.1  | 0.243 |
| CLIP-bnl         | 57.6  | 88.3  | 94.0  | 0.708 | 14.0  | 11.7  | 8.6   | 0.125 | 16.6  | 39.9  | 53.9  | 0.284 |
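
For readers unfamiliar with the column names, the R1/R5/R10 and MIR figures can be computed from a query-by-video similarity matrix roughly as follows. This is a sketch under stated assumptions, not the repository's evaluation code: R@K is taken to be the percentage of queries whose relevant video ranks in the top K, MIR is taken to be the mean inverse rank of the relevant video, and every query is assumed to have exactly one relevant video. The function name `retrieval_metrics` and its arguments are hypothetical.

```python
def retrieval_metrics(similarity, relevant, ks=(1, 5, 10)):
    """Compute R@K (in percent) and MIR for text-to-video retrieval.

    similarity[i][j] is the score of video j for query i;
    relevant[i] is the index of the single relevant video for query i
    (an assumption of this sketch).
    """
    ranks = []
    for scores, target in zip(similarity, relevant):
        # Rank = 1 + number of videos scored strictly higher than the relevant one.
        rank = 1 + sum(1 for s in scores if s > scores[target])
        ranks.append(rank)
    n = len(ranks)
    metrics = {f"R{k}": 100.0 * sum(r <= k for r in ranks) / n for k in ks}
    metrics["MIR"] = sum(1.0 / r for r in ranks) / n
    return metrics


# Toy example: 3 queries, 3 videos, query i is relevant to video i.
toy_similarity = [
    [0.9, 0.1, 0.3],
    [0.2, 0.8, 0.5],
    [0.6, 0.7, 0.4],
]
toy_metrics = retrieval_metrics(toy_similarity, relevant=[0, 1, 2])
```

In the toy example the first two queries rank their video first and the third ranks it last, so two of the three queries count toward R1.
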
  • To evaluate zero-shot CLIP, run the script clip.sh
# use 'rootpath' to specify the path to the data folder
cd shell/test
bash clip.sh
  • To evaluate CLIP*, run the script clipft.sh
# use 'rootpath' to specify the path to the data folder
# use 'model_path' to specify the path of model
cd shell/test
bash clipft.sh
  • To evaluate zero-shot CLIP+boolean, run the script clip_bool.sh
cd shell/test
bash clip_bool.sh
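  • To evaluate CLIP*+boolean, run the script clipft_bool.sh
cd shell/test
bash clipft_bool.sh
  • To evaluate CLIP-bnl, run the script clip_bnl.sh
cd shell/test
bash clip_bnl.sh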

Train CLIP-bnl from scratch

  • To train CLIP-bnl on the MSR-VTT3k split, run
# use 'rootpath' to specify the path to the data folder
cd shell/train
bash msrvtt7k_clip_bnl.sh
  • To train CLIP-bnl on the MSR-VTT1k split, run
cd shell/train
bash msrvtt9k_clip_bnl.sh
  • To train CLIP-bnl on VATEX, run
cd shell/train
bash vatex_clip_bnl.sh
  • Additionally, the training script for CLIP* is clipft.sh

Produce new negated & composed data

  1. Install additional packages:
cd negationdata
pip install -r requirements.txt
  2. Download the checkpoint of the negation scope detection model, which is built on NegBERT
  3. Run the script prepare_data.sh
# use 'rootpath' to specify the path to the data folder
# use 'cache_dir' to specify the path to the models used by the negation scope detection model
cd negationdata
bash prepare_data.sh

Citation

@inproceedings{mm22-nt2vr,
title = {Learn to Understand Negation in Video Retrieval},
author = {Ziyue Wang and Aozhu Chen and Fan Hu and Xirong Li},
year = {2022},
booktitle = {ACMMM},
}

Contact

If you encounter issues when running the code, please feel free to reach out to us.