CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval
November 20, 2023
The implementation of the paper CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval.
CLIP4Clip is a video-text retrieval model based on CLIP (ViT-B). In this work we investigate three similarity calculation approaches: parameter-free type, sequential type, and tight type. The model achieves SOTA results on MSR-VTT, MSVD, LSMDC, ActivityNet, and DiDeMo.
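To make the parameter-free type concrete, here is a minimal pure-Python sketch. It assumes the formulation described in the paper: frame embeddings are mean-pooled into a single video embedding, which is then compared with the text embedding by cosine similarity. The function names are hypothetical and this is not the repository's PyTorch code.

```python
import math

def mean_pool(frame_vecs):
    """Average per-frame embeddings (equal-length lists) into one video vector."""
    n = len(frame_vecs)
    return [sum(column) / n for column in zip(*frame_vecs)]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def parameter_free_similarity(text_vec, frame_vecs):
    """Hypothetical helper: mean-pool the frames, then score against the text."""
    return cosine(text_vec, mean_pool(frame_vecs))
```

No learned parameters are involved, which is why this variant adds nothing on top of the pretrained CLIP encoders.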

Requirement
- Run pip install -r requirement.text to install exactly the same dependencies.
- Or use conda-pack to install the environment downloaded from here with [0dhw]:
pip install conda-pack
mkdir -p [path_to_conda_env]  # (e.g., ~/anaconda/envs/ENV_NAME)
tar -zxvf [ENV_NAME].tar.gz -C [path_to_conda_env]
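After either installation route, it can help to confirm that the listed packages are actually present. The sketch below assumes a pip-style requirements file with one package per line; `missing_requirements` is a hypothetical helper. It only checks that each package is installed, not that the pinned version matches.

```python
import re
from importlib import metadata

def missing_requirements(requirements_path):
    """Return the package names from a requirements file that are not installed."""
    missing = []
    with open(requirements_path, encoding="utf-8") as f:
        for raw in f:
            # Drop trailing comments and surrounding whitespace.
            line = raw.split("#", 1)[0].strip()
            # Skip blank lines and pip options such as "-r other.txt".
            if not line or line.startswith("-"):
                continue
            # Keep only the name, cutting at version specifiers, extras, or markers.
            name = re.split(r"[<>=!~\[;\s]", line, maxsplit=1)[0]
            try:
                metadata.version(name)
            except metadata.PackageNotFoundError:
                missing.append(name)
    return missing
```

An empty list means every listed package was found in the active environment.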
Data Preparation
1. For MSRVTT
The official data and video links can be found in link.
For convenience, you can also download the splits and captions with:
wget https://github.com/ArrowLuo/CLIP4Clip/releases/download/v0.0/msrvtt_data.zip
The raw videos are also shared by Frozen in Time:
wget https://www.robots.ox.ac.uk/~maxbain/frozen-in-time/data/MSRVTT.zip
2. For MSVD
Raw videos can be downloaded from link.
The splits and raw_captions can be found in the excellent collaborative-experts project. For convenience, you can also download them with:
wget https://github.com/ArrowLuo/CLIP4Clip/releases/download/v0.0/msvd_data.zip
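Once an archive such as msrvtt_data.zip or msvd_data.zip has been downloaded, it still needs to be unpacked. The following is a minimal sketch using only the standard library; `extract_archive` and the destination directory are hypothetical, and the internal layout of the archives is not assumed.

```python
import zipfile
from pathlib import Path

def extract_archive(zip_path, dest_dir):
    """Unpack a zip archive into dest_dir and return the names of its members."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
        return zf.namelist()
```

For example, `extract_archive("msvd_data.zip", "data")` would unpack the MSVD splits and captions under a local `data` folder, and the returned list shows which files arrived.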
Compress Video (optional)
Our UniPT adopts this operation for speed-up.
python preprocess/compress_video.py --input_root [raw_video_path] --output_root [compressed_video_path]
This script compresses each video to 3 fps with a width of 224 (or a height of 224). Modify the variables in the script to customize these settings.
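To illustrate what such a compression step involves, here is a hedged sketch that builds an ffmpeg command for one video. It is not the contents of preprocess/compress_video.py. It assumes that "width 224 (or height 224)" means the shorter side is scaled to 224 while the aspect ratio is kept, and that ffmpeg is available on the PATH. `target_size` and `build_ffmpeg_command` are hypothetical names.

```python
def target_size(width, height, short_side=224):
    """Scale so the shorter side equals short_side; keep the other side even."""
    if width >= height:
        new_h = short_side
        new_w = int(round(width * short_side / height / 2)) * 2
    else:
        new_w = short_side
        new_h = int(round(height * short_side / width / 2)) * 2
    return new_w, new_h

def build_ffmpeg_command(src, dst, width, height, fps=3):
    """Return the ffmpeg argument list; the caller decides whether to run it."""
    new_w, new_h = target_size(width, height)
    return [
        "ffmpeg", "-y",                    # overwrite the output if it exists
        "-i", src,                         # input video
        "-r", str(fps),                    # output frame rate
        "-vf", f"scale={new_w}:{new_h}",   # resize filter
        dst,
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Keeping both dimensions even avoids failures with encoders that reject odd frame sizes.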
Training and Testing
- Download the CLIP (ViT-B/32) weights to CLIP4Clip/modules/ViT-B-32.pt.
wget -P ./modules https://openaipublic.azureedge.net/clip/models/40d365715913c9da98579312b702a82c18be219cc2a73407c4526f58eba950af/ViT-B-32.pt
- Then run ./train_xxxx_tuning.sh to obtain the corresponding model in ckpts/.
- You can download our best checkpoints for MSR-VTT and MSVD with [0dhw].
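Before training, it may be worth checking that the CLIP weight file downloaded intact. The sketch below assumes that the long hexadecimal path component in the download URL is the SHA-256 digest of the file, which is how OpenAI's CLIP loader treats it; take that as an assumption rather than a guarantee. The helper names are hypothetical.

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_url_hash(path, url):
    """Compare the file digest with the second-to-last component of the URL."""
    expected = url.rstrip("/").split("/")[-2]
    return sha256_of(path) == expected
```

For instance, `matches_url_hash("modules/ViT-B-32.pt", url)` with the download URL above should return True for a complete download under this assumption.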