Joint Event Detection and Description in Continuous Video Streams
November 4, 2020
Code released by Huijuan Xu (Boston University).
Introduction
We present the Joint Event Detection and Description Network (JEDDi-Net), which solves the dense captioning task in an end-to-end fashion. Our model continuously encodes the input video stream with three-dimensional convolutional layers, proposes variable-length temporal events based on pooled features, and transcribes the event proposals into captions, taking both visual and language context into account.
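To make "variable-length temporal events" concrete, an event can be treated as a (start, end) segment in seconds, and two segments can be compared by their temporal intersection over union (tIoU). The sketch below is only an illustration of that standard measure; the function name and the example segments are hypothetical and are not taken from the JEDDi-Net code.

```python
def temporal_iou(a, b):
    """Temporal IoU of two (start, end) segments given in seconds."""
    (s1, e1), (s2, e2) = a, b
    # Overlap is zero when the segments do not intersect.
    inter = max(0.0, min(e1, e2) - max(s1, s2))
    union = (e1 - s1) + (e2 - s2) - inter
    return inter / union if union > 0 else 0.0


# Hypothetical proposal and ground-truth event, for illustration only.
proposal = (2.0, 10.0)
ground_truth = (4.0, 12.0)
print(temporal_iou(proposal, ground_truth))
```

A proposal that overlaps a ground-truth event more tightly gets a tIoU closer to 1, while disjoint segments get 0.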
License
JEDDi-Net is released under the MIT License (refer to the LICENSE file for details).
Citing JEDDi-Net
If you find JEDDi-Net useful in your research, please consider citing:
@article{xu2019joint,
title={Joint Event Detection and Description in Continuous Video Streams},
author={Xu, Huijuan and Li, Boyang and Ramanishka, Vasili and Sigal, Leonid and Saenko, Kate},
journal={2019 IEEE Winter Conference on Applications of Computer Vision (WACV)},
year={2019}
}
Contents
Installation:
-
Clone the JEDDi-Net repository.
git clone --recursive git@github.com:VisionLearningGroup/JEDDi-Net.git
-
Build Caffe3d with pycaffe (see: Caffe installation instructions). Note: Caffe must be built with Python support!
cd ./caffe3d
# If you have all of the requirements installed and your Makefile.config in place, then simply do:
make -j8 && make pycaffe
-
Build the JEDDi-Net lib folder.
cd ./lib
make
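Because the later steps depend on Caffe having been built with Python support, a quick check after the build can save time. The sketch below assumes the caffe3d fork keeps the stock Caffe layout, where `make pycaffe` leaves `python/caffe/__init__.py` and `python/caffe/_caffe.so` under the Caffe root. That layout and the helper name `pycaffe_built` are assumptions, not something this README confirms.

```python
import os


def pycaffe_built(caffe_root):
    """Return True if the files a stock `make pycaffe` produces are present."""
    pkg = os.path.join(caffe_root, "python", "caffe")
    expected = [
        os.path.join(pkg, "__init__.py"),
        os.path.join(pkg, "_caffe.so"),  # compiled extension (assumed name)
    ]
    return all(os.path.isfile(p) for p in expected)


if __name__ == "__main__":
    # "./caffe3d" is the folder used in the build step above.
    print("pycaffe built:", pycaffe_built("./caffe3d"))
```

If this prints `False` after a successful `make pycaffe`, the fork may place its build output elsewhere, and the check would need adjusting.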
Preparation:
-
Download the ground truth annotations and videos of the ActivityNet Captions dataset.
-
Extract frames from the downloaded videos at 25 fps.
-
Generate the pickle data for training and testing the JEDDi-Net model.
cd ./preprocess
# generate training data
python generate_train_roidb_sorted.py
# generate validation data
python generate_val_roidb.py
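The frame extraction step above does not name a tool. One common option is ffmpeg with its `fps` video filter. The sketch below assumes ffmpeg is installed and on the PATH; the output file pattern `image_%05d.jpg` and the example paths are hypothetical, so check what the scripts in ./preprocess expect before relying on them.

```python
import os
import subprocess


def ffmpeg_frame_command(video_path, frame_dir, fps=25):
    """Build an ffmpeg command that writes JPEG frames sampled at `fps`."""
    # Hypothetical naming scheme; the preprocess scripts may expect another.
    pattern = os.path.join(frame_dir, "image_%05d.jpg")
    return ["ffmpeg", "-i", video_path, "-vf", "fps={}".format(fps), pattern]


def extract_frames(video_path, frame_dir, fps=25):
    """Create `frame_dir` if needed and run ffmpeg on one video."""
    if not os.path.isdir(frame_dir):
        os.makedirs(frame_dir)
    subprocess.check_call(ffmpeg_frame_command(video_path, frame_dir, fps))


if __name__ == "__main__":
    # Print the command for an example video (paths are placeholders).
    print(" ".join(ffmpeg_frame_command("videos/v_example.mp4", "frames/v_example")))
```

Building the command in a separate function makes it easy to inspect or log before running it over the whole dataset.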
Training:
-
Download the separately trained segment proposal network (SPN) and captioning models to ./pretrain/.
-
In the JEDDi-Net root folder, run:
bash ./experiments/denseCap_jeddiNet_end2end/script_train.sh
Testing:
-
Download one sample JEDDi-Net model to ./snapshot/.
One JEDDi-Net model trained on the ActivityNet Captions dataset is provided in: caffemodel.
The provided JEDDi-Net model achieves a METEOR score of about 8.58% on the validation set.
-
In the JEDDi-Net root folder, generate the prediction log file on the validation set.
bash ./experiments/denseCap_jeddiNet_end2end/test/script_test.sh
-
Generate the results.json file from the prediction log file.
cd ./experiments/denseCap_jeddiNet_end2end/test/
bash bash.sh
-
Follow the evaluation code to get the evaluation results.
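Before running the evaluation, it can help to sanity-check results.json. The sketch below assumes the file follows the layout commonly used for ActivityNet Captions submissions: a top-level "results" mapping from video id to a list of events, each with a "sentence" and a two-element "timestamp". Whether bash.sh writes exactly this layout is not confirmed here, and `check_results` and the sample entry are hypothetical.

```python
import json


def check_results(data):
    """Return a list of problems found in a dense-captioning results dict."""
    results = data.get("results")
    if not isinstance(results, dict):
        return ["missing 'results' mapping"]
    problems = []
    for video_id, events in results.items():
        for i, event in enumerate(events):
            ts = event.get("timestamp")
            ok = isinstance(ts, (list, tuple)) and len(ts) == 2 and ts[0] <= ts[1]
            if not ok:
                problems.append("{}[{}]: bad timestamp {!r}".format(video_id, i, ts))
            if not event.get("sentence"):
                problems.append("{}[{}]: empty sentence".format(video_id, i))
    return problems


if __name__ == "__main__":
    # Made-up entry for illustration; real use would json.load() results.json.
    sample = {
        "results": {
            "v_example": [
                {"sentence": "a person is cooking", "timestamp": [1.5, 9.0]}
            ]
        }
    }
    print(json.dumps(check_results(sample)))
```

An empty list means no structural problems were found; it says nothing about caption quality, which the evaluation code measures.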