Sequence to Sequence - Video to Text (S2VT)

February 4, 2016

## Sequence to Sequence -- Video to Text

Paper: ICCV 2015 PDF

Download Model: S2VT_VGG_RGB_MODEL (333MB)

Description

This is the S2VT (RGB) model described in the ICCV 2015 paper "Sequence to Sequence -- Video to Text". It uses video frame features extracted with the 16-layer VGG model and is trained only on the YouTube (MSVD) video dataset.

Sequence to Sequence - Video to Text
S. Venugopalan, M. Rohrbach, J. Donahue, T. Darrell, R. Mooney, K. Saenko
The IEEE International Conference on Computer Vision (ICCV) 2015

Please consider citing the above paper if you use this model.

Performance

The METEOR score of this model is 29.2% on the test set of the YouTube (MSVD) video dataset (see Table 2 in the Sequence to Sequence - Video to Text paper).

Caffe compatibility

The models are currently supported by the recurrent branch of the Caffe fork by Jeff Donahue and Subhashini Venugopalan, but are not yet compatible with the master branch of Caffe.
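As a rough illustration of what this means in practice, the sketch below loads a network in test mode through the Python interface (pycaffe). It assumes that the `caffe` module on your Python path was built from the recurrent branch; a build from master is expected to fail on the recurrent layers. The function name `load_s2vt_net` and both file paths are hypothetical placeholders, not names taken from the repository.

```python
import os


def load_s2vt_net(prototxt_path, weights_path, use_gpu=False):
    """Load a Caffe network in test mode from a prototxt and a .caffemodel.

    Both paths are supplied by the caller; this helper does not know the
    actual file names shipped with the model (assumption: you downloaded
    the weights and a matching deploy prototxt yourself).
    """
    # Fail early with a clear message if either file is missing.
    for path in (prototxt_path, weights_path):
        if not os.path.isfile(path):
            raise FileNotFoundError("missing model file: " + path)

    # Imported lazily so the path check above works without Caffe installed.
    # This must be pycaffe built from the recurrent branch of the fork.
    import caffe

    if use_gpu:
        caffe.set_mode_gpu()
    else:
        caffe.set_mode_cpu()

    # caffe.TEST selects the inference phase of the network definition.
    return caffe.Net(prototxt_path, weights_path, caffe.TEST)
```

A call would look like `net = load_s2vt_net("path/to/deploy.prototxt", "path/to/weights.caffemodel")`, with both paths replaced by the files you actually have.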

Training

More details on the code and data can be found on this Project Page.

The prototxts for the network and solver can also be found here: https://github.com/vsubhashini/caffe/tree/recurrent/examples/s2vt