reconstruction-network-for-video-captioning

September 19, 2019

NOTE: A new repository has been opened (link).



This project is an attempt to implement RecNet, proposed in "Reconstruction Network for Video Captioning" (CVPR 2018).

Requirements

  • Ubuntu 16.04
  • CUDA 9.0
  • cuDNN 7.3.1
  • Java 1.8
  • Python 2.7.12
    • PyTorch 1.0
    • Other Python libraries listed in requirements.txt

How to use

Step 1. Set up a Python virtual environment

$ pip install virtualenv
$ virtualenv .env
$ source .env/bin/activate
(.env) $ pip install --upgrade pip
(.env) $ pip install -r requirements.txt
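To confirm that the activated .env matches the Requirements list, a small check like the one below can be run inside it. This is a sketch, not a script from the repository: the expected version tuple and the module list (torch, h5py, numpy) are assumptions and should be adjusted to the actual requirements.txt.

```python
from __future__ import print_function

import importlib
import sys

# Assumption: the interpreter version listed under Requirements (Python 2.7).
EXPECTED_PYTHON = (2, 7)


def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


def python_matches(expected=EXPECTED_PYTHON):
    """True if the running interpreter's major.minor equals `expected`."""
    return tuple(sys.version_info[:2]) == tuple(expected)


if __name__ == "__main__":
    # Hypothetical module list; adjust it to match requirements.txt.
    required = ["torch", "h5py", "numpy"]
    print("Python version OK:", python_matches())
    print("Missing modules:", missing_modules(required) or "none")
    try:
        import torch
        print("PyTorch", torch.__version__, "CUDA available:", torch.cuda.is_available())
    except ImportError:
        pass
```

Importing each module, rather than only reading installed package metadata, also catches packages that are installed but fail to load against the local CUDA/cuDNN setup.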

Step 2. Prepare Data

  1. Extract feature vectors of the dataset (e.g. MSVD) and place them at ~/data/<dataset>/features/<network>.hdf5.

    For example, InceptionV4 feature vectors of the MSVD dataset should be located at ~/data/MSVD/features/InceptionV4.hdf5.

  2. Set hyperparameters in config.py, then split the dataset into train / val / test sets by running the following command:

    (.env) $ python -m scripts.split
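The expected feature-file layout and the split step can be sketched as follows. This is not the repository's scripts.split: the helper names feature_path and split_ids are hypothetical, the data root ~/data is taken from the example path above, and the split is a simple order-preserving cut by counts.

```python
import os

# Assumption: the data root used in the example path above.
DATA_ROOT = "~/data"


def feature_path(dataset, network, root=DATA_ROOT):
    """Return <root>/<dataset>/features/<network>.hdf5 with '~' expanded."""
    return os.path.join(os.path.expanduser(root), dataset, "features", network + ".hdf5")


def split_ids(video_ids, n_train, n_val):
    """Cut an ordered list of video IDs into consecutive train/val/test slices."""
    if n_train + n_val > len(video_ids):
        raise ValueError("n_train + n_val exceeds the number of videos")
    train = video_ids[:n_train]
    val = video_ids[n_train:n_train + n_val]
    test = video_ids[n_train + n_val:]  # everything that is left
    return train, val, test


if __name__ == "__main__":
    # Hypothetical MSVD-style IDs; real IDs come from the dataset annotations.
    ids = ["vid%d" % i for i in range(1, 1971)]
    train, val, test = split_ids(ids, 1200, 100)
    print(feature_path("MSVD", "InceptionV4"))
    print(len(train), len(val), len(test))
```

With MSVD's 1,970 clips, split_ids(ids, 1200, 100) reproduces the commonly used 1,200 / 100 / 670 split; the sizes scripts.split actually uses depend on config.py.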
    

Step 3. Train

  1. Set hyperparameters in config.py.
  2. Run
    (.env) $ python train.py
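As we read the paper, training minimizes, for each video/sentence pair, the caption negative log-likelihood plus λ times a reconstruction loss. The global reconstructor compares the mean-pooled frame features with the mean-pooled reconstructor outputs; the local reconstructor reconstructs each frame feature and averages the per-frame Euclidean distances. The sketch below writes these losses over plain Python lists. The function names and the λ value mentioned afterwards are illustrative assumptions, and train.py (PyTorch) may differ in details such as batching and masking.

```python
from __future__ import division

import math


def euclidean(a, b):
    """L2 distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_pool(vectors):
    """Average a list of equal-length vectors element-wise."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def global_rec_loss(video_feats, rec_states):
    """Distance between mean-pooled frame features and mean-pooled reconstructor outputs.

    The two lists may have different lengths (frames vs. decoding steps).
    """
    return euclidean(mean_pool(video_feats), mean_pool(rec_states))


def local_rec_loss(video_feats, rec_feats):
    """Average distance between each frame feature and its reconstruction."""
    if len(video_feats) != len(rec_feats):
        raise ValueError("local reconstruction needs one output per frame")
    total = sum(euclidean(v, r) for v, r in zip(video_feats, rec_feats))
    return total / len(video_feats)


def joint_loss(caption_nll, rec_loss, lam):
    """Caption negative log-likelihood plus the lambda-weighted reconstruction loss."""
    return caption_nll + lam * rec_loss
```

For example, joint_loss(nll, global_rec_loss(feats, states), lam=0.2) combines the two terms; 0.2 is only a placeholder, not necessarily the value used here. Setting the reconstruction term aside leaves a plain encoder-decoder captioner, which is presumably what the variant without a reconstructor in the results below corresponds to.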
    

Step 4. Inference

  1. Set hyperparameters in config.py.
  2. Run
    (.env) $ python run.py
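At inference time, a captioning decoder generates words one at a time and feeds each prediction back in. The sketch below shows greedy decoding with a hypothetical step callable standing in for the trained decoder; run.py may use beam search or other settings from config.py, so this only illustrates the decoding loop.

```python
def greedy_decode(step, init_state, bos_id, eos_id, max_len=20):
    """Greedy decoding: feed back the highest-scoring token until EOS or max_len.

    `step(prev_token, state)` is a hypothetical callable returning
    (scores, new_state), where `scores` is a list indexed by token id.
    """
    tokens = []
    token, state = bos_id, init_state
    for _ in range(max_len):
        scores, state = step(token, state)
        # Pick the arg-max token id (first one wins on ties).
        token = max(range(len(scores)), key=lambda i: scores[i])
        if token == eos_id:
            break
        tokens.append(token)
    return tokens
```

Greedy decoding is the simplest choice; beam search keeps several partial captions per step and usually scores better on BLEU and CIDEr, at higher cost.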
    

Results

Comparison with the original paper

NOTE: For now, only 2D features are used to evaluate our model; 3D features are not yet included.

  • MSVD

    Model                       BLEU4   METEOR   CIDEr   ROUGE_L
    Ours (w/o reconstructor)    39.4    27.2     37.8    61.8
    Ours (global)               40.7    27.3     34.4    61.9
    Ours (local)                35.3    27.3     35.2    61.9
    Paper (global)              51.1    34.0     79.7    69.4
    Paper (local)               52.3    34.1     80.7    69.8

  • MSR-VTT

    Model                       BLEU4   METEOR   CIDEr   ROUGE_L
    Ours                        -       -        -       -
    Paper (global)              38.3    26.2     41.7    59.1
    Paper (local)               39.1    26.6     42.7    59.3

TODO

  • Add qualitative results.
  • Add C3D feature vectors.
  • Add MSR-VTT dataset.