reconstruction-network-for-video-captioning
September 19, 2019
NOTE: A new repository has been opened (link).
This project is an attempt to implement RecNet, the model proposed in "Reconstruction Network for Video Captioning" (CVPR 2018).
Requirements
- Ubuntu 16.04
- CUDA 9.0
- cuDNN 7.3.1
- Java 1.8
- Python 2.7.12
- PyTorch 1.0
- Other Python libraries listed in requirements.txt
How to use
Step 1. Set up a Python virtual environment
$ pip install virtualenv
$ virtualenv .env
$ source .env/bin/activate
(.env) $ pip install --upgrade pip
(.env) $ pip install -r requirements.txt
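Once the environment is active, it can help to confirm that the installed versions match the requirements listed above. The following is a minimal sketch, not part of this repository; the function name `environment_report` is hypothetical, and the script assumes nothing beyond the standard library plus an optional PyTorch install. It is written to run on both Python 2.7 and Python 3.

```python
from __future__ import print_function

import sys


def environment_report():
    """Collect interpreter and (if installed) PyTorch / CUDA / cuDNN versions."""
    report = {"python": "%d.%d.%d" % sys.version_info[:3]}
    try:
        import torch
    except ImportError:
        report["torch"] = None  # PyTorch is not installed in this environment
        return report
    report["torch"] = torch.__version__
    report["cuda"] = torch.version.cuda  # None for CPU-only builds
    report["cudnn"] = torch.backends.cudnn.version()  # None if cuDNN is unavailable
    return report


if __name__ == "__main__":
    for name, value in sorted(environment_report().items()):
        print("%-6s %s" % (name, value))
```

Comparing the printed values against the Requirements list is a quick way to catch a mismatched interpreter or CUDA build before training.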
Step 2. Prepare Data
- Extract feature vectors of datasets (e.g. MSVD), and place them at ~/<dataset>/features/<network>.hdf5
  e.g. InceptionV4 feature vectors of the MSVD dataset will be located at ~/data/MSVD/features/InceptionV4.hdf5.
- Set hyperparameters in config.py and split the dataset into train / val / test sets by running the following command.
  (.env) $ python -m scripts.split
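The feature-file layout and the train / val / test split described above can be sketched as follows. This is an illustration only and does not reproduce what scripts.split actually does: the helper names `feature_path` and `split_ids`, the `root` and `seed` parameters, the video IDs, and the split sizes are all hypothetical. The code runs on both Python 2.7 and Python 3.

```python
from __future__ import print_function

import os
import random


def feature_path(dataset, network, root="~"):
    """Build <root>/<dataset>/features/<network>.hdf5 with `~` expanded."""
    return os.path.join(os.path.expanduser(root), dataset, "features", network + ".hdf5")


def split_ids(video_ids, n_train, n_val, seed=0):
    """Shuffle IDs deterministically and cut them into train / val / test lists."""
    ids = sorted(video_ids)
    random.Random(seed).shuffle(ids)
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]  # everything left over goes to the test set
    return train, val, test


if __name__ == "__main__":
    # Matches the MSVD / InceptionV4 example path when root is ~/data.
    print(feature_path("MSVD", "InceptionV4", root="~/data"))
    # Hypothetical IDs and split sizes, for illustration only.
    train, val, test = split_ids(["vid%d" % i for i in range(10)], n_train=6, n_val=2)
    print(len(train), len(val), len(test))
```

Fixing the seed keeps the split reproducible across runs, so that validation and test videos do not leak into training when the script is re-run.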
Step 3. Train
- Set hyperparameters in config.py.
- Run
  (.env) $ python train.py
Step 4. Inference
- Set hyperparameters in config.py.
- Run
  (.env) $ python run.py
Result
Comparison with original paper
NOTE: For now, only 2D features are used for evaluating our model (3D features are missing).
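- MSVD

  | Model | BLEU4 | METEOR | CIDEr | ROUGE_L |
  | --- | --- | --- | --- | --- |
  | Ours (wo. reconstructor) | 39.4 | 27.2 | 37.8 | 61.8 |
  | Ours (global) | 40.7 | 27.3 | 34.4 | 61.9 |
  | Ours (local) | 35.3 | 27.3 | 35.2 | 61.9 |
  | Paper (global) | 51.1 | 34.0 | 69.4 | 79.7 |
  | Paper (local) | 52.3 | 34.1 | 69.8 | 80.7 |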
- MSR-VTT

  | Model | BLEU4 | METEOR | CIDEr | ROUGE_L |
  | --- | --- | --- | --- | --- |
  | Ours | - | - | - | - |
  | Paper (global) | 38.3 | 26.2 | 59.1 | 41.7 |
  | Paper (local) | 39.1 | 26.6 | 59.3 | 42.7 |
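To make the remaining gap on MSVD easier to read, the per-metric difference between the paper's global model and this implementation's global model can be computed directly from the numbers listed above. This is a small arithmetic sketch; the variable and function names are hypothetical, and the scores are copied as they appear in the MSVD results, in the same column order.

```python
from __future__ import print_function

METRICS = ["BLEU4", "METEOR", "CIDEr", "ROUGE_L"]

# Scores copied from the MSVD results, in the column order listed there.
ours_global = [40.7, 27.3, 34.4, 61.9]
paper_global = [51.1, 34.0, 69.4, 79.7]


def score_gaps(ours, paper):
    """Return paper - ours for each metric, rounded to one decimal place."""
    return [round(p - o, 1) for o, p in zip(ours, paper)]


if __name__ == "__main__":
    for name, gap in zip(METRICS, score_gaps(ours_global, paper_global)):
        print("%-8s gap: %.1f" % (name, gap))
```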
TODO
- Add qualitative results.
- Add C3D feature vectors.
- Add MSR-VTT dataset.