Weakly-Supervised-Dense-Video-Captioning
November 3, 2018
This repo tries to implement Weakly Supervised Dense Video Captioning in TensorFlow. The implementation is not complete yet.
Requirements
- Python 3
- Keras 2.2
- TensorFlow 1.8
Usage
- Run lexical_Res.py to train the FCN with the MIML loss, saving the weights with the lowest loss.
- Run region_selection.py to generate the most informative and coherent region sequence.
- Run TRY3/model_seq2seq.py to train the language model.
- Run TRY3/s2vt_predict_v2.py to run inference with the trained model.
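As a rough illustration of the MIML (multi-instance multi-label) loss used in the first step, the sketch below treats each video as a bag of instances (regions or frames) and each vocabulary word as a label. It assumes the bag-level probability of a word is the maximum over the instance-level probabilities, followed by binary cross-entropy. That aggregation rule, the function name miml_loss, and the nested-list input layout are assumptions made for illustration and may differ from what lexical_Res.py actually does.

```python
import math


def miml_loss(instance_probs, labels, eps=1e-7):
    """Mean MIML cross-entropy over a batch of bags (illustrative sketch).

    instance_probs[i][j][w]: predicted probability of word w for
        instance j (e.g. a region) of bag i (a video).
    labels[i][w]: 1 if word w appears in the captions of bag i, else 0.
    """
    total = 0.0
    for bag, bag_labels in zip(instance_probs, labels):
        for w, y in enumerate(bag_labels):
            # Assumption: a bag contains word w if at least one instance
            # does, so the bag probability is the max over instances.
            p = max(instance[w] for instance in bag)
            # Clip to avoid log(0).
            p = min(max(p, eps), 1.0 - eps)
            total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(instance_probs)


# One bag with two instances and a two-word vocabulary.
example_probs = [[[0.9, 0.2], [0.1, 0.3]]]
example_labels = [[1, 0]]
example_loss = miml_loss(example_probs, example_labels)
```

In this example the bag-level probabilities are 0.9 for the first word and 0.3 for the second, so the loss is -(log 0.9 + log 0.7).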
Guide
- extract_frames.py: Uniformly samples 30 frames from each video.
- load_data.py: Creates the label vectors and the word dictionary.
- Res_video_bag.py: Lexical FCN (ResNet-50) with a frame as an instance.
- lexical_Res.py: Lexical FCN (ResNet-50) with a region as an instance.
- region_selection.py: Region sequence generator, which can currently form one region sequence.
- dic/: Where to put ix2word, word2ix, word_counts.
- frames/: Where to put the frames extracted by extract_frames.py.
- MSRVTT/: Where to put the training/testing labels and the region sequences generated by region_selection.py.
- videos/: Where to put the MSR-VTT videos.
- Weight_Resnet50/: Where to put the weights saved by lexical_Res.py.
- Weight_Resnet50_vasbag/: Where to put the weights saved by Res_video_bag.py.
- TRY3/s2vt_train.py: Language model using S2VT (training).
- TRY3/s2vt.py: S2VT model graph.
- TRY3/s2vt_inference.py: Language model using S2VT (inference).
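To make the "uniform sampling of 30 frames" step concrete, here is a minimal sketch of how evenly spaced frame indices could be chosen for a video of a given length. The function name uniform_frame_indices and the choice to always include the first and last frame are assumptions for illustration; extract_frames.py may select frames differently.

```python
def uniform_frame_indices(total_frames, num_samples=30):
    """Return num_samples evenly spaced frame indices in [0, total_frames - 1].

    Hypothetical helper, not taken from the repo. If the video has fewer
    frames than num_samples, some indices are repeated.
    """
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    if num_samples == 1:
        return [0]
    last = total_frames - 1
    # Integer arithmetic keeps the result exact and non-decreasing.
    return [(i * last) // (num_samples - 1) for i in range(num_samples)]


# Indices for a 300-frame video.
indices = uniform_frame_indices(300)
```

The returned indices could then be used to seek into the video with whichever reader the project uses and to write the selected frames into frames/.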
Contact
Shih-Chen Lin (dennis60512@gmail.com)
Any discussions and suggestions are welcome!