Video Joint Modelling Based on Hierarchical Transformer for Co-summarization (VJMHT)
August 24, 2025
Haopeng Li, Qiuhong Ke, Mingming Gong, Rui Zhang
IEEE Transactions on Pattern Analysis and Machine Intelligence
Introduction
We propose Video Joint Modelling based on Hierarchical Transformer (VJMHT) for co-summarization, which takes into consideration the semantic dependencies across videos.
VJMHT consists of two Transformer layers: the first extracts semantic representations from the individual shots of similar videos, while the second performs shot-level video joint modelling to aggregate cross-video semantic information. In this way, complete cross-video high-level patterns are explicitly modelled and learned for the summarization of individual videos.
Moreover, Transformer-based video representation reconstruction is introduced to maximize the high-level similarity between the summary and the original video.
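To make the two-level structure concrete, here is a minimal PyTorch sketch. It is not the code in this repository: the class name HierarchicalEncoder, the feature size, the mean pooling over frames, and the linear scoring head are all assumptions chosen for illustration. It only shows the idea of encoding each shot on its own and then letting the shots of several similar videos attend to one another.

```python
import torch
import torch.nn as nn


class HierarchicalEncoder(nn.Module):
    """Illustrative two-level Transformer (hypothetical, not the repository's model)."""

    def __init__(self, dim=64, heads=4, depth=1):
        super().__init__()
        # Level 1: self-attention over the frames inside a single shot.
        frame_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=2 * dim)
        self.frame_encoder = nn.TransformerEncoder(frame_layer, num_layers=depth)
        # Level 2: self-attention over the shots of all grouped videos jointly.
        shot_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=2 * dim)
        self.shot_encoder = nn.TransformerEncoder(shot_layer, num_layers=depth)
        # Assumed scoring head: one importance score per shot.
        self.score = nn.Linear(dim, 1)

    def forward(self, shots):
        # shots: (num_shots, frames_per_shot, dim), where the shots of the
        # similar videos are concatenated along the first axis.
        # nn.TransformerEncoder expects (sequence, batch, dim) by default.
        frames = shots.transpose(0, 1)                      # (frames, shots, dim)
        shot_repr = self.frame_encoder(frames).mean(dim=0)  # (shots, dim), mean pooling is an assumption
        # Treat all shots as one sequence with batch size 1 so that
        # attention can cross video boundaries.
        joint = self.shot_encoder(shot_repr.unsqueeze(1)).squeeze(1)  # (shots, dim)
        scores = torch.sigmoid(self.score(joint)).squeeze(-1)         # (shots,)
        return scores, joint


model = HierarchicalEncoder(dim=64, heads=4)
shots = torch.randn(6, 5, 64)  # e.g. 2 videos x 3 shots, 5 frames per shot
scores, joint = model(shots)
print(scores.shape, joint.shape)
```

Because the second encoder sees the shots of all grouped videos as a single sequence, each shot representation can draw on shots from the other videos, which is the cross-video aggregation described above.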
Requirements and Dependencies
- Python=3.8.5
- PyTorch=1.9
- ortools=8.1.8487
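A quick way to compare an environment against the versions listed above is sketched below. The helper check_versions is hypothetical and not part of this repository. It assumes the packages were installed from PyPI under the distribution names torch and ortools, and it uses a simple prefix match, so a build such as 1.9.0 counts as matching 1.9.

```python
import sys
from importlib import metadata

# Versions taken from the list above; keys are the PyPI distribution names.
EXPECTED = {"torch": "1.9", "ortools": "8.1.8487"}


def check_versions(expected=EXPECTED):
    """Return {package: (installed_version_or_None, matches_expected_prefix)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        ok = installed is not None and installed.startswith(wanted)
        report[name] = (installed, ok)
    return report


print("Python", ".".join(map(str, sys.version_info[:3])))
for package, (installed, ok) in check_versions().items():
    print(package, installed, "OK" if ok else "differs from README")
```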
Data Preparation
Download the datasets to datasets/.
Evaluation
Download our models to results/.
Run the following command to test our models.
$ python main.py -c configs/dataset_setting.py --eval
Here, dataset_setting.py stands for one of the configuration files found in configs/. The results are saved in results/DATASET_SETTING/.
Example for testing the model trained on TVSum in the canonical setting:
$ python main.py -c configs/tvsum_can.py --eval
The results are saved in results/TVSUM_CAN.
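The naming pattern in the examples above (tvsum_can.py giving results/TVSUM_CAN) can be written as a small helper. This is only a sketch of the convention as this README states it. The function results_dir_for is hypothetical, and the actual output directory may be set inside the configuration file itself.

```python
from pathlib import Path


def results_dir_for(config_path, results_root="results"):
    """Map a config file to its results folder using the upper-cased file stem."""
    return Path(results_root) / Path(config_path).stem.upper()


print(results_dir_for("configs/tvsum_can.py").as_posix())  # results/TVSUM_CAN
```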
Training
Run the following command to train the model:
$ python main.py -c configs/dataset_setting.py
Example for training the model on TVSum in the canonical setting:
$ python main.py -c configs/tvsum_can.py
The trained models and results are saved in results/TVSUM_CAN.
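Training and evaluation differ only in the --eval flag, so both commands can be built from one helper when scripting several runs. The sketch below is an illustration, not part of the repository: build_command and train_then_eval are hypothetical names, and it assumes the script is launched from the repository root, where main.py lives.

```python
import subprocess


def build_command(config_path, evaluate=False, python="python"):
    """Build the argument list for the training or evaluation command shown above."""
    cmd = [python, "main.py", "-c", config_path]
    if evaluate:
        cmd.append("--eval")
    return cmd


def train_then_eval(config_path):
    """Train with the given config, then evaluate the resulting model."""
    for evaluate in (False, True):
        # check=True raises CalledProcessError if main.py exits with an error.
        subprocess.run(build_command(config_path, evaluate), check=True)


print(" ".join(build_command("configs/tvsum_can.py", evaluate=True)))
```

Calling train_then_eval("configs/tvsum_can.py") would run the two TVSum commands from this README back to back.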
License and Citation
The use of this code is RESTRICTED to non-commercial research and educational purposes.
If you use this code or reference our paper in your work, please cite this publication as:
@article{li2022video,
title={Video Joint Modelling Based on Hierarchical Transformer for Co-summarization},
author={Li, Haopeng and Ke, Qiuhong and Gong, Mingming and Zhang, Rui},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
year={2022},
publisher={IEEE}
}
Acknowledgement
The code is developed based on VASNet.