BT-Adapter

February 2, 2024

BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning



Paper | Weights | Video-text Pretraining | Downstream Evaluation | Instruction Tuning | VideoChatGPT Evaluation

Overview and Highlights

💡 Plug-and-use, parameter-efficient, multimodal-friendly, and temporal-sensitive structure

💡 State-of-the-art zero-shot results on various video tasks using thousands fewer GPU hours

💡 State-of-the-art video conversation results with and without video instruction tuning

Qualitative Results

BT-Adapter's performance across different situations:

👀 The Sequence of Actions

👀 Unusual Actions

👀 Complex Actions and Scenes in a Long Video

Citation

If you find the code useful for your research, please consider citing our paper:

@article{liu2023one,
  title={One for all: Video conversation is feasible without video instruction tuning},
  author={Liu, Ruyang and Li, Chen and Ge, Yixiao and Shan, Ying and Li, Thomas H and Li, Ge},
  journal={arXiv preprint arXiv:2309.15785},
  year={2023}
}