Video-Guided Text-to-Music Generation Using Public Domain Movie Collections (ISMIR 2025)

August 17, 2025

This repository provides the Python implementation of the model architecture proposed in our ISMIR 2025 paper, "Video-Guided Text-to-Music Generation Using Public Domain Movie Collections." The architecture integrates a video adapter into MusicGen.

Model Architecture

If you find this repository useful for your research, please consider citing our paper.

@article{kim2025ossl,
  title = {Video-Guided Text-to-Music Generation Using Public Domain Movie Collections},
  author = {Haven Kim and Zachary Novack and Weihan Xu and Julian McAuley and Hao-Wen Dong},
  journal = {ISMIR 2025},
  year = {2025},
  url = {https://arxiv.org/abs/2506.12573}
}

Acknowledgements

Our implementation builds heavily on the official audiocraft repository.

Open Screen Sound Library Version 1 (OSSL-v1)

Please see this webpage to download the dataset.