AudioCaps

February 25, 2025

CSV Description

The CSV file has four columns:

  • audiocap_id: The ID unique to each audio clip and its corresponding caption.
  • youtube_id: The ID of the YouTube video that the audio clip belongs to. You can use this to obtain the VGGish embedding from AudioSet.
  • start_time: The start time of the clip within the YouTube video.
  • caption: The audio caption.
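
As a rough illustration of the column layout above, the following Python sketch parses the four columns into typed records. The `CaptionRow` class and `read_captions` function are hypothetical names introduced here, not part of the dataset tooling. The sample rows are made-up placeholders rather than real entries, and treating `start_time` as a number is an assumption.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class CaptionRow:
    audiocap_id: str
    youtube_id: str
    start_time: float  # assumed numeric; the unit is not stated in this README
    caption: str


def read_captions(handle):
    """Parse rows from an open CSV file that has the four columns described above."""
    reader = csv.DictReader(handle)
    return [
        CaptionRow(
            audiocap_id=row["audiocap_id"],
            youtube_id=row["youtube_id"],
            start_time=float(row["start_time"]),
            caption=row["caption"],
        )
        for row in reader
    ]


# Placeholder rows for illustration only; these are not real dataset entries.
sample = io.StringIO(
    "audiocap_id,youtube_id,start_time,caption\n"
    '1,abcdefghijk,30,"A dog barks, then a door closes"\n'
    "2,lmnopqrstuv,0,Rain falls on a roof\n"
)
rows = read_captions(sample)
print(rows[0].youtube_id, rows[0].start_time, rows[0].caption)
```

Using `csv.DictReader` rather than splitting on commas keeps captions that contain commas intact, as long as they are quoted in the file.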

Statistics:

Split        Count
Train        49,838
Validation   495
Test         975
Total        51,308
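
One way to sanity-check a downloaded split against these figures is sketched below in Python. It assumes the counts refer to distinct clips (distinct `youtube_id` values) rather than CSV rows, which this README does not state. The `count_clips` helper and the sample rows are hypothetical.

```python
import csv
import io

# Counts as reported in the statistics above.
REPORTED = {"Train": 49838, "Validation": 495, "Test": 975}
REPORTED_TOTAL = 51308


def count_clips(handle):
    """Count distinct youtube_id values in an open CSV file for one split.

    Assumption: the reported counts are per clip, so rows sharing a
    youtube_id are counted once.
    """
    reader = csv.DictReader(handle)
    return len({row["youtube_id"] for row in reader})


# The three splits add up to the reported total.
assert sum(REPORTED.values()) == REPORTED_TOTAL

# Placeholder rows for illustration only; two captions share one clip.
sample = io.StringIO(
    "audiocap_id,youtube_id,start_time,caption\n"
    "1,abcdefghijk,30,A dog barks\n"
    "2,abcdefghijk,30,Barking from a dog\n"
    "3,lmnopqrstuv,0,Rain falls on a roof\n"
)
print(count_clips(sample))
```

For a real split, open the CSV file with `open(path, newline="", encoding="utf-8")`, pass the handle to `count_clips`, and compare the result with the matching entry in `REPORTED`.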

Raw Video and Audio

Please fill out this form and we will get back to you with the download link.

Last edit: Feb 25, 2025