AudioCaps
February 25, 2025
CSV Description
The CSV file has four columns:
- audiocap_id: The ID unique to each audio clip and its corresponding caption.
- youtube_id: The ID of the YouTube video that the audio clip comes from. You can use it to obtain the clip's VGGish embedding from AudioSet.
- start_time: The start time of the clip within the YouTube video, in seconds.
- caption: The audio caption.
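The column layout above can be read with Python's standard `csv` module. This is a minimal sketch, not an official loader: it assumes the file has a header row with exactly these four column names and that `start_time` is a number of seconds. The sample rows are made-up placeholders, not real dataset entries, and `read_audiocaps` is a hypothetical helper name.

```python
import csv
import io
from typing import Iterator, NamedTuple


class AudioCapsRow(NamedTuple):
    audiocap_id: str
    youtube_id: str
    start_time: int  # assumption: offset in seconds
    caption: str


def read_audiocaps(handle) -> Iterator[AudioCapsRow]:
    """Yield one AudioCapsRow per CSV record (assumes a header row)."""
    for record in csv.DictReader(handle):
        yield AudioCapsRow(
            audiocap_id=record["audiocap_id"],
            youtube_id=record["youtube_id"],
            # int(float(...)) accepts both "30" and "30.0"
            start_time=int(float(record["start_time"])),
            caption=record["caption"],
        )


# Placeholder data for illustration only; not taken from the dataset.
SAMPLE = (
    "audiocap_id,youtube_id,start_time,caption\n"
    '1,placeholder_id_a,30,"A dog barks, then a door closes"\n'
    "2,placeholder_id_b,0,Rain falls on a roof\n"
)

rows = list(read_audiocaps(io.StringIO(SAMPLE)))
for row in rows:
    print(row.audiocap_id, row.youtube_id, row.start_time, row.caption)
```

To read a real split, pass an open file instead of the `StringIO` object, for example `open(path, newline="", encoding="utf-8")`, where `path` points to wherever you saved the CSV.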
Statistics:
| Split | Count |
|---|---|
| Train | 49,838 |
| Validation | 495 |
| Test | 975 |
| Total | 51,308 |
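The counts in the table can serve as a quick sanity check after downloading the CSV files. The sketch below assumes each count corresponds to one CSV row (excluding the header); `check_split` and the split keys are hypothetical names chosen for illustration.

```python
# Expected sizes, copied from the statistics table.
EXPECTED_COUNTS = {"train": 49_838, "validation": 495, "test": 975}
EXPECTED_TOTAL = 51_308


def check_split(split: str, n_rows: int) -> None:
    """Raise ValueError if a loaded split does not have the expected size."""
    expected = EXPECTED_COUNTS[split]
    if n_rows != expected:
        raise ValueError(
            f"{split}: expected {expected} rows, found {n_rows}"
        )


# The three splits add up to the total given in the table.
assert sum(EXPECTED_COUNTS.values()) == EXPECTED_TOTAL
```

A call such as `check_split("test", len(test_rows))` returns silently when the size matches and raises otherwise, where `test_rows` is whatever list of rows you loaded.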
Raw Video and Audio
Please fill out this form and we will get back to you with the download link.
Last edit: Feb 25, 2025