mir-datasets.md
March 21, 2026
| status | dataset | metadata | contents | with audio |
|---|---|---|---|---|
| ☠ | 200DrumMachines | audio samples | 7371 one-shots | yes |
| ✅ | AAM | onsets, pitches, instruments, melody instrument, keys, chords, tempos, beats, segments | 3,000 music tracks (with single instrument multitracks) | yes |
| ✅ | AccoMontage2 | song harmonization and accompaniment arrangement based on a lead melody | None | no |
| ☠ | ACM_MIRUM | tempo | 1410 excerpts (60s) | yes |
| ✅ | AcousticBrainz-Genre | 15-31 genres with 265-745 subgenres | audio features for about 2000000 songs | no |
| ✅ | ADC2004 | predominant pitch | 20 excerpts | yes |
| ✅ | Acoustic Event Dataset | 28 event classes | 5223 audio snippets | yes |
| ✅ | AIST Dance Video Database | street dance videos | 13,940 videos for 60 pieces | yes |
| ✅ | Amg1608 | valence & arousal | 1608 excerpts (30s) | no |
| ✅ | AMT-pilot | structure by multiple annotators | 8 songs | yes |
| ✅ | Automatic Practice Logging | piano practice | 620 segments | yes |
| ✅ | artist20 | 20 artists | 1413 songs | no |
| ✅ | ASAP | aligned MIDI/audio performances and MIDI/XML scores, beats, downbeats, time signatures, key signatures | 1068 MIDI performances, 520 audio performances, 222 scores | yes (see MAESTRO) |
| ✅ | ATEPP | symbolic music MIDI, musicXML, classification tasks, expressive piano performances | 1742 performances (~1000 hours) by 49 pianists and covers 1580 movements by 25 composers | |
| ✅ | AudioSet | 632 event classes | 2084320 clips (10s) | no |
| ☠ | bach10 | aligned multitrack MIDI | 10 chorales | yes |
| ✅ | BAF | audio fingerprinting, music monitoring in broadcast | 2,000 tracks from Epidemic Sound and 3,425 TV audio recordings (60s) | yes |
| ✅ | ballroom | 8 genres, tempo, beats, bars / downbeats | 698 excerpts (30s) | yes |
| ✅ | beatboxset1 | percussion annotation | 14 clips | yes |
| ✅ | BPS-FH Beethoven Piano Sonata with Function Harmony | functional annotation | 32 sonatas | no |
| ✅ | C224a | 14 genres | 224 artists | no |
| ✅ | C3ka | 18 genres | 3000 artists | no |
| ✅ | C49ka-C111ka | genres | 48800/110588 artists | no |
| ✅ | CAL10k | tags | 10870 songs | no |
| ✅ | CAL500 | tags | 502 songs | yes |
| ✅ | CarnaticRhythm | sama, beats | 176 pieces | on request |
| ✅ | Chordify Annotator Subjectivity Dataset | chords by 4 annotators | 50 songs | no |
| ✅ | CBFdataset | 4 playing techniques (Chinese Bamboo Flute) | 10 performers | yes |
| ✅ | CCMixter | vocal track, background track | 50 mixes | yes |
| ✅ | ChoCo, the Chord Corpus | chords, keys, knowledge graph | 20K+ songs/pieces | no |
| ✅ | Chopin22 | aligned MIDI | 44 recordings | yes |
| ✅ | ChoralMusicSeparation | JSB chorales, separation | 8.2 hours of choral music from the JSB Chorales Dataset | yes |
| ✅ | Clotho | 5 descriptive captions | 4981 snippets | yes |
| ✅ | CMMSD | note/rest/transition, onsets, vibrato | 36 excerpts | no |
| ✅ | Coidach | 55 genres | 26420 songs | no |
| ✅ | corpusCOFLA | editorial, predominant melody | 1800 flamenco recordings | no |
| ✅ | covers80 | cover songs | 80 song pairs | yes |
| ✅ | Cross-Composer | 11 composers, piece, key, era, instrumentation | 1100 chromagrams and chord labels | no |
| ✅ | Cross-Era | composer, piece, key, era, instrumentation | 2000 chromagrams and chord labels | no |
| ✅ | Choral Singing Dataset | f0, MIDI | 48 recordings | yes |
| ✅ | Da-TACOS | cover songs | 25000 songs | no |
| ✅ | dadaGP dataset | GuitarPro tablatures, Python encoder/decoder tool for converting to and from a text token format, symbolic music generation | 26,181 songs in GuitarPro and token formats | no |
| ✅ | Dataset of synchronised Audio, LyrIcs and vocal notes | aligned notes and lyrics | 5358 songs | no |
| ✅ | DAMP | karaoke performances, aligned lyrics, pronunciation assessment | 34000 monophonic recordings | yes |
| ✅ | Dagstuhl ChoirSet | beats, time-aligned scores, F0 | 81 takes | yes |
| ✅ | DEAM - The MediaEval Database for Emotional Analysis of Music | valence & arousal | 1802 excerpts | yes |
| ✅ | DEAPDataset | valence & arousal, dominance, physiological data | 120 music video excerpts | no |
| ✅ | DESED | 10 audio event classes | approx. 20k 10s clips (unlabeled, weakly/strongly labeled) | yes |
| ✅ | DREANSS | onset times, percussion instruments | 18 excerpts | yes |
| ✅ | DrumPt | 4 playing techniques | approx. 2000 annotations | yes (see ENST) |
| ✅ | DSD100 | multitrack recordings, stems for vocals, drums, bass and accompaniment | 100 songs | yes |
| ✅ | EMO-Soundscapes | arousal & valence | 1213 soundscape recordings | yes |
| ✅ | emoMusic | arousal & valence | 744 excerpts (45s) | yes |
| ✅ | Emotify | induced emotion | 400 excerpts | yes |
| ✅ | EMusic | arousal & valence | 100 excerpts (experimental music) | yes |
| ✅ | EnsembleSet | source separation, synthesized with Spitfire BBC Symphony Orchestra Professional Library, 20 different mix/microphone configurations | 80 tracks (6+ hours) with a range of string, wind, and brass instruments arranged as chamber ensembles | yes |
| ✅ | ENST-Drums | onset times, perc. instruments, playing technique | 318 segments | yes |
| ✅ | Erkomaishvili Dataset | sheet music, structure, F0, note onsets | 118 tracks | yes |
| ✅ | Expanded Groove MIDI Dataset | drummer/session id, drum timing, kit name | 45537 midi/audio pairs | rendered |
| ✅ | Extendedballroom | 9 genres, tempo | 4000 excerpts (30s) | downloadable |
| ✅ | ExtraSensory | 51 context labels | 300000 sensor recordings from 60 users | yes |
| ☠ | ffuhrmann | 11 predom. instr. | 6951 excerpts from 220 songs | yes/no |
| ✅ | FifteenSongs | Grateful Dead | 15 Grateful Dead songs with lead sheets | yes |
| ✅ | Flamenco database | editorial, biographical, musicological information on flamenco, 1102 artists, 74 palos, 2860 albums | 13311 tracks | no |
| ✅ | FMA-full | 161 genres | 106574 songs | yes |
| ✅ | FMA-large | 161 genres | 106574 excerpts (30s) | yes |
| ✅ | FMA-medium | 16 genres | 25000 excerpts (30s) | yes |
| ✅ | FMA-small | 8 genres | 8000 excerpts (30s) | yes |
| ✅ | Freesound-Loop-Dataset | tempo, key, instrumentation, genre | 3000 annotated loops, 9455 loops total | yes |
| ✅ | FSD-Kaggle2019 | 80 tags | 29000 clips | yes |
| ✅ | Fugue Analyses | fugue structure, patterns, cadences | 36 fugues (Bach & Shostakovich) | no |
| ✅ | GiantStepsKey | key | 604 files | no |
| ✅ | GiantStepsTempo | tempo | 664 files | no |
| ✅ | GiantStepsTempo:alternate | tempo | 664 files | no |
| ✅ | Greek Music Dataset | genre, valence, arousal | 1400 songs | downloadable |
| ☠ | Gracenote Music Identification 2014 | timestamp, country | 110M music ID matches | no |
| ✅ | GoodSounds | 12 instruments, pitch, sound quality | 8750 notes | yes |
| ☠ | GPT | 7 guitar playing techniques | 6580 clips | yes |
| ✅ | Groove MIDI Dataset | drummer/session id, drum timing | 1150 MIDI recordings | rendered |
| ✅ | Guitar Solo Dataset | start/stop of guitar solos | 60 songs | no |
| ☠ | GTZAN | 10 genres, tempo labels, key labels (lerch), key labels (li), beat/downbeat, metrical levels | 1000 excerpts (30s) | yes |
| ✅ | GuitarSet | midi, pitch, beat, chords | 360 guitar excerpts (30s) with hexaphonic audio | yes |
| ✅ | GZ_IsoTech | Guzheng | 2824 | yes |
| ☠ | Hainsworth | tempo | 245 excerpts (60s) | yes |
| ✅ | HarmonixSet | beats, downbeats, structure | 912 pop songs | no |
| ✅ | HED | emotion annotations, harmonisation and tempo arrangements | 4000 tracks with emotion annotations | yes |
| ☠ | HHDS | multitrack, style, tempo | 18 songs | yes |
| ☠ | holzapfel:onset | onset times | 78 excerpts | yes |
| ✅ | homburg | 9 genres | 1889 excerpts (10s) | yes |
| ✅ | HookTheory | aligned melody and harmony annotations | 50 hours of aligned melody and harmony annotations | yes |
| ✅ | IADS | valence & arousal, dominance | 111 sound snippets | yes |
| ☠ | IDMT Multitrack | multitrack, style | 12 songs | yes |
| ✅ | IDMT-PIANO-MM | classical and jazz piano recordings | 432 piano recordings (around four hours) | yes |
| ✅ | IDMT-SMT-Audio-Effects | effects on bass and guitar notes | 55044 recordings | yes |
| ✅ | IDMT-SMT-Bass | bass performance styles | 4300 excerpts | yes |
| ✅ | IDMT-SMT-Bass-SINGLE-TRACK | style annotated bass lines | 17 bass lines (?) | yes |
| ✅ | IDMT-SMT-Drum | onset times, perc. instruments | 518 files | yes |
| ✅ | IDMT-SMT-Guitar | 9 guitar playing techniques | 4700+400 note events | yes |
| ☠ | iKala | singing voice tracks, background tracks | 252 excerpts (30s) | yes |
| ☠ | INRIA:EuroVision | structure | 124 songs | no |
| ☠ | INRIA:Quaero | structure | 159 songs | no |
| ✅ | IRMAS | 11 instruments | 2874 excerpts | yes |
| ☠ | ISMIR2004Genre | 6 genres | 729 excerpts (30s) | yes |
| ✅ | ISMIR2004Tempo | tempo | 465 excerpts (20s) | yes |
| ✅ | Jazz Audio-Aligned Harmony Dataset | structure, key, chords, beats | 113 songs | no |
| ✅ | jaCappella corpus, Japanese a cappella vocal ensemble corpus | musical sheet (MusicXML), 10 genres, singer ID | 50 songs (6 voices), audio recordings of each voice part and mixture | yes |
| ☠ | Jamendo-VAD | voice activity | 61+16+16 songs | yes |
| ✅ | JGDB | multitrack, MIDI | randomly generated excerpts | yes |
| ✅ | JKU-ScoFo | audio, MIDI | 16 recordings | yes |
| ✅ | Josquin La Rue Secure Duo Dataset | symbolic scores | 77 duos (Josquin & La Rue) | no |
| ✅ | Jordan:Classical | structure | 15 pieces | yes |
| ✅ | Jordan:Jazz | structure | 15 pieces | yes |
| ✅ | KUGDastgahi | dastgahi music | 213 solo recordings by four professional musicians | audio |
| ✅ | LabROSA:APT | MIDI | 29 piano excerpts | yes |
| ✅ | LabROSA:MIDI | audio, MIDI | 4 songs | yes |
| ✅ | last.fm-1K and last.fm-360K | user listening habits from last.fm | 992 users | no |
| ✅ | LFM-1b | listening habits | 120000 users | no |
| ✅ | Lyrical Influence Networks Dataset | lyrics-based artist and genre graphs | 42802 artists/214 genres | no |
| ☠ | Lakh MIDI Dataset | MIDI, tempo, key | 176581 MIDI files | no |
| ☠ | LMD - Latin | 10 genres | 3160 songs | no |
| ✅ | LocalifyMusicEvents-USA-2019 | music events, socioeconomic indicators | 308051 music events held in 2019 across 1139 US cities | no |
| ✅ | Lyra | Greek traditional and folk music | 1570 songs | yes |
| ✅ | M-DJCUE | cue points | 134 tracks | no |
| ✅ | MAESTRO | audio aligned midi, velocity, sustain | 172 hours of piano | yes |
| ✅ | magnatagatune | similarity, tags | 25863 excerpts (30s) | yes |
| ✅ | MAPS | piano notes/chords/pieces, tempo/key | 238 pieces | yes |
| ✅ | MARD | album reviews | 66566 songs | no |
| ✅ | MARG-AMT | MIDI pitch, onset/offset times | 30 melodies | yes |
| ✅ | MAST | vocal performance assessment | 1018 performances | no |
| ✅ | MAST-Rhythm | rhythm performance assessment | 3721 performances | yes |
| ✅ | McGill Billboard | chords | 740 songs | no |
| ✅ | MDBDrums | onset times, perc. instrument, playing technique | 23 excerpts | yes |
| ✅ | Medley-solos-DB: a cross-collection dataset for musical instrument recognition | 8 instruments | 21572 excerpts | yes |
| ✅ | MedleyDB | multitrack, genre, melody f0, instrument activation | 122 songs | yes |
| ✅ | Melon Playlist Dataset | 148826 playlists, 30 genres, 219 subgenres, 30652 playlist tags | mel-spectrograms for 649091 songs (20-50s segments) | no |
| ☠ | MeloSol | melody, monophonic, symbolic, kern, key | 783 melodies | no |
| ✅ | MER500 | emotion | 500 clips | yes |
| ✅ | MIR-1K | vocal tracks, background tracks | 1000 excerpts | yes |
| ✅ | mirex05Train | predominant pitch | 13 excerpts | yes |
| ✅ | mirex06Train | tempo, beats | 20 excerpts (30s) | yes |
| ✅ | Mid Level Perceptual Music Features | 7 perceptual features | 5000 audio files | yes |
| ✅ | Million Musical Tweets | listening behavior | 1086808 tweets | no |
| ✅ | Modal | onset times | 71 snippets | yes |
| ✅ | MOODetector:Bi-Modal | lyrics, valence & arousal | 133 excerpts | yes |
| ✅ | MOODetector:Multi-Modal | lyrics, MIDI, mood | 903 excerpts (30s) | yes |
| ☠ | moodswings | arousal & valence | 240 excerpts (30s) | no |
| ✅ | Mozart's String Quartets | sonata form structure, cadences | 32 movements | no |
| ✅ | Million Song Dataset | metadata, proprietary features | 1000000 songs | no |
| ✅ | Multimodal Sheet Music Dataset | piano notes/chords/pieces, synthetic audio, aligned MIDI, aligned sheet music images, OMR | 497 pieces | no |
| ✅ | The Meertens Tune Collections | phrases, key, meter | 18000 melodies | partially |
| ✅ | A Multimodal Dataset of Musical Themes for MIR Research | sheet music, symbolic encodings, audio snippets, symbolic-audio alignments, composer, work, recording, and theme characteristics | 2067 Themes | yes |
| ✅ | MTG-Jamendo | tags (genre, instruments, mood) | 55000 tracks | yes |
| ✅ | MTG-Query by Humming | title, artist | 118 queries/481 songs | yes/no |
| ✅ | MusAV | arousal & valence (relative annotations) | 2092 excerpts (30s) | yes |
| ✅ | musdb-XL | source separation | an eXtremely Loud version of the musdb-hq evaluation dataset | yes |
| ✅ | MUSDB18 | multitrack recordings, stems for vocals, drums, bass and accompaniment | 150 songs | yes |
| ✅ | MUSIC4ALL | tags, lyrics | 109,269 excerpts (30s) | on request |
| ✅ | musiclef2012 | tags | 1355 songs | no |
| ✅ | MusicMicro | music listening patterns | 136866 users | no |
| ☠ | MusicNet | pitch, onsets | 330 recordings | implicitly |
| ✅ | Multi-modal Dataset of Music Video | chords / keys (music feature), note density (music feature), loudness (music feature), semantic (video feature), motion (video feature), emotion (video feature), scene offset (video feature) | 748 music videos | on request |
| ✅ | NES-MDB | multi-track MIDI, aligned audio | 5000 songs | on request |
| ☠ | Nine Inch Nails Multitracks | multitrack | 66 songs | yes |
| ✅ | NMED-H - Naturalistic Music EEG Dataset Hindi | EEG | 24 trials x 16 excerpts (4.5min) | no |
| ✅ | Naturalistic Music EEG Dataset – Rhythm Pilot | EEG | 20 trials x 10 excerpts (4.5min) | no |
| ✅ | Naturalistic Music EEG Dataset - Tempo | EEG | 30 trials x 16 excerpts (30s) | no |
| ✅ | NSynth | instrument, pitch | 305979 single notes | yes |
| ☠ | NUS-48E | aligned phonemes | 48 pairs of sung and spoken | yes |
| ✅ | ODB | onset times | 19 excerpts | yes |
| ✅ | Onset_Leveau | onset times | 21 excerpts | yes |
| ✅ | Open Broadcast Media Audio from TV | 6 classes for music presence | 1647 excerpts (60s) | yes |
| ✅ | OpenMIC-2018 | 20 instruments | 20000 excerpts (10s) | yes |
| ✅ | Orchset | predominant pitch | 64 excerpts | yes |
| ✅ | Piano Gestures Dataset | video, intentions, audio | 210 clips | yes |
| ✅ | Phenicx-Anechoic | multi-track audio (orchestral music), aligned MIDI | 4 pieces | yes |
| ✅ | Phonation | pitch, vowel, phonation mode | 900 monophonic snippets | yes |
| ✅ | PlaylistDataset | playlists | 75262 songs/2840553 transitions | no |
| ✅ | QBT-Extended | taps | 3365 queries/51 songs | MIDI |
| ✅ | QMUL:Beatles | structure, key, chords, beats | 181 songs | no |
| ✅ | QMUL:King | structure, key, chords | 14 songs | no |
| ✅ | QMUL:MichaelJackson | structure | 38 songs | no |
| ✅ | QMUL:MixEvaluation | multitrack, mixes | 18 songs/180 mixes | yes |
| ✅ | QMUL:Queen | structure, key, chords | 51/31 songs | no |
| ✅ | QMUL:RSS | structure | 60 songs | no |
| ✅ | QMUL:Zweieck | structure, key, chords, beats | 18 songs | no |
| ✅ | QUASI | multitrack | 11 songs | yes |
| ✅ | RobbieWilliamsAnnotations | chords, keys, beats | 65 songs | no |
| ☠ | RockCorpus | chords, melody, bars | 200 songs | no |
| ✅ | RWC | lyrics, 10 genres, 50 instruments, chords, structure, aligned MIDI | 115 songs/50 classical/100 songs | yes |
| ✅ | SALAMI | structure | 1447 songs | no |
| ✅ | SAMBASET | recording date, escolas, beats | 392 | no |
| ✅ | Sargon | structure | 4 songs | yes |
| ✅ | Semantic Artist Similarity | artist biographies, similarity | 268+2336 artists | no |
| ✅ | Schenker Analyses | MusicXML, Schenker analysis | 41 pieces | no |
| ✅ | SCP - EEG-Recorded Responses to Short Chord Progressions | EEG | 108/648 trials x 12 stimuli (5s) | yes |
| ✅ | Sample detection dataset | start of samples | 80 songs, 80 samples | no |
| ✅ | SEILS | scores in different symbolic formats | 30 madrigals | no |
| ☠ | Seyerlehner:1517-Artists | 19 genres | 3180 songs | yes |
| ☠ | Seyerlehner:Annotated | 19 genres | 190 songs | yes |
| ☠ | Seyerlehner:Pop | tempo | 1105 songs | yes |
| ☠ | Seyerlehner:Unique | 14 genres | 3115 excerpts (30s) | yes |
| ✅ | SHS100K | cover songs | ca. 10,000 songs with 100,000 tracks | no |
| ✅ | SISEC2013 | multitrack, mix | 5 excerpts | yes |
| ✅ | SLAKH | MIDI, synthesized audio (tracks + mix) | 2100 mixes | yes |
| ☠ | SMC:MIREX | tempo, beats | 217 excerpts | yes |
| ✅ | SMD | audio, aligned MIDI | 50 recordings | yes |
| ✅ | SongInterpretationDataset | lyrics | 27,834 songs (30 seconds each, recorded at 44.1 kHz) | yes |
| ☠ | SoundTracks | valence, energy, tension, mood | 360+110 excerpts | yes |
| ✅ | SPAM | structure | 50 songs | no |
| ✅ | Shazam Research Dataset Offsets | in-song query times | 188M queries over 20 songs | no |
| ✅ | Su-AMT | onset times, pitch | 10 excerpts | yes |
| ✅ | SUPRA-RW | piano roll performances | 478 performances | yes |
| ✅ | Schubert Winterreise Dataset (SWD) | lyrics, scores (image, symbolic, MIDI), audio, measures, chords, local keys, global keys, structure | 24 songs, 9 performances | yes |
| ✅ | SymbolicTextureMozartSonatas | symbolic music | 9 movements of Mozart piano sonatas (1164 annotated measures in total) | no |
| ✅ | SymphonyMIDI | MIDI, symphonic | 46187 MIDI scores | no |
| ✅ | Texture in String Quartets | texture | 11 movements | no |
| ✅ | Traditional Flute Dataset | audio, aligned MIDI | 30 excerpts | yes |
| ✅ | ThisIsMyJam | favorite songs, artists | 131k users | no |
| ✅ | TinySOL, an audio dataset of isolated musical notes | instrument, pitch, dynamics, string number (if applicable) | 2913 isolated notes | yes |
| ✅ | TONAS | pitch | 72 single-voiced excerpts | yes |
| ✅ | Track Popularity | popularity rating | 23385 songs | no |
| ☠ | Tunebot | title, artist | 10000 queries/? songs | yes/no |
| ✅ | UIOWA:MIS | single instrument notes | many | yes |
| ☠ | UMA-Piano | piano chords | 275040 recordings | yes |
| ✅ | UnmixDB | DJ mix parameters | 37 playlists | yes |
| ✅ | URBAN-SED | 9 event classes | 10000 recordings | yes |
| ✅ | UrbanSound8k | 10 event classes | 8732 slices | yes |
| ☠ | Multi-modal Music Performance | score-aligned video and audio | 44 recordings | yes |
| ✅ | uspop2002 | tags, genre, chords | 8752 songs | no |
| ✅ | Violin Gestures Dataset | EMG, playing techniques, audio | 960 recordings | yes |
| ✅ | ViolinEtudesf0Estimation | f0 estimation for violin etudes | 27.8 hours of violin performance | yes |
| ✅ | VocalSet | 17 vocal techniques | 3560 recordings | yes |
| ✅ | YM2413-MDB | retro video game symbolic music with emotion annotations (ISMIR 2022) | 669 songs | no |
| ✅ | YousicianUkulele | evaluated notes and chords | 500000 exercises by 1000 users | no |