Spotify Subset

January 10, 2024

The 'Spotify Subset' contains file names from the Spotify Dataset (Tanaka et al., 2022) for classifying language variations in Brazilian Portuguese. The file names were selected by filtering the original dataset's metadata for idiomatic expressions and for names or acronyms of locations.
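A minimal sketch of what such a metadata filter could look like is given below. It is not the procedure used to build this subset: the metadata fields (`file_name`, `description`), the sample rows, and the cue terms in `CUES` are all hypothetical and only illustrate matching location names, acronyms, and idiomatic expressions.

```python
import re

# Hypothetical metadata rows; the real Spotify Dataset metadata has its own schema.
ROWS = [
    {"file_name": "ep_001.ogg", "description": "Podcast gravado em Recife, PE"},
    {"file_name": "ep_002.ogg", "description": "Notícias de tecnologia da semana"},
    {"file_name": "ep_003.ogg", "description": "Uai, sô! Conversa direto de BH"},
]

# Hypothetical cue terms per accent: location names, acronyms, idiomatic expressions.
CUES = {
    "Recife": ["Recife", "PE"],
    "Minas Gerais": ["Minas Gerais", "BH", "uai"],
}


def match_accent(text, cues=CUES):
    """Return the first accent whose cue term appears as a whole word, else None."""
    for accent, terms in cues.items():
        for term in terms:
            # Acronyms are matched case-sensitively; other terms ignore case.
            flags = 0 if term.isupper() else re.IGNORECASE
            if re.search(rf"\b{re.escape(term)}\b", text, flags):
                return accent
    return None


# Keep only the file names whose description matches some accent cue.
selected = {}
for row in ROWS:
    accent = match_accent(row["description"])
    if accent is not None:
        selected[row["file_name"]] = accent

print(selected)
```

Matching on whole words keeps short acronyms from firing inside longer words. A real filter would still need manual review, since a location mentioned in an episode description does not guarantee the speaker's accent.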

Spotify A subset

General Table

| Speakers | Duration | Episodes | Female | Male |
|---|---|---|---|---|
| 92 | ~15 hrs 24 min | 52 | 43 | 38 |

Subset A Information

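| Accent | Speakers | Duration | Female | Male |
|---|---|---|---|---|
| Rio de Janeiro | 5 | 49 min | 2 | 3 |
| Bahia | 4 | 1 hr 27 min | 4 | |
| Mato Grosso do Sul | 4 | 18 min | 3 | 1 |
| Maranhão | 7 | 1 hr 18 min | 2 | 3 |
| Minas Gerais | ~35 | 5 hrs 23 min | ~13 | ~22 |
| Recife | 10 | 3 hrs 45 min | | |
| São Paulo | ~25 | 1 hr 18 min | ~19 | ~7 |
| Rio Grande do Sul | 2 | ~53 min | | 2 |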
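As a quick consistency check, the per-accent rows can be summed and compared with the General Table. The sketch below assumes the rows are read as speaker counts of 5, 4, 4, 7, ~35, 10, ~25 and 2, with durations converted to minutes; the approximate ("~") values are used as given.

```python
# (accent, speakers, minutes) as read from the Subset A table; "~" values are approximate.
ROWS = [
    ("Rio de Janeiro", 5, 49),
    ("Bahia", 4, 87),
    ("Mato Grosso do Sul", 4, 18),
    ("Maranhão", 7, 78),
    ("Minas Gerais", 35, 323),
    ("Recife", 10, 225),
    ("São Paulo", 25, 78),
    ("Rio Grande do Sul", 2, 53),
]

total_speakers = sum(speakers for _, speakers, _ in ROWS)
total_minutes = sum(minutes for _, _, minutes in ROWS)
hours, minutes = divmod(total_minutes, 60)

print(total_speakers, f"{hours}h {minutes}min")  # 92 15h 11min
```

Under this reading the speaker total matches the 92 reported in the General Table, while the summed duration (15 h 11 min) comes out slightly below the reported ~15 hrs 24 min, a gap that may simply reflect the approximate entries.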

Spotify B subset

General Table

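| Accent | Train_speakers | Dev_speakers | Test_speakers | Podcasts | Episodes | Duration | Segments |
|---|---|---|---|---|---|---|---|
| RE | 69 | 23 | 11 | 15 | 57 | ~48.23 | 14,008 |
| SP | 52 | 18 | 15 | 11 | 78 | ~30.88 | 11,906 |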