Dataset Million Song - Rostlab/DM_CS_WS_2016-17 GitHub Wiki

Dataset Million Song

Proposer: @brosequartz - [email protected] - Bibigul Shektybayeva
Votes:
1. 🙋 @ImkeHelene
2. @magiob
3. @avradips
4. 🙋 @paulafortuna
5. @kapoorabhishek24
6. 🙋 @vivek-sethia

Summary

The Million Song Dataset contains audio features and metadata for a million popular songs. It is proposed here as the basis for an automatic genre prediction project.

Prediction Goals

Main goal:

Automatic genre prediction

Other interesting goals

Classification by genre or mood
Examination of songs' similarity
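
One possible way to examine song similarity is to compare per-song feature vectors with cosine similarity. The sketch below assumes each song has already been reduced to a numeric vector of equal length; the function name `cosine_similarity` and the toy vectors are illustrative only and are not taken from the dataset.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up feature vectors for three hypothetical songs.
song_a = [1.0, 2.0, 3.0]
song_b = [2.0, 4.0, 6.0]   # same direction as song_a
song_c = [3.0, 0.0, -1.0]  # orthogonal to song_a

sim_ab = cosine_similarity(song_a, song_b)
sim_ac = cosine_similarity(song_a, song_c)
print(sim_ab, sim_ac)
```

Cosine similarity ignores vector magnitude, so features on very different scales (for example duration versus loudness) would likely need normalising first.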

Long Description

Size: 280 GB (a 1.8 GB subset with 10k songs is also available)
Format: HDF5 (code is available to convert the files to MATLAB .mat format)

The dataset was collected mainly from The Echo Nest API, which provides metadata and audio analysis for music tracks. The dataset does not include audio files, only the derived features. There are 55 features describing the artist, song and release. The features are of different types. Some examples:

Artist 7digitalid, type: int
Artist latitude, type: float
Artist musicbrainz.com tags, type: array string
Artist name, type: string
Beats confidence, type: array float
Danceability, type: float
Duration, type: float
Loudness, type: float
Album name, type: string
Segments pitches, type: 2D array float
Segments timbre, type: 2D array float
Song echonest.com id, type: string
Title, type: string
Year, type: int
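
Fields such as segments timbre and segments pitches are 2D arrays whose number of rows varies from song to song, so a genre classifier would typically need them reduced to a fixed-length vector first. The following is a minimal sketch of one such reduction (per-dimension mean and standard deviation), assuming the array is available as a list of equal-length rows, one row per segment. The function name `summarize_segments` and the toy values are hypothetical.

```python
from statistics import fmean, pstdev


def summarize_segments(segments):
    """Collapse an (n_segments x n_dims) list of rows into a fixed-length vector.

    The result holds the per-dimension means followed by the per-dimension
    population standard deviations, so its length is 2 * n_dims.
    """
    if not segments:
        raise ValueError("need at least one segment")
    n_dims = len(segments[0])
    if any(len(row) != n_dims for row in segments):
        raise ValueError("all segments must have the same number of dimensions")
    columns = list(zip(*segments))  # transpose rows into per-dimension columns
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return means + stds


# Toy input: 3 segments with 2 dimensions each (real arrays have more of both).
toy_segments = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
features = summarize_segments(toy_segments)
print(features)
```

Songs of any length then map to vectors of the same size, which can be concatenated with scalar fields such as duration or loudness.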

The full list and description of the features are available at: http://labrosa.ee.columbia.edu/millionsong/pages/field-list

Links / Data / Other

http://labrosa.ee.columbia.edu/millionsong/

References:

Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR 2011), 2011.