Dataset Million Song - Rostlab/DM_CS_WS_2016-17 GitHub Wiki
Dataset Million Song
- Proposer: @brosequartz - [email protected] - Bibigul Shektybayeva
- Votes:
- 🙋 @ImkeHelene
- @magiob
- @avradips
- 🙋 @paulafortuna
- @kapoorabhishek24
- 🙋 @vivek-sethia
Summary
The Million Song Dataset contains audio features and metadata for a million popular songs. It is proposed here as the basis for an automatic genre prediction project.
Prediction Goals
Main goal:
- Automatic genre prediction
Other interesting goals
- Classification by genre or mood
- Examination of songs' similarity
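A minimal sketch of how the similarity and genre goals could be approached, assuming each song has already been reduced to a fixed-length numeric feature vector. The function names, the toy vectors, and the genre labels are illustrative assumptions, not part of the dataset. The classifier is a plain 1-nearest-neighbour lookup, chosen only for brevity.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention used here: a zero vector is similar to nothing
    return dot / (norm_a * norm_b)


def predict_genre_nearest(
    query: Sequence[float],
    labeled: List[Tuple[Sequence[float], str]],
) -> str:
    """Return the genre of the labeled song most similar to `query` (1-NN)."""
    if not labeled:
        raise ValueError("need at least one labeled song")
    best_vector, best_genre = max(
        labeled, key=lambda item: cosine_similarity(query, item[0])
    )
    return best_genre


# Toy data (hypothetical): two-dimensional vectors with made-up genre labels.
training = [([1.0, 0.0], "rock"), ([0.0, 1.0], "jazz")]
guess = predict_genre_nearest([0.9, 0.1], training)
```

In practice a stronger model and a proper train/test split would be needed; this only shows the shape of the task.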
Long Description
- Size: 280 GB (a 1.8 GB subset of 10,000 songs is also available)
- Format: HDF5; code is available to convert the files to MATLAB .mat format
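Because the files are HDF5, their layout can be inspected before any fields are extracted. The sketch below assumes the third-party h5py package is installed. `list_datasets` is a hypothetical helper name, and the path in the usage note is a placeholder.

```python
from typing import List, Tuple


def list_datasets(path: str) -> List[Tuple[str, tuple, str]]:
    """Return (name, shape, dtype) for every dataset in an HDF5 file."""
    import h5py  # third-party; imported lazily so this module loads without it

    found: List[Tuple[str, tuple, str]] = []

    def visit(name, obj):
        # visititems calls this for every group and dataset in the file;
        # only datasets carry a shape and dtype worth reporting.
        if isinstance(obj, h5py.Dataset):
            found.append((name, obj.shape, str(obj.dtype)))

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return found
```

Calling `list_datasets("path/to/one_song.h5")` on a file from the 10,000-song subset would show which groups and arrays it contains without loading the data itself.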
The dataset was collected mainly from The Echo Nest API, which provides metadata and audio analysis for music tracks. The dataset does not include audio files, only the derived features. There are 55 features describing the artist, song, and release. The features have different types; some examples:
- Artist 7digitalid, type: int
- Artist latitude, type: float
- Artist musicbrainz.com tags, type: array string
- Artist name, type: string
- Beats confidence, type: array float
- Danceability, type: float
- Duration, type: float
- Loudness, type: float
- Album name, type: string
- Segments pitches, type: 2D array float
- Segments timbre, type: 2D array float
- Song echonest.com id, type: string
- Title, type: string
- Year, type: int
The full list of features, with descriptions, is available at: http://labrosa.ee.columbia.edu/millionsong/pages/field-list.
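The segment-level features (pitches, timbre) are 2D arrays whose number of rows varies from song to song, so a model that expects fixed-length input needs them summarized first. One simple option, sketched here as an assumption rather than a recommended pipeline, is to average each column over all segments. `summarize_segments` is a hypothetical helper, and the toy array below is made up.

```python
from typing import List, Sequence


def summarize_segments(segments: Sequence[Sequence[float]]) -> List[float]:
    """Column-wise mean of an (n_segments x n_dims) array of floats."""
    if not segments:
        raise ValueError("need at least one segment")
    n_dims = len(segments[0])
    if any(len(row) != n_dims for row in segments):
        raise ValueError("all segments must have the same number of dimensions")
    n_segments = len(segments)
    # zip(*segments) transposes rows into columns, one per feature dimension.
    return [sum(column) / n_segments for column in zip(*segments)]


# Toy example (hypothetical values): two segments, two dimensions each.
song_vector = summarize_segments([[1.0, 2.0], [3.0, 4.0]])
```

Averaging discards how the sound changes over time; adding per-column variance or other statistics is a common way to keep more of that information.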
Links / Data / Other
References:
- Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR 2011), 2011.