Dataset Million Song - Rostlab/DM_CS_WS_2016-17 GitHub Wiki

Dataset Million Song

  • Proposer: @brosequartz - Bibigul Shektybayeva
  • Votes:
    1. 🙋 @ImkeHelene
    2. @magiob
    3. @avradips
    4. 🙋 @paulafortuna
    5. @kapoorabhishek24
    6. 🙋 @vivek-sethia

Summary

The Million Song Dataset contains audio features and metadata for a million popular songs. It is proposed here as the basis for an automatic genre prediction project.

Prediction Goals

Main goal:

  • Automatic genre prediction

Other interesting goals

  • Classification by genre or mood
  • Examination of songs' similarity
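One simple way to examine the similarity of two songs is to compare their numeric feature vectors with cosine similarity. The sketch below is only an illustration, not a method prescribed by the dataset: the function name `cosine_similarity` and the toy vectors are assumptions made for the example, and real features (duration, loudness, and so on) would normally be rescaled first so that no single feature dominates.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up, already rescaled feature vectors for two hypothetical songs.
song_a = [0.2, 0.9, 0.4]
song_b = [0.1, 0.8, 0.5]
similarity = cosine_similarity(song_a, song_b)
```

Values close to 1 indicate vectors pointing in nearly the same direction; whether that corresponds to songs that sound similar depends on which features are chosen.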

Long Description

  • Size: 280 GB (a 1.8 GB subset with 10k songs is also available)
  • Format: HDF5; code is available to convert the files into MATLAB mat-files

The dataset was collected mainly through The Echo Nest API, which provides metadata and audio analysis for music tracks. It does not include the audio files themselves, only the features derived from them. There are 55 features describing the artist, song and release. The features have different types; some examples:

  • Artist 7digitalid, type: int
  • Artist latitude, type: float
  • Artist musicbrainz.com tags, type: array string
  • Artist name, type: string
  • Beats confidence, type: array float
  • Danceability, type: float
  • Duration, type: float
  • Loudness, type: float
  • Album name, type: string
  • Segments pitches, type: 2D array float
  • Segments timbre, type: 2D array float
  • Song echonest.com id, type: string
  • Title, type: string
  • Year, type: int

The full list of features, with descriptions, is available at: http://labrosa.ee.columbia.edu/millionsong/pages/field-list
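Because the segment-level features (pitches, timbre) are 2D arrays whose number of rows differs from song to song, a classifier usually needs them collapsed into a fixed-length vector first. The sketch below shows one common option, the per-dimension mean and standard deviation. It assumes the array has already been loaded into a list of rows; the function name `summarize_segments` and the toy data are hypothetical and not part of the dataset's own code.

```python
from statistics import fmean, pstdev


def summarize_segments(segments):
    """Collapse an (n_segments x n_dims) list of rows into one flat vector.

    The result holds the mean of every dimension followed by the
    population standard deviation of every dimension.
    """
    if not segments:
        raise ValueError("segments must contain at least one row")
    columns = list(zip(*segments))  # transpose: one tuple per dimension
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return means + stds


# Toy stand-in for one song's segment array (3 segments, 2 dimensions).
toy_segments = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
features = summarize_segments(toy_segments)
```

The output length depends only on the number of dimensions, not on the number of segments, so songs of different durations map to vectors of the same size.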

Links / Data / Other

References:

  • Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR 2011), 2011.