Data - mbabbott/cse517a_mbabbott-ccarlos_application_project GitHub Wiki
Dataset
Source
The dataset is a compilation of MIDI sequences of pieces in the key of G major or D major. Most of the sequences were taken from works by various composers on musescore.com; the rest are pieces by J.S. Bach, downloaded from http://jsbach.net/midi/. We chose G major and D major because a large portion of the music available to us in MIDI form is in one of those two keys.
We treated each piece as a bag-of-words collection of musical notes: each note in a MIDI sequence is a single 'word', and the number of times that note appears is its frequency.

We stored the data in .csv files in which each column represents one piece. Each of the first 12 rows corresponds to a musical note, starting at C and ascending by one semitone per row up to B. The value in a cell is the percentage of the piece's notes that are that note. The 13th row holds the actual key of the piece: 1 for G major and 0 for D major. We downloaded all the pieces manually, sorting them by key, and calculated the note ratios with our Python function midi2bagofratio.
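The code of midi2bagofratio is not shown on this page, so the following is only a minimal sketch of the representation described above, not the project's implementation. It assumes the MIDI note numbers (0-127) have already been extracted from each file (for example from note-on events by a MIDI library), maps each number to its pitch class with `number % 12` (MIDI note 60 is middle C, so 0 = C), and writes the 13-row, one-column-per-piece layout with no header row. The helpers `note_ratios` and `write_dataset` are hypothetical names.

```python
import csv
import io
from collections import Counter

# Row order of the first 12 rows: C upward by one semitone to B.
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Label encoding of the 13th row: 1 = G major, 0 = D major.
KEY_LABELS = {"G": 1, "D": 0}


def note_ratios(midi_notes):
    """Hypothetical helper: percentage of a piece's notes in each pitch class.

    Every note occurrence counts once (duration is ignored), matching the
    bag-of-words view where a note's frequency is how often it appears.
    """
    counts = Counter(n % 12 for n in midi_notes)  # pitch class 0 = C
    total = sum(counts.values())
    if total == 0:
        raise ValueError("piece contains no notes")
    return [100.0 * counts[pc] / total for pc in range(12)]


def write_dataset(pieces, out):
    """Hypothetical helper: write one column per piece to a CSV stream.

    pieces: list of (midi_notes, key) pairs, where key is "G" or "D".
    Each column holds 12 pitch-class percentages followed by the key label.
    """
    columns = [note_ratios(notes) + [KEY_LABELS[key]] for notes, key in pieces]
    writer = csv.writer(out)
    # Transpose so each CSV row is one feature and each column is one piece.
    for row in zip(*columns):
        writer.writerow(row)


if __name__ == "__main__":
    d_major_scale = [62, 64, 66, 67, 69, 71, 73, 74]  # D4 up to D5
    buf = io.StringIO()
    write_dataset([(d_major_scale, "D")], buf)
    print(buf.getvalue())
```

For the one-octave D major scale in the example, D appears twice out of eight notes (25%), and each of the other six scale notes gets 12.5%.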
Statistics
197 pieces in our dataset are in the key of D major:
C represents 1.75% of notes in D major pieces
C# represents 9.43% of notes in D major pieces
D represents 17.9% of notes in D major pieces
D# represents 1.16% of notes in D major pieces
E represents 13.1% of notes in D major pieces
F represents 1.27% of notes in D major pieces
F# represents 13.7% of notes in D major pieces
G represents 9.77% of notes in D major pieces
G# represents 2.35% of notes in D major pieces
A represents 17.9% of notes in D major pieces
A# represents 1.09% of notes in D major pieces
B represents 10.6% of notes in D major pieces
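This page does not say whether these per-key figures are averages of the per-piece percentages or ratios pooled over every note of a key. A minimal sketch of the first interpretation, assuming the CSV layout described under Source (no header row, 12 pitch-class rows, then the label row); `mean_ratios_by_key` is a hypothetical helper, not part of the project's code:

```python
import csv
import io

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def mean_ratios_by_key(csv_text):
    """Average each pitch-class percentage over all pieces sharing a key label.

    Assumes no header row: rows 0-11 hold pitch-class percentages (C..B),
    row 12 holds the label (1 = G major, 0 = D major), one column per piece.
    Returns {label: {pitch_class: mean_percentage}}.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    totals = {}  # label -> running sum for each of the 12 pitch classes
    counts = {}  # label -> number of pieces with that label
    for column in zip(*rows):  # transpose: one tuple per piece
        label = int(float(column[12]))
        acc = totals.setdefault(label, [0.0] * 12)
        for i, value in enumerate(column[:12]):
            acc[i] += float(value)
        counts[label] = counts.get(label, 0) + 1
    return {
        label: {pc: acc[i] / counts[label] for i, pc in enumerate(PITCH_CLASSES)}
        for label, acc in totals.items()
    }
```

The pooled interpretation would instead sum raw note counts per key before dividing, which requires the counts themselves rather than the stored percentages.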
143 pieces in our dataset are in the key of G major:
C represents 9.52% of notes in G major pieces
C# represents 3.32% of notes in G major pieces
D represents 15.9% of notes in G major pieces
D# represents 2.21% of notes in G major pieces
E represents 11.3% of notes in G major pieces
F represents 1.95% of notes in G major pieces
F# represents 10.3% of notes in G major pieces
G represents 16.7% of notes in G major pieces
G# represents 1.49% of notes in G major pieces
A represents 12.9% of notes in G major pieces
A# represents 1.48% of notes in G major pieces
B represents 12.8% of notes in G major pieces
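As a quick check on the two lists, the sketch below subtracts the D major percentages from the G major ones and ranks the pitch classes by the size of the gap. It uses the rounded figures reported above, so the differences are approximate. The largest gaps fall on C (+7.77), G (+6.93) and C# (-6.11): C and C# are the one pitch class where the two scales differ (G major contains C natural, D major contains C#), and G is the tonic of G major. The helpers `differences` and `ranked` are illustrative names only.

```python
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Percentages as reported in the lists above, in C..B order.
D_MAJOR = [1.75, 9.43, 17.9, 1.16, 13.1, 1.27, 13.7, 9.77, 2.35, 17.9, 1.09, 10.6]
G_MAJOR = [9.52, 3.32, 15.9, 2.21, 11.3, 1.95, 10.3, 16.7, 1.49, 12.9, 1.48, 12.8]


def differences(g_major, d_major):
    """G major minus D major, per pitch class (positive = more common in G major)."""
    return {pc: round(g - d, 2) for pc, g, d in zip(PITCH_CLASSES, g_major, d_major)}


def ranked(diffs):
    """Pitch classes sorted by the absolute size of the difference, largest first."""
    return sorted(diffs.items(), key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    for pc, diff in ranked(differences(G_MAJOR, D_MAJOR)):
        print(f"{pc:>2}: {diff:+.2f}")
```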