Database - kristinedomingo/cs373-idb GitHub Wiki
Database
Scraping the Data
Since there exist hundreds of thousands of possible data points (albums, artists, and tracks), it would have been impractical to scrape all of the data that Spotify's Web API can return. Instead, SweetMusic's database contains a fixed number of rows per table.
Since each table has a one-to-many or many-to-many relationship with the others, each row of a table links to one or more rows of one or more tables. Each Artist has a "top Track" and a "most recently released Album". Each Album has multiple Tracks, and each of those Tracks has one or more Artists, which in turn each have a "top Track" and a "most recently released Album", and so on. This posed a problem: when first running scraper.py, the script would take so long to run that it was more practical to just kill the process. So, the scraper simply stops scraping data once it hits a certain number of Artists/Albums/Tracks.
To actually scrape the data, scraper.py is passed a few Spotify Artist unique IDs, all of which are added to a queue of Artists. As the scraper works through the Artists queue, each Artist's "top Track" ID and "most recent Album" ID are added to separate queues. After the Artists queue is empty, the scraper goes through both of those queues. It currently cuts off at 350 Tracks, 250 Albums, and 200 Artists.
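The queue-draining loop described above can be sketched as follows. This is a minimal illustration, not the project's actual scraper.py: the `FAKE_API` dict stands in for real Spotify Web API calls, and its field names (`top_track`, `latest_album`) are assumptions.

```python
from collections import deque

# Cut-off limits described in the wiki.
MAX_ARTISTS, MAX_ALBUMS, MAX_TRACKS = 200, 250, 350

# Hypothetical stand-in for Spotify API responses, keyed by artist ID.
FAKE_API = {
    "artist1": {"top_track": "track1", "latest_album": "album1"},
    "artist2": {"top_track": "track2", "latest_album": "album2"},
}

def scrape(seed_artist_ids):
    """Drain the Artists queue first, queueing each artist's top track and
    most recent album, then drain those two queues, stopping at the caps."""
    artist_q = deque(seed_artist_ids)
    track_q, album_q = deque(), deque()
    artists, albums, tracks = [], [], []

    while artist_q and len(artists) < MAX_ARTISTS:
        artist_id = artist_q.popleft()
        data = FAKE_API.get(artist_id)   # real code would call the Web API here
        if data is None:
            continue
        artists.append(artist_id)
        track_q.append(data["top_track"])
        album_q.append(data["latest_album"])

    while album_q and len(albums) < MAX_ALBUMS:
        albums.append(album_q.popleft())
    while track_q and len(tracks) < MAX_TRACKS:
        tracks.append(track_q.popleft())

    return artists, albums, tracks
```

In the real scraper, each dequeued Album would also enqueue its Tracks, and each Track its Artists, which is exactly why the caps are needed to keep the traversal from expanding indefinitely.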
Initializing
Flask-SQLAlchemy is used, and the MySQL database is named mydb. To populate it, the data had to be scraped from the Spotify API and reshaped into the proper form. scraper.py gathers three JSON collections: album_json, artist_json, and tracks_json. The initialization functions create_albums(), create_tracks(), and create_artist() each take the corresponding JSON (Album, Track, or Artist) and add a row to the matching table with its specific attributes. The scraper runs with a default limit of 350 tracks, so any missing tracks have to be added by pulling them from albums_json and artist_json using create_tracks_artist() and create_tracks_album(). All of this happens in initializing.py when create_sweetmusic_db() is called.
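A rough sketch of what one of the create_*() functions does, one table row per JSON object. The real project uses Flask-SQLAlchemy models against MySQL; this illustration uses the standard-library sqlite3 module instead, and the album fields shown (`id`, `name`, `release_date`) are assumptions about the JSON shape.

```python
import sqlite3

# Hypothetical album_json entry; the real data comes from scraper.py.
album_json = [
    {"id": "alb1", "name": "Album One", "release_date": "2016-01-01"},
    {"id": "alb2", "name": "Album Two", "release_date": "2016-03-15"},
]

def create_albums(conn, albums):
    """Add one row to the Album table per album JSON object."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS album "
        "(id TEXT PRIMARY KEY, name TEXT, release_date TEXT)"
    )
    for a in albums:
        conn.execute(
            "INSERT INTO album VALUES (?, ?, ?)",
            (a["id"], a["name"], a["release_date"]),
        )

conn = sqlite3.connect(":memory:")
create_albums(conn, album_json)
```

create_tracks() and create_artist() would follow the same pattern against their own tables, and create_sweetmusic_db() would simply call the three of them in order.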
## Relationships
There is a one-to-many relationship in the database between the Album and Track tables. To relate them, a foreign key album_id was created inside the Track table. (Note: the Album table has to be filled in first in order to put a foreign key inside the Track table.) There are two many-to-many relationships. The first is between the Artist and Album tables; to relate them, an association table named artists
had to be created, which connects an Artist id to an Album id. The second many-to-many relationship is between Artist and Track, via an association table named artists2
which connects an Artist id to the respective Track id. (Note: the Artist table has to be filled with data first in order to relate Artists with Tracks or Albums.)
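The schema described above can be sketched in SQL. Again, the real project defines this through Flask-SQLAlchemy models on MySQL; this is a minimal sqlite3 illustration, and any columns beyond the ids and foreign keys are assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE album  (id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE artist (id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE track  (
    id       TEXT PRIMARY KEY,
    name     TEXT,
    album_id TEXT REFERENCES album(id)   -- one-to-many: Album -> Track
);
-- many-to-many: Artist <-> Album
CREATE TABLE artists  (artist_id TEXT REFERENCES artist(id),
                       album_id  TEXT REFERENCES album(id));
-- many-to-many: Artist <-> Track
CREATE TABLE artists2 (artist_id TEXT REFERENCES artist(id),
                       track_id  TEXT REFERENCES track(id));
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```

The fill-in order noted above follows directly from the foreign keys: album rows must exist before track rows that reference them, and artist rows must exist before rows can be inserted into artists or artists2.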