Project Status - DDMAL/linkedmusic-datalake GitHub Wiki
Completed Work
A visualization of the current complete LinkedMusic data lake ontology can be found here.
As of 20 August 2025, we have finished ingesting the following databases into the LinkedMusic data lake, for a total of 383,809,033 RDF triples:
Note: As our workflow becomes more streamlined, these databases would benefit from additional passes through the full ingestion pipeline.
- MusicBrainz - an open music encyclopedia that provides extensive music metadata and serves as a universal reference for music identification
  - 355,651,218 RDF triples
  - visualization
  - documentation
- DIAMM - an archive of digital images of European medieval manuscripts
  - 483,631 RDF triples
  - visualization
  - documentation
- The Session - a community website dedicated to Irish traditional music
  - 1,052,162 RDF triples
  - visualization
  - documentation
- The Global Jukebox - a website focusing on traditional folk, indigenous, and popular songs from around the world
  - 110,044 RDF triples
  - visualization
  - documentation
- Dig That Lick 1000 - a project that extracts and analyses solos from jazz performances
  - 13,756 RDF triples
  - visualization
  - documentation
- CantusDB - a repository of Latin chants found in medieval manuscripts and early printed books
  - 3,570,624 RDF triples
  - visualization
  - documentation
- RISM - an international collaborative database that catalogues historical musical sources
  - 22,927,598 RDF triples
  - visualization
  - documentation
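As a quick sanity check, the per-dataset counts listed above can be summed to confirm the stated total of 383,809,033 RDF triples. The following is a minimal Python sketch; the names `triple_counts` and `share` are illustrative only and are not part of the LinkedMusic codebase.

```python
# Per-dataset RDF triple counts, copied from the list above (as of 20 August 2025).
# `triple_counts` and `share` are hypothetical names used only for this illustration.
triple_counts = {
    "MusicBrainz": 355_651_218,
    "DIAMM": 483_631,
    "The Session": 1_052_162,
    "The Global Jukebox": 110_044,
    "Dig That Lick 1000": 13_756,
    "CantusDB": 3_570_624,
    "RISM": 22_927_598,
}

# Sum of all ingested datasets; this should match the total stated on this page.
total = sum(triple_counts.values())


def share(name: str) -> float:
    """Return the percentage of the data lake contributed by one dataset."""
    return 100 * triple_counts[name] / total


print(f"Total: {total:,} RDF triples")
# List datasets from largest to smallest with their share of the total.
for name, count in sorted(triple_counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<20} {count:>13,}  {share(name):6.2f}%")
```

Running the sketch shows that MusicBrainz accounts for the large majority of the triples, which is worth keeping in mind when estimating how long a further pass of the ingestion pipeline would take.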
Datasets in progress
- SIMSSA Database - a discovery tool for symbolic music files (MEI, Kern, MusicXML, MIDI)
- Cantus Index - a catalogue of liturgical chant texts and melodies
- AcousticBrainz - a collection of acoustic information extracted from music recordings, gathered between 2015 and 2022, covering spectral data, genres, moods, keys, and scales
Datasets to ingest in the near future
- Weimar Jazz Database
- CritiqueBrainz
- ListenBrainz
- Cover Art Archive
- Digital Analysis of Chant Transmission
- Printed Sacred Music Database
- ESEA (East-and-Southeast-Asian) & Chinese (Traditional) Music Instrument - located in the store
Additional tasks
- Develop the NLQ2SPARQL tool (SESEMMI)
  - NLQ2SPARQL Context contains the context currently used when testing LLM-based SPARQL query generation
  - Refer to Sample LinkedMusic Queries for queries across the four benchmark challenges we use in testing
  - Refer to NLQ2SPARQL and NLQ2SPARQL Q&A (Unfinished) for previous work
- Upload new entities and properties to Wikidata
  - Refer to Wikidata Uploading (Feast Day Project) for our current approach to uploading to Wikidata, starting with Feasts
  - Refer to Wikidata: Things we should add for other categories of items that would be useful for our purposes if they were on Wikidata
  - Note: Dataset-specific documentation may include a partial list of specific items missing from Wikidata.
- Develop the front end for LinkedMusic