RDF Conversion Guidelines - DDMAL/linkedmusic-datalake GitHub Wiki
For database-specific documentation, see the documentation folder in the repo.
General guidelines
- Subjects are always URIs from the database, never WD URIs or strings
- No blank nodes/statements/point-in-time structures
- No qualifiers
- If a property points to another entity in the dataset, always point to that entity's URI, never to its Wikidata QID
- If a property is backwards, that's fine so long as the subject is a URI
- For data that has multiple values in multiple languages, specify the language when possible
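The subject rule above can be checked mechanically before triples are written out. A minimal sketch, assuming terms are written in N-Triples style (IRIs in angle brackets, blank nodes as `_:label`, literals in quotes) and assuming a hypothetical database base URI `DB_BASE`; the function name is also illustrative.

```python
# Hypothetical base URI for the source database; replace with the real one.
DB_BASE = "https://example.org/mydb/"
WIKIDATA_PREFIXES = ("http://www.wikidata.org/", "https://www.wikidata.org/")


def is_valid_subject(term: str) -> bool:
    """Return True if an N-Triples subject term is an IRI from the database.

    Rejects blank nodes (_:b0), literals ("..."), and Wikidata IRIs.
    """
    if not (term.startswith("<") and term.endswith(">")):
        return False  # blank node or literal, not an IRI
    iri = term[1:-1]
    if iri.startswith(WIKIDATA_PREFIXES):
        return False  # Wikidata URIs never appear as subjects
    return iri.startswith(DB_BASE)
```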
Methods to find properties
- Use P2888 (exact match) when the entity itself is reconciled
- Use rdfs:label for entity name/label
- Verify a candidate property with a SPARQL query on entities that already have that relation
- An LLM can help to double-check results
- Use WD search to find properties
- For large datasets, use the search API
- LLMs can be useful for rewording property names
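The search and verification steps above can be scripted. A minimal sketch that only builds the request strings (fetching is left out): a `wbsearchentities` URL restricted to properties, and a SPARQL query that lists entities already using a candidate property. It assumes the query is run on the Wikidata Query Service, where the `wdt:`, `wikibase:` and `bd:` prefixes are predeclared; the function names are hypothetical.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def property_search_url(term: str, language: str = "en", limit: int = 10) -> str:
    """Build a wbsearchentities URL that searches Wikidata properties by label."""
    params = {
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "type": "property",  # search P-IDs rather than Q-IDs
        "limit": limit,
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


def usage_sample_query(pid: str, limit: int = 20) -> str:
    """Build a SPARQL query listing entities that already use property `pid`."""
    return (
        "SELECT ?item ?itemLabel ?value WHERE {\n"
        f"  ?item wdt:{pid} ?value .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}\n"
        f"LIMIT {limit}"
    )
```

Reading the sample results (or handing them to an LLM) is then a quick way to confirm that the property is used the way the local data intends.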
Determining which property to use
- Not everything is a WD property
- Direction matters!
- Use the most precise/specific property unless it is rarely used
- Respect domain/range
- For dates, coordinates, etc., use the same data types as WD
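For reference, Wikidata's direct (`wdt:`) statements store dates as `xsd:dateTime` literals and coordinates as GeoSPARQL `wktLiteral` points, with longitude written before latitude. A minimal sketch of producing matching N-Triples literals, assuming day-precision dates (Wikidata's precision metadata is not reproduced); the helper names are hypothetical.

```python
from datetime import date

XSD = "http://www.w3.org/2001/XMLSchema#"
GEO = "http://www.opengis.net/ont/geosparql#"


def date_literal(d: date) -> str:
    """Format a date as an xsd:dateTime literal at midnight UTC, as wdt: values are."""
    return f'"{d.isoformat()}T00:00:00Z"^^<{XSD}dateTime>'


def point_literal(lat: float, lon: float) -> str:
    """Format coordinates as a WKT point literal; WKT puts longitude first."""
    return f'"Point({lon} {lat})"^^<{GEO}wktLiteral>'
```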
Step-by-step approach for RDF conversion
1. Understanding the Schema
When reconciling data from CSV files, we don't get a good view of the data schema. As such, before beginning RDF conversion, we need to understand the schema, notably paying attention to:
- Different types of relationships (many-to-many, one-to-many, one-to-one).
- Redundant or duplicated properties across the files.
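Cardinality between two ID columns can be inferred from the data itself. A minimal sketch, assuming both IDs appear in the same CSV table (for example a join table); the column names `album_id` and `track_id` are hypothetical. Because it only inspects the rows present, a "one-to-one" result on a small sample may simply mean that no counterexample has appeared yet.

```python
import csv
import io
from collections import defaultdict


def cardinality(rows, left: str, right: str) -> str:
    """Classify the relationship between two ID columns of a CSV table."""
    left_to_right = defaultdict(set)
    right_to_left = defaultdict(set)
    for row in rows:
        a, b = row[left], row[right]
        if not a or not b:
            continue  # skip rows where either side is missing
        left_to_right[a].add(b)
        right_to_left[b].add(a)
    many_right = any(len(v) > 1 for v in left_to_right.values())
    many_left = any(len(v) > 1 for v in right_to_left.values())
    if many_left and many_right:
        return "many-to-many"
    if many_right:
        return "one-to-many"
    if many_left:
        return "many-to-one"
    return "one-to-one"


sample = io.StringIO("album_id,track_id\nA1,T1\nA1,T2\nA2,T3\n")
rows = list(csv.DictReader(sample))
print(cardinality(rows, "album_id", "track_id"))  # one-to-many
```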
2. Detecting Relationship Attributes
Some columns in a CSV file do not describe a single entity, but rather describe the relationship between two or more entities. These are relationship attributes.
As an example, a track can appear on an album at a specific position, and that position would be the relationship attribute.
These relationship attributes might need to be handled in a particular fashion when converting to RDF.
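One heuristic for spotting relationship attributes is functional dependency: a column that is fully determined by the pair of entity IDs, but not by either ID alone, describes the relationship rather than either entity. A minimal sketch using the track/album example, with hypothetical column names; on small samples a dependency can hold by coincidence, so treat the result as a hint for manual review.

```python
from collections import defaultdict


def determined_by(rows, key_cols, col) -> bool:
    """True if every combination of key_cols maps to a single value of col."""
    seen = defaultdict(set)
    for row in rows:
        seen[tuple(row[k] for k in key_cols)].add(row[col])
    return all(len(values) == 1 for values in seen.values())


def is_relationship_attribute(rows, left: str, right: str, col: str) -> bool:
    """A column that depends on the (left, right) pair but on neither side alone."""
    return (
        determined_by(rows, [left, right], col)
        and not determined_by(rows, [left], col)
        and not determined_by(rows, [right], col)
    )


rows = [
    {"album_id": "A1", "track_id": "T1", "position": "1", "track_title": "Intro"},
    {"album_id": "A1", "track_id": "T2", "position": "2", "track_title": "Song"},
    {"album_id": "A2", "track_id": "T1", "position": "5", "track_title": "Intro"},
]
```

Here `position` depends on the (album, track) pair, while `track_title` depends on the track alone, so only `position` is flagged.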
3. Namespaces and URIs
When converting reconciled Q-IDs to URIs, follow these rules for namespaces:
- wd: (entity prefix) must point to http://www.wikidata.org/entity/
  - ✅ wd:Q482994
  - ❌ wd:https://www.wikidata.org/wiki/Q482994
- wdt: (direct property prefix) must point to http://www.wikidata.org/prop/direct/
  - ✅ wdt:P175
  - ❌ wdt:https://www.wikidata.org/wiki/Property:P175
Do not mix HTTP vs. HTTPS or /entity/ vs. /wiki/ in these prefixes. Inconsistent URIs will not resolve in the Wikidata SPARQL endpoint.
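One way to enforce these rules is to normalize every reconciled value, whether a bare ID or a URL copied from a Wikidata page, to the canonical namespaces before writing triples. A minimal sketch, assuming Q-IDs always map to wd: and P-IDs to wdt: (direct properties, since qualifiers are not used); the function name canonical_uri is hypothetical.

```python
import re

WD = "http://www.wikidata.org/entity/"
WDT = "http://www.wikidata.org/prop/direct/"

# A bare ID, or the ID at the end of a /wiki/, /entity/ or Property: URL.
_ID = re.compile(r"\b(?:Property:)?([QP]\d+)$")


def canonical_uri(value: str) -> str:
    """Map a QID/PID, or a Wikidata page/entity URL, to its canonical RDF URI."""
    match = _ID.search(value.strip())
    if match is None:
        raise ValueError(f"no Wikidata ID found in {value!r}")
    wid = match.group(1)
    return (WD if wid.startswith("Q") else WDT) + wid
```

For example, both Q482994 and https://www.wikidata.org/wiki/Q482994 become http://www.wikidata.org/entity/Q482994.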
4. Reconciling Properties (P Items)
Reconciling properties is the major part of the RDF conversion process. Here are some guidelines to make that easier and more consistent:
- Compare labels and descriptions: Read the local property's definition, then find candidate Wikidata properties and compare their definitions.
- Verify domain and range: Check the domain and range values on both your local property and the Wikidata candidate.
- A valid match is one where the local domain/range is a subset of, or exactly matches, the Wikidata domain/range.
- Object vs. data properties: Check whether your local property and the Wikidata property are object or data properties, and keep this in mind when converting:
- Object properties point to another entity/object
- Data properties store literals like a person's name
- Select or create: If no suitable property exists, decide whether to:
- Add a new property to Wikidata, or
- Use an alternative (currently not done).
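The domain/range and object-vs-data checks can be combined into one comparison. A minimal sketch, assuming each property has been summarized by hand (or from the Wikidata property's subject type and value-type constraints) into a hypothetical PropertyProfile; on the Wikidata side, the property datatype decides object vs. data. An empty set is treated here as "no constraint recorded", which is an assumption, not a Wikidata rule.

```python
from dataclasses import dataclass

# Wikidata datatypes whose values are entities; every other datatype
# (string, time, quantity, url, external-id, ...) stores a literal.
OBJECT_DATATYPES = {"wikibase-item", "wikibase-property"}


def wikidata_is_object(datatype: str) -> bool:
    """True if a Wikidata property datatype makes it an object property."""
    return datatype in OBJECT_DATATYPES


@dataclass(frozen=True)
class PropertyProfile:
    """Hypothetical summary of a property used only for this comparison."""
    is_object: bool                        # points to an entity rather than a literal
    domain_types: frozenset = frozenset()  # classes allowed as subject
    range_types: frozenset = frozenset()   # classes allowed as value


def _within(local: frozenset, allowed: frozenset) -> bool:
    # An empty `allowed` set is treated as "no constraint recorded".
    return not allowed or local <= allowed


def is_compatible(local: PropertyProfile, candidate: PropertyProfile) -> bool:
    """Same kind (object vs. data) and local domain/range within the candidate's."""
    return (
        local.is_object == candidate.is_object
        and _within(local.domain_types, candidate.domain_types)
        and _within(local.range_types, candidate.range_types)
    )
```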
Some extra notes about rdfs:label
- Some local properties essentially duplicate rdfs:label (e.g., "name", "title"). It is often simpler to map them directly to rdfs:label instead of a specialized property.
  - Benefit: LLM-driven SPARQL generation (LLM2SPARQL) often expects labels to be under rdfs:label.
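Mapping such columns to rdfs:label also makes it easy to attach a language tag, in line with the general guideline on multilingual values. A minimal sketch emitting N-Triples lines, assuming hypothetical label-like columns name and title and a language code known per row (or none).

```python
from typing import Optional

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical local columns that merely duplicate rdfs:label.
LABEL_COLUMNS = {"name", "title"}


def escape(text: str) -> str:
    """Escape a string for use inside an N-Triples literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def label_triples(subject_uri: str, row: dict, lang: Optional[str] = None) -> list:
    """Emit one rdfs:label triple per label-like column, tagged with a language when known."""
    tag = f"@{lang}" if lang else ""
    return [
        f'<{subject_uri}> <{RDFS_LABEL}> "{escape(row[col])}"{tag} .'
        for col in sorted(LABEL_COLUMNS)
        if row.get(col)
    ]
```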
5. Exact Matches
When an entity itself has been reconciled to Wikidata, use P2888 (exact match) to link it to Wikidata.
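The resulting statement keeps the database URI as the subject and the Wikidata item as the object. A minimal sketch producing that N-Triples line, assuming a bare QID from reconciliation and a hypothetical local URI.

```python
WDT_P2888 = "http://www.wikidata.org/prop/direct/P2888"
WD = "http://www.wikidata.org/entity/"


def exact_match_triple(local_uri: str, qid: str) -> str:
    """N-Triples line linking a reconciled local entity to its Wikidata item via P2888."""
    if not (qid.startswith("Q") and qid[1:].isdigit()):
        raise ValueError(f"expected a bare QID such as Q482994, got {qid!r}")
    return f"<{local_uri}> <{WDT_P2888}> <{WD}{qid}> ."
```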
6. Advanced Modeling
Some sources (e.g., RISM) embed details in blank nodes when more data needs to be added to a relationship or property. Example:
```turtle
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ns1: <https://rism.online/api/v1#> .

<https://rism.online/sources/1000000001>
    a ns1:Source ;
    ns1:hasRelationship [
        dcterms:relation <https://rism.online/people/40005939> ;
        ns1:hasRole <http://id.loc.gov/vocabulary/relators/arr>
    ] .
```
ns1:hasRole is not a direct attribute of the source; it modifies the relationship.
- Decision point: model ns1:hasRole as either:
  - A standalone property on an intermediate node, or
  - A nested structure that remains within the blank node.
We currently do not use blank nodes or any other advanced modelling.
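Since blank nodes are excluded, a quick lint over the generated output can catch any that slip through. A minimal sketch, assuming the output is N-Triples, where blank nodes are written as _:label; it is a rough text check, so a literal that happens to contain whitespace followed by "_:" would be a false positive.

```python
import re

# In N-Triples, blank nodes appear as _:label in subject or object position.
# Rough check: a literal containing " _:" would also match.
BLANK_NODE = re.compile(r"(?:^|\s)_:[A-Za-z0-9_][\w.-]*")


def lines_with_blank_nodes(ntriples: str) -> list:
    """Return the 1-based line numbers of N-Triples statements that use a blank node."""
    return [
        number
        for number, line in enumerate(ntriples.splitlines(), start=1)
        if BLANK_NODE.search(line)
    ]
```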
7. Documentation and Audit Trail
Maintain clear documentation for all property-mapping decisions. Keep:
- Property mappings: A table mapping local property names to Wikidata P-IDs.
- Decision notes: For each manual or ambiguous reconciliation, record:
- The local value or structure.
- Candidate matches and rationale.
- Final choice.
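The mapping table and decision notes can live in one CSV file that is easy to diff and review. A minimal sketch, assuming a hypothetical MappingDecision record whose fields mirror the items listed above; the column layout is illustrative, not an existing convention in the repository.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class MappingDecision:
    """One row of a hypothetical property-mapping log."""
    local_property: str
    wikidata_pid: str   # final choice, e.g. P175
    local_example: str  # the local value or structure that was reconciled
    candidates: str     # candidate P-IDs considered, e.g. "P175; P86"
    rationale: str


def to_csv(decisions: list) -> str:
    """Serialize mapping decisions to CSV text with a header row."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=[f.name for f in fields(MappingDecision)])
    writer.writeheader()
    for decision in decisions:
        writer.writerow(asdict(decision))
    return out.getvalue()
```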
The following are currently not done; we exclusively use Wikidata entities and properties.
## 6. Leveraging Wikidata:WikiProject Music
Consult the Wikidata:WikiProject Music page for:
- Commonly used properties in the music domain (e.g., recording, performer, label).
- Best practices for modeling music-related data.
- Examples of existing mappings that can serve as templates.
## 7. Reconciling with Other Schemas and Ontologies
When Wikidata lacks a suitable class or property, consider:
- Schema.org: Widely adopted for web metadata.
- Music Ontology (Music Ont.): Focused on music-specific concepts (e.g., mo:Record).
- Dublin Core: General-purpose metadata for creative works (e.g., dc:title, dc:creator).
Priority Order (highest to lowest):
- Wikidata
- Music Ontology
- Schema.org
- Dublin Core
If multiple schemas can represent the same concept, choose the one most common in your target ecosystem, or the one that best aligns with your query patterns.
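The priority order itself is simple to apply once candidate terms have been found. A minimal sketch, assuming candidates for one concept are collected as a hypothetical {schema name: term} dictionary; it does not capture the judgment call about target ecosystem or query patterns.

```python
# Priority order from the guideline above, highest first.
PRIORITY = ["Wikidata", "Music Ontology", "Schema.org", "Dublin Core"]


def pick_schema(candidates: dict) -> tuple:
    """Given {schema name: term} for one concept, return the highest-priority (schema, term)."""
    for schema in PRIORITY:
        if schema in candidates:
            return schema, candidates[schema]
    raise LookupError("no candidate from a supported schema")
```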