RDF Conversion Guidelines - DDMAL/linkedmusic-datalake GitHub Wiki

For database-specific documentation, see the documentation folder in the repo.

General guidelines

Subjects are always URIs from the database, never WD URIs or strings
No blank nodes/statements/point-in-time structures
No qualifiers
If a property points to another entity in the dataset, always point to it, never to the QID
If a property is backwards, that's fine so long as the subject is a URI
For data that has multiple values in multiple languages, specify the language when possible
When choosing what information to put in the graph, always put the most specific information, as you can query the rest
- This is Wikidata's philosophy; as a consequence, there may not be ideal property mappings when relying on less specific information
- E.g., if you have both the city and the venue where a Session took place, store only the venue, as you will be able to query the city from the venue. In this case, there are clear properties to link the Session to the venue (P276) and the venue to the city (P131), but there is no obvious property to link the Session to its city.
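
As a minimal sketch of this rule, assuming a hypothetical `https://example.org/` URI space standing in for the source database, the helper below emits the Session-to-venue (P276) and venue-to-city (P131) triples and deliberately never links the Session to the city directly. The function name and the tuple representation are illustrative and are not taken from the datalake code.

```python
WDT = "http://www.wikidata.org/prop/direct/"


def location_triples(session_uri, venue_uri=None, city_uri=None):
    """Return (subject, predicate, object) triples for a Session's location."""
    triples = []
    if venue_uri:
        # Most specific information: the Session took place at the venue.
        triples.append((session_uri, WDT + "P276", venue_uri))
        if city_uri:
            # The city hangs off the venue, so it stays queryable from the Session.
            triples.append((venue_uri, WDT + "P131", city_uri))
    return triples


# Hypothetical URIs, for illustration only.
example = location_triples(
    "https://example.org/session/1",
    "https://example.org/venue/7",
    "https://example.org/city/3",
)
```

The city can then be reached from the Session by following P276 and then P131 in a query.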

Methods to find properties

Use P2888 (exact match) when the entity itself is reconciled
Use rdfs:label for entity name/label
Use skos:altLabel for entity alternative names
Use a SPARQL query on entities that already have that relation to verify how the property is used
An LLM can help to double-check results
Use WD search to find candidate properties
For large datasets, use the search API
- LLMs can be useful for rewording property names
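
For the search API, one option is the MediaWiki `wbsearchentities` action restricted to properties. The sketch below only builds the request URL and makes no network call; the helper name and the default limit are assumptions for illustration.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def property_search_url(term, language="en", limit=10):
    """Build a wbsearchentities URL that searches Wikidata properties only."""
    params = {
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "type": "property",  # restrict results to P-IDs
        "limit": limit,
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


url = property_search_url("place of birth")
```

The resulting URL can be fetched with any HTTP client; candidate properties are returned in the `search` array of the JSON response.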

Determining which property to use

Not everything is a WD property
Direction matters!
Use the most precise/specific property unless it is rarely used
Respect domain/range
For dates/coordinates/etc, use the same data types as WD
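
To illustrate the last point: in Wikidata's RDF, dates are typically exposed as `xsd:dateTime` literals and coordinates as `geo:wktLiteral` points with longitude first. The helpers below are a minimal sketch that formats such literals in N-Triples syntax; the function names are hypothetical, and day-level precision is assumed.

```python
from datetime import date

XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"
GEO_WKT = "http://www.opengis.net/ont/geosparql#wktLiteral"


def wd_date_literal(d):
    """Format a date as an xsd:dateTime literal at midnight UTC."""
    return f'"{d.isoformat()}T00:00:00Z"^^<{XSD_DATETIME}>'


def wd_point_literal(latitude, longitude):
    """Format coordinates as a WKT point; WKT puts longitude before latitude."""
    return f'"Point({longitude} {latitude})"^^<{GEO_WKT}>'
```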

Step-by-step approach for RDF conversion

1. Understanding the Schema

When reconciling data from CSV files, we don't get a good view of the data schema. As such, before beginning RDF conversion, we need to understand the schema, notably paying attention to:

Different types of relationships (many-to-many, one-to-many, one-to-one).
Redundant or duplicated properties across the files.

2. Detecting Relationship Attributes

Some columns in a CSV file do not describe a single entity, but rather describe the relationship between two or more entities. These are relationship attributes.

As an example, a track can be on an album at a specific position, and that position would be the relationship attribute.

These relationship attributes might need to be handled in a particular fashion when converting to RDF.
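
One rough heuristic for spotting such columns in a join table is to check whether a column takes different values for the same entity across rows. The sketch below assumes rows are already loaded as dictionaries (for example with `csv.DictReader`); the column names are hypothetical.

```python
from collections import defaultdict


def relationship_attribute_columns(rows, entity_key, candidate_columns):
    """Return the candidate columns whose value varies for a single entity."""
    seen = defaultdict(lambda: defaultdict(set))
    for row in rows:
        for col in candidate_columns:
            seen[col][row[entity_key]].add(row[col])
    return [
        col
        for col in candidate_columns
        if any(len(values) > 1 for values in seen[col].values())
    ]


# Hypothetical track/album join table.
rows = [
    {"track_id": "t1", "album_id": "a1", "position": "1", "title": "Intro"},
    {"track_id": "t1", "album_id": "a2", "position": "5", "title": "Intro"},
    {"track_id": "t2", "album_id": "a1", "position": "2", "title": "Outro"},
]
flagged = relationship_attribute_columns(rows, "track_id", ["position", "title"])
```

Here `position` is flagged because track `t1` sits at a different position on each album, while `title` is not. A column that happens to be constant in the sample will be missed, so the result is a hint to review and not a final classification.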

3. Namespaces and URIs

When converting reconciled Q-IDs to URIs, follow these rules for namespaces:

wd: (Entity prefix) must point to http://www.wikidata.org/entity/.
- ✅ wd:Q482994
- ❌ wd:https://www.wikidata.org/wiki/Q482994
wdt: (Direct property prefix) must point to http://www.wikidata.org/prop/direct/.
- ✅ wdt:P175
- ❌ wdt:https://www.wikidata.org/wiki/Property:P175

Do not mix HTTP vs. HTTPS or /entity/ vs. /wiki/ in these prefixes. Inconsistent URIs will not match any entity or property in the Wikidata SPARQL endpoint.
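
A small normalisation step can enforce these rules before writing triples. The following is a sketch under the assumption that reconciled values arrive either as bare IDs, prefixed names, or wiki page URLs; the helper names are illustrative.

```python
import re

WD_ENTITY = "http://www.wikidata.org/entity/"
WDT_DIRECT = "http://www.wikidata.org/prop/direct/"

# An ID at the end of the value, either bare or preceded by "/" or ":".
_QID = re.compile(r"(?:^|[/:])(Q[1-9]\d*)$")
_PID = re.compile(r"(?:^|[/:])(P[1-9]\d*)$")


def entity_uri(value):
    """Return the canonical wd: URI for a Q-ID, wiki URL, or prefixed name."""
    match = _QID.search(value.strip())
    if match is None:
        raise ValueError(f"no Q-ID found in {value!r}")
    return WD_ENTITY + match.group(1)


def property_uri(value):
    """Return the canonical wdt: URI for a P-ID, wiki URL, or prefixed name."""
    match = _PID.search(value.strip())
    if match is None:
        raise ValueError(f"no P-ID found in {value!r}")
    return WDT_DIRECT + match.group(1)
```

For example, `entity_uri("https://www.wikidata.org/wiki/Q482994")` yields `http://www.wikidata.org/entity/Q482994`.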

4. Reconciling Properties (P Items)

Reconciling properties is the major part of the RDF conversion process. Here are some guidelines to make that easier and more consistent:

Compare labels and descriptions: Read the local property's definition, then find candidate Wikidata properties and compare their definitions.
Verify domain and range: Check the domain and range values on both your local property and the Wikidata candidate.
- For a valid match, the local domain/range should be a subset of, or an exact match for, the Wikidata domain/range.
Object vs Data properties: Verify whether your local property and the Wikidata property are object or data properties and be mindful of this when converting
- Object properties point to another entity/object
- Data properties store literals like a person's name
Select or create: If no suitable property exists, decide whether to:
- Add a new property to Wikidata, or
- Use an alternative (currently not done).
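
The domain/range check above can be sketched as a simple subset test. This assumes domains and ranges have been written down by hand as sets of class names; the class names below are hypothetical, and subclass hierarchies are ignored.

```python
def is_compatible(local_classes, wikidata_classes):
    """True when every local class is also accepted by the Wikidata property."""
    return set(local_classes) <= set(wikidata_classes)


def is_valid_match(local_property, candidate):
    """Check both domain and range of a local property against a candidate."""
    return is_compatible(local_property["domain"], candidate["domain"]) and is_compatible(
        local_property["range"], candidate["range"]
    )


# Hypothetical, hand-written descriptions of a local property and a candidate.
local_performer = {"domain": {"recording"}, "range": {"person"}}
candidate = {"domain": {"recording", "album"}, "range": {"person", "group"}}
too_narrow = {"domain": {"album"}, "range": {"person"}}
```

With these values, `is_valid_match(local_performer, candidate)` is true, while `is_valid_match(local_performer, too_narrow)` is false because the local domain is not covered.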

Some extra notes about rdfs:label

Some local properties essentially duplicate rdfs:label (e.g., "name," "title"). It is often simpler to map them directly to rdfs:label instead of a specialized property.
Benefit: LLM-driven SPARQL generation (LLM2SPARQL) often expects labels to be under rdfs:label.
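
As an illustration, a "name" or "title" column could be written out as an `rdfs:label` triple, with a language tag when the language is known. This is a minimal sketch that produces one N-Triples line; the function name and example URI are hypothetical.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def label_triple(subject_uri, text, lang=None):
    """Return an N-Triples line mapping a name/title value to rdfs:label."""
    # Escape backslashes first, then quotes and newlines.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    literal = f'"{escaped}"' + (f"@{lang}" if lang else "")
    return f"<{subject_uri}> <{RDFS_LABEL}> {literal} ."


line = label_triple("https://example.org/work/1", "Clair de lune", "fr")
```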

5. Exact Matches

When an entity itself has been reconciled to Wikidata, use P2888 (exact match) to link it to Wikidata.
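
A minimal sketch of such a link, written as an N-Triples line, is shown below. The subject stays the database's own URI, in line with the general guidelines; the helper name and the example URI are assumptions.

```python
WD = "http://www.wikidata.org/entity/"
WDT = "http://www.wikidata.org/prop/direct/"


def exact_match_triple(local_uri, qid):
    """Link a local entity URI to its reconciled Wikidata item via P2888."""
    return f"<{local_uri}> <{WDT}P2888> <{WD}{qid}> ."


link = exact_match_triple("https://example.org/artist/12", "Q482994")
```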

6. Advanced Modeling

Some sources (e.g., RISM) embed details in blank nodes when more data needs to be added to a relationship or property. Example:

@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ns1: <https://rism.online/api/v1#> .

<https://rism.online/sources/1000000001>
    a ns1:Source ;
    ns1:hasRelationship [
        dcterms:relation <https://rism.online/people/40005939> ;
        ns1:hasRole <http://id.loc.gov/vocabulary/relators/arr>
    ] .

ns1:hasRole is not a direct attribute of the source but modifies the relationship.
Decision point: Model ns1:hasRole as:
1. A standalone property on an intermediate node, or
2. ~~A nested structure that remains within the blank node.~~

We currently do not use blank nodes or any other advanced modelling.

7. Documentation and Audit Trail

Maintain clear documentation for all property mapping decisions. Keep:

Property mappings: A table mapping local property names to Wikidata P-IDs.
Decision notes: For each manual or ambiguous reconciliation, record:
- The local value or structure.
- Candidate matches and rationale.
- Final choice.
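
One possible way to keep these records is a small CSV file with one row per decision. The sketch below is only a suggestion; the field names and the sample row are hypothetical.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class MappingDecision:
    local_property: str  # the local value or structure
    wikidata_pid: str    # final choice
    candidates: str      # candidate matches that were considered
    rationale: str       # why the final choice was made


def to_csv(decisions):
    """Serialise mapping decisions as CSV text with a header row."""
    buffer = io.StringIO()
    names = [f.name for f in fields(MappingDecision)]
    writer = csv.DictWriter(buffer, fieldnames=names, lineterminator="\n")
    writer.writeheader()
    for decision in decisions:
        writer.writerow(asdict(decision))
    return buffer.getvalue()


table = to_csv(
    [MappingDecision("performer", "P175", "P175; P86", "Local definition matches performer")]
)
```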

The following are currently not done; we exclusively use Wikidata entities and properties.

~~6. Leveraging Wikidata:WikiProject Music~~

Consult the Wikidata:WikiProject Music page for:

Commonly used properties in the music domain (e.g., recording, performer, label).
Best practices for modeling music-related data.
Examples of existing mappings that can serve as templates.

~~7. Reconciling with Other Schemas and Ontologies~~

When Wikidata lacks a suitable class or property, consider:

Schema.org: Widely adopted for web metadata.
Music Ontology (Music Ont.): Focused on music-specific concepts (e.g., mo:Record).
Dublin Core: General-purpose for creative works (dc:title, dc:creator).

Priority Order (highest to lowest):

Wikidata
Music Ontology
Schema.org
Dublin Core

If multiple schemas can represent the same concept, choose the one most common in your target ecosystem, or the one that best aligns with your query patterns.