RDF Conversion Guidelines - DDMAL/linkedmusic-datalake GitHub Wiki

For database-specific documentation, see the documentation folder in the repo.

General guidelines

  • Subjects are always URIs from the database, never WD URIs or strings
  • No blank nodes/statements/point-in-time structures
  • No qualifiers
  • If a property points to another entity in the dataset, always point to that entity's dataset URI, never to its QID
  • If a property is used backwards (in the inverse direction), that's fine so long as the subject is a URI
  • For values available in multiple languages, specify the language (as a language tag) when possible
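As an illustration of these rules, the following sketch serializes one record as N-Triples. The subject and the link to another dataset entity are both dataset URIs, and multilingual titles carry language tags. The base URI `https://example.org/db/`, the record layout, and the helper names are hypothetical. P175 (performer) is used only as an example predicate.

```python
# Minimal sketch of the general guidelines. BASE is a hypothetical dataset
# namespace; substitute the database's real entity URIs.
BASE = "https://example.org/db/"
WDT = "http://www.wikidata.org/prop/direct/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def iri(uri: str) -> str:
    """Write a URI as an N-Triples IRI term."""
    return f"<{uri}>"


def lang_literal(text: str, lang: str) -> str:
    """Write a language-tagged literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def record_to_ntriples(record: dict) -> list[str]:
    # Subject: the dataset's own URI, never a Wikidata URI or a plain string.
    subject = iri(BASE + "recording/" + record["id"])
    triples = []
    # One triple per language, each carrying its language tag.
    for lang, title in record["titles"].items():
        triples.append(f"{subject} {iri(RDFS_LABEL)} {lang_literal(title, lang)} .")
    # Link to another entity in the dataset through its dataset URI, not its QID.
    performer = iri(BASE + "person/" + record["performer_id"])
    triples.append(f"{subject} {iri(WDT + 'P175')} {performer} .")
    return triples


# Hypothetical input record.
record = {
    "id": "42",
    "titles": {"en": "The Four Seasons", "it": "Le quattro stagioni"},
    "performer_id": "7",
}
for line in record_to_ntriples(record):
    print(line)
```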

Methods to find properties

  • Use P2888 (exact match) when the entity itself is reconciled
  • Use rdfs:label for entity name/label
  • Use a SPARQL query on entities that have that relation to verify how the property is used
  • An LLM can help to double-check results
  • Use WD search to search properties
  • For large datasets, use the search API
    • LLMs can be useful for rewording property names
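The property search and the verification query can both be scripted. This sketch only builds request URLs and makes no network calls. It assumes the MediaWiki `wbsearchentities` action (with `type=property`) on `https://www.wikidata.org/w/api.php` and the public Wikidata Query Service endpoint. The function names, the example term, and P86 (composer) as the property to check are illustrative choices.

```python
from urllib.parse import urlencode

# MediaWiki Action API endpoint for Wikidata (wbsearchentities lives here).
WIKIDATA_API = "https://www.wikidata.org/w/api.php"
# Public Wikidata Query Service SPARQL endpoint.
WDQS = "https://query.wikidata.org/sparql"


def property_search_url(term: str, language: str = "en", limit: int = 10) -> str:
    """URL searching Wikidata properties (not items) whose label or alias matches term."""
    params = {
        "action": "wbsearchentities",
        "search": term,
        "type": "property",
        "language": language,
        "limit": limit,
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


def usage_sample_query(pid: str, limit: int = 20) -> str:
    """SPARQL listing sample subject/value pairs that use a property,
    to check that it is used the way the local property is."""
    return f"""SELECT ?s ?sLabel ?o ?oLabel WHERE {{
  ?s wdt:{pid} ?o .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {limit}"""


def sparql_url(query: str) -> str:
    """GET URL for the query; format=json requests SPARQL JSON results."""
    return f"{WDQS}?{urlencode({'query': query, 'format': 'json'})}"


print(property_search_url("composer"))
print(sparql_url(usage_sample_query("P86")))
```

The printed search results (or the sample rows) can then be pasted into an LLM prompt to double-check the choice, as suggested in the list above.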

Determining which property to use

  • Not everything is a WD property
  • Direction matters!
  • Use the most precise/specific property unless it is rarely used
  • Respect domain/range
  • For dates, coordinates, etc., use the same data types as WD
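For reference, Wikidata's direct-claim RDF represents dates as `xsd:dateTime` literals and coordinates as GeoSPARQL WKT points with longitude first. The sketch below formats values the same way. It handles only full, day-precision dates and CE years, which is an assumption about the input, and the helper names are hypothetical.

```python
from datetime import date

XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"
WKT_LITERAL = "http://www.opengis.net/ont/geosparql#wktLiteral"


def wd_date_literal(d: date) -> str:
    """Day-precision date as an xsd:dateTime at midnight UTC (no BCE or partial dates)."""
    return f'"{d.isoformat()}T00:00:00Z"^^<{XSD_DATETIME}>'


def wd_coordinate_literal(lat: float, lon: float) -> str:
    """Coordinates as a WKT point literal; WKT puts longitude before latitude."""
    return f'"Point({lon} {lat})"^^<{WKT_LITERAL}>'


print(wd_date_literal(date(1685, 3, 21)))
print(wd_coordinate_literal(52.5, 13.4))
```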

Step-by-step approach for RDF conversion

1. Understanding the Schema

When reconciling data from CSV files, we don't get a good view of the data schema. As such, before beginning RDF conversion, we need to understand the schema, notably paying attention to:

  1. Different types of relationships (many-to-many, one-to-many, one-to-one).
  2. Redundant or duplicated properties across the files.
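One quick way to check relationship types is to look at how the values of two columns co-occur. This sketch is a heuristic based only on the observed rows: a relationship that looks one-to-one in a sample may not be in general. The column names and data are hypothetical.

```python
import csv
import io
from collections import defaultdict


def cardinality(rows: list[dict], left: str, right: str) -> str:
    """Classify the left-right relationship from the value pairs seen in rows."""
    left_to_right = defaultdict(set)
    right_to_left = defaultdict(set)
    for row in rows:
        a, b = row[left], row[right]
        if not a or not b:
            continue  # skip rows where either side is missing
        left_to_right[a].add(b)
        right_to_left[b].add(a)
    many_right = any(len(v) > 1 for v in left_to_right.values())
    many_left = any(len(v) > 1 for v in right_to_left.values())
    if many_left and many_right:
        return "many-to-many"
    if many_right:
        return "one-to-many"
    if many_left:
        return "many-to-one"
    return "one-to-one"


# Hypothetical CSV extract.
data = """track_id,album_id,artist_id
t1,a1,p1
t2,a1,p1
t3,a2,p1
t1,a3,p1
"""
rows = list(csv.DictReader(io.StringIO(data)))
print(cardinality(rows, "album_id", "track_id"))
print(cardinality(rows, "artist_id", "track_id"))
```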

2. Detecting Relationship Attributes

Some columns in a CSV file do not describe a single entity, but rather describe the relationship between two or more entities. These are relationship attributes.

For example, a track appears on an album at a specific position, and that position is the relationship attribute.

These relationship attributes might need special handling when converting to RDF, since they do not belong to either entity on its own.
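A column can be flagged as a likely relationship attribute with a functional-dependency check: neither entity alone determines its value, but the pair does. This is a heuristic sketch over observed rows, not a rule from this project, and the column names are hypothetical.

```python
from collections import defaultdict


def determined_by(rows: list[dict], key_cols: tuple[str, ...], col: str) -> bool:
    """True if each combination of key_cols values maps to exactly one value of col
    in the observed rows (a functional dependency key_cols -> col)."""
    seen = defaultdict(set)
    for row in rows:
        seen[tuple(row[k] for k in key_cols)].add(row[col])
    return all(len(values) == 1 for values in seen.values())


def is_relationship_attribute(rows: list[dict], entity_a: str, entity_b: str, col: str) -> bool:
    """Heuristic: col describes the a-b relationship if neither entity alone
    determines it, but the (a, b) pair does."""
    return (
        not determined_by(rows, (entity_a,), col)
        and not determined_by(rows, (entity_b,), col)
        and determined_by(rows, (entity_a, entity_b), col)
    )


# Hypothetical rows: t1 appears on two albums at different positions.
rows = [
    {"track": "t1", "album": "a1", "position": "1", "duration": "3:10"},
    {"track": "t2", "album": "a1", "position": "2", "duration": "4:02"},
    {"track": "t1", "album": "a2", "position": "5", "duration": "3:10"},
]
print(is_relationship_attribute(rows, "track", "album", "position"))  # True
print(is_relationship_attribute(rows, "track", "album", "duration"))  # False
```

Here `duration` is determined by the track alone, so it is an attribute of the track, while `position` only makes sense for a track on a given album.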

3. Namespaces and URIs

When converting reconciled Q-IDs to URIs, follow these rules for namespaces:

  • wd: (Entity prefix) must point to http://www.wikidata.org/entity/.

    • Correct: wd:Q482994
    • Incorrect: wd:https://www.wikidata.org/wiki/Q482994
  • wdt: (Direct property prefix) must point to http://www.wikidata.org/prop/direct/.

    • Correct: wdt:P175
    • Incorrect: wdt:https://www.wikidata.org/wiki/Property:P175

Do not mix HTTP and HTTPS, or /entity/ and /wiki/, in these prefixes. Inconsistent URIs will not match the URIs used by the Wikidata SPARQL endpoint, so they will not resolve there.
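Reconciliation output often mixes bare IDs, prefixed names, and pasted Wikidata page URLs. The sketch below normalizes them to the canonical namespaces listed above. The assumptions are that Q-IDs are items (mapped to `wd:`) and that P-IDs are used as predicates (mapped to `wdt:`). The function name is hypothetical.

```python
import re

WD = "http://www.wikidata.org/entity/"
WDT = "http://www.wikidata.org/prop/direct/"

# Matches a trailing Q-ID or P-ID in bare IDs ("Q482994"), prefixed names
# ("wd:Q482994") and URL forms (http/https, /entity/, /wiki/, /wiki/Property:).
_ID = re.compile(r"(?:^|[/:])([QP]\d+)$")


def canonical_uri(value: str) -> str:
    match = _ID.search(value.strip())
    if not match:
        raise ValueError(f"no Wikidata ID found in {value!r}")
    wid = match.group(1)
    # Items go to the entity namespace; properties (used as predicates) go to
    # the direct-property namespace.
    return (WD if wid.startswith("Q") else WDT) + wid


print(canonical_uri("https://www.wikidata.org/wiki/Q482994"))
print(canonical_uri("https://www.wikidata.org/wiki/Property:P175"))
```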

4. Reconciling Properties (P-IDs)

Reconciling properties is the major part of the RDF conversion process. Here are some guidelines to make that easier and more consistent:

  1. Compare labels and descriptions: Read the local property's definition, then find candidate Wikidata properties and compare their definitions.
  2. Verify domain and range: Check the domain and range values on both your local property and the Wikidata candidate.
    • A valid match should have a local domain/range that exactly matches, or is a subset of, the Wikidata domain/range.
  3. Object vs. data properties: Check whether your local property and the Wikidata property are object or data properties, and take this into account when converting.
    • Object properties point to another entity/object
    • Data properties store literals like a person's name
  4. Select or create: If no suitable property exists, decide whether to:
    • Add a new property to Wikidata, or
    • Use an alternative vocabulary (currently not done).
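Steps 2 and 3 can be expressed as a simple check. On Wikidata, a property's expected domain and range are usually recorded as property constraints (for example the subject type and value-type constraints). This sketch assumes they have already been collected into plain sets of class names. It ignores the subclass-of hierarchy, which a real comparison would need. The class names and the `PropertyProfile` type are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PropertyProfile:
    """Hypothetical summary of a property: subject classes (domain), value
    classes or datatypes (range), and whether values are entities."""
    domain: frozenset[str]
    value_range: frozenset[str]
    is_object_property: bool


def mismatches(local: PropertyProfile, candidate: PropertyProfile) -> list[str]:
    """Reasons the candidate does not fit the local property (empty list = fits)."""
    problems = []
    if local.is_object_property != candidate.is_object_property:
        problems.append("object/data property mismatch")
    # The local domain/range must equal, or be a subset of, the candidate's.
    if not local.domain <= candidate.domain:
        problems.append(f"domain not covered: {sorted(local.domain - candidate.domain)}")
    if not local.value_range <= candidate.value_range:
        problems.append(f"range not covered: {sorted(local.value_range - candidate.value_range)}")
    return problems


local_composer = PropertyProfile(frozenset({"MusicalWork"}), frozenset({"Person"}), True)
candidate = PropertyProfile(
    frozenset({"MusicalWork", "Recording"}), frozenset({"Person", "Group"}), True
)
print(mismatches(local_composer, candidate))  # []
```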

Some extra notes about rdfs:label

  • Some local properties essentially duplicate rdfs:label (e.g., "name," "title"). It is often simpler to map them directly to rdfs:label instead of a specialized property.
  • Benefit: LLM-driven SPARQL generation (LLM2SPARQL) often expects labels to be under rdfs:label.
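In practice this can be a small rule in the column-to-predicate mapping: name-like columns collapse onto `rdfs:label`, and everything else goes through the reconciled property table. The column names and the single mapped property (P136, genre) are hypothetical examples.

```python
from typing import Optional

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
WDT = "http://www.wikidata.org/prop/direct/"

# Hypothetical: columns that only restate the entity's name.
LABEL_COLUMNS = {"name", "title"}
# Hypothetical reconciled mapping for the remaining columns.
COLUMN_TO_PREDICATE = {"genre": WDT + "P136"}


def predicate_for(column: str) -> Optional[str]:
    """Predicate URI for a CSV column, or None if the column is not mapped yet."""
    key = column.strip().lower()
    if key in LABEL_COLUMNS:
        return RDFS_LABEL
    return COLUMN_TO_PREDICATE.get(key)


print(predicate_for("Title"))
print(predicate_for("genre"))
```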

5. Exact Matches

When an entity itself has been reconciled to Wikidata, use P2888 (exact match) to link it to Wikidata.
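A minimal sketch of emitting that link from reconciliation results, assuming each row carries the dataset URI and a possibly empty QID. Unreconciled entities simply get no P2888 triple. The row layout and URIs are hypothetical; Q1339 is Johann Sebastian Bach on Wikidata.

```python
from typing import Optional

WD = "http://www.wikidata.org/entity/"
P2888 = "http://www.wikidata.org/prop/direct/P2888"  # exact match


def exact_match_triple(local_uri: str, qid: Optional[str]) -> Optional[str]:
    """N-Triples line linking a dataset entity to its reconciled Wikidata item,
    or None when the entity was not reconciled."""
    if not qid:
        return None
    return f"<{local_uri}> <{P2888}> <{WD}{qid}> ."


# Hypothetical reconciliation output.
rows = [
    {"uri": "https://example.org/db/person/7", "qid": "Q1339"},
    {"uri": "https://example.org/db/person/8", "qid": ""},
]
for row in rows:
    triple = exact_match_triple(row["uri"], row["qid"])
    if triple:
        print(triple)
```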

6. Advanced Modeling

Some sources (e.g., RISM) embed details in blank nodes when more data needs to be added to a relationship or property. Example:

```turtle
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ns1: <https://rism.online/api/v1#> .

<https://rism.online/sources/1000000001>
    a ns1:Source ;
    ns1:hasRelationship [
        dcterms:relation <https://rism.online/people/40005939> ;
        ns1:hasRole <http://id.loc.gov/vocabulary/relators/arr>
    ] .
```
  • ns1:hasRole is not a direct attribute of the source but modifies the relationship.
  • Decision point: Model ns1:hasRole as:
    1. A standalone property on an intermediate node, or
    2. A nested structure that remains within the blank node.

We currently do not use blank nodes or any other advanced modelling.
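Because blank nodes are not used, a cheap safeguard is to scan the serialized output. In N-Triples, blank nodes appear as `_:label`. This sketch checks each line for blank nodes and also flags subjects that are literals or Wikidata URIs, per the general guidelines. It is a simplified line-based check that assumes one complete triple per line, not a full N-Triples parser.

```python
def check_ntriples_line(line: str) -> list[str]:
    """Flag blank nodes and non-dataset subjects in one N-Triples line."""
    line = line.strip()
    if not line or line.startswith("#"):
        return []  # blank lines and comments are allowed
    if not line.endswith("."):
        return ["missing terminating '.'"]
    # Split into subject, predicate, and the rest (the object may contain spaces).
    subject, _predicate, obj = line[:-1].rstrip().split(None, 2)
    problems = []
    if subject.startswith("_:"):
        problems.append("blank node subject")
    elif not subject.startswith("<"):
        problems.append("subject is not a URI")
    elif subject.startswith("<http://www.wikidata.org/"):
        problems.append("subject is a Wikidata URI")
    if obj.startswith("_:"):
        problems.append("blank node object")
    return problems


print(check_ntriples_line(
    "<https://rism.online/sources/1000000001> <https://rism.online/api/v1#hasRelationship> _:b0 ."
))  # ['blank node object']
```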

7. Documentation and Audit Trail

Maintain clear documentation for all property mapping decisions. Keep:

  1. Property mappings: A table mapping local property names to Wikidata P-IDs.
  2. Decision notes: For each manual or ambiguous reconciliation, record:
    • The local value or structure.
    • Candidate matches and rationale.
    • Final choice.
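One lightweight way to keep both records is a single CSV file with one row per decision, which can be versioned alongside the conversion scripts. The column names and the example row are hypothetical, including the candidate list and rationale.

```python
import csv
import io

FIELDS = ["local_property", "local_structure", "candidates", "rationale", "final_choice"]

# Hypothetical decision record.
decisions = [
    {
        "local_property": "composer",
        "local_structure": "CSV column with person IDs",
        "candidates": "P86; P50",
        "rationale": "P86 is the music-specific property",
        "final_choice": "P86",
    },
]


def write_mapping_table(rows: list[dict]) -> str:
    """Serialize decision records as CSV text (header plus one line per decision)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


print(write_mapping_table(decisions))
```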

The following are currently not done, since we use Wikidata entities and properties exclusively:

8. Leveraging Wikidata:WikiProject Music

Consult the Wikidata:WikiProject Music page for:

  1. Commonly used properties in the music domain (e.g., recording, performer, label).
  2. Best practices for modeling music-related data.
  3. Examples of existing mappings that can serve as templates.

9. Reconciling with Other Schemas and Ontologies

When Wikidata lacks a suitable class or property, consider:

  1. Schema.org: Widely adopted for web metadata.
  2. Music Ontology: Focused on music-specific concepts (e.g., mo:Record).
  3. Dublin Core: General-purpose metadata for creative works (dc:title, dc:creator).

Priority Order (highest to lowest):

  1. Wikidata
  2. Music Ontology
  3. Schema.org
  4. Dublin Core

If multiple schemas can represent the same concept, choose the one most common in your target ecosystem, or the one that best aligns with your query patterns.
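If this were adopted, the priority order above could be applied mechanically once candidate properties from each vocabulary are known. This sketch encodes only the fixed ranking; the ecosystem and query-pattern considerations in the previous paragraph are not modeled. The vocabulary keys are hypothetical labels, and the example URIs are `mo:performer`, `schema:byArtist`, and `dcterms:creator`.

```python
from typing import Optional

# Priority order from highest to lowest.
PRIORITY = ["wikidata", "music_ontology", "schema_org", "dublin_core"]


def pick_property(candidates: dict[str, str]) -> Optional[tuple[str, str]]:
    """Given {vocabulary: property URI}, return the highest-priority (vocabulary, URI)."""
    for vocab in PRIORITY:
        if vocab in candidates:
            return vocab, candidates[vocab]
    return None


candidates = {
    "schema_org": "https://schema.org/byArtist",
    "dublin_core": "http://purl.org/dc/terms/creator",
    "music_ontology": "http://purl.org/ontology/mo/performer",
}
print(pick_property(candidates))
```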