Blog Using Double Optional to Select Preferred Value in SPARQL - statnett/Talk2PowerSystem GitHub Wiki

Date: 2025-10-29. Author: Vladimir Alexiev

Table of Contents

Intro
Get Lang String by Preference
DCAT: Get Dataset Description or Title
CimSparql: get TopologicalNode from ConnectivityNode or Terminal

Intro

A common SPARQL need is to fetch one of several possible values in some preference order.

The "most natural" way to approach such a query is to try the different alternatives and select into different variables, then combine them with COALESCE. I've seen people struggle with this (and I've been guilty myself) by using complex constructs like UNION, FILTER NOT EXISTS, FILTER(!BOUND), COALESCE. It is even worse to use GROUP BY and some aggregation function like SAMPLE, MAX, MAX(LENGTH). Using these constructs complicates the query and makes it more brittle in the face of multiple values or other data problems/deformities. If you are not extra careful, it's easy to fall into the "Cartesian Explosion" trap, i.e. produce a resultset that's the Cartesian Product of multivalued variables.

A simpler way is to use a sequence of OPTIONAL clauses that all select into the same variable. This is a beautiful trick that makes sense when you grok it:

If the first OPTIONAL succeeds then it will bind the variable
Which will preclude the second OPTIONAL from selecting another value for the variable
- If it happens that the same value satisfies both OPTIONAL branches, then it doesn't matter which one (or both) succeed
And so on for all subsequent branches

Below I'll give several examples of real-world queries that can be optimized using this "Double OPTIONAL" trick.

The only "gotcha" is if the query optimizer decides to reorder OPTIONAL clauses: but I don't know of any optimizer to do that, at least if you put more complicated clauses later in the query.

Get Lang String by Preference

A resource has multilingual labels/descriptions. Pick one to display, using some preference list of languages.

For example, Wikidata resources usually have labels and descriptions in many languages. The Wikidata SPARQL endpoint has the wikibase:label service for this. Typically you use it like this (there's a template with this construct):

SELECT ?item ?itemLabel WHERE {
  ?item <some-criteria>
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],bg,en". }
}

Here [AUTO_LANGUAGE] is a user browser preference (Accept-Language header) and bg,en is a fixed list of preferred languages.

Another example is GraphDB's GraphQL endpoint. See Using language specifications in your GraphQL queries for details. E.g. the above language spec would be written like lang: BROWSER,bg,en.

Let's take an example from a Wikidata Mapping of Organizations & Persons that extracts a subset if WD fields for a Company Graph project:

where {{[a schema:Dataset; schema:about ?it; schema:dateModified ?modified]
  bind(strafter(str(?it),str(wd:))as ?WD)
  bind(concat(str(WD:),?WD) as ?_item)
  bind(uri(?_item) as?item)

  optional{?it rdfs:label ?prefName_en. filter(lang(?prefName_en)="en")}
  optional{?it rdfs:label ?prefName_de. filter(lang(?prefName_de)="de")}
  optional{?it rdfs:label ?prefName_fr. filter(lang(?prefName_fr)="fr")}
  optional{?it rdfs:label ?prefName_es. filter(lang(?prefName_es)="es")}
  optional{?it rdfs:label ?prefName_it. filter(lang(?prefName_it)="it")}
  optional{?it rdfs:label ?prefName_zh. filter(lang(?prefName_zh)="zh")}
  bind(coalesce(?prefName_en,?prefName_de,?prefName_fr,?prefName_es,?prefName_it,?prefName_zh) as ?prefName)

  optional{?it schema:description ?descr_en. filter(lang(?descr_en)="en")}
  optional{?it schema:description ?descr_de. filter(lang(?descr_de)="de")}
  optional{?it schema:description ?descr_fr. filter(lang(?descr_fr)="fr")}
  optional{?it schema:description ?descr_es. filter(lang(?descr_es)="es")}
  optional{?it schema:description ?descr_it. filter(lang(?descr_it)="it")}
  optional{?it schema:description ?descr_zh. filter(lang(?descr_zh)="zh")}
  bind(coalesce(?descr_en,?descr_de,?descr_fr,?descr_es,?descr_it,?descr_zh) as ?descr)

Such fragment occurs many times in the query: not only when fetching a name and description of a company, but also the label of its industry, company type, etc. If you decide to handle more than the 6 languages listed above, you need to edit all these places and add extra clauses, and add to the coalesce. (Please note that query is generated from a literate representation using m4 macros: see the Label Languages section and the macros LANG_ALL, LANG_ONE.)

But the query can be simplified significantly if we use the same variable and get rid of the coalesce:

  optional{?it rdfs:label ?prefName. filter(lang(?prefName)="en")}
  optional{?it rdfs:label ?prefName. filter(lang(?prefName)="de")}
  optional{?it rdfs:label ?prefName. filter(lang(?prefName)="fr")}
  optional{?it rdfs:label ?prefName. filter(lang(?prefName)="es")}
  optional{?it rdfs:label ?prefName. filter(lang(?prefName)="it")}
  optional{?it rdfs:label ?prefName. filter(lang(?prefName)="zh")}

  optional{?it schema:description ?descr. filter(lang(?descr)="en")}
  optional{?it schema:description ?descr. filter(lang(?descr)="de")}
  optional{?it schema:description ?descr. filter(lang(?descr)="fr")}
  optional{?it schema:description ?descr. filter(lang(?descr)="es")}
  optional{?it schema:description ?descr. filter(lang(?descr)="it")}
  optional{?it schema:description ?descr. filter(lang(?descr)="zh")}

DCAT: Get Dataset Description or Title

The Talk2PowerSystem chatbot now shows a Components page. That lists all agent settings, ontologies, datasets, GraphDB, chatbot backend and frontend components, with their identification, version, date, dependencies, etc.

As I write this, the bot:

Has version 1.2.0-rc4 of 2025-10-27T15:24:19Z with git SHA 8cb067d72eedfdbf0cf29b14d8be33f613ce827b
Runs on Python 3.12.11 (main, Sep 30 2025, 00:38:52) built with GCC 14.2.0
Runs on GraphDB version 11.1.1+sha.82602bfa

It shows the 25 or so ontologies in the database, and the 43 datasets.

For example, one of the datasets is Equipment (EQ) part of the Nordic 44-bus synthetic test model developed by Statnett SF of the Nordic region dated 2025-02-14
Clicking on any ontology or datasets shows you all its metadata in GraphDB

The reason we show all this info (and record it for every evaluation campaign) is so that we can we can report the precise components that were used for a particular evaluation.

Focusing on Datasets:

All but one have dct:description (the exception is Telemark-120_AO, which stands for Aviation Obstacles)
All have descriptions in en or as plain string, and a third have bilingual descriptions in en, no
Half have dct:title
Title (e.g. DIGIN10-30-MV1_AS) is always shorter and less telling than Description (e.g. DIGIN10 Medium Voltage 1 (MV1) Asset Model)
A couple of datasets have two different titles and descriptions due to a bug where two dataset URNs (mRID) are collapsed: two different datasets use the same URN.

Given all these complications, this query is the best way to fetch an appropriate name:

select * {
    ?x a dcat:Dataset
    optional {?x dct:description ?name filter(lang(?name)!="no")}
    optional {?x dct:title ?name}
} order by ?name

Notice how we use the same variable ?name. If you tried to select two vars ?title, ?descr, you'll be bit by Cartesian Explosion due to the collapsed URNs, i.e. you'll get 2*2=4 results for the same Dataset URN.

CimSparql: get TopologicalNode from ConnectivityNode or Terminal

branch_node_withdraw.sparql is a complex real-world CIM SPARQL query that answers the question:

Which connected terminals of AC line segments, series compensators, and power transformers within a given region are associated with which topological nodes, and what are their instantaneous active/reactive power flows (p, q) and connectivity-node mRIDs, considering only equipment enabled for network analysis?

A rationalized version of this query is posted in issue cimsparql#371. Here we show only one part of that rationalization.

This part selects ?topological_node from one of two variables that are bound earlier: ?con_node or ?terminal:

  {
    ?con_node cim:ConnectivityNode.TopologicalNode ?topological_node .
  }
  UNION {
    FILTER NOT EXISTS {
      ?con_node cim:ConnectivityNode.TopologicalNode ?topological_node
    }
    ?terminal cim:Terminal.TopologicalNode ?topological_node .
  }

We can simplify it to this:

  optional {?con_node cim:ConnectivityNode.TopologicalNode ?topological_node}
  optional {?terminal cim:Terminal.TopologicalNode ?topological_node}

We don't need to use FILTER NOT EXISTS nor FILTER(!BOUND) because if ?topological_node is bound by the first OPTIONAL, the second OPTIONAL cannot bind it to any other value.