Using Semantic Reasoning to Help LLM with SPARQL Generation in Electrical CIM

Date: 2025-07-03. Author: Vladimir Alexiev

Intro

Electrical CIM is a complex ontology that can represent the complete electrical enterprise, including generation, transmission, distribution, client operations, asset management, market info and operations. One of the basic ontology modules (profiles) is about Equipment. It models Equipment (conducting and non-conducting) and Equipment Containers such as

  • Electrical transformation Substations being buildings that hold subcontainers and ultimately Equipments such as PowerTransformers, Switches (Breakers and Disconnectors), Busbars, etc
  • Bays and VoltageLevels, which are optional subcontainers of substations
  • Lines, which are considered containers of individual ACLineSegments

Equipments have Terminals (most often 2, but sometimes multiple or even 1).

  • Terminals are not connected directly but through ConnectivityNodes that allow more than binary connections to be expressed.

Thus a query like "List all substations that are connected via an AC-line to substation XYZ" becomes very complex, and unfeasible for an LLM to generate.

We have used standard OWL2 RL rules to add inferred relations, which make the task much easier:

  • Subproperty (union) to express general parthood
  • Transitive closure to express parthood to any level
  • Property paths to express connection between equipments through Terminals and ConnectivityNode; and between containers through their parts

This makes the query much simpler and allows the LLM to generate it confidently. It was also important to describe (document) the derived properties well, to allow the LLM to discover them and use them. This is a great case showcasing the benefits of using semantic reasoning to make a KG easier to use, therefore making the LLM's task easier.

Graphical Abstract

Difficult Query

Consider the question "List all substations that are connected via an AC-line to substation ARENDAL". One way to implement it is with the following query (https://cim.ontotext.com/graphdb/sparql, no login required):

PREFIX cim: <https://cim.ucaiug.io/ns#>
PREFIX sesame: <http://www.openrdf.org/schema/sesame#>
select ?sub1Name ?lineName ?sub2Name {
  {select distinct * {
    values ?sub1Name {"ARENDAL"}
    ?sub1 a cim:Substation;
      cim:IdentifiedObject.name ?sub1Name;
      (cim:EquipmentContainer.Equipments|cim:Substation.VoltageLevels|cim:VoltageLevel.Bays)+ / # equipment in ?sub1
      cim:ConductingEquipment.Terminals / cim:Terminal.ConnectivityNode /
        cim:ConnectivityNode.Terminals / cim:Terminal.ConductingEquipment / # connected to segment
      cim:Equipment.EquipmentContainer ?line. # part of ?line
    ?line a cim:Line; cim:IdentifiedObject.name ?lineName}}
  {select distinct * {
    ?sub2 a cim:Substation;
      cim:IdentifiedObject.name ?sub2Name;
      (cim:EquipmentContainer.Equipments|cim:Substation.VoltageLevels|cim:VoltageLevel.Bays)+ / # equipment in ?sub2
      cim:ConductingEquipment.Terminals / cim:Terminal.ConnectivityNode /
        cim:ConnectivityNode.Terminals / cim:Terminal.ConductingEquipment / # connected to segment
      cim:Equipment.EquipmentContainer ?line}}
  filter(?sub1 != ?sub2)
}

This query is correct and fast (it takes 0.1s), but it is too complicated for an LLM to generate.

Difficult Query Diagram

The following shows the graph patterns of the complex query. The property paths navigate through some blank nodes (i.e. nodes that we don't care about), and you can see how deep the graph patterns are.

The transitive closure + is shown with a self-loop. The FILTER is shown with a link called :different.

LLM Explanation

I didn't dare ask an LLM to generate the query, but an LLM (GPT o1) is pretty good at explaining it:

  • Q: what does this query do?
    • Highlighting the query's actions: The SPARQL query starts by identifying sub1 by the name "ARENDAL" and traces through its related equipment. Then, it finds lines connected to sub1, storing the line name. A similar process follows for sub2.
    • Synthesizing line connections: The query pairs sub1 and sub2 sharing the same line. It starts with "ARENDAL," finding all substations connected to lines associated with "ARENDAL" and then makes pairwise connections.
    • A: This query finds all transmission lines that connect the substation named "ARENDAL" to any other substation, returning the names of "ARENDAL", the shared line, and the second substation.

Human Explanation

When I was explaining the query, I first wrote things that are very similar to the LLM's explanation:

  • The first subquery finds equipment in substation ?sub1 that is connected to a segment of a ?line
  • The second subquery finds another substation ?sub2 that is connected in the same way to ?line

But I also wrote some deeper explanations. (I'm quite sure that if you asked the LLM about these aspects, it would also provide a deeper explanation.)

  • The filter must be outside both subqueries because ?sub1, ?sub2 are not shared between the subqueries
  • Because CIM doesn't have subproperties, we need alternatives to find all equipment nested in subcontainers: cim:EquipmentContainer.Equipments|cim:Substation.VoltageLevels|cim:VoltageLevel.Bays
  • Equipments are connected through their Terminals, by using a ConnectivityNode. So we need a property path like this (CIM has inverses for every relation):
cim:ConductingEquipment.Terminals / cim:Terminal.ConnectivityNode /
  cim:ConnectivityNode.Terminals / cim:Terminal.ConductingEquipment
  • We can use sesame:directType to access only the leaf-level (concrete) classes of resources. But in this case we check with concrete classes only, so using rdf:type (aliased as a) gives the same result
  • distinct is required because a substation may be connected to a line through different parts
  • We must do distinct in each subquery in order to reduce the number of solutions passed from the first subquery to the second
  • The first subquery ensures that ?line has the correct type, then just passes this binding into the second subquery
  • The filter checks that the two substations are different (otherwise we may well come back to ?sub1).
    • The != check is appropriate since ?sub1 is bound, but if we were looking for all pairs of substations, it would be better to use an asymmetric check filter(str(?sub1) < str(?sub2)) to eliminate symmetric solutions
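
The difference between the two filters can be illustrated with a small Python sketch. Apart from ARENDAL, the substation names are hypothetical, and plain strings stand in for the str() of the substation IRIs:

```python
from itertools import product

# ARENDAL comes from the example query; the other two names are made up.
substations = ["ARENDAL", "BAMBLE", "KRISTIANSAND"]

# Like filter(?sub1 != ?sub2): drops self-pairs but keeps both orderings.
different = [(a, b) for a, b in product(substations, repeat=2) if a != b]

# Like filter(str(?sub1) < str(?sub2)): keeps one ordering per unordered pair.
ordered = [(a, b) for a, b in product(substations, repeat=2) if a < b]

print(len(different), len(ordered))
```

With three substations the symmetric check keeps six pairs and the asymmetric check keeps three, i.e. each unordered pair once.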

LLM Simplification

I asked an LLM "with reasoning" to simplify the query:

  • Q: is the query complicated? can you simplify it?
  • Reasoned for 15 seconds: Breaking down the query, Mapping connections, Simplifying the SPARQL query, Streamlining the query, Mapping connections, Refining the SPARQL query, Enhancing query clarity...
PREFIX cim: <https://cim.ucaiug.io/ns#>
SELECT DISTINCT ?sub1Name ?lineName ?sub2Name
WHERE {
  ?sub1 a cim:Substation ;
        cim:IdentifiedObject.name "ARENDAL" .
  ?sub2 a cim:Substation ;
        cim:IdentifiedObject.name ?sub2Name .

  ?sub1 (cim:EquipmentContainer.Equipments|
         cim:Substation.VoltageLevels|
         cim:VoltageLevel.Bays)+ /
         cim:ConductingEquipment.Terminals /
         cim:Terminal.ConnectivityNode /
         cim:ConnectivityNode.Terminals /
         cim:Terminal.ConductingEquipment /
         cim:Equipment.EquipmentContainer ?line .

  ?sub2 (cim:EquipmentContainer.Equipments|
         cim:Substation.VoltageLevels|
         cim:VoltageLevel.Bays)+ /
         cim:ConductingEquipment.Terminals /
         cim:Terminal.ConnectivityNode /
         cim:ConnectivityNode.Terminals /
         cim:Terminal.ConductingEquipment /
         cim:Equipment.EquipmentContainer ?line .

  ?line a cim:Line ;
        cim:IdentifiedObject.name ?lineName .

  FILTER(?sub1 != ?sub2)

  BIND("ARENDAL" as ?sub1Name)
}

This may be easier to read (depending on your preference), but it is not easier to understand or generate.

More importantly, it's a lot less efficient (0.8s vs 0.1s): it does DISTINCT only at the end, not in each subquery.

Adding Inference

The query can be simplified a lot if we add inferred relations. Assume a namespace cimr: (CIM Rules). In CIM, all relations have inverses, so we do the same. We define the following:

  • Union: cim:EquipmentContainer.Equipments|cim:Substation.VoltageLevels|cim:VoltageLevel.Bays -> cimr:hasPart; inverse cimr:isPart
  • Union: cim:Terminal.ConductingEquipment|cim:Terminal.AuxiliaryEquipment -> cimr:Terminal.Equipment; inverse cimr:Equipment.Terminals
  • Transitive closure: cimr:hasPart+ -> cimr:hasPartTransitive; inverse cimr:isPartTransitive
  • Property path: cim:ConductingEquipment.Terminals / cim:Terminal.ConnectivityNode / cim:ConnectivityNode.Terminals / cim:Terminal.ConductingEquipment -> cimr:connectedTo (symmetric)
  • Property path: cimr:hasPartTransitive / cimr:connectedTo / cimr:isPartTransitive -> cimr:connectedThroughPart (symmetric)
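
To make these definitions concrete, here is a minimal Python sketch that computes the same derived relations over a tiny hand-made graph. The node names (S1, VL1, Bay1, Brk1, Seg1, Line1 and so on) are hypothetical, and the set-based helpers only illustrate the semantics; they are not how a triplestore evaluates the rules.

```python
def compose(r, s):
    """Relational composition: (a, d) whenever a r b and b s d."""
    return {(a, d) for (a, b) in r for (c, d) in s if b == c}

def inverse(r):
    return {(b, a) for (a, b) in r}

def transitive_closure(r):
    closure = set(r)
    while True:
        new = compose(closure, r) - closure
        if not new:
            return closure
        closure |= new

# Explicit CIM relations on a toy model (all names are made up).
voltage_levels = {("S1", "VL1")}                      # cim:Substation.VoltageLevels
bays = {("VL1", "Bay1")}                              # cim:VoltageLevel.Bays
equipments = {("Bay1", "Brk1"), ("S2", "Brk2"),
              ("Line1", "Seg1")}                      # cim:EquipmentContainer.Equipments
eq_terminals = {("Brk1", "T1"), ("Seg1", "T2"),
                ("Seg1", "T3"), ("Brk2", "T4")}       # cim:ConductingEquipment.Terminals
terminal_cn = {("T1", "CN1"), ("T2", "CN1"),
               ("T3", "CN2"), ("T4", "CN2")}          # cim:Terminal.ConnectivityNode

# Derived relations, following the list above.
has_part = equipments | voltage_levels | bays         # union
has_part_t = transitive_closure(has_part)             # transitive closure
is_part_t = inverse(has_part_t)
connected_to = compose(compose(compose(eq_terminals, terminal_cn),
                               inverse(terminal_cn)), inverse(eq_terminals))
connected_through_part = compose(compose(has_part_t, connected_to), is_part_t)

# Substations that share a line with S1.
neighbours = {s for s in {"S1", "S2"}
              if s != "S1"
              and ("S1", "Line1") in connected_through_part
              and (s, "Line1") in connected_through_part}
print(neighbours)
```

In this toy graph both S1 and S2 end up related to Line1 by the last relation, while S1 and S2 are not directly related to each other, which is why the query still needs to join the two substations through the line.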

Here is a dependency graph of the relations:

Legend:

  • Relations are shown as boxes with background:
    • Explicit: white background
    • Inferred: lightgray background
  • Inferences are shown as lines with color:
    • Union (rdfs:subPropertyOf): blue
    • Property chain (owl:propertyChainAxiom): green
    • Transitive (rdfs:subPropertyOf and owl:TransitiveProperty): red
    • Inverse (owl:inverseOf) or symmetric (owl:SymmetricProperty): black

Simplified Query

After adding cimr: derived relations, the query becomes much simpler:

select ?sub1Name ?lineName ?sub2Name {
  values ?sub1Name {"ARENDAL"}
  ?sub1 a cim:Substation; cim:IdentifiedObject.name ?sub1Name;
    cimr:connectedThroughPart ?line.
  ?line a cim:Line; cim:IdentifiedObject.name ?lineName.
  ?sub2 a cim:Substation; cim:IdentifiedObject.name ?sub2Name;
    cimr:connectedThroughPart ?line.
  filter(?sub1 != ?sub2)
}

This diagram shows the graph patterns of the query (compare to Difficult Query Diagram):

Inference in CIM

The project Inst4CIM-KG is dedicated to the evolution of CIM ontologies, instance data representation, and SHACL shapes.

The section Reasoning discusses what reasoning is appropriate for CIM.

  • CIM defines rdfs:subClassOf reasoning
    • rdf:type querying can rely on it, which will simplify many queries. To access only the direct (concrete) class of a resource, use sesame:directType
    • SHACL should rely on it, which allows significant simplifications and speed-ups
  • CIM defines owl:inverseOf (all CIM relations have inverses).
    • We concluded that inverse reasoning is not mandatory, but is desirable for querying.
    • Otherwise one must carefully track the direction of relations and use inverse property paths if needed

We also add:

  • rdfs:subPropertyOf: to make a union of relations.
  • owl:TransitiveProperty: to represent parts (containers and equipment) nested any number of levels deep
  • owl:propertyChainAxiom: to "chain" several relations together
  • owl:SymmetricProperty: two inferred relations are declared symmetric, but we don't need this reasoning since the respective property chains are already symmetric.

Implementing Standard Inference

All the constructs mentioned above are covered by the OWL2 RL rules, so we can use standard inference, for example the owl2-rl-optimized ruleset.

The cimr.ttl ontology implements the required reasoning as follows (note: the file may move; use the GitHub search functionality if the link is broken):

cim:EquipmentContainer.Equipments rdfs:subPropertyOf cimr:hasPart.
cim:Substation.VoltageLevels      rdfs:subPropertyOf cimr:hasPart.
cim:VoltageLevel.Bays             rdfs:subPropertyOf cimr:hasPart.
cimr:hasPart                      owl:inverseOf      cimr:isPart.
cimr:hasPart                      rdfs:subPropertyOf cimr:hasPartTransitive.
cimr:hasPartTransitive            owl:inverseOf      cimr:isPartTransitive.
cimr:hasPartTransitive            a                  owl:TransitiveProperty.
cim:Terminal.ConductingEquipment  rdfs:subPropertyOf cimr:Terminal.Equipment.
cim:Terminal.AuxiliaryEquipment   rdfs:subPropertyOf cimr:Terminal.Equipment.
cimr:Terminal.Equipment           owl:inverseOf      cimr:Equipment.Terminals.
cimr:connectedTo                  owl:propertyChainAxiom
  (cimr:Equipment.Terminals cim:Terminal.ConnectivityNode cim:ConnectivityNode.Terminals cimr:Terminal.Equipment).
cimr:connectedTo                  a                  owl:SymmetricProperty.
cimr:connectedThroughPart         owl:propertyChainAxiom
  (cimr:hasPartTransitive cimr:connectedTo cimr:isPartTransitive).
cimr:connectedThroughPart         a                  owl:SymmetricProperty.

These few axioms implement some powerful reasoning. We can check how much that increases inferred statements (expansion ratio). Hover over the repo name (top-right corner) and check the triple counts:

  • The basic CIM ontologies use RDFS subclass, OWL inverse
    • triples: 63.5k explicit, 17.6k inferred, 81.2k total: 1.28x expansion
  • cimr uses RDFS subclass and subproperty, OWL inverse, transitive, propertyChainAxiom
    • triples: 63.5k explicit, 60.4k inferred, 124k total: 1.95x expansion

In other projects we see typical expansion ratios of 1.15-1.20x, but 1.95x is not too much. Doubling the KG size will add only a few percent to querying time.
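
The expansion ratio is simply total triples divided by explicit triples. A quick check of the figures quoted above (the function name is just for illustration):

```python
def expansion_ratio(explicit_k, total_k):
    """Total statements divided by explicit statements (both in thousands)."""
    return total_k / explicit_k

base = expansion_ratio(63.5, 81.2)    # basic CIM ontologies
cimr = expansion_ratio(63.5, 124.0)   # with the cimr axioms added
extra_inferred_k = 60.4 - 17.6        # additional inferred triples, in thousands

print(f"{base:.2f}x {cimr:.2f}x {extra_inferred_k:.1f}k")
```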

Customizing Inference

Apart from standard inference, we can use custom rulesets to:

  • Tighten up the derived relations to exclude some inferences that we don't need
  • Optimize inference rule efficiency

See the GraphDB documentation on Rules Optimizations for some ideas. We use a custom ruleset cim_owl2-rl-optimized that:

  • Is optimized to remove tautologies
  • Uses a more efficient transitiveOver rule
  • Can be optimized further by using fixed-arity property chains, instead of chains represented with rdf:List

Let's illustrate the last two points:

transitiveOver

hasPartTransitive is declared transitive as follows:

cimr:hasPartTransitive a owl:TransitiveProperty.

The standard implementation of transitivity is as follows:

Id: prp_trp
  p <rdf:type> <owl:TransitiveProperty>
  x p y
  y p z
  ---------------
  x p z

Every time a p statement is inserted, the database should look for other instances of p connected on the left or right of it and extend the chain with one more transitive link. In other words, the database potentially has to consider every split of a chain to infer its closure, leading to O(n^3) complexity, where n is the maximum chain length.

We can use a custom construct to optimize this (see here). Instead of owl:TransitiveProperty, we declare the relation as follows:

cimr:hasPartTransitive psys:transitiveOver cimr:hasPart.

(psys stands for "PROTON System ontology", which was made by Ontotext some 20 years ago).

The rule implementing it is:

Id: psys_transitiveOver
  p <psys:transitiveOver> q
  x p y
  y q z
  ---------------
  x p z

This looks deceptively similar to the previous rule, but there is a crucial difference: it distinguishes the step property q (cimr:hasPart) from the transitive closure p (cimr:hasPartTransitive). The step can only be added on the right of the transitive chain. This is an arbitrary restriction, and we have considered psys:transitiveLeft in other projects. But if the transitive property is never inserted explicitly (which is good practice anyway), then the completeness of inference is guaranteed regardless of the order in which the step property is inserted.

This reduces reasoning complexity to O(n^2), which makes a big difference for long transitive chains.
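
The gap between the two rules can be seen in a small Python simulation. This is only a sketch under simplifying assumptions: it runs naive forward chaining over a single chain of n hypothetical hasPart steps and counts distinct premise matches as a proxy for reasoning work. It does not model how GraphDB actually schedules rules.

```python
from math import comb

def chain(n):
    """Step property q (think cimr:hasPart) as a chain 0 -> 1 -> ... -> n."""
    return {(i, i + 1) for i in range(n)}

def saturate(step, right_operand):
    """Forward-chain 'x p y, y r z => x p z' until nothing new is inferred.

    right_operand(p) gives the relation r used for the second premise:
    p itself for prp_trp, the step property for psys_transitiveOver.
    Returns the closure and the number of distinct premise matches.
    """
    p = set(step)  # seeded through rdfs:subPropertyOf (step is a subproperty of p)
    matches = set()
    changed = True
    while changed:
        changed = False
        right = list(right_operand(p))
        for (x, y) in list(p):
            for (y2, z) in right:
                if y == y2:
                    matches.add((x, y, z))
                    if (x, z) not in p:
                        p.add((x, z))
                        changed = True
    return p, len(matches)

n = 10
q = chain(n)
closure_trp, work_trp = saturate(q, lambda p: p)    # prp_trp: p joined with p
closure_over, work_over = saturate(q, lambda p: q)  # transitiveOver: p joined with q
print(len(closure_trp), work_trp, work_over)
```

Both variants derive the same closure, but on this chain prp_trp matches its premises once for every triple of positions x < y < z (cubic growth), while transitiveOver only matches once per closure pair that can still be extended by one step (quadratic growth).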

Fixed-arity Property Paths

We use two property chains, of length (arity) 4 and 3:

cimr:connectedTo                  owl:propertyChainAxiom
  (cimr:Equipment.Terminals cim:Terminal.ConnectivityNode cim:ConnectivityNode.Terminals cimr:Terminal.Equipment).
cimr:connectedThroughPart         owl:propertyChainAxiom
  (cimr:hasPartTransitive cimr:connectedTo cimr:isPartTransitive).

These chains are represented as rdf:List (shown above using the parenthesised Turtle shortcut notation).

The rules to implement such generic chains involve unrolling the list and the use of a hidden onto:_checkChain context:

Id: prp_spo2_1
    p <owl:propertyChainAxiom> pc
    start pc last                   [Context <onto:_checkChain>]
    ----------------------------
    start p last
        
Id: prp_spo2_2
    pc <rdf:first> p
    pc <rdf:rest> t                 [Constraint t != <rdf:nil>]
    start p next
    next t last                     [Context <onto:_checkChain>]
    ----------------------------
    start pc last                   [Context <onto:_checkChain>]

Id: prp_spo2_3
    pc <rdf:first> p
    pc <rdf:rest> <rdf:nil>
    start p last
    ----------------------------
    start pc last                   [Context <onto:_checkChain>]

(If you can prove that this works as expected, you deserve a PhD in logical reasoning. I have only done it with lists of length 2.)

We can simplify these rules if we consider chains of fixed length (arity). For example, we can declare the first chain as follows:

cimr:Connectedto  a psys:PropChain4             ;
  psys:premise1   cimr:Equipment.Terminals      ;
  psys:premise2   cim:Terminal.ConnectivityNode ;
  psys:premise3   cim:ConnectivityNode.Terminals;
  psys:premise4   cimr:Terminal.Equipment       ;
  psys:conclusion cimr:connectedTo              .

The implementation is a single non-recursive rule, since it uses this fixed-arity structure:

Id: psys_PropChain4
  t <rdf:type>        <psys:PropChain4>
  t <psys:premise1>   p1
  t <psys:premise2>   p2
  t <psys:premise3>   p3
  t <psys:premise4>   p4
  t <psys:conclusion> q
  x1 p1 x2
  x2 p2 x3
  x3 p3 x4
  x4 p4 x5
  ----------------
  x1 q  x5
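
To show what this single rule does, here is a minimal Python sketch that applies it as one four-way join over a set of triples. The instance data (Brk1, Seg1, T1, T2, CN1) is hypothetical, and the function is an illustration of the rule's semantics rather than GraphDB's implementation. Note that the rule itself does not compute inverses, so the sample data lists both directions explicitly.

```python
from collections import defaultdict

def apply_prop_chain4(triples):
    """Apply the psys_PropChain4 rule once: join four premises, emit the conclusion."""
    by_pred = defaultdict(set)
    for s, p, o in triples:
        by_pred[p].add((s, o))

    def value(subject, prop):
        # Assumes exactly one value per declaration property.
        return next(o for s, o in by_pred[prop] if s == subject)

    inferred = set()
    for t, cls in by_pred["rdf:type"]:
        if cls != "psys:PropChain4":
            continue
        premises = [value(t, f"psys:premise{i}") for i in range(1, 5)]
        q = value(t, "psys:conclusion")
        pairs = set(by_pred[premises[0]])              # x1 p1 x2
        for prop in premises[1:]:                      # then p2, p3, p4
            pairs = {(a, d) for (a, b) in pairs
                     for (c, d) in by_pred[prop] if b == c}
        inferred |= {(x1, q, x5) for x1, x5 in pairs}
    return inferred

triples = {
    # The chain declaration from above.
    ("cimr:Connectedto", "rdf:type", "psys:PropChain4"),
    ("cimr:Connectedto", "psys:premise1", "cimr:Equipment.Terminals"),
    ("cimr:Connectedto", "psys:premise2", "cim:Terminal.ConnectivityNode"),
    ("cimr:Connectedto", "psys:premise3", "cim:ConnectivityNode.Terminals"),
    ("cimr:Connectedto", "psys:premise4", "cimr:Terminal.Equipment"),
    ("cimr:Connectedto", "psys:conclusion", "cimr:connectedTo"),
    # Hypothetical instance data: a breaker and a line segment sharing node CN1.
    ("Brk1", "cimr:Equipment.Terminals", "T1"),
    ("Seg1", "cimr:Equipment.Terminals", "T2"),
    ("T1", "cim:Terminal.ConnectivityNode", "CN1"),
    ("T2", "cim:Terminal.ConnectivityNode", "CN1"),
    ("CN1", "cim:ConnectivityNode.Terminals", "T1"),
    ("CN1", "cim:ConnectivityNode.Terminals", "T2"),
    ("T1", "cimr:Terminal.Equipment", "Brk1"),
    ("T2", "cimr:Terminal.Equipment", "Seg1"),
}

inferred = apply_prop_chain4(triples)
for triple in sorted(inferred):
    print(triple)
```

Because the chain reads the same in both directions, the inferred connectedTo relation comes out symmetric, and each piece of equipment is also connected to itself through its own terminal.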

CIM Rules ontology: Describing Inferred Properties

It was also important to describe (document) the derived properties well, to allow the LLM to discover them and use them.

Conclusion

LLMs are widely used for Natural Language Querying (NLQ). SPARQL is a great target query language because of its high level of abstraction and easy joining of data through graph patterns.

Nevertheless, when applied on a complex ontology like the electrical CIM, LLMs struggle to generate some of the more complex queries. We show how RDF inference can make this task much simpler.

Related Work

Inferring "shortcut" relations is not a new idea. I have seen it at least twice before:

  • The ASHRAE 223p ontology used in Building Management Systems to describe producers (eg a heater), consumers (eg a radiator), flows, sensors, actuators, and the connections between them. It has an elaborate building parts connectivity structure:

References

  1. Large-scale Reasoning with a Complex Cultural Heritage Ontology (CIDOC CRM). Alexiev, V.; Manov, D.; Parvanova, J.; and Petrov, S. In Workshop Practical Experiences with CIDOC CRM and its Extensions (CRMEX 2013) at TPDL 2013, volume 1117, Valetta, Malta, September 2013. CEUR WS Paper slides preprint
  2. Implementing CIDOC CRM Search Based on Fundamental Relations and OWLIM Rules. Alexiev, V. In Workshop on Semantic Digital Archives (SDA 2012), part of International Conference on Theory and Practice of Digital Libraries (TPDL 2012), volume 912, Paphos, Cyprus, September 2012. CEUR WS Paper slides published

Acknowledgement

This work is part of the Talk2PowerSystem project of Statnett and Graphwise.

This blog post is based on the wiki page Talk2PowerSystem/Inference, which has a few more technical details.
