Date: 2025-07-03. Author: Vladimir Alexiev


Intro

Electrical CIM is a complex ontology that can represent the complete electrical enterprise, including generation, transmission, distribution, client operations, asset management, market info and operations. One of the basic ontology modules (profiles) is about Equipment. It models Equipment (conducting and non-conducting) and Equipment Containers such as:

Substations (electrical transformation stations): buildings that hold subcontainers and ultimately Equipments such as PowerTransformers, Switches (Breakers and Disconnectors), Busbars, etc.
Bays and VoltageLevels, which are optional subcontainers of substations
Lines, which are considered containers of individual ACLineSegments

Equipments have Terminals (most often 2, but sometimes multiple or even 1).

Terminals are not connected directly but through ConnectivityNodes that allow more than binary connections to be expressed.

Thus a query like "List all substations that are connected via an AC-line to substation XYZ" becomes very complex, and infeasible for an LLM to generate.

We have used standard OWL2 RL rules to add inferred relations, which make the task much easier:

Subproperty (union) to express general parthood
Transitive closure to express parthood to any level
Property paths to express connection between equipments through Terminals and ConnectivityNode; and between containers through their parts

This makes the query much simpler and allows the LLM to generate it confidently. It was also important to describe (document) the derived properties well, to allow the LLM to discover them and use them. This is a great case showcasing the benefits of using semantic reasoning to make a KG easier to use, therefore making the LLM's task easier.

Graphical Abstract

Difficult Query

Consider the question "List all substations that are connected via an AC-line to substation ARENDAL". One way to implement it is with the following query (https://cim.ontotext.com/graphdb/sparql, no login required):

PREFIX cim: <https://cim.ucaiug.io/ns#>
PREFIX sesame: <http://www.openrdf.org/schema/sesame#>
select ?sub1Name ?lineName ?sub2Name {
  {select distinct * {
    values ?sub1Name {"ARENDAL"}
    ?sub1 a cim:Substation;
      cim:IdentifiedObject.name ?sub1Name;
      (cim:EquipmentContainer.Equipments|cim:Substation.VoltageLevels|cim:VoltageLevel.Bays)+ / # equipment in ?sub1
      cim:ConductingEquipment.Terminals / cim:Terminal.ConnectivityNode /
        cim:ConnectivityNode.Terminals / cim:Terminal.ConductingEquipment / # connected to segment
      cim:Equipment.EquipmentContainer ?line. # part of ?line
    ?line a cim:Line; cim:IdentifiedObject.name ?lineName}}
  {select distinct * {
    ?sub2 a cim:Substation;
      cim:IdentifiedObject.name ?sub2Name;
      (cim:EquipmentContainer.Equipments|cim:Substation.VoltageLevels|cim:VoltageLevel.Bays)+ / # equipment in ?sub2
      cim:ConductingEquipment.Terminals / cim:Terminal.ConnectivityNode /
        cim:ConnectivityNode.Terminals / cim:Terminal.ConductingEquipment / # connected to segment
      cim:Equipment.EquipmentContainer ?line}}
  filter(?sub1 != ?sub2)
}

This is a correct and fast query (it takes 0.1s), but it is too complicated to be generated by an LLM.

Difficult Query Diagram

The following shows the graph patterns of the complex query. The property paths navigate through some blank nodes (i.e. nodes that we don't care about), and you can see how deep the graph patterns are.

The transitive closure + is shown with a self-loop. The FILTER is shown with a link called :different.

LLM Explanation

I didn't dare ask an LLM to generate the query, but an LLM (GPT o1) is pretty good at explaining it:

Q: what does this query do?
- Highlighting the query's actions: The SPARQL query starts by identifying sub1 by the name "ARENDAL" and traces through its related equipment. Then, it finds lines connected to sub1, storing the line name. A similar process follows for sub2.
- Synthesizing line connections: The query pairs sub1 and sub2 sharing the same line. It starts with "ARENDAL," finding all substations connected to lines associated with "ARENDAL" and then makes pairwise connections.
- A: This query finds all transmission lines that connect the substation named "ARENDAL" to any other substation, returning the names of "ARENDAL", the shared line, and the second substation.

Human Explanation

When I was explaining the query, I first wrote things that are very similar to the LLM's explanation:

The first subquery finds equipment in substation ?sub1 that is connected to a segment of a ?line
The second subquery finds another substation ?sub2 that is connected in the same way to ?line

But I also wrote some deeper explanations. (I'm quite sure that if you ask the LLM about these aspects, it would also provide deeper explanation.)

The filter must be outside both subqueries because ?sub1 and ?sub2 are each bound in only one subquery, so neither subquery sees both variables
Because CIM doesn't have subproperties, we need alternatives to find all equipment nested in subcontainers: cim:EquipmentContainer.Equipments|cim:Substation.VoltageLevels|cim:VoltageLevel.Bays
Equipments are connected through their Terminals, by using a ConnectivityNode. So we need a property path like this (CIM has inverses for every relation):

cim:ConductingEquipment.Terminals / cim:Terminal.ConnectivityNode /
  cim:ConnectivityNode.Terminals / cim:Terminal.ConductingEquipment

We can use sesame:directType to access only the leaf-level (concrete) classes of resources. But in this case we check with concrete classes only, so using rdf:type (aliased as a) gives the same result
distinct is required because a substation may be connected to a line through different parts
We must do distinct in each subquery in order to reduce the number of solutions passed from the first subquery to the second
The first subquery ensures that ?line has the correct type, then just passes this binding into the second subquery
The filter checks that the two substations are different (otherwise we may well come back to ?sub1).
- The != check is appropriate since ?sub1 is bound to a single substation. If we were looking for all pairs of substations, an asymmetric check such as filter(str(?sub1) < str(?sub2)) would be better, because it eliminates symmetric solutions
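The difference between the two filters can be sketched in a few lines of Python. This is only an illustration: the substation identifiers are hypothetical, and every pair is assumed to be connected so that only the effect of the filter is visible.

```python
from itertools import product

# Hypothetical substation IRIs; only their string order matters here.
substations = ["urn:sub:A", "urn:sub:B", "urn:sub:C"]

# Assume every pair is connected, to isolate the effect of the filter.
all_pairs = list(product(substations, repeat=2))

# filter(?sub1 != ?sub2): drops self-pairs but keeps both (A, B) and (B, A).
not_equal = [(a, b) for a, b in all_pairs if a != b]

# filter(str(?sub1) < str(?sub2)): drops self-pairs and one of each mirrored pair.
ordered = [(a, b) for a, b in all_pairs if a < b]

print(len(not_equal))  # 6
print(len(ordered))    # 3
```

With one substation fixed (as in the ARENDAL query) there are no mirrored pairs to remove, so != is enough.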

LLM Simplification

I asked an LLM "with reasoning" to simplify the query:

Q: is the query complicated? can you simplify it?
Reasoned for 15 seconds: Breaking down the query, Mapping connections, Simplifying the SPARQL query, Streamlining the query, Mapping connections, Refining the SPARQL query, Enhancing query clarity...

PREFIX cim: <https://cim.ucaiug.io/ns#>
SELECT DISTINCT ?sub1Name ?lineName ?sub2Name
WHERE {
  ?sub1 a cim:Substation ;
        cim:IdentifiedObject.name "ARENDAL" .
  ?sub2 a cim:Substation ;
        cim:IdentifiedObject.name ?sub2Name .

  ?sub1 (cim:EquipmentContainer.Equipments|
         cim:Substation.VoltageLevels|
         cim:VoltageLevel.Bays)+ /
         cim:ConductingEquipment.Terminals /
         cim:Terminal.ConnectivityNode /
         cim:ConnectivityNode.Terminals /
         cim:Terminal.ConductingEquipment /
         cim:Equipment.EquipmentContainer ?line .

  ?sub2 (cim:EquipmentContainer.Equipments|
         cim:Substation.VoltageLevels|
         cim:VoltageLevel.Bays)+ /
         cim:ConductingEquipment.Terminals /
         cim:Terminal.ConnectivityNode /
         cim:ConnectivityNode.Terminals /
         cim:Terminal.ConductingEquipment /
         cim:Equipment.EquipmentContainer ?line .

  ?line a cim:Line ;
        cim:IdentifiedObject.name ?lineName .

  FILTER(?sub1 != ?sub2)

  BIND("ARENDAL" as ?sub1Name)
}

This may be easier to read (depending on your preference), but it is not easier to understand or generate.

More importantly, it's a lot less efficient (0.8s vs 0.1s), because it applies DISTINCT only at the end and not in each subquery.
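To see why the placement of DISTINCT matters, here is a toy Python sketch. It does not model how GraphDB actually evaluates the query; the rows and the name "OTHER" are made up, and the point is only that duplicate rows multiply when they are joined before being deduplicated.

```python
# Hypothetical rows: each tuple is one (substation, line) solution of a subquery.
# A substation reaches the same line through several pieces of equipment,
# so the same row shows up more than once.
sub1_rows = [("ARENDAL", "line1")] * 4
sub2_rows = [("OTHER", "line1")] * 5 + [("ARENDAL", "line1")] * 4


def join(left, right):
    """Join on the shared line and apply filter(?sub1 != ?sub2)."""
    return [
        (sub1, line, sub2)
        for sub1, line in left
        for sub2, line2 in right
        if line == line2 and sub1 != sub2
    ]


# DISTINCT only at the end: every duplicate combination is produced first.
late = join(sub1_rows, sub2_rows)

# DISTINCT inside each subquery: duplicates are dropped before the join.
early = join(sorted(set(sub1_rows)), sorted(set(sub2_rows)))

print(len(late), len(early))    # 20 1
print(set(late) == set(early))  # True
```

Both variants return the same distinct answers, but the late variant builds 20 intermediate rows for a single result.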

Adding Inference

The query can be simplified a lot if we add inferred relations. Assume a namespace cimr: (CIM Rules). In CIM, all relations have inverses, so we do the same. We define the following:

Union: cim:EquipmentContainer.Equipments|cim:Substation.VoltageLevels|cim:VoltageLevel.Bays -> cimr:hasPart; inverse cimr:isPart
Union: cim:Terminal.ConductingEquipment|cim:Terminal.AuxiliaryEquipment -> cimr:Terminal.Equipment; inverse cimr:Equipment.Terminals
Transitive closure: cimr:hasPart+ -> cimr:hasPartTransitive; inverse cimr:isPartTransitive
Property path: cim:ConductingEquipment.Terminals / cim:Terminal.ConnectivityNode / cim:ConnectivityNode.Terminals / cim:Terminal.ConductingEquipment -> cimr:connectedTo (symmetric)
Property path: cimr:hasPartTransitive / cimr:connectedTo / cimr:isPartTransitive -> cimr:connectedThroughPart (symmetric)
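The definitions above can be sketched in plain Python as operations on sets of (subject, object) pairs. This is a minimal illustration on a made-up toy network (sub1 and sub2 joined by line1), not the actual reasoner; all node names are hypothetical and the variable names only mirror the cim:/cimr: properties.

```python
def inverse(rel):
    return {(o, s) for s, o in rel}


def compose(*rels):
    """Relational composition r1 / r2 / ... (a property path or chain)."""
    result = rels[0]
    for nxt in rels[1:]:
        result = {(s, o2) for s, o in result for o1, o2 in nxt if o == o1}
    return result


def transitive_closure(rel):
    closure = set(rel)
    while True:
        extra = compose(closure, rel) - closure
        if not extra:
            return closure
        closure |= extra


# Explicit relations of the toy network (hypothetical data).
equipments = {("bay1", "brk1"), ("line1", "seg1"), ("sub2", "brk2")}
voltage_levels = {("sub1", "vl1")}
bays = {("vl1", "bay1")}
eq_terminals = {("brk1", "t1"), ("seg1", "t2"), ("seg1", "t3"), ("brk2", "t4")}
term_node = {("t1", "cn1"), ("t2", "cn1"), ("t3", "cn2"), ("t4", "cn2")}

# Inferred relations, following the definitions in the text.
has_part = equipments | voltage_levels | bays                  # union
has_part_transitive = transitive_closure(has_part)             # hasPart+
is_part_transitive = inverse(has_part_transitive)              # inverse
connected_to = compose(eq_terminals, term_node,
                       inverse(term_node), inverse(eq_terminals))
connected_through_part = compose(has_part_transitive, connected_to,
                                 is_part_transitive)

print(("sub1", "line1") in connected_through_part)  # True
print(("sub2", "line1") in connected_through_part)  # True
print(("sub1", "sub2") in connected_through_part)   # False
```

In this sketch the derived relation also contains self-pairs such as (sub1, sub1), because a terminal leads back to its own equipment through the shared connectivity node. This is one reason the queries keep a filter that the two substations are different.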

Here is a dependency graph of the relations:

Legend:

Relations are shown as boxes with background:
- Explicit: white background
- Inferred: lightgray background
Inferences are shown as lines with color:
- Union (rdfs:subPropertyOf): blue
- Property chain (owl:propertyChainAxiom): green
- Transitive (rdfs:subPropertyOf and owl:TransitiveProperty): red
- Inverse (owl:inverseOf) or symmetric (owl:SymmetricProperty): black

Simplified Query

After adding cimr: derived relations, the query becomes much simpler:

select ?sub1Name ?lineName ?sub2Name {
  values ?sub1Name {"ARENDAL"}
  ?sub1 a cim:Substation; cim:IdentifiedObject.name ?sub1Name;
    cimr:connectedThroughPart ?line.
  ?line a cim:Line; cim:IdentifiedObject.name ?lineName.
  ?sub2 a cim:Substation; cim:IdentifiedObject.name ?sub2Name;
    cimr:connectedThroughPart ?line.
  filter(?sub1 != ?sub2)
}

This diagram shows the graph patterns of the query (compare to Difficult Query Diagram):

Inference in CIM

The project Inst4CIM-KG is dedicated to the evolution of CIM ontologies, instance data representation, and SHACL shapes.

The section Reasoning discusses what reasoning is appropriate for CIM.

CIM defines rdfs:subClassOf reasoning
- rdf:type querying can rely on it, which will simplify many queries. To access only the direct (concrete) class of a resource, use sesame:directType
- SHACL should rely on it, which allows significant simplifications and speed-ups
CIM defines owl:inverseOf (all CIM relations have inverses).
- We concluded that inverse reasoning is not mandatory, but is desirable for querying.
- Otherwise one must carefully track the direction of relations and use inverse property paths if needed

We also add:

rdfs:subPropertyOf: to make a union of relations.
- CIM doesn't have any subproperty definitions, but issue #156 CIM needs subproperties argues that it should
owl:TransitiveProperty: to represent parts (containers and equipment) nested any number of levels deep
owl:propertyChainAxiom: to "chain" several relations together
owl:SymmetricProperty: two inferred relations are declared symmetric, but we don't need this reasoning since the respective property chains are already symmetric.

Implementing Standard Inference

All the constructs mentioned above are part of the OWL2 RL rules, so we can use standard inference. We can use the owl2-rl-optimized ruleset.

The cimr.ttl ontology implements the required reasoning as follows (note: the file may move, use the Github search functionality if the link is broken):

cim:EquipmentContainer.Equipments rdfs:subPropertyOf cimr:hasPart.
cim:Substation.VoltageLevels      rdfs:subPropertyOf cimr:hasPart.
cim:VoltageLevel.Bays             rdfs:subPropertyOf cimr:hasPart.
cimr:hasPart                      owl:inverseOf      cimr:isPart.
cimr:hasPart                      rdfs:subPropertyOf cimr:hasPartTransitive.
cimr:hasPartTransitive            owl:inverseOf      cimr:isPartTransitive.
cimr:hasPartTransitive            a                  owl:TransitiveProperty.
cim:Terminal.ConductingEquipment  rdfs:subPropertyOf cimr:Terminal.Equipment.
cim:Terminal.AuxiliaryEquipment   rdfs:subPropertyOf cimr:Terminal.Equipment.
cimr:Terminal.Equipment           owl:inverseOf      cimr:Equipment.Terminals.
cimr:connectedTo                  owl:propertyChainAxiom
  (cimr:Equipment.Terminals cim:Terminal.ConnectivityNode cim:ConnectivityNode.Terminals cimr:Terminal.Equipment).
cimr:connectedTo                  a                  owl:SymmetricProperty.
cimr:connectedThroughPart         owl:propertyChainAxiom
  (cimr:hasPartTransitive cimr:connectedTo cimr:isPartTransitive).
cimr:connectedThroughPart         a                  owl:SymmetricProperty.

These few axioms implement some powerful reasoning. We can check how much that increases inferred statements (expansion ratio). Hover over the repo name (top-right corner) and check the triple counts:

The basic CIM ontologies use RDFS subclass, OWL inverse
- triples: 63.5k explicit, 17.6k inferred, 81.2k total: 1.28x expansion
cimr uses RDFS subclass and subproperty, OWL inverse, transitive, propertyChainAxiom
- triples: 63.5k explicit, 60.4k inferred, 124k total: 1.95x expansion

In other projects we see typical expansion ratios of 1.15-1.20x, but 1.95x is not too much. Doubling the KG size will add only a few percent to querying time.
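The expansion ratio is simply total triples divided by explicit triples. A quick check in Python, using the rounded counts quoted above (so the last digit may differ slightly from what the exact repository counts would give):

```python
def expansion_ratio(explicit, inferred):
    """Total statements (explicit + inferred) divided by explicit statements."""
    return (explicit + inferred) / explicit


# Rounded triple counts taken from the text above.
base = expansion_ratio(63_500, 17_600)  # RDFS subclass, OWL inverse
cimr = expansion_ratio(63_500, 60_400)  # plus subproperty, transitive, chains

print(f"{base:.2f}x")  # 1.28x
print(f"{cimr:.2f}x")  # 1.95x
```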

Customizing Inference

Apart from standard inference, we can use custom rulesets to:

Tighten up the derived relations to exclude some inferences that we don't need
Optimize inference rule efficiency

See the GraphDB documentation on Rules Optimizations for some ideas. We use a custom ruleset cim_owl2-rl-optimized that:

Is optimised to remove tautologies
Uses a more efficient transitiveOver rule
Can be optimized further by using fixed-arity property chains, instead of chains represented with rdf:List

Let's illustrate the last two points:

transitiveOver

hasPartTransitive is declared transitive as follows:

cimr:hasPartTransitive a owl:TransitiveProperty.

The standard implementation of transitivity is as follows:

Id: prp_trp
  p <rdf:type> <owl:TransitiveProperty>
  x p y
  y p z
  ---------------
  x p z

Every time a p statement is inserted, the database should look for other instances of p connected on the left or right of it and extend the chain with one more transitive link. In other words, the database should potentially consider every split of a chain to infer the closure of the chain, leading to O(n^3) complexity where n is the maximum chain length.

We can use a custom construct to optimize this (see here). Instead of owl:TransitiveProperty, we declare the relation as follows:

cimr:hasPartTransitive psys:transitiveOver cimr:hasPart.

(psys stands for "PROTON System ontology", which was made by Ontotext some 20 years ago).

The rule implementing it is:

Id: psys_transitiveOver
  p <psys:transitiveOver> q
  x p y
  y q z
  ---------------
  x p z

This looks deceptively similar to the previous rule, but there is a crucial difference: it distinguishes the step property q (cimr:hasPart) from the transitive closure p (cimr:hasPartTransitive). The step can only be added on the right of the transitive chain. This is an arbitrary restriction, and we have considered psys:transitiveLeft in other projects. But if the transitive prop is not inserted explicitly (which is a good practice anyway), then the completeness of inference is guaranteed regardless of the order of inserting the step property.

This reduces reasoning complexity to O(n^2), which makes a big difference for long transitive chains.
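The difference between the two rules can be made concrete with a small Python experiment. This is a rough sketch, not GraphDB's algorithm: it saturates a single chain of 10 steps with each rule and counts how many premise combinations match at the fixpoint, as a proxy for the work the reasoner has to consider.

```python
def chain(n):
    """Step property (e.g. hasPart) forming one chain 0 -> 1 -> ... -> n."""
    return {(i, i + 1) for i in range(n)}


def saturate(step, self_join):
    """Apply 'x p y, y q z => x p z' until nothing new is inferred.

    self_join=True  mimics prp_trp (q is p itself).
    self_join=False mimics psys_transitiveOver (q is the step property).
    Returns the closure and the number of premise matches at the fixpoint.
    """
    p = set(step)  # the step property is a subproperty of the closure
    while True:
        q = p if self_join else step
        new = {(x, z) for x, y in p for y2, z in q if y == y2} - p
        if not new:
            break
        p |= new
    q = p if self_join else step
    matches = sum(1 for x, y in p for y2, z in q if y == y2)
    return p, matches


closure_trp, matches_trp = saturate(chain(10), self_join=True)
closure_over, matches_over = saturate(chain(10), self_join=False)

print(closure_trp == closure_over)  # True
print(matches_trp, matches_over)    # 165 45
```

Both rules produce the same 55 closure pairs, but the transitive rule matches every ordered triple of nodes on the chain (cubic growth), while transitiveOver matches only a closure pair followed by one step (quadratic growth).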

Fixed-arity Property Paths

We use two property chains, of length (arity) 4 and 3:

cimr:connectedTo                  owl:propertyChainAxiom
  (cimr:Equipment.Terminals cim:Terminal.ConnectivityNode cim:ConnectivityNode.Terminals cimr:Terminal.Equipment).
cimr:connectedThroughPart         owl:propertyChainAxiom
  (cimr:hasPartTransitive cimr:connectedTo cimr:isPartTransitive).

These chains are represented as rdf:List (shown above using the parenthesised Turtle shortcut notation).

The rules to implement such generic chains involve unrolling the list and the use of a hidden onto:_checkChain context:

Id: prp_spo2_1
    p <owl:propertyChainAxiom> pc
    start pc last                   [Context <onto:_checkChain>]
    ----------------------------
    start p last
        
Id: prp_spo2_2
    pc <rdf:first> p
    pc <rdf:rest> t                 [Constraint t != <rdf:nil>]
    start p next
    next t last                     [Context <onto:_checkChain>]
    ----------------------------
    start pc last                   [Context <onto:_checkChain>]

Id: prp_spo2_3
    pc <rdf:first> p
    pc <rdf:rest> <rdf:nil>
    start p last
    ----------------------------
    start pc last                   [Context <onto:_checkChain>]

(If you can prove that this works as expected, you deserve a PhD in logical reasoning. I have only done it with lists of length 2.)

We can simplify these rules if we consider chains of fixed length (arity). For example, we can declare the first chain as follows:

cimr:Connectedto  a psys:PropChain4             ;
  psys:premise1   cimr:Equipment.Terminals      ;
  psys:premise2   cim:Terminal.ConnectivityNode ;
  psys:premise3   cim:ConnectivityNode.Terminals;
  psys:premise4   cimr:Terminal.Equipment       ;
  psys:conclusion cimr:connectedTo              .

The implementation is a single non-recursive rule since it uses this fixed-arity structure:

Id: psys_PropChain4
  t <rdf:type>        <psys:PropChain4>
  t <psys:premise1>   p1
  t <psys:premise2>   p2
  t <psys:premise3>   p3
  t <psys:premise4>   p4
  t <psys:conclusion> q
  x1 p1 x2
  x2 p2 x3
  x3 p3 x4
  x4 p4 x5
  ----------------
  x1 q  x5

And we handle arity 3 in a similar way.
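To make the fixed-arity rule concrete, here is a minimal Python sketch of one pass of psys_PropChain4 over a set of (subject, predicate, object) triples. The function name and the instance data (brk1, t1, cn1, t2, seg1) are hypothetical; a real rule engine compiles and indexes the rule rather than scanning triples like this.

```python
RDF_TYPE = "rdf:type"


def apply_prop_chain4(triples):
    """One pass of the psys_PropChain4 rule; returns the inferred triples."""
    inferred = set()
    chains = [s for s, p, o in triples if p == RDF_TYPE and o == "psys:PropChain4"]
    for t in chains:
        decl = {p: o for s, p, o in triples if s == t}
        premises = [decl[f"psys:premise{i}"] for i in range(1, 5)]
        conclusion = decl["psys:conclusion"]
        # Join x1 p1 x2, x2 p2 x3, x3 p3 x4, x4 p4 x5 from left to right.
        frontier = {(s, o) for s, p, o in triples if p == premises[0]}
        for pred in premises[1:]:
            step = {(s, o) for s, p, o in triples if p == pred}
            frontier = {(x1, o) for x1, mid in frontier for s, o in step if s == mid}
        inferred |= {(x1, conclusion, x5) for x1, x5 in frontier}
    return inferred


triples = {
    # Chain declaration, as in the text.
    ("cimr:Connectedto", RDF_TYPE, "psys:PropChain4"),
    ("cimr:Connectedto", "psys:premise1", "cimr:Equipment.Terminals"),
    ("cimr:Connectedto", "psys:premise2", "cim:Terminal.ConnectivityNode"),
    ("cimr:Connectedto", "psys:premise3", "cim:ConnectivityNode.Terminals"),
    ("cimr:Connectedto", "psys:premise4", "cimr:Terminal.Equipment"),
    ("cimr:Connectedto", "psys:conclusion", "cimr:connectedTo"),
    # Hypothetical instance data: breaker -> terminal -> node -> terminal -> segment.
    ("brk1", "cimr:Equipment.Terminals", "t1"),
    ("t1", "cim:Terminal.ConnectivityNode", "cn1"),
    ("cn1", "cim:ConnectivityNode.Terminals", "t2"),
    ("t2", "cimr:Terminal.Equipment", "seg1"),
}

print(apply_prop_chain4(triples))  # {('brk1', 'cimr:connectedTo', 'seg1')}
```

Because the four premises are read from fixed slots, no list unrolling and no helper context are needed.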

CIM Rules ontology: Describing Inferred Properties

It was also important to describe (document) the derived properties well:

To allow the LLM to discover them and use them.
To be able to generate a proper GraphQL schema (a topic for a page to be written in the future)

The cimr CIM Rules ontology ("officially called" the CIM Inferred Extension Ontology) does that.

It first declares an EquipmentOrContainer superclass to serve as domain and range of hasPart, isPart:

cimr:EquipmentOrContainer a owl:Class;
  rdfs:label "Equipment or Container";
  rdfs:comment "Union (superclass) of Equipment and ConnectivityNodeContainer";
  rdfs:subClassOf cim:PowerSystemResource;
  rdfs:isDefinedBy cimr: .

cim:Equipment                 rdfs:subClassOf cimr:EquipmentOrContainer.
cim:ConnectivityNodeContainer rdfs:subClassOf cimr:EquipmentOrContainer.

Please note that we also remove the current parent of Equipment and ConnectivityNodeContainer (which is PowerSystemResource) to avoid redundancy in the class hierarchy.

Then it properly declares and describes each inferred relation, e.g.:

cimr:hasPart a owl:ObjectProperty;
  rdfs:label "has part";
  rdfs:comment "Equipment or Container has part. Disjunction of the props listed in skos:example";
  rdfs:domain cimr:EquipmentOrContainer;
  rdfs:range  cimr:EquipmentOrContainer;
  skos:example cim:EquipmentContainer.Equipments, cim:Substation.VoltageLevels, cim:VoltageLevel.Bays;
  rdfs:isDefinedBy cimr: .

The implementation of inferencing is in the same ontology, e.g.:

cimr:hasPart                      owl:inverseOf      cimr:isPart.
cimr:hasPart                      rdfs:subPropertyOf cimr:hasPartTransitive.
cim:EquipmentContainer.Equipments rdfs:subPropertyOf cimr:hasPart.
cim:Substation.VoltageLevels      rdfs:subPropertyOf cimr:hasPart.
cim:VoltageLevel.Bays             rdfs:subPropertyOf cimr:hasPart.

Conclusion

LLMs are widely used for Natural Language Querying (NLQ). SPARQL is a great target query language because of its high level of abstraction and easy joining of data through graph patterns.

Nevertheless, when applied on a complex ontology like the electrical CIM, LLMs struggle to generate some of the more complex queries. We show how RDF inference can make this task much simpler.

Related Work

Inferring "shortcut" relations is not a new idea. I have seen it at least twice before:

British Museum CIDOC CRM Fundamental Relations Implementation [1,2] which reduces complex networks like the one below to a single relation like FC70_Thing FR7_from_place E53_Place

The ASHRAE 223p ontology used in Building Management Systems to describe producers (e.g. a heater), consumers (e.g. a radiator), flows, sensors, actuators, and the connections between them. It has an elaborate building parts connectivity structure:

References

[1] Large-scale Reasoning with a Complex Cultural Heritage Ontology (CIDOC CRM). Alexiev, V.; Manov, D.; Parvanova, J.; and Petrov, S. In Workshop Practical Experiences with CIDOC CRM and its Extensions (CRMEX 2013) at TPDL 2013, volume 1117, Valletta, Malta, September 2013. CEUR WS.
[2] Implementing CIDOC CRM Search Based on Fundamental Relations and OWLIM Rules. Alexiev, V. In Workshop on Semantic Digital Archives (SDA 2012), part of International Conference on Theory and Practice of Digital Libraries (TPDL 2012), volume 912, Paphos, Cyprus, September 2012. CEUR WS.

Acknowledgement

This work is part of the Talk2PowerSystem project of Statnett and Graphwise.

This blog post is based on the wiki page Talk2PowerSystem/Inference, which has a few more technical details.

Blog Using Semantic Reasoning to Help LLM with SPARQL Generation in Electrical CIM - statnett/Talk2PowerSystem GitHub Wiki
