Blog Using Semantic Reasoning to Help LLM with SPARQL Generation in Electrical CIM - statnett/Talk2PowerSystem GitHub Wiki
Date: 2025-07-03. Author: Vladimir Alexiev
Electrical CIM is a complex ontology that can represent the complete electrical enterprise, including generation, transmission, distribution, client operations, asset management, market info and operations. One of the basic ontology modules (profiles) is about Equipment. It models Equipment (conducting and non-conducting) and Equipment Containers such as:
- Electrical transformation Substations, being buildings that hold subcontainers and ultimately Equipments such as PowerTransformers, Switches (Breakers and Disconnectors), Busbars, etc
- Bays and VoltageLevels, which are optional subcontainers of substations
- Lines, which are considered containers of individual ACLineSegments

Equipments have Terminals (most often 2, but sometimes multiple or even 1).
Terminals are not connected directly but through ConnectivityNodes that allow more than binary connections to be expressed.
Thus a query like "List all substations that are connected via an AC-line to substation XYZ" becomes very complex, and unfeasible for an LLM to generate.
We have used standard OWL2 RL rules to add inferred relations, which make the task much easier:
- Subproperty (union) to express general parthood
- Transitive closure to express parthood to any level
- Property paths to express connection between equipments through Terminals and ConnectivityNode; and between containers through their parts
This makes the query much simpler and allows the LLM to generate it confidently. It was also important to describe (document) the derived properties well, to allow the LLM to discover them and use them. This is a great case showcasing the benefits of using semantic reasoning to make a KG easier to use, therefore making the LLM's task easier.
Consider the question "List all substations that are connected via an AC-line to substation ARENDAL". One way to implement it is with the following query (https://cim.ontotext.com/graphdb/sparql, no login required):
PREFIX cim: <https://cim.ucaiug.io/ns#>
PREFIX sesame: <http://www.openrdf.org/schema/sesame#>
select ?sub1Name ?lineName ?sub2Name {
{select distinct * {
values ?sub1Name {"ARENDAL"}
?sub1 a cim:Substation;
cim:IdentifiedObject.name ?sub1Name;
(cim:EquipmentContainer.Equipments|cim:Substation.VoltageLevels|cim:VoltageLevel.Bays)+ / # equipment in ?sub1
cim:ConductingEquipment.Terminals / cim:Terminal.ConnectivityNode /
cim:ConnectivityNode.Terminals / cim:Terminal.ConductingEquipment / # connected to segment
cim:Equipment.EquipmentContainer ?line. # part of ?line
?line a cim:Line; cim:IdentifiedObject.name ?lineName}}
{select distinct * {
?sub2 a cim:Substation;
cim:IdentifiedObject.name ?sub2Name;
(cim:EquipmentContainer.Equipments|cim:Substation.VoltageLevels|cim:VoltageLevel.Bays)+ / # equipment in ?sub2
cim:ConductingEquipment.Terminals / cim:Terminal.ConnectivityNode /
cim:ConnectivityNode.Terminals / cim:Terminal.ConductingEquipment / # connected to segment
cim:Equipment.EquipmentContainer ?line}}
filter(?sub1 != ?sub2)
}
This is a correct and fast query (takes 0.1s) but it's too complicated to be generated by an LLM.
The following shows the graph patterns of the complex query. The property paths navigate through some blank nodes (i.e. nodes that we don't care about), and you can see how deep the graph patterns are.
The transitive closure + is shown with a self-loop.
The FILTER is shown with a link called :different.
I didn't dare try to ask an LLM to generate the query, but an LLM (GPT o1) is pretty good at explaining the query:
- Q: what does this query do?
- Highlighting the query's actions: The SPARQL query starts by identifying sub1 by the name "ARENDAL" and traces through its related equipment. Then, it finds lines connected to sub1, storing the line name. A similar process follows for sub2.
- Synthesizing line connections: The query pairs sub1 and sub2 sharing the same line. It starts with "ARENDAL," finding all substations connected to lines associated with "ARENDAL" and then makes pairwise connections.
- A: This query finds all transmission lines that connect the substation named "ARENDAL" to any other substation, returning the names of "ARENDAL", the shared line, and the second substation.
When I was explaining the query, I first wrote things that are very similar to the LLM's explanation:
- The first subquery finds equipment in substation ?sub1 that is connected to a segment of a ?line
- The second subquery finds another substation ?sub2 that is connected in the same way to ?line
But I also wrote some deeper explanations. (I'm quite sure that if you ask the LLM about these aspects, it would also provide deeper explanation.)
- The filter must be outside both subqueries because ?sub1, ?sub2 are not shared between the subqueries
- Because CIM doesn't have subproperties, we need alternatives to find all equipment nested in subcontainers:
  cim:EquipmentContainer.Equipments|cim:Substation.VoltageLevels|cim:VoltageLevel.Bays
- Equipments are connected through their Terminals, by using a ConnectivityNode. So we need a property path like this (CIM has inverses for every relation):
  cim:ConductingEquipment.Terminals / cim:Terminal.ConnectivityNode /
  cim:ConnectivityNode.Terminals / cim:Terminal.ConductingEquipment
- We can use sesame:directType to access only the leaf-level (concrete) classes of resources. But in this case we check with concrete classes only, so using rdf:type (aliased as a) gives the same result
- distinct is required because a substation may be connected to a line through different parts
- We must do distinct in each subquery in order to reduce the number of solutions passed from the first subquery to the second
- The first subquery ensures that ?line has the correct type, then just passes this binding into the second subquery
- The filter checks that the two substations are different (otherwise we may well come back to ?sub1).
  - The != check is appropriate since ?sub1 is bound, but if we were looking for all pairs of substations, it's better to use an asymmetric check filter(str(?sub1) < str(?sub2)) to eliminate symmetric solutions
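To make the last point concrete, here is a minimal Python sketch of the two filters applied to all pairs of substations. The three IRIs are made up for illustration; plain string comparison stands in for comparing str(?sub1) and str(?sub2).

```python
from itertools import product

# Hypothetical substation IRIs, for illustration only
substations = ["urn:sub:A", "urn:sub:B", "urn:sub:C"]

# filter(?sub1 != ?sub2): keeps both (A, B) and (B, A)
different = [(a, b) for a, b in product(substations, repeat=2) if a != b]

# filter(str(?sub1) < str(?sub2)): keeps exactly one ordering of each pair
ordered = [(a, b) for a, b in product(substations, repeat=2) if a < b]

print(len(different))  # 6
print(len(ordered))    # 3
```

The asymmetric check halves the result without losing any connection, since every unordered pair still appears once.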
I asked an LLM "with reasoning" to simplify the query:
- Q: is the query complicated? can you simplify it?
- Reasoned for 15 seconds: Breaking down the query, Mapping connections, Simplifying the SPARQL query, Streamlining the query, Mapping connections, Refining the SPARQL query, Enhancing query clarity...
PREFIX cim: <https://cim.ucaiug.io/ns#>
SELECT DISTINCT ?sub1Name ?lineName ?sub2Name
WHERE {
?sub1 a cim:Substation ;
cim:IdentifiedObject.name "ARENDAL" .
?sub2 a cim:Substation ;
cim:IdentifiedObject.name ?sub2Name .
?sub1 (cim:EquipmentContainer.Equipments|
cim:Substation.VoltageLevels|
cim:VoltageLevel.Bays)+ /
cim:ConductingEquipment.Terminals /
cim:Terminal.ConnectivityNode /
cim:ConnectivityNode.Terminals /
cim:Terminal.ConductingEquipment /
cim:Equipment.EquipmentContainer ?line .
?sub2 (cim:EquipmentContainer.Equipments|
cim:Substation.VoltageLevels|
cim:VoltageLevel.Bays)+ /
cim:ConductingEquipment.Terminals /
cim:Terminal.ConnectivityNode /
cim:ConnectivityNode.Terminals /
cim:Terminal.ConductingEquipment /
cim:Equipment.EquipmentContainer ?line .
?line a cim:Line ;
cim:IdentifiedObject.name ?lineName .
FILTER(?sub1 != ?sub2)
BIND("ARENDAL" as ?sub1Name)
}
This may be easier to read (depending on your preference), but it is not easier to understand or generate.
More importantly, it's a lot less efficient (0.8s vs 0.1s). It does DISTINCT only at the end, not in each subquery.
The query can be simplified a lot if we add inferred relations.
Assume a namespace cimr: (CIM Rules).
In CIM, all relations have inverses, so we do the same.
We define the following:
- Union: cim:EquipmentContainer.Equipments|cim:Substation.VoltageLevels|cim:VoltageLevel.Bays -> cimr:hasPart; inverse cimr:isPart
- Union: cim:Terminal.ConductingEquipment|cim:Terminal.AuxiliaryEquipment -> cimr:Terminal.Equipment; inverse cimr:Equipment.Terminals
- Transitive closure: cimr:hasPart+ -> cimr:hasPartTransitive; inverse cimr:isPartTransitive
- Property path: cim:ConductingEquipment.Terminals / cim:Terminal.ConnectivityNode / cim:ConnectivityNode.Terminals / cim:Terminal.ConductingEquipment -> cimr:connectedTo (symmetric)
- Property path: cimr:hasPartTransitive / cimr:connectedTo / cimr:isPartTransitive -> cimr:connectedThroughPart (symmetric)
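The definitions above can be sketched on a toy graph with plain Python sets of (subject, object) pairs. This only illustrates what each derived relation contains, not how the triplestore computes it. All node names (sub1, brk1, seg1, ...) are hypothetical, and the AuxiliaryEquipment alternative is left out for brevity.

```python
# Toy model, not real CIM data: all node names below are hypothetical.
def compose(r1, r2):
    """Property chain: (x, z) whenever x r1 y and y r2 z."""
    return {(x, z) for (x, y) in r1 for (y2, z) in r2 if y == y2}

def inverse(r):
    """owl:inverseOf: swap subject and object."""
    return {(o, s) for (s, o) in r}

def closure(r):
    """Transitive closure r+, built by repeatedly appending one step."""
    result = set(r)
    while True:
        new = compose(result, r) - result
        if not new:
            return result
        result |= new

# Explicit relations (one direction only; inverses are derived below)
equipments = {("bay1", "brk1"), ("sub2", "brk2"), ("line1", "seg1")}
voltage_levels = {("sub1", "vl1")}
bays = {("vl1", "bay1")}
eq_terminals = {("brk1", "t1"), ("seg1", "t2"), ("seg1", "t3"), ("brk2", "t4")}
term_node = {("t1", "cn1"), ("t2", "cn1"), ("t3", "cn2"), ("t4", "cn2")}

has_part = equipments | voltage_levels | bays        # union
has_part_transitive = closure(has_part)              # transitive closure
is_part_transitive = inverse(has_part_transitive)    # inverse
connected_to = compose(compose(compose(eq_terminals, term_node),
                               inverse(term_node)), inverse(eq_terminals))
connected_through_part = compose(compose(has_part_transitive, connected_to),
                                 is_part_transitive)

# Substations that share a line with sub1
lines = {"line1"}
substations = {"sub1", "sub2"}
answer = {(line, s2)
          for (s1, line) in connected_through_part
          if s1 == "sub1" and line in lines
          for (s2, l2) in connected_through_part
          if l2 == line and s2 in substations and s2 != "sub1"}
print(answer)  # {('line1', 'sub2')}
```

In this toy graph sub1 and sub2 are not directly related by connected_through_part: no part of sub1 shares a connectivity node with a part of sub2, so the answer has to go through the line.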
Here is a dependency graph of the relations:
Legend:
- Relations are shown as boxes with background:
  - Explicit: white background
  - Inferred: lightgray background
- Inferences are shown as lines with color:
  - Union (rdfs:subPropertyOf): blue
  - Property chain (owl:propertyChainAxiom): green
  - Transitive (rdfs:subPropertyOf and owl:TransitiveProperty): red
  - Inverse (owl:inverseOf) or symmetric (owl:SymmetricProperty): black
After adding cimr: derived relations, the query becomes much simpler:
select ?sub1Name ?lineName ?sub2Name {
values ?sub1Name {"ARENDAL"}
?sub1 a cim:Substation; cim:IdentifiedObject.name ?sub1Name;
cimr:connectedThroughPart ?line.
?line a cim:Line; cim:IdentifiedObject.name ?lineName.
?sub2 a cim:Substation; cim:IdentifiedObject.name ?sub2Name;
cimr:connectedThroughPart ?line.
filter(?sub1 != ?sub2)
}
This diagram shows the graph patterns of the query (compare to Difficult Query Diagram):
The project Inst4CIM-KG is dedicated to the evolution of CIM ontologies, instance data representation, and SHACL shapes.
The section Reasoning discusses what reasoning is appropriate for CIM.
- CIM defines rdfs:subClassOf reasoning
  - rdf:type querying can rely on it, which will simplify many queries. To access only the direct (concrete) class of a resource, use sesame:directType
  - SHACL should rely on it, which allows significant simplifications and speed-ups
- CIM defines owl:inverseOf (all CIM relations have inverses).
  - We concluded that inverse reasoning is not mandatory, but is desirable for querying.
  - Otherwise one must carefully track the direction of relations and use inverse property paths if needed
We also add:
- rdfs:subPropertyOf: to make a union of relations.
  - CIM doesn't have any subproperty definitions, but issue #156 CIM needs subproperties argues that it should
- owl:TransitiveProperty: to represent parts (containers and equipment) nested any number of levels deep
- owl:propertyChainAxiom: to "chain" several relations together
- owl:SymmetricProperty: two inferred relations are declared symmetric, but we don't need this reasoning since the respective property chains are already symmetric.
All the constructs mentioned above are part of the OWL2 RL rules, so we can use standard inference.
We can use the owl2-rl-optimized ruleset.
The cimr.ttl ontology implements the required reasoning as follows (note: the file may move, use the GitHub search functionality if the link is broken):
cim:EquipmentContainer.Equipments rdfs:subPropertyOf cimr:hasPart.
cim:Substation.VoltageLevels rdfs:subPropertyOf cimr:hasPart.
cim:VoltageLevel.Bays rdfs:subPropertyOf cimr:hasPart.
cimr:hasPart owl:inverseOf cimr:isPart.
cimr:hasPart rdfs:subPropertyOf cimr:hasPartTransitive.
cimr:hasPartTransitive owl:inverseOf cimr:isPartTransitive.
cimr:hasPartTransitive a owl:TransitiveProperty.
cim:Terminal.ConductingEquipment rdfs:subPropertyOf cimr:Terminal.Equipment.
cim:Terminal.AuxiliaryEquipment rdfs:subPropertyOf cimr:Terminal.Equipment.
cimr:Terminal.Equipment owl:inverseOf cimr:Equipment.Terminals.
cimr:connectedTo owl:propertyChainAxiom
(cimr:Equipment.Terminals cim:Terminal.ConnectivityNode cim:ConnectivityNode.Terminals cimr:Terminal.Equipment).
cimr:connectedTo a owl:SymmetricProperty.
cimr:connectedThroughPart owl:propertyChainAxiom
(cimr:hasPartTransitive cimr:connectedTo cimr:isPartTransitive).
cimr:connectedThroughPart a owl:SymmetricProperty.
These few axioms implement some powerful reasoning. We can check how much that increases inferred statements (expansion ratio). Hover over the repo name (top-right corner) and check the triple counts:
- The basic CIM ontologies use RDFS subclass, OWL inverse
  - triples: 63.5k explicit, 17.6k inferred, 81.2k total: 1.28x expansion
- cimr uses RDFS subclass and subproperty, OWL inverse, transitive, propertyChainAxiom
  - triples: 63.5k explicit, 60.4k inferred, 124k total: 1.95x expansion
In other projects we see typical expansion ratios of 1.15-1.20x, but 1.95x is not too much. Doubling the KG size will add only a few percent to querying time.
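The expansion ratio is simply total triples divided by explicit triples. A quick check of the figures quoted above (counts in thousands, as reported by the repository):

```python
explicit = 63.5  # thousand explicit triples, same in both repositories

# Total triples (explicit + inferred), in thousands
totals = {"cim": 81.2, "cimr": 124.0}

ratios = {name: round(total / explicit, 2) for name, total in totals.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio}x expansion")
# cim: 1.28x expansion
# cimr: 1.95x expansion
```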
Apart from standard inference, we can use custom rulesets to:
- Tighten up the derived relations to exclude some inferences that we don't need
- Optimize inference rule efficiency
See the GraphDB documentation on Rules Optimizations for some ideas. We use a custom ruleset cim_owl2-rl-optimized that:
- Is optimised to remove tautologies
- Uses a more efficient transitiveOver rule
- Can be optimized further by using fixed-arity property chains, instead of chains represented with rdf:List
Let's illustrate the last two points:
hasPartTransitive is declared transitive as follows:
cimr:hasPartTransitive a owl:TransitiveProperty.
The standard implementation of transitivity is as follows:
Id: prp_trp
p <rdf:type> <owl:TransitiveProperty>
x p y
y p z
---------------
x p z
Every time p is inserted, the database should look for other instances of p connected on the left or right of it and extend the chain with one more transitive link.
In other words, the database should potentially consider every split of a chain to infer the closure of the chain, leading to O(n^3) complexity where n is the maximum chain length.
We can use a custom construct to optimize this (see here).
Instead of owl:TransitiveProperty, we declare the relation as follows:
cimr:hasPartTransitive psys:transitiveOver cimr:hasPart.
(psys stands for "PROTON System ontology", which was made by Ontotext some 20 years ago).
The rule implementing it is:
Id: psys_transitiveOver
p <psys:transitiveOver> q
x p y
y q z
---------------
x p z
This looks deceptively similar to the previous rule, but there is a crucial difference: it distinguishes the step property q (cimr:hasPart) from the transitive closure p (cimr:hasPartTransitive).
The step can only be added on the right of the transitive chain.
This is an arbitrary restriction, and we have considered psys:transitiveLeft in other projects.
But if the transitive property is not inserted explicitly (which is a good practice anyway), then the completeness of inference is guaranteed regardless of the order of inserting the step property.
This reduces reasoning complexity to O(n^2), which makes a big difference for long transitive chains.
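To get a feel for the difference, here is a rough Python sketch that applies both rules naively to a single chain of 20 step links and counts how many premise combinations match once the closure is complete. It is a toy model of the two rules, not a description of how GraphDB actually schedules rule firings; the function names are made up.

```python
def chain(n):
    """n step edges 0 -> 1 -> ... -> n, standing in for nested cimr:hasPart links."""
    return {(i, i + 1) for i in range(n)}

def close_trp(step):
    """prp_trp: x p y, y p z => x p z (both premises range over the closure p)."""
    p = set(step)
    while True:
        new = {(x, z) for (x, y) in p for (y2, z) in p if y == y2} - p
        if not new:
            break
        p |= new
    # Premise combinations that satisfy the rule body at the fixpoint
    matches = sum(1 for (x, y) in p for (y2, z) in p if y == y2)
    return p, matches

def close_transitive_over(step):
    """psys:transitiveOver: x p y, y q z => x p z (second premise is a step q only)."""
    p = set(step)  # the step property is a subproperty of the closure
    while True:
        new = {(x, z) for (x, y) in p for (y2, z) in step if y == y2} - p
        if not new:
            break
        p |= new
    matches = sum(1 for (x, y) in p for (y2, z) in step if y == y2)
    return p, matches

closure_a, matches_trp = close_trp(chain(20))
closure_b, matches_over = close_transitive_over(chain(20))
print(closure_a == closure_b)     # True: same inferred pairs
print(matches_trp, matches_over)  # 1330 190
```

Both rules infer the same 210 pairs, but the generic rule matches every split x < y < z of the chain (cubic growth), while transitiveOver only matches a closure link followed by one step (quadratic growth).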
We use two property chains, of length (arity) 4 and 3:
cimr:connectedTo owl:propertyChainAxiom
(cimr:Equipment.Terminals cim:Terminal.ConnectivityNode cim:ConnectivityNode.Terminals cimr:Terminal.Equipment).
cimr:connectedThroughPart owl:propertyChainAxiom
(cimr:hasPartTransitive cimr:connectedTo cimr:isPartTransitive).
These chains are represented as rdf:List (shown above using the parenthesised Turtle shortcut notation).
The rules to implement such generic chains involve unrolling the list and the use of a hidden onto:_checkChain context:
Id: prp_spo2_1
p <owl:propertyChainAxiom> pc
start pc last [Context <onto:_checkChain>]
----------------------------
start p last
Id: prp_spo2_2
pc <rdf:first> p
pc <rdf:rest> t [Constraint t != <rdf:nil>]
start p next
next t last [Context <onto:_checkChain>]
----------------------------
start pc last [Context <onto:_checkChain>]
Id: prp_spo2_3
pc <rdf:first> p
pc <rdf:rest> <rdf:nil>
start p last
----------------------------
start pc last [Context <onto:_checkChain>]
(If you can prove that this works as expected, you deserve a PhD in logical reasoning. I have only done it with lists of length 2.)
We can simplify these rules if we consider chains of fixed length (arity). For example, we can declare the first chain as follows:
cimr:Connectedto a psys:PropChain4 ;
psys:premise1 cimr:Equipment.Terminals ;
psys:premise2 cim:Terminal.ConnectivityNode ;
psys:premise3 cim:ConnectivityNode.Terminals;
psys:premise4 cimr:Terminal.Equipment ;
psys:conclusion cimr:connectedTo .
The implementation is a single non-recursive rule since it uses this fixed-arity structure:
Id: psys_PropChain4
t <rdf:type> <psys:PropChain4>
t <psys:premise1> p1
t <psys:premise2> p2
t <psys:premise3> p3
t <psys:premise4> p4
t <psys:conclusion> q
x1 p1 x2
x2 p2 x3
x3 p3 x4
x4 p4 x5
----------------
x1 q x5
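The contrast between the two approaches can be sketched in Python. The first function walks the chain recursively, one list element at a time, roughly like the prp_spo2 rules. The second is a single four-way join like psys_PropChain4. The function names and the two-equipment toy data are hypothetical; this is an illustration of the idea, not the rule engine's implementation.

```python
def pairs_via_list(chain, rel):
    """List unrolling: the last element is the base case (like prp_spo2_3);
    every other element joins its property with the rest of the list (like prp_spo2_2)."""
    head, tail = chain[0], chain[1:]
    if not tail:
        return set(rel[head])
    tail_pairs = pairs_via_list(tail, rel)
    return {(start, last)
            for (start, nxt) in rel[head]
            for (n2, last) in tail_pairs if nxt == n2}

def pairs_prop_chain4(p1, p2, p3, p4, rel):
    """Fixed arity: one non-recursive join over exactly four premises."""
    return {(x1, x5)
            for (x1, x2) in rel[p1]
            for (a, x3) in rel[p2] if a == x2
            for (b, x4) in rel[p3] if b == x3
            for (c, x5) in rel[p4] if c == x4}

# Hypothetical toy data: a breaker and a line segment sharing connectivity node cn1
rel = {
    "cimr:Equipment.Terminals": {("brk1", "t1"), ("seg1", "t2")},
    "cim:Terminal.ConnectivityNode": {("t1", "cn1"), ("t2", "cn1")},
    "cim:ConnectivityNode.Terminals": {("cn1", "t1"), ("cn1", "t2")},
    "cimr:Terminal.Equipment": {("t1", "brk1"), ("t2", "seg1")},
}
chain = ["cimr:Equipment.Terminals", "cim:Terminal.ConnectivityNode",
         "cim:ConnectivityNode.Terminals", "cimr:Terminal.Equipment"]

generic = pairs_via_list(chain, rel)
fixed = pairs_prop_chain4(*chain, rel)
print(generic == fixed)             # True
print(("brk1", "seg1") in fixed)    # True: cimr:connectedTo is inferred
```

Note that the chain also relates each equipment to itself (brk1 reaches brk1 through its own terminal), which is one of the inferences a tightened custom ruleset could exclude.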
LLMs are widely used for Natural Language Querying (NLQ). SPARQL is a great target query language because of its high level of abstraction and easy joining of data through graph patterns.
Nevertheless, when applied on a complex ontology like the electrical CIM, LLMs struggle to generate some of the more complex queries. We show how RDF inference can make this task much simpler.
Inferring "shortcut" relations is not a new idea. I have seen it at least twice before:
- British Museum CIDOC CRM Fundamental Relations Implementation [1,2], which reduces complex networks like the one below to a single relation like FC70_Thing FR7_from_place E53_Place
- The ASHRAE 223p ontology used in Building Management Systems to describe producers (eg a heater), consumers (eg a radiator), flows, sensors, actuators, and the connections between them. It has an elaborate building parts connectivity structure:
[1] Large-scale Reasoning with a Complex Cultural Heritage Ontology (CIDOC CRM). Alexiev, V.; Manov, D.; Parvanova, J.; and Petrov, S. In Workshop Practical Experiences with CIDOC CRM and its Extensions (CRMEX 2013) at TPDL 2013, volume 1117, Valletta, Malta, September 2013. CEUR WS.
[2] Implementing CIDOC CRM Search Based on Fundamental Relations and OWLIM Rules. Alexiev, V. In Workshop on Semantic Digital Archives (SDA 2012), part of International Conference on Theory and Practice of Digital Libraries (TPDL 2012), volume 912, Paphos, Cyprus, September 2012. CEUR WS.
This work is part of the Talk2PowerSystem project of Statnett and Graphwise.
This blog post is based on the wiki page Talk2PowerSystem/Inference, which has a few more technical details.