Inference - statnett/Talk2PowerSystem GitHub Wiki
Tasks
- #16 (DONE) consider derived (shortcut) props
- #20 cimex ontology: decide prefix and prop names
- #156 CIM needs subproperties
- #93 (DONE) Removing Tautologies
Complex Query
Consider this Demo1 question:
- Q1.3 Substation Connectivity "List all substations that are connected via an AC-line or a DC line to substation named XYZ".
The query for it looks like this (saved query Q3 at https://cim.ontotext.com/graphql, user "readonly", password "1234567890"):
PREFIX cim: <http://iec.ch/TC57/2013/CIM-schema-cim16#>
PREFIX sesame: <http://www.openrdf.org/schema/sesame#>
select ?sub1Name ?lineName ?sub2Name {
{select distinct * {
values ?sub1Name {"ARENDAL"}
?sub1 a cim:Substation;
cim:IdentifiedObject.name ?sub1Name;
(cim:EquipmentContainer.Equipments|cim:Substation.VoltageLevels|cim:VoltageLevel.Bays)+ / # equipment in ?sub1
cim:ConductingEquipment.Terminals / cim:Terminal.ConnectivityNode / cim:ConnectivityNode.Terminals / cim:Terminal.ConductingEquipment / # connected to segment
cim:Equipment.EquipmentContainer ?line. # part of ?line
?line a cim:Line; cim:IdentifiedObject.name ?lineName}}
{select distinct * {
?sub2 a cim:Substation;
cim:IdentifiedObject.name ?sub2Name;
(cim:EquipmentContainer.Equipments|cim:Substation.VoltageLevels|cim:VoltageLevel.Bays)+ / # equipment in ?sub2
cim:ConductingEquipment.Terminals / cim:Terminal.ConnectivityNode / cim:ConnectivityNode.Terminals / cim:Terminal.ConductingEquipment / # connected to segment
cim:Equipment.EquipmentContainer ?line}}
filter(?sub1 != ?sub2)
}
This is a correct and fast query (it takes 0.1s), but it is far too complicated for an LLM to generate (a polite way of saying FUBAR). Explanation:
- The first subquery finds equipment in substation ?sub1 that is connected to a segment of a ?line
- Because CIM doesn't have subproperties, we need these alternatives to find all equipment nested in subcontainers:
cim:EquipmentContainer.Equipments|cim:Substation.VoltageLevels|cim:VoltageLevel.Bays
- The second subquery finds another substation ?sub2 that is connected in the same way to ?line
- The filter must be outside both subqueries because ?sub1, ?sub2 are not shared between the subqueries
LLM Explanation and Simplification
I didn't dare ask an LLM to generate the query, but an LLM (GPT o1) is very good at explaining it:
- Q: what does this query do?
- Highlighting the query's actions: The SPARQL query starts by identifying sub1 by the name "ARENDAL" and traces through its related equipment. Then, it finds lines connected to sub1, storing the line name. A similar process follows for sub2.
- Synthesizing line connections: The query pairs sub1 and sub2 sharing the same line. It starts with "ARENDAL," finding all substations connected to lines associated with "ARENDAL" and then makes pairwise connections.
- A: This query finds all transmission lines that connect the substation named "ARENDAL" to any other substation, returning the names of “ARENDAL”, the shared line, and the second substation.
I also asked it to simplify:
- Q: is it complicated? can you simplify it?
- Reasoned for 15 seconds: Breaking down the query, Mapping connections, Simplifying the SPARQL query, Streamlining the query, Mapping connections, Refining the SPARQL query, Enhancing query clarity
PREFIX cim: <http://iec.ch/TC57/2013/CIM-schema-cim16#>
SELECT DISTINCT ?sub1Name ?lineName ?sub2Name
WHERE {
?sub1 a cim:Substation ;
cim:IdentifiedObject.name "ARENDAL" .
?sub2 a cim:Substation ;
cim:IdentifiedObject.name ?sub2Name .
?sub1 (cim:EquipmentContainer.Equipments|
cim:Substation.VoltageLevels|
cim:VoltageLevel.Bays)+ /
cim:ConductingEquipment.Terminals /
cim:Terminal.ConnectivityNode /
cim:ConnectivityNode.Terminals /
cim:Terminal.ConductingEquipment /
cim:Equipment.EquipmentContainer ?line .
?sub2 (cim:EquipmentContainer.Equipments|
cim:Substation.VoltageLevels|
cim:VoltageLevel.Bays)+ /
cim:ConductingEquipment.Terminals /
cim:Terminal.ConnectivityNode /
cim:ConnectivityNode.Terminals /
cim:Terminal.ConductingEquipment /
cim:Equipment.EquipmentContainer ?line .
?line a cim:Line ;
cim:IdentifiedObject.name ?lineName .
FILTER(?sub1 != ?sub2)
BIND("ARENDAL" as ?sub1Name)
}
Ok, this is easier to read, but not easier to understand or generate, and it's a lot less efficient (0.8s vs 0.1s) because it does DISTINCT at the end only.
Derived Props
The query can be simplified a lot if we add derived props.
Assume a namespace cimr: (CIM Extensions):
- cim:EquipmentContainer.Equipments|cim:Substation.VoltageLevels|cim:VoltageLevel.Bays -> cimr:hasPart; inverse cimr:isPart
- cim:Terminal.ConductingEquipment|cim:Terminal.AuxiliaryEquipment -> cimr:Terminal.Equipment; inverse cimr:Equipment.Terminals
- cimr:hasPart+ -> cimr:hasPartTransitive; inverse cimr:isPartTransitive
- cim:ConductingEquipment.Terminals / cim:Terminal.ConnectivityNode / cim:ConnectivityNode.Terminals / cim:Terminal.ConductingEquipment -> cimr:connectedTo (symmetric)
- cimr:hasPartTransitive / cimr:connectedTo / cimr:isPartTransitive -> cimr:connectedThroughPart (symmetric)
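To make these derivations concrete, here is a minimal Python sketch over a hypothetical toy graph. All instance names (`sub:A`, `line:L1`, `cn:1` and so on) are invented for illustration, and a triplestore would compute these relations through its ruleset rather than in application code. The sketch merges the three containment properties into `hasPart`, takes its transitive closure, composes the four-step terminal/connectivity-node chain into `connectedTo`, and then composes `hasPartTransitive / connectedTo / isPartTransitive`.

```python
from collections import defaultdict


def attach(equipment, terminal, node):
    """Emit the four CIM triples (both directions) tying equipment to a node."""
    return [
        (equipment, "cim:ConductingEquipment.Terminals", terminal),
        (terminal, "cim:Terminal.ConductingEquipment", equipment),
        (terminal, "cim:Terminal.ConnectivityNode", node),
        (node, "cim:ConnectivityNode.Terminals", terminal),
    ]


# Hypothetical toy data: two substations joined by one line segment.
TRIPLES = [
    ("sub:A", "cim:Substation.VoltageLevels", "vl:A1"),
    ("vl:A1", "cim:VoltageLevel.Bays", "bay:A1"),
    ("bay:A1", "cim:EquipmentContainer.Equipments", "brk:A"),
    ("sub:B", "cim:Substation.VoltageLevels", "vl:B1"),
    ("vl:B1", "cim:EquipmentContainer.Equipments", "brk:B"),
    ("line:L1", "cim:EquipmentContainer.Equipments", "seg:L1"),
]
TRIPLES += attach("brk:A", "t:1", "cn:1") + attach("seg:L1", "t:2", "cn:1")
TRIPLES += attach("seg:L1", "t:3", "cn:2") + attach("brk:B", "t:4", "cn:2")


def pairs(triples, *props):
    """Subject/object pairs of all triples whose predicate is one of props."""
    return {(s, o) for s, p, o in triples if p in props}


def inverse(rel):
    return {(o, s) for s, o in rel}


def compose(r1, r2):
    """Relational composition: (x, z) whenever (x, y) in r1 and (y, z) in r2."""
    by_subject = defaultdict(set)
    for y, z in r2:
        by_subject[y].add(z)
    return {(x, z) for x, y in r1 for z in by_subject[y]}


def transitive_closure(rel):
    closure = set(rel)
    while True:
        new = compose(closure, rel) - closure
        if not new:
            return closure
        closure |= new


# cimr:hasPart collapses the three containment alternatives.
has_part = pairs(
    TRIPLES,
    "cim:EquipmentContainer.Equipments",
    "cim:Substation.VoltageLevels",
    "cim:VoltageLevel.Bays",
)
has_part_transitive = transitive_closure(has_part)
is_part_transitive = inverse(has_part_transitive)

# cimr:connectedTo is the four-step chain through a shared connectivity node.
connected_to = pairs(TRIPLES, "cim:ConductingEquipment.Terminals")
for prop in (
    "cim:Terminal.ConnectivityNode",
    "cim:ConnectivityNode.Terminals",
    "cim:Terminal.ConductingEquipment",
):
    connected_to = compose(connected_to, pairs(TRIPLES, prop))

# cimr:connectedThroughPart = hasPartTransitive / connectedTo / isPartTransitive
connected_through_part = compose(
    compose(has_part_transitive, connected_to), is_part_transitive
)

# Substations that share a line, excluding a substation paired with itself.
SUBSTATIONS = {"sub:A", "sub:B"}
LINES = {"line:L1"}
neighbours = sorted(
    (s1, line, s2)
    for s1, line in connected_through_part
    for s2, line2 in connected_through_part
    if line == line2 and line in LINES
    and s1 in SUBSTATIONS and s2 in SUBSTATIONS and s1 != s2
)
print(neighbours)
```

In this toy data every piece of equipment is also `connectedTo` itself (the chain can return through the same terminal), so a substation is `connectedThroughPart` to itself and to its own containers. That is why a query over the derived props still needs type restrictions and a `?sub1 != ?sub2` filter.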
Aside: ASHRAE/Brick Connections
If you think CIM has a complex connection model, consider the ASHRAE 223P standard, which has an even more elaborate one. It is referenced by the Brick Schema, which has a simpler model. ASHRAE 223P and Brick are used in Building Management Systems to describe producers (eg a heater), consumers (eg a radiator), flows, sensors, actuators, and the connections between them.
cnx is the basic asserted (symmetric) relation, and all relations in the following figure can be inferred from it:
ASHRAE inference is implemented using SHACL Rules (Triple and SPARQL rules) as discussed in data-shapes#343. How this could be implemented efficiently is discussed in data-shapes#347.
Inference
Inst4CIM-KG section Reasoning discusses what reasoning is appropriate with CIM.
- CIM defines rdfs:subClassOf reasoning and SHACL should rely on it.
- CIM defines owl:inverseOf (all CIM relations have inverses) but doesn't rely on it.
Here we enable the above and add:
- rdfs:subPropertyOf: needed for cimr:hasPart, cimr:isPart
- owl:TransitiveProperty: needed for cimr:hasPartTransitive, cimr:isPartTransitive
- owl:propertyChainAxiom: needed for cimr:connectedTo, cimr:connectedThroughPart
- owl:SymmetricProperty: cimr:connectedTo, cimr:connectedThroughPart are declared symmetric, but we don't need this reasoning since the respective property chains are already symmetric.
We use a custom ruleset owl2-rl-optimized (see GDB doc), which:
- is optimised to remove tautologies (see #93)
- uses a more efficient transitiveOver rule (see here)
- can be optimized further by using more custom rules (e.g. fixed-arity property chains, instead of chains represented with rdf:List).
This query can be used to load and use it directly.
PREFIX sys: <http://www.ontotext.com/owlim/system#>
INSERT DATA {
<_:cim-owl-rl-optimised> sys:addRuleset <https://raw.githubusercontent.com/statnett/Talk2PowerSystem/refs/heads/main/cim_owl2-rl-optimized.pie> .
[] sys:defaultRuleset "cim-owl-rl-optimised".
[] sys:reinfer [].
}
Discussion and Prior Examples
Statnett:
- We ended up making shortcuts in the model to simplify the query, but we can't alter the model every time we have a new use case. Or can we, and still keep it manageable?
- Both containment relations and connectivity are very commonly used. What we should keep in mind is that there are several different hierarchies that could be used, depending on who's using the model. A simple non-electrical example: for some users substations are part of geographical regions, for others they are primarily considered part of a bidding zone, while a third set of users primarily talk about who owns (or operates) the substation.
- I'm not sure whether any of our internal models are doing this, but the standard encourages Line objects that attach to more than two substations in the case of switchless junctions. See figures 9 and 10 in IEC 61970-301:2020 for an example of this.
We need some structure, principles, or "theory" of what inferred props to create. 10y ago I worked with CIDOC CRM, a complex ontology for heritage/archeology/history. It captures large and sprawling graphs of situations and attribution. The theory of what shortcuts to make was called "Fundamental Relations". Eg one FR is "thing is From place", which in CRM could mean:
- thing was made in place
- part of thing was made in subplace of place
- thing was made by person born in place
- thing was made by person who flourished (worked) in place
- thing was made for important event that happened at place (eg "Vatican tiara")
Now add subprops and recursive loops at various spots, and you'll quickly see how "From" collapses a whole bunch of possible "situation subgraphs" into one easy to use relation.
References:
- Large-scale Reasoning with a Complex Cultural Heritage Ontology (CIDOC CRM). Alexiev, V.; Manov, D.; Parvanova, J.; and Petrov, S. In Workshop Practical Experiences with CIDOC CRM and its Extensions (CRMEX 2013) at TPDL 2013, volume 1117, Valetta, Malta, September 2013. CEUR WS Paper slides preprint
- Implementing CIDOC CRM Search Based on Fundamental Relations and OWLIM Rules. Alexiev, V. In Workshop on Semantic Digital Archives (SDA 2012), part of International Conference on Theory and Practice of Digital Libraries (TPDL 2012), volume 912, Paphos, Cyprus, September 2012. CEUR WS Paper slides published
- FR Implementation (in an old Confluence, so don't mind the security warning). In particular, I extracted the rules from the page and implemented an expander from a shorthand form
p <ptop:transitiveOver> q; x p y; y q z => x p z
to GraphDB Rules (.pie) notation
p <ptop:transitiveOver> q
x p y
y q z
--------
x p z
- FR Dependency Graph of relation dependencies (of course derived from the page!). It shows me that I don't have dependency loops, and have not mistyped a relation (no disconnected parts)
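The shorthand-to-rule expansion mentioned above can be sketched in a few lines of Python. This is not the original expander, only an illustration under the assumption that premises and conclusions are separated by `;` and `=>` and that no `;` occurs inside a term. The function name `expand_rule` is hypothetical, and the sketch emits only the rule body, not the rule Id or the surrounding sections that a complete .pie file also needs.

```python
def expand_rule(shorthand: str, separator: str = "--------") -> str:
    """Expand 'premise; premise => conclusion' into one triple pattern per line.

    Assumes ';' separates patterns and '=>' appears exactly once.
    """
    if shorthand.count("=>") != 1:
        raise ValueError("expected exactly one '=>' in the shorthand rule")
    premises_part, conclusions_part = shorthand.split("=>")
    premises = [p.strip() for p in premises_part.split(";") if p.strip()]
    conclusions = [c.strip() for c in conclusions_part.split(";") if c.strip()]
    if not premises or not conclusions:
        raise ValueError("a rule needs at least one premise and one conclusion")
    # Premises above the separator line, conclusions below it.
    return "\n".join(premises + [separator] + conclusions)


print(expand_rule("p <ptop:transitiveOver> q; x p y; y q z => x p z"))
```

Keeping the shorthand on one line makes it easy to maintain many rules in a wiki page or spreadsheet and to generate the rules file from them.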
What I learned since is that it's better to use more generic rule structures; and push the domain-specifics into axioms. This gives various ideas how to use specialized rule constructs while keeping domain-specific stuff in axioms (not to overload the rules file with domain-specific terminology):
- Extending OWL2 Property Constructs with OWLIM Rules. Alexiev, V. Technical Report Ontotext Corp, September 2014.
The recent ASHRAE 223P standard has a lot of connectivity stuff (about buildings) that we can use as inspiration.
Implementing Reasoning
Load the cimex.ttl ontology.
Since initially we had the default GraphDB reasoning (rdfsplus-optimized), we use dynamic rule switching:
prefix sys: <http://www.ontotext.com/owlim/system#>
insert data {
[] sys:addRuleset "owl2-rl-optimized".
[] sys:defaultRuleset "owl2-rl-optimized".
[] sys:reinfer [].
}
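If the switch needs to be scripted rather than pasted into the Workbench, the same update can be sent over the RDF4J-style REST protocol that GraphDB exposes (a form-encoded `update` parameter POSTed to the repository's `statements` endpoint). The sketch below only builds the request. The base URL `http://localhost:7200` and the repository id `cim` are assumptions for illustration, and authentication is left out.

```python
from urllib.parse import urlencode
from urllib.request import Request

RULESET_UPDATE = """\
prefix sys: <http://www.ontotext.com/owlim/system#>
insert data {
  [] sys:addRuleset "owl2-rl-optimized".
  [] sys:defaultRuleset "owl2-rl-optimized".
  [] sys:reinfer [].
}
"""


def build_update_request(base_url: str, repository: str, update: str) -> Request:
    """Build (but do not send) a SPARQL update request for one repository."""
    url = f"{base_url.rstrip('/')}/repositories/{repository}/statements"
    body = urlencode({"update": update}).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Hypothetical server and repository id; adjust to your installation.
request = build_update_request("http://localhost:7200", "cim", RULESET_UPDATE)
print(request.get_method(), request.full_url)
# To execute it against a running server with write access:
#   import urllib.request; urllib.request.urlopen(request)
```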
Check that the correct ruleset is activated:
prefix sys: <http://www.ontotext.com/owlim/system#>
SELECT ?state ?ruleset {
?state sys:listRulesets ?ruleset
}
We now check how much that increases inferred props (expansion ratio). Hover over the repo name (top-right corner) and check the triple counts:
- The basic CIM ontologies use RDFS subclass, OWL inverse; CIM requires subclass but not inverse; but we had both.
  - triples: 63.5k explicit, 17.6k inferred, 81.2k total: 1.28x expansion
- cimex uses RDFS subclass and subproperty, OWL inverse, transitive, propertyChainAxiom (owl2-rl-optimized reasoning).
  - triples: 63.5k explicit, 60.4k inferred, 124k total: 1.95x expansion
Typical expansion ratios are 1.15-1.20, but 1.95 is not too much. Doubling the KG size will add only a few percent to querying times. Nevertheless, we could think about tightening up the derived subproperties to exclude some inferences that we don't need.
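The expansion ratio is simply total triples divided by explicit triples. A small Python check, using the rounded counts quoted above (so the last digit can differ slightly from what the Workbench shows):

```python
def expansion_ratio(explicit: float, inferred: float) -> float:
    """Total triples (explicit + inferred) divided by explicit triples."""
    return (explicit + inferred) / explicit


# Rounded counts from the triple-count tooltip, in triples.
baseline = expansion_ratio(63_500, 17_600)    # RDFS subclass + OWL inverse
with_cimex = expansion_ratio(63_500, 60_400)  # plus cimex derived props

print(f"{baseline:.2f}x {with_cimex:.2f}x")  # prints "1.28x 1.95x"
```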
Simplified Query
After adding cimr: derived props, the query becomes much simpler (saved query Q3-simple):
PREFIX cimr: <https://cim.ucaiug.io/rules#>
PREFIX cim: <http://iec.ch/TC57/2013/CIM-schema-cim16#>
PREFIX sesame: <http://www.openrdf.org/schema/sesame#>
select ?sub1Name ?lineName ?sub2Name {
values ?sub1Name {"ARENDAL"}
?sub1 a cim:Substation; cim:IdentifiedObject.name ?sub1Name;
cimr:connectedThroughPart ?line.
?line a cim:Line; cim:IdentifiedObject.name ?lineName.
?sub2 a cim:Substation; cim:IdentifiedObject.name ?sub2Name;
cimr:connectedThroughPart ?line.
filter(?sub1 != ?sub2)
}