Inference - statnett/Talk2PowerSystem GitHub Wiki

Tasks

  • #16 (DONE) consider derived (shortcut) props
  • #20 cimex ontology: decide prefix and prop names
  • #156 CIM needs subproperties
  • #93 (DONE) Removing Tautologies
  • #270 (DONE) define terms used in reasoning
  • #218 (DONE) Pare down inference

Introduction

Reasoning simplifies working with the CIM knowledge graph by materializing relationships that are not explicit in the data. They are derived from semantic constructs such as subclass, inverse, subproperty, transitive properties, and property chains. This enables much simpler SPARQL queries and allows LLMs and humans to work with higher-level "shortcut" properties (such as cimr:connectedThroughPart) instead of navigating complex property paths.

For a detailed explanation of the motivation and practical examples, read the blog post Using Semantic Reasoning to Help LLM with SPARQL Generation in Electrical CIM.

CIM Reasoning

The Reasoning section of Inst4CIM-KG discusses which kinds of reasoning are appropriate for CIM.

  • CIM defines rdfs:subClassOf reasoning and SHACL should rely on it.
  • CIM defines owl:inverseOf (all CIM relations have inverses) but doesn't rely on it.

Here we enable the above and add:

  • rdfs:subPropertyOf: needed for cimr:hasPart, cimr:isPart
  • owl:TransitiveProperty: needed for cimr:hasPartTransitive, cimr:isPartTransitive
  • owl:propertyChainAxiom: needed for cimr:connectedTo, cimr:connectedThroughPart
  • owl:SymmetricProperty: cimr:connectedTo, cimr:connectedThroughPart are declared symmetric, but we don't need this reasoning since the respective property chains are already symmetric.
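
As a rough illustration of why the symmetric-property rule is redundant here, the sketch below materializes inverses and then a property chain over a toy graph. The property names (`terminalOf`, `atNode`, `hasTerminal`, `nodeOf`) and the data are hypothetical and do not reproduce the actual chains in cimr.ttl. The point is only that a palindromic chain (a path followed by its own inverse) always yields a symmetric relation.

```python
from itertools import product

# Toy explicit triples; all names are hypothetical.
triples = {
    ("t1", "terminalOf", "breaker1"),
    ("t2", "terminalOf", "line1"),
    ("t1", "atNode", "node1"),
    ("t2", "atNode", "node1"),
}

def with_inverses(triples, inverse_of):
    """Materialize owl:inverseOf: for each (s, p, o) add (o, inverse(p), s)."""
    out = set(triples)
    for s, p, o in triples:
        if p in inverse_of:
            out.add((o, inverse_of[p], s))
    return out

def apply_chain(triples, chain, derived):
    """Materialize a fixed-length chain: x chain[0] ... chain[-1] z => x derived z."""
    # frontier holds (start, current) pairs reachable via the chain prefix so far
    frontier = {(s, o) for s, p, o in triples if p == chain[0]}
    for prop in chain[1:]:
        step = {(s, o) for s, p, o in triples if p == prop}
        frontier = {(x, z) for (x, y), (y2, z) in product(frontier, step) if y == y2}
    # Drop self-links for readability (a plain property chain would also infer them).
    return {(x, derived, z) for x, z in frontier if x != z}

inverse_of = {"terminalOf": "hasTerminal", "atNode": "nodeOf"}
graph = with_inverses(triples, inverse_of)

# Palindromic chain: equipment -> terminal -> node -> terminal -> equipment
chain = ["hasTerminal", "atNode", "nodeOf", "terminalOf"]
connected = apply_chain(graph, chain, "connectedTo")
print(sorted(connected))
```

Both directions of the derived link come out of the chain itself, so no separate symmetry rule is needed in this toy case.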

Implementation

Load the cimr.ttl ontology.

We use a custom ruleset, cim.pie. It is a minimal ruleset that contains only the rules we need (see GDB doc). It:

  • is optimized to remove tautologies (see #93)
  • uses a more efficient transitiveOver rule (see here)
  • uses fixed-arity property chains instead of chains represented with rdf:List (see #218)

The update 01-add-inference.ru can be used to load the ruleset and activate it directly:

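PREFIX sys: <http://www.ontotext.com/owlim/system#>
INSERT DATA {
    # Register the ruleset from cim.pie under the name "cim"
    <_:cim> sys:addRuleset <https://raw.githubusercontent.com/statnett/Talk2PowerSystem/refs/heads/main/data/cim.pie> .
    # Make "cim" the default ruleset
    [] sys:defaultRuleset "cim" .
    # Recompute the inferred statements with the new ruleset
    [] sys:reinfer [] .
}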
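
One way to run this update from a script is through the repository's RDF4J-style `statements` endpoint. The sketch below only builds the HTTP request and does not send it. The base URL `http://localhost:7200` and the repository id `t2ps` are placeholders for illustration, not values taken from this project.

```python
import urllib.request

RULESET_URL = (
    "https://raw.githubusercontent.com/statnett/Talk2PowerSystem/"
    "refs/heads/main/data/cim.pie"
)

UPDATE = f"""
PREFIX sys: <http://www.ontotext.com/owlim/system#>
INSERT DATA {{
    <_:cim> sys:addRuleset <{RULESET_URL}> .
    [] sys:defaultRuleset "cim" .
    [] sys:reinfer [] .
}}
"""

def build_update_request(base_url: str, repository: str, update: str) -> urllib.request.Request:
    """Build a SPARQL UPDATE request (POST to /repositories/<id>/statements)."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/repositories/{repository}/statements",
        data=update.encode("utf-8"),
        headers={"Content-Type": "application/sparql-update"},
        method="POST",
    )

# Placeholder endpoint and repository id (assumptions for illustration).
request = build_update_request("http://localhost:7200", "t2ps", UPDATE)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```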

Check that the correct ruleset is activated:

prefix sys: <http://www.ontotext.com/owlim/system#>
SELECT ?state ?ruleset {
    ?state sys:listRulesets ?ruleset
}

Custom Ruleset Reductions

Using a custom ruleset allows us to pare down inference compared to a standard ruleset. Statistics before and after the change:

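| ruleset             | explicit | inferred | expansion | rdf:type | rdfs:domain | rdfs:range |
|---------------------|----------|----------|-----------|----------|-------------|------------|
| OWL2-RL-optimized   | 122,948  | 199,932  | 2.63      | 38,568   | 18,875      | 11,916     |
| Custom (pared-down) | 122,906  | 155,887  | 2.27      | 32,163   | 5,246       | 5,233      |
| Reduction           |          | 22%      |           | 16%      | 72%         | 54%        |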
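
The expansion column appears to be the ratio of total (explicit plus inferred) statements to explicit statements. This reading is inferred from the numbers rather than stated in the source; a quick check under that assumption:

```python
# Statement counts copied from the statistics above.
stats = {
    "OWL2-RL-optimized": {"explicit": 122_948, "inferred": 199_932},
    "Custom (pared-down)": {"explicit": 122_906, "inferred": 155_887},
}

def expansion(explicit: int, inferred: int) -> float:
    """Total statements divided by explicit statements (assumed definition)."""
    return (explicit + inferred) / explicit

for name, counts in stats.items():
    print(f"{name}: expansion {expansion(**counts):.2f}")

before = stats["OWL2-RL-optimized"]["inferred"]
after = stats["Custom (pared-down)"]["inferred"]
print(f"inferred reduction: {(before - after) / before:.0%}")
```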

Eliminate Useless Domain/Range Reasoning

Most importantly, we have eliminated inferred domain/range statements. These are counter-intuitive and useless. Take for example this query that looks for properties related to cim:Switch, i.e. having this class as domain or range:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX cim: <https://cim.ucaiug.io/ns#>
select ?x ?p ?y {
  {?p rdfs:domain cim:Switch; rdfs:range ?y}
  union {?p rdfs:domain ?x; rdfs:range cim:Switch}
}
  • Without reasoning, it returns only props that are directly related to Switch. You can see that from the prop name:
    • Attributes of switch (Switch.<lowercase>)
    • Outgoing relations of switch (Switch.<Uppercase>)
    • Incoming relations of switch (<Uppercase>.Switch)

| x                        | p                                   | y                        |
|--------------------------|-------------------------------------|--------------------------|
|                          | cim:Switch.SwitchSchedules          | cim:SwitchSchedule       |
|                          | cim:Switch.normalOpen               | xsd:boolean              |
|                          | cim:Switch.ratedCurrent             | xsd:float                |
|                          | cim:Switch.retained                 | xsd:boolean              |
|                          | cim:Switch.SvSwitch                 | cim:SvSwitch             |
|                          | cim:Switch.locked                   | xsd:boolean              |
|                          | cim:Switch.open                     | xsd:boolean              |
|                          | nc:Switch.TopologyAction            | nc:TopologyAction        |
|                          | nc:Switch.SwitchRegularSchedule     | nc:SwitchRegularSchedule |
|                          | nc:Switch.SwitchSchedule            | nc:SwitchSchedule        |
| cim:SwitchSchedule       | cim:SwitchSchedule.Switch           |                          |
| cim:SvSwitch             | cim:SvSwitch.Switch                 |                          |
| nc:TopologyAction        | nc:TopologyAction.Switch            |                          |
| nc:SwitchRegularSchedule | nc:SwitchRegularSchedule.Switch     |                          |
| nc:SwitchSchedule        | nc:SwitchSchedule.Switch            |                          |

  • With reasoning, the query also returns all superclasses. For example:
    • SwitchSchedule.Switch has domain not only SwitchSchedule but also all its superclasses: IdentifiedObject, BasicIntervalSchedule, SeasonDayTypeSchedule, RegularIntervalSchedule
    • Switch.open has domain not only Switch but also all its superclasses: IdentifiedObject, PowerSystemResource, Equipment, ConductingEquipment

This is counter-intuitive and useless.

  • It would be useful if the domain of Switch.open included all its subclasses: Breaker, Cut, DisconnectingCircuitBreaker, Disconnector, Fuse, GroundDisconnector, Jumper, LoadBreakSwitch, ProtectedSwitch
  • But this is not how rdfs:domain "inheritance" is defined in RDFS.

We suspect that RDFS got type variance wrong (contravariance vs covariance).
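
To make the difference concrete, the sketch below contrasts the two directions on a small slice of the class hierarchy. The first function mimics the inference that propagates rdfs:domain up to superclasses. The second computes what a user typically wants: the classes whose instances can carry the property, that is, the declared domain and its subclasses. The function names are ours, the hierarchy is abbreviated, and the direct-superclass links are our reading of the CIM class hierarchy.

```python
# Direct superclass of each class (abbreviated slice of the CIM hierarchy).
SUPERCLASS = {
    "Breaker": "ProtectedSwitch",
    "ProtectedSwitch": "Switch",
    "Switch": "ConductingEquipment",
    "ConductingEquipment": "Equipment",
    "Equipment": "PowerSystemResource",
    "PowerSystemResource": "IdentifiedObject",
}

def ancestors(cls):
    """All superclasses of cls, nearest first."""
    out = []
    while cls in SUPERCLASS:
        cls = SUPERCLASS[cls]
        out.append(cls)
    return out

def inferred_domains(declared):
    """Domain propagated upward: p domain C, C subClassOf D => p domain D."""
    return [declared] + ancestors(declared)

def applicable_classes(declared):
    """Classes whose instances may carry the property: the declared domain and its subclasses."""
    all_classes = set(SUPERCLASS) | set(SUPERCLASS.values())
    return sorted(c for c in all_classes if c == declared or declared in ancestors(c))

print(inferred_domains("Switch"))    # what domain reasoning adds (superclasses)
print(applicable_classes("Switch"))  # what a query author usually wants (subclasses)
```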

Discussion and Prior Examples

Statnett:

  • We ended up making shortcuts in the model to simplify the query, but we can't alter the model every time we have a new use case. Or can we? And still keep it manageable?
  • Both containment relations and connectivity are very commonly used. One thing to keep in mind is that there are several different hierarchies that could be used, depending on who's using the model. A simple non-electrical example: for some users substations are part of geographical regions, but for others they are primarily considered part of a bidding zone, while a third set of users primarily talk about who owns (or operates) the substation.
  • I'm not sure whether any of our internal models are doing this, but the standard encourages Line objects that attach to more than two substations in the case of switchless junctions. See figure 9 and 10 in IEC 61970-301:2020 for an example of this.

We need some structure, principles, or "theory" for deciding which inferred props to create. Ten years ago I worked with a complex ontology in heritage/archeology/history called CIDOC CRM. It captures large and sprawling graphs of situations and attribution. The theory of what shortcuts to make was called "Fundamental Relations". Eg one FR is "thing is From place", which in CRM could mean:

  • thing was made in place
  • part of thing was made in subplace of place
  • thing was made by person born in place
  • thing was made by person who flourished (worked) in place
  • thing was made for important event that happened at place (eg "Vatican tiara")

Now add subprops and recursive loops at various spots, and you'll quickly see how "From" collapses a whole bunch of possible "situation subgraphs" into one easy to use relation.

References:

The following rule, written in a compact notation:

p <ptop:transitiveOver> q; x p y; y q z => x p z

translates to GraphDB Rules (.pie) notation as:

p <ptop:transitiveOver> q
x p y
y q z
--------
x p z
  • FR Dependency Graph of relation dependencies (of course derived from the text!). It shows me that I don't have dependency loops, and have not mistyped a relation (no disconnected parts)
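
A minimal sketch of the transitiveOver rule above, implemented as naive forward chaining to a fixpoint. The property names and data are hypothetical, and a real engine such as GraphDB evaluates rules far more efficiently than this loop.

```python
def apply_transitive_over(triples, axioms):
    """Apply `p transitiveOver q; x p y; y q z => x p z` until nothing new is inferred.

    triples: set of (subject, predicate, object)
    axioms:  set of (p, q) pairs meaning "p transitiveOver q"
    """
    closure = set(triples)
    changed = True
    while changed:
        changed = False
        for p, q in axioms:
            new = {
                (x, p, z)
                for (x, p1, y) in closure if p1 == p
                for (y2, q1, z) in closure if q1 == q and y2 == y
            }
            if not new <= closure:
                closure |= new
                changed = True
    return closure

# Hypothetical data: "locatedIn" is declared transitive over "partOf".
triples = {
    ("bay1", "locatedIn", "substation1"),
    ("substation1", "partOf", "region1"),
    ("region1", "partOf", "country1"),
}
axioms = {("locatedIn", "partOf")}

inferred = apply_transitive_over(triples, axioms) - triples
print(sorted(inferred))
```

Declaring a property transitive over itself, i.e. the axiom `(p, p)`, gives ordinary transitivity, which is one reason the domain-specific part can live in axioms while the rule stays generic.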

What I have learned since is that it's better to use more generic rule structures and push the domain specifics into axioms. This suggests various ways to use specialized rule constructs while keeping domain-specific terms in axioms, so that the rules file is not overloaded with domain-specific terminology.

Aside: ASHRAE/Brick Connections

The recent ASHRAE 223P standard has a lot of connectivity modeling (for buildings) that we can use as inspiration.

If you think CIM has a complex connection model, consider ASHRAE 223P, which has an even more elaborate one. The same concepts are referenced in the Brick Schema, which has a simpler model. ASHRAE 223P and Brick are used in Building Management Systems to describe producers (eg a heater), consumers (eg a radiator), flows, sensors, actuators, and the connections between them.

cnx is the basic asserted (symmetric) relation, and all relations on the following figure can be inferred from it:

ASHRAE inference is implemented using SHACL Rules (Triple and SPARQL rules) as discussed in data-shapes#343. How this could be implemented efficiently is discussed in data-shapes#347.
