Inference - statnett/Talk2PowerSystem GitHub Wiki

Tasks

#16 (DONE) consider derived (shortcut) props
#20 cimex ontology: decide prefix and prop names
#156 CIM needs subproperties
#93 (DONE) Removing Tautologies

Complex Query

Consider this Demo1 question:

Q1.3 Substation Connectivity "List all substations that are connected via an AC-line or a DC line to substation named XYZ".

The query for it (saved query Q3 at https://cim.ontotext.com/graphdb, no login required) looks like this:

PREFIX cim: <https://cim.ucaiug.io/ns#>
PREFIX sesame: <http://www.openrdf.org/schema/sesame#>
select ?sub1Name ?lineName ?sub2Name {
  {select distinct * {
    values ?sub1Name {"ARENDAL"}
    ?sub1 a cim:Substation;
      cim:IdentifiedObject.name ?sub1Name;
      (cim:EquipmentContainer.Equipments|cim:Substation.VoltageLevels|cim:VoltageLevel.Bays)+ / # equipment in ?sub1
      cim:ConductingEquipment.Terminals / cim:Terminal.ConnectivityNode / cim:ConnectivityNode.Terminals / cim:Terminal.ConductingEquipment / # connected to segment
      cim:Equipment.EquipmentContainer ?line. # part of ?line
    ?line a cim:Line; cim:IdentifiedObject.name ?lineName}}
  {select distinct * {
    ?sub2 a cim:Substation;
      cim:IdentifiedObject.name ?sub2Name;
      (cim:EquipmentContainer.Equipments|cim:Substation.VoltageLevels|cim:VoltageLevel.Bays)+ / # equipment in ?sub2
      cim:ConductingEquipment.Terminals / cim:Terminal.ConnectivityNode / cim:ConnectivityNode.Terminals / cim:Terminal.ConductingEquipment / # connected to segment
      cim:Equipment.EquipmentContainer ?line}}
  filter(?sub1 != ?sub2)
}

This query is correct and fast (it takes 0.1s), but it's too complicated to be generated by an LLM (a polite way of saying FUBAR). Explanation:

The first subquery finds equipment in substation ?sub1 that is connected to a segment of a ?line
Because CIM doesn't have subproperties, we need these alternatives to find all equipment nested in subcontainers: cim:EquipmentContainer.Equipments|cim:Substation.VoltageLevels|cim:VoltageLevel.Bays
The second subquery finds another substation ?sub2 that is connected in the same way to ?line
The filter must be outside both subqueries because ?sub1 is bound only in the first subquery and ?sub2 only in the second

LLM Explanation and Simplification

I didn't dare ask an LLM to generate the query, but an LLM (GPT o1) is very good at explaining it:

Q: what does this query do?
- Highlighting the query's actions: The SPARQL query starts by identifying sub1 by the name "ARENDAL" and traces through its related equipment. Then, it finds lines connected to sub1, storing the line name. A similar process follows for sub2.
  - Synthesizing line connections: The query pairs sub1 and sub2 sharing the same line. It starts with "ARENDAL," finding all substations connected to lines associated with "ARENDAL" and then makes pairwise connections.
- A: This query finds all transmission lines that connect the substation named "ARENDAL" to any other substation, returning the names of “ARENDAL”, the shared line, and the second substation.

I also asked it to simplify:

Q: is it complicated? can you simplify it?
Reasoned for 15 seconds: Breaking down the query, Mapping connections, Simplifying the SPARQL query, Streamlining the query, Mapping connections, Refining the SPARQL query, Enhancing query clarity

PREFIX cim: <https://cim.ucaiug.io/ns#>
SELECT DISTINCT ?sub1Name ?lineName ?sub2Name
WHERE {
  ?sub1 a cim:Substation ;
        cim:IdentifiedObject.name "ARENDAL" .
  ?sub2 a cim:Substation ;
        cim:IdentifiedObject.name ?sub2Name .

  ?sub1 (cim:EquipmentContainer.Equipments|
         cim:Substation.VoltageLevels|
         cim:VoltageLevel.Bays)+ /
         cim:ConductingEquipment.Terminals /
         cim:Terminal.ConnectivityNode /
         cim:ConnectivityNode.Terminals /
         cim:Terminal.ConductingEquipment /
         cim:Equipment.EquipmentContainer ?line .

  ?sub2 (cim:EquipmentContainer.Equipments|
         cim:Substation.VoltageLevels|
         cim:VoltageLevel.Bays)+ /
         cim:ConductingEquipment.Terminals /
         cim:Terminal.ConnectivityNode /
         cim:ConnectivityNode.Terminals /
         cim:Terminal.ConductingEquipment /
         cim:Equipment.EquipmentContainer ?line .

  ?line a cim:Line ;
        cim:IdentifiedObject.name ?lineName .

  FILTER(?sub1 != ?sub2)

  BIND("ARENDAL" as ?sub1Name)
}

OK, this is easier to read, but not easier to understand or generate, and it's a lot less efficient (0.8s vs 0.1s) because it applies DISTINCT only at the end.
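
To illustrate why the placement of DISTINCT matters, here is a toy Python sketch (not a GraphDB measurement). It assumes that many equipment/terminal paths lead from the same substation to the same line, so each subquery produces duplicate rows; the names "OTHER" and "line-A" and the row counts are made up.

```python
from itertools import product

# Toy bindings: each (substation, line) pair repeats because several
# equipment/terminal paths lead from the same substation to the same line.
sub1_rows = [("ARENDAL", "line-A")] * 4
sub2_rows = [("ARENDAL", "line-A")] * 4 + [("OTHER", "line-A")] * 5


def join_on_line(left, right):
    """Join two binding lists on the shared line, keeping distinct substations."""
    return [
        (s1, line1, s2)
        for (s1, line1), (s2, line2) in product(left, right)
        if line1 == line2 and s1 != s2
    ]


# DISTINCT only at the end: join the raw rows, deduplicate afterwards.
late = join_on_line(sub1_rows, sub2_rows)
# DISTINCT inside each subquery: deduplicate first, then join.
early = join_on_line(sorted(set(sub1_rows)), sorted(set(sub2_rows)))

print(len(late), len(early))    # 20 1
print(set(late) == set(early))  # True
```

Both variants give the same answer, but the late variant materialises 20 joined rows where the early one needs a single row. The real difference depends on the query planner and the data.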

Derived Props

The query can be simplified a lot if we add derived props. Assume a namespace cimr: (CIM Extensions):

cim:EquipmentContainer.Equipments|cim:Substation.VoltageLevels|cim:VoltageLevel.Bays -> cimr:hasPart; inverse cimr:isPart
cim:Terminal.ConductingEquipment|cim:Terminal.AuxiliaryEquipment -> cimr:Terminal.Equipment; inverse cimr:Equipment.Terminals
cimr:hasPart+ -> cimr:hasPartTransitive; inverse cimr:isPartTransitive
cim:ConductingEquipment.Terminals / cim:Terminal.ConnectivityNode / cim:ConnectivityNode.Terminals / cim:Terminal.ConductingEquipment -> cimr:connectedTo (symmetric)
cimr:hasPartTransitive / cimr:connectedTo / cimr:isPartTransitive -> cimr:connectedThroughPart (symmetric)
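
The derivations listed above can be sketched on a toy graph in plain Python. This is only an illustration of the semantics (union of props, transitive closure, property chain), not how GraphDB implements them. All instance names (S1, VL1, SEG, ...) are made up, and the inverse directions are computed in code instead of being asserted.

```python
# Toy model of the derived props. Instance names are hypothetical.
TRIPLES = {
    ("S1", "cim:Substation.VoltageLevels", "VL1"),
    ("VL1", "cim:VoltageLevel.Bays", "B1"),
    ("B1", "cim:EquipmentContainer.Equipments", "E1"),
    ("S2", "cim:Substation.VoltageLevels", "VL2"),
    ("VL2", "cim:EquipmentContainer.Equipments", "E2"),
    ("L", "cim:EquipmentContainer.Equipments", "SEG"),
    ("E1", "cim:ConductingEquipment.Terminals", "T1"),
    ("SEG", "cim:ConductingEquipment.Terminals", "T2"),
    ("SEG", "cim:ConductingEquipment.Terminals", "T3"),
    ("E2", "cim:ConductingEquipment.Terminals", "T4"),
    ("T1", "cim:Terminal.ConnectivityNode", "CN1"),
    ("T2", "cim:Terminal.ConnectivityNode", "CN1"),
    ("T3", "cim:Terminal.ConnectivityNode", "CN2"),
    ("T4", "cim:Terminal.ConnectivityNode", "CN2"),
}
SUBSTATIONS, LINES = {"S1", "S2"}, {"L"}


def pairs(*props):
    """All (subject, object) pairs asserted with any of the given props."""
    return {(s, o) for s, p, o in TRIPLES if p in props}


def inverse(rel):
    return {(o, s) for s, o in rel}


def compose(*rels):
    """Property chain: join the relations left to right."""
    result = rels[0]
    for nxt in rels[1:]:
        result = {(a, c) for a, b in result for b2, c in nxt if b == b2}
    return result


def transitive(rel):
    """Transitive closure, computed by repeated composition."""
    closure = set(rel)
    while True:
        extra = compose(closure, rel) - closure
        if not extra:
            return closure
        closure |= extra


# cimr:hasPart is the union of the three containment props.
has_part = pairs("cim:EquipmentContainer.Equipments",
                 "cim:Substation.VoltageLevels",
                 "cim:VoltageLevel.Bays")
has_part_transitive = transitive(has_part)
is_part_transitive = inverse(has_part_transitive)

# cimr:connectedTo: equipment -> terminal -> node -> terminal -> equipment.
terminals = pairs("cim:ConductingEquipment.Terminals")
node_of = pairs("cim:Terminal.ConnectivityNode")
connected_to = compose(terminals, node_of, inverse(node_of), inverse(terminals))

connected_through_part = compose(has_part_transitive, connected_to,
                                 is_part_transitive)

# The Q1.3 question, asked for S1 on the toy data.
answer = sorted(
    (s1, line, s2)
    for s1, line in connected_through_part
    if s1 == "S1" and line in LINES
    for s2, line2 in connected_through_part
    if line2 == line and s2 in SUBSTATIONS and s2 != s1
)
print(answer)  # [('S1', 'L', 'S2')]
```

Note that the chain also yields reflexive pairs such as (S1, S1) and pairs for intermediate containers such as (VL1, L). That is one reason the query still needs the filter, and it contributes to the number of inferred triples.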

Aside: ASHRAE/Bricks Connections

If you think CIM has a complex connection model, consider the ASHRAE 223p standard, which has an even more elaborate one. It is also referenced by the Bricks Schema, which has a simpler model. ASHRAE 223p and Bricks are used in Building Management Systems to describe producers (eg a heater), consumers (eg a radiator), flows, sensors, actuators, and the connections between them.

cnx is the basic asserted (symmetric) relation, and all relations on the following figure can be inferred from it:

ASHRAE inference is implemented using SHACL Rules (Triple and SPARQL rules) as discussed in data-shapes#343. How this could be implemented efficiently is discussed in data-shapes#347.

Inference

The Reasoning section of Inst4CIM-KG discusses which reasoning is appropriate for CIM.

CIM defines rdfs:subClassOf reasoning and SHACL should rely on it.
CIM defines owl:inverseOf (all CIM relations have inverses) but doesn't rely on it.

Here we enable the above and add:

rdfs:subPropertyOf: needed for cimr:hasPart, cimr:isPart
owl:TransitiveProperty: needed for cimr:hasPartTransitive, cimr:isPartTransitive
owl:propertyChainAxiom: needed for cimr:connectedTo, cimr:connectedThroughPart
owl:SymmetricProperty: cimr:connectedTo, cimr:connectedThroughPart are declared symmetric, but we don't need this reasoning since the respective property chains are already symmetric.
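
The claim that the chains are already symmetric can be checked with a small Python sketch. It assumes, as stated above, that the second half of the chain uses the inverses of the first half (P / Q / inverse Q / inverse P). The relations are random and the names are hypothetical.

```python
import random


def inverse(rel):
    return {(o, s) for s, o in rel}


def compose(*rels):
    """Property chain: join the relations left to right."""
    result = rels[0]
    for nxt in rels[1:]:
        result = {(a, c) for a, b in result for b2, c in nxt if b == b2}
    return result


rng = random.Random(0)


def random_relation(sources, targets, size):
    return {(rng.choice(sources), rng.choice(targets)) for _ in range(size)}


equipment = [f"eq{i}" for i in range(6)]
terminals = [f"t{i}" for i in range(10)]
nodes = [f"cn{i}" for i in range(4)]

has_terminal = random_relation(equipment, terminals, 12)  # like ConductingEquipment.Terminals
has_node = random_relation(terminals, nodes, 10)          # like Terminal.ConnectivityNode

connected_to = compose(has_terminal, has_node,
                       inverse(has_node), inverse(has_terminal))

# Reversing the chain gives the same chain, so no symmetric rule is needed.
print(connected_to == inverse(connected_to))  # True
```

The same argument applies to cimr:connectedThroughPart, because cimr:isPartTransitive is the inverse of cimr:hasPartTransitive and cimr:connectedTo is symmetric.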

Implementing Reasoning

Load the cimr.ttl ontology.

We use a custom ruleset cim_owl2-rl-optimized (see GDB doc), which:

is optimised to remove tautologies (see #93)
uses a more efficient transitiveOver rule (see here)
can be optimized further by using more custom rules (e.g. fixed-arity property chains, instead of chains represented with rdf:List).

The following update adds the ruleset, makes it the default, and triggers reinference:

PREFIX sys: <http://www.ontotext.com/owlim/system#>
INSERT DATA {
    <_:cim-owl-rl-optimised> sys:addRuleset <https://raw.githubusercontent.com/statnett/Talk2PowerSystem/refs/heads/main/load/resources/cim_owl2-rl-optimized.pie> .
    [] sys:defaultRuleset "cim-owl-rl-optimised". 
    [] sys:reinfer [].
}

Check that the correct ruleset is activated:

prefix sys: <http://www.ontotext.com/owlim/system#>
SELECT ?state ?ruleset {
    ?state sys:listRulesets ?ruleset
}
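
If you prefer to script these steps instead of using the Workbench, here is a minimal Python sketch that builds the HTTP requests. It assumes the repository is exposed through the standard RDF4J REST endpoints under the URL mentioned earlier, and the repository id "my-repo" is hypothetical. The sketch only builds the requests and does not send them; an update will most likely need write access.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://cim.ontotext.com/graphdb"  # endpoint mentioned in this page
REPOSITORY = "my-repo"                         # hypothetical repository id

CHECK_RULESETS = """
PREFIX sys: <http://www.ontotext.com/owlim/system#>
SELECT ?state ?ruleset { ?state sys:listRulesets ?ruleset }
"""

FORM = "application/x-www-form-urlencoded"


def query_request(query):
    """SPARQL query: POST the 'query' form field to the repository URL."""
    return Request(
        f"{BASE_URL}/repositories/{REPOSITORY}",
        data=urlencode({"query": query}).encode("utf-8"),
        headers={"Content-Type": FORM,
                 "Accept": "application/sparql-results+json"},
        method="POST",
    )


def update_request(update):
    """SPARQL update: POST the 'update' form field to the statements URL."""
    return Request(
        f"{BASE_URL}/repositories/{REPOSITORY}/statements",
        data=urlencode({"update": update}).encode("utf-8"),
        headers={"Content-Type": FORM},
        method="POST",
    )


req = query_request(CHECK_RULESETS)
print(req.get_method(), req.full_url)
```

To actually run a request, pass it to urllib.request.urlopen and read the response.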

We now check how much that increases inferred props (expansion ratio). Hover over the repo name (top-right corner) and check the triple counts:

The basic CIM ontologies use RDFS subclass and OWL inverse. CIM requires subclass but not inverse, but we had both enabled.
- triples: 63.5k explicit, 17.6k inferred, 81.2k total: 1.28x expansion
cimr uses RDFS subclass and subproperty, OWL inverse, transitive, propertyChainAxiom (cim_owl2-rl-optimized reasoning).
- triples: 63.5k explicit, 60.4k inferred, 124k total: 1.95x expansion

Typical expansion ratios are 1.15-1.20, but 1.95 is not too much. Doubling the KG size will add only a few percent to querying times. Nevertheless, we could think about tightening up the derived subproperties to exclude some inferences that we don't need.
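
For reference, the expansion ratios above are simply total triples divided by explicit triples. A quick Python check using the rounded counts quoted above:

```python
def expansion_ratio(explicit, inferred):
    """Total triples (explicit + inferred) divided by explicit triples."""
    return (explicit + inferred) / explicit


EXPLICIT = 63_500          # rounded counts from the Workbench
INFERRED_BASE = 17_600     # subclass + inverse only
INFERRED_CIMR = 60_400     # with the cimr derived props

base = expansion_ratio(EXPLICIT, INFERRED_BASE)
cimr = expansion_ratio(EXPLICIT, INFERRED_CIMR)
extra = INFERRED_CIMR - INFERRED_BASE

print(f"{base:.2f} {cimr:.2f} {extra}")  # 1.28 1.95 42800
```

So the derived props account for roughly 42.8k additional inferred triples on this dataset.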

Simplified Query

After adding cimr: derived props, the query becomes much simpler (saved query Q3-simple):

PREFIX cimr: <https://cim.ucaiug.io/rules#>
PREFIX cim: <http://iec.ch/TC57/2013/CIM-schema-cim16#>
PREFIX sesame: <http://www.openrdf.org/schema/sesame#>
select ?sub1Name ?lineName ?sub2Name {
    values ?sub1Name {"ARENDAL"}
    ?sub1 a cim:Substation; cim:IdentifiedObject.name ?sub1Name;
      cimr:connectedThroughPart ?line.
    ?line a cim:Line; cim:IdentifiedObject.name ?lineName.
    ?sub2 a cim:Substation; cim:IdentifiedObject.name ?sub2Name;
      cimr:connectedThroughPart ?line.
    filter(?sub1 != ?sub2)
}

Discussion and Prior Examples

Statnett:

We ended up making shortcuts in the model to simplify the query, but we can't alter the model every time we have a new use case. Or can we, and still keep it manageable?
Both containment relations and connectivity are very commonly used. We should keep in mind that there are several different hierarchies that could be used, depending on who's using the model. A simple non-electrical example: for some users substations are part of geographical regions, but for others they are primarily considered part of a bidding zone, while a third set of users primarily talk about who owns (or operates) the substation.
I'm not sure whether any of our internal models are doing this, but the standard encourages Line objects that attach to more than two substations in the case of switchless junctions. See figures 9 and 10 in IEC 61970-301:2020 for an example of this.

We need some structure, principles or "theory" for deciding what inferred props to create. 10y ago I worked with a complex ontology in heritage/archeology/history called CIDOC CRM. It captures large and sprawling graphs of situations and attribution. The theory of what shortcuts to make was called "Fundamental Relations". Eg one FR is "thing is From place", which in CRM could mean:

thing was made in place
part of thing was made in subplace of place
thing was made by person born in place
thing was made by person who flourished (worked) in place
thing was made for important event that happened at place (eg "Vatican tiara")

Now add subprops and recursive loops at various spots, and you'll quickly see how "From" collapses a whole bunch of possible "situation subgraphs" into one easy to use relation.

References:

Large-scale Reasoning with a Complex Cultural Heritage Ontology (CIDOC CRM). Alexiev, V.; Manov, D.; Parvanova, J.; and Petrov, S. In Workshop Practical Experiences with CIDOC CRM and its Extensions (CRMEX 2013) at TPDL 2013, volume 1117, Valletta, Malta, September 2013. CEUR WS Paper slides preprint
Implementing CIDOC CRM Search Based on Fundamental Relations and OWLIM Rules. Alexiev, V. In Workshop on Semantic Digital Archives (SDA 2012), part of International Conference on Theory and Practice of Digital Libraries (TPDL 2012), volume 912, Paphos, Cyprus, September 2012. CEUR WS Paper slides published
FR Implementation (in an old Confluence, so don't mind the security warning). In particular I extracted the rules from the text, and implemented an expander from a shorthand form

p <ptop:transitiveOver> q; x p y; y q z => x p z

to GraphDB Rules (.pie) notation

p <ptop:transitiveOver> q
x p y
y q z
--------
x p z

FR Dependency Graph of relation dependencies (of course derived from the text!). It shows me that I don't have dependency loops, and have not mistyped a relation (no disconnected parts)
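
The shorthand expander mentioned above is not reproduced here, but the idea can be sketched in a few lines of Python. This is a hypothetical re-creation that only covers the notation shown above (premises separated by ";", then "=>", then the conclusion). A complete .pie file needs more than this, for instance rule identifiers and prefix declarations, which the sketch leaves out.

```python
def expand_rule(shorthand):
    """Turn 'premise; premise => conclusion' into the multi-line rule form."""
    body, arrow, head = shorthand.partition("=>")
    if not arrow:
        raise ValueError(f"missing '=>' in rule: {shorthand!r}")
    premises = [part.strip() for part in body.split(";") if part.strip()]
    conclusions = [part.strip() for part in head.split(";") if part.strip()]
    if not premises or not conclusions:
        raise ValueError(f"rule needs premises and a conclusion: {shorthand!r}")
    # Premises, a separator line of dashes, then the conclusion(s).
    return "\n".join(premises + ["-" * 8] + conclusions)


rule = "p <ptop:transitiveOver> q; x p y; y q z => x p z"
print(expand_rule(rule))
```

Running it on the transitiveOver shorthand prints the same five lines as the rule shown above.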

What I learned since is that it's better to use more generic rule structures; and push the domain-specifics into axioms. This gives various ideas how to use specialized rule constructs while keeping domain-specific stuff in axioms (not to overload the rules file with domain-specific terminology):

Extending OWL2 Property Constructs with OWLIM Rules. Alexiev, V. Technical Report Ontotext Corp, September 2014.

The recent ASHRAE 223P standard has a lot of connectivity stuff (about buildings) that we can use as inspiration.