Started in Talk2PowerSystem#17

Demo1 had 5 questions
For Demo2, we target 10 more questions

Terminology:

"original": competency question given by Statnett: shown as a quotation. We convert those to parameterized templates
"paraphrase": asking the same thing with a different surface form (or in a different language)
"variant": asking a similar but different thing, thus multiplying the number of templates
"TODO"

Q2.1 Count by Type

list all resource types in model (what is there?)

Paraphrases:

what data do you have?
what data is there?
count resources by type
- (that's not a pure paraphrase, more like a variant)
show how many resources are there by class

Variations:

List only part of the class hierarchy, eg "count PSRs by type" or "count identified objects by type"
Various ordering options: alphabetically, or by descending count
What are the top-level non-enumeration "interesting" classes?
What is the class tree starting at a given class?

Considerations:

Use sesame:directType to list only leaf-level (concrete) classes
Disregard non-electrical (non-cim, eu, nc) classes e.g. rdf:List, rdfs:Property, owl:Class
Some variations can be answered from ontology alone, others (e.g. counts) require query. How do we handle each version?
Available ordering is determined by the data e.g. order by date, by amount, by name (alphabetically), etc. Aggregations allow ordering by count on the aggregation but possibly in different ways.
The combination of limit and ordering can be explicit (e.g. "biggest 5 substations") or applied by default
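
A minimal sketch of how the ordering and limit options above could be folded into one parameterized template. The function name, its parameters and the namespace filter are illustrative assumptions; only sesame:directType and the idea of dropping non-CIM classes come from the considerations above.

```python
# Hypothetical helper: builds a "count resources by type" SPARQL query string.
SESAME = "http://www.openrdf.org/schema/sesame#"

ORDERINGS = {
    "count_desc": "DESC(?count)",  # biggest classes first
    "alphabetical": "?type",       # by class IRI
}

def count_by_type_query(namespaces, order="count_desc", limit=None, direct_only=True):
    """Return a SPARQL query counting instances per class.

    namespaces  -- IRI prefixes of the classes to keep (e.g. the CIM namespace)
    direct_only -- use sesame:directType so only the most specific class is counted
    """
    if not namespaces:
        raise ValueError("at least one namespace is required")
    if order not in ORDERINGS:
        raise ValueError(f"unknown ordering: {order}")
    type_predicate = "sesame:directType" if direct_only else "a"
    keep = " || ".join(f'STRSTARTS(STR(?type), "{ns}")' for ns in namespaces)
    lines = [
        f"PREFIX sesame: <{SESAME}>",
        "SELECT ?type (COUNT(?x) AS ?count) {",
        f"  ?x {type_predicate} ?type .",
        f"  FILTER({keep})",
        "} GROUP BY ?type",
        f"ORDER BY {ORDERINGS[order]}",
    ]
    if limit is not None:
        lines.append(f"LIMIT {int(limit)}")
    return "\n".join(lines)

# "Top 5 classes by count", restricted to the CIM namespace used elsewhere on this page.
query = count_by_type_query(["http://iec.ch/TC57/2013/CIM-schema-cim16#"], limit=5)
print(query)
```

Keeping ordering as a small closed set of named options makes it easier to map phrases like "alphabetically" or "by descending count" to a template parameter.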

Notes:

Would be nice to have this in UI: click on count to get a list of those objects (drill-down or follow-up)
Models are represented by named graphs but we won't tackle this until later in the project. So for this iteration we just work with all data

Q2.2 List by Type

list all resources of a certain type in model

Variations:

List resources of any of N given types
List resources by super-class vs direct class only
List resources of given type ordered by characteristic (e.g. generators by descending net max output)
List resources of given type grouped by characteristic (e.g. substations grouped by region)

Notes:

Show mRID (with link being the URL), name, direct type
Apply a default limit (eg 100) if no other limit is provided
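
The default-limit note could be implemented as a post-processing step on the generated query. This is only a sketch: the helper name is made up, and the check is a simple regular expression on the end of the query rather than a real SPARQL parser.

```python
import re

DEFAULT_LIMIT = 100  # value suggested in the note above

# A LIMIT clause at the very end of the query, optionally followed by OFFSET.
_TRAILING_LIMIT = re.compile(r"\bLIMIT\s+\d+(\s+OFFSET\s+\d+)?\s*$", re.IGNORECASE)

def with_default_limit(query, default=DEFAULT_LIMIT):
    """Append LIMIT <default> unless the query already ends with a LIMIT clause."""
    stripped = query.rstrip()
    if _TRAILING_LIMIT.search(stripped):
        return stripped
    return f"{stripped}\nLIMIT {default}"

print(with_default_limit("SELECT * { ?s a ?type }"))
```

A LIMIT inside a subquery is deliberately ignored, because it does not bound the size of the outer result.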

Q2.3 List by Type and Property

As a Power System Engineer I want to get a filtered list of equipment of a certain type (e.g. generating units) so I am certain that I include all relevant equipment in my further analysis. (Typical filtering options include nominal_voltage = 300 kV, bidding_zone = NO1, or MRID contains the string "3413fa".)

List resources by class and filter by properties, where properties are class-dependent.

We mean here numeric, boolean or dateTime props, and a condition (equality or range) that is not a substring condition
Also filter by properties pointing at enumeration classes (e.g. by entsoe:LimitTypeKind)

Variations: This question is well suited to multiplication by adding type-dependent properties. E.g. this query finds props of GeneratingUnit:

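PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX cim: <http://iec.ch/TC57/2013/CIM-schema-cim16#>
select * {
  ?p rdfs:domain  cim:GeneratingUnit
} order by ?p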

TODO: why does adding range return only 19?

  ?p rdfs:domain  cim:GeneratingUnit; rdfs:range ?range

Answer: because it uses old versions of the CIM ontologies. See https://github.com/statnett/Talk2PowerSystem_PM/issues/19 for the newest versions. This variant returns 39 (TODO why?)

  ?p rdfs:domain cim:GeneratingUnit; cims:dataType|rdfs:range ?range

List all analogs / measurements / timeseries of type Active Power.
- Clarification: List all Analog Measurements that have measurementType equal to "ThreePhaseActivePower"

PREFIX cim: <http://iec.ch/TC57/2013/CIM-schema-cim16#>
select * {
  ?x a cim:Analog; cim:Measurement.measurementType "ThreePhaseActivePower"
}

List transformers with rma between 0 and 2
- Note: this prop is defined in the pti: namespace (pti:PowerTransformer.rma), but we don't have that ontology. We should translate it to an NC property, or pick another
- Answer: it's not used in production
List transformers that are normally in service
- Note: cim:Equipment.normallyInService true, but in Nordic44 it's always set to "true" on all equipment
Find switches that are normally closed
- Note: cim:Switch.normalOpen false

Here we already see a need for 2 specialized tools.

Property Disambiguation

Given a domain (host) class and a fuzzy prop name, find the precise full prop URL

This relates to a problem with CIM's overly-specific prop names (prefixed by the domain class or superclass).
This need has been discussed at length in the issue "shorten prop names in JSONLD?", with prior analysis in 2023 by Vladimir ("CIM Shorten Prop Names")
How can the LLM know that the prop is Equipment.normallyInService not PowerTransformer.normallyInService?
It would need to trace the class hierarchy upward to find the precise superclass where the prop is attached
Prop suffixes (second part) are not globally unique, eg b is defined at the following classes:
- cim:EquivalentShunt.b
- cim:NonlinearShuntCompensatorPoint.b
- cim:PowerTransformerEnd.b
- cim:TapChangerTablePoint.b
So there is a need for a Property Disambiguation tool
- Notice props are called normallyInService but normalOpen; and it's legit to ask "normally closed", so indeed fuzzy matching is needed
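
The lookup described above (walk up the class hierarchy, then fuzzy-match the prop suffix) can be sketched as follows. The class and property tables are a tiny hand-copied extract for illustration, not the full ontology, and the function names and the use of difflib for scoring are assumptions.

```python
from difflib import SequenceMatcher

# Toy extract of the CIM class hierarchy (child -> parent); illustrative only.
SUPERCLASS = {
    "PowerTransformer": "ConductingEquipment",
    "EquivalentShunt": "EquivalentEquipment",
    "EquivalentEquipment": "ConductingEquipment",
    "Switch": "ConductingEquipment",
    "ConductingEquipment": "Equipment",
    "Equipment": "PowerSystemResource",
    "PowerSystemResource": "IdentifiedObject",
    "PowerTransformerEnd": "TransformerEnd",
    "TransformerEnd": "IdentifiedObject",
}

# Toy extract of properties, named <DomainClass>.<suffix> as in CIM.
PROPERTIES = [
    "IdentifiedObject.name",
    "IdentifiedObject.mRID",
    "Equipment.normallyInService",
    "Switch.normalOpen",
    "EquivalentShunt.b",
    "PowerTransformerEnd.b",
]

def ancestors(cls):
    """Return the class followed by all its superclasses, nearest first."""
    chain = [cls]
    while chain[-1] in SUPERCLASS:
        chain.append(SUPERCLASS[chain[-1]])
    return chain

def _norm(text):
    """Lowercase and drop spaces/punctuation so 'normally in service' ~ 'normallyInService'."""
    return "".join(ch for ch in text.lower() if ch.isalnum())

def resolve_property(host_class, fuzzy_name):
    """Return (full prop name, similarity score) of the best match, or None."""
    classes = set(ancestors(host_class))
    best = None
    for prop in PROPERTIES:
        domain, suffix = prop.split(".", 1)
        if domain not in classes:
            continue  # prop is not applicable to this class
        score = SequenceMatcher(None, _norm(fuzzy_name), _norm(suffix)).ratio()
        if best is None or score > best[1]:
            best = ("cim:" + prop, score)
    return best

print(resolve_property("PowerTransformer", "normally in service"))
print(resolve_property("PowerTransformerEnd", "b"))
```

Restricting candidates to the host class and its superclasses is what resolves the non-unique suffix b. Plain string similarity is a weak signal for phrasings such as "normally closed", so a real tool would likely need synonyms or embeddings on top of this.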

Value Disambiguation Tool

Given a fuzzy property name and a value (mentioned implicitly or explicitly), disambiguate to a specific "property-object" pattern. This applies to Boolean, enumerated string or enumerated object (cims:stereotype "enum") properties.

Examples:

"Measurement of type active power" means cim:Measurement.measurementType "ThreePhaseActivePower"
"Switch that is normally closed" means cim:Switch.normalOpen false
"Terminals on phase A" means cim:Terminal.phases cim:PhaseCode.A

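One simple way to realise the examples above is a curated phrase table per class. This is only a sketch: the table layout and function name are hypothetical, and the "normally open" row is inferred as the logical counterpart of the "normally closed" example.

```python
# Hypothetical phrase table: (class, phrase) -> (property, object as a SPARQL term).
PHRASES = {
    ("Measurement", "active power"): ("cim:Measurement.measurementType", '"ThreePhaseActivePower"'),
    ("Switch", "normally closed"): ("cim:Switch.normalOpen", "false"),
    ("Switch", "normally open"): ("cim:Switch.normalOpen", "true"),  # inferred counterpart
    ("Terminal", "phase a"): ("cim:Terminal.phases", "cim:PhaseCode.A"),
}

def disambiguate(host_class, question):
    """Return '<property> <object>' patterns for every known phrase found in the question."""
    text = " ".join(question.lower().split())  # lowercase, collapse whitespace
    return [
        f"{prop} {obj}"
        for (cls, phrase), (prop, obj) in PHRASES.items()
        if cls == host_class and phrase in text
    ]

print(disambiguate("Switch", "Switch that is normally closed"))
```

The returned strings can be dropped directly into a triple pattern after the subject variable. Substring matching is crude (for instance "phase a" would also match "phase ab"), so a production tool would need token-level matching.
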
Q2.4 List by Type and Properties/Relations

List resources by class and filter by properties (attributes and/or relations), where the properties are class-dependent.

This has multiple complications compared to the previous question:

Multiple criteria
Some of the criteria may be object relations, so that will require either the Value Disambiguation tool, or chaining of queries.
We can have property chains instead of single properties as a filter (e.g. get NominalVoltage through VoltageLevel node)

Variations:

List all analogs / measurements / timeseries of type active power in substation Oslo
- Meaning: List all Analog Measurements that have Measurement.measurementType "ThreePhaseActivePower" and are about Substation with IdentifiedObject.name "OSLO". "About" means the PowerSystemResource that the Analog is referencing.
- Note: we don't have Analog Measurements of Substations in Nordic44, all Measurements are about EnergyCongestionZone.
"List all generators belonging to ScheduleResource NOKG05"
- Meaning: "List cim:GeneratingUnit with pti:GeneratingUnit.ScheduleResource pointing to the pti:ScheduleResourceGeneration with pti:ScheduleResource.marketCode 'NOKG05'"

  PREFIX pti: <http://www.pti-us.com/PTI_CIM-schema-cim16#>
  PREFIX cim: <http://iec.ch/TC57/2013/CIM-schema-cim16#>
  SELECT ?generatingUnit ?generatingUnitName { 
    ?scheduleResourceGeneration a pti:ScheduleResourceGeneration; pti:ScheduleResource.marketCode "NOKG05".
    ?generatingUnit pti:GeneratingUnit.ScheduleResource ?scheduleResourceGeneration; cim:IdentifiedObject.name ?generatingUnitName.
  }

Q2.5 Search by mRID, Name, Description

List resources by class and filter by parts of mRID, Name, Description (including and/or excluding some substring)

Variants:

List all synchronous machines that have "M1" or "M2" in the name, but not "300"
- Note: Get the IdentifiedObject.name of all SynchronousMachines. On the names, do "advanced" filtering with OR and NOT.
- Much information is in names (and descriptions), so it would be nice to allow advanced filtering based on name.
- Statnett imagines users navigating in all our 77k Analogs using text search. However, the example given here is using SynchronousMachine.
- TODO: Confirm whether this is a valid need, because regex operations on a large population of objects are very expensive; whereas FTS will require more complex indexing and FTS query generation
Find the PSR "f1769ce8"
- Note: this refers to the unique (first) part of the mRID. See https://github.com/statnett/Talk2PowerSystem_PM/issues/22
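
For the "M1 or M2 but not 300" variant, the OR/NOT name filter could be generated as below. This is a sketch with made-up helper names; it uses plain CONTAINS (case-sensitive substring matching) rather than regex or FTS, which is one of the options still open in the TODO above.

```python
def sparql_string(text):
    """Quote a Python string as a SPARQL string literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

def name_filter(var, any_of=(), none_of=()):
    """Build a FILTER: at least one of any_of, and none of none_of, occur in ?var."""
    parts = []
    if any_of:
        ors = " || ".join(f"CONTAINS(?{var}, {sparql_string(s)})" for s in any_of)
        parts.append(f"({ors})")
    parts.extend(f"!CONTAINS(?{var}, {sparql_string(s)})" for s in none_of)
    if not parts:
        return ""
    return "FILTER(" + " && ".join(parts) + ")"

flt = name_filter("name", any_of=["M1", "M2"], none_of=["300"])
query = f"""PREFIX cim: <http://iec.ch/TC57/2013/CIM-schema-cim16#>
SELECT ?machine ?name {{
  ?machine a cim:SynchronousMachine ;
           cim:IdentifiedObject.name ?name .
  {flt}
}}"""
print(query)
```

Escaping the search strings before embedding them keeps user-supplied text from breaking out of the string literal.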

Note:

This applies to props mRID, name, alias, (maybe) description
https://github.com/statnett/Talk2PowerSystem_PM/issues/68 Tool: "Identify object" handles it
We use GraphDB's Autocomplete index

Q2.6 Resources Within Area

filtered list of resources, where filter is a form of area division (schedule resource, bidding area, region...)

Clarifications needed:

Picking the right area depends on clarifying Areas and Zones
If a resource (eg ConnectivityPoint, PowerTransformer) doesn't have a link to an area, do we somehow use its relations to figure out its area? Or is that out of scope?
- For this to work on our internal models, it is very often necessary to follow the containment hierarchy (equipment -> bay -> voltagelevel -> substation) to find the relation to an area.
If numerous resources can be associated with an area (eg a Substation and all its parts), do we limit to only the top-level resource?
- OK to limit to top-level resource unless the user asked for a specific type, in which case you probably have to dig further down to find e.g. BusbarSections.
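
Following the containment hierarchy upward until an area link is found can be sketched like this. The identifiers and the two lookup tables are made up for illustration; in the real data the same walk would presumably be a SPARQL property path over the containment predicates.

```python
# Toy containment data (child -> container); identifiers are made up.
CONTAINER = {
    "breaker-1": "bay-1",
    "bay-1": "vl-300",
    "vl-300": "substation-A",
    "busbar-1": "vl-300",
}
# Only some resources link to an area directly.
AREA = {"substation-A": "NO1"}

def area_of(resource):
    """Follow equipment -> bay -> voltage level -> substation until an area link is found."""
    seen = set()
    current = resource
    while current is not None and current not in seen:
        if current in AREA:
            return AREA[current]
        seen.add(current)  # guard against accidental cycles in the data
        current = CONTAINER.get(current)
    return None

def resources_in_area(area, candidates):
    """Keep the candidates whose own or inherited area equals `area`."""
    return [r for r in candidates if area_of(r) == area]

print(resources_in_area("NO1", ["busbar-1", "breaker-1", "orphan-1"]))
```

When the user asks for a specific type such as BusbarSections, the candidates would be pre-filtered by type, and otherwise limited to the top-level resources.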

Variations:

List resources filtered by Type and Within Area
Count resources by Type and Within Area

Q2.7 Geospatial Queries

Variants:

List all <RESOURCE_TYPE> <CARDINAL_DIRECTION> of <RESOURCE>
- Example "List all substations north of Trondheim":

Variant 1:

PREFIX cim: <http://iec.ch/TC57/2013/CIM-schema-cim16#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?substation ?substationName ?latitude {
  ?substation a cim:Substation;
     cim:PowerSystemResource.Location ?location;
     cim:IdentifiedObject.name ?substationName.
  ?positionPoint cim:PositionPoint.Location ?location.
  ?positionPoint cim:PositionPoint.yPosition ?latitude.
  FILTER(xsd:float(?latitude) > 63.4305) # threshold: approximate latitude of Trondheim (TODO: 63.4305 or 63.4472221818584?)
}

Critique: I think "Trondheim" here should mean that Substation, and we should not rely on LLM's general geographic knowledge
Note: the range of the predicates xPosition, yPosition is modeled with cims:dataType and it's cim:String, not cim:Float. This should be fixed by https://github.com/statnett/Nordic44/issues/34 "fix datatypes" checkbox. But geographic coordinates will remain strings.
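
Following the critique, a sketch where "Trondheim" is resolved to a Substation in the data instead of a hard-coded latitude, and where the string-typed coordinates are converted explicitly. The names and latitudes below are rough, made-up sample values, and the function names are hypothetical.

```python
def to_float(value):
    """Coordinates arrive as strings; return None when a value cannot be parsed."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None

def north_of(latitudes, reference_name):
    """latitudes: mapping substation name -> latitude string (as stored in yPosition)."""
    ref = to_float(latitudes.get(reference_name))
    if ref is None:
        raise LookupError(f"no usable latitude for {reference_name!r}")
    result = []
    for name, lat_text in latitudes.items():
        lat = to_float(lat_text)
        if lat is not None and lat > ref:
            result.append(name)
    return sorted(result)

# Made-up sample values, roughly plausible but not taken from Nordic44.
sample = {"TRONDHEIM": "63.40", "TROMSO": "69.65", "OSLO": "59.91", "BROKEN": "n/a"}
print(north_of(sample, "TRONDHEIM"))
```

Unparseable coordinates are skipped rather than treated as zero, so a data-quality problem does not silently change the answer.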

Variant 2: using GeoSPARQL

https://github.com/statnett/Nordic44/issues/41 : convert Nordic44 to GeoSPARQL WKT
https://github.com/statnett/Talk2PowerSystem_PM/issues/72 : Tool "Geospatial indexing"
TODO write the query

Clarifications needed:

What things have point locations? AFAIK in Nordic44 that is only Substations
What things have line locations? AFAIK in Nordic44 that is only ACLineSegments (and there are no DCLineSegments)
Should we allow other geospatial primitives, eg "within 30km of Oslo" (distance) or even "within 30km of the Line Oslo-Kristiansand" (envelope)?
- GraphDB supports older geospatial and newer GeoSPARQL that we can take advantage of.
- The former relies on the WGS84 ontology and the latter relies on GeoSPARQL 1.0
- GraphDB supports "within" but not "envelope"
Do we need to extract geospatial info from descriptions?
- Eg "SA NO2" is described as "NO2 - 2011-05-16: Southern Norway. West of Hasle (420kV), Sylling (4kV). East of Sauda (300kV)"

Q2.8 Measurements by Type and Equipment

find all measurements of a certain type (analog or digital) (Active power, voltage, current ... ) for generator/AC line/power transformer end

This overlaps with Demo1 Q1.5 Measurements.
Nordic44 only has Analog measurements "ThreePhaseActivePower" about EnergyCongestionZone
If that will remain so, we only need to add the "by type" part

Q2.9 Short Lines (Obvious errors)

find lines where the length is significantly shorter than the distance between stations to find obvious errors in the model, visualize in map.

Unlike the others, this question requires several clarifications:

Part of Demo0 is a pretty complete map locations-show.html (added 2025-03-06) of substations and lines.
What is "openable fault"? A machine-translation problem, fixed.
How do we define length and distance, is it by geographic coordinates? Distance between stations is by coordinates, length can be found on Conductor.length I believe.
Why do we need "length" of lines, isn't it enough to check the start and end coordinates (of the first and last segment)?
- The length property is used for some electrical calculations, and needs to be accurate. For many models, the individual segments don't have coordinates unfortunately.
How accurate are the coordinates and line segments? Eg there are 2 parallel lines Helsinki-Oulu, and one is shown with small right-angle segments (see below): I doubt these exist in reality
- There are parallel lines and their configuration can be quite complicated (see second image below), but the Helsinki example looks unrealistic to me. For the purposes of answering questions like "how long is the line", I think it's OK to traverse one of the possible paths through the line segment graph.
- Our internal models have very accurate placement of stations: but Lines are modeled as single segments, and single geospatial line
- So the only thing we can check is that the straight-line (geospatial) distance is less than Conductor.length
- Note: #72 Tool "Geospatial indexing" lists it as one of the use cases.
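
The check described above (straight-line distance between stations should not exceed Conductor.length) can be sketched with a great-circle distance. Assumptions: coordinates are WGS84 degrees, the line length is already in km, and the 5% tolerance and function names are arbitrary illustrative choices.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def suspiciously_short(lines, tolerance=0.05):
    """lines: iterable of (name, length_km, (lat, lon) of start, (lat, lon) of end).

    Flags lines whose stated length is more than `tolerance` below the
    straight-line distance between their end stations.
    """
    flagged = []
    for name, length_km, start, end in lines:
        distance = haversine_km(*start, *end)
        if length_km < distance * (1 - tolerance):
            flagged.append((name, round(distance, 1)))
    return flagged

# Made-up lines between the same two points, about 111 km apart.
lines = [
    ("plausible", 120.0, (0.0, 0.0), (0.0, 1.0)),
    ("too-short", 50.0, (0.0, 0.0), (0.0, 1.0)),
]
print(suspiciously_short(lines))
```

The tolerance absorbs small coordinate inaccuracies, so only lines that are clearly shorter than physically possible are reported.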

Figures: Nordic44 on a map; Line with multiple ACLineSegment

Future extension if we have precise Line geometries:

Find the hottest point on a line
Take into account the terrain, elevation, and weather (wind, storms)

Q2.10 Lines Crossing BiddingZones

list AC and DC lines that traverse "adjacent bidding zones"

This overlaps with Demo1 Q1.4 AC Lines Crossing SubGeographicalRegions.
It also depends on Connection Questions

Q2.11 Switches in Series

"Which switches are in series with each other?"
- Elaboration: "Which switches are 'glued' with one or several ConnectivityNodes where each of the ConnectivityNodes does not have anything else connected?"
More complex variant: "List all Switches in Kristiansand in series with the line going to Arendal"
- Elaboration: "List Switches (Breakers and Disconnectors) in Substation Kristiansand connected to the Bay going to Arendal (Kristiansand 300 Arendal 1 Bay)"
"Is the line Kristiansand-Arendal normally supplied by BusbarSection A or BusbarSection B?"

Depends on Connection Questions

TODO (wait for Statnett to confirm by 2 May): work out the SPARQL for these questions.

Svein: the Topology works this out. Don't prioritize these questions. But further discussion brought them out as interesting.

Q2.12 Calculate Operational Limits

for one of those lines, look up all relevant operational limit sets (not only for the AC Line, but also including end-components on each side) and compute the resulting limit

As a Power System Engineer I would like to request the operational limits that are associated with components along an AC or DC line (identified by its name or unique ID) so that I later can process the limits and find the total limit by applying the appropriate function

This question has several parts, and each one needs to be elaborated:

"for one of those lines": the lines can be selected by any method, so this needs to be handled as a follow-up question in the conversation. But not the usual LLM follow-up where we rely on LLM's memory: instead we need to:
- Save the full and precise result-set from the previous question
- OR issue the previous question as a subquery
- OR issue the two queries in sequence, and use VALUES to pass the first result-set into the second query
- Task "Threading" of queries
"Relevant operational limit sets": how do you define "relevant"?
- Several options: List the OperationalLimitTypes you find and ask the user to follow-up with which one(s) they want, or just make a decision and tell the user which one was used.
"But also including end-components on each side": Lines only have Segments. Do you mean components (equipments) in the connected substations (eg Switches/Breakers), or something else?
- Yes. For many purposes, users consider the switchgear (Bay components like switches and current transformers) at either side of the cim:Line to be part of the line. I think the more interesting point here is that there are often several possible interpretations and the system needs to communicate how it decided to interpret any ambiguity, and be able to retry if the user asks for a different interpretation.
- This was central to the original request this example question was based on, as the analyst wanted to know whether switchgear caused bottlenecks in the transmission system.
"Compute the resulting limit using the appropriate function":
- How to compute, is it min, sum or some other operation? Or does it depend on limit kind?
- I implicitly meant CurrentLimit. (See above about asking for clarification.)
- CurrentLimits are computed through some formulas depending on the connectivity.
  - If the components are attached in series ("like a string of christmas lights") you just select the smallest number.
  - For the other case, you need to consider r and x to compute how the current is distributed between parallel segments.
- It could be easier to export the relevant entities and call an external function like computeCurrentLimit(startComponent, endComponent, components, limitType). See Code Generation for details
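
The third "threading" option listed above (run two queries in sequence and pass the first result set through VALUES) could look like the sketch below. The helper names and example IRIs are hypothetical, and the use of cim:OperationalLimitSet.Equipment as the link to the limit sets is an assumption about how the data is modeled.

```python
_FORBIDDEN = set('<>"{}|^`\\ ')  # characters that are not allowed inside <...> IRIs

def values_clause(var, iris):
    """Render a SPARQL VALUES block binding ?var to each IRI of a previous result set."""
    if not iris:
        raise ValueError("previous result set is empty; nothing to pass on")
    for iri in iris:
        if _FORBIDDEN & set(iri):
            raise ValueError(f"not a safe IRI: {iri!r}")
    rows = " ".join(f"<{iri}>" for iri in iris)
    return f"VALUES ?{var} {{ {rows} }}"

def follow_up_query(equipment_iris):
    """Second query of the conversation: limit sets for the equipment found earlier."""
    return "\n".join([
        "PREFIX cim: <http://iec.ch/TC57/2013/CIM-schema-cim16#>",
        "SELECT ?equipment ?limitSet {",
        f"  {values_clause('equipment', equipment_iris)}",
        "  ?limitSet cim:OperationalLimitSet.Equipment ?equipment .",  # assumed link
        "}",
    ])

# IRIs saved from the previous answer (made-up examples).
previous = ["http://example.org/segment-1", "http://example.org/segment-2"]
print(follow_up_query(previous))
```

Passing the exact IRIs keeps the follow-up independent of the LLM's memory of the earlier answer, at the cost of large VALUES blocks for big result sets.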

Limits are temperature-dependent. You can have a 20% difference depending on the weather (the colder the weather, the better for the grid).
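
The series and parallel cases for CurrentLimit described above can be sketched as follows. This is a simplified illustration, not the computeCurrentLimit function mentioned earlier: it assumes linear current division by branch admittance, takes r and x per parallel branch, and ignores the temperature dependence. Function names are made up.

```python
def series_limit(limits):
    """Components in series carry the same current, so the smallest limit wins."""
    return min(limits)

def parallel_limit(branches):
    """Total current limit of parallel branches given as (r, x, limit) tuples.

    Current divides in proportion to branch admittance Y = 1 / (r + jx), so
    branch k reaches its own limit when the total current equals
    limit_k * |sum(Y)| / |Y_k|. The binding branch gives the overall limit.
    """
    admittances = [1 / complex(r, x) for r, x, _ in branches]
    total = sum(admittances)
    return min(
        limit * abs(total) / abs(y)
        for (_, _, limit), y in zip(branches, admittances)
    )

# Breaker, line segment and disconnector in series (made-up limits in A).
print(series_limit([1200, 800, 1000]))

# Two parallel segments: the lower-impedance one takes 3/4 of the current.
print(parallel_limit([(1.0, 0.0, 100.0), (3.0, 0.0, 100.0)]))
```

In the second example the low-impedance branch is the bottleneck, which is why the combined limit is well below the naive sum of the two branch limits.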

Where to learn about OperationalLimits?

CIM Primer doesn't have material on it
Maybe we can just ask ChatGPT?

Q2.13 Count by Characteristics

Count substations by Bidding Zone
TODO: What types of compensation are there (active, reactive power; related to shunts)

Demo2 - statnett/Talk2PowerSystem GitHub Wiki
