Demo2 - statnett/Talk2PowerSystem GitHub Wiki
Started in Talk2PowerSystem#17
- Demo1 had 5 questions
- For Demo2, we target 10 more questions
Terminology:
- "original": competency question given by Statnett: shown as a quotation. We convert those to parameterized templates
- "paraphrase": asking the same thing with a different surface form (or in a different language)
- "variant": asking a similar but different thing, thus multiplying the number of templates
- "TODO"
list all resource types in model (what is there?)
Paraphrases:
- what data do you have?
- what data is there?
- count resources by type
- (that's not a pure paraphrase, more like a variant)
- show how many resources are there by class
Variations:
- List only part of the class hierarchy, eg "count PSRs by type" or "count identified objects by type"
- Various ordering options: alphabetically, or by descending count
- What are the top-level non-enumeration "interesting" classes?
- What is the class tree starting at a given class?
Considerations:
- Use `sesame:directType` to list only leaf-level (concrete) classes
- Disregard non-electrical (non-`cim, eu, nc`) classes e.g. rdf:List, rdfs:Property, owl:Class
- Some variations can be answered from ontology alone, others (e.g. counts) require query. How do we handle each version?
- Available ordering is determined by the data e.g. order by date, by amount, by name (alphabetically), etc. Aggregations allow ordering by count on the aggregation but possibly in different ways.
- Combination of limit and ordering can be explicit (e.g. "biggest 5 substations"), or by default
Notes:
- Would be nice to have this in UI: click on count to get a list of those objects (drill-down or follow-up)
- Models are represented by named graphs but we won't tackle this until later in the project. So for this iteration we just work with all data
list all resources of a certain type in model
Variations:
- List resources of any of N given types
- List resources by super-class vs direct class only
- List resources of given type ordered by characteristic (e.g. generators by descending net max output)
- List resources of given type grouped by characteristic (e.g. substations grouped by region)
Notes:
- Show mRID (with link being the URL), name, direct type
- Apply a default limit (eg 100) if no other limit is provided
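The two notes above (which columns to show, and the default limit) could be sketched as a small query builder. This is a minimal sketch, not the project's implementation: the helper name `build_list_query` and its parameters are hypothetical, the `cim:` namespace is the one used in the queries on this page, and using `sesame:directType` for the "direct type" column is an assumption.

```python
DEFAULT_LIMIT = 100  # default from the note above, used when the caller gives no limit

PREFIXES = (
    "PREFIX cim: <http://iec.ch/TC57/2013/CIM-schema-cim16#>\n"
    "PREFIX sesame: <http://www.openrdf.org/schema/sesame#>\n"
)


def build_list_query(class_name, limit=None):
    """Return a SPARQL query listing resources of cim:<class_name> (hypothetical helper)."""
    # Guard against injecting arbitrary text into the query.
    if not class_name.isidentifier():
        raise ValueError(f"unexpected class name: {class_name!r}")
    effective_limit = DEFAULT_LIMIT if limit is None else int(limit)
    return (
        PREFIXES
        + "SELECT ?resource ?mrid ?name ?directType {\n"
        + f"  ?resource a cim:{class_name} ;\n"
        + "    sesame:directType ?directType .\n"
        + "  OPTIONAL { ?resource cim:IdentifiedObject.mRID ?mrid }\n"
        + "  OPTIONAL { ?resource cim:IdentifiedObject.name ?name }\n"
        + "}\n"
        + f"LIMIT {effective_limit}"
    )


print(build_list_query("Substation"))
```

An explicit limit from the question ("biggest 5 substations") would be passed as `limit=5` and replaces the default.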
As a Power System Engineer I want to get a filtered list of equipment of a certain type (e.g. generating units) so I am certain that I include all relevant equipment in my further analysis. Typical filtering options include `nominal_voltage = 300 kV`, `bidding_zone = NO1` or `MRID contains the string "3413fa"`.
List resources by class and filter by properties, where properties are class-dependent.
- We mean here numeric, boolean or dateTime props, and a condition (equality or range) that is not a substring condition
- Also filter by properties pointing at enumeration classes (e.g. by entsoe:LimitTypeKind)
Variations:
This question is very suited for multiplication by adding type-dependent properties.
Eg this query finds props of `GeneratingUnit`:

    select * {
      ?p rdfs:domain cim:GeneratingUnit
    } order by ?p

TODO: why does adding range return only 19?

    ?p rdfs:domain cim:GeneratingUnit; rdfs:range ?range

Answer: because it uses old versions of the CIM ontologies. See https://github.com/statnett/Talk2PowerSystem_PM/issues/19 for the newest versions. This variant returns 39 (TODO why?)

    ?p rdfs:domain cim:GeneratingUnit; cims:dataType|rdfs:range ?range

- List all analogs / measurements / timeseries of type Active Power.
  - Clarification: List all Analog Measurements that have `measurementType` equal to "ThreePhaseActivePower"

        select * {
          ?x a cim:Analog; cim:Measurement.measurementType "ThreePhaseActivePower"
        }

- List transformers with `rma` between 0 and 2
  - Note: this prop is defined in the `pti:` namespace (`pti:PowerTransformer.rma`), but we don't have that ontology. We should translate it to an NC property, or pick another
  - Answer: it's not used in production
- List transformers that are normally in service
  - Note: `cim:Equipment.normallyInService true`, but in Nordic44 it's always set to "true" on all equipment
- Find switches that are normally closed
  - Note: `cim:Switch.normalOpen false`
Here we already see a need for 2 specialized tools.
Given a domain (host) class and a fuzzy prop name, find the precise full prop URL
- This relates to a problem with CIM's overly-specific prop names (prefixed by the domain class or superclass).
- This need has been discussed at length in the issue "shorten prop names in JSONLD?", with prior analysis in 2023 by Vladimir ("CIM Shorten Prop Names")
- How can the LLM know that the prop is `Equipment.normallyInService` not `PowerTransformer.normallyInService`?
  - It would need to trace the class hierarchy upward to find the precise superclass where the prop is attached
- Prop suffixes (second part) are not globally unique, eg `b` is defined at the following classes:
  - `cim:EquivalentShunt.b`
  - `cim:NonlinearShuntCompensatorPoint.b`
  - `cim:PowerTransformerEnd.b`
  - `cim:TapChangerTablePoint.b`
- So there is a need for a
- Notice props are called `normallyInService` but `normalOpen`; and it's legit to ask "normally closed", so indeed fuzzy matching is needed
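The lookup described above (trace the class hierarchy upward, then match the prop suffix fuzzily) could be sketched as follows. This is only an illustration under stated assumptions: the `SUPERCLASS` and `PROPS` tables are a tiny hand-written excerpt rather than the loaded ontology, the function names are hypothetical, and `difflib` string similarity stands in for whatever matching the real tool ends up using.

```python
import difflib

# Toy schema excerpt (assumption): class -> direct superclass.
SUPERCLASS = {
    "PowerTransformer": "ConductingEquipment",
    "Switch": "ConductingEquipment",
    "ConductingEquipment": "Equipment",
    "Equipment": "PowerSystemResource",
    "PowerSystemResource": "IdentifiedObject",
}
# Toy schema excerpt (assumption): class -> prop suffixes attached directly to it.
PROPS = {
    "IdentifiedObject": ["mRID", "name"],
    "Equipment": ["normallyInService"],
    "Switch": ["normalOpen"],
}


def candidate_props(cls):
    """Yield (owner_class, suffix) for cls and each superclass, nearest class first."""
    while cls is not None:
        for suffix in PROPS.get(cls, []):
            yield cls, suffix
        cls = SUPERCLASS.get(cls)


def resolve_prop(cls, fuzzy_name, cutoff=0.4):
    """Return the full prop name, e.g. 'cim:Equipment.normallyInService', or None."""
    by_suffix = {}
    for owner, suffix in candidate_props(cls):
        # setdefault keeps the nearest class when a suffix repeats up the hierarchy.
        by_suffix.setdefault(suffix.lower(), (owner, suffix))
    wanted = fuzzy_name.lower().replace(" ", "")
    matches = difflib.get_close_matches(wanted, list(by_suffix), n=1, cutoff=cutoff)
    if not matches:
        return None
    owner, suffix = by_suffix[matches[0]]
    return f"cim:{owner}.{suffix}"


print(resolve_prop("PowerTransformer", "normally in service"))
```

Restricting the candidates to the props reachable from the given class is what sidesteps the non-unique suffix problem: `b` would only be matched against the classes in that one hierarchy. Pure string similarity is fragile for pairs like "normally closed" vs `normalOpen`, so a real tool would likely need synonyms or embeddings on top.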
Given a fuzzy property name and a value (mentioned implicitly or explicitly), disambiguate to a specific "property-object" pattern.
This applies to Boolean, enumerated string or enumerated object (`cims:stereotype "enum"`) properties.
Examples:
- "Measurement of type active power" means `cim:Measurement.measurementType "ThreePhaseActivePower"`
- "Switch that is normally closed" means `cim:Switch.normalOpen false`
- "Terminals on phase A" means `cim:Terminal.phases cim:PhaseCode.A`
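One possible shape for this tool is a lookup from a normalized phrase to a "property-object" pair. The sketch below is an assumption about how that could look, not a description of the actual tool: the `PATTERNS` table holds only the three examples above, and `disambiguate` is a hypothetical name.

```python
import difflib

# Phrase -> (property, object); the three entries are the examples listed above.
PATTERNS = {
    "active power": ("cim:Measurement.measurementType", '"ThreePhaseActivePower"'),
    "normally closed": ("cim:Switch.normalOpen", "false"),
    "phase a": ("cim:Terminal.phases", "cim:PhaseCode.A"),
}


def disambiguate(phrase, cutoff=0.6):
    """Return the (property, object) pair whose key is closest to `phrase`, or None."""
    key = phrase.lower().strip()
    matches = difflib.get_close_matches(key, list(PATTERNS), n=1, cutoff=cutoff)
    return PATTERNS[matches[0]] if matches else None


prop, obj = disambiguate("Normally closed")
print(f"?x {prop} {obj}")
```

Note that the value is inverted for "normally closed" (`normalOpen false`), which is why the table stores the object together with the property instead of deriving it from the phrase.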
List resources by class and filter by properties (attributes and/or relations), where the properties are class-dependent.
This has multiple complications compared to the previous question:
- Multiple criteria
- Some of the criteria may be object relations, so that will require either the Value Disambiguation tool, or chaining of queries.
- We can have property chains instead of single properties as a filter (e.g. get NominalVoltage through VoltageLevel node)
Variations:
- List all analogs / measurements / timeseries of type active power in substation Oslo
  - Meaning: List all Analog Measurements that have Measurement.measurementType "ThreePhaseActivePower" and are about Substation with `IdentifiedObject.name "OSLO"`. "About" means the PowerSystemResource that the Analog is referencing.
  - Note: we don't have Analog Measurements of Substations in Nordic44, all Measurements are about EnergyCongestionZone.
- "List all generators belonging to ScheduleResource NOKG05"
  - Meaning: "List cim:GeneratingUnit with pti:GeneratingUnit.ScheduleResource pointing to the pti:ScheduleResourceGeneration with pti:ScheduleResource.marketCode 'NOKG05'"

        PREFIX pti: <http://www.pti-us.com/PTI_CIM-schema-cim16#>
        PREFIX cim: <http://iec.ch/TC57/2013/CIM-schema-cim16#>
        SELECT ?generatingUnit ?generatingUnitName {
          ?scheduleResourceGeneration a pti:ScheduleResourceGeneration;
            pti:ScheduleResource.marketCode "NOKG05".
          ?generatingUnit pti:GeneratingUnit.ScheduleResource ?scheduleResourceGeneration;
            cim:IdentifiedObject.name ?generatingUnitName.
        }

List resources by class and filter by parts of mRID, Name, Description (including and/or excluding some substring)
Variants:
- List all synchronous machines that have "M1" or "M2" in the name, but not "300"
- Note: Get the IdentifiedObject.name of all SynchronousMachines. On the names, do "advanced" filtering with OR and NOT.
- Much information is in names (and descriptions), so it would be nice to allow advanced filtering based on name.
- Statnett imagines users navigating in all our 77k Analogs using text search. However, the example given here is using SynchronousMachine.
- TODO: Confirm whether this is a valid need, because regex operations on a large population of objects are very expensive; whereas FTS will require more complex indexing and FTS query generation
- Find the PSR "f1769ce8"
- Note: this refers to the unique (first) part of the mRID. See https://github.com/statnett/Talk2PowerSystem_PM/issues/22
Note:
- This applies to props mRID, name, alias, (maybe) description
- https://github.com/statnett/Talk2PowerSystem_PM/issues/68 Tool: "Identify object" handles it
- We use GraphDB's Autocomplete index
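For comparison with the index-based approach, the "M1 or M2 but not 300" example can also be expressed as a plain SPARQL `FILTER` over names. The sketch below only builds that filter string; the helper names are hypothetical, and this is the substring-scan alternative whose cost the TODO above worries about, not the Autocomplete-based tool.

```python
def sparql_string(text):
    """Quote text as a SPARQL string literal, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def name_filter(var, any_of=(), none_of=()):
    """Build a FILTER keeping values that contain any of `any_of` and none of `none_of`."""
    parts = []
    if any_of:
        alternatives = " || ".join(f"CONTAINS(?{var}, {sparql_string(s)})" for s in any_of)
        parts.append(f"({alternatives})")
    parts.extend(f"!CONTAINS(?{var}, {sparql_string(s)})" for s in none_of)
    return "FILTER(" + " && ".join(parts) + ")" if parts else ""


print(name_filter("name", any_of=["M1", "M2"], none_of=["300"]))
```

The resulting filter would be placed after a pattern such as `?x a cim:SynchronousMachine; cim:IdentifiedObject.name ?name`.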
filtered list of resources, where filter is a form of area division (schedule resource, bidding area, region...)
Clarifications needed:
- Picking the right area depends on clarifying Areas and Zones
- If a resource (eg `ConnectivityPoint, PowerTransformer`) doesn't have a link to an area, do we somehow use its relations to figure out its area? Or is that out of scope?
  - For this to work on our internal models, it is very often necessary to follow the containment hierarchy (equipment -> bay -> voltagelevel -> substation) to find the relation to an area.
- If numerous resources can be associated with an area (eg a Substation and all its parts), do we limit to only the top-level resource?
- OK to limit to top-level resource unless the user asked for a specific type, in which case you probably have to dig further down to find e.g. BusbarSections.
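Following the containment hierarchy until some container has an area link can be sketched as a simple upward walk. Everything in this sketch is hypothetical: the identifiers, the `CONTAINER` and `AREA` tables (which would really come from the model), and the `find_area` helper.

```python
# Hypothetical containment links: resource -> its container.
CONTAINER = {
    "breaker-1": "bay-1",
    "bay-1": "voltagelevel-300",
    "voltagelevel-300": "substation-oslo",
}
# Hypothetical direct area links; here only the substation has one.
AREA = {"substation-oslo": "NO1"}


def find_area(resource, max_hops=10):
    """Walk equipment -> bay -> voltagelevel -> substation until an area link is found.

    Returns (area, resource_that_carries_the_link), or None if no link is reachable.
    """
    current = resource
    for _ in range(max_hops):  # bounded, in case the data contains a containment cycle
        if current in AREA:
            return AREA[current], current
        current = CONTAINER.get(current)
        if current is None:
            return None
    return None


print(find_area("breaker-1"))
```

Returning the resource that carries the link lets the answer state how the area was derived, which matters when the user asked for a specific type such as BusbarSections.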
Variations:
- List resources filtered by Type and Within Area
- Count resources by Type and Within Area
Variants:
- List all `<RESOURCE_TYPE>` `<CARDINAL_DIRECTION>` of `<RESOURCE>`
- Example "List all substations north of Trondheim":
Variant 1:

    PREFIX cim: <http://iec.ch/TC57/2013/CIM-schema-cim16#>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    SELECT ?substation ?substationName ?latitude {
      ?substation a cim:Substation;
        cim:PowerSystemResource.Location ?location;
        cim:IdentifiedObject.name ?substationName.
      ?positionPoint cim:PositionPoint.Location ?location.
      ?positionPoint cim:PositionPoint.yPosition ?latitude.
      FILTER(xsd:float(?latitude) > 63.4305) # 63.4472221818584 is the latitude of Trondheim
    }

- Critique: I think "Trondheim" here should mean that Substation, and we should not rely on LLM's general geographic knowledge
- Note: the range of the predicates `xPosition, yPosition` is modeled with `cims:dataType` and it's `cim:String`, not `cim:Float`. This should be fixed by https://github.com/statnett/Nordic44/issues/34 "fix datatypes" checkbox. But geographic coordinates will remain strings.
Variant 2: using GeoSPARQL
- https://github.com/statnett/Nordic44/issues/41 : convert Nordic44 to GeoSPARQL WKT
- https://github.com/statnett/Talk2PowerSystem_PM/issues/72 : Tool "Geospatial indexing"
- TODO write the query
Clarifications needed:
- What things have point locations? AFAIK in Nordic44 that is only Substations
- What things have line locations? AFAIK in Nordic44 that is only ACLineSegments (and there are no DCLineSegments)
- Should we allow other geospatial primitives, eg "within 30km of Oslo" (distance) or even "within 30km of the Line Oslo-Kristiansand" (envelope)?
- GraphDB supports older geospatial and newer GeoSPARQL that we can take advantage of.
- The former relies on the WGS84 ontology and the latter relies on GeoSPARQL 1.0
- GraphDB supports "within" but not "envelope"
- Do we need to extract geospatial info from descriptions?
- Eg "SA NO2" is described as "NO2 - 2011-05-16: Southern Norway. West of Hasle (420kV), Sylling (4kV). East of Sauda (300kV)"
find all measurements of a certain type (analog or digital) (Active power, voltage, current ... ) for generator/AC line/power transformer end
- This overlaps with Demo1 Q1.5 Measurements.
- Nordic44 only has Analog measurements "ThreePhaseActivePower" about EnergyCongestionZone
- If that will remain so, we only need to add the "by type" part
find lines where the length is significantly shorter than the distance between stations to find obvious errors in the model, visualize in map.
Unlike the others, this question requires several clarifications:
- Part of Demo0 is a pretty complete map locations-show.html (added 2025-03-06) of substations and lines.
- What is "openable fault"? A machine-translation problem, fixed.
- How do we define length and distance, is it by geographic coordinates?
  - Distance between stations is by coordinates, length can be found on `Conductor.length` I believe.
- Why do we need "length" of lines, isn't it enough to check the start and end coordinates (of the first and last segment)?
- The length property is used for some electrical calculations, and needs to be accurate. For many models, the individual segments don't have coordinates unfortunately.
- How accurate are the coordinates and line segments? Eg there are 2 parallel lines Helsinki-Oulu, and one is shown with small right-angle segments (see below): I doubt these exist in reality
- There are parallel lines and their configuration can be quite complicated (see second image below), but the Helsinki example looks unrealistic to me. For the purposes of answering questions like "how long is the line", I think it's OK to traverse one of the possible paths through the line segment graph.
- Our internal models have very accurate placement of stations, but Lines are modeled as single segments, and a single geospatial line
- So the only thing we can check is that the straight-line (geospatial) distance is less than `Conductor.length`
- Note: #72 Tool "Geospatial indexing" lists it as one of the use cases.
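The straight-line check can be sketched with the haversine (great-circle) formula. This is a minimal sketch under assumptions that are not in the source: `Conductor.length` is taken to be in kilometres, coordinates are WGS84 degrees already cast from strings to floats, and the `tolerance` factor and function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_suspicious(length_km, start, end, tolerance=0.95):
    """True when the stated length is shorter than tolerance x the straight-line distance.

    `start` and `end` are (latitude, longitude) tuples of the two stations.
    """
    return length_km < tolerance * haversine_km(*start, *end)


# One degree of longitude on the equator is roughly 111 km.
print(round(haversine_km(0.0, 0.0, 0.0, 1.0), 1))
print(is_suspicious(50.0, (0.0, 0.0), (0.0, 1.0)))
```

What counts as "significantly shorter" is a modeling decision; the tolerance here only absorbs small coordinate inaccuracies.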
Future extension if we have precise Line geometries:
- Find the hottest point on a line
- Take into account the terrain, elevation, and weather (wind, storms)
list AC and DC lines that traverse "adjacent bidding zones"
- This overlaps with Demo1 Q1.4 AC Lines Crossing SubGeographicalRegions.
- It also depends on Connection Questions
- "Which switches are in series with each other?"
- Elaboration: "Which switches are 'glued' with one or several ConnectivityNodes where each of the ConnectivityNodes does not have anything else connected?"
- More complex variant: "List all Switches in Kristiansand in series with the line going to Arendal"
- Elaboration: "List Switches (Breakers and Disconnectors) in Substation Kristiansand connected to the Bay going to Arendal (Kristiansand 300 Arendal 1 Bay)"
- "Is the line Kristian-Arendal normally supplied by BusbarSection A or BusbarSection B"
Depends on Connection Questions
TODO (wait for Statnett to confirm until 2 May): work out the SPARQL for these questions.
Svein: the Topology works this out. Don't prioritize these questions. But further discussion brought them out as interesting.
for one of those lines, look up all relevant operational limit sets (not only for the AC Line, but also including end-components on each side) and compute the resulting limit
As a Power System Engineer I would like to request the operational limits that are associated with components along an AC or DC line (identified by its name or unique ID) so that I later can process the limits and find the total limit by applying the appropriate function
This question has several parts, and each one needs to be elaborated:
- "for one of those lines": the lines can be selected by any method, so this needs to be handled as a follow-up question in the conversation.
But not the usual LLM follow-up where we rely on LLM's memory: instead we need to:
- Save the full and precise result-set from the previous question
- OR issue the previous question as a subquery
- OR issue the two queries in sequence, and use VALUES to pass the first result-set into the second query
- Task "Threading" of queries
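The third option (run the two queries in sequence and pass the first result-set via `VALUES`) could look like the sketch below. The helper names are hypothetical, the IRIs are placeholders, and `cim:OperationalLimitSet.Equipment` is used as an assumed link between limit sets and equipment; the actual property path still needs to be worked out.

```python
def values_clause(var, iris):
    """Render `VALUES ?var { <iri1> <iri2> ... }` from the IRIs of a previous result-set."""
    for iri in iris:
        # Reject characters that are not allowed inside a SPARQL IRI reference.
        if any(ch in iri for ch in '<>" {}|\\^`'):
            raise ValueError(f"not a safe IRI: {iri!r}")
    return f"VALUES ?{var} {{ " + " ".join(f"<{iri}>" for iri in iris) + " }"


def limits_query(equipment_iris):
    """Second query in the thread: limit sets for the equipment selected by the first query."""
    return (
        "PREFIX cim: <http://iec.ch/TC57/2013/CIM-schema-cim16#>\n"
        "SELECT ?equipment ?limitSet {\n"
        f"  {values_clause('equipment', equipment_iris)}\n"
        "  ?limitSet cim:OperationalLimitSet.Equipment ?equipment .\n"
        "}"
    )


# Placeholder IRIs standing in for the saved result-set of the previous question.
print(limits_query(["urn:uuid:a", "urn:uuid:b"]))
```

Passing IRIs instead of names keeps the follow-up precise even when several lines share a name.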
- "Relevant operational limit sets": how do you define "relevant"?
  - Several options: List the `OperationalLimitTypes` you find and ask the user to follow up with which one(s) they want, or just make a decision and tell the user which one was used.
- "But also including end-components on each side": Lines only have Segments. Do you mean components (equipments) in the connected substations (eg Switches/Breakers), or something else?
  - Yes. For many purposes, users consider the switchgear (Bay components like switches and current transformers) at either side of the `cim:Line` to be part of the line. I think the more interesting point here is that there are often several possible interpretations and the system needs to communicate how it decided to interpret any ambiguity, and be able to retry if the user asks for a different interpretation.
  - This was central to the original request this example question was based on, as the analyst wanted to know whether switchgear caused bottlenecks in the transmission system.
- "Compute the resulting limit using the appropriate function":
  - How to compute, is it `min, sum` or some other operation? Or does it depend on limit kind?
    - I implicitly meant `CurrentLimit`. (See above about asking for clarification.)
    - `CurrentLimits` are computed through some formulas depending on the connectivity.
      - If the components are attached in series ("like a string of Christmas lights") you just select the smallest number.
      - For the other case, you need to consider `r` and `x` to compute how the current is distributed between parallel segments.
    - It could be easier to export the relevant entities and call an external function like `computeCurrentLimit(startComponent, endComponent, components, limitType)`. See Code Generation for details
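The two cases above can be illustrated with a small calculation. This is a simplified sketch, not the formula Statnett uses: it assumes an ideal current divider between parallel branches based only on `r` and `x`, ignores temperature and everything else, and the function names are hypothetical (they are not the `computeCurrentLimit` function mentioned above).

```python
def series_limit(limits):
    """Components in series carry the same current, so the smallest limit wins."""
    return min(limits)


def parallel_limit(branches):
    """Largest total current such that no parallel branch exceeds its own limit.

    `branches` is a list of (r, x, limit) tuples. The total current is assumed to
    split in proportion to branch admittance 1 / (r + jx).
    """
    admittances = [1 / complex(r, x) for r, x, _ in branches]
    total = sum(admittances)
    # Branch k carries the fraction |y_k / total| of the total current.
    return min(
        limit / abs(y / total)
        for y, (_, _, limit) in zip(admittances, branches)
    )


print(series_limit([1000.0, 800.0, 1200.0]))
print(parallel_limit([(1.0, 2.0, 500.0), (1.0, 2.0, 400.0)]))
```

With two identical parallel branches each carries half of the current, so the branch with the lower limit caps the total at twice its own limit.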
Limits are temperature-dependent. You can have a 20% difference depending on the weather (the colder the weather, the better for the grid).
Where to learn about OperationalLimits?
- CIM Primer doesn't have material on it
- Maybe we can just ask ChatGPT?
- Count substations by Bidding Zone
- TODO: What types of compensation are there (active, reactive power; related to shunts)