Demo0 - statnett/Talk2PowerSystem GitHub Wiki
Our offer sec 3.3 describes a "CIM demo":
- Internal project https://gitlab.ontotext.com/sol/demonstrators/statnett that
- It's set up with Nordic44 data (see Datasets) and a GraphQL schema over "half" of the CIM ontologies.
- The data is old and patched up "manually" because that was done before Inst4CIM-KG/rdf-improved
- The ontology is also old because it was taken before Inst4CIM-KG/rdfs-improved
- It doesn't have any integration between GraphQL and LLM: instead I talked with ChatGPT and pasted queries to `graphiql` manually.
- Interactive map locations-show.html (added 2025-03-06)
- GraphDB: https://cim.ontotext.com/graphdb/ user `readonly`, password123456789
- SPARQL editor: https://cim.ontotext.com/graphdb/sparql
- SPARQL endpoint: https://cim.ontotext.com/graphdb/repositories/cim
- Ontotext Platform Workbench: https://cim.ontotext.com/workbench/
- Ontotext Platform GraphQL Playground: https://cim.ontotext.com/workbench/graphql
- Ontotext Platform Workbench Search: https://cim.ontotext.com/workbench/search
- Kibana: https://cim.ontotext.com/kibana/ (charts/maps)
Note: GraphDB-Elastic synchronization is currently disabled.
Description from offer:
This is a demo of SPARQL, GraphQL and LLM querying with CIM data. We think that it is fundamentally important for this project since it shows some pragmatic and innovative approaches for working with CIM data and simplified querying using GraphQL. We would be grateful for a chance to show you an online demo in an interview.
We used the Nordic44 grid model: `Nordic44_CGM_v3_7`, which uses ontology version `CGMES 2.4.15` (2020-09-04) in RDFS.
We wanted to show how CIM can be queried with GraphQL, so we converted the ontologies to Semantic Object schema (SOML) by using the open source owl2soml tool. Since some aspects of CIM ontologies are unusual or not pure RDFS, we made various changes and extensions to the owl2soml generator (see CIM-additions.md for details):
- Converted `cims:multiplicity` to `min..max` (normally `owl:min/maxCardinality` is used, and `owl:FunctionalProperty` for `max=1`)
- Converted `cims:inverseRoleName` to `inverseOf` (normally `owl:inverseOf` is used)
- Handled `rdfs:comment` with datatype `^^rdf:XMLLiteral`
- Mapped custom datatypes to standard XSD datatypes. This pertains to both:
  - CIM Primitive types: eg `cim:Float` maps to `xsd:float`
  - CIM Non-primitive types (having `cims:stereotype "CIMDatatype"`): eg `cim:ActivePower` maps to `cim:Float` and thereon to `xsd:float`
- Allowed prop names to start with capital letter and include a dot
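The multiplicity and datatype conversions listed above can be illustrated with a minimal Python sketch. It is not the owl2soml code: it assumes multiplicity values whose local name looks like `M:0..1`, `M:1` or `M:1..n`, and the helper names `parse_multiplicity` and `resolve_datatype` are hypothetical.

```python
# Illustrative sketch only, not the actual owl2soml implementation.

# Assumed excerpt of the datatype mapping: CIM datatype -> its value type.
DATATYPE_MAP = {
    "cim:ActivePower": "cim:Float",  # non-primitive (stereotype "CIMDatatype")
    "cim:Float": "xsd:float",        # primitive
}


def parse_multiplicity(value: str) -> tuple[int, float]:
    """Turn a value such as 'M:0..1', 'M:1' or 'M:1..n' into (min, max)."""
    body = value.split("M:", 1)[1]
    low, _, high = body.partition("..")
    if not high:  # 'M:1' means exactly one
        high = low
    max_count = float("inf") if high == "n" else int(high)
    return int(low), max_count


def resolve_datatype(name: str) -> str:
    """Follow the mapping chain until no further mapping applies."""
    while name in DATATYPE_MAP:
        name = DATATYPE_MAP[name]
    return name
```

Here `resolve_datatype("cim:ActivePower")` walks `cim:ActivePower` to `cim:Float` and then to `xsd:float`, mirroring the two-step mapping described in the list.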
We initiated a discussion in the CIM Semantics WG about shortening prop names to remove the domain class name prefix. See analysis and details in CIM-shorten.md.
The key question is whether props with the same local name but different class prefix may be in "conflict", i.e. have different semantics or range, or can appear together in the same node (see CIM-shorten-potential-conflict.csv). It was confirmed that no such conflicts exist. For example, these 4 props mean the same, have the same range, and they cannot be used on the same node (i.e. the 4 mentioned classes are disjoint):
- cim:ConductingEquipment.BaseVoltage
- cim:TopologicalNode.BaseVoltage
- cim:TransformerEnd.BaseVoltage
- cim:VoltageLevel.BaseVoltage
Shortened prop names will make it easier for both humans and LLMs to write GraphQL queries. Please note that we propose this only for the SOML layer, not for the RDFS ontology and RDF data because changing those would break CIM backward compatibility: so unfortunately SPARQL cannot benefit from such simplification. Such shortening is still to be implemented in owl2soml.
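As a rough illustration of the naming scheme and of one part of the conflict check, here is a small Python sketch. It only compares ranges (not semantics or class disjointness), the range `cim:BaseVoltage` for the four props is an assumption, and the function names are hypothetical; this is not how owl2soml is implemented.

```python
from collections import defaultdict

# Illustrative input: the four props from the text; the range is assumed.
PROP_RANGES = {
    "cim:ConductingEquipment.BaseVoltage": "cim:BaseVoltage",
    "cim:TopologicalNode.BaseVoltage": "cim:BaseVoltage",
    "cim:TransformerEnd.BaseVoltage": "cim:BaseVoltage",
    "cim:VoltageLevel.BaseVoltage": "cim:BaseVoltage",
}


def soml_name(cim_prop: str) -> str:
    """Current naming: class prefix kept, first letter lowercased, dot -> underscore."""
    local = cim_prop.split(":", 1)[1]
    cls, prop = local.split(".", 1)
    return f"{cls[0].lower()}{cls[1:]}_{prop}"


def short_name(cim_prop: str) -> str:
    """Proposed naming: class prefix dropped, first letter lowercased."""
    prop = cim_prop.split(".", 1)[1]
    return prop[0].lower() + prop[1:]


def find_range_conflicts(prop_ranges: dict[str, str]) -> dict[str, set[str]]:
    """Return short names whose source props disagree on range."""
    ranges = defaultdict(set)
    for prop, rng in prop_ranges.items():
        ranges[short_name(prop)].add(rng)
    return {name: rngs for name, rngs in ranges.items() if len(rngs) > 1}
```

With the sample input, all four props collapse to `baseVoltage` and `find_range_conflicts` returns an empty dict, since they share one range.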
So currently one needs to write a GraphQL query like this to fetch AC line segments and substations that have location: with name, description, location points and sequence, and voltage level:
    query psrWithLocationPointsAndVoltage {
      aCLineSegment(where: { powerSystemResource_Location: {} }) {
        identifiedObject_name
        identifiedObject_description
        powerSystemResource_Location {
          location_PositionPoints(orderBy: { positionPoint_sequenceNumber: ASC }) {
            positionPoint_xPosition
            positionPoint_yPosition
            positionPoint_sequenceNumber
          }
        }
        conductingEquipment_BaseVoltage {
          baseVoltage_nominalVoltage
        }
      }
      substation(where: { powerSystemResource_Location: {} }) {
        identifiedObject_name
        powerSystemResource_Location {
          location_PositionPoints {
            positionPoint_xPosition
            positionPoint_yPosition
          }
        }
      }
    }
After we implement the shortening, one can write the query in a simplified form:
    query psrWithLocationPointsAndVoltage {
      aCLineSegment(where: { location: {} }) {
        name
        description
        location {
          positionPoints(orderBy: { sequenceNumber: ASC }) {
            xPosition
            yPosition
            sequenceNumber
          }
        }
        baseVoltage {
          nominalVoltage
        }
      }
      substation(where: { location: {} }) {
        name
        location {
          positionPoints {
            xPosition
            yPosition
          }
        }
      }
    }
This is considerably easier for an LLM to generate because it's more natural, and the LLM doesn't need to remember which abstract superclass to use in the prop name (eg identifiedObject for name, conductingEquipment for baseVoltage, and powerSystemResource for location).
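To make that burden concrete, the following Python sketch shows the lookup a query writer (human or LLM) currently has to do in their head: walk up the class hierarchy to find which superclass declares a prop, then build the prefixed field name. The two dictionaries are a small hand-made excerpt (class names without the `Interface` suffix), and `prefixed_name` is a hypothetical helper, not part of any Ontotext tool.

```python
# Parent of each class, following the chain for ACLineSegment (excerpt).
PARENT = {
    "ACLineSegment": "Conductor",
    "Conductor": "ConductingEquipment",
    "ConductingEquipment": "Equipment",
    "Equipment": "PowerSystemResource",
    "PowerSystemResource": "IdentifiedObject",
}

# Props declared directly on each class (excerpt).
DECLARED = {
    "IdentifiedObject": {"name", "description"},
    "PowerSystemResource": {"Location"},
    "ConductingEquipment": {"BaseVoltage"},
}


def prefixed_name(cls: str, prop: str) -> str:
    """Find the (super)class that declares `prop` and build the GraphQL field name."""
    current = cls
    while current is not None:
        if prop in DECLARED.get(current, set()):
            return f"{current[0].lower()}{current[1:]}_{prop}"
        current = PARENT.get(current)
    raise KeyError(f"{prop!r} is not declared on {cls} or its superclasses")
```

For `ACLineSegment`, the three props used in the query resolve to three different superclass prefixes, which is exactly what the shortened names would make unnecessary.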
CIM is a complex but straightforward ontology, so it takes some getting used to in order to write SPARQL queries. The Nordic 44 data is relatively small as KGs go, so all queries ran very fast (we wrote about 15 queries).
Writing queries to explore or count the data is easy.
- Most Connected ConnectivityNodes
Find the 5 "most connected" ConnectivityNodes by number of Terminals:
    select ?conn (count(*) as ?terminals) {
      ?conn a cim:ConnectivityNode;
        ^cim:Terminal.ConnectivityNode ?terminal
    } group by ?conn order by desc(?terminals) limit 5
- Details of Most Connected ConnectivityNode
Which 21 pieces of equipment are connected to the largest ConnectivityNode?
    select * {
      <http://www.Statnett.no/IGM/Nordic44_CGM#_f17696ad-9aeb-11e5-91da-b8763fd99c5f>
        ^cim:Terminal.ConnectivityNode / cim:Terminal.ConductingEquipment ?eq.
      ?eq cim:IdentifiedObject.name ?name; sesame:directType ?type
    } order by ?name
- Disconnector and What it Disconnects.
Let's consider the "KRISTIAN300AR1 BD_S" disconnector:
    select ?terminal1 ?terminal2 ?connNode ?equipment ?eq_type {
      ?disconnect cim:IdentifiedObject.name "KRISTIAN300AR1 BD_S".
      ?disconnect ^cim:Terminal.ConductingEquipment ?term1.
      ?term1 cim:IdentifiedObject.name ?terminal1.
      ?term1 cim:Terminal.ConnectivityNode ?conn.
      optional {?conn cim:IdentifiedObject.name ?connNode}
      ?conn ^cim:Terminal.ConnectivityNode ?term2.
      filter(?term1 != ?term2)
      ?term2 cim:IdentifiedObject.name ?terminal2.
      ?term2 cim:Terminal.ConductingEquipment ?eq.
      ?eq cim:IdentifiedObject.name ?equipment.
      ?eq sesame:directType ?eq_type
    }
terminal1 | terminal2 | connNode | equipment | eq_type |
---|---|---|---|---|
KRISTIAN300AR1 BD_S T1 | 300KRISTIAN-ARENDAL_T1 | KRISTIAN CN 024 | 300KRISTIAN-ARENDAL | cim:ACLineSegment |
KRISTIAN300AR1 BD_S T1 | KRISTIAN300AR1 AD_S T2 | KRISTIAN CN 024 | KRISTIAN300AR1 AD_S | cim:Disconnector |
KRISTIAN300AR1 BD_S T2 | KRISTIAN300AR1 BB_S T1 | KRISTIAN CN 032 | KRISTIAN300AR1 BB_S | cim:Breaker |
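Queries like the ones above can also be sent to the SPARQL endpoint from a script. The sketch below uses only the Python standard library and the standard SPARQL 1.1 Protocol (form-encoded POST, JSON results). The endpoint URL and the `cim:` prefix are taken from this page; authentication is not handled, and whether the endpoint accepts anonymous requests is an assumption.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://cim.ontotext.com/graphdb/repositories/cim"

QUERY = """
PREFIX cim: <http://iec.ch/TC57/2013/CIM-schema-cim16#>
select ?conn (count(*) as ?terminals) {
  ?conn a cim:ConnectivityNode;
    ^cim:Terminal.ConnectivityNode ?terminal
} group by ?conn order by desc(?terminals) limit 5
"""


def build_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol POST request that asks for JSON results."""
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    headers = {
        "Accept": "application/sparql-results+json",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return urllib.request.Request(endpoint, data=data, headers=headers)


def run_query(query: str) -> list[dict]:
    """Send the query and return the result bindings (needs network access)."""
    with urllib.request.urlopen(build_request(query)) as response:
        return json.load(response)["results"]["bindings"]
```

Calling `run_query(QUERY)` would return one binding per ConnectivityNode; building the request is kept separate so it can be inspected without touching the network.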
Now let's try some geospatial queries. Please note that the CIM class hierarchy is deep (many abstract superclasses), so if RDFS reasoning is enabled, one should use the sesame:directType predicate to fetch only directly asserted types.
- Types of PowerSystemResource with Location
    select ?type (count(*) as ?c) {
      ?loc a cim:Location; cim:Location.PowerSystemResources ?res.
      ?res sesame:directType ?type
    } group by ?type order by ?type
type | c |
---|---|
cim:ACLineSegment | "68" |
cim:Substation | "44" |
- Types of PowerSystemResource with Number of Points Describing Them
    select ?type ?points (count(*) as ?c) {
      {select ?res (count(?pt) as ?points) {
        ?loc a cim:Location; cim:Location.PowerSystemResources ?res.
        ?pt cim:PositionPoint.Location ?loc.
      } group by ?res}
      ?res sesame:directType ?type
    } group by ?type ?points order by ?type ?points
type | points | c |
---|---|---|
cim:ACLineSegment | "2" | "66" |
cim:ACLineSegment | "4" | "2" |
cim:Substation | "1" | "44" |
- `Substation` is a point feature, so it's described by 1 point
- `ACLineSegment` is a linear feature, so it's described by a polyline
  - The majority are simple polylines (just 2 points, i.e. a straight line)
  - There are a couple described by more complex polylines
It is harder to fetch complex nested data of selected objects.
Let's select all data needed to make a geo chart, but only of one PowerSystemResource with complex polyline (4 points, so we also need the sequenceNumber):
    PREFIX : <http://www.Statnett.no/IGM/Nordic44_CGM#_>
    PREFIX cim: <http://iec.ch/TC57/2013/CIM-schema-cim16#>
    PREFIX sesame: <http://www.openrdf.org/schema/sesame#>
    select ?res ?type ?name ?descr ?voltage ?seq ?x ?y {
      values ?res {:f1769a94-9aeb-11e5-91da-b8763fd99c5f :f17696a8-9aeb-11e5-91da-b8763fd99c5f}
      ?res sesame:directType ?type; cim:IdentifiedObject.name ?name
      optional {?res cim:IdentifiedObject.description ?descr}
      optional {?res cim:ConductingEquipment.BaseVoltage / cim:IdentifiedObject.name ?voltage}
      ?loc cim:Location.PowerSystemResources ?res.
      ?pt cim:PositionPoint.Location ?loc;
        cim:PositionPoint.xPosition ?x;
        cim:PositionPoint.yPosition ?y;
        cim:PositionPoint.sequenceNumber ?seq
    } order by ?res ?seq
- In this case there's no Cartesian Product (there's just 1 triple of multivalued results), so we can use a simple query
- But most of the time one needs to use complex UNION clauses for each kind of object, and each multivalued property
- Compare to the GraphQL query in the previous section, which is much simpler, and gets all data of 2 kinds of objects with location info.
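A SPARQL result is a flat table, so single-valued columns (name, type) repeat on every row of a multivalued one (the points), and the client has to regroup them. The Python sketch below shows that regrouping step; the sample rows are made up for illustration and only mimic the shape of the query result, and `nest_points` is a hypothetical helper.

```python
from collections import defaultdict

# Made-up rows in the shape (res, name, seq, x, y); not real Nordic44 data.
ROWS = [
    ("res1", "Line A", "2", "9.8", "57.2"),
    ("res1", "Line A", "1", "8.7", "58.4"),
    ("res2", "Line B", "1", "10.2", "59.1"),
]


def nest_points(rows):
    """Group flat result rows into one record per resource with ordered points."""
    grouped = defaultdict(lambda: {"name": None, "points": []})
    for res, name, seq, x, y in rows:
        record = grouped[res]
        record["name"] = name  # repeated on every row of the same resource
        record["points"].append((int(seq), float(x), float(y)))
    for record in grouped.values():
        record["points"].sort()  # order by sequence number
    return dict(grouped)
```

GraphQL returns this nested shape directly, which is the main reason the GraphQL version of the query is shorter.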
We've shown a GraphQL query in section CIM Property Shortening. Here are some more. Please note some GraphQL shorthands:
- `id` is the node URI,
- `type` is rdf:type,
- `name` is `identifiedObject_name` (the prop that carries a resource's name can be declared per-type).
Queries:
- Power System Resources With Name Containing "Oslo", with Location and Position Points.
    query Oslo_PSRs {
      powerSystemResource(where: {name: {IRE: "Oslo"}}) {
        name
        type
        powerSystemResource_Location {
          location_PositionPoints {
            positionPoint_xPosition
            positionPoint_yPosition
          }
        }
      }
    }
- Substations From Region "NO2 SGR", with Nominal Voltage
    query NO2_Substations {
      substation(where:{substation_Region:{name: {EQ: "NO2 SGR"}}}) {
        id
        name
        substation_VoltageLevels {
          voltageLevel_BaseVoltage {
            baseVoltage_nominalVoltage
          }
        }
      }
    }
The Ontotext Semantic Search addon (works together with Semantic Objects) indexes CIM data in Elasticsearch. Then we use Kibana to make charts and maps. Kibana is a powerful analytics environment that works over Elastic data. Visualizations are made using the Vega-Lite declarative visualization language. Eg here is a map of substations and transmission lines, where color distinguishes the voltage and the tooltip shows basic info about these objects. The chart is coded in JSON that you can see on the right:
Now we come to the most interesting part: using LLM for CIM NLQ, by translating natural language questions to GraphQL queries.
You can try some exotic things with ChatGPT, eg have it read a UML diagram, give it some example GraphQL, and ask it to make more queries. (Omitted)
But we already have a SOML schema, which is simple YAML and much easier for an LLM to understand and remember. We generated it from the CIM ontologies (see CIM Demo above). Using "half" of the CIM/CGMES ontologies results in a 260k chars generated SOML schema; adding Elasticsearch details (which fields to index and how) adds 2k to the size. These schemas include all kinds of detail, eg:
    objects:
      AccumulatorReset:
        descr: This command reset the counter value to zero
        inherits: ControlInterface
        label: AccumulatorReset
        props:
          accumulatorReset.AccumulatorValue: {}
        type: cim:AccumulatorReset
      ControlInterface:
        descr: Abstract superclass of Control
        inherits: IdentifiedObjectInterface
        kind: abstract
        search: {nested: true}
        props:
          control.PowerSystemResource: {}
    properties:
      accumulatorReset.AccumulatorValue:
        descr: The accumulator value that is reset by the command
        inverseOf: accumulatorValue.AccumulatorReset
        kind: object
        label: AccumulatorValue
        max: 1
        min: 1
        range: AccumulatorValue
        rdfProp: cim:AccumulatorReset.AccumulatorValue
      control.PowerSystemResource:
        descr: 'The controller outputs used to...'
        inverseOf: powerSystemResource.Controls
        kind: object
        label: PowerSystemResource
        max: inf
        min: 0
        range: PowerSystemResourceInterface
        rdfProp: cim:Control.PowerSystemResource
We also made a much simplified entity schema (37k) that is reduced to the bare essentials for being able to make queries. Here's an example that corresponds to the one above:
    AccumulatorReset:
      ISA: ControlInterface
      accumulatorReset_AccumulatorValue: AccumulatorValue
    ControlInterface:
      ISA: IdentifiedObjectInterface
      control_PowerSystemResource: [PowerSystemResourceInterface]
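The reduction from the full schema to the simplified one can be sketched in a few lines of Python. This is not the actual soml-simplify script, only a guess at the core idea inferred from the two examples: keep the parent as `ISA`, replace the dot in prop names with an underscore, keep only the range, and wrap it in a list when `max` is `inf`. The input is the example above written as a Python dict, with fields the sketch does not use left out.

```python
# Illustrative sketch, not the actual soml-simplify script.
SOML = {
    "objects": {
        "AccumulatorReset": {
            "inherits": "ControlInterface",
            "props": {"accumulatorReset.AccumulatorValue": {}},
        },
        "ControlInterface": {
            "inherits": "IdentifiedObjectInterface",
            "props": {"control.PowerSystemResource": {}},
        },
    },
    "properties": {
        "accumulatorReset.AccumulatorValue": {"max": 1, "range": "AccumulatorValue"},
        "control.PowerSystemResource": {"max": "inf", "range": "PowerSystemResourceInterface"},
    },
}


def simplify(soml: dict) -> dict:
    """Reduce each object to its parent (ISA) and a prop -> range mapping."""
    simplified = {}
    for obj_name, obj in soml["objects"].items():
        entry = {}
        if "inherits" in obj:
            entry["ISA"] = obj["inherits"]
        for prop_name in obj.get("props", {}):
            prop = soml["properties"][prop_name]
            field = prop_name.replace(".", "_")
            # Multivalued props (max: inf) are shown as a one-element list.
            entry[field] = [prop["range"]] if prop.get("max") == "inf" else prop["range"]
        simplified[obj_name] = entry
    return simplified
```

Descriptions, labels, inverses and `rdfProp` are dropped because the LLM only needs names, ranges and cardinality to write a query.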
This allows us to communicate bigger schemas without overflowing the LLM's conversation buffer (token limit). A token is about 4 chars, and modern LLMs have much bigger token buffers than before, so even bigger schemas can be accommodated. But by simplifying the schema, we can feed the LLM huge schemas, and it can still keep its "attention span". The reduced schema was made with the soml-simplify script (which itself was generated by an LLM!). Now we're ready to make a system prompt, thus asking the LLM to always keep these instructions in mind:
You are Electrical CIM Navigator, a chatbot that can make GraphQL queries for the provided entity schema. Schema details:
- Object names are followed by `ISA` (parent) then props
- Props are inherited along the `ISA` hierarchy
- Each prop lists its range (literals in lowercase, objects in uppercase); arrays are indicated by square brackets
- Each object has default fields: `id` (its URL), `type` (its RDF types)
GraphQL details:
- IMPORTANT: Prop names start with lowercase
- IMPORTANT: CIM prop names are redundant, eg in `Location` there is `location_CoordinateSystem` rather than merely `coordinateSystem`
- Any object can be root of the query
- Use `where`:
  - Compare a field against constant with operations: `EQ, NE, LT, LE, GT, GE, IN` (list member), `NIN` (not list member), `RE` (regex match), `IRE` (case-insensitive regex match), `NRE` (no match), `NIRE` (case-insensitive no match)
  - Check field existence: `field: {}`
  - Logical operations: `OR, AND, NOT`
- Fetch a single object by id (URL): `ID:"<URL>"`
- Order by field: `orderBy: {<field>: <ASC/DESC>}`
- Use `limit, offset` for pagination.
- IMPORTANT: apply `limit: 3` unless asked otherwise
Entity schema:
<statnett-simplified.yaml>
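For a rough idea of how much of the context window each schema variant takes, the 4-chars-per-token rule of thumb mentioned earlier can be applied to the two schema sizes. This is back-of-the-envelope arithmetic in Python, not a real tokenizer; actual token counts depend on the model.

```python
CHARS_PER_TOKEN = 4  # rule of thumb from the text, not an exact tokenizer


def estimate_tokens(num_chars: int) -> int:
    """Approximate token count for a text of the given length."""
    return num_chars // CHARS_PER_TOKEN


full_schema_tokens = estimate_tokens(260_000)  # full SOML schema, ~260k chars
simplified_tokens = estimate_tokens(37_000)    # simplified entity schema, ~37k chars

print(full_schema_tokens, simplified_tokens)   # 65000 9250
```

So the simplified schema costs roughly 9k tokens of system prompt instead of about 65k, which leaves far more room for the conversation itself.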
Now we can ask the LLM to make queries for us. `Q:` is from the user, `A:` from the LLM, and `C:` a comment.
- A: As an Electrical CIM Navigator, I can assist you in constructing GraphQL queries based on the provided entity schema. This schema describes various electrical components and their properties, following the Common Information Model (CIM) standards. You can query for specific entities, their properties, and relationships using GraphQL.
To help you better, please provide the specific details or requirements for the query you need. For example, you might be interested in querying details about a specific type of equipment, such as `ACDCConverter` or `PowerTransformer`, and their associated properties. Let me know what specific information you are looking for, and I'll assist you in constructing the appropriate GraphQL query.
C: Ok, that's quite a flourish it shows!
- Q: I want to make a map. Which resources have coordinates?
- A: To create a map, you'll need to identify resources in the schema that are associated with geographical coordinates. In the provided entity schema, the Location object is the primary resource that would contain coordinate information. The Location object is linked to various power system resources and includes properties for coordinate systems and position points. Here's a breakdown of the relevant objects and properties:
- Location (Location):
- CoordinateSystem: Links to a CoordinateSystem object that might contain coordinate system details.
- PositionPoints: An array of PositionPoint objects that should contain the actual coordinate data (latitude, longitude, and possibly altitude).
- PositionPoint (PositionPoint):
- xPosition, yPosition, zPosition: These properties represent the coordinates (longitude, latitude, and altitude/elevation respectively).
- CoordinateSystem (CoordinateSystem):
- crsUrn: This property might provide information about the coordinate reference system used.
To extract locations and their coordinates, you would typically query the Location object and include the PositionPoints in your query to get the actual coordinate data. Here's an example of how a GraphQL query might look:
    query {
      location {
        CoordinateSystem {
          crsUrn
        }
        PositionPoints {
          xPosition
          yPosition
          zPosition
        }
      }
    }
C: All of this is correct, except that in SOML, we translate all prop names to start with lowercase (which is a commonly accepted practice)
- Q: IMPORTANT: Prop names start with lowercase
- A: Thank you for the correction. In the GraphQL query, property names should indeed start with lowercase letters. Here's the revised query to fetch locations and their coordinates:
    query {
      location {
        coordinateSystem {
          crsUrn
        }
        positionPoints {
          xPosition
          yPosition
          zPosition
        }
      }
    }
This query will retrieve each location's coordinate system details and the array of position points, which include the coordinates (longitude, latitude, and altitude/elevation). Remember, xPosition and yPosition typically represent longitude and latitude, respectively, while zPosition might represent altitude or elevation.
C: It made the very natural mistake of using short prop names rather than the overly-specific CIM convention. See CIM Property Shortening for more on this. So let's correct it:
- Q: CIM prop names are redundant, eg in `Location` there is `location_CoordinateSystem` rather than merely `coordinateSystem`. And cut down on the explanations
- A: Understood. Here's a streamlined GraphQL query to fetch locations with their coordinates, considering the naming conventions of the CIM schema:
    query {
      location {
        location_CoordinateSystem {
          crsUrn
        }
        location_PositionPoints {
          xPosition
          yPosition
          zPosition
        }
      }
    }
- Q: Refer to the schema and show me the class hierarchy 2 classes, using indentation. Start from the root, don't show props, nor the word "ISA"
- A: Certainly, here's the class hierarchy for ACLineSegment and Substation starting from the root, without properties and the word "ISA":
    IdentifiedObjectInterface
      PowerSystemResourceInterface
        EquipmentInterface
          ConductingEquipmentInterface
            ConductorInterface
              ACLineSegment
    IdentifiedObjectInterface
      PowerSystemResourceInterface
        EquipmentContainerInterface
          Substation
C: Notice that the LLM is pretty good at interpreting the schema. Note: we rename all abstract superclasses to `*Interface` to conform to GraphQL terminology.
C: I continued prompting it with guidance like this:
- Apply some limit
- Here's a corrected query: notice field capitalization
- Mind your spelling. It's `location_PositionPoints` not `location_positionPoints`
- Refer to the entity schema for the precise prop names
- What resources have location?
- Give me a query to return all resources with location, and add `__typename` so we can distinguish them
- Include the sequenceNumber so we can order the points
- Forget `zPosition`, it's never present
- Query for `ACLineSegment` and `Substation` that have Locations. Include Voltage level so we can color them.
- Forget about Voltage levels of substations since they have 2.
- Order the points by sequence
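Several of these corrections concern misspelled or wrongly capitalized prop names, which is the kind of mistake a small checker could catch before the query is run. The Python sketch below is one possible approach, not something used in the demo: it collects the props of an object along the `ISA` chain and suggests the closest known name with the standard `difflib` module. The schema excerpt and function names are hypothetical.

```python
import difflib

# Hypothetical excerpt of the simplified entity schema.
SCHEMA = {
    "IdentifiedObjectInterface": {"identifiedObject_name": "string"},
    "Location": {
        "ISA": "IdentifiedObjectInterface",
        "location_CoordinateSystem": "CoordinateSystem",
        "location_PositionPoints": ["PositionPoint"],
    },
}


def all_props(schema: dict, obj: str) -> set[str]:
    """Collect the props of `obj`, including those inherited along the ISA chain."""
    props = set()
    while obj in schema:
        props.update(key for key in schema[obj] if key != "ISA")
        obj = schema[obj].get("ISA")
    return props


def check_fields(schema: dict, obj: str, fields: list[str]) -> dict[str, list[str]]:
    """Map each unknown field name to its closest known prop name (if any)."""
    known = sorted(all_props(schema, obj))
    return {
        field: difflib.get_close_matches(field, known, n=1)
        for field in fields
        if field not in known
    }
```

For example, checking `location_positionPoints` on `Location` would flag it and suggest `location_PositionPoints`, while the inherited `identifiedObject_name` passes.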
Finally, the LLM made the correct query:
    query {
      aCLineSegment(where: { powerSystemResource_Location: {} }) {
        identifiedObject_name
        identifiedObject_description
        powerSystemResource_Location {
          location_PositionPoints(orderBy: { positionPoint_sequenceNumber: ASC }) {
            positionPoint_xPosition
            positionPoint_yPosition
            positionPoint_sequenceNumber
          }
        }
        conductingEquipment_BaseVoltage {
          baseVoltage_nominalVoltage
        }
      }
      substation(where: { powerSystemResource_Location: {} }) {
        identifiedObject_name
        powerSystemResource_Location {
          location_PositionPoints {
            positionPoint_xPosition
            positionPoint_yPosition
          }
        }
      }
    }
The response has a part about AC transmission lines and then a part about substations:
{
"data": {
"aCLineSegment": [
{
"powerSystemResource_Location": {
"location_PositionPoints": [
{
"positionPoint_xPosition": "8.76827271306773",
"positionPoint_yPosition": "58.4417958162972",
"positionPoint_sequenceNumber": 1
},
{
"positionPoint_xPosition": "9.84915638753781",
"positionPoint_yPosition": "57.2216632463631",
"positionPoint_sequenceNumber": 2
}
]
},
"conductingEquipment_BaseVoltage": {
"baseVoltage_nominalVoltage": 300
}
},
"substation": [
{
"powerSystemResource_Location": {
"location_PositionPoints": [
{
"positionPoint_xPosition": "10.2157353737795",
"positionPoint_yPosition": "59.1144170898588",
"positionPoint_sequenceNumber": 1
}
]
}
},
Q: See this sample data. Make a map. Use color for voltage levels (low: blue, high: red). Overlay over OpenStreetMap.
A: I can guide you through the process so you can create it on your own. Based on the data structure you provided, here's a step-by-step guide:
- Read the JSON Data: Load your JSON file containing the AC transmission lines and substations data.
- Create a Base Map:
- Use a mapping library like Folium in Python.
- Initialize the map with a central location and zoom level suitable for your data.
- Plot AC Transmission Lines:
- Iterate over each aCLineSegment in your data.
- Extract the positionPoint_xPosition and positionPoint_yPosition for each point in location_PositionPoints.
- Determine the color based on baseVoltage_nominalVoltage (e.g., use blue for low voltage, red for high).
- Draw lines on the map using these coordinates and the determined color.
- Plot Substations:
- Iterate over each substation in your data.
- Extract the positionPoint_xPosition and positionPoint_yPosition for the substation's location.
- Place a marker on the map at this location (you can use a different color or icon to distinguish substations).
- Finalize and Save the Map:
- Once all lines and substations are plotted, finalize the map.
- Save or export the map to an HTML file or any other desired format.
For the actual coding, you would need a basic understanding of Python and libraries like `folium` and `json`. If you need further assistance with the coding part or have specific questions about any step, feel free to ask!
C: I hack in Perl not Python, so I don't have a working Python environment. But it's easy to set one up on Replit, so I asked ChatGPT for the code and put it there. The following program was made by ChatGPT. It even picked `fa.bolt` as an icon for the substations (nice touch!). I only added the two tooltips and tweaked `Map(location=[63, 13], zoom_start=5)`:
    import json
    import folium
    # open JSON file
    with open('locations-result.json', 'r') as file:
        locations_data = json.load(file)
    # Initialize a map
    map = folium.Map(location=[63, 13], zoom_start=5)
    # Process and plot AC line segments
    for line in locations_data["data"]["aCLineSegment"]:
        points = line["powerSystemResource_Location"]["location_PositionPoints"]
        voltage = line["conductingEquipment_BaseVoltage"]["baseVoltage_nominalVoltage"]
        tooltip = f'{line["identifiedObject_name"]}: {line["identifiedObject_description"]}'
        color = 'blue' if voltage <= 300 else 'red'
        folium.PolyLine(
            [(float(p["positionPoint_yPosition"]), float(p["positionPoint_xPosition"]))
             for p in points],
            color=color,
            tooltip=tooltip).add_to(map)
    # Process and plot substations
    for sub in locations_data["data"]["substation"]:
        point = sub["powerSystemResource_Location"]["location_PositionPoints"][0]
        tooltip = sub["identifiedObject_name"]
        folium.Marker(
            location=[float(point["positionPoint_yPosition"]), float(point["positionPoint_xPosition"])],
            icon=folium.Icon(color='green', icon='bolt', prefix='fa'),
            # icon_size=(10, 10), icon_color='green': causes no bubble, so the marker appears off-point
            tooltip=tooltip).add_to(map)
    # Save the map to an HTML file
    map.save('locations-show.html')
The result is an HTML page with an interactive map. It shows tooltips over AC lines and substations and uses color for the voltage level:
Interactive map locations-show.html (added 2025-03-06)
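One detail of the script that is easy to get wrong is the coordinate order: the data carries longitude in `xPosition` and latitude in `yPosition`, while folium expects `(latitude, longitude)` pairs. The helper below isolates that step and also sorts by sequence number on the client side, which the script does not need because the query already orders the points. It is a hypothetical sketch; the sample values are the two points of the line segment in the response shown earlier, listed in reverse on purpose.

```python
def to_lat_lon(points):
    """Order position points by sequence number and return (lat, lon) pairs."""
    ordered = sorted(points, key=lambda p: p["positionPoint_sequenceNumber"])
    return [
        (float(p["positionPoint_yPosition"]), float(p["positionPoint_xPosition"]))
        for p in ordered
    ]


POINTS = [
    {
        "positionPoint_xPosition": "9.84915638753781",
        "positionPoint_yPosition": "57.2216632463631",
        "positionPoint_sequenceNumber": 2,
    },
    {
        "positionPoint_xPosition": "8.76827271306773",
        "positionPoint_yPosition": "58.4417958162972",
        "positionPoint_sequenceNumber": 1,
    },
]

polyline = to_lat_lon(POINTS)  # ready to pass to folium.PolyLine
```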
In summary, LLM helps a lot with analytical tasks, but the best results are obtained when the human and LLM work together.
- LLM needed plenty of advice and corrections from the human to fetch the right data. Some human expertise about the structure and meaning of data was needed.
- In contrast, it worked on the analytics part nearly by itself. It created the code to make a map on its own, and even selected the best libraries and approach to use (before that I asked it to try with Leaflet, Vega-Lite and d3, but I don't know enough about these approaches to guide it effectively).