Talk2PowerSystem Chat - statnett/Talk2PowerSystem GitHub Wiki

Implementation

General Overview

Talk2PowerSystem Chat is implemented using LangGraph and ReAct agents. LangGraph is part of the LangChain ecosystem. LangChain is a popular open-source Python framework that enables developers to quickly implement LLM applications. The LangGraph ReAct agents implementation allows for easier switching between LLM models (open-source or commercial services), making the Talk2PowerSystem Chat LLM-agnostic.

The Talk2PowerSystem ReAct agent has access to the following tools:

sparql_query: executes SPARQL queries generated by the agent.
autocomplete_search: uses GraphDB Autocomplete index to search by name and class for IRIs of named entities mentioned in the users' questions.
sample_sparql_queries (or N-Shot tool): given a user question, the tool fetches similar questions, indexed in a vector store, and their corresponding SPARQL queries, and provides them as examples to the LLM agent.
retrieve_time_series: retrieves time series from Cognite
retrieve_data_points: retrieves datapoints for one or more time series from Cognite.
now_tool: returns the current UTC date time in yyyy-mm-ddTHH:MM:SS format.

History vs memory

History keeps all messages between the user and AI intact. History is what the user sees in the UI. It represents what was actually said. Memory keeps some information, which is presented to the LLM to make it behave as if it "remembers" the conversation. Memory is quite different from history. Depending on the memory algorithm used, it can modify history in various ways: evict some messages, summarize multiple messages, summarize separate messages, remove unimportant details from messages, inject extra information (e.g., for RAG) or instructions (e.g., for structured outputs) into messages, and so on. Memory can also be shared between users.

The chat bot currently offers only "memory", which is conversation-based, but not "history".

Tools

SPARQL Query tool

The tool executes SPARQL queries generated by the agent. By default, before sending the generated query to the server, the tool performs some checks:

Validates that it is a valid SPARQL query (SELECT, ASK, CONSTRUCT or DESCRIBE: not an update), in terms of SPARQL syntax.
Obtains the known prefixes for the given repository from GraphDB's /repositories/{repositoryID}/namespaces endpoint.
If the generated query contains prefix definitions that mismatch the known prefixes, they are corrected.
If the query contains prefixes that are not defined in the query, but are in the known prefixes of the repository, the prefixes are automatically added.
For all IRIs in the query the tool checks whether they are stored in the repository by using the special predicate http://www.ontotext.com/owlim/entity#id (this is a very fast way to check for the existence of a resource node).
- An exception is made for the predicates starting with http://www.w3.org/2001/XMLSchema#, http://www.openrdf.org/schema/sesame#, http://www.ontotext.com/owlim/RDFRank# and http://www.ontotext.com/plugins/autocomplete#.

The ontology schema definition is added to the agent instructions upon creation of the agent. If the ontology schema changes, the agent must be re-created / re-initialized. The schema is configured using a turtle file, which is the output of the procedure described here.

Autocomplete search tool

The tool provides means to the LLM agent to identify named entities mentioned in the users' question. The implementation uses the GraphDB Autocomplete index whereas the LLM can search by name and class. To identify which properties to include in the autocomplete index, we use the following query

SELECT DISTINCT ?predicate (COUNT (DISTINCT ?object) AS ?uniq) {
    ?subject a cim:IdentifiedObject;
    ?predicate ?object .
    FILTER (DATATYPE(?object) = xsd:string)
}
GROUP BY ?predicate
ORDER BY DESC(?uniq)

and manually select the relevant ones based on the count and the use cases.

Currently in the autocomplete index we add the following properties[4]:

cim:IdentifiedObject.name
cim:IdentifiedObject.aliasName
cim:CoordinateSystem.crsUrn

Note: Talk2PowerSystem uses the CIM ontology, in which about 90% of instances inherit from cim:IdentifiedObject. You can see this in GraphDB> Explore> Class Hierarchy, e.g. here's the one for Nordic44:

Resources without name and mRID include enumerations and simpler "value objects" like PositionPoint, DiagramPositionPoint, SvVoltage, etc

Sample SPARQL queries (or N-Shot) tool

We index all questions from all templates from the train and dev splits of the Q&A dataset into a vector store using GraphDB Retrieval Connector plugin.The parameters in the questions are replaced with placeholders. For example, $ObjectIdentity(0, cim:SubGeographicalRegion) is replaced with <SubGeographicalRegion> and $ValueFilter(cim:GeneratingUnit, cim:GeneratingUnit.maxOperatingP, xsd:float) - with <float>

Retrieve time series tool

The time series can be filtered using the mrid argument. The value can be a single mrid or a list of multiple mrids. The time series can also be limited using the limit parameter with default value of 25. Limit of -1 means to return all time series.

Retrieve datapoints tool

The external_id is a mandatory argument, and it value must be a single external ID or a list of multiple IDs. start and end arguments are used to filter datapoints based on the timestamp. aggregates and granularity allow for retrieving of aggregation statistics over the datapoints. limit can be used to limit the results, if the query is not an aggregate query.

Now tool

The tool returns the current UTC date time in yyyy-mm-ddTHH:MM:SS format. This is useful for users' questions involving some sort of temporality relative to the time the question is posed.