Talk2PowerSystem Chat - statnett/Talk2PowerSystem GitHub Wiki
Implementation
General Overview
Talk2PowerSystem Chat is implemented using LangGraph ReAct agents. LangGraph is part of the LangChain ecosystem; LangChain is a popular open-source Python framework for quickly building LLM applications. The LangGraph ReAct agent implementation makes it easy to switch between LLMs (open-source models or commercial services), which keeps Talk2PowerSystem Chat LLM-agnostic.
The Talk2PowerSystem ReAct agent has access to the following tools:
- sparql_query_tool: executes SPARQL queries generated by the agent.
- autocomplete_search_tool: uses the GraphDB Autocomplete index to search by name and class for IRIs of named entities mentioned in the users' questions.
- now_tool: returns the current UTC date time in yyyy-mm-ddTHH:MM:SS format.
History vs memory
History keeps all messages between the user and AI intact. History is what the user sees in the UI. It represents what was actually said. Memory keeps some information, which is presented to the LLM to make it behave as if it "remembers" the conversation. Memory is quite different from history. Depending on the memory algorithm used, it can modify history in various ways: evict some messages, summarize multiple messages, summarize separate messages, remove unimportant details from messages, inject extra information (e.g., for RAG) or instructions (e.g., for structured outputs) into messages, and so on. Memory can also be shared between users.
The chat bot currently offers only "memory", which is conversation-based, but not "history".
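To make the distinction concrete, here is a minimal sketch that assumes the simplest possible memory algorithm, a sliding window that evicts older messages. The Conversation class and its max_memory_messages parameter are hypothetical names used for illustration only; the chat bot's actual memory strategy is not described here and may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """Keeps the full history and derives a bounded memory from it."""

    max_memory_messages: int = 4  # hypothetical window size
    history: list = field(default_factory=list)

    def add(self, role: str, text: str) -> None:
        # History is append-only: it records what was actually said.
        self.history.append((role, text))

    def memory(self) -> list:
        # Memory is what the LLM is shown. Here it is just the most
        # recent messages, so older ones are evicted from the prompt.
        return self.history[-self.max_memory_messages:]


conv = Conversation(max_memory_messages=2)
conv.add("user", "List all substations.")
conv.add("ai", "Here is the list of substations.")
conv.add("user", "Which of them have more than two voltage levels?")

print(len(conv.history))   # all three messages are kept in history
print(len(conv.memory()))  # only the last two are presented to the LLM
```

Other memory algorithms (summarization, injection of extra context) would replace the body of memory() while leaving history untouched.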
Tools
SPARQL Query tool
The tool executes SPARQL queries generated by the agent. By default, before sending the generated query to the server, the tool performs some checks:
- Validates that it is a syntactically valid SPARQL query (SELECT, ASK, CONSTRUCT or DESCRIBE, not an update).
- Obtains the known prefixes for the given repository from GraphDB's /repositories/{repositoryID}/namespaces endpoint.
- If the generated query contains prefix definitions that mismatch the known prefixes, they are corrected.
- If the query contains prefixes that are not defined in the query, but are in the known prefixes of the repository, the prefixes are automatically added.
- For all IRIs in the query, the tool checks whether they are stored in the repository by using the special predicate http://www.ontotext.com/owlim/entity#id (this is a very fast way to check for the existence of a resource node).
  - An exception is made for datatypes (starting with http://www.w3.org/2001/XMLSchema#) and GraphDB "magic" predicates (starting with http://www.ontotext.com/owlim/RDFRank# or http://www.ontotext.com/plugins/autocomplete#).
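The prefix handling and the IRI exemption described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the tool's actual code: normalize_prefixes and needs_existence_check are hypothetical names, the regular expressions are deliberately simplistic (a real implementation would more likely rely on a SPARQL parser), and the cim namespace in the example is a placeholder.

```python
import re

# Namespaces exempt from the existence check, as listed above.
EXEMPT_NAMESPACES = (
    "http://www.w3.org/2001/XMLSchema#",
    "http://www.ontotext.com/owlim/RDFRank#",
    "http://www.ontotext.com/plugins/autocomplete#",
)

PREFIX_DECL = re.compile(r"PREFIX\s+([A-Za-z][\w-]*):\s*<([^>]*)>\s*", re.IGNORECASE)
PREFIXED_NAME = re.compile(r"(?<![\w:])([A-Za-z][\w-]*):[A-Za-z_]")


def normalize_prefixes(query: str, known: dict) -> str:
    """Correct mismatching PREFIX declarations and add missing known ones."""
    declared = dict(PREFIX_DECL.findall(query))
    body = PREFIX_DECL.sub("", query)
    # Ignore full IRIs and string literals when looking for prefixed names.
    scan = re.sub(r'<[^>]*>|"[^"]*"', " ", body)
    used = sorted(set(PREFIXED_NAME.findall(scan)))
    lines = []
    for prefix in used:
        # The repository's known namespace wins over the query's declaration.
        iri = known.get(prefix, declared.get(prefix))
        if iri is not None:
            lines.append(f"PREFIX {prefix}: <{iri}>")
    return "\n".join(lines + [body.strip()])


def needs_existence_check(iri: str) -> bool:
    """Datatypes and GraphDB 'magic' predicates are not looked up."""
    return not iri.startswith(EXEMPT_NAMESPACES)


# Placeholder namespaces, standing in for the repository's /namespaces response.
known = {
    "cim": "http://example.org/cim#",  # placeholder, not the real CIM namespace
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}
query = """PREFIX cim: <http://wrong.example/cim#>
SELECT ?name WHERE {
  ?s a cim:Substation ;
     rdfs:label ?name .
  FILTER (DATATYPE(?name) = xsd:string)
}"""
fixed = normalize_prefixes(query, known)
print(fixed)
```

In this example the wrong cim declaration is replaced and the missing rdfs and xsd declarations are added before the query body.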
The ontology schema definition is added to the agent instructions when the agent is created.
If the ontology schema changes, the agent must be re-created / re-initialized.
The schema is configured using a SPARQL CONSTRUCT query that returns descriptions of the ontology terms actually used in the KG, together with the enumeration values in use (instances of a class that has cims:stereotype uml:enumeration).
In addition to these enumeration values, string enumerations are added to the prompt: property values that are modeled as strings but have a small set of possible values (up to a hundred) [1]. To identify such properties, we use the following query
SELECT DISTINCT ?predicate (COUNT(DISTINCT ?object) AS ?uniq) {
    ?subject a cim:IdentifiedObject ;
        ?predicate ?object .
    FILTER (DATATYPE(?object) = xsd:string)
}
GROUP BY ?predicate
ORDER BY DESC(?uniq)
and manually select the relevant ones based on the count and the use cases.
Currently, the following properties are treated as string enumerations:
- cim:Measurement.measurementType
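As a rough illustration of the first, mechanical part of that selection, the sketch below filters the query result by the "up to a hundred" threshold to produce a shortlist for manual review. The function name and the sample counts are hypothetical and made up for the example; they are not real results from the knowledge graph.

```python
def candidate_string_enumerations(rows, max_unique=100):
    """Return predicates with a small number of distinct string values.

    rows: iterable of (predicate, distinct_value_count) pairs, shaped like
    the ?predicate / ?uniq bindings returned by the query above.
    """
    return [predicate for predicate, uniq in rows if uniq <= max_unique]


# Hypothetical counts, for illustration only.
rows = [
    ("cim:IdentifiedObject.name", 5000),
    ("cim:Measurement.measurementType", 12),
]
shortlist = candidate_string_enumerations(rows)
print(shortlist)
```

The shortlist is only a starting point: the final choice still depends on the use cases, as noted above.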
Autocomplete search tool
The tool gives the LLM agent a way to identify named entities mentioned in the user's question.
The implementation uses the GraphDB Autocomplete index, which the LLM can search by name and class.
To identify which properties to include in the autocomplete index, we use the same procedure as for the string enumerations [1].
Currently, we add the following properties to the autocomplete index [4]:
- cim:IdentifiedObject.name
- cim:IdentifiedObject.aliasName
- cim:CoordinateSystem.crsUrn
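A search by name and class could be expressed as a SPARQL query along the following lines. This is a sketch under assumptions: it assumes the index is queried through the query predicate in the http://www.ontotext.com/plugins/autocomplete# namespace, build_autocomplete_query is a hypothetical helper, and the class IRI in the example is a placeholder. The query the tool actually sends may be different.

```python
from typing import Optional

# Assumed autocomplete "magic" predicate; verify against the GraphDB version in use.
AUTOCOMPLETE_QUERY = "http://www.ontotext.com/plugins/autocomplete#query"


def build_autocomplete_query(term: str, class_iri: Optional[str] = None, limit: int = 10) -> str:
    """Build a SPARQL query that searches the autocomplete index for `term`."""
    # Escape characters that would otherwise break the SPARQL string literal.
    escaped = term.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    patterns = [f'?entity <{AUTOCOMPLETE_QUERY}> "{escaped}" .']
    if class_iri is not None:
        # Optionally restrict the matches to instances of one class.
        patterns.append(f"?entity a <{class_iri}> .")
    body = "\n  ".join(patterns)
    return f"SELECT ?entity WHERE {{\n  {body}\n}}\nLIMIT {limit}"


# Placeholder class IRI, for illustration only.
q = build_autocomplete_query('Oslo "north"', "http://example.org/cim#Substation")
print(q)
```

Escaping the search term matters because it comes from the user's question and is embedded into the query text.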
Note: Talk2PowerSystem uses the CIM ontology, in which about 90% of instances inherit from cim:IdentifiedObject.
You can see this in GraphDB > Explore > Class Hierarchy, e.g. for Nordic44.
Resources without name and mRID include enumerations and simpler "value objects" like PositionPoint, DiagramPositionPoint, SvVoltage, etc.
Now tool
The tool returns the current UTC date time in yyyy-mm-ddTHH:MM:SS format.
This is useful for questions that refer to a time relative to the moment the question is asked (e.g., "today" or "in the last hour").
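A minimal sketch of such a tool, using only the Python standard library, might look like this. The function name now is an assumption made for the example, and the way the function is registered as an agent tool is omitted.

```python
from datetime import datetime, timezone


def now() -> str:
    """Return the current UTC date time as yyyy-mm-ddTHH:MM:SS."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")


timestamp = now()
print(timestamp)
```

Taking the time explicitly in UTC avoids depending on the server's local time zone.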
Relevant issues
[1] https://github.com/statnett/Talk2PowerSystem_PM/issues/71
[2] https://github.com/statnett/Talk2PowerSystem_PM/issues/70
[3] https://github.com/statnett/Talk2PowerSystem_PM/issues/69
[4] https://github.com/statnett/Talk2PowerSystem_PM/issues/68