Utility Theory Crash Course - apoch/curvature GitHub Wiki

Utility AI as used in Curvature

The fundamental architectural approach to AI in Curvature is utility theory. A highly simplified view of this model is that every action an agent could take is assigned a score. Each time an agent "thinks", it picks from the top-scoring options. Most AIs simply use the highest-scoring option, but you can get some interesting variety by deliberately picking from, say, the top five scoring options with weighted randomness.
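The weighted-random variant can be sketched in a few lines of Python. This is an illustrative sketch only; the function and parameter names (`choose_action`, `top_n`) are not part of Curvature's actual API:

```python
import random

def choose_action(scored_actions, top_n=5):
    """Pick one of the top-scoring actions with score-weighted randomness.

    `scored_actions` is a list of (action, score) pairs. Higher scores
    are proportionally more likely to be chosen among the top_n.
    """
    ranked = sorted(scored_actions, key=lambda pair: pair[1], reverse=True)
    candidates = ranked[:top_n]
    actions = [action for action, _ in candidates]
    weights = [score for _, score in candidates]
    return random.choices(actions, weights=weights, k=1)[0]
```

Setting `top_n=1` collapses this back to the plain "always take the best option" policy.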

Utility-based AI is highly reactive, flexible, and robust. If the scoring method is designed carefully, the AI will always have some reasonable and convincing options for how to behave in a given situation. Moreover, the AI can take advantage of subtle distinctions based on game rules. As an example, a utility-based agent might be a bit of a pyromaniac, and find it appealing to set enemies on fire. Should an enemy be doused in flammable gasoline, the utility, or score, of setting them ablaze will increase, and the agent will correspondingly exploit the target of opportunity.

A major advantage of utility-based AI over more rigid structures is that it can always find a good decision to make even in complex and dynamic situations. This can be a double-edged sword, however. Sometimes an AI is too subtle, making choices that are logically consistent but don't "read" well to the player. One of Curvature's primary goals is to make it easier to see how the AI behaves in a controlled environment so that designers and AI programmers can polish away the rough edges of their creations.

Curvature's internal architecture is based on the Infinite Axis Utility System by Dave Mark. This system is a powerful abstraction for building scoring functions. In other words, Curvature has a standardized palette of tools for controlling exactly how the AI scores a choice. The fundamental unit of the system is the consideration. A single consideration maps an input onto a response curve and produces a score. All consideration scores are multiplied together to yield the final score for a particular behavior. Behaviors may take place in the context of a specific target (object, player, other AI NPC, etc.) or be standalone (such as playing an animation or bark).
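The consideration pipeline described above (input → response curve → score, with scores multiplied together) might look like the following sketch. The class and function names are hypothetical, assumed for illustration rather than taken from Curvature's codebase:

```python
def linear_curve(x, slope=1.0, intercept=0.0):
    """A simple linear response curve, clamped to the [0, 1] range."""
    return max(0.0, min(1.0, slope * x + intercept))

class Consideration:
    """Maps a world-state input through a response curve to a [0, 1] score."""
    def __init__(self, input_fn, curve):
        self.input_fn = input_fn   # reads a normalized value from the context
        self.curve = curve         # response curve mapping [0, 1] -> [0, 1]

    def score(self, context):
        return self.curve(self.input_fn(context))

def behavior_score(considerations, context):
    """Final score for a behavior: the product of its consideration scores."""
    total = 1.0
    for consideration in considerations:
        total *= consideration.score(context)
    return total
```

For example, a "retreat" behavior could combine a consideration that rises as health falls with one that rises with distance to cover, each shaped by its own curve.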

The behaviors available to an AI are controlled by behavior sets. Sets are a powerful mechanism for adding and removing situational behaviors from an AI agent's repertoire. For example, a baseline set might include ambient and combat behaviors, whereas a more dynamic set could add "tavern-specific" behaviors to an NPC when they enter a tavern.
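A minimal sketch of the set mechanism, assuming a simple dictionary-based data model (Curvature's actual representation differs; the names here are illustrative):

```python
class Agent:
    """Holds named behavior sets that can be enabled or disabled at runtime."""
    def __init__(self):
        self.behavior_sets = {}    # set name -> list of behavior names
        self.active_sets = set()   # names of currently enabled sets

    def add_set(self, name, behaviors):
        self.behavior_sets[name] = list(behaviors)

    def enable(self, name):
        self.active_sets.add(name)

    def disable(self, name):
        self.active_sets.discard(name)

    def available_behaviors(self):
        """All behaviors in active sets; these are scored each think cycle."""
        result = []
        for name in self.active_sets:
            result.extend(self.behavior_sets[name])
        return result
```

An NPC entering a tavern would call `enable("tavern")`, and `disable("tavern")` on leaving, swapping the situational behaviors in and out of consideration.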

Each behavior is scored by the AI system in turn. The highest-scoring behavior "wins" and becomes the action the NPC will take until the next time they "think." A behavior's score is equal to the product of all of its consideration scores. This has a unique and interesting property: if any one consideration scores zero, that behavior cannot score more than zero. In other words, considerations can invalidate a behavior at any time. This is a very useful mechanic for controlling when a specific action would be inappropriate, suboptimal, or even just excessively repetitive.
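The winner-take-all selection and the zero-score veto can be demonstrated directly. This is a sketch under the assumption that each behavior's considerations have already been evaluated to scores in [0, 1]; `pick_behavior` is an illustrative name, not Curvature's API:

```python
def pick_behavior(behaviors):
    """Select the winning behavior: the highest product of consideration scores.

    `behaviors` maps behavior name -> list of consideration scores in [0, 1].
    """
    best_name, best_score = None, 0.0
    for name, scores in behaviors.items():
        total = 1.0
        for score in scores:
            total *= score   # a single 0.0 vetoes the whole behavior
        if total > best_score:
            best_name, best_score = name, total
    return best_name, best_score
```

Here a behavior that scores well on every axis but one is still eliminated outright, exactly the invalidation property described above.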

The final piece to mention here is the momentum bonus. The last behavior to be chosen for execution gets a flat 25% bonus to its score on the next think cycle. This simple device ensures that two (or more) nearly-equal behaviors do not oscillate back and forth. Momentum is critical for handling cases like two equidistant targets - one target should be "chosen" by the AI, and then remain prioritized rather than having the agent switch between both targets constantly. By boosting the score of the last-made decision, the AI automatically develops a bit of hysteresis and preferential memory without needing an explicit representation of this "choosing" process in the knowledge layer.
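The hysteresis effect of the momentum bonus is easy to see in a small sketch. The 25% figure comes from the text; everything else (the `think` function, raw-score dictionary) is an illustrative assumption:

```python
MOMENTUM_BONUS = 1.25  # flat 25% bonus to the previously chosen behavior

def think(scored_behaviors, last_choice=None):
    """Pick the highest-scoring behavior, with hysteresis from momentum.

    `scored_behaviors` maps behavior name -> raw score from its considerations.
    """
    best_name, best_score = None, 0.0
    for name, score in scored_behaviors.items():
        if name == last_choice:
            score *= MOMENTUM_BONUS
        if score > best_score:
            best_name, best_score = name, score
    return best_name
```

With two nearly-equal targets, a small score fluctuation on the next think cycle would flip the decision without momentum; with the bonus applied, the previous choice stays in the lead and the agent keeps its focus.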

Resources for Learning Utility Theory