K armed bandit problem - Falmouth-Games-Academy/comp250-wiki GitHub Wiki

Overview

On a dusty plain,
A many-armed bandit faces,
Their choices guns drawn

The K-armed bandit problem is described in [1]. It is a problem wherein the agent must make one of several choices. Upon making a choice, a numerical reward is given based on a stationary probability distribution. The goal is to maximise the total reward gained over a certain period. This requires an awareness of the probabilities at play, which may initially seem random. Development of this awareness can be achieved through machine learning.

Applications in Monte-Carlo

The problem partly inspired Rémi Coulom's approach to Monte-Carlo Tree Search [2]. It understands that the probability of a developing plan being the best plan is not zero; even if it seems, so far, like a bad idea. This can often be true in complex games.

For example, in an FPS, an AI agent has considered two developing choices in their plans: capture the enemy spawn point (guarded by two enemy troops), or go jump off a cliff. In certain algorithms, the "jump off a cliff" decision will be pruned from a decision tree, due to encountering the extremely unfavourable state of being dead.

However, a smart agent will evaluate this idea further despite the odds. The evaluation reveals that their respawn point is being camped by 12 rookie troops, and killing these troops would yield a higher score than capturing the original objective. Further evaluation of this initially unpromising plan might reveal that killing these troops will force them to reappear at their spawn point, leading to a final battle that will amass a bigger total score for the AI.

References

[1] R. S. Sutton and A. G. Barto, "Reinforcement learning: An introduction," 2011. Available: http://incompleteideas.net/book/bookdraft2017nov5.pdf [Accessed: 13-Feb-2019]
[2] R. Coulom, "Efficient selectivity and backup operators in monte-carlo tree search," in Proceedings of the 5th International Conference on Computers and Games, CG’06, (Berlin, Heidelberg), pp. 72–83, Springer-Verlag, 2007.