Monte Carlo Tree Search - Falmouth-Games-Academy/comp250-wiki GitHub Wiki

Overview

Water runs random,
Yet flows always to the sea,
Thus is fate surpassed

The Monte-Carlo Tree Search, commonly abbreviated MCTS, is a hot new family of AI algorithms. It combines state-of-the-art precision tree searching with wild random guesses [1] [2] [3]. It is particularly useful for real-time applications of AI in non-deterministic games, where the end game state is too far to predict, vaguely defined, or undefined altogether [3]. For example, Pac-Man has no end state--only two conditions to maintain: survive, and gain points [7].

MCTS became the hot new thing after its proposal by Rémi Coulomb in 2006, in his creation of an AI designed to become the next top speedrunner of the hot new Chinese computer game, Go (~1000-2000BC) [3]. It is distinguishable from other algorithms in that it is non-deterministic, can be stopped at any time, and uses random decisions to determine an efficient path [2].

The inspiration behind this approach is traceable to the 1940s, when Stanislaw Ulam proposed that a non-deterministic approach to problem-solving could be more useful in the developing era of fast computers [6]. This contrasts against deterministic methods, such as Minimax, which favour a thoroughly-planned approach and require heuristics to define the 'optimum' end state [8].

Method

Illustration of the Monte Carlo tree search process. Image from [9].

The technique can be summarised as five key steps:

Use a selection formula to search the tree for the most likely best plan
Append random moves to that best plan
Simulate those moves until a game-ending is found, or limit is reached and the favourability of this game state amount is roughly calculated
Climb back up the tree, distributing the reward points up to the root
Based on the current state of the tree, select the highest-scoring path

Stages

The process can be divided into the following stages, as described by [2] and [5].

1: Selection

In this stage, a selection formula is applied, searching down the tree for the highest-scoring developing plan.

When the tree is searched, the number of previous visits is tracked, such that the more promising nodes are explored first [3]. However, less-promising nodes, while explored less, are still explored. This means an approach that initially does not look promising, but is followed through with a 'killer move' [3], can still be evaluated.

The approach is reminiscent of the A* algorithm, which similarly uses heuristics to prioritise the most promising moves.

2. Expansion

The selected node is chosen for expansion. During this stage, several random, or pseudo-random, decisions are plotted. Generally, the former will explore the most possibilities, but may take longer to find a solution [citation needed].

Like the first stage, this stage can be stopped at any time. This is usually once a terminal state is found, or after a given threshold.

3. Simulation

Simulate the outcome of the actions chosen, either finding a result or stopping at a certain point and using an evaluation function.

4. Backpropagation

Update the tree based on the result of this simulation/evaluation/imaginary game playing, distributing the reward value as we traverse up to the root node. This 'strengthens' the path to a good solution such that it is more likely to be selected in the next round.

Advantages and disadvantages

Unlike methods such as Minimax, a Monte-Carlo Tree Search does not require an evaluation function to determine the game state, except where the game has ended or the desired goal has been achieved [1]. Furthermore, the absence of this evaluation function makes it useful for games whose state can change at a moment's notice, or otherwise has many unpredictable possibilities - also known as a high branching factor [5].

This also makes the method attractive to problems where the programmer does not want to write an evaluation function; for example, the trolley problem. In fact, the Monte-Carlo Tree Search could perhaps be a solution to the ethics issue prevalent in self-driving cars, where in traditional programming, the programmers and also the car must share the burden of deciding who should be sacrificed in a crash scenario. Instead of that, it is worth entertaining the idea that all of the blame should go to that one poor programmer tasked with implementing the rand() function.

Finally, the use of random plays may also help to diversify the playing decisions, where a more deterministic algorithm may become stuck searching for a perfect solution.

References

[1] C. B. Browne et al., "A Survey of Monte Carlo Tree Search Methods," in IEEE Transactions on Computational Intelligence and AI in Games, vol. 4, no. 1, pp. 1-43, March 2012. Available: http://mcts.ai/pubs/mcts-survey-master.pdf [Accessed: 12-Feb-2019]
[2] Unknown author, "Monte Carlo Tree Search - About", unknown year. [Online]. Available: http://mcts.ai/about/index.html [Accessed: 12-Feb-2019]
[3] R. Coulom, "Efficient selectivity and backup operators in monte-carlo tree search," in Proceedings of the 5th International Conference on Computers and Games, CG’06, (Berlin, Heidelberg), pp. 72–83, Springer-Verlag, 2007.
[4] T. Pepels, M. H. M. Winands and M. Lanctot, "Real-Time Monte Carlo Tree Search in Ms Pac-Man," in IEEE Transactions on Computational Intelligence and AI in Games, vol. 6, no. 3, pp. 245-257, Sept. 2014.
[5] G. N. Yannakakis and J. Togelius, "Artificial Intelligence and Games". Springer, 2018.
[6] R. Eckhardt, "Stan ulam, john von neumann, and the monte carlo method," Los Alamos Science, vol. 15, no. 131-136, p. 30, 1987.
[7] J. Pittman, "The Pac-Man Dossier", February 2019. [Online]. Available: http://www.gamasutra.com/view/feature/132330/the_pacman_dossier.php?page=3 [Accessed: 13-Feb-2019]
[8] D. Higginbotham, "An Exhaustive Explanation of Minimax, a Staple AI Algorithm", January 2012. [Online]. Available: http://www.flyingmachinestudios.com/programming/minimax/ [Accessed: 15-Feb-2019]
[9] G. Chaslot, M. Winands, and H. Herik, "Parallel monte-carlo tree search," pp. 60–71, 09 2008.