

UCT


UCT (Upper Confidence bounds applied to Trees) is a popular algorithm that addresses a flaw of plain Monte-Carlo Tree Search: a program may favor a losing move that has only one or a few forced refutations, because against the vast majority of other replies it yields a better random-playout score than other, better moves. At each inner node of the tree, UCT applies the UCB1 multi-armed bandit formula [3] to balance exploitation of moves with good average results against exploration of rarely visited ones. UCT was introduced by Levente Kocsis and Csaba Szepesvári in 2006 [1] and accelerated the Monte-Carlo revolution in computer Go [2] and in other games that are difficult to evaluate statically. Given infinite time and memory, UCT theoretically converges to Minimax.
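Concretely, during the selection phase a node picks the child j maximizing Xj + C * sqrt(ln(N) / nj), where Xj is the mean playout score of child j, nj its visit count, N the parent's visit count, and C the exploration constant. The following C++ fragment is a minimal sketch of that rule, with illustrative names not taken from any particular engine:

```cpp
#include <cmath>
#include <limits>
#include <vector>

// Minimal node type, for illustration only.
struct Node {
    double scoreSum = 0.0;          // sum of playout results seen through this node
    int    visits   = 0;            // n_j
    std::vector<Node*> children;
};

// UCB1 selection: exploit the mean playout score, explore rarely tried children.
Node* selectChild(const Node& parent, double C) {
    Node* best = nullptr;
    double bestValue = -std::numeric_limits<double>::infinity();
    for (Node* child : parent.children) {
        // An unvisited child has an unbounded confidence interval,
        // so it is always tried at least once.
        double value = (child->visits == 0)
            ? std::numeric_limits<double>::infinity()
            : child->scoreSum / child->visits
              + C * std::sqrt(std::log(parent.visits) / child->visits);
        if (value > bestValue) {
            bestValue = value;
            best = child;
        }
    }
    return best;
}
```

The UCB1 analysis suggests C = sqrt(2), but game programs typically tune it much lower; Pascutto's remark below about a "small exploration constant" refers to exactly this parameter.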

RAVE

Most contemporary implementations of MCTS are based on some variant of UCT [4]. Modifications have been proposed with the aim of shortening the time to find good moves. They can be divided into improvements based on expert knowledge and domain-independent improvements, applied either in the playouts or in building the tree, the latter for instance by modifying the exploitation part of the UCB1 formula, as in Rapid Action Value Estimation (RAVE) [5], which shares playout statistics between moves regardless of the order in which they were played and thereby exploits transpositions.
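As a concrete illustration, RAVE blends a move's Monte-Carlo mean with its all-moves-as-first (AMAF) estimate, trusting AMAF while direct samples are scarce and the Monte-Carlo mean once visits accumulate. This C++ sketch uses hypothetical field names and the simple equivalence-parameter schedule from Gelly and Silver, one of several schedules they analyze [5]:

```cpp
#include <cmath>

// Per-move statistics kept at a node, for illustration only.
struct MoveStats {
    double mcSum   = 0.0;  // results of playouts where the move was played here
    int    mcN     = 0;    // direct (UCT) visit count
    double raveSum = 0.0;  // AMAF: results of playouts containing the move anywhere later
    int    raveN   = 0;    // AMAF visit count
};

// RAVE value: interpolate between the AMAF estimate (reliable early) and
// the Monte-Carlo mean (reliable once mcN grows). k is the "equivalence
// parameter": at mcN == k both estimates receive equal weight.
double raveValue(const MoveStats& s, double k) {
    double q     = (s.mcN   > 0) ? s.mcSum   / s.mcN   : 0.5;
    double qRave = (s.raveN > 0) ? s.raveSum / s.raveN : 0.5;
    double beta  = std::sqrt(k / (3.0 * s.mcN + k));   // 1 at mcN = 0, falls toward 0 as mcN grows
    return (1.0 - beta) * q + beta * qRave;
}
```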

PUCT

Chris Rosin's PUCT modifies the original UCB1 multi-armed bandit policy by using a predictor that estimates which arms will be good at the start of a sequence of multi-armed bandit trials ('Predictor' + UCB = PUCB) [6]. A variation of PUCT was used in the AlphaGo and AlphaZero projects [7], and subsequently also in Leela Zero and Leela Chess Zero [8].
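In the AlphaGo variant, selection maximizes Q(s,a) + U(s,a) with U(s,a) = cPuct * P(s,a) * sqrt(sum over b of N(s,b)) / (1 + N(s,a)), where P(s,a) is the policy network's prior probability for the move [7]. A C++ sketch under illustrative names:

```cpp
#include <cmath>
#include <limits>
#include <vector>

// Per-edge statistics in the style of the AlphaGo papers; names are illustrative.
struct Edge {
    double prior    = 0.0;  // P(s,a) from the policy network
    double valueSum = 0.0;  // W(s,a): accumulated backed-up evaluations
    int    visits   = 0;    // N(s,a)
};

// PUCT selection as used in AlphaGo/AlphaZero [7]. Unlike UCB1, an edge with
// a strong prior gets explored before its first visit, so no logarithm or
// infinite-bound special case is needed.
int selectEdge(const std::vector<Edge>& edges, double cPuct) {
    int totalVisits = 0;
    for (const Edge& e : edges) totalVisits += e.visits;

    int best = -1;
    double bestValue = -std::numeric_limits<double>::infinity();
    for (int a = 0; a < static_cast<int>(edges.size()); ++a) {
        const Edge& e = edges[a];
        double q = (e.visits > 0) ? e.valueSum / e.visits : 0.0;
        double u = cPuct * e.prior
                   * std::sqrt(static_cast<double>(totalVisits)) / (1.0 + e.visits);
        if (q + u > bestValue) {
            bestValue = q + u;
            best = a;
        }
    }
    return best;
}
```

The prior concentrates search on moves the network likes, while the 1/(1 + N) term decays the bonus as an edge accumulates visits, letting the measured Q take over.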

Quotes

Gian-Carlo Pascutto

Quote by Gian-Carlo Pascutto in 2010 [9]:

There is no significant difference between an [alpha-beta search](Alpha-Beta "Alpha-Beta") with heavy [LMR](Late_Move_Reductions "Late Move Reductions")  and a [static evaluator](Evaluation "Evaluation") (current state of the art in [chess](Chess "Chess")) and an UCT searcher with a small exploration constant that does playouts (state of the art in [go](Go "Go")).

The shape of the [tree](Search_Tree "Search Tree") they search is very similar. The main breakthrough in Go the last few years was how to backup an uncertain Monte Carlo score. This was solved. For chess this same problem was solved around the time [quiescent search](Quiescence_Search "Quiescence Search") was developed.

Both are producing strong programs and we've proven for both the methods that they scale in strength as hardware speed goes up.

So I would say that we've successfully adopted the simple, brute force methods for chess to Go and they already work without increases in computer speed. The increases will make them progressively stronger though, and with further software tweaks they will eventually surpass humans. 

Raghuram Ramanujan et al.

Quote by Raghuram Ramanujan, Ashish Sabharwal, and Bart Selman from the abstract of On Adversarial Search Spaces and Sampling-Based Planning [10]:

UCT has been shown to outperform traditional [minimax](Minimax "Minimax") based approaches in several challenging domains such as [Go](Go "Go") and [KriegSpiel](KriegSpiel "KriegSpiel"), although minimax search still prevails in other domains such as [Chess](Chess "Chess"). This work provides insights into the properties of adversarial search spaces that play a key role in the success or failure of UCT and similar sampling-based approaches. We show that certain "early loss" or "shallow trap" configurations, while unlikely in Go, occur surprisingly often in games like Chess (even in grandmaster games). We provide evidence that UCT, unlike minimax search, is unable to identify such traps in Chess and spends a great deal of time exploring much deeper game play than needed. 


References

  1. Levente Kocsis, Csaba Szepesvári (2006). Bandit based Monte-Carlo Planning. ECML-06, LNCS/LNAI 4212
  2. Sylvain Gelly, Marc Schoenauer, Michèle Sebag, Olivier Teytaud, Levente Kocsis, David Silver, Csaba Szepesvári (2012). The Grand Challenge of Computer Go: Monte Carlo Tree Search and Extensions. Communications of the ACM, Vol. 55, No. 3
  3. see UCB1 in Peter Auer, Nicolò Cesa-Bianchi, Paul Fischer (2002). Finite-time Analysis of the Multiarmed Bandit Problem. Machine Learning, Vol. 47, No. 2
  4. Exploration and exploitation in Monte Carlo tree search from Wikipedia
  5. Sylvain Gelly, David Silver (2011). Monte-Carlo tree search and rapid action value estimation in computer Go. Artificial Intelligence, Vol. 175, No. 11
  6. Christopher D. Rosin (2011). Multi-armed bandits with episode context. Annals of Mathematics and Artificial Intelligence, Vol. 61, No. 3, ISAIM 2010
  7. David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, Demis Hassabis (2016). Mastering the game of Go with deep neural networks and tree search. Nature, Vol. 529
  8. FAQ · LeelaChessZero/lc0 Wiki · GitHub
  9. Re: Chess vs Go // AI vs IA by Gian-Carlo Pascutto, June 02, 2010
  10. Raghuram Ramanujan, Ashish Sabharwal, Bart Selman (2010). On Adversarial Search Spaces and Sampling-Based Planning. ICAPS 2010
  11. Search traps in MCTS and chess by Daniel Shawul, CCC, December 25, 2017
  12. Crossings from Wikipedia
  13. Epaminondas from Wikipedia
