This page describes our three-person team's approach to Project 2 of COMP90054. In this project, we design Pacman agents to compete against other teams' agents in a Pacman capture game; the winner is the team whose agents eat more food within the time limit.
We implemented three techniques: Monte Carlo Tree Search (MCTS), Approximate Q-learning (AQ), and Value Iteration (VI). After comparing their performance, we selected VI as the most suitable approach (for our implementations specifically; this does not mean it is the best approach in every situation).
These techniques were chosen based on two criteria: expected performance and implementation difficulty. We believed VI would be efficient and not hard to implement, while MCTS and Approximate Q-learning were expected to perform strongly.[1][2]
After comparing the candidates (MCTS, AQ, VI, and MCTS combined with AQ), we decided to submit the VI approach with the second strategy as our final version.
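To make the selected approach concrete, the following is a minimal sketch of the value iteration update described in [3]. The MDP interface here (`get_states`, `get_actions`, `get_transitions`, `get_reward`) and the parameter defaults are illustrative assumptions, not our submitted agent code.

```python
# Minimal value iteration sketch; the `mdp` interface below is an
# illustrative assumption, not our actual agent code.

def value_iteration(mdp, gamma=0.9, iterations=100):
    """Compute V(s) by repeatedly applying the Bellman optimality update."""
    V = {s: 0.0 for s in mdp.get_states()}
    for _ in range(iterations):
        new_V = {}
        for s in mdp.get_states():
            actions = mdp.get_actions(s)
            if not actions:          # terminal state: value stays 0
                new_V[s] = 0.0
                continue
            # V(s) = max_a sum_{s'} P(s'|s,a) * (R(s,a,s') + gamma * V(s'))
            new_V[s] = max(
                sum(p * (mdp.get_reward(s, a, s2) + gamma * V[s2])
                    for s2, p in mdp.get_transitions(s, a))
                for a in actions
            )
        V = new_V
    return V
```

The agent can then act greedily with respect to V, picking in each state the action whose one-step lookahead value is highest.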
This version is not perfect: it cannot guarantee a win against all the staff teams every time, and it sometimes loses to other student teams.
There are still many ways to improve all of these approaches, such as designing better features and rewards, increasing the training time and the number of iterations, and combining different techniques. We also believe that adding dead-end (corner) detection or more rule-based selection (if-else branches) would improve their performance.
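As an example of where the features and rewards mentioned above enter, the core of Approximate Q-learning is a single weight-update rule. This is a minimal sketch under assumed inputs; the dictionaries and parameter names are hypothetical, not our submitted code.

```python
# Minimal Approximate Q-learning update sketch; `weights` and `features`
# are hypothetical dictionaries, not our actual feature extractor.

def update_weights(weights, features, reward, q_sa, max_q_next,
                   alpha=0.2, gamma=0.9):
    """Shift each weight toward the TD target.

    Q(s, a) is approximated as sum_i w_i * f_i(s, a), so the quality of
    the features and rewards directly bounds what the agent can learn.
    """
    td_error = (reward + gamma * max_q_next) - q_sa
    for name, value in features.items():
        weights[name] = weights.get(name, 0.0) + alpha * td_error * value
    return weights

# Example call with made-up feature values:
# update_weights({}, {"dist-to-food": -0.5}, reward=1.0, q_sa=0.0, max_q_next=2.0)
```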
All three of us contributed throughout the project. We did not work in isolation but communicated with each other frequently; every achievement is the result of our joint effort.
[1] G. Chaslot, S. Bakkes, I. Szita, and P. Spronck, "Monte-Carlo Tree Search: A New Framework for Game AI," Universiteit Maastricht / MICC, P.O. Box 616, NL-6200 MD Maastricht, The Netherlands.
[2] F. S. Melo, "Convergence of Q-learning: A Simple Proof."
[3] T. Miller, COMP90054 AI Planning for Autonomy, Lectures 8–9: Markov Decision Processes (MDPs), The University of Melbourne.
- Zichun Zhu - [email protected] - 784145
- Xinmiao Zhang - [email protected] - 990601
- Zhuorui Cai - [email protected] - 1003142