Design Choices - JianhengShi/Pacman GitHub Wiki

Section Overview

This section begins with an introduction to the methods we experimented with for this project. The intuitions behind the agent designs, the pros and cons of each method, and performance overviews are discussed here. The section then compares the methods, demonstrating and explaining the performance differences between our methods and the staff agents. Lastly, our final design choice is made and justified.




Methods Tried

A* Search with Complete Decision Logic

This is the method we focused on throughout the whole evolution cycle. For details, see AI-Method-1 and [Evolution-and-Experiments](Evolution and Experiments). It achieved very high performance in the contest, and its high interpretability helped us optimize the strategy, including parameter tuning and behaviour changes. However, because heuristic search is simple and hand-crafted, the agent cannot learn from a large number of competitions; it is not inherently intelligent. As a result, debugging and improving this method took a great deal of time and effort.

Performance in the contest: Around staff team super

| Agent | Position | Points | Win | Tie | Lost | Failed | Score |
| --- | --- | --- | --- | --- | --- | --- | --- |
| A* Search | 1/86 | 471 | 157 | 0 | 13 | 1 | 3717 |
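The search underneath this method can be sketched as a standard A* over grid positions with a Manhattan-distance heuristic. The toy maze and helper names below are illustrative only, not the contest code:

```python
import heapq

def a_star(start, goal, walls):
    """A* over grid cells; cost 1 per move, Manhattan-distance heuristic."""
    def h(p):
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    frontier = [(h(start), 0, start, [start])]
    best_g = {start: 0}
    while frontier:
        _, g, pos, path = heapq.heappop(frontier)
        if pos == goal:
            return path
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (pos[0] + dx, pos[1] + dy)
            if nxt in walls:
                continue
            ng = g + 1
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                heapq.heappush(frontier, (ng + h(nxt), ng, nxt, path + [nxt]))
    return None  # no feasible path

# toy example: route from (0, 0) to (2, 2) around a wall at (1, 1)
path = a_star((0, 0), (2, 2), walls={(1, 1)})
```

The decision logic on top of this search chooses which goal cell to feed in (a dot, a capsule, the home border), which is where most of our tuning effort went.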

Hill Climbing:

This is a simple idea we first came up with as an intuitive way of improving the default baseline method. For details, see AI-Method-2. Its implementation is simple and straightforward, but its performance is fairly limited. We might have improved it by implementing more precise decision logic for different cases, but we did not pursue this method further because it was easily defeated by our other methods.

Performance in the contest: Around staff team basic
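The core move rule can be sketched as a greedy one-step climb; the evaluation function below is a made-up stand-in for the real one. The sketch also shows the method's short-sightedness: it only looks one move ahead, so it stops at local optima.

```python
def hill_climb_step(pos, walls, score):
    """Move to the neighbouring cell with the best score, or stay put."""
    best_pos, best_val = pos, score(pos)
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nxt = (pos[0] + dx, pos[1] + dy)
        if nxt not in walls and score(nxt) > best_val:
            best_pos, best_val = nxt, score(nxt)
    return best_pos

# hypothetical evaluation: negative distance to a dot at (3, 0)
score = lambda p: -(abs(p[0] - 3) + abs(p[1]))

pos = (0, 0)
for _ in range(5):  # greedily walk toward the dot
    pos = hill_climb_step(pos, set(), score)
```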


Monte Carlo Tree Search:

We came up with this method because we knew from the lectures that Monte Carlo Tree Search is well suited to real-time AI planning for dynamic games like Pacman. For details, see AI-Method-3. We added MCTS to the main method as an experiment. It turned out to help simplify the complicated logic of using A* search alone. However, its performance was not significantly better than the main method's, so we did not choose it in the end.

Performance in the contest: Above staff team top

| Agent | Position | Points | Win | Tie | Lost | Failed | Score |
| --- | --- | --- | --- | --- | --- | --- | --- |
| MCTS | 30/86 | 294 | 97 | 3 | 70 | 17 | 1184 |
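As a rough illustration of the technique (not our contest implementation), a minimal UCT loop with the four classic phases, shown on a toy one-dimensional game, might look like this:

```python
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = {}            # move -> Node
        self.visits, self.value = 0, 0.0

def uct_search(root_state, moves, step, reward, n_iter=500, c=1.4):
    """Minimal UCT: UCB1 selection, expansion, random rollout, backup."""
    root = Node(root_state)
    for _ in range(n_iter):
        node = root
        # 1. selection: descend through fully expanded nodes via UCB1
        while node.children and len(node.children) == len(moves(node.state)):
            node = max(node.children.values(),
                       key=lambda ch: ch.value / ch.visits
                       + c * math.sqrt(math.log(node.visits) / ch.visits))
        # 2. expansion: add one untried move, if any remain
        untried = [m for m in moves(node.state) if m not in node.children]
        if untried:
            m = random.choice(untried)
            node.children[m] = Node(step(node.state, m), parent=node)
            node = node.children[m]
        # 3. rollout: play randomly from here (depth-capped)
        state, depth = node.state, 0
        while moves(state) and depth < 20:
            state = step(state, random.choice(moves(state)))
            depth += 1
        r = reward(state)
        # 4. backpropagation
        while node:
            node.visits += 1
            node.value += r
            node = node.parent
    # final choice: the most-visited move from the root
    return max(root.children, key=lambda m: root.children[m].visits)

# toy line game: positions 0..4; 4 is a win (reward 1), 0 a loss
moves = lambda s: [] if s in (0, 4) else [-1, 1]
step = lambda s, m: s + m
reward = lambda s: 1.0 if s == 4 else 0.0

random.seed(0)
best = uct_search(3, moves, step, reward)
```

From position 3, moving right wins immediately, so the search concentrates its visits on that move.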

Approximate Q Learning:

This is our tentative verification of Q-learning methods. Approximate Q-learning is a model-free reinforcement learning technique that lets the Pacman agents learn policies through experience and win the game automatically by applying the learned policies. For details, see AI-Method-4. However, due to the inherent complexity of the problem, the features and reward we used could not fully capture the states, and the outcome of our initial attempt at using approximate Q-learning alone was poor. Therefore, to improve performance, we proposed a hybrid approach combining approximate Q-learning with classical planning, described in the next method.

Performance in the contest: Similar to baseline team
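The weight update at the heart of approximate Q-learning can be sketched as follows. The feature names and values here are invented for illustration; they are not our contest feature set:

```python
def approx_q_update(weights, features, reward, q_sa, max_q_next,
                    alpha=0.2, gamma=0.9):
    """One approximate Q-learning step: Q(s, a) = w . f(s, a), and every
    weight moves along the TD error scaled by its feature value."""
    diff = (reward + gamma * max_q_next) - q_sa
    return {f: w + alpha * diff * features.get(f, 0.0)
            for f, w in weights.items()}

# invented feature values for one (state, action) pair
weights = {"dist-to-food": 0.0, "near-ghost": 0.0}
features = {"dist-to-food": 0.5, "near-ghost": 1.0}
q_sa = sum(weights[f] * v for f, v in features.items())

# a strongly negative reward, e.g. the agent was just eaten
weights = approx_q_update(weights, features, reward=-10.0,
                          q_sa=q_sa, max_q_next=0.0)
```

Because the update only touches features that were active, a feature set that cannot distinguish the critical states (such as "carrying food far from home") never receives a useful learning signal, which matches the failure we observed.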


Approximate Q Learning Hybrid:

This is an implementation of the future improvements proposed in the above Approximate Q-Learning approach. For details, see AI-Method-5. By combining the reinforcement learning approach with classical planning techniques, we effectively improved the performance of our agents. Through this method, we explored more possibilities for Q-learning. But since it did not beat the A* agent in the contest, we did not use it in the final contest.

Performance in the contest: Above staff team medium

| Agent | Position | Points | Win | Tie | Lost | Failed | Score |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Approximate Q Learning Hybrid | 76/86 | 123 | 37 | 12 | 121 | 0 | -930 |
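A minimal sketch of the kind of switch the hybrid relies on, with hypothetical state fields and stub policies (not the actual decision tree from AI-Method-5): classical planning covers the cases the learner failed on, and the learned policy handles everything else.

```python
def choose_action(state, q_policy, plan_home, plan_to_invader):
    """Pre-defined switch between classical planning and the learned policy."""
    if state["carrying"] >= state["carry_limit"]:
        return plan_home(state)         # deliver the eaten food home
    if state["is_ghost"] and state["invader_seen"]:
        return plan_to_invader(state)   # defend against a visible invader
    return q_policy(state)              # learned behaviour otherwise

# stub policies standing in for the real planner / Q-function
action = choose_action(
    {"carrying": 5, "carry_limit": 3, "is_ghost": False, "invader_seen": False},
    q_policy=lambda s: "eat",
    plan_home=lambda s: "go-home",
    plan_to_invader=lambda s: "chase",
)
```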

Back to top



Comparison of the Methods:

Choice 1: A* Search vs Hill Climbing

Performance Comparison

To evaluate our heuristic search algorithms, we completed two versions of agents in total: one based on the A* algorithm and the other on the Hill Climbing algorithm. Since the Test Contest had not started when these two versions were completed, we used the built-in layouts to conduct internal competitions between the two agents. The results are shown in the following table ((9/10) means 9 out of 10 games won):

| Layout | Winner |
| --- | --- |
| alleyCapture | A* Search (9/10) |
| bloxCapture | A* Search (8/10) |
| crowdedCapture | A* Search (10/10) |
| defaultCapture | A* Search (10/10) |
| distantCapture | A* Search (8/10) |
| fastCapture | A* Search (7/10) |
| jumboCapture | A* Search (10/10) |
| mediumCapture | A* Search (7/10) |
| officeCapture | A* Search (10/10) |
| strategicCapture | A* Search (10/10) |

Agent Choice

The results of the contest showed that the A* algorithm achieved an overwhelming victory. Therefore, we finally decided to use this version as our initial choice.

Analysis and Discussion

In fact, even though the A* version completely beats the Hill Climbing version, this does not mean that the A* algorithm is strictly better than Hill Climbing in the Pacman competition. By comparing the code logic, we found that the Hill Climbing version contains many logical contradictions, its judgment and decision-making for Pacman's behaviour is imperfect, and its overall completeness falls far short of the A* version. Since both versions are based on heuristic search, their final performance should not differ greatly once both are fully developed. In view of the sound logical judgment and better performance of the A* version, we decided to further optimize and improve it.

Back to top


Choice 2: A* Search vs Monte Carlo Tree Search

Performance Comparison

For analysis and comparison, we wrote an algorithm that mixes Monte Carlo Tree Search with A* Search and compared it with the pure A* Search method. The original intention was that MCTS might be more effective when escaping. For a straightforward comparison, we used the contest to run competitions between A* Search and the hybrid MCTS. The results are shown in the following table:

| Version | Position | Points | Win | Tie | Lost | Failed | Score |
| --- | --- | --- | --- | --- | --- | --- | --- |
| A* Search | 1/86 | 471 | 157 | 0 | 13 | 1 | 3717 |
| MCTS | 30/86 | 294 | 97 | 3 | 70 | 17 | 1184 |

Agent Choice

The results of the contest showed that the A* algorithm performed better than the Monte Carlo Tree Search method. Therefore, we chose the A* algorithm.

Analysis and Discussion

It is worth mentioning that our hybrid MCTS method is not a global MCTS method; it is built on the decision logic of our initial A* method, and MCTS only takes over when the agent is being chased by defenders. By watching game replays, we found that most of the games MCTS lost were due to the agent being chased for a long time, which reduced the efficiency of eating dots. Adjusting the threshold that determines when the agent should go home may solve this problem. In general, the main reason our MCTS version was defeated by our initial A* version is that we did not spend as much time improving it as we did on the A* method. Given more time, we would like to keep improving MCTS. What MCTS taught us is the value of online simulation and the uncertainty it introduces, which often has a positive effect on games like Pacman.
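The go-home threshold mentioned above can be sketched as a simple rule; the parameter values, the safety margin, and the function signature are all hypothetical:

```python
def should_go_home(carrying, time_left, dist_home, threshold=5):
    """Return home once enough food is carried, or when the remaining
    time barely covers the trip back (safety margin of 2 moves)."""
    return carrying >= threshold or time_left <= dist_home + 2
```

Lowering `threshold` makes the agent bank food earlier, trading raw eating efficiency for fewer losses while being chased.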

Back to top


Choice 3: Approximate Q-Learning

Performance Comparison

The pure approximate Q-Learning method was ruled out straight away and was not used in any pre-contest tournament because of its poor performance. The table below shows its results out of 10 games per map against the baseline team:

| Layout | Win | Lose | Tie | Fail |
| --- | --- | --- | --- | --- |
| alleyCapture | 2 | 7 | 0 | 1 |
| bloxCapture | 1 | 9 | 0 | 0 |
| crowdedCapture | 2 | 6 | 0 | 2 |
| defaultCapture | 6 | 3 | 0 | 1 |
| distantCapture | 5 | 3 | 1 | 1 |
| fastCapture | 2 | 8 | 0 | 0 |
| jumboCapture | 5 | 2 | 1 | 2 |
| mediumCapture | 5 | 3 | 1 | 1 |
| officeCapture | 3 | 1 | 5 | 1 |
| strategicCapture | 3 | 6 | 0 | 1 |

Agent Choice

The results shown above indicate that approximate Q-Learning was barely more effective than the baseline approach and further improvements needed to be made in order to compete with the staff teams.

Analysis and Discussion

The reason the approximate Q-Learning agents performed poorly was that they failed to learn some critical policies, such as delivering the eaten food home. This led to two problems. First, the agents could only score passively, when they were chased by the opponent Ghosts and ran back home to avoid being eaten. Second, the agents kept trying to eat food when only two dots were left, which crashed the program. As discussed in AI-Method-4, this failure to learn indicated two issues with our design: the features used could not represent the state effectively, and the reward was too sparse.

Back to top


Choice 4: A* Search vs Approximate Q-Learning Hybrid

Performance Comparison

In the approximate Q-Learning Hybrid approach, classical planning techniques were used in two cases: guiding the agent Pacman to deliver food home for active scoring and guiding the agent Ghost to where the opponent Pacman appeared for defending. We had the hybrid agent compete with the baseline agents and the staff agents in the pre-contest tournaments. The results of the contests are shown below.

| Layout | Win | Lose | Tie | Fail |
| --- | --- | --- | --- | --- |
| alleyCapture | 6 | 0 | 4 | 0 |
| bloxCapture | 5 | 2 | 3 | 0 |
| crowdedCapture | 10 | 0 | 0 | 0 |
| defaultCapture | 9 | 0 | 1 | 0 |
| distantCapture | 9 | 0 | 1 | 0 |
| fastCapture | 10 | 0 | 0 | 0 |
| jumboCapture | 8 | 0 | 2 | 0 |
| mediumCapture | 5 | 0 | 5 | 0 |
| officeCapture | 9 | 0 | 1 | 0 |
| strategicCapture | 10 | 0 | 0 | 0 |

Table 1. Comparison with the baseline agents

Fig 1. Comparison with the staff agents

Agent Choice

Although the hybrid agents performed significantly better than the pure approximate Q-Learning agents, they were still not as efficient as the A* agents in the pre-contest tournaments. Therefore, A* Search remained our first-choice algorithm.

Analysis and Discussion

It is worth noting that the performance improvement did not come from the agent learning when to deliver the eaten food home for scoring, but from using a decision tree to switch between reinforcement learning and classical planning techniques. Since the rules of the decision tree are pre-defined, it will not always yield the optimal action. On the other hand, since we only used classical planning in the two cases where the agents failed to learn an effective policy, the decent performance of the hybrid agents proves that the approximate Q-Learning agents did effectively learn some policies on their own, such as eating food, avoiding the opponent Ghosts, and defending the food on their side.

Back to top



Final Design Choice

Algorithm Choice

The following table lists the pros and cons of the different AI planning algorithms or hybrid algorithms we implemented. To see the detailed implementation of each method, please go to the corresponding separate wiki file.

| Method | Performance | Pros | Cons |
| --- | --- | --- | --- |
| A* Search + Complete Decision Logic | Around staff team super | High performance, high interpretability, and fast calculation | Not intelligent or general enough; complete logic must be implemented for every specific case |
| Hill Climbing | Around staff team basic | Simple and quick | Low performance due to short-sightedness |
| A* Search + MCTS | Above staff team top | Simplified logic, and more general with planning | Low calculation speed due to simulation |
| Approximate Q-Learning | Similar to the baseline team | Model-free, and resource-saving compared to traditional Q-learning | Features and reward hard to assign, and poor interpretability during testing |
| A* Search + Approximate Q-Learning | Above staff team medium | Improved performance over pure Approximate Q-Learning | Switching between reinforcement learning and classical planning may lead to non-optimal actions |

According to their performance in the Test Contest, we chose the first method as our final agent.


Game Strategy Choice

Attack

Through experiments, we found that the aggressive strategy of eating dots is more effective; namely, both agents play the role of attackers most of the time. When facing intelligent opponents, defence is very difficult and inefficient. As the saying goes, attack is the best defence: you can't win the Pacman game by defending alone.

This is a comparison between our less-defence version and more-defence version:

  • Less-defence version:

| Agent | Position | Points | Win | Tie | Lost | Failed | Score |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Less-defence | 2/76 | 195 | 65 | 0 | 10 | 0 | 1352 |

  • More-defence version:

| Agent | Position | Points | Win | Tie | Lost | Failed | Score |
| --- | --- | --- | --- | --- | --- | --- | --- |
| More-defence | 14/81 | 167 | 55 | 2 | 23 | 4 | 225 |

Defence

For defence cases, the agent first tries to find a path to the enemy's Pacman. If there is no feasible path, the agent instead goes to the location where the enemy's Pacman last ate a dot.
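This fallback rule can be sketched as follows, using a plain BFS reachability check on a toy grid; the function names and the grid size are illustrative, not our contest code:

```python
from collections import deque

def bfs_path(start, goal, walls, size=8):
    """Shortest path on a size x size grid, or None if unreachable."""
    frontier, seen = deque([[start]]), {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path
        x, y = path[-1]
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nxt[0] < size and 0 <= nxt[1] < size
                    and nxt not in walls and nxt not in seen):
                seen.add(nxt)
                frontier.append(path + [nxt])
    return None

def defence_target(my_pos, enemy_pos, last_eaten, walls):
    """Chase the visible invader if a path exists; otherwise patrol the
    spot where the last dot was eaten."""
    if enemy_pos is not None and bfs_path(my_pos, enemy_pos, walls):
        return enemy_pos
    return last_eaten

# a full wall column splits the 8x8 grid: the enemy at (4, 4) is unreachable,
# so the defender falls back to the last-eaten-dot location (0, 5)
walls = {(1, y) for y in range(8)}
target = defence_target((0, 0), (4, 4), (0, 5), walls)
```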


For more information about this section, please refer to Stage 4: More Defensive.

Back to top
