# Evolution and Experiments - JianhengShi/Pacman GitHub Wiki
This section traces the evolution of the strategies that our A* Search agents executed throughout the project. We went through five evolution stages before arriving at the final implementation. In the first stage, we built the framework for the A* Search agents and established the fundamental contest strategies. In the second stage, we improved agent performance by making the agents less greedy and more aware of the risk of being eaten. In the third stage, we introduced a strategy that frees agents from being stuck in the same area. In stage four, we explored a more defence-oriented approach. Finally, in the fifth stage, we enhanced the food-eating strategy to make our agents more efficient.
- Stage 1: Establish Framework and Fundamental Strategies
- Stage 2: More Aware of Risks
- Stage 3: Break Free From Being Stuck
- Stage 4: More Defensive
- Stage 5: Eat Food Faster
## Stage 1: Establish Framework and Fundamental Strategies

In this stage, we built two agents using the method discussed in A* Search with Complete Decision Logic. The design of these two agents laid the foundation for all further improvements and served as the framework throughout the whole project cycle. Four key strategies were at the core of this design.
Implementation details can be found in commit 3e8672e.
Our program constantly monitors the map, especially the inherently dangerous zones (e.g. dead ends or the areas around opponent ghosts). In some cases, the agents are prevented from entering such areas.
As shown in the figure, green tiles are dead ends and blue tiles are their exits.
Since Pacman is a dynamic game, the dangerous areas constantly change. We monitor game states and calculate dangerous positions for our agents constantly.
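The exact computation lives in the commit referenced above; as an illustration, static dead-end zones on a grid can be found by repeatedly peeling off open cells that have at most one remaining open exit. The function name and wall representation below are our own, not the project's:

```python
def find_dead_ends(walls):
    """Return the set of open cells that lie inside dead-end corridors.

    `walls` is a 2D list where walls[x][y] is True for a wall cell.
    A cell with at most one open exit is a dead end; marking it may
    turn its neighbour into a dead end too, so we peel iteratively
    until no new cells qualify. Cells on loops are never marked.
    """
    width, height = len(walls), len(walls[0])
    open_cells = {(x, y) for x in range(width) for y in range(height)
                  if not walls[x][y]}
    dead_ends = set()
    changed = True
    while changed:
        changed = False
        for (x, y) in open_cells - dead_ends:
            neighbours = [(x + dx, y + dy)
                          for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))]
            exits = [n for n in neighbours
                     if n in open_cells and n not in dead_ends]
            if len(exits) <= 1:
                dead_ends.add((x, y))
                changed = True
    return dead_ends
```

On a Pacman-style map, cells belonging to corridor loops keep at least two exits and so survive the peeling, which matches the intuition that only cul-de-sacs are dangerous to enter while a ghost is nearby.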
Based on the game states, such as the positions of the dangerous zones or whether the opponents have eaten the capsules, agents decide their next moves by following a predefined decision tree. Depending on the move to take, different goal states are passed into the A* search algorithm. Below is a chart illustrating the moves that agents take under different situations.
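As a rough illustration of such a decision chain, the sketch below maps a simplified game state to a goal that would be handed to A*. The branch names and the plain-dict state are ours for illustration; the real agent reads these conditions from the contest's GameState API and follows the tree in the chart:

```python
def choose_goal(state):
    """Pick a goal for A* from a simplified game state.

    `state` is a dict standing in for the contest GameState; the
    branch order mirrors the priorities described in the text:
    escape when chased, otherwise defend or keep eating dots.
    """
    if state["being_chased"]:
        if state["capsules"]:
            return ("eat_capsule", state["capsules"][0])
        return ("go_home", state["home"])
    if state["enemy_pacman_visible"] and state["i_am_closest_defender"]:
        return ("catch", state["enemy_pos"])
    return ("eat_dot", state["dots"][0])
```

Each returned pair is a move label plus the goal position passed into the A* search as its goal state.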
Essentially, the dot-eating problem can be abstracted into a more complex version of the Travelling Salesman Problem. In addition to collecting all the dots in the shortest possible time, agents also need to handle a dynamic state space, stochastic actions, partially observable opponent locations, and multiple agents. Given this complexity, we decided to use a greedy algorithm to design the Eat Dots function.
Specifically, we need to treat our two agents as a combination to maximise dot-eating efficiency, instead of considering each agent in isolation. Intuitively, if the two agents separate and eat dots in different areas, they avoid conflicting with each other and perform better. The dot-eating logic is that Pacman will, in principle, go for the dot closest to it, but if that dot is also its teammate's target, it goes for the second closest dot instead. This strategy was intended to prevent the two agents from moving in the same direction or region, but it was not always effective. In the final evolution stage, we improved the routing of the agents.
The Catch function in our initial design was fundamental. Simply put, when an enemy Pacman was detected, the agent closest to it would hunt it down.
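Selecting the hunter can be as simple as comparing distances to the detected invader; the helper below is illustrative, not the project's actual API:

```python
def assign_hunter(agent_positions, invader_pos, distance):
    """Return the index of the team agent closest to the detected
    invader; that agent takes on the Catch role while the other
    keeps doing its current job."""
    return min(range(len(agent_positions)),
               key=lambda i: distance(agent_positions[i], invader_pos))
```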

Our agents are on the red team.
The figure above demonstrates a problem with our agents at this stage: when there is no capsule on the map and an agent is chased by ghosts, it still keeps trying to eat dots instead of heading home to escape. This reckless strategy eventually led to its capture by the enemy ghosts. We fixed this problem in the next evolution stage.
Position - 4/14 | Percentile - 29% | Above staff_team_medium
| | Position | Points | Win | Tie | Lost | Failed | Score |
|---|---|---|---|---|---|---|---|
| Stage 1 Performance | 4/14 | 250 | 82 | 4 | 44 | 0 | 1293 |
The design of the agents in this stage laid the foundation for all the subsequent improvements. The early contest results indicated that the A* Search algorithm with complete decision logic is a promising method, and that we could further improve agent performance by exploring different decision logic and tuning parameters. However, like any fundamental approach, it had several shortcomings, including but not limited to:
- our agents behaved exceedingly greedily and could neglect returning home
- the dangerous-area range judged by the program was imprecise, which restricted Pacman's performance
- our agents sometimes got stuck at the centerline by one or two enemies
- the two agents lacked cooperation when eating dots
## Stage 2: More Aware of Risks

In the previous stage, our agents sometimes behaved exceedingly greedily and put themselves at risk. In response to this problem, we made the following improvement:
- When our agents are being chased, they will try to eat a capsule or go home instead of continuing to eat dots.
Implementation details can be found in commit fe5f1af.
When being chased, Pacman will first try to eat a capsule; if that fails (e.g. no capsules are left or there is no safe path), it will follow the nearest path back to its own side. If neither of these strategies works, Pacman will simply stay put.
The figure below demonstrates how the actions are chosen when Pacman is being chased.
The introduction of this strategy updates the decision tree for actions that we used in the game:
This is almost the final form of our decision tree structure. Although we made a few minor changes in the later stages, the fundamental logic of the decision tree has not changed much.

Our agents are on the blue team.
The blue Pacman at the bottom finds that it is being chased by ghosts and there is no safe way to a capsule, so it chooses to go home.
Position - 2/14 | Percentile - 14% | Above staff_team_top
| | Position | Points | Win | Tie | Lost | Failed | Score |
|---|---|---|---|---|---|---|---|
| Stage 2 Performance | 2/14 | 314 | 104 | 2 | 21 | 6 | 2092 |
Adopting the Being Chased strategy is an important improvement for our agents. It allows Pacman to handle essentially all the main dangers that can be encountered during the game, improving its contest performance to a great extent. We later further optimised the function and analysed the dangers in different situations so that Pacman could execute the corresponding escape strategy.
## Stage 3: Break Free From Being Stuck

Because we prevent the agents from moving into dangerous areas of the map, the positions they can move to are narrowed down. This sometimes causes an agent to get stuck at the centerline with one or two enemies, as shown in the figure below.

Implementation details can be found in commit 4e393c5.
To fix this problem, we added a function to determine whether our agent is stuck. In the function, we use a list to keep track of Pacman's position over time, and if it moves back and forth within the same small region more than a threshold number of times, it is determined to be stuck.
```python
def ifStuck(self):
    '''
    Check if the agent is stuck. If stuck, return True.
    '''
    # self.pos holds the agent's recent positions; the agent is stuck
    # when it keeps oscillating between the same positions, i.e. the
    # position two steps ago matches the current one at every offset.
    oddCount = 0
    evenCount = 0
    for i in range(0, self.stuckThreshold - 1, 2):
        if self.pos[i] == self.pos[i + 2]:
            evenCount += 1
    for i in range(1, self.stuckThreshold - 2, 2):
        if self.pos[i] == self.pos[i + 2]:
            oddCount += 1
    if evenCount == self.stuckThreshold // 2:
        if oddCount == self.stuckThreshold // 2 - 1:
            return True
    return False
```

When the agent is detected as stuck, we try to break the tie by choosing a random action from all the safe actions (the actions that will not lead the agent into dangerous positions) instead of the safe action with the highest heuristic value. If there are no safe actions, a random legal action is taken.
```python
def breaktie(self, gameState):
    safeAction = []
    dp = self.notGo2(gameState)  # positions the agent must not enter
    legalActions = gameState.getLegalActions(self.index)
    x, y = gameState.getAgentPosition(self.index)
    for action in legalActions:
        dx, dy = Actions.directionToVector(action)
        nextPos = (int(x + dx), int(y + dy))
        if nextPos not in dp and action != Directions.STOP:
            safeAction.append(action)
    if len(safeAction) != 0:
        out = random.choice(safeAction)
    else:
        out = random.choice(legalActions)
    return out
```
Our agent is on the red side.
As you can see from this gif, when our agent is stuck at the centerline of the map, it chooses a random but safe move: moving down.
Position - 1/19 | Percentile - 5% | Above staff_team_super
| | Position | Points | Win | Tie | Lost | Failed | Score |
|---|---|---|---|---|---|---|---|
| Stage 3 Performance | 1/19 | 479 | 159 | 2 | 19 | 0 | 3365 |
The contest results proved that this improvement is quite effective. Stuck detection and breaking is a very useful tool: its logic is simple, easy to understand, and not difficult to implement, yet it can be applied to many situations.
## Stage 4: More Defensive

One common problem across the previous three stages is that our agents played aggressively. Previously, provided the opponent Pacman had not eaten a capsule, our agents only tried to eat it in two situations: first, when an agent was right next to the opponent Pacman; second, when our agents had finished eating all the dots on the opponent's side.
In this implementation, we made our agents more defensive. As long as our score was higher than the opponents' and the opponent had not eaten a capsule, our agents would focus on catching the other team's Pacman. To do so, we used a set to track all the dots from the last step and compared it with the dots currently in the game. If one dot was missing, our agent would move towards its position. This way, our agents could keep moving closer to the opponent Pacman until it became observable.
Implementation details can be found in commit 3e8672e.

Our agents are on the blue team.
As shown in the gif, when our score is higher than the opponents', one of our agents goes back home to defend. When a dot is eaten by the enemy Pacman, our agent heads to that position. This strategy makes our agents more effective at defending.
Position - 14/81 | Percentile - 17% | Above staff_team_top
| | Position | Points | Win | Tie | Lost | Failed | Score |
|---|---|---|---|---|---|---|---|
| Stage 4 Performance | 14/81 | 167 | 55 | 2 | 23 | 4 | 225 |
It turns out that spending more time on defence could compromise the agents' performance, especially when the opponent Pacman is good at avoiding ghosts. An effective defence strategy requires agents to predict the opponents' moves and to cooperate, both of which are difficult to achieve. Therefore, the defence strategies we explored in this stage were not included in the final implementation.
## Stage 5: Eat Food Faster

In the previous implementations, the two agents shared one food position list, and each Pacman always went for the closest or second closest food on the list, so they often moved in the same direction and shared many common paths. This is not efficient. As we found in the previous contests, the agents eat food most efficiently when they are separated and each focuses on its own area. To improve this, we created two food position lists in different orders and made each agent follow its own list. This strategy effectively prevented the two Pacman agents from moving in the same direction or region, and hence improved the overall performance.
Implementation details can be found in commit 22f5df0.
Eating Strategy before improvement:

Eating Strategy after improvement:

Position - 1/86 | Percentile - 1% | Above staff_team_super
| | Position | Points | Win | Tie | Lost | Failed | Score |
|---|---|---|---|---|---|---|---|
| Stage 5 Performance | 1/86 | 471 | 157 | 0 | 13 | 1 | 3717 |
This improvement proved to be very effective: it further improves and stabilizes the performance of our agents in the competition, and it is included in the final implementation.