Strategy - Adamatoma/helloword GitHub Wiki
Description
General Strategy Design
Our design combines heuristic search with ideas from approximate Q-learning. We experimented with several ways of dividing work between our two agents (two offensive agents, or one offensive and one defensive agent). The final version uses two offensive agents, each of which also behaves defensively while in its non-Pacman (ghost) state. Our agents can eat dots, escape from ghosts, eat capsules, kill ghosts, and find alternative routes into enemy territory. The main framework, shared by almost all mechanisms, combines greedy best-first heuristics, approximate Q-learning, and game theory: we express each goal as a feature, weight the features, compute Q(s, a) as the weighted sum, and take the action with the maximum Q(s, a). We chose this framework because we started the project by analyzing the given baseline team and kept extending our ideas on top of it. In addition, we use heuristic search (A* search) to find alternative paths that avoid crossing enemy territory; given our experience from Assignment 1, this was straightforward to implement.
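The linear evaluation described above can be sketched as follows. This is a minimal illustration, not our tuned agent: the feature names, weights, and the `evaluate`/`choose_action` helpers are all hypothetical stand-ins for the values computed from the real game state.

```python
def evaluate(features, weights):
    """Q(s, a) = sum over features of weight * feature value."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

def choose_action(actions, feature_fn, weights):
    """Return the legal action with the highest Q(s, a)."""
    return max(actions, key=lambda a: evaluate(feature_fn(a), weights))

# Toy example: two candidate actions with hand-made feature dictionaries.
weights = {"food_dist": -1.0, "ghost_dist": 2.0}
features = {
    "North": {"food_dist": 3, "ghost_dist": 5},   # Q = -3 + 10 = 7
    "South": {"food_dist": 1, "ghost_dist": 1},   # Q = -1 + 2 = 1
}
best = choose_action(["North", "South"], lambda a: features[a], weights)
```

In the real agent, each mechanism (eating dots, escaping, chasing) contributes its own features, and rebalancing the weights tunes how the goals trade off against each other.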
Detailed Strategy Design
This section describes how our Pacman acts in specific scenarios.
- Ghost state:
- The food list is divided into two parts by location (an upper area and a lower area), one part assigned to each agent, so the agents take different paths to eat food.
- When an agent detects an enemy Pacman, the closer of our agents chases it. Since almost all food lies in corners, an enemy Pacman heading for corner food shortens the distance between it and our agent, making it easier to catch. This lets us protect our food, or at least slow the enemy down. Furthermore, when an enemy Pacman is trapped in a tunnel between our two agents, they collaborate to kill it.
- When our agent is near the boundary and deadlocked with an enemy defender (the same position repeated five times), it uses heuristic search (restricted to our territory) to find a safe path to a new entrance.
- Pacman state:
- The Pacman escapes when it detects an enemy defender, based on the maze distance between them.
- While escaping, the Pacman avoids moving into dead-end corners.
- The Pacman goes back home when: a) no capsule is left; b) it is carrying at least one food and there is not enough time left; or c) it has eaten enough food (total food minus 2).
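The three "go back" conditions above can be expressed as a single predicate. This is a hedged sketch: the parameter names (`capsules_left`, `est_return_time`, and so on) are illustrative stand-ins for values that would be read from the game state, not our actual method names.

```python
def should_return_home(capsules_left, time_left, est_return_time,
                       carrying, food_eaten, total_food):
    """Decide whether the Pacman should head home, per the three rules."""
    if capsules_left == 0:                               # a) no capsule left
        return True
    if carrying >= 1 and time_left <= est_return_time:   # b) not enough time
        return True
    if food_eaten >= total_food - 2:                     # c) ate enough food
        return True
    return False
```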
Challenges Experienced
- Balancing the weights of new features in the early stage. At the beginning, we added many new features to implement our mechanisms, and we had to rebalance all the weights whenever a feature was added, because new features can interfere with previous mechanisms.
- Preventing one enemy defender from scaring off both of our agents. We found that our agents often chose similar paths to eat food even though their strategies differed, so both ended up fleeing together, chased by a single enemy defender. To avoid this, we split the food list into an upper area and a lower area based on the map height and assigned a different list to each agent.
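The height-based split can be sketched as below. It is an illustration under assumed conventions: `food_list` holds `(x, y)` grid positions and `map_height` is the maze height, as in the standard Pacman grid, but the function name and signature are ours for this example.

```python
def split_food_by_height(food_list, map_height):
    """Split food positions into an upper and a lower area at the map midline."""
    midline = map_height / 2.0
    upper = [pos for pos in food_list if pos[1] >= midline]
    lower = [pos for pos in food_list if pos[1] < midline]
    return upper, lower

# Toy example on a height-6 map: (2, 5) and (3, 3) fall in the upper area.
upper, lower = split_food_by_height([(1, 1), (2, 5), (3, 3)], 6)
```

One agent is then given `upper` as its goal food and the other `lower`, so their paths naturally diverge.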
- Resolving deadlocked positions by finding an alternative path. Our agent often ended up deadlocked with an enemy defender near the boundary. In game-theoretic terms this is an equilibrium: both teams pursue their own optimum without considering the opponent's choice. We therefore compromise and choose another path. Being stuck at the boundary, even on our safe side, is a poor place from which to enter enemy territory (it is too easy to get caught), so we use heuristic search (restricted to our territory) to find a new entrance.
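The search for an alternative entrance is an A* run restricted to our own side of the maze. The sketch below is generic: `passable` would be a predicate combining "not a wall" with "inside our territory", and the grid representation and function names are assumptions for this example, not the contest framework's API.

```python
import heapq

def manhattan(a, b):
    """Admissible heuristic on a grid."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def a_star(start, goal, passable, heuristic):
    """A* over a 4-connected grid; returns a list of positions, or None."""
    frontier = [(heuristic(start, goal), 0, start, [start])]
    seen = set()
    while frontier:
        _, cost, pos, path = heapq.heappop(frontier)
        if pos == goal:
            return path
        if pos in seen:
            continue
        seen.add(pos)
        x, y = pos
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if passable(nxt) and nxt not in seen:
                heapq.heappush(frontier, (cost + 1 + heuristic(nxt, goal),
                                          cost + 1, nxt, path + [nxt]))
    return None

# Toy corridor: the only open cells form an L-shaped path to the goal.
open_cells = {(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)}
path = a_star((0, 0), (2, 2), lambda p: p in open_cells, manhattan)
```

In our agent, `goal` is a boundary cell far from the deadlocked position, so the resulting path leads to a fresh entrance.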
- Avoiding dead ends while escaping. Our agent often ran into dead ends while escaping and was then caught by the enemy defenders. We now detect whether the next available position is a dead end by counting the positions it can expand into.
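A minimal version of the dead-end check counts the open neighbours of a candidate cell, excluding the cell we would come from. This is a simplified single-cell sketch (a full check would follow corridors deeper); the names and grid convention are assumptions for illustration.

```python
def is_dead_end(pos, came_from, passable):
    """True if `pos` has no open neighbour other than the cell we came from."""
    x, y = pos
    exits = [n for n in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1))
             if n != came_from and passable(n)]
    return len(exits) == 0
```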
Possible Improvements
- A better method of choosing entrances into enemy territory. Although we split the food list so each agent has its own goal food, the agents sometimes still enter enemy territory at similar positions. It would be better for one agent to enter near the top of the boundary while the other enters from the bottom.
- Better cooperation between agents. There might be a better way to assign food to each agent than simply splitting it in half by location. There might also be a better time to eat a capsule, based on the amount of food or time remaining, rather than only while escaping.
Experimental Section
Performance
| Approaches | Justification | Performance |
|---|---|---|
| “Corner Food” feature | While escaping, the Pacman does not eat corner food | Great |
| “Eat scared enemies” feature | When an enemy defender is scared, our Pacman chases and kills it | Great |
| Time & Score Trade Off | The Pacman goes back when it carries enough food or not enough time is left | Good |
| “Path Planning” | Detour to a new entrance into enemy territory when an equilibrium state occurs | Acceptable |
Discussion
The “Corner Food” approach performs quite well when eating corner food with no enemy around. When the enemy defender chases our Pacman as soon as it enters their territory, this feature helps it move around safely, sometimes stealing food along the way, because it never walks directly into a dead-end corner. However, a better strategy might weigh the distance to the food and back against the distance to the enemy when deciding whether to enter a corner for food while escaping.
The “Path Planning” approach performs very well when the maze is large and there are many walls along the boundary: the two entry positions can be far enough apart that both of our agents are not tied up by a single enemy defender. Otherwise, our two agents easily get entangled with one enemy defender and waste time struggling.
With the “Eat scared enemies” approach, our Pacman chases and kills scared enemies as soon as it detects them, and at this it performs well. However, if a scared enemy keeps fleeing, our agent keeps chasing; it would be more valuable to eat as much food as possible while the enemies are scared rather than to keep chasing them.
The “Time & Score Trade Off” approach performs well at going home when enough food is carried or little time is left. However, our agents sometimes carry 1-2 food while escaping and pass close to the boundary without going back; returning home at that point would earn more points.