AI Method 1

MiniMax Agent

This agent was developed by Ying Zhu.

The Minimax agent, implemented in agents/t_085/myTeam.py, is our best-performing agent for the Splendor game. It uses the Minimax algorithm with alpha-beta pruning to make strategic decisions that optimize its own score while minimizing the opponent's potential gains.

The Minimax agent has demonstrated outstanding performance in competitive play. It has achieved an impressive record of 29 wins out of 40 games (Win Rate: 72.5%), exceeding the threshold required for full marks. In local comparisons, Minimax secured a 90% winning rate (out of 20 games) against SARSA and a 95% winning rate (out of 20 games) against MCTS.

Justification for Technique

Minimax was chosen for the Splendor agent mainly because it looks ahead through all possible moves by both the agent and the opponent, modelling not only its own moves but also the opponent's behaviour. Minimax ensures that the agent makes the best possible move under the assumption that the opponent also plays optimally. This ability to anticipate the opponent's best move matters because Splendor is essentially a zero-sum game, where one player's gain directly results in the other player's loss. For instance, if I reserve a card, the opponent cannot buy it; if I attract a noble, the opponent loses that opportunity.

On the other hand, while search algorithms such as BFS or A* are easier to implement, they are typically used for single-agent pathfinding problems and do not inherently account for an opponent's moves. Minimax therefore provides a more robust and theoretically grounded approach to the competitive dynamics of Splendor.

Problem Modelling and Implementation Design

This agent uses Minimax, a decision technique from game theory. The algorithm recursively generates successor boards for all available actions, looking ahead as many turns as possible (referred to as the 'depth') within the 1-second thinking limit. It aims to maximise the agent's own achievable score while minimising the opponent's. The score is determined by a Splendor-specific heuristic, which is detailed later. The action leading to the game state with the highest evaluation score is then selected.

For each round, the agent evaluates the moves on its own turn, then the opponent's next turn, then its own moves again, continuing this process until the MAX_TIME limit of 0.9 seconds is reached.

1. Depth Decision: Iterative Deepening

While the traditional minimax algorithm employs a fixed depth, where the agent explores to a pre-determined maximum depth each round, I chose iterative deepening to enable dynamic exploration of the game tree. Iterative deepening conducts multiple minimax searches at increasing depths, starting from depth 1 up to a predefined maximum depth (MAX_DEPTH).

In the SelectAction method, the agent iterates over increasing depths from 1 to MAX_DEPTH, performing a search at each depth level (via the minimax_decision function) until either the maximum depth is reached or the time limit is exceeded. This approach ensures that the agent makes the best possible decision within the given time constraints, gradually deepening the search to explore further into the game tree as time allows. The constant MAX_DEPTH is set to 15 not as a target depth for the minimax algorithm to reach, but as a depth that is practically unattainable within the given time frame, so the tree can be explored as deeply as time permits. If the search conducted by the minimax_decision function hits the time limit before completing a certain depth level, say depth 7, the agent still has the result returned by the previous depth (in this case, depth 6). This provides a fallback to the last completed depth and avoids the need for random actions. A simplified sketch of this loop is shown below.
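
The following is a minimal sketch of how such an iterative-deepening loop can be structured as a method of the agent class. The names SelectAction, MAX_DEPTH, MAX_TIME and minimax_decision come from the agent described above, but the exact signatures, the start-time bookkeeping, and the fallback handling shown here are simplifying assumptions rather than the actual implementation.

import time

def SelectAction(self, actions, game_state):
    # Iterative deepening: run complete searches at depth 1, 2, 3, ... and keep the
    # result of the deepest search that finished within the time budget.
    start_time = time.time()
    best_action = actions[0]                      # safe default if even depth 1 cannot finish
    for depth in range(1, self.MAX_DEPTH + 1):
        action = self.minimax_decision(game_state, depth, start_time)
        if time.time() - start_time > self.MAX_TIME:
            break                                 # this depth was cut short; keep the previous result
        best_action = action                      # this depth completed in time, so adopt its result
    return best_action

Because the loop only adopts the result of a depth level that finished within the budget, an interrupted deeper search never degrades the decision.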

The major benefit of iterative deepening is the dynamic exploration of the game tree. Testing showed that the algorithm can explore deeper in later rounds of the game, as demonstrated in Figure 1. Initially, the search depth ranges from 6 to 8, but towards the end of the game it can reach up to 13. This increase in search depth in later rounds is likely due to fewer legal actions, or more branches being pruned, as the game progresses. The reduced number of legal actions can be attributed to the shrinking pool of available resources, such as gems and cards; the fewer the legal actions, the deeper the search can go. Moreover, players often have clearer objectives and strategies towards the end of the game, which reduces the number of expanded actions as players focus on specific winning moves. Alpha-beta pruning becomes more effective in these situations, allowing the algorithm to discard large portions of the game tree that do not need to be explored, thus enabling deeper searches in the remaining relevant parts of the tree.

Figure 1: A demonstration of the maximum depth reached by iterative minimax for each round

Conversely, employing a predetermined depth does offer its own advantages: it may be more computationally efficient, as it conducts only a single search within the same time frame, potentially enabling deeper exploration. These benefits and drawbacks are evaluated further in subsequent sections, along with considerations for future improvements.

2. Minimax Search: minimax_decision

The agent uses the minimax_decision function to recursively evaluate successor states. The function starts by initialising variables for the agent's ID (agent_id), the opponent's ID (opponent_id), the best action found so far (best_action), and the best evaluation score found so far (best_eval). It then iterates over all actions the agent can take from the current game state. For each action, it generates the successor state, calls the min_value function to start the recursive evaluation between the min_value and max_value functions, and compares the resulting evaluation score with the best found so far. If the current action leads to a higher score, it updates best_eval and best_action. When all actions have been evaluated, or MAX_TIME has been reached, the function returns the action with the best evaluation score found so far.
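
As a rough sketch of this top-level search (written here with min_value as a separate method rather than a nested function, and with the time checkpoints simplified), the structure could look as follows. The calls game_rule.getLegalActions and game_rule.generateSuccessor are assumed names for the framework's action-generation and successor functions, and copy.deepcopy is used only to keep the sketch side-effect free.

import copy, time

def minimax_decision(self, game_state, depth, start_time):
    # Try each of the agent's legal actions and keep the one whose subtree,
    # evaluated by min_value (the opponent's reply), scores highest.
    best_action = None
    best_eval = float('-inf')
    for action in self.game_rule.getLegalActions(game_state, self.id):       # assumed framework call
        if time.time() - start_time > self.MAX_TIME:
            break                                   # out of time: return the best found so far
        successor = self.game_rule.generateSuccessor(copy.deepcopy(game_state), action, self.id)
        eval_score = self.min_value(successor, depth - 1, float('-inf'), float('inf'))
        if eval_score > best_eval:
            best_eval = eval_score
            best_action = action
    return best_action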

This function includes nested max_value and min_value functions to handle the recursive evaluation process and alpha-beta pruning:

  • Max Value Function: The max_value function calculates the maximum possible score the agent can achieve from a given state, considering the opponent's optimal responses. If the current depth is 0 (meaning the last depth has been reached), it evaluates the state using the evaluate_state function and returns the evaluation score. Otherwise, it iterates over possible actions, generates successor states, and recursively calls the min_value function to get the opponent's best response. It updates the maximum evaluation score (max_eval) and prunes branches that cannot influence the final decision using alpha-beta pruning.
  • Min Value Function: The min_value function calculates the minimum possible score the opponent can force the agent into, assuming optimal play by the opponent. Similar to max_value, it evaluates potential outcomes recursively, updating the minimum evaluation score (min_eval) and pruning branches using alpha-beta pruning. When it reaches the last depth, it calls the evaluate_state function to return an evaluation score.

Alpha-beta pruning is implemented in both the max_value and min_value functions to improve efficiency. Beta, updated by beta = min(beta, eval_score) in the min_value function, represents the best value the minimising player can guarantee at any point, whereas alpha, updated by alpha = max(alpha, eval_score) in the max_value function, represents the best value the maximising player can guarantee. During the recursive evaluation, if the algorithm finds that a branch cannot produce a better outcome than the current alpha or beta values, it prunes that branch. This is achieved through the following conditions in the max_value and min_value functions:

# In the max_value function: raise alpha as better options are found, then prune
alpha = max(alpha, eval_score)
if beta <= alpha:
    break  # the minimising player would never allow this branch

# In the min_value function: lower beta as better options are found, then prune
beta = min(beta, eval_score)
if beta <= alpha:
    break  # the maximising player would never allow this branch

By implementing alpha-beta pruning, the Minimax algorithm can skip large portions of the game tree, significantly improving computational efficiency. This allows the algorithm to evaluate more nodes within the given time frame, enhancing overall performance.
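
For concreteness, a simplified sketch of the two mutually recursive functions is given below, written as standalone methods of the agent class rather than nested functions, and with the time-limit checkpoints omitted. The names max_value, min_value, evaluate_state and current_depth follow the description above; game_rule.getLegalActions, game_rule.generateSuccessor and the use of copy.deepcopy are assumptions about the surrounding framework, and the depth-based decay applied inside the actual evaluation is described in the next section.

import copy

def max_value(self, state, current_depth, alpha, beta):
    # Agent's turn: choose the action that maximises the evaluation score.
    if current_depth == 0:
        return self.evaluate_state(state)         # leaf: evaluate with the heuristic (next section)
    max_eval = float('-inf')
    for action in self.game_rule.getLegalActions(state, self.id):            # assumed framework call
        successor = self.game_rule.generateSuccessor(copy.deepcopy(state), action, self.id)
        eval_score = self.min_value(successor, current_depth - 1, alpha, beta)
        max_eval = max(max_eval, eval_score)
        alpha = max(alpha, eval_score)
        if beta <= alpha:
            break                                  # prune: the minimiser would avoid this branch
    return max_eval

def min_value(self, state, current_depth, alpha, beta):
    # Opponent's turn: assume it chooses the action that minimises the agent's score.
    if current_depth == 0:
        return self.evaluate_state(state)
    min_eval = float('inf')
    opponent_id = 1 - self.id
    for action in self.game_rule.getLegalActions(state, opponent_id):        # assumed framework call
        successor = self.game_rule.generateSuccessor(copy.deepcopy(state), action, opponent_id)
        eval_score = self.max_value(successor, current_depth - 1, alpha, beta)
        min_eval = min(min_eval, eval_score)
        beta = min(beta, eval_score)
        if beta <= alpha:
            break                                  # prune: the maximiser would avoid this branch
    return min_eval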

3. Evaluation Score: 'evaluate_state' and 'evaluate_player_score'

The agent determines the score associated with the successor state at depth == 0 (the last depth) using the evaluate_state function. This function first obtains the opponent's id as (1 - self.id), then calls the evaluate_player_score function for both agents to calculate their scores based on the successor board state. The difference between their scores is then returned as the evaluation score ('eval_score') for the minimax decision. Using the difference in scores between the agent and the opponent aligns with the objective of the Minimax algorithm: to maximise its own score while minimising the opponent's potential gains. The higher the difference, the better the action.

$$ \textrm{Evaluation Score} (eval\textunderscore score) = h(self) - h(opponent) $$

where $h()$ is the heuristic function calculated in evaluate_player_score function, where

$$ h() = (100 \times winLoss + 1.5 \times points + 2.5 \times nobleProgress + prestige + gems) \times 0.9^{depth} $$

The evaluate_player_score function evaluates the score of a given agent (either the self agent or the opponent) based on a heuristic that captures the overall gain or status of that agent given all of their current resources, including points, cards, gems held, progress towards nobles, and a decay factor for depth. This heuristic is adapted from the Minimax heuristic derived via hill climbing in Joshua Hepworth's 2016 paper. The heuristic includes the following features (a sketch of the corresponding code is given after the list):

  • Win/Loss: calculated by the calculate_win_loss function. It returns 1 if the given agent has won the game (score >= 15), -1 if the opposing agent has won, and 0 otherwise.
  • Points: the agent's score in the game at this turn, ranging from 0 up to 15 or higher. It is obtained by self.game_rule.calScore(state, agent_id).
  • Noble Progress: calculated by the calculate_progress_toward_nobles function, ranging from 0 to the number of nobles on the board. The function computes a normalised score representing how close the player is to meeting the requirements of each noble tile: it iterates over the noble requirements, calculates the proportion of required cards the player already possesses, normalises the score for each noble, and returns the sum of the normalised scores.
  • Prestige (Cards): the number of cards held. It is obtained by len(state.agents[agent_id].cards.values()).
  • Gems: the number of gems held, obtained by sum(state.agents[agent_id].gems.values()).
  • Decay factor of depth ($0.9^{depth}$): this factor is used because the Minimax algorithm cannot predict which card will be drawn after any card is purchased, which introduces extra uncertainty when evaluating future board states in the evaluate_player_score function. The exponential decay based on depth quantifies this uncertainty when simulating many future rounds.
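
As a rough illustration, the heuristic above could be assembled as in the sketch below, written as methods of the agent class. The function names evaluate_state, evaluate_player_score, calculate_win_loss and calculate_progress_toward_nobles and the weights follow the description above, but the exact signatures, the noble data structure, and the way the search depth is passed in are assumptions made for this sketch.

def evaluate_state(self, state, depth=0):
    # eval_score = h(self) - h(opponent), as in the formula above.
    opponent_id = 1 - self.id
    return (self.evaluate_player_score(state, self.id, depth)
            - self.evaluate_player_score(state, opponent_id, depth))

def evaluate_player_score(self, state, agent_id, depth):
    # h() = (100*winLoss + 1.5*points + 2.5*nobleProgress + prestige + gems) * 0.9**depth
    win_loss = self.calculate_win_loss(state, agent_id)
    points = self.game_rule.calScore(state, agent_id)
    noble_progress = self.calculate_progress_toward_nobles(state, agent_id)
    prestige = len(state.agents[agent_id].cards.values())
    gems = sum(state.agents[agent_id].gems.values())
    return (100 * win_loss + 1.5 * points + 2.5 * noble_progress
            + prestige + gems) * (0.9 ** depth)

def calculate_win_loss(self, state, agent_id):
    # 1 if this agent has reached 15 points, -1 if the opponent has, 0 otherwise.
    if self.game_rule.calScore(state, agent_id) >= 15:
        return 1
    if self.game_rule.calScore(state, 1 - agent_id) >= 15:
        return -1
    return 0

def calculate_progress_toward_nobles(self, state, agent_id):
    # For each noble on the board, add the proportion of its required cards already owned.
    progress = 0.0
    cards = state.agents[agent_id].cards                  # assumed: dict of colour -> list of cards
    for _, requirements in state.board.nobles:            # assumed noble structure: (code, {colour: count})
        owned = sum(min(len(cards.get(colour, [])), needed) for colour, needed in requirements.items())
        progress += owned / sum(requirements.values())
    return progress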

I also tested the selected heuristic against several variations (different weights or decay factors); the comparison and results can be found in Experiment 1. Overall, this heuristic demonstrated the best performance and was therefore selected.

Advantages and Disadvantages

Advantages:

  1. Dynamic Exploration of the Tree for Each Round: Minimax with iterative deepening allows for dynamic exploration of the game tree, enabling deeper exploration in later rounds where more actions are pruned (detailed in the "Depth Decision: Iterative Deepening" section above).
  2. Optimal Use of Time: By conducting multiple searches with increasing depths, the agent maximises its use of the available thinking time, ensuring it always makes the best possible decision within the given time constraints.
  3. Fallback: In our minimax approach, if the search at a deeper level exceeds the time limit, the agent can fall back to the result of the previous, shallower search. This ensures that the agent always has a decision to make, even if deeper exploration is not possible. By contrast, if a search with a pre-determined fixed depth cannot be finished in time, there is no fallback to a shallower search, which can lead to suboptimal or unfavourable decision-making.
  4. Opponent Consideration: Minimax inherently considers the opponent's possible moves and future responses. This capability is ideal for zero-sum games, where one player's gain is another player's loss. By anticipating the opponent's actions, the agent can make more informed and strategic decisions.

Disadvantages:

  1. Partial Information: Minimax struggles with partial information, such as predicting which cards will be drawn in future rounds. As the depth of the simulation increases, the game tree becomes increasingly inaccurate, and gaps may appear in the board state. Although a decay factor for depth mitigates this issue to some extent, it can only partially address the inaccuracies.

  2. Computational Speed: The Minimax algorithm is computationally intensive, especially for a complex game such as Splendor. For each round, if a minimax search explores to depth $n$ and the average number of available actions is $a$, the algorithm must generate on the order of $a^n$ boards. Moreover, because we use an iterative deepening approach, each round performs several searches (approximately 8 in practice), generating roughly $\sum_{n=1}^{d} a^n$ boards, where $d$ is the deepest depth reached (at most MAX_DEPTH = 15), or stopping when the time limit is reached. For example, with $a = 10$ and $d = 6$ this is already over a million boards. Consequently, the Minimax algorithm generally cannot search all the way to the end-game condition where one of the agents reaches 15 points (it is possible, but very time-consuming). Therefore, we must use a heuristic function (the evaluation score) to evaluate the board state and make decisions.

  3. Redundant Work & Less Deep Exploration (compared to fixed-depth minimax): Iterative deepening involves multiple searches at increasing depths, so some parts of the game tree are explored multiple times. This can lead to inefficient use of computational resources. In contrast, a predetermined depth requires only one search, which may be able to explore deeper into the game tree within the same time frame. However, this could be improved in future development, as detailed later.

Solved Challenges

1. Encountering Timeout Issues:

I encountered frequent timeout issues, both on my laptop and on the server.

These timeouts were especially prevalent on my laptop at the beginning and were due to several factors. The total thinking time is limited to 1 second as required, and I had set MAX_TIME to 0.95 seconds. In the max_value and min_value functions, I checked the elapsed time against MAX_TIME; whenever it was exceeded, the loop would break and the function would immediately return its value to the minimax decision process. However, because multiple functions are involved, even if the currently running function breaks and returns, additional functions may still need to finish before a best action can be returned from the SelectAction function. This delay can lead to timeouts.

Solution: To overcome this issue, I implemented multiple checkpoints for checking the elapsed time in every function. This ensures that there are sufficient breaks and enough time left to return the action without exceeding the time limit.
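
For illustration, a checkpoint of this kind might look like the sketch below; the helper name time_exceeded and the start_time attribute are assumptions for this sketch, not the actual code.

import time

def time_exceeded(self):
    # Checked at the top of the loops in minimax_decision, max_value and min_value so that
    # the search stops expanding with enough margin left to unwind the recursion and
    # return an action from SelectAction before the 1-second limit.
    return time.time() - self.start_time > self.MAX_TIME

# Typical use inside a search loop:
# for action in actions:
#     if self.time_exceeded():
#         break          # return the best value found so far instead of risking a timeout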

After implementing the minimax algorithm and testing it on my local machine for over 100 rounds, I uploaded it to the server but received 40 invalid games due to timeout issues. The MAX_TIME I had set was 0.99 seconds. Although this seemed like an ambitious time limit, it worked perfectly for over 100 runs on my local machine (CPU...), where the measured thinking time never exceeded 0.995 seconds. I posted an issue in our team's Git repository and sought advice on the ED discussion forum, where a student suggested the server might be slower than our local machines. My guess is that the server runs multiple processes simultaneously, slowing it down.

Solution: I had to adjust MAX_TIME from 0.99, decreasing it gradually and testing on the server to mitigate the timeout issues. After testing, I found 0.90 to be the optimal MAX_TIME given the speed of the server.

2. Managing Recursive Calls:

Managing recursive calls in the Minimax algorithm can be challenging because of the need to track the current depth, handle the correct transition of turns between the agent and the opponent, and ensure the algorithm doesn't exceed call stack limits.

Solution: The max_value and min_value functions are designed to call each other recursively, representing the alternating turns of the agent and the opponent. This structure allows the algorithm to explore the game tree deeply while maintaining the correct order of moves. Each recursive call to max_value and min_value includes a current_depth parameter that is decremented (current_depth - 1) on each recursive call. This lets the algorithm accurately track the current depth and ensures that evaluation stops at the correct depth: when current_depth == 0, the recursion has reached the last depth and the state is ready for evaluation.

3. Computational Complexity:

As noted in the disadvantages section, the minimax algorithm is highly computationally intensive, which can lead to poor performance under the strict 1-second time limit. The algorithm must expand a vast number of nodes, resulting in a very large game tree. If the Minimax algorithm cannot explore the tree to a sufficient depth and is limited to shallow searches, it will not be able to provide the best actions.

Solution: Alpha-beta pruning is a crucial optimisation technique for the Minimax algorithm that mitigates this computational complexity, and it was therefore the first optimisation I implemented. During the recursive evaluation process, if the algorithm finds that a certain branch cannot produce a better outcome than the current alpha or beta values, it prunes that branch. In this way, the Minimax algorithm can skip large portions of the game tree, significantly improving computational efficiency. As a result, the algorithm can evaluate more nodes within the given time frame, enhancing overall performance.

4. Heuristic Decision:

Creating a heuristic that accurately evaluates game states is critical to the Minimax algorithm's success. However, designing the heuristic used to evaluate the board at the last depth is complicated and involves multiple aspects; the weights are hard to determine, so extensive trials are required to compare performance.

Solution: I based the heuristic on established research and verified it through extensive testing against multiple agents and with multiple variations of the heuristic. By balancing factors such as win/loss conditions, points, noble progress, prestige, and gems, I ensured that the heuristic provided a reliable measure of each game state's value.

Future Improvements

  1. Enhanced Alpha-Beta Pruning with Iterative Deepening: As mentioned before, a pre-determined maximum depth may allow deeper exploration of the tree, whereas iterative deepening requires multiple searches, which is computationally less efficient. However, this can be improved by incorporating transposition tables to enhance the efficiency of alpha-beta pruning. A transposition table stores the alpha and beta values (or evaluation bounds) found by earlier shallow searches. When a game state is revisited at a deeper level, the algorithm can retrieve the stored values, allowing earlier pruning and skipping redundant evaluations. As a result, iterative deepening can significantly improve the efficiency of alpha-beta pruning, allowing deeper exploration of the game tree within the same time constraints and enhancing the minimax agent's performance. A rough sketch of this idea is given after this list.
  2. Incorporate Advanced Heuristics: Introducing more advanced heuristics could significantly improve the evaluation function used in the Minimax algorithm. If historical game data is available, heuristics can be derived from machine learning models trained on this data. Alternatively, the evaluation function can be refined by systematically adjusting the weights. This can be achieved through automated scripts running extensive experiments to tune the weights or by employing the hill-climbing algorithm mentioned in Joshua Hepworth's paper. Such improvements can lead to more accurate evaluations and better decision-making by the agent.
  3. Incorporate other technique with Minimax: Research has demonstrated the effectiveness of hybrid algorithms that combine Minimax with other techniques, such as Monte Carlo Tree Search (MCTS) for modeling the opponent's moves. I have attempted this approach, but encountered challenges in implementation. Future work can focus on refining this hybrid approach to leverage the strengths of both Minimax and MCTS, potentially improving the agent's performance in more complex scenarios.
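
As a rough sketch of the transposition-table idea in point 1 (not part of the current agent), a table could map a hashable summary of the game state to the depth it was searched to and the value bounds established there; on revisiting the state, those bounds tighten alpha and beta immediately. The make_state_key helper and the entry layout below are hypothetical.

# Hypothetical transposition table for future work; not part of the current agent.
transposition_table = {}    # state key -> (depth_searched, lower_bound, upper_bound)

def make_state_key(state, agent_to_move):
    # Assumed helper: reduce the board and agent resources to something hashable.
    return (repr(state), agent_to_move)

def lookup_bounds(state, agent_to_move, depth, alpha, beta):
    # Reuse bounds from an earlier search that reached at least the current depth,
    # so alpha-beta pruning can cut off this branch sooner.
    entry = transposition_table.get(make_state_key(state, agent_to_move))
    if entry is not None:
        stored_depth, lower, upper = entry
        if stored_depth >= depth:
            alpha = max(alpha, lower)
            beta = min(beta, upper)
    return alpha, beta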