Experiment
Experiment 1: Minimax Heuristic Function Test
Trial | Participant | Avg. Score (over 20 games) | Wins (over 20 games) |
---|---|---|---|
1 | Minimax (Heuristic 1) | 13.90 | 9 (45%) |
1 | BFS | 12.45 | 11 (55%) |
2 | Minimax (Heuristic 2) | 13.40 | 12 (60%) |
2 | BFS | 12.45 | 8 (40%) |
3 | Minimax (Heuristic 3) | 14.32 | 13 (65%) |
3 | BFS | 13.80 | 7 (35%) |
Note. The 'BFS' agent refers to the `t_085/otherAgents/bfs-y.py` agent, developed previously for the individual submission, and was used as a baseline/benchmark for performance comparison.
Heuristics Tested:
- Heuristic 1: $100 \times \text{winLoss} + 2 \times \text{points} + 2 \times \text{nobleProgress} + \text{prestige} + \text{gems}$
- Heuristic 2: $100 \times \text{winLoss} + 1.5 \times \text{points} + 2.5 \times \text{nobleProgress} + \text{prestige} + \text{gems}$
- Heuristic 3 (Heuristic 2 with depth decay): $(100 \times \text{winLoss} + 1.5 \times \text{points} + 2.5 \times \text{nobleProgress} + \text{prestige} + \text{gems}) \times 0.9^{\text{depth}}$
Discussion:
Heuristic 1 heavily weights the win/loss condition and equally values points and noble progress. While it achieved a decent average score and win rate, it performed slightly worse than the BFS agent in terms of wins. This indicates that while it can generate competitive scores, it may not consistently secure victories.
Heuristic 2 reduced the weight of points and increased the weight of noble progress compared to Heuristic 1. This adjustment resulted in a higher win rate of 60%, suggesting that emphasising noble progress is beneficial.
To further enhance performance and mitigate the problem of partial information, as detailed in Approach 1, we added a depth decay factor to quantify the uncertainties associated with newly drawn cards on the board in future rounds. Heuristic 3 builds upon Heuristic 2 by incorporating this depth decay factor, which decreases the weight of the heuristic as the depth increases. This approach resulted in the highest average score and win rate, making it the most effective heuristic among the three. The depth decay helps the agent prioritize immediate gains while considering future uncertainties. Consequently, Heuristic 3 is used in our minimax approach, allowing for considerations of future uncertainties and enhancing overall strategic decision-making.
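To make the weighting concrete, below is a minimal Python sketch of how Heuristic 3 could be evaluated. The function name and the assumption that the five feature values are already extracted from the game state are illustrative only, not the exact code in our agent.

```python
# Minimal sketch of Heuristic 3: the Heuristic 2 weighting with a 0.9^depth decay.
# Feature values are assumed to be pre-computed from the game state; winLoss is
# assumed to be +1 for a winning state, -1 for a losing one, and 0 otherwise.
def heuristic_3(win_loss, points, noble_progress, prestige, gems, depth):
    base = (100 * win_loss
            + 1.5 * points
            + 2.5 * noble_progress
            + prestige
            + gems)
    return base * (0.9 ** depth)  # discount deeper, more uncertain states

# Example: the same position is valued less when it is reached two plies deeper.
print(heuristic_3(0, 10, 2, 3, 4, depth=0))  # 27.0
print(heuristic_3(0, 10, 2, 3, 4, depth=2))  # ~21.87
```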
Experiment 2: Performance Comparison of 3 Approaches
Trial | Participant | Avg. Score (over 20 games) | Wins (over 20 games) |
---|---|---|---|
1 | SARSA | 9.95 | 2 (10%) |
1 | Minimax | 14.90 | 18 (90%) |
2 | MCTS | 6.88 | 1 (5%) |
2 | Minimax | 16.35 | 19 (95%) |
3 | SARSA | 16.15 | 17 (85%) |
3 | MCTS | 11.05 | 3 (15%) |
4 | SARSA | 14.25 | 13 (65%) |
4 | BFS | 10.75 | 7 (35%) |
5 | Minimax | 15.15 | 14 (70%) |
5 | BFS | 9.30 | 6 (30%) |
6 | MCTS | 16.00 | 11 (55%) |
6 | BFS | 13.38 | 9 (45%) |
Note. The 'BFS' agent refers to the `t_085/otherAgents/bfs-y.py` agent, developed previously for the individual submission, and was used as a baseline/benchmark for performance comparison.
Discussion:
The experiment compared the performance of three approaches, SARSA, Minimax, and MCTS, across six trials: first against each other (Trials 1-3), and then individually against a breadth-first search (BFS) agent used as a baseline benchmark (Trials 4-6).
Trials 1-3: Minimax, MCTS, and SARSA Competing with Each Other
The Minimax agent significantly outperformed the SARSA agent with a win rate of 90%, and the MCTS agent with a win rate of 95%. These results indicate that our Minimax agent is more effective than both the MCTS and SARSA approaches. One reason for MCTS's lower performance compared to Minimax is its shallower search depth and lack of pruning. On average, MCTS reached a depth of 5, whereas Minimax reached an average depth of 8. The shallower search likely led to sub-optimal results. Specifically, the lack of pruning in MCTS meant more ineffective branches were explored, wasting time and limiting the depth of the search within the 1-second constraint. Disadvantages and future improvements for MCTS are detailed in Approach 3: MCTS.
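As background for the pruning point above, here is a generic alpha-beta sketch (not the agent's actual implementation); `evaluate`, `get_actions` and `apply_action` are placeholder hooks for the Splendor game interface. Pruning cuts branches that cannot change the final decision, which is what lets a minimax agent spend its 1-second budget on depth rather than breadth.

```python
import math

def alpha_beta(state, depth, alpha, beta, maximising,
               evaluate, get_actions, apply_action):
    """Generic minimax with alpha-beta pruning (illustrative only)."""
    actions = get_actions(state)
    if depth == 0 or not actions:
        return evaluate(state)

    if maximising:
        value = -math.inf
        for action in actions:
            child = apply_action(state, action)
            value = max(value, alpha_beta(child, depth - 1, alpha, beta, False,
                                          evaluate, get_actions, apply_action))
            alpha = max(alpha, value)
            if alpha >= beta:   # opponent already has a better reply elsewhere: prune
                break
        return value
    else:
        value = math.inf
        for action in actions:
            child = apply_action(state, action)
            value = min(value, alpha_beta(child, depth - 1, alpha, beta, True,
                                          evaluate, get_actions, apply_action))
            beta = min(beta, value)
            if alpha >= beta:   # we already have a better move elsewhere: prune
                break
        return value
```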
SARSA also underperformed compared to Minimax, likely due to its inability to effectively simulate the entire game state. However, SARSA outperformed MCTS in Trial 3, further highlighting the importance of depth and pruning in tree search algorithms to enhance computational efficiency and performance.
Trials 4-6: Competing Against the Baseline (BFS)
In addition to comparing the three approaches against each other, we conducted individual comparisons with the BFS algorithm as a performance benchmark. All three approaches outperformed the baseline agent, indicating that each of them is valid and effective; the BFS agent's greedy heuristics proved less effective than the MCTS, SARSA, and Minimax strategies.
Among them, the Minimax algorithm performed best, achieving the highest win rate of 70%, while MCTS had the highest average score, reaching 16.00. The 70% figure is somewhat surprising: Minimax dominated Trials 1-3 with 90% and 95% win rates, yet won only 70% of its games against the simpler BFS algorithm. This suggests that the BFS agent's straightforward strategy occasionally exploited weaknesses in the Minimax agent's approach. A likely cause is that Minimax assumes the opponent always plays optimally, maximising their own score while minimising the agent's score; this assumption can be a disadvantage against simpler strategies, such as BFS with greedy search, which do not follow the same logic or complexity as the Minimax agent.
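A toy illustration of this point, with made-up values rather than real game data: minimax scores each of its moves by the opponent's best counter-reply, so a move that a greedy opponent would never actually punish can still be discarded.

```python
# Hypothetical values (for our agent) of each opponent reply to two candidate moves.
replies = {
    "move_A": {"take_gems": 8, "buy_card": 3},   # the optimal reply (buy_card) punishes A hard
    "move_B": {"take_gems": 5, "buy_card": 5},
}

# Minimax values each move by the opponent's best counter, so it prefers B (5 > 3) ...
minimax_choice = max(replies, key=lambda m: min(replies[m].values()))

# ... but a greedy opponent that always takes gems would have made A worth 8.
value_if_opponent_is_greedy = {m: r["take_gems"] for m, r in replies.items()}

print(minimax_choice)                # move_B
print(value_if_opponent_is_greedy)   # {'move_A': 8, 'move_B': 5}
```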
On the other hand, the MCTS agent showed competitive performance against the BFS agent, indicating that MCTS can be effective under certain conditions despite its general underperformance compared to Minimax. This further highlights the importance of evaluating an agent's performance against a range of opponents: an agent may beat specific opponents yet struggle against others, so thorough experiments are needed to ensure it remains effective under various circumstances and against any opponent.
Overall, the experiments validate the Minimax approach as the most effective strategy for our Splendor game AI.