Approximate Q Function

Introduction

Approximate Q-learning is a reinforcement learning technique based on Q-learning. Instead of storing a table of Q-values, it represents the Q-function with a feature vector and a weight vector. The goal of approximate Q-learning is still to learn a policy that tells the agent which action to take under which circumstances.
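Concretely, with feature values f_i(s, a) and weights w_i, the approximate Q-value is the weighted sum of the features:

Q(s, a) = w_1 * f_1(s, a) + w_2 * f_2(s, a) + ... + w_n * f_n(s, a)

so learning reduces to adjusting the weights rather than filling in a table of Q-values.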

Implementation

Feature Vector
features = {'distanceWithNearestFood', 'distanceToGhost'}

distanceWithNearestFood -> scan the food list, find the nearest food, and use the distance to it as the feature value.

distanceToGhost -> scan the opponent list and return the distance to the closest ghost.

numberOfActions -> the number of legal actions in the successor state (not used in this version).
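A minimal sketch of this feature extraction, using plain coordinate tuples and Manhattan distance as a stand-in for the maze distance used in the project; the function and parameter names are illustrative, not the repository's actual API:

```python
def manhattan(a, b):
    # Manhattan distance between two (x, y) grid positions.
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def extract_features(pacman_pos, food_list, ghost_positions, legal_actions):
    """Build the feature dict for one (state, action) successor position."""
    features = {}
    if food_list:
        # Distance to the nearest remaining food pellet.
        features['distanceWithNearestFood'] = min(
            manhattan(pacman_pos, food) for food in food_list)
    if ghost_positions:
        # Distance to the closest observed ghost.
        features['distanceToGhost'] = min(
            manhattan(pacman_pos, ghost) for ghost in ghost_positions)
    # Not used in this version of the agent, kept for reference.
    features['numberOfActions'] = len(legal_actions)
    return features
```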

Weights vector
The weight vector is initialized at the beginning: weights = {'distanceWithNearestFood': 100, 'distanceToGhost': 1, 'numberOfActions': 1}
Update weights vector
current Q value -> the dot product of the feature vector and the weight vector (features * weights).
next Q value -> get the legal actions for the successor state; for each action, compute the Q-value as the dot product of that successor's features and the weights, and return the maximum.
The weight for each feature is then updated:
w_i = w_i + alpha * (r + discountFactor * nextQ - currentQ) * f_i
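A minimal sketch of one update step under the rule above; the learning rate and discount factor values here are illustrative, and the helper names are not the repository's actual API:

```python
# Initial weights as described above.
weights = {'distanceWithNearestFood': 100, 'distanceToGhost': 1, 'numberOfActions': 1}

def q_value(features, weights):
    # Q(s, a) = sum_i w_i * f_i(s, a)
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

def update_weights(weights, features, reward, next_features_per_action,
                   alpha=0.1, discount=0.9):
    """One approximate Q-learning update for a single observed transition.

    next_features_per_action: feature dicts for each legal action in the
    successor state (empty if the successor is terminal).
    """
    current_q = q_value(features, weights)
    # Maximum Q-value over the successor's legal actions (0 if terminal).
    next_q = max((q_value(f, weights) for f in next_features_per_action), default=0.0)
    correction = reward + discount * next_q - current_q
    for name, value in features.items():
        # w_i = w_i + alpha * correction * f_i
        weights[name] = weights.get(name, 0.0) + alpha * correction * value
    return weights
```
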
Performance
Approximate Q-learning is extremely time-consuming: training the agent takes too long, and since the number of steps in a single game is limited, online learning during a match is unrealistic. Although it is possible to run thousands of games on a local machine to train a reasonable weight vector in advance, the final performance was not as satisfactory as we expected, so we decided to abandon this method in our final approach.




The weight update is implemented in updatedWeight.


The new Q-value used to choose the final action is computed in newValue.
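A minimal sketch of that action selection, reusing the q_value helper from the sketch above and assuming features_per_action maps each legal action to its feature dict; the names are illustrative:

```python
def choose_action(features_per_action, weights):
    # Return the legal action whose approximate Q-value is largest.
    return max(features_per_action,
               key=lambda action: q_value(features_per_action[action], weights))
```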
