Approximate Q function
Approximate Q-learning is a reinforcement learning technique based on Q-learning. Instead of storing a Q-value for every state-action pair, it introduces a feature vector and a weight vector, so Q-values are represented as a weighted sum of features. The goal of approximate Q-learning is still to learn a policy, which tells the agent which action to take in each situation.
- Feature Vector
- features = {'distanceWithNearestFood', 'distanceToGhost'}
distanceWithNearestFood -> scan the food list and find the nearest food; the feature value is the distance to that food.
distanceToGhost -> observe the opponent list and return the distance to the closest ghost.
numberOfActions -> the number of legal actions for the successor state (not used in this version). A sketch of the feature computation follows this list.
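A minimal sketch of how these features could be computed; the argument names (agent position, food list, ghost positions, and a maze-distance helper) are placeholders for the project's own game-state data, not its actual API:

```python
def get_features(my_pos, food_list, ghost_positions, maze_distance):
    """Build the feature dictionary for one (state, action) pair.

    my_pos          -- agent position after taking the action
    food_list       -- positions of the remaining food
    ghost_positions -- observed opponent ghost positions
    maze_distance   -- distance function (e.g. a precomputed maze-distance lookup)
    All arguments are hypothetical placeholders for the project's own state data.
    """
    features = {}
    if food_list:
        # distanceWithNearestFood: scan the food list and keep the closest one
        features['distanceWithNearestFood'] = min(
            maze_distance(my_pos, food) for food in food_list)
    if ghost_positions:
        # distanceToGhost: smallest distance to any observed opponent ghost
        features['distanceToGhost'] = min(
            maze_distance(my_pos, ghost) for ghost in ghost_positions)
    return features
```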
- Weight vector
- The weight vector is initialized at the beginning: weights = {'distanceWithNearestFood': 100, 'distanceToGhost': 1, 'numberOfActions': 1}
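With these initial weights, the Q-value of a state-action pair is the dot product of its feature vector and the weight vector; a minimal sketch:

```python
# Initial weights as described above.
weights = {'distanceWithNearestFood': 100, 'distanceToGhost': 1, 'numberOfActions': 1}

def q_value(features, weights):
    """Q(s, a) = sum of weights[i] * features[i]; missing features count as 0."""
    return sum(weights.get(name, 0) * value for name, value in features.items())
```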
- Updating the weight vector
- current Q value -> features * weights (the dot product of the feature vector and the weight vector)
next Q value -> get the legal actions of the successor state; for each action, compute features' * weights' for that state, and return the maximum Q-value.
The weight of each feature is then updated as:
w' = w' + alpha * (r + discountFactor * nextQ - currentQ) * feature_value
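A sketch of this update step under the definitions above; the alpha and discount values shown are illustrative defaults, not the project's tuning:

```python
def update_weights(weights, features, reward, current_q, next_q,
                   alpha=0.2, discount=0.9):
    """One approximate Q-learning weight update.

    difference = r + discount * max_a' Q(s', a') - Q(s, a)
    w_i       <- w_i + alpha * difference * f_i(s, a)

    alpha and discount here are placeholder values.
    """
    difference = reward + discount * next_q - current_q
    for name, value in features.items():
        weights[name] = weights.get(name, 0) + alpha * difference * value
    return weights
```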
- Performance
- Approximate Q-learning is extremely time-consuming: training the agent takes too much time. Meanwhile, the number of steps in one game is limited, which makes online learning unrealistic. It is possible to run thousands of games on a local machine to train a reasonable weight vector in advance; however, the final performance of approximate Q-learning was not as satisfactory as we expected, so we decided to abandon this method in the final approach.
For completeness, once the weights are updated, the agent recalculates the Q-value of each legal action with the new weights and chooses the action with the highest value:
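A sketch of this action-selection step, reusing the q_value helper from the earlier sketch; feature_fn is a hypothetical helper that returns the feature dictionary of the successor state reached by an action:

```python
def choose_action(legal_actions, feature_fn, weights):
    """Pick the legal action whose successor features give the highest Q-value."""
    return max(legal_actions,
               key=lambda action: q_value(feature_fn(action), weights))
```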