Value iteration
tags:
- 🌱
- AI
- ComputerScience
- MDP
date: 10-Feb-2023
Value iteration makes use of the Bellman equation to create a Bellman update step (written out after the list below).
- Note that value iteration finds the $a \in A(s)$ that maximises the value of the state in the current iteration
- The information used to update the current iteration makes use of the values (experience) from the previous iteration
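Written out, the Bellman update applied at each iteration is (using the standard MDP notation $P(s' \mid s,a)$ for transition probabilities, $R$ for rewards and $\gamma$ for the discount factor, which are not restated in these notes):

$$V_{i+1}(s) \leftarrow \max_{a \in A(s)} \sum_{s'} P(s' \mid s, a)\,\big[R(s, a, s') + \gamma\, V_i(s')\big]$$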
```
Set threshold to some delta
Repeat
    // Outer loop: a while loop that runs until convergence.
    // Each pass of this loop increases the iteration count by one.
    For each s in S
        // Inner loop: update V(s) for every state in the current iteration.
        // In the first iteration most of the values might still be negative/arbitrary.
        Perform Bellman update
Until max over s of |V'(s) - V(s)| is less than the threshold
```
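A minimal Python/NumPy sketch of this loop, assuming a tabular MDP given as a transition array `P[s, a, s']` and an expected-reward array `R[s, a]` (these names and the array layout are illustrative, not from the notes); it keeps the old and the new value arrays separately:

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, threshold=1e-6):
    """Sketch of value iteration for a tabular MDP.

    P: transition probabilities, shape (S, A, S), P[s, a, s'] = P(s' | s, a)
    R: expected rewards, shape (S, A)
    """
    n_states, n_actions, _ = P.shape
    V = np.zeros(n_states)            # v_0 chosen arbitrarily (all zeros here)
    while True:                       # outer loop: repeat until convergence
        # Q(s, a) = R(s, a) + gamma * sum_{s'} P(s' | s, a) * V(s')
        Q = R + gamma * (P @ V)       # shape (S, A)
        V_new = Q.max(axis=1)         # Bellman update for every state s
        if np.max(np.abs(V_new - V)) < threshold:   # convergence check
            return V_new, Q.argmax(axis=1)          # values and greedy policy
        V = V_new
```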
The algorithm is guaranteed to converge due to the property of contraction. A contraction is a function with a single fixed point, where each application of the function moves the values closer to that fixed point.
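In symbols, the Bellman update operator $B$ is a contraction in the max norm with factor equal to the discount $\gamma < 1$ (a standard result, stated here for context):

$$\lVert BV - BV' \rVert_\infty \le \gamma\, \lVert V - V' \rVert_\infty$$

so repeated application pulls any value estimate towards the unique fixed point $V^*$.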
- $v_0$ is chosen arbitrarily, with terminal states given value 0
- Array of state values can be stored in two ways (a sketch of the in-place variant follows this list):
    - 2 arrays, one for the old values $v_i(s)$ and another for $v_{i+1}(s)$
    - Single array with updates done in-place
- When the number of possible actions is huge, the computation of $\displaystyle \max_{a\in A}{Q(s,a)}$ at each stage can be costly
    - Policy iteration could be more desirable instead
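As a rough sketch of the single-array option (reusing the illustrative `P`, `R` layout from the earlier snippet), the in-place variant overwrites `V[s]` as soon as it is recomputed, so later states in the same sweep already see the updated values:

```python
import numpy as np

def value_iteration_in_place(P, R, gamma=0.9, threshold=1e-6):
    """Single-array variant: V is updated in place during each sweep."""
    n_states, _, _ = P.shape
    V = np.zeros(n_states)
    while True:
        max_change = 0.0
        for s in range(n_states):
            # max over a of R(s, a) + gamma * sum_{s'} P(s' | s, a) * V(s')
            v_new = np.max(R[s] + gamma * (P[s] @ V))
            max_change = max(max_change, abs(v_new - V[s]))
            V[s] = v_new              # overwrite immediately (in-place update)
        if max_change < threshold:    # largest change this sweep below threshold
            return V
```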
Links: