Policy iteration - HiIAmTzeKean/SC3000-Artificial-Intelligence GitHub Wiki
tags:
- 🌱
- AI
- ComputerScience
- MDP
date: 20--Apr--2023
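- Policy evaluation
  - Fix the current policy $\pi$
  - Sweep over the states, applying $V(s) \leftarrow \sum_{s'} P(s'\mid s,\pi(s))\,[R(s,\pi(s),s')+\gamma V(s')]$, until the largest change in $V(s)$ during a sweep is below a threshold $\theta$
- Policy improvement
  - For each state, choose the action that is greedy with respect to the $V$ just computed:
    $\pi(s)=\displaystyle\arg\max_a \sum_{s'} P(s'\mid s,a)\,[R(s,a,s')+\gamma V(s')]$
- Repeat evaluation and improvement until the policy no longer changes; for a finite MDP the resulting policy is optimal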
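The evaluation/improvement loop can be sketched in Python as below. This is a minimal illustration, not the course's reference code. Some choices are assumptions: the MDP format `P[s][a] = [(prob, next_state, reward), ...]`, the helper names (`q_value`, `policy_evaluation`, `policy_improvement`, `policy_iteration`), the two-state toy MDP, and the values `gamma=0.9` and `theta=1e-8`. Evaluation updates `V` in place during each sweep. Improvement keeps the current action unless another one is strictly better, so ties cannot make the loop oscillate.

```python
from typing import Dict, List, Tuple

# Assumed MDP format (hypothetical): P[s][a] is a list of (prob, next_state, reward).
MDP = Dict[int, Dict[int, List[Tuple[float, int, float]]]]


def q_value(P: MDP, V: Dict[int, float], s: int, a: int, gamma: float) -> float:
    """sum_{s'} P(s'|s,a) [R(s,a,s') + gamma * V(s')]."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def policy_evaluation(P: MDP, policy: Dict[int, int], gamma: float, theta: float) -> Dict[int, float]:
    """Apply the Bellman expectation update for a fixed policy until the
    largest change in V(s) during one sweep is below theta."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            v_new = q_value(P, V, s, policy[s], gamma)
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new  # in-place update within the sweep
        if delta < theta:
            return V


def policy_improvement(P: MDP, V: Dict[int, float], policy: Dict[int, int], gamma: float) -> Dict[int, int]:
    """Greedy policy w.r.t. V; keep the current action unless another is strictly better."""
    new_policy = {}
    for s in P:
        best = max(P[s], key=lambda a: q_value(P, V, s, a, gamma))
        if q_value(P, V, s, best, gamma) > q_value(P, V, s, policy[s], gamma) + 1e-12:
            new_policy[s] = best
        else:
            new_policy[s] = policy[s]
    return new_policy


def policy_iteration(P: MDP, gamma: float = 0.9, theta: float = 1e-8):
    policy = {s: next(iter(P[s])) for s in P}  # arbitrary initial policy
    while True:
        V = policy_evaluation(P, policy, gamma, theta)
        new_policy = policy_improvement(P, V, policy, gamma)
        if new_policy == policy:  # policy stable: stop
            return policy, V
        policy = new_policy


# Hypothetical two-state MDP: action 0 = "stay", action 1 = "switch state".
P: MDP = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 1, 2.0)], 1: [(1.0, 0, 0.0)]},
}
policy, V = policy_iteration(P, gamma=0.9)
print(policy)                                  # → {0: 1, 1: 0}
print({s: round(v, 3) for s, v in V.items()})  # → {0: 19.0, 1: 20.0}
```

Starting from "stay everywhere", the first evaluation gives $V(0)=0$ and $V(1)=2/(1-0.9)=20$. Improvement then switches state 0 to action 1, since $1+0.9\cdot 20=19>0$. The second improvement step changes nothing, so the loop stops after two rounds.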