Policy iteration - HiIAmTzeKean/SC3000-Artificial-Intelligence GitHub Wiki


tags:

  • 🌱
  • AI
  • ComputerScience
  • MDP

date: 20--Apr--2023

Policy iteration

Idea

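  • Policy evaluation
    • Fix a policy $\pi$, i.e. one action $\pi(s)$ for each state
    • Repeatedly update $V(s)=\displaystyle \sum_{s'}{P(s'|s,\pi(s))[R(s,\pi(s),s')+\gamma V(s')]}$ until the largest change (delta) in V(s) over all states is below a threshold
  • Policy improvement
    • Compute the best action for each state, using the V(s) from policy evaluation
    • $\pi(s)=\displaystyle \arg \max_a{\sum_{s'}{P(s'|s,a)[R(s,a,s')+\gamma V(s')]}}$
  • Repeat evaluation and improvement until the policy no longer changes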
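
The two steps can be sketched in Python as below. This is a minimal illustration, not a reference implementation. The function names, the nested-list layout `P[s][a][s2]` and `R[s][a][s2]`, the threshold `theta`, and the two-state toy MDP are all assumptions made for this example.

```python
def q_value(s, a, V, P, R, gamma):
    """Expected return of taking action a in state s, then following V."""
    return sum(P[s][a][s2] * (R[s][a][s2] + gamma * V[s2])
               for s2 in range(len(P)))


def policy_evaluation(policy, P, R, gamma, theta=1e-8):
    """Update V(s) under a fixed policy until the largest change is below theta."""
    V = [0.0] * len(P)
    while True:
        delta = 0.0
        for s in range(len(P)):
            v_new = q_value(s, policy[s], V, P, R, gamma)
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < theta:
            return V


def policy_improvement(V, P, R, gamma):
    """Pick, for each state, the action with the highest expected return."""
    policy = []
    for s in range(len(P)):
        q = [q_value(s, a, V, P, R, gamma) for a in range(len(P[s]))]
        policy.append(max(range(len(q)), key=q.__getitem__))
    return policy


def policy_iteration(P, R, gamma=0.9):
    """Alternate evaluation and improvement until the policy stops changing."""
    policy = [0] * len(P)  # arbitrary starting policy: action 0 everywhere
    while True:
        V = policy_evaluation(policy, P, R, gamma)
        new_policy = policy_improvement(V, P, R, gamma)
        if new_policy == policy:
            return policy, V
        policy = new_policy


# Hypothetical toy MDP with 2 states and 2 deterministic actions.
# State 0: action 0 stays (reward 0), action 1 moves to state 1 (reward 1).
# State 1: action 0 stays (reward 2), action 1 moves to state 0 (reward 0).
P = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
R = [
    [[0.0, 0.0], [0.0, 1.0]],
    [[0.0, 2.0], [0.0, 0.0]],
]

policy, V = policy_iteration(P, R, gamma=0.9)
print(policy, [round(v, 3) for v in V])
```

In this toy MDP the loop settles on moving from state 0 to state 1 and then staying there, since staying in state 1 earns reward 2 at every step.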

Links:
