Monte Carlo Control - HiIAmTzeKean/SC3000-Artificial-Intelligence GitHub Wiki
tags:
- 🌱
- AI
- ComputerScience
date: 24--Apr--2023
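- Learns from experience (sampled interaction with the environment) instead of from a model
- The state transition probabilities P(s'|s,a) are not known
- So Q(s,a) cannot be computed with a Bellman backup as in value iteration; it is estimated by averaging the returns observed after (s,a)
- Episodes are generated by following the current policy, and the agent learns from the returns along each trajectory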
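Learning from a trajectory starts with the return G_t = R_{t+1} + gamma R_{t+2} + ... for each step of the episode. A minimal sketch, assuming an episode's rewards are stored in a list where `rewards[t]` is R_{t+1} and `gamma` is the discount factor (the function name and layout are illustrative, not from these notes):

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 1.0) -> List[float]:
    """Compute G_t for every step t of one episode.

    rewards[t] is the reward received after the action taken at step t,
    i.e. R_{t+1}, so G_t = R_{t+1} + gamma * R_{t+2} + ...
    """
    returns = [0.0] * len(rewards)
    g = 0.0
    # Scan backwards so each step reuses the later return:
    # G_t = R_{t+1} + gamma * G_{t+1}.
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns
```

For example, `discounted_returns([1.0, 0.0, 2.0], gamma=0.5)` gives `[1.5, 1.0, 2.0]`. Scanning backwards makes one pass per episode instead of re-summing the tail for every step.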
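Epsilon-soft policy
- Balances exploration and exploitation: every action keeps probability at least epsilon/|A(s)| in every state, so all (s,a) pairs keep being sampled while the greedy action is still preferred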
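Epsilon-greedy is one common epsilon-soft policy: the greedy action gets 1 - epsilon + epsilon/|A(s)| and every other action gets epsilon/|A(s)|. A small sketch, assuming the Q-values of one state are given as a dict from action to value (the function names are hypothetical):

```python
import random
from typing import Dict, Hashable, Optional


def epsilon_greedy_probs(q_values: Dict[Hashable, float],
                         epsilon: float) -> Dict[Hashable, float]:
    """pi(a|s) for one state under epsilon-greedy, one kind of epsilon-soft policy."""
    actions = list(q_values)
    n = len(actions)
    best = max(actions, key=lambda a: q_values[a])  # ties go to the first action
    # Every action keeps at least epsilon / n probability, so exploration never stops.
    probs = {a: epsilon / n for a in actions}
    # The greedy action additionally receives the remaining 1 - epsilon.
    probs[best] += 1.0 - epsilon
    return probs


def sample_action(q_values: Dict[Hashable, float], epsilon: float,
                  rng: Optional[random.Random] = None) -> Hashable:
    """Draw one action according to epsilon_greedy_probs."""
    if rng is None:
        rng = random.Random()
    probs = epsilon_greedy_probs(q_values, epsilon)
    actions = list(probs)
    return rng.choices(actions, weights=[probs[a] for a in actions], k=1)[0]
```

With two actions and epsilon = 0.2, the greedy action gets probability 0.9 and the other 0.1; with epsilon = 0 the policy is purely greedy.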
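Repeat for T iterations
    Loop
        Generate an episode by following the current policy pi
    End
    // Policy evaluation
    For each (s,a)
        For each episode in which (s,a) appears
            G <- return G_t following the first occurrence of (s,a)
                 (first-visit MC; every-visit MC uses every occurrence)
            Append G to Return(s,a)
        End
        Q(s,a) <- Average(Return(s,a))
    End
    // Policy improvement
    For each state s
        a* <- arg max_a Q(s,a)
        Update pi(s) to be epsilon-soft around a*:
            pi(a|s) = 1 - epsilon + epsilon/|A(s)|   if a = a*
            pi(a|s) = epsilon/|A(s)|                 otherwise
    End
End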
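The loop above can be sketched in Python as on-policy first-visit Monte Carlo control. Everything concrete here is an assumption for illustration: the 5-state corridor environment, the parameter values (`T`, `episodes_per_iter`, `epsilon`, `gamma`) and all function names are hypothetical. The policy pi is stored as its greedy action per state, and epsilon-greedy sampling at action time makes it epsilon-soft.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical toy environment (not part of the original notes):
# a corridor of states 0..4, where 0 and 4 are terminal and every episode
# starts in state 2. Stepping into state 4 pays +1; every other step pays 0.
N_STATES = 5
START = 2
ACTIONS = (-1, +1)  # move left, move right
TERMINALS = {0, N_STATES - 1}

Episode = List[Tuple[int, int, float]]  # (state, action, reward) triples


def step(state: int, action: int) -> Tuple[int, float]:
    next_state = state + action
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def choose_action(pi: Dict[int, int], state: int, epsilon: float,
                  rng: random.Random) -> int:
    # Epsilon-greedy (an epsilon-soft policy): with probability epsilon pick
    # uniformly among all actions, otherwise take the greedy action pi[state].
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return pi[state]


def generate_episode(pi: Dict[int, int], epsilon: float, rng: random.Random,
                     max_steps: int = 100) -> Episode:
    episode: Episode = []
    state = START
    for _ in range(max_steps):  # length cap in case the policy keeps looping
        if state in TERMINALS:
            break
        action = choose_action(pi, state, epsilon, rng)
        next_state, reward = step(state, action)
        episode.append((state, action, reward))
        state = next_state
    return episode


def mc_control(T: int = 500, episodes_per_iter: int = 10, epsilon: float = 0.2,
               gamma: float = 0.9, seed: int = 0):
    rng = random.Random(seed)
    returns: Dict[Tuple[int, int], List[float]] = defaultdict(list)  # Return(s,a)
    Q: Dict[Tuple[int, int], float] = defaultdict(float)
    # Arbitrary initial greedy policy: always move left.
    pi = {s: ACTIONS[0] for s in range(N_STATES) if s not in TERMINALS}

    for _ in range(T):
        # Generate episodes with the current epsilon-soft policy.
        episodes = [generate_episode(pi, epsilon, rng)
                    for _ in range(episodes_per_iter)]

        # Policy evaluation (first-visit): append G_t of each (s,a) to Return(s,a).
        for episode in episodes:
            g = 0.0
            first_visit_g: Dict[Tuple[int, int], float] = {}
            for state, action, reward in reversed(episode):
                g = reward + gamma * g
                # Scanning backwards, the last write for (s,a) is its first visit.
                first_visit_g[(state, action)] = g
            for sa, g_sa in first_visit_g.items():
                returns[sa].append(g_sa)
        for sa, rs in returns.items():
            Q[sa] = sum(rs) / len(rs)

        # Policy improvement: greedy w.r.t. Q (ties go to the first action);
        # exploration is supplied by epsilon inside choose_action.
        for s in pi:
            pi[s] = max(ACTIONS, key=lambda a: Q[(s, a)])
    return pi, Q
```

Calling `pi, Q = mc_control()` should leave the greedy policy moving right (+1) in states 1 to 3, since stepping into state 4 is the only reward. Keeping a running sum and count per (s,a) would avoid re-summing the lists; the lists are kept here only to mirror Return(s,a) in the pseudocode.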
Links: