Monte Carlo Control - HiIAmTzeKean/SC3000-Artificial-Intelligence GitHub Wiki

tags:
- 🌱
- AI
- ComputerScience

date: 24--Apr--2023

Monte Carlo Control

Idea

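Learns from sampled experience instead of from a model of the environment
- The state transition probabilities P(s'|s,a) are not known (model-free)
- So Q(s,a) cannot be computed with Bellman backups as in value iteration, because those backups need P(s'|s,a)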
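Episodes are generated by following the current policy, and Q(s,a) is estimated by averaging the returns G_t observed after (s,a) along each trajectory
- First-visit MC: only the first occurrence of (s,a) in an episode contributes a return
- Every-visit MC: every occurrence of (s,a) in an episode contributes a return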
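A minimal Python sketch of the difference between the two variants. It assumes an episode is stored as a list of (state, action, reward) triples, where reward is R_{t+1}, and it uses the discounted return G_t = R_{t+1} + gamma*G_{t+1}, computed by sweeping the episode backwards. The helper name `episode_returns` is hypothetical, not from the original notes.

```python
from collections import defaultdict

def episode_returns(episode, gamma=1.0, first_visit=True):
    """Map each (state, action) in one episode to the list of returns G_t it receives.

    `episode` is a list of (state, action, reward) triples; `reward` is R_{t+1},
    the reward received after taking `action` in `state` at step t.
    """
    # Sweep backwards so that G_t = R_{t+1} + gamma * G_{t+1}
    returns_at = [0.0] * len(episode)
    g = 0.0
    for t in reversed(range(len(episode))):
        _, _, reward = episode[t]
        g = reward + gamma * g
        returns_at[t] = g

    returns = defaultdict(list)
    seen = set()
    for t, (state, action, _) in enumerate(episode):
        if first_visit and (state, action) in seen:
            continue  # first-visit MC: skip later occurrences of the same pair
        seen.add((state, action))
        returns[(state, action)].append(returns_at[t])
    return dict(returns)
```

For an episode in which ("A", "right") occurs twice, first-visit MC records one return for that pair while every-visit MC records two; both variants converge to Q(s,a) as the number of episodes grows.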
Epsilon-soft policy
- Every action keeps a probability of at least epsilon/|A(s)|, so the agent keeps exploring while mostly exploiting the greedy action
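One common epsilon-soft policy is the epsilon-greedy policy: each of the n actions gets epsilon/n, and the greedy action gets an extra 1 - epsilon. The sketch below assumes the Q-values of one state are given as a Python list; the function names are hypothetical.

```python
import random

def epsilon_soft_probs(q_values, epsilon):
    """Epsilon-greedy action probabilities for one state (an epsilon-soft policy)."""
    n = len(q_values)
    best = max(range(n), key=lambda a: q_values[a])  # ties go to the lowest index
    probs = [epsilon / n] * n       # every action gets at least epsilon / n
    probs[best] += 1.0 - epsilon    # the greedy action gets the remaining mass
    return probs

def sample_action(q_values, epsilon, rng=random):
    """Draw an action index according to the epsilon-greedy probabilities."""
    probs = epsilon_soft_probs(q_values, epsilon)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

With epsilon = 0.5 and four actions, the greedy action has probability 0.5 + 0.5/4 = 0.625 and each other action 0.125. Setting epsilon = 0 recovers the purely greedy policy.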

Pseudocode

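Initialise pi as an epsilon-soft policy, Q(s,a) arbitrarily, Returns(s,a) <- empty list
Repeat till T iterations
    Loop
        Generate episodes by following pi
    End
    // Policy evaluation
    For each s,a
        For each episode
            For each counted visit t of (s,a) (only the first for first-visit MC, all for every-visit MC)
                G_t <- R_{t+1} + gamma*R_{t+2} + ... until the episode ends
                Append G_t to Returns(s,a)
            End
        End
        Q(s,a) <- Average(Returns(s,a))
    End
    // Policy improvement step
    For each state s
        a* <- arg max_a Q(s,a)
        For each action a
            If a = a*: pi(a|s) <- 1 - epsilon + epsilon/|A(s)|
            Else: pi(a|s) <- epsilon/|A(s)|
        End
    End
End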
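A runnable sketch of the pseudocode above, using first-visit MC and an epsilon-greedy improvement step. The environment is a hypothetical toy corridor invented for illustration (not from the original notes): cells 0 to 4, start in cell 0, action 0 moves left, action 1 moves right, and reaching cell 4 ends the episode with reward +1. All names and parameter values (`iterations`, `episodes_per_iter`, `gamma`, `epsilon`) are assumptions.

```python
import random
from collections import defaultdict

# Hypothetical toy environment: a corridor of N_STATES cells, goal in the last cell.
N_STATES = 5
ACTIONS = (0, 1)  # 0 = left, 1 = right

def step(state, action):
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def generate_episode(policy, rng, max_steps=100):
    """Follow the stochastic policy pi(a|s) from cell 0 and record (s, a, r) triples."""
    episode, state = [], 0
    for _ in range(max_steps):
        action = rng.choices(ACTIONS, weights=policy[state], k=1)[0]
        next_state, reward, done = step(state, action)
        episode.append((state, action, reward))
        if done:
            break
        state = next_state
    return episode

def mc_control(iterations=100, episodes_per_iter=20, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    n_a = len(ACTIONS)
    # A uniform policy is epsilon-soft for any epsilon <= 1
    policy = {s: [1.0 / n_a] * n_a for s in range(N_STATES)}
    returns = defaultdict(list)  # Returns(s, a), kept across iterations
    q = defaultdict(float)       # Q(s, a), defaults to 0

    for _ in range(iterations):
        episodes = [generate_episode(policy, rng) for _ in range(episodes_per_iter)]

        # Policy evaluation (first-visit): sweep backwards so earlier visits overwrite later ones
        for episode in episodes:
            g, first_return = 0.0, {}
            for state, action, reward in reversed(episode):
                g = reward + gamma * g
                first_return[(state, action)] = g
            for sa, g_sa in first_return.items():
                returns[sa].append(g_sa)
        for sa, rs in returns.items():
            q[sa] = sum(rs) / len(rs)

        # Policy improvement: epsilon-greedy with respect to Q (last cell is terminal)
        for s in range(N_STATES - 1):
            best = max(ACTIONS, key=lambda a: q[(s, a)])
            policy[s] = [epsilon / n_a] * n_a
            policy[s][best] += 1.0 - epsilon
    return q, policy
```

After `q, policy = mc_control()`, the greedy action in every non-terminal cell is expected to be "right". Keeping every action's probability at epsilon/|A(s)| or higher is what lets the estimates for "left" keep being updated even after "right" becomes greedy.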

