Soft policy - HiIAmTzeKean/SC3000-Artificial-Intelligence GitHub Wiki


tags:

  • 🌱
  • AI
  • ComputerScience
  • MDP

date: 20--Feb--2023

Soft policy

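A soft policy gives every action in every state a non-zero probability of being selected, which encourages exploration. It is one way of handling the Exploration-Exploitation dilemma.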

The usual form is the $\epsilon$-greedy policy over $Q(s,a)$: the action with the highest value is chosen with probability $1-\epsilon$, and otherwise a random action is chosen with probability $\epsilon$.

$$\pi(s) = \begin{cases} \arg\max_a Q(s,a) & \text{with probability } 1-\epsilon \\ \text{a random action} & \text{with probability } \epsilon \end{cases}$$
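The selection rule can be sketched in a few lines of Python. This is a minimal illustration, not code from the course: the function name `epsilon_greedy`, the list `q_values` (holding $Q(s,a)$ for one state, indexed by action), and the assumption that the random action is drawn uniformly are all choices made here for the example.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index for one state, given its Q(s, a) values.

    Assumption: exploration draws uniformly over all actions,
    so the random draw may also land on the greedy action.
    """
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: pick the action with the highest estimated value
    # (ties go to the lowest index).
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Hypothetical Q-values for a state with three actions.
q_s = [0.1, 0.9, 0.3]
action = epsilon_greedy(q_s, epsilon=0.1)
```

Under the uniform-exploration assumption with $|A|$ actions, the greedy action ends up being picked with overall probability $1-\epsilon+\epsilon/|A|$ and every other action with probability $\epsilon/|A| > 0$, which is what makes the policy soft.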


Links:
