

tags:

  • 🌱
  • AI
  • ComputerScience
  • MDP

date: 19-Apr-2023

Discounted sum of rewards

Idea

  • Rewards received now are more valuable than rewards received in the future
  • Without discounting, the accumulated reward can grow without bound over an infinite time horizon, so the algorithm may never converge

Solution

  • Discount future rewards by a factor $0\lt\gamma\lt1$
  • $G_t=R_t+\gamma R_{t+1}+\gamma^2R_{t+2}+\dots=\displaystyle\sum_{i=0}^{K}{\gamma^i R_{t+i}}$
  • Even over an infinite horizon, $\displaystyle\sum_{i=0}^{\infty}{\gamma^i R_{t+i}}\le\frac{R_{\max}}{1-\gamma}$
  • This bounds the accumulated reward and ensures that the algorithm converges (see the sketch after this list)
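A minimal sketch in Python, using an illustrative reward list and $\gamma=0.9$ (neither taken from these notes): it computes the discounted return $G_t$ as the weighted sum above and checks that it stays below the geometric bound $R_{\max}/(1-\gamma)$.

```python
# Sketch only: reward values and gamma are hypothetical, chosen for illustration.

def discounted_return(rewards, gamma):
    """Sum of gamma**i * R_{t+i} over a finite reward sequence starting at t."""
    return sum((gamma ** i) * r for i, r in enumerate(rewards))

rewards = [1.0, 1.0, 1.0, 1.0, 1.0]   # R_t, R_{t+1}, ..., R_{t+4}
gamma = 0.9

g_t = discounted_return(rewards, gamma)
bound = max(rewards) / (1 - gamma)     # R_max / (1 - gamma), the geometric bound

print(g_t)            # 4.0951
print(g_t <= bound)   # True: the discounted sum never exceeds the bound
```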

Relation to Dynamic programming

  • $G_t=R_t+\gamma R_{t+1}+\gamma^2R_{t+2}+\dots$
  • $G_t=R_t+\gamma G_{t+1}$
  • Since $G_t$ depends only on $R_t$ and $G_{t+1}$, all returns of an episode can be computed in a single backward sweep, as in dynamic programming (see the sketch below)
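A minimal sketch in Python of the recursive form, with hypothetical reward values: one backward pass fills in $G_t$ for every time step using $G_t = R_t + \gamma G_{t+1}$, with $G_T=0$ after the episode ends.

```python
# Sketch only: the reward sequence and gamma are illustrative, not from the notes.

def returns_backward(rewards, gamma):
    """Return [G_0, G_1, ..., G_{T-1}] using the recursion G_t = R_t + gamma * G_{t+1}."""
    returns = [0.0] * len(rewards)
    g_next = 0.0                      # G_T = 0 once the episode has ended
    for t in reversed(range(len(rewards))):
        g_next = rewards[t] + gamma * g_next
        returns[t] = g_next
    return returns

print(returns_backward([1.0, 2.0, 3.0], gamma=0.9))
# [G_0, G_1, G_2] = [5.23, 4.7, 3.0]
```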

Links:
