Discounted sum of rewards
tags:
- 🌱
- AI
- ComputerScience
- MDP
date: 19-Apr-2023
- Rewards received now are more valuable than rewards received in the future
- Without discounting, the accumulated reward may diverge over an infinite time horizon
- Discount future rewards with a factor $0<\gamma<1$:

$$G_t=R_t+\gamma R_{t+1}+\gamma^2R_{t+2}+\dots=\sum_{i=0}^{K}\gamma^i R_{t+i}$$

- Since every reward is at most $R_{max}$, the infinite sum is bounded by a geometric series, which keeps the accumulated reward finite and ensures that the algorithm converges (a numerical check follows this list):

$$\sum_{i=0}^{\infty}\gamma^i R_{t+i}\le\frac{R_{max}}{1-\gamma}$$
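A minimal sketch (Python is an assumed choice; the function name, reward sequence, and $\gamma=0.9$ are illustrative, not from the wiki) that computes a finite-horizon discounted return and checks it against the $\frac{R_{max}}{1-\gamma}$ bound:

```python
def discounted_return(rewards, gamma):
    """G_t = sum_i gamma^i * R_{t+i} over a finite horizon."""
    return sum(gamma**i * r for i, r in enumerate(rewards))

rewards = [1.0, 1.0, 1.0, 1.0, 1.0]  # hypothetical R_t, R_{t+1}, ...
gamma = 0.9

g = discounted_return(rewards, gamma)
bound = max(rewards) / (1 - gamma)  # R_max / (1 - gamma)
print(g, bound)  # 4.0951 <= 10.0: the return stays under the bound
```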
Relation to dynamic programming
Factoring $\gamma$ out of every term after the first shows that the return satisfies a recursive relation, which dynamic programming exploits:

$$G_t=R_t+\gamma R_{t+1}+\gamma^2R_{t+2}+\dots=R_t+\gamma\left(R_{t+1}+\gamma R_{t+2}+\dots\right)=R_t+\gamma G_{t+1}$$
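The recursion lets every return in an episode be computed in a single backward pass. A minimal sketch under the same illustrative setup as above (the function name is assumed):

```python
def returns_from_recursion(rewards, gamma):
    """Compute G_t for all t via G_t = R_t + gamma * G_{t+1}."""
    returns = [0.0] * len(rewards)
    g = 0.0  # G_{T+1} = 0 beyond the final step
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g  # one application of the recursion
        returns[t] = g
    return returns

print(returns_from_recursion([1.0, 1.0, 1.0, 1.0, 1.0], 0.9))
# returns[0] = 4.0951, matching discounted_return(rewards, 0.9)
```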