Discounted sum of rewards
tags:
- 🌱
- AI
- ComputerScience
- MDP
date: 19-Apr-2023
- Rewards received now are more valuable than rewards received in the future
- Without discounting, the accumulated reward may diverge over an infinite time horizon
- Discount future rewards with a factor $0<\gamma<1$:

$$G_t=R_t+\gamma R_{t+1}+\gamma^2R_{t+2}+\dots=\sum_{i=0}^{K}\gamma^i R_{t+i}$$

- Since every reward is at most $R_{max}$, the infinite sum is bounded by a geometric series, which keeps the accumulated reward finite and ensures that the algorithm converges (a numerical check follows this list):

$$\sum_{i=0}^{\infty}\gamma^i R_{t+i}\le\frac{R_{max}}{1-\gamma}$$
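A minimal sketch (Python is an assumed choice; the function name, reward sequence, and $\gamma=0.9$ are illustrative, not from the wiki) that computes a finite-horizon discounted return and checks it against the $\frac{R_{max}}{1-\gamma}$ bound:

```python
def discounted_return(rewards, gamma):
    """G_t = sum_i gamma^i * R_{t+i} over a finite horizon."""
    return sum(gamma**i * r for i, r in enumerate(rewards))

rewards = [1.0, 1.0, 1.0, 1.0, 1.0]  # hypothetical R_t, R_{t+1}, ...
gamma = 0.9

g = discounted_return(rewards, gamma)
bound = max(rewards) / (1 - gamma)  # R_max / (1 - gamma)
print(g, bound)  # 4.0951 <= 10.0: the return stays under the bound
```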
Relation to dynamic programming
Factoring $\gamma$ out of every term after the first shows that the return satisfies a recursive relation, which dynamic programming exploits:

$$G_t=R_t+\gamma R_{t+1}+\gamma^2R_{t+2}+\dots=R_t+\gamma\left(R_{t+1}+\gamma R_{t+2}+\dots\right)=R_t+\gamma G_{t+1}$$
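The recursion lets every return in an episode be computed in a single backward pass. A minimal sketch under the same illustrative setup as above (the function name is assumed):

```python
def returns_from_recursion(rewards, gamma):
    """Compute G_t for all t via G_t = R_t + gamma * G_{t+1}."""
    returns = [0.0] * len(rewards)
    g = 0.0  # G_{T+1} = 0 beyond the final step
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g  # one application of the recursion
        returns[t] = g
    return returns

print(returns_from_recursion([1.0, 1.0, 1.0, 1.0, 1.0], 0.9))
# returns[0] = 4.0951, matching discounted_return(rewards, 0.9)
```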