Markov Decision Process - HiIAmTzeKean/SC3000-Artificial-Intelligence GitHub Wiki
tags:
- 🌱
- AI
- ComputerScience
- MDP
date: 19--Apr--2023
- Sequential decision problem
- Fully observable
- Stochastic environment
- Markovian transition model
- Additive rewards
- Reward function of R(s)
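- At each time step $t$, the return is the sum of the rewards from that step onward
$G_t=R_t+R_{t+1}+R_{t+2}+\dots$
- Runs into a problem with an infinite time horizon: the undiscounted sum can grow without bound
- Discounted sum of rewards, with discount factor $0 \le \gamma < 1$
$G_t=R_t+\gamma R_{t+1}+\gamma^2 R_{t+2}+\dots$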
- Reward function of R(s)
- Searching for a policy that maximises the expected sum of rewards
$\pi(s)$, the action chosen in state $s$
- Value function $V^\pi(s)$: the expected return when starting in state $s$ and following $\pi$
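
The pieces listed above (state-only reward $R(s)$, a Markovian transition model, discounting, a policy $\pi(s)$ and a value function) can be tied together in a small sketch. Everything concrete here is an assumption made for illustration: the two-state MDP, its probabilities, the discount factor `GAMMA = 0.9`, and the names `discounted_return`, `q_value` and `value_iteration`. The notes do not say how the policy is found; value iteration is used here as one standard option, not necessarily the method the course intends.

```python
# Hypothetical two-state MDP, invented for this example only.
STATES = ["cool", "warm"]
ACTIONS = ["slow", "fast"]
GAMMA = 0.9  # assumed discount factor, must satisfy 0 <= GAMMA < 1

# Reward depends on the current state only: R(s).
REWARD = {"cool": 1.0, "warm": 0.0}

# Markovian transition model: P[s][a] maps next state -> probability.
P = {
    "cool": {"slow": {"cool": 1.0}, "fast": {"cool": 0.5, "warm": 0.5}},
    "warm": {"slow": {"cool": 0.5, "warm": 0.5}, "fast": {"warm": 1.0}},
}


def discounted_return(rewards, gamma=GAMMA):
    """G_t = R_t + gamma*R_{t+1} + gamma^2*R_{t+2} + ... for a finite reward list."""
    g = 0.0
    # Work backwards so each step applies one more factor of gamma.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def q_value(s, a, values, gamma=GAMMA):
    """R(s) plus the discounted expected value of the next state under action a."""
    return REWARD[s] + gamma * sum(p * values[s2] for s2, p in P[s][a].items())


def value_iteration(tol=1e-9, max_iters=10_000):
    """Repeat the Bellman optimality update until the values stop changing."""
    values = {s: 0.0 for s in STATES}
    for _ in range(max_iters):
        new_values = {
            s: max(q_value(s, a, values) for a in ACTIONS) for s in STATES
        }
        delta = max(abs(new_values[s] - values[s]) for s in STATES)
        values = new_values
        if delta < tol:
            break
    # Greedy policy pi(s): pick the action with the highest one-step lookahead.
    policy = {s: max(ACTIONS, key=lambda a: q_value(s, a, values)) for s in STATES}
    return values, policy


if __name__ == "__main__":
    print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))
    values, policy = value_iteration()
    print(values, policy)
```

Because `GAMMA` is below 1, the discounted return stays finite even over an infinite horizon, which is what lets the value update converge. In this toy MDP the greedy policy chooses `slow` in both states, since staying in `cool` collects the only positive reward.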
Links: