Markov Decision Process - HiIAmTzeKean/SC3000-Artificial-Intelligence GitHub Wiki

tags:

🌱
AI
ComputerScience
MDP

date: 19-Apr-2023

Markov Decision Process

Properties

Sequential decision problem
Fully observable
Stochastic environment
Markovian transition model
Additive rewards
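- Reward function $R(s)$
  - At each time step $t$, the return $G_t$ is the sum of the rewards received from that step onwards
  - $G_t=R_t+R_{t+1}+R_{t+2}+\dots$
  - Over an infinite time horizon this sum can grow without bound
- Discounted sum of rewards
  - $G_t=R_t+\gamma R_{t+1}+\gamma^2 R_{t+2}+\dots$ with discount factor $0\le\gamma<1$, which keeps the sum finite when rewards are bounded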
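A minimal sketch of the discounted sum of rewards, assuming a finite list of rewards and a discount factor `gamma` (written $\gamma$ in the usual notation). The function name `discounted_return` and the sample rewards are illustrative and not part of the original notes.

```python
def discounted_return(rewards, gamma):
    """Compute rewards[0] + gamma*rewards[1] + gamma**2*rewards[2] + ...

    The sum is accumulated from the last reward backwards, using
    G_t = R_t + gamma * G_{t+1}.
    """
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical reward sequence, for illustration only.
rewards = [1.0, 1.0, 1.0]
print(discounted_return(rewards, gamma=1.0))  # undiscounted (additive) return
print(discounted_return(rewards, gamma=0.5))  # discounted return
```

With `gamma=1.0` this reduces to the plain additive return. With a constant reward of 1 and `gamma < 1`, the sum stays below `1 / (1 - gamma)` no matter how long the sequence is, which is why discounting avoids the infinite-horizon problem.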

Solution

Search for a policy that maximises the expected return
- $\pi(s)$: the action to take in state $s$
Value function
- Expected return from state $s$ when following policy $\pi$
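One common way to obtain a value function and a policy $\pi(s)$ from it is value iteration. The notes do not name a specific algorithm, so the sketch below is an assumption for illustration. The two-state MDP (states `A` and `B`, actions `stay` and `go`, the transition table `P`, rewards `R`, and the discount `gamma`) is entirely hypothetical. It uses a state reward $R(s)$ and repeats the update $V(s) \leftarrow R(s) + \gamma \max_a \sum_{s'} P(s' \mid s, a)\, V(s')$ until the values stop changing.

```python
def expected_value(P, V, s, a):
    """Expected value of the next state when taking action a in state s."""
    return sum(p * V[s2] for p, s2 in P[s][a])


def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-8):
    """Repeat the Bellman update until the largest change is below tol."""
    V = {s: 0.0 for s in states}
    while True:
        new_V = {
            s: R[s] + gamma * max(expected_value(P, V, s, a) for a in actions)
            for s in states
        }
        delta = max(abs(new_V[s] - V[s]) for s in states)
        V = new_V
        if delta < tol:
            return V


def greedy_policy(states, actions, P, V):
    """pi(s): the action with the highest expected next-state value."""
    return {
        s: max(actions, key=lambda a: expected_value(P, V, s, a))
        for s in states
    }


# Hypothetical two-state MDP, for illustration only.
states = ["A", "B"]
actions = ["stay", "go"]
R = {"A": 0.0, "B": 1.0}
# P[s][a] is a list of (probability, next_state) pairs.
P = {
    "A": {"stay": [(1.0, "A")], "go": [(0.8, "B"), (0.2, "A")]},
    "B": {"stay": [(1.0, "B")], "go": [(1.0, "A")]},
}

V = value_iteration(states, actions, P, R, gamma=0.9)
policy = greedy_policy(states, actions, P, V)
print(V)
print(policy)
```

In this toy problem the greedy policy moves from `A` towards the rewarding state `B` and then stays there. The stochastic `go` action from `A` (it succeeds with probability 0.8) is what makes the value of `A` lower than the value of `B`.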

Links:
