reinforcement learning - feliyur/exercises GitHub Wiki
RL equations / definitions source: CMU 106017-s17 course slides.
BSP source: A unified framework for data association aware robust belief space planning and perception, Pathak et al. 2018, IJRR
Reinforcement Learning
Markov Decision Process (MDP)
Source: Wikipedia.
4-tuple: $$\left( \mathcal{S}, \mathcal{A}, \mathbb{P}_a(s^\prime \mid s, a), r(s) \right)$$
Where $\mathcal{S}$ is the state space and $\mathcal{A}$ the action space.
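As a concrete (made-up) example, the 4-tuple can be written down directly in Python; the two-state MDP and all names below are illustrative only:

```python
# Illustrative encoding of the MDP 4-tuple (S, A, P, r) as plain Python
# structures; the two-state MDP below is a made-up example.
S = ["s0", "s1"]                      # state space
A = ["stay", "move"]                  # action space
P = {                                 # P[(s, a)][s'] = P(s' | s, a)
    ("s0", "stay"): {"s0": 1.0},
    ("s0", "move"): {"s0": 0.1, "s1": 0.9},
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "move"): {"s0": 0.9, "s1": 0.1},
}
r = {"s0": 0.0, "s1": 1.0}            # state reward r(s)

# Sanity check: each conditional distribution sums to one.
for (s, a), dist in P.items():
    assert abs(sum(dist.values()) - 1.0) < 1e-12
```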
Partially-Observable Markov Decision Process (POMDP)
Source: Wikipedia and this site (accessed January 3rd, 2021).
7-tuple:
$$\left( \mathcal{S},\mathcal{A},\mathbb{P}_a(s^\prime \mid s, a),r(s),\Omega,\mathbb{P}(z\in \Omega \mid s\in \mathcal{S}),\gamma \right)$$
Where $\mathcal{S}$ is the state space, $\mathcal{A}$ the action space and $\Omega$ the observation space.
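Given the transition and observation models, a belief over $\mathcal{S}$ can be maintained with the standard discrete Bayes filter, $b^\prime(s^\prime) \propto \mathbb{P}(z\mid s^\prime)\sum_s \mathbb{P}(s^\prime\mid s,a)\, b(s)$. A minimal sketch; the two-state model and all numbers are invented for illustration:

```python
# Discrete Bayes-filter belief update for a POMDP; the two-state
# transition/observation tables below are made up for illustration.
def belief_update(b, a, z, P, O, states):
    """b'(s') is proportional to O(z | s') * sum_s P(s' | s, a) * b(s)."""
    predicted = {s2: sum(P[(s, a)].get(s2, 0.0) * b[s] for s in states)
                 for s2 in states}
    unnorm = {s2: O[(z, s2)] * predicted[s2] for s2 in states}
    eta = sum(unnorm.values())                      # normalizer
    return {s2: v / eta for s2, v in unnorm.items()}

states = ["s0", "s1"]
P = {("s0", "go"): {"s0": 0.2, "s1": 0.8},
     ("s1", "go"): {"s0": 0.8, "s1": 0.2}}
O = {("bright", "s0"): 0.9, ("bright", "s1"): 0.3}  # P(z | s)

b = belief_update({"s0": 0.5, "s1": 0.5}, "go", "bright", P, O, states)
# The posterior is still a probability distribution over states.
assert abs(sum(b.values()) - 1.0) < 1e-12
```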
Objective:
$$ \max_\pi~ \mathbb{E} \left[ \sum_{t=0}^\infty \gamma^t r(s_t) \right] $$ where $0<\gamma<1$ is the discount rate.
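The discounted sum inside the expectation is easy to sanity-check numerically; the helper name and reward values below are made up for illustration:

```python
# Discounted return sum_t gamma^t * r_t for a finite reward sequence;
# the rewards passed in below are arbitrary example values.
def discounted_return(rewards, gamma=0.9):
    return sum(gamma**t * r for t, r in enumerate(rewards))

G = discounted_return([1.0, 1.0, 1.0], gamma=0.5)  # 1 + 0.5 + 0.25 = 1.75
```

For a constant reward $r$ over an infinite horizon the geometric series gives $r/(1-\gamma)$, which is why $\gamma<1$ keeps the objective finite.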
State-Value Function
$$ V^{\pi}(s) \doteq \mathbb{E}_{\pi}\left[ \sum_{k=0}^\infty \gamma^k r_{t+k+1} \mid s_t = s \right] = \sum_{a\in \mathcal{A}}\pi (a\mid s) Q^\pi (s, a) $$
Bellman Equation:
$$ V^{\pi}(s) = \sum_{a\in \mathcal{A}} \pi (a\mid s) \sum_{s^\prime \in \mathcal{S}} P(s^\prime\mid s, a) \left[R(s, a, s^\prime) + \gamma V^{\pi}(s^\prime) \right] = \mathbb{E}_\pi \left[R(s, a, s^\prime) + \gamma V^{\pi}(s^\prime) \right] $$
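The Bellman equation is a fixed-point condition, so $V^\pi$ can be computed by applying it repeatedly until it stops changing (iterative policy evaluation). A sketch on a made-up two-state MDP with a uniform random policy, using the simplification $R(s,a,s^\prime)=r(s^\prime)$:

```python
# Iterative policy evaluation: repeatedly apply the Bellman equation
# until V converges. The MDP and policy are invented for illustration,
# with R(s, a, s') simplified to a reward r(s') on arrival.
S = ["s0", "s1"]
A = ["stay", "move"]
P = {("s0", "stay"): {"s0": 1.0}, ("s0", "move"): {"s1": 1.0},
     ("s1", "stay"): {"s1": 1.0}, ("s1", "move"): {"s0": 1.0}}
R = {"s0": 0.0, "s1": 1.0}
pi = {s: {a: 0.5 for a in A} for s in S}   # uniform random policy
gamma = 0.9

V = {s: 0.0 for s in S}
for _ in range(1000):
    V = {s: sum(pi[s][a] * sum(p * (R[s2] + gamma * V[s2])
                               for s2, p in P[(s, a)].items())
                for a in A)
         for s in S}
# By symmetry both states converge to the same value, V(s) = 5.0.
```

Each iteration is a contraction with factor $\gamma$, so the error shrinks geometrically and 1000 sweeps are far more than enough here.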
Action-Value Function
$$ Q^{\pi} (s, a)\doteq \mathbb{E}_\pi \left[ \sum_{k=0}^\infty \gamma^k r_{t+k+1} \mid s_t =s, a_t =a \right] = \sum_{s^\prime \in \mathcal{S}} P(s^\prime\mid s, a) \left[ R(s, a, s^\prime) + \gamma V^\pi (s^\prime) \right] $$
Bellman Equation:
$$ Q^{\pi}(s, a) = \sum_{s^\prime \in \mathcal{S}} P(s^\prime\mid s, a) \left[R(s, a, s^\prime) + \gamma \sum_{a^\prime \in \mathcal{A}} \pi(a^\prime \mid s^\prime) Q^\pi(s^\prime, a^\prime) \right] = \mathbb{E}_{s^\prime} \left[R(s, a, s^\prime) + \gamma \mathbb{E}_{a^\prime \sim \pi(s^\prime) }\left[ Q^\pi(s^\prime, a^\prime)\right] \right] $$
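Using the expansion $Q^\pi(s,a)=\sum_{s^\prime}P(s^\prime\mid s,a)\left[R(s,a,s^\prime)+\gamma V^\pi(s^\prime)\right]$ from the definition above, $Q^\pi$ can be recovered from $V^\pi$ with one sweep. The two-state MDP below is made up; under a uniform policy with $\gamma=0.9$ its value is $V^\pi(s)=5$ in both states, which the code takes as given:

```python
# Recover Q^pi from V^pi via Q(s,a) = sum_s' P(s'|s,a)[R(s') + gamma V(s')].
# Invented two-state MDP; V holds its known uniform-policy fixed point.
P = {("s0", "stay"): {"s0": 1.0}, ("s0", "move"): {"s1": 1.0},
     ("s1", "stay"): {"s1": 1.0}, ("s1", "move"): {"s0": 1.0}}
R = {"s0": 0.0, "s1": 1.0}           # reward r(s') on arrival
gamma = 0.9
V = {"s0": 5.0, "s1": 5.0}

Q = {(s, a): sum(p * (R[s2] + gamma * V[s2]) for s2, p in P[(s, a)].items())
     for (s, a) in P}
# e.g. Q(s0, move) = 1.0 + 0.9 * 5.0 = 5.5, Q(s0, stay) = 0.0 + 4.5 = 4.5
```

Averaging $Q$ over the uniform policy, $0.5\cdot 4.5 + 0.5\cdot 5.5 = 5.0$, recovers $V^\pi(s_0)$, matching the identity $V^\pi(s)=\sum_a \pi(a\mid s) Q^\pi(s,a)$.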
Belief Space Planning
Objective:
$$ J(a_{k:k+L}) = \mathbb{E}_{z_{k+1:k+L}}\left[ \sum_{l=1}^L c_l(b[k+l]) \right] $$
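One crude way to evaluate this objective is Monte Carlo: sample future observation sequences, propagate the belief through each, and average the summed costs. The sketch below uses a 1D Gaussian belief with Kalman-filter updates and the belief variance as the per-step cost; all models and numbers are placeholder assumptions, not the framework from Pathak et al.:

```python
import math
import random

# Monte-Carlo estimate of J(a_{k:k+L}) = E_z[ sum_l c_l(b[k+l]) ].
# The 1D Gaussian belief, linear motion/observation models, and the
# variance cost are placeholder choices for illustration only.
def rollout_cost(actions, mu=0.0, var=1.0, q=0.1, r_noise=0.05):
    cost = 0.0
    for a in actions:
        mu, var = mu + a, var + q                        # predict
        z = random.gauss(mu, math.sqrt(var + r_noise))   # simulate z
        k = var / (var + r_noise)                        # Kalman gain
        mu, var = mu + k * (z - mu), (1 - k) * var       # update
        cost += var                                      # c_l(b) = variance
    return cost

def J_estimate(actions, n_samples=200):
    return sum(rollout_cost(actions) for _ in range(n_samples)) / n_samples

random.seed(0)
J = J_estimate([1.0, 1.0, 1.0])
```

With a linear-Gaussian model the posterior variance does not actually depend on the sampled $z$, so here every rollout has the same cost; sampling only matters once the cost depends on the belief mean or the model is nonlinear.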
Simulators
MuJoCo on Ubuntu 18.04
```shell
# System libraries mujoco-py needs for rendering, plus patchelf
sudo apt install libosmesa6-dev libgl1-mesa-glx libglfw3
sudo apt install patchelf

# Check which libGL variants are installed, then give mujoco-py the
# unversioned libGL.so name it looks for
ls -al /usr/lib/x86_64-linux-gnu/libG*
sudo ln -s /usr/lib/x86_64-linux-gnu/libGL.so.1 /usr/lib/x86_64-linux-gnu/libGL.so
```
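mujoco-py also needs to find the MuJoCo binaries and license key at runtime. The paths below assume the common `~/.mujoco/mujoco200` layout; that layout and the exact variable values are assumptions to adjust for your install:

```shell
# Assumed layout: MuJoCo 2.0 unpacked to ~/.mujoco/mujoco200 and the
# license key at ~/.mujoco/mjkey.txt -- adjust if yours differs.
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/.mujoco/mujoco200/bin
export MUJOCO_PY_MUJOCO_PATH=$HOME/.mujoco/mujoco200
export MUJOCO_PY_MJKEY_PATH=$HOME/.mujoco/mjkey.txt
```

Adding the `export` lines to `~/.bashrc` makes them persist across shells.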