reinforcement learning - feliyur/exercises GitHub Wiki

RL equations / definitions source: CMU 106017-s17 course slides.

BSP (belief space planning) source: A unified framework for data association aware robust belief space planning and perception, Pathak et al. 2018, IJRR.

Reinforcement Learning

Markov Decision Process (MDP)

Source: Wikipedia.

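4-tuple: $$\left( \mathcal{S}, \mathcal{A}, \mathbb{P}(s^\prime \mid s, a), r(s) \right)$$

Where $\mathcal{S}$ is the state space, $\mathcal{A}$ the action space, $\mathbb{P}(s^\prime \mid s, a)$ the probability of moving to state $s^\prime$ when taking action $a$ in state $s$, and $r(s)$ the reward of state $s$. The Bellman equations below use the more general reward $R(s, a, s^\prime)$, of which a state-only reward is a special case.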
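A minimal sketch of the 4-tuple as a finite (tabular) MDP, assuming states and actions are indexed by integers and the transition model is stored as nested lists `P[s][a][s']`. `TabularMDP` and `step` are hypothetical names for illustration, not part of any library.

```python
import random
from dataclasses import dataclass


@dataclass
class TabularMDP:
    """Finite MDP (S, A, P, r): states 0..|S|-1, actions 0..|A|-1 (hypothetical helper)."""
    P: list  # P[s][a][s2] = probability of reaching s2 after taking action a in state s
    r: list  # r[s] = reward of state s (state-only reward, as in the 4-tuple)

    def __post_init__(self):
        n_states = len(self.r)
        if len(self.P) != n_states:
            raise ValueError("P must have one entry per state")
        for s, rows in enumerate(self.P):
            for a, row in enumerate(rows):
                # Each P[s][a] must be a probability distribution over successor states.
                if len(row) != n_states or min(row) < 0 or abs(sum(row) - 1.0) > 1e-9:
                    raise ValueError(f"P[{s}][{a}] is not a distribution over {n_states} states")

    def step(self, s: int, a: int, rng: random.Random) -> int:
        """Sample a successor state s' ~ P(. | s, a)."""
        return rng.choices(range(len(self.r)), weights=self.P[s][a])[0]


# Two states, two actions: action 0 stays put, action 1 switches state.
mdp = TabularMDP(
    P=[[[1.0, 0.0], [0.0, 1.0]],   # from state 0: stay -> 0, switch -> 1
       [[0.0, 1.0], [1.0, 0.0]]],  # from state 1: stay -> 1, switch -> 0
    r=[0.0, 1.0],
)
print(mdp.step(0, 1, random.Random(0)))  # → 1 (the switch is deterministic here)
```

The validation in `__post_init__` catches the most common modelling mistake, a row of `P` that does not sum to one.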

Partially-Observable Markov Decision Process (POMDP)

Source: Wikipedia and this site (accessed January 3rd, 2021).

7-tuple: $$\left( \mathcal{S}, \mathcal{A}, \mathbb{P}(s^\prime \mid s, a), r(s), \Omega, \mathbb{P}(z \in \Omega \mid s \in \mathcal{S}), \gamma \right)$$

Where $\mathcal{S}$ is the state space, $\mathcal{A}$ the action space, $\Omega$ the observation space, $\mathbb{P}(z \mid s)$ the probability of observing $z$ in state $s$, and $\gamma$ the discount rate.
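Since the state is not observed directly, a POMDP agent tracks a belief $b(s)$, a distribution over states. A minimal sketch of the standard Bayes-filter update $b^\prime(s^\prime) \propto \mathbb{P}(z \mid s^\prime) \sum_{s} \mathbb{P}(s^\prime \mid s, a)\, b(s)$, assuming finite spaces and an observation model that depends only on the state, as in the tuple above (a more general model would also condition on the action). `belief_update` is a hypothetical helper name.

```python
def belief_update(b, a, z, P, O):
    """Bayes filter step for a finite POMDP (hypothetical helper).

    b[s]        current belief (probability of being in state s)
    P[s][a][s2] transition probability P(s2 | s, a)
    O[s2][z]    observation probability P(z | s2)
    Returns the posterior belief over s2 after taking action a and observing z.
    """
    n = len(b)
    # Prediction: push the belief through the transition model.
    predicted = [sum(P[s][a][s2] * b[s] for s in range(n)) for s2 in range(n)]
    # Correction: weight each state by the likelihood of the observation z.
    unnormalized = [O[s2][z] * predicted[s2] for s2 in range(n)]
    eta = sum(unnormalized)
    if eta == 0.0:
        raise ValueError("observation z has zero probability under the current belief")
    return [p / eta for p in unnormalized]


# Two states that never change (one action, "stay"), and a sensor that
# reports the true state with probability 0.8.
P = [[[1.0, 0.0]], [[0.0, 1.0]]]
O = [[0.8, 0.2], [0.2, 0.8]]
b = belief_update([0.5, 0.5], a=0, z=1, P=P, O=O)
print(b)  # ≈ [0.2, 0.8]
```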

Objective:

$$ \max_\pi~ \mathbb{E} \left[ \sum_{t=0}^\infty \gamma^t r(s_t) \right] $$ where $0<\gamma<1$ is the discount rate.
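The quantity inside the expectation is the discounted return of a single trajectory. A small sketch that evaluates it for a finite reward sequence; an infinite sum has to be truncated in code, and with bounded rewards $|r| \le r_{\max}$ the neglected tail after $T$ steps is at most $\gamma^T r_{\max} / (1-\gamma)$. `discounted_return` is a hypothetical name.

```python
def discounted_return(rewards, gamma):
    """Return sum_t gamma^t * r_t for a finite reward sequence r_0, r_1, ... (hypothetical helper)."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must satisfy 0 < gamma < 1")
    g = 0.0
    # Accumulate from the last reward backwards: g <- r_t + gamma * g.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # → 1.75
```

The backward recursion avoids computing powers of $\gamma$ explicitly and is the same recursion the Bellman equations below express in expectation.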

State-Value Function

$$ V^{\pi}(s) \doteq \mathbb{E}_{\pi}\left[ \sum_{k=0}^\infty \gamma^k r_{t+k+1} \mid s_t = s \right] = \sum_{a\in \mathcal{A}}\pi (a\mid s) Q^\pi (s, a) $$

Bellman Equation:

$$ \begin{aligned} V^{\pi}(s) &= \sum_{a\in \mathcal{A}} \pi (a\mid s) \sum_{s^\prime \in \mathcal{S}} P(s^\prime\mid s, a) \left[R(s, a, s^\prime) + \gamma V^{\pi}(s^\prime) \right] \\ &= \mathbb{E}_\pi \left[R(s, a, s^\prime) + \gamma V^{\pi}(s^\prime) \right] \end{aligned} $$
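The Bellman equation is a fixed-point equation, so for a finite MDP $V^\pi$ can be computed by repeatedly applying its right-hand side (iterative policy evaluation); for $0<\gamma<1$ the update is a $\gamma$-contraction in the max norm, so the iteration converges. A minimal sketch, assuming tabular nested lists `P[s][a][s']`, `R[s][a][s']` and `pi[s][a]`; `evaluate_policy`, `tol` and `max_iters` are hypothetical names.

```python
def evaluate_policy(P, R, pi, gamma, tol=1e-10, max_iters=10_000):
    """Iterative policy evaluation (hypothetical helper).

    P[s][a][s2]  transition probability P(s2 | s, a)
    R[s][a][s2]  reward R(s, a, s2)
    pi[s][a]     probability of taking action a in state s
    """
    n_states, n_actions = len(P), len(P[0])
    V = [0.0] * n_states
    for _ in range(max_iters):
        # One synchronous sweep of the Bellman equation for V^pi.
        V_new = [
            sum(
                pi[s][a] * sum(
                    P[s][a][s2] * (R[s][a][s2] + gamma * V[s2])
                    for s2 in range(n_states)
                )
                for a in range(n_actions)
            )
            for s in range(n_states)
        ]
        delta = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if delta < tol:
            break
    return V


# Two states and a single action that always switches state;
# reward 1 whenever the next state is state 1.
P = [[[0.0, 1.0]], [[1.0, 0.0]]]
R = [[[0.0, 1.0]], [[0.0, 1.0]]]
pi = [[1.0], [1.0]]
V = evaluate_policy(P, R, pi, gamma=0.5)
print([round(v, 6) for v in V])  # → [1.333333, 0.666667]
```

Here the exact answer solves $V(0) = 1 + 0.5\,V(1)$ and $V(1) = 0.5\,V(0)$, i.e. $V = (4/3, 2/3)$. The sketch uses synchronous sweeps (each sweep reads only the previous iterate); updating `V` in place also converges.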

Action-Value Function

$$ Q^{\pi} (s, a)\doteq \mathbb{E}_\pi \left[ \sum_{k=0}^\infty \gamma^k r_{t+k+1} \mid s_t =s, a_t =a \right] = \sum_{s^\prime \in \mathcal{S}} P(s^\prime\mid s, a) \left[ R(s, a, s^\prime) + \gamma V^\pi (s^\prime) \right] $$

Bellman Equation:

$$ \begin{aligned} Q^{\pi}(s, a) &= \sum_{s^\prime \in \mathcal{S}} P(s^\prime\mid s, a) \left[R(s, a, s^\prime) + \gamma \sum_{a^\prime \in \mathcal{A}} \pi(a^\prime \mid s^\prime) Q^\pi(s^\prime, a^\prime) \right] \\ &= \mathbb{E}_{s^\prime \sim P(\cdot \mid s, a)} \left[R(s, a, s^\prime) + \gamma\, \mathbb{E}_{a^\prime \sim \pi(\cdot \mid s^\prime)}\left[ Q^\pi(s^\prime, a^\prime)\right] \right] \end{aligned} $$
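The same fixed-point idea applies to the Bellman equation for $Q^\pi$. The sketch below iterates it on a toy two-state MDP and then recovers $V^\pi(s) = \sum_a \pi(a\mid s) Q^\pi(s, a)$, the identity in the state-value definition. It assumes tabular nested lists `P[s][a][s']`, `R[s][a][s']` and `pi[s][a]`; `evaluate_q` is a hypothetical name.

```python
def evaluate_q(P, R, pi, gamma, tol=1e-10, max_iters=10_000):
    """Iterate the Bellman equation for Q^pi until it stops changing (hypothetical helper).

    P[s][a][s2]  transition probability P(s2 | s, a)
    R[s][a][s2]  reward R(s, a, s2)
    pi[s][a]     probability of taking action a in state s
    """
    n_states, n_actions = len(P), len(P[0])
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(max_iters):
        # E_{a' ~ pi(.|s')}[Q(s', a')] for every successor state s'.
        expected_next = [sum(pi[s2][a2] * Q[s2][a2] for a2 in range(n_actions))
                         for s2 in range(n_states)]
        Q_new = [[sum(P[s][a][s2] * (R[s][a][s2] + gamma * expected_next[s2])
                      for s2 in range(n_states))
                  for a in range(n_actions)]
                 for s in range(n_states)]
        delta = max(abs(Q_new[s][a] - Q[s][a])
                    for s in range(n_states) for a in range(n_actions))
        Q = Q_new
        if delta < tol:
            break
    return Q


# Two states, two actions: action 0 stays, action 1 switches.
# Reward 1 whenever the next state is state 1; the policy picks each action with probability 0.5.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
R = [[[0.0, 1.0], [0.0, 1.0]],
     [[0.0, 1.0], [0.0, 1.0]]]
pi = [[0.5, 0.5], [0.5, 0.5]]
Q = evaluate_q(P, R, pi, gamma=0.5)
# V^pi(s) = sum_a pi(a|s) Q^pi(s, a)
V = [sum(pi[s][a] * Q[s][a] for a in range(2)) for s in range(2)]
print([[round(q, 6) for q in row] for row in Q])  # → [[0.5, 1.5], [1.5, 0.5]]
print([round(v, 6) for v in V])                   # → [1.0, 1.0]
```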

Belief Space Planning

Objective:

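$$ J(a_{k:k+L}) = \mathbb{E}_{z_{k+1:k+L}}\left[ \sum_{l=1}^L c_l(b[k+l]) \right] $$

Where $b[k+l]$ is the future belief at time $k+l$, obtained from the current belief $b[k]$ given the actions and the future observations $z_{k+1:k+l}$, and $c_l$ is the cost at look-ahead step $l$. The planner selects the action sequence that minimizes $J$.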
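Outside of the simplest models, the expectation over future observations in $J$ has no closed form. A minimal Monte Carlo sketch for a finite POMDP, under several assumptions that are not from the source: the belief is a discrete distribution updated with a Bayes filter, one action is applied before each of the $L$ belief updates, future observations are simulated by sampling a state from $b[k]$ and propagating it, and every $c_l$ is the belief entropy (an uncertainty cost). `estimate_J`, `belief_update` and `entropy` are hypothetical names; this is not the method of Pathak et al.

```python
import math
import random


def belief_update(b, a, z, P, O):
    """Bayes filter: b'(s2) ∝ O[s2][z] * sum_s P[s][a][s2] * b[s]."""
    n = len(b)
    post = [O[s2][z] * sum(P[s][a][s2] * b[s] for s in range(n)) for s2 in range(n)]
    eta = sum(post)
    return [p / eta for p in post]


def entropy(b):
    """Shannon entropy of a discrete belief, used here as an example cost c_l."""
    return -sum(p * math.log(p) for p in b if p > 0.0)


def estimate_J(b_k, actions, P, O, cost=entropy, n_samples=2000, seed=0):
    """Monte Carlo estimate of J = E_z[sum_{l=1}^{L} c_l(b[k+l])] for a fixed action sequence."""
    rng = random.Random(seed)
    n_states, n_obs = len(b_k), len(O[0])
    total = 0.0
    for _ in range(n_samples):
        s = rng.choices(range(n_states), weights=b_k)[0]    # sample a "true" state from b[k]
        b = list(b_k)
        for a in actions:                                    # one belief update per step l = 1..L
            s = rng.choices(range(n_states), weights=P[s][a])[0]
            z = rng.choices(range(n_obs), weights=O[s])[0]   # simulated future observation
            b = belief_update(b, a, z, P, O)
            total += cost(b)
    return total / n_samples


P = [[[1.0, 0.0]], [[0.0, 1.0]]]   # two static states, one action
useless = [[0.5, 0.5], [0.5, 0.5]]  # observation carries no information
perfect = [[1.0, 0.0], [0.0, 1.0]]  # observation reveals the state
print(estimate_J([0.5, 0.5], [0, 0, 0], P, useless))  # ≈ 2.079 (= 3 ln 2)
print(estimate_J([0.5, 0.5], [0, 0, 0], P, perfect))  # → 0.0
```

The two sensors bracket the cost: an uninformative sensor leaves the belief at maximum entropy for all $L = 3$ steps, while a perfect one collapses it after the first observation.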

Simulators

MuJoCo on Ubuntu 18.04

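```bash
# Rendering libraries (OSMesa, Mesa GL, GLFW) and patchelf, used when building mujoco-py
sudo apt install libosmesa6-dev libgl1-mesa-glx libglfw3
sudo apt install patchelf
# Check which libGL files exist (often only the versioned libGL.so.1)
ls -al /usr/lib/x86_64-linux-gnu/libG*
# Create the unversioned libGL.so that the linker looks for when given -lGL
sudo ln -s /usr/lib/x86_64-linux-gnu/libGL.so.1 /usr/lib/x86_64-linux-gnu/libGL.so
```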
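To check that the symlink step worked, a small Python sketch that reports where `libGL.so` points, assuming the x86_64 Ubuntu library directory used above. `resolve_libgl` is a hypothetical helper; `ls -l` on the link gives the same information.

```python
import os


def resolve_libgl(libdir="/usr/lib/x86_64-linux-gnu"):
    """Return the real file behind libdir/libGL.so, or None if it is missing (hypothetical helper)."""
    link = os.path.join(libdir, "libGL.so")
    # os.path.isfile follows symlinks, so a dangling link is reported as missing.
    if not os.path.isfile(link):
        return None
    return os.path.realpath(link)


if __name__ == "__main__":
    target = resolve_libgl()
    print(f"libGL.so -> {target}" if target else "libGL.so is missing or a broken link")
```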