reinforcement learning - feliyur/exercises GitHub Wiki
RL equations / definitions source: CMU 106017-s17 course slides.
BSP source: A unified framework for data association aware robust belief space planning and perception, Pathak et al. 2018, IJRR
Source: Wikipedia.
4-tuple:

$$\left( \mathcal{S},\mathcal{A},\mathbb{P}_a(s^\prime \mid s, a),~r(s) \right)$$

Where $\mathcal{S}$ is the state space, $\mathcal{A}$ is the action space, $\mathbb{P}_a(s^\prime \mid s, a)$ is the transition model, and $r(s)$ is the reward function.
Source: Wikipedia and this site (accessed January 3rd, 2021).
7-tuple:

$$\left( \mathcal{S},\mathcal{A},\mathbb{P}_a(s^\prime \mid s, a),~r(s),~\Omega,~\mathbb{P}(z\in \Omega \mid s\in \mathcal{S}),~\gamma \right)$$

Where $\mathcal{S}$, $\mathcal{A}$, $\mathbb{P}_a$, and $r$ are as in the MDP above, $\Omega$ is the observation space, $\mathbb{P}(z \mid s)$ is the observation model, and $\gamma \in [0, 1)$ is the discount factor.
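To make the observation model concrete, below is a minimal Bayes-filter belief update for a 2-state POMDP. All transition and observation probabilities are made-up illustrative numbers, not from the cited sources:

```python
import numpy as np

# Hypothetical 2-state POMDP models (illustrative numbers only).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])          # P[s, s'] = P(s' | s, a) for a fixed action a
O = np.array([[0.8, 0.3],
              [0.2, 0.7]])          # O[z, s'] = P(z | s')

def belief_update(b, z, P, O):
    """Bayes filter: b'(s') ∝ P(z | s') * sum_s P(s' | s, a) b(s)."""
    predicted = P.T @ b              # prediction step through the transition model
    unnormalized = O[z] * predicted  # correction step with the observation likelihood
    return unnormalized / unnormalized.sum()

b = np.array([0.5, 0.5])             # uniform prior belief
b = belief_update(b, z=0, P=P, O=O)  # observing z=0 shifts belief toward state 0
print(b)
```

The belief, not the state, is what the POMDP agent conditions its policy on; each observation re-weights the predicted state distribution.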
$$
\max_\pi~ \mathbb{E} \left[ \sum_{t=0}^\infty \gamma^t r(s_t) \right]
$$
where $\gamma \in [0, 1)$ is the discount factor, $r(s_t)$ is the reward at time $t$, and the expectation is over trajectories induced by the policy $\pi$.
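As a sanity check on the objective, the discounted sum can be evaluated directly for a finite reward sequence (the rewards and $\gamma$ below are arbitrary illustrative values):

```python
def discounted_return(rewards, gamma=0.9):
    """Compute sum_t gamma^t * r_t for a finite reward sequence."""
    return sum(gamma**t * r for t, r in enumerate(rewards))

# With gamma = 0.5: 1 + 0.5 + 0.25 = 1.75
print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))
```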
$$ V^{\pi}(s) \doteq \mathbb{E}_{\pi}\left[ \sum_{k=0}^\infty \gamma^k r_{t+k+1} \mid s_t = s \right] = \sum_{a\in \mathcal{A}}\pi (a\mid s) Q^\pi (s, a) $$
$$ Q^{\pi} (s, a)\doteq \mathbb{E}_{\pi} \left[ \sum_{k=0}^\infty \gamma^k r_{t+k+1} \mid s_t =s, a_t =a \right] = \sum_{s^\prime \in \mathcal{S}} P(s^\prime\mid s, a) \left[ R(s, a, s^\prime) + \gamma V^\pi (s^\prime) \right] $$
$$ Q^{\pi}(s, a) = \sum_{s^\prime \in \mathcal{S}} P(s^\prime\mid s, a) \left[R(s, a, s^\prime) + \gamma \sum_{a^\prime \in \mathcal{A}} \pi(a^\prime \mid s^\prime) Q^\pi(s^\prime, a^\prime) \right] = \mathbb{E}_{s^\prime} \left[R(s, a, s^\prime) + \gamma \mathbb{E}_{a^\prime \sim \pi(s^\prime) }\left[ Q^\pi(s^\prime, a^\prime)\right] \right] $$
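The identities above can be checked numerically with exact policy evaluation on a small MDP. Every probability and reward below is an illustrative assumption, not taken from the course slides:

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP (illustrative numbers only).
P = np.array([[[0.7, 0.3],
               [0.4, 0.6]],
              [[0.1, 0.9],
               [0.8, 0.2]]])        # P[a, s, s'] = P(s' | s, a)
R = np.array([[[1.0, 0.0],
               [0.0, 2.0]],
              [[0.5, 0.5],
               [1.0, 0.0]]])        # R[a, s, s'] = R(s, a, s')
pi = np.array([[0.5, 0.5],
               [0.5, 0.5]])         # pi[s, a] = pi(a | s), uniform random policy
gamma = 0.9

# Exact policy evaluation: V^pi solves (I - gamma * P_pi) V = r_pi
P_pi = np.einsum('sa,ast->st', pi, P)        # state-to-state kernel under pi
r_pi = np.einsum('sa,ast,ast->s', pi, P, R)  # expected one-step reward under pi
V = np.linalg.solve(np.eye(2) - gamma * P_pi, r_pi)

# Q^pi from V^pi via the Bellman expectation identity above
Q = np.einsum('ast,ast->sa', P, R) + gamma * np.einsum('ast,t->sa', P, V)

# First identity: V^pi(s) = sum_a pi(a | s) Q^pi(s, a)
print(np.allclose(V, (pi * Q).sum(axis=1)))  # → True
```

Solving the linear system is exact for tabular MDPs; iterative policy evaluation would converge to the same fixed point.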
Objective:
$$ J(a_{k:k+L}) = \mathbb{E}_{z_{k+1:k+L}}\left[ \sum_{l=1}^L c_l(b[k+l]) \right] $$
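A rough Monte Carlo sketch of this objective for a fixed action sequence, using a discrete Bayes-filter belief and an entropy cost for $c_l$. This is a toy stand-in for intuition only; the models are made up and this is not the Gaussian formulation of Pathak et al.:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 2-state belief-space problem under a fixed action sequence.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])          # P(s' | s) under the chosen actions
O = np.array([[0.8, 0.3],
              [0.2, 0.7]])          # P(z | s')

def belief_update(b, z):
    bp = O[z] * (P.T @ b)
    return bp / bp.sum()

def cost(b):
    """Uncertainty cost c_l(b): Shannon entropy of the belief."""
    return -np.sum(b * np.log(b + 1e-12))

def J_estimate(b0, L=3, n_samples=1000):
    """Monte Carlo estimate of E_z[ sum_l c_l(b[k+l]) ]."""
    total = 0.0
    for _ in range(n_samples):
        b = b0.copy()
        for _ in range(L):
            pz = O @ (P.T @ b)               # predictive distribution P(z | b)
            z = rng.choice(2, p=pz / pz.sum())
            b = belief_update(b, z)
            total += cost(b)
    return total / n_samples

print(J_estimate(np.array([0.5, 0.5])))
```

The key point the objective captures: future observations $z_{k+1:k+L}$ are unknown at planning time, so the planner must average the belief-dependent cost over them.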
```shell
# Mesa/OpenGL and GLFW packages (commonly needed for rendering, e.g. mujoco-py)
sudo apt install libosmesa6-dev libgl1-mesa-glx libglfw3
sudo apt install patchelf

# If linking against libGL fails, check which libGL libraries exist and
# symlink the versioned library so the linker can find it:
ls -al /usr/lib/x86_64-linux-gnu/libG*
sudo ln -s /usr/lib/x86_64-linux-gnu/libGL.so.1 /usr/lib/x86_64-linux-gnu/libGL.so
```