
Pool of materials on RL

Please add RL materials here that you find interesting and intend to present during the reading circle.

Theory

Subdivision into pieces:

  1. Intro pp. 13-22 (the bandit chapter can be skipped because it is the only chapter with a different problem setting)
  2. Finite Markov Decision Processes (1) pp. 47-57
  3. Finite Markov Decision Processes (2) pp. 58-69
  4. Dynamic Programming pp. 73-89
  5. Monte Carlo Methods (1) pp. 91-103
  6. Monte Carlo Methods (2) pp. 103-116 (can skip 5.8 and 5.9)
  7. Temporal-Difference Learning (1) pp. 119-129
  8. Temporal-Difference Learning (2) pp. 129-138
  9. n-step Bootstrapping pp. 141-157 (can skip 7.4 and 7.6)
  10. Planning and Learning with Tabular Methods (1) pp. 159-174
  11. Planning and Learning with Tabular Methods (2) pp. 177-188
  12. On-policy Prediction with Approximation (1) pp. 197-209
  13. On-policy Prediction with Approximation (2) pp. 210-222
  14. On-policy Prediction with Approximation (3) pp. 222-236
  15. On-policy Control with Approximation pp. 243-256
  16. Eligibility Traces (1) pp. 288-301
  17. Eligibility Traces (2) pp. 303-318 (can skip 12.6 and 12.9)
  18. Policy Gradient Methods pp. 321-337

Practice