machine_learning - cccnqu/ai106b GitHub Wiki

Machine Learning

  1. Regression : output a scalar
  2. Classification : output a class
  3. Structured Learning : output a structured object such as a sequence, matrix, graph, or tree

Reinforcement Learning

Q-Learning & SARSA

Q-Learning : Off-Policy (always takes the shortest path, even if that path is very dangerous)

q[s][a] = (1-rate) * q[s][a] + rate * (r + decay * max(q[s1]))
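A minimal sketch of one tabular Q-Learning update in Python, assuming `q` is a list of lists indexed as `q[state][action]`. The function name `q_learning_update`, the default values for `rate` and `decay`, and the small example table are illustrative assumptions, not part of the original notes. The target takes the largest value in the next state `s1`, no matter which action the agent actually takes next, which is what makes the method off-policy.

```python
def q_learning_update(q, s, a, r, s1, rate=0.1, decay=0.9):
    """Apply one Q-Learning update to q in place and return the new q[s][a]."""
    # Off-policy target: best value reachable from the next state s1,
    # independent of the action the behaviour policy will really pick.
    target = r + decay * max(q[s1])
    q[s][a] = (1 - rate) * q[s][a] + rate * target
    return q[s][a]


# Hypothetical table with 3 states and 2 actions (values chosen for illustration).
q = [[0.0, 0.0], [1.0, 2.0], [0.0, 0.0]]

# Transition: in state 0 take action 1, receive reward 1.0, land in state 1.
new_value = q_learning_update(q, s=0, a=1, r=1.0, s1=1, rate=0.5, decay=0.9)
```

With these numbers the target is 1.0 + 0.9 * 2.0 = 2.8, so `q[0][1]` moves halfway from 0.0 to 2.8.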

SARSA : On-Policy (learns to avoid dangerous areas)

q[s][a] = (1-rate) * q[s][a] + rate * (r + decay * q[s1][a1])
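A minimal sketch of one tabular SARSA update in Python, under the same assumption that `q` is a list of lists indexed as `q[state][action]`. The function name `sarsa_update`, the default `rate` and `decay`, and the example table are illustrative assumptions. The only difference from Q-Learning is the target: it uses the value of the action `a1` that the agent really takes in `s1`, so risky exploratory moves lower the estimates of nearby states.

```python
def sarsa_update(q, s, a, r, s1, a1, rate=0.1, decay=0.9):
    """Apply one SARSA update to q in place and return the new q[s][a]."""
    # On-policy target: value of the action a1 actually chosen in s1
    # (for example by an epsilon-greedy policy), not the best one.
    target = r + decay * q[s1][a1]
    q[s][a] = (1 - rate) * q[s][a] + rate * target
    return q[s][a]


# Hypothetical table with 3 states and 2 actions (values chosen for illustration).
q = [[0.0, 0.0], [1.0, 2.0], [0.0, 0.0]]

# Transition: state 0, action 1, reward 1.0, next state 1, next action 0.
new_value = sarsa_update(q, s=0, a=1, r=1.0, s1=1, a1=0, rate=0.5, decay=0.9)
```

Here the target is 1.0 + 0.9 * 1.0 = 1.9 because the agent's next action is 0, even though action 1 has the higher value in state 1.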

DQN (Deep Q Network)

Uses a neural network to learn the values of q[s][a] instead of storing them in a table.
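As a rough illustration of the idea, the sketch below replaces the table with a tiny one-hidden-layer network written in NumPy and trains it with the Q-Learning target. Everything here is an assumption made for the example: the sizes `N_STATES`, `N_ACTIONS`, and `HIDDEN`, the one-hot state encoding, the helper names `q_values` and `dqn_update`, and the learning rate. A full DQN normally also uses an experience replay buffer and a separate target network, both omitted here to keep the sketch short.

```python
import numpy as np

rng = np.random.default_rng(0)
N_STATES, N_ACTIONS, HIDDEN = 4, 2, 8  # hypothetical problem sizes

# One hidden layer: one-hot state -> ReLU -> one Q-value per action.
# W1 starts positive so every ReLU unit is active at the beginning.
W1 = rng.uniform(0.0, 0.5, (N_STATES, HIDDEN))
W2 = rng.normal(0.0, 0.1, (HIDDEN, N_ACTIONS))


def q_values(s):
    """Return (hidden activations, Q-value estimates) for state index s."""
    h = np.maximum(0.0, W1[s])  # a one-hot input simply selects row s of W1
    return h, h @ W2


def dqn_update(s, a, r, s1, done, lr=0.1, decay=0.9):
    """One gradient step on 0.5 * (q[s][a] - target)^2 for a single transition."""
    h, q = q_values(s)
    _, q1 = q_values(s1)
    # Same target as tabular Q-Learning; treated as a constant (no gradient).
    target = r if done else r + decay * np.max(q1)
    err = q[a] - target
    grad_w2 = err * h                     # gradient w.r.t. W2[:, a]
    grad_w1 = err * W2[:, a] * (h > 0.0)  # gradient w.r.t. W1[s], through the ReLU
    W2[:, a] -= lr * grad_w2
    W1[s] -= lr * grad_w1
    return err


# Repeatedly replay one terminal transition: state 0, action 1, reward 1.0.
for _ in range(500):
    dqn_update(s=0, a=1, r=1.0, s1=3, done=True)

learned = q_values(0)[1][1]  # network's estimate of q[0][1]
```

After training on this single transition, `learned` should sit close to the reward of 1.0, which is the value a table would store for `q[0][1]`.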