# Topic: Neural Networks
- Mathematical foundations (important!)
- Theoretical foundations
- Automatic differentiation

In the latter half, "Becoming a Backprop Ninja", the names `da`, `db`, `dc`, `dx` stand for gradients, not differential symbols (this is easy to misread):
```js
da = a.grad
db = b.grad
....
dx = x.grad
```
To avoid this confusion, I write `g` instead, so that `gx` stands for the gradient of `f` with respect to `x`, that is:
```js
ga = a.grad
gb = b.grad
....
gx = x.grad
```
With this notation, the backward pass for `x = a * b` follows directly from the chain rule.
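A worked statement of that step (using only the definitions above, with `gx = ∂f/∂x`):

$$
g_a = \frac{\partial f}{\partial a} = \frac{\partial f}{\partial x}\cdot\frac{\partial x}{\partial a} = g_x \cdot b,
\qquad
g_b = \frac{\partial f}{\partial x}\cdot\frac{\partial x}{\partial b} = g_x \cdot a
$$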
This is why we get the following derivation:
```js
var x = a * b;
// and given gradient on x (gx), we saw that in backprop we would compute:
var ga = b * gx;
var gb = a * gx;
```
In fact, this derivation corresponds to the following program:
```js
var multiplyGate = function(){ };
multiplyGate.prototype = {
  forward: function(u0, u1) {
    // store pointers to input Units u0 and u1 and output unit utop
    this.u0 = u0; // u0 = a
    this.u1 = u1; // u1 = b
    this.utop = new Unit(u0.value * u1.value, 0.0);
    return this.utop;
  },
  backward: function() {
    // take the gradient in the output unit and chain it with the
    // local gradients, which we derived for the multiply gate before,
    // then write those gradients to those Units
    this.u0.grad += this.u1.value * this.utop.grad; // ga = b * gx
    this.u1.grad += this.u0.value * this.utop.grad; // gb = a * gx
  }
};
```
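The gate above relies on a `Unit` class that is not shown on this page; it is just a (value, gradient) pair. Below is a minimal sketch of it, plus a small usage check of `ga = b * gx` and `gb = a * gx`. The variable names and the input values `2.0` and `-3.0` are made up for illustration:

```js
// Minimal Unit: carries a value forward and accumulates a gradient backward
// (matches how multiplyGate uses it: .value, .grad, constructor(value, grad)).
var Unit = function(value, grad) {
  this.value = value; // computed in the forward pass
  this.grad = grad;   // gradient of f w.r.t. this unit, filled in the backward pass
};

// Hypothetical usage: forward once, seed gx = 1, then run backward.
var a = new Unit(2.0, 0.0);
var b = new Unit(-3.0, 0.0);
var gate = new multiplyGate();
var x = gate.forward(a, b);   // x.value === -6

x.grad = 1.0;                 // seed: if f is x itself, then gx = 1
gate.backward();
console.log(a.grad, b.grad);  // -3 2  =>  ga = b * gx,  gb = a * gx
```

Note the `+=` in `backward`: gradients accumulate, so a `Unit` that feeds several gates sums the contributions from each of them. That is also why every gradient starts at `0.0`.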