02 Matrix Representation of a Neural Net Layer - chanchishing/Introduction-to-Deep-Learning GitHub Wiki
Matrix Representation
For a Neural Net layer $l$ that takes its input from its previous layer $l-1$, we can represent the input, weight/bias parameters and output of the layer using matrix notation as follows:
$$ z^{(l)}=W^{(l)}a^{(l-1)}+b^{(l)} $$
$$ a^{(l)}=\sigma(z^{(l)}) $$
where
- $a^{(l)}$ is the activation output of layer $l$
- $a^{(l-1)}$ is the activation input from the previous layer $l-1$
- $\sigma()$ is the activation function
- $W^{(l)}$ is the weight parameter of the layer
- $b^{(l)}$ is the bias parameter of the layer
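The two equations above can be sketched directly in NumPy. This is a minimal illustration, not a library implementation; the layer sizes and the choice of the logistic sigmoid for $\sigma()$ are assumptions for the example.

```python
import numpy as np

def sigmoid(z):
    # Elementwise logistic activation: sigma(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

def layer_forward(a_prev, W, b):
    # z^{(l)} = W^{(l)} a^{(l-1)} + b^{(l)}
    z = W @ a_prev + b
    # a^{(l)} = sigma(z^{(l)})
    return sigmoid(z)

# Hypothetical layer: 3 input neurons, 2 output neurons
rng = np.random.default_rng(0)
W = rng.standard_normal((2, 3))       # W^{(l)}: n^{(l)} x n^{(l-1)}
b = rng.standard_normal((2, 1))       # b^{(l)}: n^{(l)} x 1
a_prev = rng.standard_normal((3, 1))  # a^{(l-1)}: n^{(l-1)} x 1

a = layer_forward(a_prev, W, b)
print(a.shape)  # (2, 1)
```

Note that `W @ a_prev` is an ordinary matrix-vector product, so the output column vector automatically has $n^{(l)}$ rows.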
For a layer with $n^{(l-1)}$ input neurons and $n^{(l)}$ output neurons, the dimensions of the matrices and vectors are as follows:
| Matrix/Vector | Dimension (single sample) | Dimension (batch of $B$ samples, e.g. GPU processing) |
|---|---|---|
| $z^{(l)}$ | $n^{(l)} \times 1$ | $n^{(l)} \times B$ |
| $b^{(l)}$ | $n^{(l)} \times 1$ | $n^{(l)} \times 1$ |
| $W^{(l)}$ | $n^{(l)} \times n^{(l-1)}$ | $n^{(l)} \times n^{(l-1)}$ |
| $a^{(l-1)}$ | $n^{(l-1)} \times 1$ | $n^{(l-1)} \times B$ |
| $a^{(l)}$ | $n^{(l)} \times 1$ | $n^{(l)} \times B$ |
Note: $b^{(l)}$ is broadcast across all $B$ samples during batch computation.
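The batch case, including the broadcast of $b^{(l)}$, can be checked with a short NumPy sketch; the sizes `n_in`, `n_out`, and `B` below are assumptions for illustration.

```python
import numpy as np

n_in, n_out, B = 3, 2, 4  # hypothetical n^{(l-1)}, n^{(l)}, and batch size
rng = np.random.default_rng(1)

W = rng.standard_normal((n_out, n_in))   # W^{(l)}: n^{(l)} x n^{(l-1)}
b = rng.standard_normal((n_out, 1))      # b^{(l)}: n^{(l)} x 1
A_prev = rng.standard_normal((n_in, B))  # a^{(l-1)} batch: one sample per column

# b has shape (n_out, 1), so NumPy broadcasts it across all B columns of W @ A_prev
Z = W @ A_prev + b
print(Z.shape)  # (2, 4), i.e. n^{(l)} x B
```

Each column of `Z` equals the single-sample result `W @ A_prev[:, [j]] + b`, which is exactly what the broadcasting note states.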
In general, for a Neural Net with $L$ layers, the total number of parameters is the sum of the number of entries in the weight matrix and bias vector over all layers:
$$
\begin{aligned}
\text{Total Number of Parameters} &= \sum_{l=1}^{L} \left( n^{(l)} \times n^{(l-1)} + n^{(l)} \times 1 \right) \\
&= \sum_{l=1}^{L} n^{(l)} \times \left( n^{(l-1)} + 1 \right)
\end{aligned}
$$
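The summation translates into a one-line helper. The function name and the 784-128-10 example network are assumptions chosen for illustration (the sizes resemble a small MNIST classifier).

```python
def count_params(layer_sizes):
    """Total parameters of a fully connected net.

    layer_sizes = [n^(0), n^(1), ..., n^(L)]; each layer l contributes
    n^(l) * (n^(l-1) + 1) parameters (weights plus biases).
    """
    return sum(n * (m + 1) for m, n in zip(layer_sizes[:-1], layer_sizes[1:]))

# Hypothetical example: 784 inputs -> 128 hidden -> 10 outputs
# (784*128 + 128) + (128*10 + 10) = 100480 + 1290
print(count_params([784, 128, 10]))  # 101770
```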