02 Matrix Representation of a Neural Net Layer - chanchishing/Introduction-to-Deep-Learning GitHub Wiki
Matrix Representation
For a Neural Net layer $l$ that takes its input from its previous layer $l-1$, we can represent the input, weight/bias parameters and output of the layer using matrix notation as follows:
$$ z^{(l)}=W^{(l)}a^{(l-1)}+b^{(l)} $$
$$ a^{(l)}=\sigma(z^{(l)}) $$
where
- $a^{(l)}$ is the activation output of layer $l$
- $a^{(l-1)}$ is the activation input from the previous layer $l-1$
- $\sigma()$ is the activation function
- $W^{(l)}$ is the weight parameter of the layer
- $b^{(l)}$ is the bias parameter of the layer
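The two equations above can be sketched directly in NumPy. This is a minimal illustration, not a library implementation; the layer sizes and the choice of the logistic sigmoid for $\sigma()$ are assumptions for the example.

```python
import numpy as np

def sigmoid(z):
    # Elementwise logistic activation: sigma(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

def layer_forward(a_prev, W, b):
    # z^{(l)} = W^{(l)} a^{(l-1)} + b^{(l)}
    z = W @ a_prev + b
    # a^{(l)} = sigma(z^{(l)})
    return sigmoid(z)

# Hypothetical layer: 3 input neurons, 2 output neurons
rng = np.random.default_rng(0)
W = rng.standard_normal((2, 3))       # W^{(l)}: n^{(l)} x n^{(l-1)}
b = rng.standard_normal((2, 1))       # b^{(l)}: n^{(l)} x 1
a_prev = rng.standard_normal((3, 1))  # a^{(l-1)}: n^{(l-1)} x 1

a = layer_forward(a_prev, W, b)
print(a.shape)  # (2, 1)
```

Note that `W @ a_prev` is an ordinary matrix-vector product, so the output column vector automatically has $n^{(l)}$ rows.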
For a layer with $n^{(l-1)}$ input neurons and $n^{(l)}$ output neurons, the dimensions of the matrices and vectors are as follows:
| Matrix/Vector | Dimension (single sample) | Dimension (batch of $B$ samples, e.g. GPU processing) |
|---|---|---|
| $z^{(l)}$ | $n^{(l)} \times 1$ | $n^{(l)} \times B$ |
| $b^{(l)}$ | $n^{(l)} \times 1$ | $n^{(l)} \times 1$ |
| $W^{(l)}$ | $n^{(l)} \times n^{(l-1)}$ | $n^{(l)} \times n^{(l-1)}$ |
| $a^{(l-1)}$ | $n^{(l-1)} \times 1$ | $n^{(l-1)} \times B$ |
| $a^{(l)}$ | $n^{(l)} \times 1$ | $n^{(l)} \times B$ |
Note: $b^{(l)}$ is broadcast across all $B$ samples during batch computation.
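The batch case, including the broadcast of $b^{(l)}$, can be checked with a short NumPy sketch; the sizes `n_in`, `n_out`, and `B` below are assumptions for illustration.

```python
import numpy as np

n_in, n_out, B = 3, 2, 4  # hypothetical n^{(l-1)}, n^{(l)}, and batch size
rng = np.random.default_rng(1)

W = rng.standard_normal((n_out, n_in))   # W^{(l)}: n^{(l)} x n^{(l-1)}
b = rng.standard_normal((n_out, 1))      # b^{(l)}: n^{(l)} x 1
A_prev = rng.standard_normal((n_in, B))  # a^{(l-1)} batch: one sample per column

# b has shape (n_out, 1), so NumPy broadcasts it across all B columns of W @ A_prev
Z = W @ A_prev + b
print(Z.shape)  # (2, 4), i.e. n^{(l)} x B
```

Each column of `Z` equals the single-sample result `W @ A_prev[:, [j]] + b`, which is exactly what the broadcasting note states.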
In general, for a Neural Net with $L$ layers, the total number of parameters is the sum of the number of entries in the weight matrix and bias vector over all layers:
$$
\begin{aligned}
\text{Total Number of Parameters} &= \sum_{l=1}^{L} \left( n^{(l)} \times n^{(l-1)} + n^{(l)} \times 1 \right) \\
&= \sum_{l=1}^{L} n^{(l)} \times \left( n^{(l-1)} + 1 \right)
\end{aligned}
$$
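The summation translates into a one-line helper. The function name and the 784-128-10 example network are assumptions chosen for illustration (the sizes resemble a small MNIST classifier).

```python
def count_params(layer_sizes):
    """Total parameters of a fully connected net.

    layer_sizes = [n^(0), n^(1), ..., n^(L)]; each layer l contributes
    n^(l) * (n^(l-1) + 1) parameters (weights plus biases).
    """
    return sum(n * (m + 1) for m, n in zip(layer_sizes[:-1], layer_sizes[1:]))

# Hypothetical example: 784 inputs -> 128 hidden -> 10 outputs
# (784*128 + 128) + (128*10 + 10) = 100480 + 1290
print(count_params([784, 128, 10]))  # 101770
```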