07 Convolutional Neural Networks (CNNs) Idea - chanchishing/Introduction-to-Deep-Learning GitHub Wiki

Why CNNs Are Preferred Over Fully Connected Networks for Images

When working with image data, Fully Connected (FC) Networks are usually not a good choice. Convolutional Neural Networks (CNNs) are specifically designed to handle images more efficiently and effectively.

Problems with Fully Connected Networks on Images

  1. Too Many Parameters
     An image may contain thousands of pixels. For example, a 64 × 64 RGB image has:

$$ 64 \times 64 \times 3 = 12,288 $$

input features.

If this image is flattened and connected to a hidden layer of just 1000 neurons, the number of weights is:

$$ 12,288 \times 1000 = 12,288,000 $$

This creates a huge number of parameters, which leads to:

  • high memory usage
  • slow training
  • higher risk of overfitting
  2. Ignores Spatial Structure
     Images have an important 2D spatial structure:
  • nearby pixels are often strongly related
  • edges, corners, and textures are local patterns

If we flatten the image into a 1D vector, this spatial relationship is lost.
A Fully Connected Network treats every pixel as just another independent input feature.
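The parameter counts above can be verified with a few lines of arithmetic. This is a minimal sketch: the hidden-layer size (1000) matches the example, while the convolutional side assumes a hypothetical layer of 32 filters of size 3 × 3 for comparison.

```python
# Input size from the 64 x 64 RGB example above.
height, width, channels = 64, 64, 3
hidden_units = 1000

# Fully connected: every input pixel connects to every hidden neuron.
fc_weights = height * width * channels * hidden_units
print(fc_weights)  # 12,288,000

# Convolutional (hypothetical): 32 filters, each 3 x 3 per input channel.
kernel_size, num_filters = 3, 32
conv_weights = kernel_size * kernel_size * channels * num_filters
print(conv_weights)  # 864
```

The convolutional layer here uses over four orders of magnitude fewer weights, because each filter's 27 weights are shared across every position in the image.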

Key Insights of CNNs

CNNs solve these problems using two important ideas:

  1. Local Connectivity
     Instead of connecting each neuron to the entire image, a CNN neuron only looks at a small local region of the input, called a receptive field.

For example:

  • a filter may only look at a 3 × 3 patch
  • this allows the network to detect local patterns such as:
    • edges
    • corners
    • textures

This makes sense because useful visual features are often local.
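A single CNN neuron is just a dot product between a filter and one local patch. The sketch below uses a hypothetical 3 × 3 grayscale patch containing a vertical edge, and a simple hand-crafted vertical-edge filter (in practice, filter weights are learned):

```python
import numpy as np

# Hypothetical 3x3 receptive field from a grayscale image:
# dark on the left, bright on the right (a vertical edge).
patch = np.array([[0.0, 0.0, 1.0],
                  [0.0, 0.0, 1.0],
                  [0.0, 0.0, 1.0]])

# A simple vertical-edge detector.
edge_filter = np.array([[-1.0, 0.0, 1.0],
                        [-1.0, 0.0, 1.0],
                        [-1.0, 0.0, 1.0]])

# One CNN "neuron": elementwise product with its local patch, then sum.
response = np.sum(patch * edge_filter)
print(response)  # 3.0 -> strong response to the edge
```

A uniform patch (no edge) would produce a response of 0, since the filter's positive and negative weights cancel.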

  2. Weight Sharing
     A CNN uses the same filter (kernel) across different parts of the image.

This means:

  • the same feature detector is applied everywhere
  • the network can detect the same pattern no matter where it appears
  • the number of parameters is drastically reduced

For example, a 3 × 3 filter only has 9 weights (per input channel), and those same weights are reused as the filter slides across the image.
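Weight sharing can be sketched with an explicit loop: the same 9 weights are reused at every position as the filter slides over the image. This is a plain-numpy illustration (real frameworks implement this far more efficiently), using a toy 6 × 6 image with a constant left-to-right gradient:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image (valid cross-correlation)."""
    h, w = image.shape
    k = kernel.shape[0]
    out = np.zeros((h - k + 1, w - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # The same k*k weights are applied at every (i, j).
            out[i, j] = np.sum(image[i:i+k, j:j+k] * kernel)
    return out

image = np.arange(36, dtype=float).reshape(6, 6)  # toy 6x6 grayscale image
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])  # 9 weights total, shared everywhere

fmap = conv2d_valid(image, kernel)
print(fmap.shape)  # (4, 4) feature map from only 9 parameters
```

Because the toy image brightens uniformly from left to right, the vertical-edge filter responds with the same value (6.0) at every position, which also shows that one shared detector covers the whole image.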

Why These Ideas Help

Because of local connectivity and weight sharing, CNNs have several major advantages:

  1. Far Fewer Parameters
     CNNs require far fewer weights than Fully Connected Networks, making them:
  • more memory efficient
  • faster to train
  • less prone to overfitting
  2. Preserve Spatial Information
     CNNs work directly on the 2D image grid, so they naturally capture spatial patterns in the image.

  3. Better Feature Detection Across Locations
     Since the same filter is used everywhere, CNNs can detect a feature even if it appears in different parts of the image.

This property makes CNNs more robust to small translations of objects in images.

Note: More precisely, convolution gives translation equivariance (if the object shifts, the feature map also shifts). Pooling and deeper layers help produce more translation invariance.
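Translation equivariance can be demonstrated directly: shift the input, and the feature map shifts by the same amount. This sketch uses a plain-numpy convolution loop and a randomly generated image and kernel (both hypothetical); `np.roll` wraps pixels around the border, so the comparison excludes the wrapped column.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Valid cross-correlation with a single shared kernel."""
    h, w = image.shape
    k = kernel.shape[0]
    out = np.zeros((h - k + 1, w - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+k, j:j+k] * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.random((8, 8))
shifted = np.roll(image, shift=1, axis=1)  # shift the image right by 1 pixel
kernel = rng.random((3, 3))

fmap = conv2d_valid(image, kernel)
fmap_shifted = conv2d_valid(shifted, kernel)

# Equivariance: the feature map shifted right by 1 pixel as well
# (ignoring the wrap-around column introduced by np.roll).
print(np.allclose(fmap[:, :-1], fmap_shifted[:, 1:]))  # True
```

Invariance would mean the feature map stays identical after the shift; convolution alone does not give that, which is why pooling and deeper layers are needed on top.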