CNN - iffatAGheyas/computer-vision-handbook GitHub Wiki
Module 8: Deep Learning in Computer Vision
Deep learning has revolutionised visual understanding, powering everything from image classification to advanced segmentation and specialised applications like OCR and medical imaging. In this module, we'll cover the fundamentals of CNNs, build a classifier from scratch, and explore pixel-level segmentation architectures.
Topics Covered
- CNN Architecture Fundamentals
- Image Classification with CNNs (from scratch)
- Semantic & Instance Segmentation (U-Net, Mask R-CNN)
- OCR, Face Recognition & Medical Imaging
1. CNN Architecture Fundamentals
A Convolutional Neural Network (CNN) learns hierarchical feature representations directly from raw pixels:
- Convolutional layers detect local patterns (edges, textures, shapes).
- Pooling layers downsample spatial dimensions, reducing computation.
- Fully connected layers perform high-level reasoning and classification.
Typical CNN Pipeline
Input Image → [ Conv → ReLU → Pool ] × N → Flatten → Dense → Output
Layer Roles
Layer Type | Role |
---|---|
Conv2D | Extract local features (edges, textures) |
MaxPooling2D | Reduce feature map size (downsampling) |
Flatten | Convert 2D maps to 1D feature vector |
Dense | Final classifier (outputs class scores) |
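To make the layer roles concrete, here is a minimal pure-Python sketch of one Conv → ReLU → Pool → Flatten pass on a tiny single-channel image. The helper names (`conv2d_valid`, `relu`, `max_pool_2x2`, `flatten`) and the hand-picked edge kernel are illustrative assumptions, not part of Keras; a trained CNN learns its kernels and runs many filters over multi-channel inputs.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding ("valid")."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(fmap):
    """Clamp negative responses to zero."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool_2x2(fmap):
    """Non-overlapping 2x2 max pooling; an odd trailing row/column is dropped."""
    return [
        [max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
         for c in range(0, len(fmap[0]) - 1, 2)]
        for r in range(0, len(fmap) - 1, 2)
    ]


def flatten(fmap):
    """Turn a 2D feature map into a 1D feature vector."""
    return [v for row in fmap for v in row]


# 6x6 image: dark left half (0), bright right half (1), i.e. one vertical edge.
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]

# Hand-picked vertical-edge kernel (assumption for illustration only).
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

conv_out = conv2d_valid(image, kernel)  # 4x4 map, strongest where the edge is
activated = relu(conv_out)              # same size, negatives removed
pooled = max_pool_2x2(activated)        # 2x2 map after downsampling
features = flatten(pooled)              # vector that a Dense layer would consume

print(conv_out[0])  # [0, 3, 3, 0]
print(pooled)       # [[3, 3], [3, 3]]
print(features)     # [3, 3, 3, 3]
```

The convolution responds only around the columns where the image changes from dark to bright, and pooling keeps that response while halving each spatial dimension.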
2. Project: Image Classification with CNNs (From Scratch)
Task: Build a custom CNN to distinguish Baby vs Doll images.
Dataset Structure
trainingset/
├── Baby/
└── Doll/
testset/
├── Baby/
└── Doll/
Model Architecture
Layer | Output Shape | # Parameters |
---|---|---|
Conv2D (32 filters) | (126, 126, 32) | 896 |
MaxPool2D | (63, 63, 32) | 0 |
Conv2D (64 filters) | (61, 61, 64) | 18,496 |
MaxPool2D | (30, 30, 64) | 0 |
Flatten | (57,600) | 0 |
Dense (128 units) | (128) | 7,372,928 |
Dropout (0.5) | (128) | 0 |
Dense (2 units) | (2) | 258 |
Total Params | - | 7.39 M |
Training accuracy reached 100% in 15 epochs.
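The output shapes and parameter counts in the architecture table can be reproduced by hand. The sketch below assumes what the Keras defaults imply for this model: 3×3 kernels with stride 1 and no padding, and non-overlapping 2×2 pooling. The helper names are hypothetical and exist only for this illustration; Dropout is left out because it changes neither shape nor parameter count.

```python
def conv_output_size(size, kernel=3, stride=1):
    """Spatial size after a 'valid' (unpadded) convolution."""
    return (size - kernel) // stride + 1


def pool_output_size(size, pool=2):
    """Spatial size after non-overlapping pooling (remainder dropped)."""
    return size // pool


def conv_params(kernel, in_channels, filters):
    """kernel*kernel*in_channels weights per filter, plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(inputs, units):
    """One weight per input per unit, plus one bias per unit."""
    return inputs * units + units


size, channels = 128, 3  # 128 x 128 RGB input
summary = []

for filters in (32, 64):
    size = conv_output_size(size)
    summary.append((f"Conv2D ({filters} filters)",
                    (size, size, filters),
                    conv_params(3, channels, filters)))
    channels = filters
    size = pool_output_size(size)
    summary.append(("MaxPool2D", (size, size, channels), 0))

flat = size * size * channels
summary.append(("Flatten", (flat,), 0))
summary.append(("Dense (128 units)", (128,), dense_params(flat, 128)))
summary.append(("Dense (2 units)", (2,), dense_params(128, 2)))

total = sum(params for _, _, params in summary)
for name, shape, params in summary:
    print(f"{name:22} {str(shape):16} {params:>10,}")
print(f"Total params: {total:,}")  # 7,392,578, i.e. about 7.39 M
```

Almost all of the parameters sit in the first Dense layer (57,600 × 128 weights), which is one reason this small network can memorise a 40-image training set so easily.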
Full Code: CNN for Baby vs Doll Classification (From Scratch)
```python
import os
import math

import numpy as np
import matplotlib.pyplot as plt
import cv2
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (
    Conv2D, MaxPooling2D,
    Flatten, Dense, Dropout
)
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# 1) Paths & hyperparameters
train_dir = "trainingset"  # contains sub-folders 'Baby/' and 'Doll/'
test_dir = "testset"       # contains sub-folders 'Baby/' and 'Doll/'
IMG_SIZE = (128, 128)
BATCH_SIZE = 8
EPOCHS = 15
SEED = 42

# 2) Data generators (pixel values rescaled from [0, 255] to [0, 1])
train_datagen = ImageDataGenerator(rescale=1./255)
train_gen = train_datagen.flow_from_directory(
    train_dir,
    target_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    class_mode="categorical",
    shuffle=True,
    seed=SEED
)

test_datagen = ImageDataGenerator(rescale=1./255)
test_gen = test_datagen.flow_from_directory(
    test_dir,
    target_size=IMG_SIZE,
    batch_size=1,            # one image per step, so steps == number of images
    class_mode="categorical",
    shuffle=False            # keep file order aligned with test_gen.classes
)

# 3) Build a small CNN from scratch
model = Sequential([
    Conv2D(32, (3, 3), activation="relu", input_shape=(*IMG_SIZE, 3)),
    MaxPooling2D(2, 2),
    Conv2D(64, (3, 3), activation="relu"),
    MaxPooling2D(2, 2),
    Flatten(),
    Dense(128, activation="relu"),
    Dropout(0.5),
    Dense(2, activation="softmax")
])

model.compile(
    optimizer="adam",
    loss="categorical_crossentropy",
    metrics=["accuracy"]
)
model.summary()

# 4) Train
model.fit(
    train_gen,
    epochs=EPOCHS
)

# 5) Predict on the test set
preds = model.predict(test_gen, steps=test_gen.samples)
pred_classes = np.argmax(preds, axis=1)   # index of the most probable class
true_classes = test_gen.classes
idx2label = {v: k for k, v in test_gen.class_indices.items()}

# 6) Plot every test image with Actual vs Predicted
num_images = test_gen.samples
cols = 5
rows = math.ceil(num_images / cols)
fig, axes = plt.subplots(rows, cols, figsize=(cols * 3, rows * 3))
axes = axes.flatten()

for i in range(num_images):
    # load & display image (OpenCV reads BGR; matplotlib expects RGB)
    img_path = os.path.join(test_dir, test_gen.filenames[i])
    img = cv2.imread(img_path)
    img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    actual = idx2label[true_classes[i]]
    predicted = idx2label[pred_classes[i]]
    axes[i].imshow(img_rgb)
    axes[i].set_title(f"A:{actual}\nP:{predicted}", fontsize=8)
    axes[i].axis("off")

# hide any extra subplots
for j in range(num_images, len(axes)):
    axes[j].axis("off")

plt.tight_layout()
plt.show()
```
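Step 5 turns the softmax outputs into class names by taking the arg-max of each row and looking it up in the inverted `class_indices` mapping. The following sketch isolates that step without TensorFlow. The logits are made-up numbers chosen for illustration, and the mapping `{"Baby": 0, "Doll": 1}` assumes the alphanumeric folder ordering that `flow_from_directory` uses by default.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Assumed mapping: folder names sorted alphanumerically.
class_indices = {"Baby": 0, "Doll": 1}
idx2label = {v: k for k, v in class_indices.items()}

# Hypothetical final-layer scores for two images (not real model output).
logits_batch = [[2.0, 0.5], [-1.0, 1.0]]

probs = [softmax(row) for row in logits_batch]
pred_classes = [max(range(len(p)), key=p.__getitem__) for p in probs]  # arg-max per row
pred_labels = [idx2label[i] for i in pred_classes]

print(pred_labels)  # ['Baby', 'Doll']
```

Because softmax preserves the ordering of the scores, the arg-max of the probabilities is the same as the arg-max of the raw scores; the probabilities are mainly useful when a confidence value is wanted alongside the label.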
3. Test Predictions & Output
The model was evaluated on a test set of 10 images: 5 of babies and 5 of dolls.
The model predicted every test image as "Baby", so it classified all 5 baby images correctly and none of the 5 doll images (5/10 overall, no better than chance for two balanced classes). Combined with 100% training accuracy, this points to overfitting, a common outcome when a CNN is trained from scratch on a very small dataset.
This was a toy model trained on just 40 images over 15 epochs. With a larger and more varied training set, the model would likely generalise significantly better; more epochs alone are unlikely to help, since training accuracy is already 100%.
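The test outcome can be quantified with a few lines of plain Python. The label lists below are reconstructed from the description (5 babies, 5 dolls, every prediction "Baby"); the order of the images is an assumption and does not affect the metrics.

```python
from collections import Counter

# Reconstructed from the text: 5 baby images, 5 doll images, all predicted "Baby".
true_labels = ["Baby"] * 5 + ["Doll"] * 5
pred_labels = ["Baby"] * 10

# Confusion counts keyed by (actual, predicted).
confusion = Counter(zip(true_labels, pred_labels))

accuracy = sum(t == p for t, p in zip(true_labels, pred_labels)) / len(true_labels)


def recall(label):
    """Fraction of images of `label` that were predicted as `label`."""
    actual = sum(1 for t in true_labels if t == label)
    return confusion[(label, label)] / actual


print(f"accuracy     = {accuracy:.2f}")        # 0.50
print(f"recall(Baby) = {recall('Baby'):.2f}")  # 1.00
print(f"recall(Doll) = {recall('Doll'):.2f}")  # 0.00
```

Per-class recall makes the failure mode visible: overall accuracy of 50% hides the fact that one class is never predicted at all.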
Summary Table
Concept | Description |
---|---|
Model Type | Custom CNN (Keras Sequential API) |
Input Shape | 128 × 128 RGB images |
Classes | Baby, Doll |
Accuracy (Train) | 100% after 15 epochs |
Accuracy (Test) | 50% (5/10 correct; every image predicted as "Baby") |
Next up: We'll dive into semantic and instance segmentation architectures like U-Net and Mask R-CNN for pixel-level understanding of images.