🧠 Semantic & Instance Segmentation with U-Net

Segmentation divides an image into meaningful regions by labelling pixels. There are two main types:

| Type | What It Does |
|------|--------------|
| Semantic Segmentation | Labels each pixel with a class (e.g., every cat pixel is labelled “cat”) |
| Instance Segmentation | Labels pixels and separates individual objects (e.g., Cat 1 vs Cat 2) |

In short:
– Semantic = “what is where”
– Instance = “which one is where”
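
To make the distinction concrete, here is a minimal NumPy sketch (the array values are invented purely for illustration): a semantic result is a single label map, while an instance result keeps one mask per object.

import numpy as np

# Semantic segmentation: ONE label map; both cats share class id 1
# (0 = background, 1 = cat)
semantic_mask = np.array([
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
])

# Instance segmentation: one binary mask PER object,
# so Cat 1 and Cat 2 remain distinct
instance_masks = np.array([
    [[0, 1, 1, 0],   # Cat 1
     [0, 1, 1, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 0]],

    [[0, 0, 0, 0],   # Cat 2
     [0, 0, 0, 0],
     [0, 0, 1, 1],
     [0, 0, 1, 1]],
])

print(semantic_mask.shape)   # (4, 4)     → “what is where”
print(instance_masks.shape)  # (2, 4, 4)  → “which one is where”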


🧩 What Is U-Net?

U-Net is a deep convolutional network designed for semantic segmentation, originally developed for biomedical images. It’s now widely used in:

  • Medical imaging 🧬
  • Satellite imagery 🌐
  • Cell segmentation 🧫
  • Road-scene parsing 🚗

🏗️ U-Net Architecture Overview

U-Net has a U-shaped design, consisting of:

| Block | Role |
|-------|------|
| Encoder | Downsamples the image and learns features |
| Bottleneck | Connects encoder to decoder |
| Decoder | Upsamples and refines the segmentation map |
| Skip Connections | Transfer spatial detail from encoder to decoder |

Input → [Conv → ReLU → MaxPool] … → Bottleneck → [UpConv → Concat → Conv] → Output

✅ Skip connections help preserve edge details that would otherwise be lost during downsampling.
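
For a 128 × 128 × 1 input, the U-shape can be traced layer by layer. The trace below is a sketch whose channel counts match the toy Keras snippet later on this page:

# Shape trace for a 128×128×1 input through the toy U-Net defined below
#
# Encoder:    conv → 128×128×16,   pool → 64×64×16
#             conv → 64×64×32,     pool → 32×32×32
# Bottleneck: conv → 32×32×64
# Decoder:    upconv → 64×64×32,   concat skip → 64×64×64,   conv → 64×64×32
#             upconv → 128×128×16, concat skip → 128×128×32, conv → 128×128×16
# Output:     1×1 conv + sigmoid → 128×128×1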

📥 Input / Output

  • Input: an RGB or grayscale image
  • Output: a pixel-wise prediction map (same width × height as the input)

Each pixel is labelled with:

  • A class (e.g., background, cat, road)
  • Or a binary mask value (foreground vs background); the sketch below shows how a sigmoid map is thresholded into such a mask
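
As a small illustration (probability values invented), a sigmoid output map is turned into a binary mask by thresholding at 0.5:

import numpy as np

# A 4×4 sigmoid output: per-pixel foreground probabilities (invented values)
probs = np.array([
    [0.1, 0.8, 0.9, 0.2],
    [0.2, 0.7, 0.9, 0.1],
    [0.1, 0.1, 0.6, 0.7],
    [0.0, 0.1, 0.8, 0.9],
])

# Threshold at 0.5: 1 = foreground, 0 = background
mask = (probs > 0.5).astype(np.uint8)
print(mask)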

🧪 Code Snippet: Simple U-Net in Keras

from tensorflow.keras.layers import Conv2D, MaxPooling2D, Conv2DTranspose, concatenate, Input
from tensorflow.keras.models import Model

def simple_unet(input_shape=(128, 128, 1)):
    inputs = Input(input_shape)

    # Encoder: convolve, then halve the spatial resolution
    c1 = Conv2D(16, 3, activation='relu', padding='same')(inputs)
    p1 = MaxPooling2D((2, 2))(c1)

    c2 = Conv2D(32, 3, activation='relu', padding='same')(p1)
    p2 = MaxPooling2D((2, 2))(c2)

    # Bottleneck: deepest features at the lowest resolution
    c3 = Conv2D(64, 3, activation='relu', padding='same')(p2)

    # Decoder: upsample and merge with the matching encoder features
    u1 = Conv2DTranspose(32, 2, strides=2, padding='same')(c3)
    concat1 = concatenate([u1, c2])  # skip connection from the encoder
    c4 = Conv2D(32, 3, activation='relu', padding='same')(concat1)

    u2 = Conv2DTranspose(16, 2, strides=2, padding='same')(c4)
    concat2 = concatenate([u2, c1])  # skip connection from the encoder
    c5 = Conv2D(16, 3, activation='relu', padding='same')(concat2)

    # 1×1 convolution with sigmoid: per-pixel foreground probability
    outputs = Conv2D(1, 1, activation='sigmoid')(c5)
    return Model(inputs, outputs)
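
The model can then be built and compiled. A minimal usage sketch for the binary case (matching the sigmoid output above):

# Build and compile for binary segmentation (sigmoid + binary crossentropy)
model = simple_unet(input_shape=(128, 128, 1))
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.summary()  # final output shape is (None, 128, 128, 1), the same H × W as the input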

⚙️ How It Works in Practice

| Step | What Happens |
|------|--------------|
| Input | A 128 × 128 image (e.g., a grayscale cell image) |
| Forward | U-Net encodes features down and decodes them back up |
| Output | A 128 × 128 segmentation map with a pixel-wise class or mask |
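
The shape round trip can be verified on random arrays (dummy data, purely for shape-checking; this assumes the compiled model from the snippet above):

import numpy as np

x = np.random.rand(4, 128, 128, 1).astype('float32')          # 4 dummy grayscale images
y = (np.random.rand(4, 128, 128, 1) > 0.5).astype('float32')  # 4 dummy binary masks

model.fit(x, y, epochs=1, batch_size=2)  # one throwaway epoch, just to check the pipeline
preds = model.predict(x)
print(preds.shape)  # (4, 128, 128, 1): one 128 × 128 probability map per image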

You can use this model for:

  • Binary segmentation (sigmoid output + binary crossentropy)
  • Multi-class segmentation (softmax output + categorical crossentropy); a sketch of the required changes follows below
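
For the multi-class case, only the head and the loss change. A minimal sketch of the two edits (NUM_CLASSES is illustrative; everything else stays as in simple_unet above):

NUM_CLASSES = 3  # e.g., background, cat, road

# In simple_unet, replace the final layer
#     outputs = Conv2D(1, 1, activation='sigmoid')(c5)
# with one softmax channel per class:
#     outputs = Conv2D(NUM_CLASSES, 1, activation='softmax')(c5)

# Then compile with a matching categorical loss:
#     model.compile(optimizer='adam', loss='categorical_crossentropy')         # one-hot masks (H, W, NUM_CLASSES)
#     model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')  # integer label maps (H, W)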

📋 Summary Table

| Concept | Description |
|---------|-------------|
| Model | U-Net |
| Task | Semantic segmentation (pixel labelling) |
| Input | Image (e.g., 128 × 128 × 3) |
| Output | Segmentation mask (same size as input) |
| Key Feature | Skip connections for detail recovery |
| Best Used In | Medical imaging, remote sensing, road-scene parsing |

Click here for Semantic & Instance Segmentation with Mask R-CNN.