🧠 Semantic & Instance Segmentation with U-Net

Segmentation divides an image into meaningful regions by labelling pixels. There are two main types:

| Type | What It Does |
|------|--------------|
| Semantic Segmentation | Labels each pixel with a class (e.g., every cat pixel is labelled “cat”) |
| Instance Segmentation | Labels pixels and separates individual objects (e.g., Cat 1 vs Cat 2) |

In short:
– Semantic = “what is where”
– Instance = “which one is where”
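
To make the distinction concrete, here is a minimal NumPy sketch (the array values are invented purely for illustration): a semantic result is a single label map, while an instance result keeps one mask per object.

import numpy as np

# Semantic segmentation: ONE label map; both cats share class id 1
# (0 = background, 1 = cat)
semantic_mask = np.array([
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
])

# Instance segmentation: one binary mask PER object,
# so Cat 1 and Cat 2 remain distinct
instance_masks = np.array([
    [[0, 1, 1, 0],   # Cat 1
     [0, 1, 1, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 0]],

    [[0, 0, 0, 0],   # Cat 2
     [0, 0, 0, 0],
     [0, 0, 1, 1],
     [0, 0, 1, 1]],
])

print(semantic_mask.shape)   # (4, 4)     → “what is where”
print(instance_masks.shape)  # (2, 4, 4)  → “which one is where”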


🧩 What Is U-Net?

U-Net is a deep convolutional network designed for semantic segmentation, originally developed for biomedical images. It’s now widely used in:

  • Medical imaging 🧬
  • Satellite imagery 🌐
  • Cell segmentation 🧫
  • Road-scene parsing 🚗

🏗️ U-Net Architecture Overview

U-Net has a U-shaped design, consisting of:

| Block | Role |
|-------|------|
| Encoder | Downsamples the image and learns features |
| Bottleneck | Connects encoder to decoder |
| Decoder | Upsamples and refines the segmentation map |
| Skip Connections | Transfer spatial detail from encoder to decoder |

Input → [Conv → ReLU → MaxPool] … → Bottleneck → [UpConv → Concat → Conv] → Output

✅ Skip connections help preserve edge details that would otherwise be lost during downsampling.
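
For a 128 × 128 × 1 input, the U-shape can be traced layer by layer. The trace below is a sketch whose channel counts match the toy Keras snippet later on this page:

# Shape trace for a 128×128×1 input through the toy U-Net defined below
#
# Encoder:    conv → 128×128×16,   pool → 64×64×16
#             conv → 64×64×32,     pool → 32×32×32
# Bottleneck: conv → 32×32×64
# Decoder:    upconv → 64×64×32,   concat skip → 64×64×64,   conv → 64×64×32
#             upconv → 128×128×16, concat skip → 128×128×32, conv → 128×128×16
# Output:     1×1 conv + sigmoid → 128×128×1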

📥 Input / Output

  • Input: an RGB or grayscale image
  • Output: a pixel-wise prediction map (same width × height as the input)

Each pixel is labelled with:

  • A class (e.g., background, cat, road)
  • Or a binary mask value (foreground vs background); the sketch below shows how a sigmoid map is thresholded into such a mask
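
As a small illustration (probability values invented), a sigmoid output map is turned into a binary mask by thresholding at 0.5:

import numpy as np

# A 4×4 sigmoid output: per-pixel foreground probabilities (invented values)
probs = np.array([
    [0.1, 0.8, 0.9, 0.2],
    [0.2, 0.7, 0.9, 0.1],
    [0.1, 0.1, 0.6, 0.7],
    [0.0, 0.1, 0.8, 0.9],
])

# Threshold at 0.5: 1 = foreground, 0 = background
mask = (probs > 0.5).astype(np.uint8)
print(mask)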

🧪 Code Snippet: Simple U-Net in Keras

from tensorflow.keras.layers import Conv2D, MaxPooling2D, Conv2DTranspose, concatenate, Input
from tensorflow.keras.models import Model

def simple_unet(input_shape=(128, 128, 1)):
    inputs = Input(input_shape)

    # Encoder: convolve, then halve the spatial resolution
    c1 = Conv2D(16, 3, activation='relu', padding='same')(inputs)
    p1 = MaxPooling2D((2, 2))(c1)

    c2 = Conv2D(32, 3, activation='relu', padding='same')(p1)
    p2 = MaxPooling2D((2, 2))(c2)

    # Bottleneck: deepest features at the lowest resolution
    c3 = Conv2D(64, 3, activation='relu', padding='same')(p2)

    # Decoder: upsample and merge with the matching encoder features
    u1 = Conv2DTranspose(32, 2, strides=2, padding='same')(c3)
    concat1 = concatenate([u1, c2])  # skip connection from the encoder
    c4 = Conv2D(32, 3, activation='relu', padding='same')(concat1)

    u2 = Conv2DTranspose(16, 2, strides=2, padding='same')(c4)
    concat2 = concatenate([u2, c1])  # skip connection from the encoder
    c5 = Conv2D(16, 3, activation='relu', padding='same')(concat2)

    # 1×1 convolution with sigmoid: per-pixel foreground probability
    outputs = Conv2D(1, 1, activation='sigmoid')(c5)
    return Model(inputs, outputs)
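
The model can then be built and compiled. A minimal usage sketch for the binary case (matching the sigmoid output above):

# Build and compile for binary segmentation (sigmoid + binary crossentropy)
model = simple_unet(input_shape=(128, 128, 1))
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.summary()  # final output shape is (None, 128, 128, 1), the same H × W as the input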

⚙️ How It Works in Practice

| Step | What Happens |
|------|--------------|
| Input | A 128 × 128 image (e.g., a grayscale cell image) |
| Forward | U-Net encodes features down and decodes them back up |
| Output | A 128 × 128 segmentation map with a pixel-wise class or mask |
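
The shape round trip can be verified on random arrays (dummy data, purely for shape-checking; this assumes the compiled model from the snippet above):

import numpy as np

x = np.random.rand(4, 128, 128, 1).astype('float32')          # 4 dummy grayscale images
y = (np.random.rand(4, 128, 128, 1) > 0.5).astype('float32')  # 4 dummy binary masks

model.fit(x, y, epochs=1, batch_size=2)  # one throwaway epoch, just to check the pipeline
preds = model.predict(x)
print(preds.shape)  # (4, 128, 128, 1): one 128 × 128 probability map per image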

You can use this model for:

  • Binary segmentation (sigmoid output + binary crossentropy)
  • Multi-class segmentation (softmax output + categorical crossentropy); a sketch of the required changes follows below
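
For the multi-class case, only the head and the loss change. A minimal sketch of the two edits (NUM_CLASSES is illustrative; everything else stays as in simple_unet above):

NUM_CLASSES = 3  # e.g., background, cat, road

# In simple_unet, replace the final layer
#     outputs = Conv2D(1, 1, activation='sigmoid')(c5)
# with one softmax channel per class:
#     outputs = Conv2D(NUM_CLASSES, 1, activation='softmax')(c5)

# Then compile with a matching categorical loss:
#     model.compile(optimizer='adam', loss='categorical_crossentropy')         # one-hot masks (H, W, NUM_CLASSES)
#     model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')  # integer label maps (H, W)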

📋 Summary Table

| Concept | Description |
|---------|-------------|
| Model | U-Net |
| Task | Semantic segmentation (pixel labelling) |
| Input | Image (e.g., 128 × 128 × 3) |
| Output | Segmentation mask (same size as input) |
| Key Feature | Skip connections for detail recovery |
| Best Used In | Medical imaging, remote sensing, road-scene parsing |

Click here for Semantic & Instance Segmentation with Mask R-CNN.