🧠 Semantic & Instance Segmentation with U-Net
Segmentation assigns a class to every pixel, dividing an image into meaningful regions. There are two main types:
Type | What It Does |
---|---|
Semantic Segmentation | Labels each pixel with a class (e.g., all “cats”) |
Instance Segmentation | Labels pixels and separates individual objects (e.g., Cat 1 vs Cat 2) |
In short:
- Semantic = “what is where”
- Instance = “which one is where”
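A toy NumPy illustration of the difference (the label values here are made up for this example): a semantic mask stores one class ID per pixel, while an instance mask gives each object its own ID.

```python
import numpy as np

# 4x4 toy scene with two cats on background (class 0 = background, 1 = cat)
semantic_mask = np.array([
    [0, 1, 0, 1],
    [0, 1, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
])  # "what is where": both cats share class 1

instance_mask = np.array([
    [0, 1, 0, 2],
    [0, 1, 0, 2],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
])  # "which one is where": Cat 1 and Cat 2 get distinct IDs

print(np.unique(semantic_mask))   # [0 1]   one class covers all cats
print(np.unique(instance_mask))   # [0 1 2] one ID per cat
```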
🧩 What Is U-Net?
U-Net is a deep convolutional network designed for semantic segmentation, originally developed for biomedical images. It’s now widely used in:
- Medical imaging 🧬
- Satellite imagery 🌐
- Cell segmentation 🧫
- Road-scene parsing 🚗
🏗️ U-Net Architecture Overview
U-Net has a U-shaped design, consisting of:
Block | Role |
---|---|
Encoder | Downsamples image, learns features |
Bottleneck | Connects encoder to decoder |
Decoder | Upsamples and refines segmentation map |
Skip Connections | Transfer spatial detail from encoder to decoder |
Input → [Conv → ReLU → MaxPool] … → Bottleneck → [UpConv → Concat → Conv] → Output
✅ Skip connections help preserve edge details that would otherwise be lost during downsampling.
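To see why, here is a minimal shape trace of one down/up round trip with a skip connection (illustrative only; the layer sizes match the snippet further below):

```python
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Conv2DTranspose, concatenate

x = Input((128, 128, 1))
c = Conv2D(16, 3, activation='relu', padding='same')(x)    # (128, 128, 16) full-resolution features
p = MaxPooling2D((2, 2))(c)                                # (64, 64, 16)   spatial detail lost here
u = Conv2DTranspose(16, 2, strides=2, padding='same')(p)   # (128, 128, 16) upsampled, but coarse
s = concatenate([u, c])                                    # (128, 128, 32) skip restores fine detail
print(s.shape)                                             # (None, 128, 128, 32)
```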
📥 Input / Output
Input: RGB or grayscale image
Output: Pixel-wise prediction map (same width × height as input)
Each pixel is labelled with:
- A class (e.g., background, cat, road)
- Or a binary mask (foreground vs background)
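Concretely, reading the prediction map back out takes a few lines of NumPy (the probability arrays below are random stand-ins for real model output):

```python
import numpy as np

# Binary case: a sigmoid map of per-pixel foreground probabilities
probs = np.random.rand(128, 128, 1)                     # stand-in for model output
binary_mask = (probs[..., 0] > 0.5).astype(np.uint8)    # 1 = foreground, 0 = background

# Multi-class case: a softmax map with one channel per class
class_probs = np.random.rand(128, 128, 3)               # e.g. background / cat / road
class_map = np.argmax(class_probs, axis=-1)             # per-pixel class IDs in {0, 1, 2}
```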
🧪 Code Snippet: Simple U-Net in Keras
```python
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Conv2DTranspose, concatenate, Input
from tensorflow.keras.models import Model

def simple_unet(input_shape=(128, 128, 1)):
    inputs = Input(input_shape)

    # Encoder: two conv/pool stages halve the resolution twice
    c1 = Conv2D(16, 3, activation='relu', padding='same')(inputs)
    p1 = MaxPooling2D((2, 2))(c1)
    c2 = Conv2D(32, 3, activation='relu', padding='same')(p1)
    p2 = MaxPooling2D((2, 2))(c2)

    # Bottleneck: deepest features at 1/4 resolution
    c3 = Conv2D(64, 3, activation='relu', padding='same')(p2)

    # Decoder: upsample and merge with encoder features (skip connections)
    u1 = Conv2DTranspose(32, 2, strides=2, padding='same')(c3)
    concat1 = concatenate([u1, c2])   # skip connection from encoder stage 2
    c4 = Conv2D(32, 3, activation='relu', padding='same')(concat1)
    u2 = Conv2DTranspose(16, 2, strides=2, padding='same')(c4)
    concat2 = concatenate([u2, c1])   # skip connection from encoder stage 1
    c5 = Conv2D(16, 3, activation='relu', padding='same')(concat2)

    # 1x1 conv produces one sigmoid probability per pixel (binary mask)
    outputs = Conv2D(1, 1, activation='sigmoid')(c5)
    return Model(inputs, outputs)
```
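To try the snippet end to end, build and compile the model; Adam plus binary crossentropy is one reasonable choice for binary masks, not the only one:

```python
model = simple_unet(input_shape=(128, 128, 1))
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.summary()   # prints the encoder/bottleneck/decoder layer stack
```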
⚙️ How It Works in Practice
Step | What Happens |
---|---|
Input | A 128 × 128 image (e.g., grayscale cell image) |
Forward | U-Net encodes features down and decodes them back up |
Output | A 128 × 128 segmentation map with pixel-wise class or mask |
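A quick sanity check with random data (illustrative only) confirms that the output map matches the input resolution:

```python
import numpy as np

model = simple_unet()                                      # (128, 128, 1) default input
dummy = np.random.rand(1, 128, 128, 1).astype('float32')   # one fake grayscale image
pred = model.predict(dummy)
print(pred.shape)   # (1, 128, 128, 1): one foreground probability per pixel
```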
You can use this model for either setup (see the sketch after this list):
- Binary segmentation (sigmoid + binary crossentropy)
- Multi-class segmentation (softmax + categorical crossentropy)
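A hedged sketch of both setups; `unet_head` is a hypothetical helper (not part of the original snippet) that grafts a softmax output onto the last feature layer of `simple_unet`:

```python
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.models import Model

# Binary segmentation: keep the sigmoid output from simple_unet()
bin_model = simple_unet()
bin_model.compile(optimizer='adam', loss='binary_crossentropy')

# Multi-class segmentation: replace the 1-channel sigmoid head with a softmax
def unet_head(num_classes):
    base = simple_unet()
    features = base.layers[-2].output                 # c5, the last 16-channel feature map
    outputs = Conv2D(num_classes, 1, activation='softmax')(features)
    return Model(base.input, outputs)

mc_model = unet_head(num_classes=3)                   # hypothetical classes: background, cat, road
mc_model.compile(optimizer='adam', loss='categorical_crossentropy')  # expects one-hot masks
```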
✅ Summary Table
Concept | Description |
---|---|
Model | U-Net |
Task | Semantic segmentation (pixel labeling) |
Input | Image (e.g., 128 × 128 × 3) |
Output | Segmentation mask (same size as input) |
Key Feature | Skip connections for detail recovery |
Best Used In | Medical imaging, remote sensing, road-scene parsing |