computer vision - iffatAGheyas/computer-vision-handbook GitHub Wiki
📘 What is Computer Vision?
Computer Vision is a field of Computer Science and Artificial Intelligence that teaches machines to “see” and understand the visual world, much as humans do with their eyes and brains. Where humans rely on biological eyes and neural processing, computers use cameras together with algorithms or trained models to interpret images and videos.
🎥 Real-World Analogy
Consider how you:
- Look at a traffic light and decide when to cross the road
- Recognise a friend in a photograph
- Watch a video clip and understand who is speaking and what’s happening
Computer Vision aims to replicate these everyday tasks in software.
🔎 Core Tasks in Computer Vision
- Image Classification: “Is this image a cat or a dog?”
- Object Detection: “Where is the car in this image?”
- Segmentation: “Which pixels belong to the car?”
- Video Analysis: “Is the person walking or running?”
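To make the difference between these tasks concrete, here is a minimal sketch on a toy image, using only plain Python lists. The image, the `THRESHOLD` value and the function names `segment`, `detect` and `classify` are all assumptions made up for illustration. Real systems use trained models rather than a fixed brightness threshold, and video analysis is left out because it needs more than one frame.

```python
# Toy 6x6 grayscale image (values 0-255): a bright block (2 rows x 3 columns)
# on a dark background. Hypothetical data, for illustration only.
IMAGE = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 200, 210, 220, 0],
    [0, 0, 205, 215, 225, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

THRESHOLD = 128  # assumed cut-off between "background" and "object"


def segment(image, threshold=THRESHOLD):
    """Segmentation: label every pixel as object (1) or background (0)."""
    return [[1 if px > threshold else 0 for px in row] for row in image]


def detect(mask):
    """Detection: return one bounding box (top, left, bottom, right), or None."""
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return (min(rows), min(cols), max(rows), max(cols))


def classify(mask):
    """Classification: a single label for the whole image."""
    return "object" if any(any(row) for row in mask) else "background"


mask = segment(IMAGE)   # which pixels belong to the object?
box = detect(mask)      # where is the object?
label = classify(mask)  # what is in the image?
print(label, box)
```

The three results differ in granularity: `label` is one answer for the whole image, `box` is a rectangle (here `(2, 2, 3, 4)`), and `mask` holds one answer per pixel.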
🔄 How It Works (Overview)
- Input: An image or a video frame
- Processing: The system uses algorithms to extract features (e.g. edges, shapes)
- Understanding: The model classifies, detects or tracks based on those features
- Output: A decision (e.g. “face detected”) or an action (e.g. unlock phone)
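The four stages above can be sketched as a tiny pipeline in plain Python. This is only an illustration under stated assumptions: the 5x5 frame, the thresholds and the function names (`load_input`, `extract_features`, `understand`, `output`) are hypothetical, and a simple brightness-difference rule stands in for a real feature extractor and model.

```python
def load_input():
    """Input: a hypothetical 5x5 grayscale frame, dark on the left, bright on the right."""
    return [[10, 10, 10, 200, 200] for _ in range(5)]


def extract_features(image):
    """Processing: absolute difference between horizontal neighbours, a crude edge feature."""
    return [
        [abs(row[c + 1] - row[c]) for c in range(len(row) - 1)]
        for row in image
    ]


def understand(edges, edge_threshold=100, min_edge_pixels=3):
    """Understanding: decide whether there are enough strong edge pixels (assumed thresholds)."""
    strong = sum(1 for row in edges for g in row if g >= edge_threshold)
    return strong >= min_edge_pixels


def output(has_edge):
    """Output: turn the decision into a message."""
    return "edge detected" if has_edge else "no edge detected"


frame = load_input()
edges = extract_features(frame)
result = output(understand(edges))
print(result)
```

Each function maps to one stage, so swapping the crude edge rule for a trained model would change only `extract_features` and `understand`, not the overall flow.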