computer vision - iffatAGheyas/computer-vision-handbook GitHub Wiki

📘 What is Computer Vision?

Computer Vision is a field of Computer Science and Artificial Intelligence that enables machines to “see” and understand the visual world, much as humans do with their eyes and brains. Whereas humans rely on biological eyes and neural processing, computers use cameras to capture images and videos, and algorithms or trained models to interpret them.


🎥 Real-World Analogy

Consider how you:

  • Look at a traffic light and decide when to cross the road
  • Recognise a friend in a photograph
  • Watch a video clip and understand who is speaking and what’s happening

Computer Vision aims to replicate these everyday tasks in software.


🔎 Core Tasks in Computer Vision

  • Image Classification
    “Is this image a cat or a dog?”

  • Object Detection
    “Where is the car in this image?”

  • Segmentation
    “Which pixels belong to the car?”

  • Video Analysis
    “Is the person walking or running?”
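The first three tasks differ mainly in the shape of their answer: one label for the whole image (classification), a box around the object (detection), or a label for every pixel (segmentation). The Python sketch below is a minimal illustration under stated assumptions. The `mask` grid is invented and stands in for a segmentation model's output, and the helper names `image_label`, `bounding_box` and `pixel_count` are hypothetical. It shows how the coarser classification-style and detection-style answers can be read off a per-pixel mask. Video analysis is left out because it adds time as a further dimension.

```python
from typing import List, Optional, Tuple

# Hypothetical segmentation output: 1 marks a "car" pixel, 0 marks background.
# In a real system this grid would come from a segmentation model; here it is hand-made.
mask: List[List[int]] = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]


def image_label(mask: List[List[int]], object_name: str = "car") -> str:
    """Classification-style answer: a single label for the whole image."""
    return object_name if any(any(row) for row in mask) else "no " + object_name


def bounding_box(mask: List[List[int]]) -> Optional[Tuple[int, int, int, int]]:
    """Detection-style answer: smallest box (x_min, y_min, x_max, y_max) around the object."""
    points = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not points:
        return None  # nothing segmented, so nothing to box
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def pixel_count(mask: List[List[int]]) -> int:
    """Segmentation-style detail: how many pixels belong to the object."""
    return sum(sum(row) for row in mask)


print(image_label(mask))   # car
print(bounding_box(mask))  # (1, 1, 4, 2)
print(pixel_count(mask))   # 7
```

Note how each step throws information away: the mask says exactly which pixels are the car, the box only says roughly where it is, and the label only says that it is there.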


🔄 How It Works (Overview)

  1. Input
    An image or a video frame

  2. Processing
    The system uses algorithms to extract features (e.g. edges, shapes)

  3. Understanding
    The model classifies, detects or tracks objects based on those features

  4. Output
    A decision (e.g. “face detected”) or an action (e.g. unlocking the phone)
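The four steps can be sketched end to end in plain Python. This is a toy under stated assumptions: the image is a small invented grayscale grid, the “features” are simple brightness jumps between neighbouring pixels, and the “understanding” step is a hand-written rule with arbitrary thresholds (`threshold=50`, `min_edge_pixels=3`). All function names are hypothetical. Practical systems usually learn both the features and the decision from data rather than using fixed rules like these.

```python
from typing import List

Image = List[List[int]]  # grayscale pixels, 0 (black) to 255 (white)


def extract_edges(image: Image, threshold: int = 50) -> List[List[int]]:
    """2. Processing: mark 1 where brightness jumps sharply to the right-hand neighbour."""
    return [
        [1 if abs(row[x + 1] - row[x]) > threshold else 0 for x in range(len(row) - 1)]
        for row in image
    ]


def understand(edges: List[List[int]], min_edge_pixels: int = 3) -> bool:
    """3. Understanding (toy rule, an assumption): enough edge pixels means 'something is there'."""
    return sum(map(sum, edges)) >= min_edge_pixels


def pipeline(image: Image) -> str:
    edges = extract_edges(image)  # 2. Processing
    found = understand(edges)     # 3. Understanding
    return "object detected" if found else "nothing detected"  # 4. Output


# 1. Input: an invented frame with a dark region on the left and a bright region on the right
frame: Image = [
    [20, 20, 200, 200],
    [20, 20, 200, 200],
    [20, 20, 200, 200],
]
blank: Image = [[120] * 4 for _ in range(3)]  # uniform grey, no edges at all

print(pipeline(frame))  # object detected
print(pipeline(blank))  # nothing detected
```

Because each step is a separate function, the rule in `understand` could be replaced (for example by a trained classifier) without changing how the input is read or how the output is reported.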