Glossary - trap-fish/uav-human-detection GitHub Wiki
Glossary for Object Detection (YOLO-centric)
CPU (Central Processing Unit)
The primary processing unit of a computer, responsible for executing general-purpose operations. It is less efficient than GPUs for deep learning tasks due to its limited parallelism.
GPU (Graphics Processing Unit)
A processor optimized for parallel computations, commonly used for training and inference of deep learning models like YOLO due to its high throughput.
NPU (Neural Processing Unit)
A dedicated hardware unit designed to accelerate neural network computations, often integrated into mobile or edge devices for efficient inference.
UAV (Unmanned Aerial Vehicle)
A drone or aerial platform used in various applications, including remote sensing and object detection with models such as YOLO.
YOLO (You Only Look Once)
A family of real-time object detection models that predict bounding boxes and class probabilities in a single forward pass, enabling fast and accurate detection.
LEAF-YOLO
A custom YOLO variant built on YOLOv7, incorporating LELAN modules and CSPRes2Block within the backbone.
HIC-YOLO
A modified YOLOv5 model whose name abbreviates its key additions: an extra detection Head for small objects, Involution layers, and the CBAM attention module.
Involution
A type of neural network operation that replaces standard convolution with data-dependent dynamic kernels, enhancing feature learning efficiency and reducing computation.
CBAM-YOLOv5 (Convolutional Block Attention Module YOLOv5)
An enhanced YOLOv5 variant that integrates attention mechanisms (CBAM) to improve focus on important image regions, boosting detection accuracy.
Head (in Object Detection Networks)
The component of a model responsible for predicting object classes and bounding boxes based on features extracted by the backbone and neck.
mAP50 (Mean Average Precision at IoU 0.5)
An evaluation metric that measures detection accuracy at an IoU threshold of 0.5, indicating how well predicted boxes match ground truth.
mAP95 (Mean Average Precision at IoU 0.5:0.95)
A stricter evaluation metric that averages precision across multiple IoU thresholds (0.5 to 0.95), offering a more comprehensive assessment of model performance.
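Concretely, the metric is the arithmetic mean of the AP values computed at each of the ten COCO-style thresholds (0.50 to 0.95 in steps of 0.05). A minimal sketch with hypothetical, illustrative per-threshold AP values:

```python
# IoU thresholds used by the COCO-style mAP50-95 metric: 0.50, 0.55, ..., 0.95
thresholds = [0.50 + 0.05 * i for i in range(10)]

# Hypothetical AP values at each threshold (illustrative numbers only);
# AP typically drops as the IoU requirement becomes stricter.
ap_per_threshold = [0.72, 0.70, 0.67, 0.63, 0.58, 0.52, 0.44, 0.33, 0.20, 0.07]

map50 = ap_per_threshold[0]                                # AP at IoU 0.5 only
map50_95 = sum(ap_per_threshold) / len(ap_per_threshold)   # mean over all thresholds
```

Note how mAP50-95 is lower than mAP50 whenever the model's boxes are only loosely localized, which is why it is the stricter of the two metrics.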
int8 (Integer 8-bit)
A data type used in model quantization to reduce size and computational load, enabling faster inference, especially on edge devices, with minimal accuracy loss.
IoU (Intersection over Union)
A metric used to evaluate the overlap between predicted and ground truth bounding boxes. It is central to calculating mAP.
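The definition reduces to a few lines of code. A minimal sketch in plain Python, assuming axis-aligned boxes in (x1, y1, x2, y2) corner format:

```python
def iou(box_a, box_b):
    """Intersection over Union for two boxes in (x1, y1, x2, y2) format."""
    # Corners of the intersection rectangle
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
             + (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
             - inter)
    return inter / union if union > 0 else 0.0
```

For example, two 10x10 boxes offset by 5 pixels in each direction share a 5x5 intersection, giving IoU = 25 / 175, which is about 0.14 and would fail a 0.5 threshold.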
Anchor Boxes
Predefined bounding boxes with different scales and aspect ratios used to detect objects of various sizes, helping the model generalize across shapes.
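Anchor generation is typically a cross-product of scales and aspect ratios, where each anchor keeps a fixed area while its width/height ratio varies. A minimal sketch (the scale and ratio values are illustrative, not taken from any particular YOLO config):

```python
import math

def make_anchors(scales, ratios):
    """Return (w, h) anchor shapes for every scale/aspect-ratio pair.

    Each anchor preserves the area scale**2 while its aspect ratio w/h = r.
    """
    anchors = []
    for s in scales:
        for r in ratios:
            w = s * math.sqrt(r)   # wider for r > 1
            h = s / math.sqrt(r)   # correspondingly shorter
            anchors.append((w, h))
    return anchors

anchors = make_anchors(scales=[32, 64, 128], ratios=[0.5, 1.0, 2.0])
```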
Bounding Box
A rectangular region predicted by an object detection model that defines the location and extent of an object within an image.
Confidence Score
The probability that a detected object exists within a predicted bounding box, used to filter predictions in post-processing.
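Post-processing usually begins by discarding low-confidence predictions before NMS. A minimal sketch, assuming each detection is a (box, score) pair and a hypothetical threshold of 0.25:

```python
def filter_by_confidence(detections, conf_threshold=0.25):
    """Keep only detections whose confidence score meets the threshold."""
    return [(box, score) for box, score in detections if score >= conf_threshold]

detections = [((0, 0, 10, 10), 0.9), ((2, 2, 8, 8), 0.1), ((5, 5, 15, 15), 0.4)]
kept = filter_by_confidence(detections)
```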
Non-Maximum Suppression (NMS)
A technique that removes redundant bounding boxes by retaining the one with the highest confidence score and suppressing overlapping detections.
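The greedy procedure described above can be sketched in plain Python, with boxes in (x1, y1, x2, y2) format; the 0.5 IoU threshold is a common default, not a fixed constant:

```python
def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression; returns indices of the kept boxes."""
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union > 0 else 0.0

    # Visit candidates from highest to lowest confidence
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Suppress remaining boxes that overlap the kept box too heavily
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep
```

For instance, given two heavily overlapping boxes and one distant box, only the higher-scoring box of the overlapping pair survives alongside the distant one.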
Inference
The process of using a trained model to make predictions on new, unseen data.
Backbone
The feature extraction portion of an object detection model that processes raw input images and outputs feature maps, such as CSPDarknet53 or ResNet.
Neck
The intermediate layers that fuse and refine features from the backbone, such as PANet or FPN, enhancing multi-scale detection capabilities.
Quantization
A model compression technique that reduces the precision of weights and activations, typically to int8, for faster and smaller models.
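Affine (asymmetric) quantization maps a float range onto the 256 integer levels via a scale and a zero point. A minimal sketch of the arithmetic, as a generic illustration rather than any specific framework's implementation:

```python
def quantize_int8(values):
    """Affine quantization of floats to unsigned 8-bit codes.

    Returns the integer codes plus the (scale, zero_point) needed to
    map them back to floats.
    """
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255 or 1.0       # avoid divide-by-zero for constant input
    zero_point = round(-lo / scale)      # integer code that represents 0.0
    q = [max(0, min(255, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float values from the integer codes."""
    return [(code - zero_point) * scale for code in q]
```

The round trip introduces an error of at most about one scale step per value, which is the "minimal accuracy loss" the definition above refers to.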
Pruning
A process of removing unnecessary weights or neurons from a neural network to reduce its size and improve inference speed.
ONNX (Open Neural Network Exchange)
An open format for representing machine learning models, enabling interoperability between different frameworks and hardware accelerators.