Tutorial On Using Deep Learning in OpenCV

Introduction to OpenCV and DNN

Deep learning/deep neural networks (DNNs) are a subset of machine learning that aims to mimic the information-processing networks of the brain instead of relying on predetermined algorithms. These networks are biologically inspired: given data, they can learn, generalize, and even predict future events. OpenCV (Open Source Computer Vision Library) is a powerful open-source library that provides a wide range of tools for computer vision, which allows computers to process and interpret visual information in images and videos. OpenCV covers a range of complexity, from simple image processing to more advanced tasks like object detection and machine learning. The integration of DNNs into OpenCV allows users to apply state-of-the-art deep learning models to computer vision tasks such as image classification, object detection, semantic segmentation, and more. While many frameworks offer pre-trained models for image classification, OpenCV's DNN module is highly optimized for Intel processors and can achieve faster inference speeds on well-optimized CPU systems.

[Figure: bar graph comparing image classification speed on CPU across different frameworks]

The graph shows that OpenCV's processing speeds are comparable to PyTorch's and much faster than TensorFlow's.

OpenCV itself does not natively support training DNNs; however, it does allow you to integrate pre-trained deep learning models. Through the 'dnn' module, users can load pre-trained models from popular frameworks like TensorFlow, Caffe, and PyTorch. With these models, computers can "see" and classify images, a feat that comes naturally to humans, bridging the gap between visual perception and machine intelligence. The DNN module in OpenCV supports an extensive list of applications, including image classification, object detection, segmentation, text detection/recognition, pose estimation, depth estimation, face verification/detection, and person re-identification. This tutorial walks through an example of using OpenCV to classify a simple image of a tiger.

Tutorial for Image Classification Using OpenCV DNN

To support the above applications, a variety of pre-trained models must be loaded into OpenCV. Unfortunately, these models are not covered by a single unifying framework, so files from several frameworks are needed to accomplish all of these tasks. To load a pre-trained TensorFlow model, we need the model weights file and a protobuf text (.pbtxt) file to configure the model. To load Torch or PyTorch models, only the pre-trained weight files are required. This tutorial uses a pre-trained model based on the Caffe framework, which has already been trained on a huge dataset and can therefore handle a varied range of images. We will use Python, not only because it aligns with the programming language we've been using throughout the class, but also because of its versatility, extensive libraries, and widespread use in data science and machine learning.
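
As a quick illustration of how these files map onto loader calls, the sketch below shows how a model from each framework might be loaded with the dnn module. The file names here (frozen_inference_graph.pb, graph.pbtxt, openface.t7, model.onnx) are placeholders, not files used in this tutorial.

# Each readNetFrom* call matches one framework's file format (paths are placeholders).
tf_net = cv2.dnn.readNetFromTensorflow('frozen_inference_graph.pb', 'graph.pbtxt')        # TensorFlow: weights + .pbtxt config
caffe_net = cv2.dnn.readNetFromCaffe('DenseNet_121.prototxt', 'DenseNet_121.caffemodel')  # Caffe: .prototxt config + .caffemodel weights
torch_net = cv2.dnn.readNetFromTorch('openface.t7')                                       # Torch: a single .t7 weights file
onnx_net = cv2.dnn.readNetFromONNX('model.onnx')                                          # PyTorch models are typically exported to ONNX first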

Below is a high-level overview of how the DNN protocol approaches image classification:

  1. Load the class names text file and extract the necessary labels.
  2. Load the neural network.
  3. Load the image.
  4. Propagate the image through the model and output results.

For this tutorial, we will load the picture of a tiger below and pass it through the model to test it. We expect the model to output a prediction classifying the image along with a confidence score.

[Figure: example input image of a tiger]

This image is a good representative image of a tiger; most individuals would be able to confidently identify it as a tiger. Seeing how confident the neural network is in classifying the image would be a good metric for evaluating the effectiveness and accuracy of the model's predictions on familiar and easily recognizable images.

First, we must import the modules and load the class text files.

import cv2
import numpy as np

# Read the ImageNet class names (one label per line) used by the pre-trained model.
with open('../../input/classification_classes_ILSVRC2012.txt', 'r') as f:
    image_net_names = f.read().split('\n')
# Each line can list several synonyms separated by commas; keep only the first name.
class_names = [name.split(',')[0] for name in image_net_names]

Now we have a list of all the pre-trained class names; since each line of the file can contain several comma-separated synonyms, we keep only the first name for each label. With this list, the model's output can be mapped to a human-readable label after processing the image. The next step is to load the pre-trained DNN model, which requires the weight and config files discussed above.

# Load the pre-trained DenseNet-121 Caffe model (weights + .prototxt config).
model = cv2.dnn.readNet(model='../../input/DenseNet_121.caffemodel',
                        config='../../input/DenseNet_121.prototxt',
                        framework='Caffe')

The readNet() function from the DNN module takes the weights file, the config file, and the name of the framework as inputs. After loading the network, the image must be preprocessed and converted into the input format the network expects before being pushed through it.

# Read the input image and convert it into a 4-D blob the network can consume.
image = cv2.imread('../../input/image_1.jpg')
blob = cv2.dnn.blobFromImage(image=image, scalefactor=0.01,
                             size=(224, 224), mean=(104, 117, 123))
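
As an optional sanity check (not part of the original tutorial), you can print the blob's shape to confirm the dimensions discussed below:

# Expected output: (1, 3, 224, 224) -> (batch size, channels, height, width)
print(blob.shape)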

The blobFromImage() function preprocesses images before they are passed through a neural network, conveniently encapsulating several steps such as resizing, mean subtraction, channel swapping, and conversion to a blob (a multi-dimensional array that matches the input requirements of the network). Note that the blob's dimensions are now [1, 3, 224, 224]; an extra leading dimension has been added to our input image because deep learning models are designed to process batches of inputs, so even a single image is wrapped in a batch of size one. Finally, we propagate the image through the model.

# Set the blob as the network's input and run a forward pass to get predictions.
model.setInput(blob)
outputs = model.forward()

To forward-propagate the image through the model, we first set the blob as the network's input; the forward() call then pushes it through the model. The 'outputs' variable contains the raw prediction scores for the image. At this point, the neural network processing is complete, and the stored information needs to be parsed. We extract the class name with the highest confidence and overlay it on the image, which is done with the following code block.

# Take the predictions for the first (and only) image in the batch.
final_outputs = outputs[0]
# Flatten to a column of 1000 ImageNet class scores.
final_outputs = final_outputs.reshape(1000, 1)
# Index of the highest-scoring class.
label_id = np.argmax(final_outputs)
# Convert raw scores to probabilities with a softmax.
probs = np.exp(final_outputs) / np.sum(np.exp(final_outputs))
final_prob = np.max(probs) * 100.
# Map the index to its class name and draw the label on the image.
out_name = class_names[label_id]
out_text = f"{out_name}, {final_prob:.3f}"
cv2.putText(image, out_text, (25, 50), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
cv2.imshow('Image', image)
cv2.waitKey(0)
cv2.imwrite('result_image.jpg', image)

Our final result should look like this:

[Figure: image classification result, showing the tiger image labeled with the predicted class and confidence]

This shows that the pre-trained DenseNet-121 model, loaded through the Caffe framework, can process the image and identify it as a tiger with around 91% confidence.
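
If you want to see how the remaining confidence is distributed, a small extension (not part of the original tutorial) is to list the top five predictions, reusing the probs and class_names variables defined above:

# Indices of the five highest-probability classes, best first.
top5 = np.argsort(probs.flatten())[::-1][:5]
for i in top5:
    print(f"{class_names[i]}: {probs[i][0] * 100:.3f}%")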

Deep Dive into OpenCV DNN Module

The DNN module in OpenCV, while not capable of training models, offers a significant breadth of functionality for applying pre-trained models to real-world tasks. One of the major advantages of using the OpenCV DNN module is its optimization for various hardware platforms, including CPUs, GPUs, and even specialized hardware like Intel's Movidius Neural Compute Stick. This hardware flexibility ensures that models can be deployed efficiently across different devices, from powerful servers to edge devices with limited computational resources.
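
To take advantage of this hardware flexibility, the dnn module lets you choose a backend and target device for an already-loaded network. Below is a minimal sketch, assuming a model loaded as in the tutorial above; the commented-out alternatives only work if your OpenCV build includes the corresponding support.

# Default: OpenCV's own optimized CPU backend.
model.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)
model.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)

# Example alternatives (require an OpenCV build with the matching support):
# model.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
# model.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)     # NVIDIA GPU
# model.setPreferableTarget(cv2.dnn.DNN_TARGET_MYRIAD)   # Intel Movidius Neural Compute Stick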

Image Classification with Different Architectures

While our tutorial used a DenseNet-121 model based on the Caffe framework, OpenCV's DNN module supports a wide variety of other architectures and frameworks. For instance, models such as ResNet, VGG, and MobileNet, which are available in TensorFlow and PyTorch formats, can also be loaded and utilized in OpenCV. This flexibility allows users to choose the most suitable model architecture based on the specific needs of their application. For example, MobileNet models are optimized for mobile and embedded applications, providing a good balance between accuracy and computational efficiency.
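
For instance, a PyTorch MobileNet is usually exported to the ONNX format first and then loaded with readNetFromONNX; the file name below is a placeholder for whatever model you export.

# Load an ONNX-exported classification model (file name is a placeholder).
mobilenet = cv2.dnn.readNetFromONNX('mobilenet_v2.onnx')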

Object Detection Using OpenCV DNN

Beyond image classification, the OpenCV DNN module excels in object detection tasks. Pre-trained models like YOLO (You Only Look Once), SSD (Single Shot MultiBox Detector), and Faster R-CNN (Region-based Convolutional Neural Networks) can be integrated into OpenCV for real-time object detection. These models can detect multiple objects within an image, providing bounding boxes and class labels for each detected object. The following example demonstrates how to use a YOLO model for object detection:

# Load a YOLOv3 network from its Darknet config and weights files (standard file names assumed).
model = cv2.dnn.readNetFromDarknet('yolov3.cfg', 'yolov3.weights')
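
The loading call above only builds the network; the rest of the detection pipeline is sketched below, assuming yolov3.cfg and yolov3.weights are the standard Darknet YOLOv3 files and that 'image' is a BGR frame read with cv2.imread. This is an illustrative outline rather than the tutorial's own code.

# Preprocess: YOLOv3 expects 416x416 RGB input scaled to [0, 1].
blob = cv2.dnn.blobFromImage(image, scalefactor=1/255.0, size=(416, 416), swapRB=True, crop=False)
model.setInput(blob)

# Run a forward pass through the unconnected output layers (the YOLO detection heads).
layer_outputs = model.forward(model.getUnconnectedOutLayersNames())

h, w = image.shape[:2]
for output in layer_outputs:
    for detection in output:
        # Each row: [center_x, center_y, width, height, objectness, class scores...]
        scores = detection[5:]
        class_id = np.argmax(scores)
        confidence = scores[class_id]
        if confidence > 0.5:
            # Scale the normalized box back to image coordinates and draw it.
            cx, cy, bw, bh = detection[:4] * np.array([w, h, w, h])
            x, y = int(cx - bw / 2), int(cy - bh / 2)
            cv2.rectangle(image, (x, y), (x + int(bw), y + int(bh)), (0, 255, 0), 2)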

Conclusion

As shown in the tutorial, OpenCV is extremely simple to integrate with pre-trained neural networks to accomplish basic image classification tasks. Although OpenCV cannot train models itself, it can still be used for inference and efficient real-time processing with models trained in other frameworks.

References

https://learnopencv.com/deep-learning-with-opencvs-dnn-module-a-definitive-guide/

https://pyimagesearch.com/2017/08/21/deep-learning-with-opencv/

https://github.com/opencv/opencv/wiki/Deep-Learning-in-OpenCV

https://opencv.org/blog/learning-opencv/