Increment_2 - acvc279/Python-Project GitHub Wiki

Data Description

We use pretrained weights that were trained on the COCO dataset, a large-scale object detection, segmentation, and captioning dataset. COCO aims to facilitate research in object detection, instance segmentation, image captioning, and person keypoint localization. This means that with the pretrained model “resnet50_coco_best_v2.1.0.h5” we can detect and recognize 80 different kinds of common everyday objects.

RetinaNet Architecture

This architecture has four main components.
• Bottom-up pathway: The backbone network (for example, ResNet) computes feature maps at various scales, regardless of the input image size or the choice of backbone. In brief, ResNet is used to construct the bottom-up pathway. It is made up of several convolution modules, each containing many convolution layers. As we move up, the spatial dimension is halved, and each convolution module's output is used in the top-down pathway.
• Top-down pathway and lateral connections: The top-down pathway upsamples the spatially smaller feature maps from higher pyramid levels, while the lateral connections combine top-down and bottom-up layers of the same spatial size. The top-down pathway restores the resolution of the semantically strong features, and the lateral connections help recover precise information about an object.
• Classification subnetwork: It predicts the probability that an object is present at each spatial location, for each anchor box and each object class.
• Regression subnetwork: It regresses the offset from each anchor box to the bounding box of a nearby ground-truth object.
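The way the top-down pathway and lateral connections combine feature maps can be sketched with toy single-channel grids. This is only an illustration, not the RetinaNet implementation: the names `upsample2x`, `merge`, `c3` to `c5`, and `p3` to `p5` are assumptions made for this example, and the convolutions that a real feature pyramid applies before and after the merge are omitted.

```python
def upsample2x(grid):
    """Nearest-neighbour upsampling: repeat every value twice along both axes."""
    out = []
    for row in grid:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def merge(top_down, lateral):
    """Upsample the smaller top-down map and add the same-sized lateral map."""
    upsampled = upsample2x(top_down)
    return [
        [u + l for u, l in zip(up_row, lat_row)]
        for up_row, lat_row in zip(upsampled, lateral)
    ]


# Toy bottom-up feature maps: each level halves the spatial size.
c3 = [[1] * 4 for _ in range(4)]    # 4x4
c4 = [[10] * 2 for _ in range(2)]   # 2x2
c5 = [[100]]                        # 1x1

# Top-down pathway: start at the coarsest level and merge downwards.
p5 = c5
p4 = merge(p5, c4)   # 2x2, every entry is 100 + 10
p3 = merge(p4, c3)   # 4x4, every entry is 110 + 1
```

Each merged level keeps the spatial size of its bottom-up map while carrying values passed down from the coarser levels, which is the effect the two pathways are meant to achieve.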

Brief about ResNet50

ResNet (residual network) is a type of neural network introduced in 2015 in the paper “Deep Residual Learning for Image Recognition”, and it was a huge success. The ResNet architecture is inspired by VGG-19: it starts from a 34-layer plain network, to which shortcut connections are added. These shortcut connections turn the plain network into a residual network. To solve complex problems, we add more layers to a deep neural network to improve accuracy, but beyond a certain depth the accuracy stops increasing and performance degrades as more layers are stacked on top of the network. The difference between a plain deep neural network and ResNet is shown below:
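The effect of a shortcut connection can be sketched in a few lines of plain Python. This is a simplified illustration with small fully connected layers rather than the convolutional blocks of ResNet50; the function names and the weight matrices `w1` and `w2` are assumptions made for this example.

```python
def relu(vector):
    """Element-wise ReLU."""
    return [max(0.0, value) for value in vector]


def linear(weights, vector):
    """Matrix-vector product; each row of `weights` produces one output value."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def plain_block(x, w1, w2):
    """Two stacked layers without a shortcut: the output is F(x)."""
    return relu(linear(w2, relu(linear(w1, x))))


def residual_block(x, w1, w2):
    """The same two layers with a shortcut: the output is F(x) + x."""
    fx = linear(w2, relu(linear(w1, x)))
    return relu([f + xi for f, xi in zip(fx, x)])


x = [1.0, 2.0]
zero = [[0.0, 0.0], [0.0, 0.0]]

# With all weights at zero, the plain block loses its input entirely,
# while the residual block passes the input through unchanged.
print(plain_block(x, zero, zero))      # [0.0, 0.0]
print(residual_block(x, zero, zero))   # [1.0, 2.0]
```

Because a residual block can fall back to the identity mapping by driving its weights towards zero, adding such blocks should not make the network worse, which is the intuition behind stacking many of them.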

Input: First, we supply a video frame or an image.

ImageAI: A computer vision library. ImageAI provides a set of classes and functions that are both efficient and simple to use for video or image object detection and extraction. It supports deep learning algorithms such as RetinaNet, which we use to analyse images and run detection tasks. ImageAI is a Python library that lets developers, researchers, and students build self-contained deep learning and computer vision applications and systems with just a few lines of code.

We import cv2 for computer vision tasks.

We import os to interact with the operating system.

Get the current working directory and assign it to the path variable.

Create an instance of the ObjectDetection class.

Set the model type to RetinaNet, the model we downloaded.

Set the model path. This function accepts a string giving the path to the model file, built from the execution path.

Load the model from the path specified in the function call above.
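The setup steps above can be sketched as follows, assuming ImageAI 2.x (the release line that loads `.h5` weights) and that the model file sits in the current working directory. The helper names `model_path` and `build_detector` are hypothetical and not part of ImageAI, and `cv2` is left out because this sketch does not read any frames.

```python
import os

# File name of the pretrained RetinaNet weights mentioned earlier.
MODEL_FILE = "resnet50_coco_best_v2.1.0.h5"


def model_path(execution_path):
    """Build the full path to the model file (hypothetical helper)."""
    return os.path.join(execution_path, MODEL_FILE)


def build_detector(execution_path):
    """Create and load a RetinaNet detector (hypothetical helper)."""
    # Imported here so this file can still be imported where ImageAI is missing.
    from imageai.Detection import ObjectDetection

    detector = ObjectDetection()              # instance of the detection class
    detector.setModelTypeAsRetinaNet()        # select the RetinaNet model type
    detector.setModelPath(model_path(execution_path))
    detector.loadModel()                      # load weights from the path set above
    return detector


# Current working directory, used as the base for the model path.
execution_path = os.getcwd()
print(execution_path)
```

Calling `build_detector(execution_path)` then returns a loaded detector; it needs ImageAI installed and the model file present in that directory.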

The output of the code above prints the path.

The detection function is called after the model has been loaded. It can be called repeatedly to find objects in different frames. Predicting bounding boxes and object classes:

Output: The output shows the detected objects with bounding boxes and lists each detected object with its probability and box points.
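A minimal sketch of the detection step and its printed output, assuming the ImageAI 2.x `detectObjectsFromImage` call on an already loaded detector. The helper name `detect_and_report`, its parameters, and the file names in the usage note are hypothetical; the original code may differ, for example by passing video frames as arrays.

```python
def detect_and_report(detector, input_image, output_image, min_probability=50):
    """Run detection on one image and format each result (hypothetical helper).

    `detector` is expected to be a loaded ImageAI ObjectDetection instance.
    """
    detections = detector.detectObjectsFromImage(
        input_image=input_image,
        output_image_path=output_image,   # copy of the image with boxes drawn
        minimum_percentage_probability=min_probability,
    )
    lines = []
    for obj in detections:
        # Each detection is a dict with the class name, the probability
        # in percent, and the box corner coordinates.
        lines.append(
            f'{obj["name"]} : {obj["percentage_probability"]} : {obj["box_points"]}'
        )
    return lines
```

For example, `for line in detect_and_report(detector, "image.jpg", "imagenew.jpg"): print(line)` would print one line per detected object. The same loaded detector can be reused for further images or frames.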