YOLOv3 - person-in-hangang/HanRiver GitHub Wiki
reference https://pjreddie.com/darknet/yolo/
YOLO: Real-Time Object Detection
You only look once (YOLO) is a state-of-the-art, real-time object detection system. On a Pascal Titan X it processes images at 30 FPS and has a mAP of 57.9% on COCO test-dev.
Comparison to Other Detectors
YOLOv3 is extremely fast and accurate. In mAP measured at .5 IOU, YOLOv3 is on par with Focal Loss but about 4x faster. Moreover, you can easily trade off between speed and accuracy simply by changing the size of the model, no retraining required!
How It Works
Prior detection systems repurpose classifiers or localizers to perform detection. They apply the model to an image at multiple locations and scales. High scoring regions of the image are considered detections.
We use a totally different approach. We apply a single neural network to the full image. This network divides the image into regions and predicts bounding boxes and probabilities for each region. These bounding boxes are weighted by the predicted probabilities.
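The weighting described above can be sketched in a few lines: each predicted box carries an objectness score and per-class probabilities, and the final per-class score is their product. This is an illustrative sketch only; the names and toy numbers below are not from the repo.

```java
// Sketch: how a YOLO-style detector weights each predicted box by its
// probabilities. Names and the toy numbers are illustrative, not from the repo.
public class BoxScoring {
    // Final per-class score = objectness (box confidence) * class probability.
    static float score(float objectness, float classProb) {
        return objectness * classProb;
    }

    // Keep a detection only if its best class score clears a threshold.
    static boolean keep(float objectness, float[] classProbs, float threshold) {
        float best = 0f;
        for (float p : classProbs) {
            best = Math.max(best, score(objectness, p));
        }
        return best >= threshold;
    }

    public static void main(String[] args) {
        float[] probs = {0.1f, 0.8f, 0.1f};          // e.g. [bike, person, car]
        System.out.println(keep(0.9f, probs, 0.5f)); // 0.9*0.8 = 0.72 -> true
        System.out.println(keep(0.3f, probs, 0.5f)); // 0.3*0.8 = 0.24 -> false
    }
}
```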
Our model has several advantages over classifier-based systems. It looks at the whole image at test time so its predictions are informed by global context in the image. It also makes predictions with a single network evaluation unlike systems like R-CNN which require thousands for a single image. This makes it extremely fast, more than 1000x faster than R-CNN and 100x faster than Fast R-CNN. See our paper for more details on the full system.
YOLOv3 Architecture
YOLOv3 is similar in structure to a typical FPN (Feature Pyramid Network), as in the figure above.
- The left side of the figure is a typical SSD-like structure; the feature maps taken from the front of the feature extractor lack expressive power.
- To compensate, the later feature maps are expanded back to a larger size by upsampling (deconvolution), as shown on the right, to carry high-level features.
- The feature maps from the left and right are then concatenated (importing location information from the left side) to improve expressiveness.
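The concatenation step above can be sketched as a channel-wise join of an upsampled deep feature map with an earlier map of the same spatial resolution. The shapes and names below are illustrative only; the real YOLOv3 layers are defined in the `.cfg` file.

```java
// Sketch of the FPN-style merge described above: an upsampled deep feature
// map is joined channel-wise with an earlier, same-resolution map.
public class ConcatSketch {
    // Feature maps stored as [channels][height][width].
    static float[][][] concatChannels(float[][][] a, float[][][] b) {
        int h = a[0].length, w = a[0][0].length;
        float[][][] out = new float[a.length + b.length][h][w];
        for (int c = 0; c < a.length; c++) out[c] = a[c];
        for (int c = 0; c < b.length; c++) out[a.length + c] = b[c];
        return out;
    }

    public static void main(String[] args) {
        float[][][] shallow = new float[256][26][26]; // early, location-rich map
        float[][][] deep    = new float[128][26][26]; // upsampled semantic map
        float[][][] merged  = concatChannels(shallow, deep);
        System.out.println(merged.length); // 384 channels, same 26x26 resolution
    }
}
```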
How to load in Android
When onCameraViewStarted() is called, load the convolutional network from the *.cfg and *.weights files and read the label names (COCO dataset) from the assets folder, using Dnn.readNetFromDarknet(String path_cfg, String path_weights). NOTE: this repo doesn't contain the weights file; you have to download it from the YOLO site.
In Android
```java
@Override
public void onCameraViewStarted(int width, int height) {
    // Copy the model files out of the assets folder and load the network.
    String modelConfiguration = getAssetsFile("yolov3-tiny.cfg", this);
    String modelWeights = getAssetsFile("yolov3-tiny.weights", this);
    net = Dnn.readNetFromDarknet(modelConfiguration, modelWeights);
}
```
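Once the network is loaded, each forward pass on a Darknet/COCO model returns rows of 85 floats: 4 box values (center x/y, width, height, relative to the image), 1 objectness score, and 80 class scores. Below is a minimal sketch of decoding one such row; the sample values are made up for illustration, and the threshold choice is an assumption, not taken from the repo.

```java
// Sketch of post-processing one output row from a Darknet/COCO model:
// row = [cx, cy, w, h, objectness, 80 class scores].
// The sample row in main() is fabricated for illustration.
public class YoloRow {
    // Returns the index of the best class if its combined score clears the
    // threshold, else -1 (no confident detection in this row).
    static int bestClass(float[] row, float threshold) {
        int best = -1;
        float bestScore = threshold;
        for (int c = 5; c < row.length; c++) {
            float score = row[4] * row[c];   // objectness * class score
            if (score > bestScore) {
                bestScore = score;
                best = c - 5;                // class id into the labels file
            }
        }
        return best;
    }

    public static void main(String[] args) {
        float[] row = new float[85];      // 4 box + 1 objectness + 80 classes
        row[0] = 0.5f; row[1] = 0.5f;     // box center (relative coordinates)
        row[2] = 0.2f; row[3] = 0.4f;     // box width and height
        row[4] = 0.9f;                    // objectness
        row[5] = 0.85f;                   // class 0 = "person" in COCO labels
        System.out.println(bestClass(row, 0.5f)); // 0 -> person
    }
}
```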