SSD - Lab41/attalos GitHub Wiki
Single Shot MultiBox Detector
Single Shot MultiBox Detector (SSD) is a fast method for detecting objects in images using a deep neural network. The network produces thousands of predictions at various scales and aspect ratios before performing non-maximum suppression, resulting in a handful of final tags. This page provides some links to help in setting up and understanding SSD.
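To make the non-maximum suppression step concrete, here is a minimal pure-Python sketch of greedy NMS. It is an illustration rather than SSD's Caffe implementation: the function names, the (xmin, ymin, xmax, ymax) box format, and the 0.45 overlap threshold are assumptions chosen for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.45):
    """Greedy NMS: visit boxes from highest to lowest score and keep a box
    only if it does not overlap an already kept box by more than the threshold.
    Returns the indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Two heavily overlapping boxes and one separate box (made-up values).
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

In this toy input the second box overlaps the first too much and is suppressed, so only the first and third boxes survive.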
Set Up
Follow the instructions from Wei Liu's SSD GitHub page to install the necessary packages, prepare the data, and train/evaluate Caffe models. That page also contains links to models trained on VOC0712, MSCOCO, and ILSVRC2015.
Make sure $CAFFE_ROOT is set to your Caffe directory and that $PYTHONPATH includes $CAFFE_ROOT/python. We also had to add /opt/conda/bin/python to the $PYTHONPATH. Make sure that $CAFFE_ROOT/python appears first in $PYTHONPATH, otherwise running ./data/VOC0712/create_data.sh will not work. If running in a Docker container, you may need to run apt-get install -y python-numpy and set export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64 to get rid of CUDA errors when running SSD. Though running make pycaffe is not required, it is recommended, particularly if you're having issues with creating the LMDB files (using ./data/VOC0712/create_data.sh).
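The environment requirements above can be checked before running create_data.sh. The following is a small sketch using only the Python standard library; check_ssd_env is a hypothetical helper written for this page, not part of Caffe or SSD.

```python
import os


def check_ssd_env(environ):
    """Return a list of problems found in an environment mapping.

    Hypothetical helper: it only checks that CAFFE_ROOT is set and that
    $CAFFE_ROOT/python is the first entry of PYTHONPATH.
    """
    problems = []
    caffe_root = environ.get("CAFFE_ROOT")
    if not caffe_root:
        return ["CAFFE_ROOT is not set"]

    expected = os.path.normpath(os.path.join(caffe_root, "python"))
    entries = [
        os.path.normpath(p)
        for p in environ.get("PYTHONPATH", "").split(os.pathsep)
        if p
    ]
    if expected not in entries:
        problems.append("PYTHONPATH does not include " + expected)
    elif entries[0] != expected:
        problems.append(expected + " is not the first PYTHONPATH entry")
    return problems


if __name__ == "__main__":
    # Check the current shell environment and report anything suspicious.
    for problem in check_ssd_env(os.environ) or ["environment looks consistent"]:
        print(problem)
```

Passing the environment in as a mapping, rather than reading os.environ inside the function, keeps the check easy to try with made-up values.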
Comparing VOC_SSD_300 vs COCO_SSD_500
The architecture diagram in the paper depicts the COCO model rather than the VOC model. The two networks differ substantially, particularly past the VGG-16 layers. Though the COCO model acquires a conv9_2 layer, average pooling still remains at the last layer of the network. The thinking is that this adds another 'multi-scale' feature map for detection.
![VOC0712 SSD 300x300](https://github.com/Lab41/attalos/blob/master/analysis/ssd/images/SSD_300_deploy_prototext.png) VOC0712 SSD 300x300 deploy
![MS COCO SSD 500x500](https://github.com/Lab41/attalos/blob/master/analysis/ssd/images/ssd_coco_500_network.png) COCO SSD 500x500 deploy
The Training Architecture
As discussed in the paper, the training objective is derived from the MultiBox objective, which has been extended to handle multiple object categories. The loss function is the weighted sum of the confidence loss (conf) and the localisation loss (loc); many of these loss terms are attached after the VGG-16/pool5 part of the network. The weighted (conf) sum is evaluated at the end of the network, and the conv (6-9) layers are averaged just once, with the result fed into a (global) averaging function, also at the end of the network, in contrast to GoogLeNet's multiple averaging functions (from inception layers).
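As a rough illustration of the weighted sum described above, the sketch below combines a softmax confidence loss and a smooth L1 localisation loss, normalised by the number of matched default boxes. It is a simplified sketch, not the Caffe MultiBoxLoss layer: the function names and data layout are assumptions, and the hard negative mining used in the paper is left out.

```python
import math


def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear for larger errors."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def softmax_cross_entropy(logits, label):
    """Negative log softmax probability of the true class (numerically stable)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def multibox_loss(conf_logits, labels, loc_preds, loc_targets, alpha=1.0):
    """Weighted sum of confidence and localisation loss over default boxes.

    conf_logits: per-box list of class scores (index 0 = background)
    labels:      per-box matched class (0 means unmatched/background)
    loc_preds, loc_targets: per-box lists of four box offsets
    alpha:       weight of the localisation term
    """
    num_matched = sum(1 for y in labels if y > 0)
    if num_matched == 0:
        return 0.0  # no matched boxes: the loss is defined as zero

    conf = sum(softmax_cross_entropy(z, y) for z, y in zip(conf_logits, labels))

    loc = 0.0
    for y, preds, targets in zip(labels, loc_preds, loc_targets):
        if y > 0:  # only matched boxes contribute to localisation
            loc += sum(smooth_l1(p - t) for p, t in zip(preds, targets))

    return (conf + alpha * loc) / num_matched


# One matched box, two classes, one offset off by 2.0 (made-up values).
example = multibox_loss([[0.0, 0.0]], [1], [[2.0, 0, 0, 0]], [[0.0, 0, 0, 0]])
```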
Using a Trained SSD Model
After following the steps through "Preparation," you can run your own test images through SSD using the Python notebook found in $CAFFE_ROOT/examples/ssd_detect.ipynb.
Tracking down layer responsible for object detections
SSD attempts to find objects of various sizes and scales using multiple layers, each detecting different objects. After feeding an image through the network, it is not immediately clear which layer is responsible for a high confidence detection. In order to solve this problem, this Python notebook feeds an image forward through the model, then traces back to find the specific layer and features responsible for any high confidence detection. This "high confidence" threshold is tunable, but the network filters down to the top 200 detections after performing non-maximum suppression.
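One way to picture the trace-back is as an index lookup. The sketch below is not the notebook's code. It assumes you can recover the index of the default box behind a detection, and that default boxes are concatenated layer by layer in a known order; the layer names and box counts are placeholders, not the real SSD values.

```python
from bisect import bisect_right
from itertools import accumulate


def layer_for_prior(prior_index, layer_names, priors_per_layer):
    """Map a flat default-box index to the layer that produced it.

    Assumes boxes are concatenated layer by layer, in the order of layer_names.
    """
    boundaries = list(accumulate(priors_per_layer))  # running totals
    if not 0 <= prior_index < boundaries[-1]:
        raise IndexError("prior index out of range")
    return layer_names[bisect_right(boundaries, prior_index)]


# Placeholder layout: grid height * grid width * boxes per cell for each layer.
names = ["layer_a", "layer_b", "layer_c"]
counts = [4 * 4 * 3, 2 * 2 * 6, 1 * 1 * 6]

first = layer_for_prior(0, names, counts)
boundary = layer_for_prior(48, names, counts)
last = layer_for_prior(77, names, counts)
```

With the real per-layer box counts read from a deploy prototxt, the same lookup would attribute each high-confidence detection to a source layer.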
SSD Layer/Label/Shape Statistics for VOC0712
We can also look at which layers produce high confidence predictions for various inputs. This Python notebook runs each image through the network and produces a couple of heat maps for Layer vs Label and Layer vs Object Size. Not surprisingly, earlier layers produce predictions for smaller objects.
It also appears as though some layers respond more strongly to different object types. However, the relation between object type and size in each image has not been explored.
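The Layer vs Label heat map boils down to a count matrix. Here is a minimal sketch of building one, assuming each high-confidence detection has already been reduced to a record with hypothetical "layer" and "label" keys; the notebook's actual data structures may differ.

```python
from collections import Counter


def layer_label_counts(detections, layers, labels):
    """Return a matrix (list of rows) of detection counts.

    Rows follow the order of `layers`, columns the order of `labels`.
    """
    counts = Counter((d["layer"], d["label"]) for d in detections)
    return [[counts[(layer, label)] for label in labels] for layer in layers]


# Made-up detections for illustration only.
detections = [
    {"layer": "layer_a", "label": "bird"},
    {"layer": "layer_a", "label": "bird"},
    {"layer": "layer_b", "label": "bus"},
]
matrix = layer_label_counts(detections, ["layer_a", "layer_b"], ["bird", "bus"])
```

The resulting rows can be passed to any plotting library's image or heat map function.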
Parsing an SSD LMDB File
This final Python notebook parses an SSD training or testing LMDB file to pull out images by index, along with the associated tagged objects and their bounding boxes.
Testing on SSD500 with the MSCOCO dataset
Keep in mind that the tests we've done so far have been on SSD300 for VOC0712. The SSD paper references the architecture for MSCOCO/SSD500.
Drawing a Caffe Network
I found it very helpful to have a graphical representation of SSD's network. Luckily, I found Christopher Bourez's blog, which includes a nice tutorial on Caffe. To draw the network, enter the following from the command line:
python $CAFFE_ROOT/python/draw_net.py $CAFFE_ROOT/models/VGGNet/VOC0712/SSD_300x300/test.prototxt my_net.png
The SSD models are too large to display here, but another model from Christopher's blog is shown below.
In order to use draw_net.py, I had to first install both pydot and graphviz.
conda install pydot
conda install graphviz