Use SSD and CNN to Improve Performance - myzwisc/CS766-Project GitHub Wiki
We combine SSD and a CNN to improve performance over Faster R-CNN. The CNN layer we use is shown below.

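The page does not spell out how SSD and the CNN are connected. One common arrangement, and only an assumption here, is a two-stage pipeline: SSD proposes face bounding boxes, and a separate CNN classifies each cropped face. The sketch below shows that control flow in plain Python. `detect_then_classify`, `crop`, `Detection`, and the toy detector and classifier are hypothetical stand-ins, not the project's code or any library's API.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# An image is modeled as a 2-D grid of pixel values (a list of rows).
Image = List[List[float]]
# A box is (top, left, bottom, right) in pixel coordinates, end-exclusive.
Box = Tuple[int, int, int, int]


@dataclass
class Detection:
    box: Box
    label: int
    score: float


def crop(image: Image, box: Box) -> Image:
    """Cut the region inside `box` out of `image`."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def detect_then_classify(
    image: Image,
    detector: Callable[[Image], Sequence[Box]],
    classifier: Callable[[Image], Tuple[int, float]],
) -> List[Detection]:
    """Assumed two-stage flow: the detector (SSD's role) proposes boxes,
    then the classifier (the CNN's role) labels each cropped region."""
    results = []
    for box in detector(image):
        label, score = classifier(crop(image, box))
        results.append(Detection(box, label, score))
    return results


if __name__ == "__main__":
    # Toy 4x4 image whose pixel values are 0..15, row by row.
    img = [[float(r * 4 + c) for c in range(4)] for r in range(4)]

    def toy_detector(image: Image) -> List[Box]:
        # Hypothetical: always "finds" the top-left and bottom-right quadrants.
        return [(0, 0, 2, 2), (2, 2, 4, 4)]

    def toy_classifier(patch: Image) -> Tuple[int, float]:
        # Hypothetical: label 1 if the mean pixel value exceeds 7.5, else 0.
        values = [v for row in patch for v in row]
        mean = sum(values) / len(values)
        return (1 if mean > 7.5 else 0), mean

    for det in detect_then_classify(img, toy_detector, toy_classifier):
        print(det)
```

Keeping detection and classification as separate callables lets either stage be swapped independently. It also matches the limitation noted at the end of this page: such a pipeline is not trained end-to-end.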
The dataset still consists of human images, but it differs slightly from the previous one. The training set now contains 428 images, the validation set 55 images, and the test set the last 55 images. There are 42 classes instead of 3, so the task is considerably more difficult than before. Some example images are shown below.

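Because the test set is described as "the last 55 images", the split appears to follow a fixed order. Assuming that, a minimal sketch of the 428 / 55 / 55 split (538 images in total) follows. `ordered_split` and the `img_NNN.jpg` file names are hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def ordered_split(
    items: Sequence[T], n_train: int = 428, n_val: int = 55, n_test: int = 55
) -> Tuple[List[T], List[T], List[T]]:
    """Split `items` in order: first n_train for training,
    next n_val for validation, and the last n_test for testing."""
    expected = n_train + n_val + n_test
    if len(items) != expected:
        raise ValueError(f"expected {expected} items, got {len(items)}")
    train = list(items[:n_train])
    val = list(items[n_train:n_train + n_val])
    test = list(items[n_train + n_val:])
    return train, val, test


if __name__ == "__main__":
    # Hypothetical file names standing in for the 538 images.
    paths = [f"img_{i:03d}.jpg" for i in range(538)]
    train, val, test = ordered_split(paths)
    print(len(train), len(val), len(test))  # 428 55 55
```

An ordered split is easy to reproduce, but if the images are grouped by class or by source, shuffling with a fixed seed before splitting may give more representative validation and test sets.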
The experimental results are as follows. We achieve 82% classification accuracy on both the validation set and the test set. Our system processes 110 faces in 0.2 s (roughly 550 faces per second), which is much faster than using Faster R-CNN alone.
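The two reported numbers are a classification accuracy and a throughput. The sketch below shows how each is computed: accuracy as the fraction of correct predictions, and throughput as faces divided by elapsed seconds. The function names are hypothetical, and the labels in the usage example are toy values, not project data.

```python
from typing import Sequence


def classification_accuracy(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Fraction of predictions that match the ground-truth labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


def faces_per_second(num_faces: int, elapsed_seconds: float) -> float:
    """Convert 'N faces in T seconds' into a per-second throughput."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return num_faces / elapsed_seconds


if __name__ == "__main__":
    # Reported figure: 110 faces in 0.2 s, i.e. about 550 faces per second.
    print(faces_per_second(110, 0.2))
    # Toy labels (not project data): 3 of 4 predictions are correct.
    print(classification_accuracy([3, 7, 7, 41], [3, 7, 12, 41]))  # 0.75
```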

A video of the experiment is available at: https://www.youtube.com/watch?v=cfmeJWfjWuU
We conclude that our method is fast enough for real-time detection, can handle harder classification tasks, and is often more accurate. Its drawbacks, however, are that it is not an end-to-end model and that it struggles to detect small objects.