Experiments on Faster RCNN - myzwisc/CS766-Project GitHub Wiki

Our dataset for Faster R-CNN contains 360 images of people, labeled with three classes: "zyw", "myz", and "lmt". We use 300 images for training and the remaining 60 for testing. The images were taken from different perspectives and in different poses, with varying degrees of occlusion by other desired objects (people) and non-desired objects (paper).
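The wiki does not say how the 300/60 split was made. A minimal sketch, assuming a simple random split with a fixed seed so the split is reproducible, is shown below; the file names are hypothetical placeholders for the 360 labeled images.

```python
import random

def split_dataset(image_paths, n_train=300, seed=0):
    """Shuffle image paths and split them into training and test lists."""
    if n_train > len(image_paths):
        raise ValueError("n_train exceeds the number of images")
    paths = list(image_paths)  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(paths)  # seeded RNG makes the split repeatable
    return paths[:n_train], paths[n_train:]

# Hypothetical file names standing in for the 360 labeled images.
images = [f"img_{i:03d}.jpg" for i in range(360)]
train, test = split_dataset(images)
print(len(train), len(test))  # 300 60
```

A purely random split does not guarantee that "zyw", "myz", and "lmt" are equally represented in the 60 test images; a stratified split (sampling the test set separately per class) would.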

We trained our Faster R-CNN model from scratch on an NVIDIA GTX 1060 GPU for 14 hours. The following figure shows the total loss of the model during training. We then tested the model on a video, where it ran at about 7 FPS (417 frames processed in 59 seconds) on the same GTX 1060 GPU. The following figure shows the result.
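The reported speed is simply frames divided by elapsed time (417 / 59 ≈ 7.07 FPS, or roughly 141 ms per frame). The sketch below shows that arithmetic plus one way such a number could be measured; `detect` is a hypothetical stand-in for the model's per-frame inference call, not a function from our code. If inference runs asynchronously on the GPU, the timed region should also wait for the results to be ready.

```python
import time

def throughput_fps(num_frames, elapsed_seconds):
    """Average frames per second over a run."""
    return num_frames / elapsed_seconds

def measure_fps(frames, detect):
    """Time a hypothetical `detect` callable over every frame; return average FPS."""
    start = time.perf_counter()
    count = 0
    for frame in frames:
        detect(frame)  # stand-in for one Faster R-CNN forward pass
        count += 1
    elapsed = time.perf_counter() - start
    return throughput_fps(count, elapsed)

# Numbers reported above: 417 frames in 59 seconds.
fps = throughput_fps(417, 59)
print(f"{fps:.2f} FPS, {1000 / fps:.0f} ms per frame")  # 7.07 FPS, 141 ms per frame
```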

Also see https://www.youtube.com/watch?v=6Ost1_BSwAk for the performance of Faster R-CNN on the video.

Our Faster R-CNN model can detect and recognize the faces of specific people at the same time, whether a face is fully visible or partially occluded by other objects. However, its performance still needs improvement. The video result above shows that a small number of faces are assigned the wrong label. This is because face recognition differs considerably from general object recognition: the faces of different people differ much less from one another than different kinds of objects do. For example, human faces all look broadly similar, whereas cars look very different from cats. Face recognition therefore needs much more face data; the WIDER FACE dataset, for instance, contains 393,703 labeled faces. In short, to improve recognition precision we need more face images of each person.

The other issue is running speed. At 7 FPS, our Faster R-CNN model is much slower than SSD, which runs at 60 FPS. Our next step is therefore to try the SSD model and train it on our own face dataset.
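To check whether adding more face images actually reduces mislabeled faces, one option is to track per-class precision on the test set. The sketch below assumes each detection has already been matched to a ground-truth face (for example by box overlap), so only the predicted and true labels are compared; missed faces and background false positives are ignored. The label lists are made-up toy values, not our results.

```python
from collections import Counter

CLASSES = ("zyw", "myz", "lmt")

def per_class_precision(true_labels, pred_labels, classes=CLASSES):
    """Precision per class: correct predictions of c / all predictions of c."""
    predicted = Counter(pred_labels)
    correct = Counter(p for t, p in zip(true_labels, pred_labels) if t == p)
    # NaN marks a class that was never predicted (precision undefined).
    return {c: (correct[c] / predicted[c] if predicted[c] else float("nan"))
            for c in classes}

# Toy, made-up labels for matched face detections (not real results).
truth = ["zyw", "zyw", "myz", "myz", "lmt", "lmt"]
preds = ["zyw", "myz", "myz", "myz", "lmt", "zyw"]
print(per_class_precision(truth, preds))
# {'zyw': 0.5, 'myz': 0.6666666666666666, 'lmt': 1.0}
```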