DNN Efficiency - alalek/opencv GitHub Wiki
Configuration:
- OS: Linux 4.8.0-34-generic x86_64
- Compiler: gcc 5.4.0
- CPU: Intel® Core™ i7-6700K CPU @ 4.00GHz x 8
- GPU: Intel® HD Graphics 530 (Skylake GT2)
Best observed median time of a single-image forward pass, in milliseconds:
CPU
All calculations are done in float32.
Model | DNN, C++ | DNN, Halide | Intel-Caffe, MKLDNN | TensorFlow | Torch w. MKL |
---|---|---|---|---|---|
AlexNet | 14.52 | 22.31 | 11.95 | | |
GoogLeNet | 17.37 | 32.43 | 9.43 | | |
ResNet-50 | 40.01 | 76.13 | 22.75 | | |
SqueezeNet v1.1 | 4.68 | 6.61 | 3.05 | | |
Inception-5h | 19.30 | 35.27 | | 14.6 | |
ENet @ 512x256 | 65.93 | 42.16 | | | 226 |
OpenFace (nn4.small2) | 4.20 | 8.14 | | | 25.44 |
MobileNet-SSD @ 300x300, 20 classes, Caffe | 22.71 | 54.36 | 27.79 | | |
MobileNet-SSD @ 300x300, 90 classes, TensorFlow | 25.15 | 60.95 | | 35.86 | |
GPU (OpenCL 2.0):
All calculations are done in float32.
Model | DNN, OpenCL backend | DNN, Halide | clCaffe, MKL |
---|---|---|---|
AlexNet | 15.81 | 48.45 | 15.16 |
GoogLeNet | 20.59 | 89.53 | 19.56 |
ResNet-50 | 37.19 | 183.67 | 63.26 |
SqueezeNet v1.1 | 6.50 | 15.7 | 6.05 |
Inception-5h | 22.68 | 92.33 | |
ENet @ 512x256 | 34.89 | 48.92 | |
OpenFace (nn4.small2) | 10.55 | 37.59 | |
MobileNet-SSD @ 300x300, 20 classes, Caffe | 172.13 (before #10341), 26.66 (with #10341) | 100.31 | 369.91 |
MobileNet-SSD @ 300x300, 90 classes, TensorFlow | 203.47 (before #10341), 45.11 (with #10341) | 93.34 | |
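To put the two MobileNet-SSD rows in perspective, the effect of #10341 on the OpenCL backend can be expressed as a speed-up factor. The snippet below is only a small arithmetic sketch over the numbers in the table above; the helper name `speedup` and the dictionary layout are illustrative and not part of OpenCV.

```python
def speedup(before_ms, after_ms):
    """Return how many times faster the 'after' timing is than the 'before' timing."""
    return before_ms / after_ms


# (before #10341, with #10341) timings in milliseconds, copied from the GPU table.
opencl_ms = {
    'MobileNet-SSD, Caffe': (172.13, 26.66),
    'MobileNet-SSD, TensorFlow': (203.47, 45.11),
}

speedups = {model: speedup(before, after)
            for model, (before, after) in opencl_ms.items()}

for model, factor in speedups.items():
    print('%s: %.2fx faster with #10341' % (model, factor))
```

Both models come out several times faster on the OpenCL backend with the change applied, roughly 6.5x for the Caffe model and 4.5x for the TensorFlow one.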
Scripts
TensorFlow
import numpy as np
import tensorflow as tf
import time

# Uses the TensorFlow 1.x API (tf.gfile, tf.GraphDef, tf.Session).
# The frozen graph is a binary protobuf, so it is opened in 'rb' mode.
with tf.gfile.FastGFile('opencv_extra/testdata/dnn/ssd_mobilenet_v1_coco.pb', 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

with tf.Session() as sess:
    sess.graph.as_default()
    tf.import_graph_def(graph_def, name='')

    # Generate input
    np.random.seed(2701)
    inp = np.random.standard_normal([1, 300, 300, 3]).astype(np.float32)

    # Get output tensors
    outTensors = [sess.graph.get_tensor_by_name('num_detections:0'),
                  sess.graph.get_tensor_by_name('detection_scores:0'),
                  sess.graph.get_tensor_by_name('detection_boxes:0'),
                  sess.graph.get_tensor_by_name('detection_classes:0')]

    def run():
        return sess.run(outTensors, feed_dict={'image_tensor:0': inp})

    # Warm up
    for _ in range(3):
        run()

    # Measure: mean time per forward pass, in milliseconds
    N = 10
    start = time.time()
    for _ in range(N):
        run()
    print(1e+3 * (time.time() - start) / N)
Torch
require 'nn'
require 'dpnn'
require 'image'
torch.setdefaulttensortype('torch.FloatTensor')
net = torch.load('opencv_extra/testdata/dnn/openface_nn4.small2.v1.t7')
input = torch.FloatTensor(torch.LongStorage({1, 3, 96, 96}))
net:evaluate()
-- Warm up
for i = 1,3 do
  output = net:forward(input)
end
N = 10
timer = torch.Timer()
start = timer:time().real
for i = 1,N do
  output = net:forward(input)
end
print(1000 * (timer:time().real - start) / N)
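Both scripts above print the mean time over N = 10 runs after three warm-up passes, whereas the tables quote a best observed median. The page does not state how the published medians were collected, so the following is only a minimal sketch of one way to compute such a figure for any callable. The function names `median_forward_ms` and `best_median_ms` and the `repeats` parameter are hypothetical; the warm-up and iteration defaults are borrowed from the scripts.

```python
import statistics
import time


def median_forward_ms(run, warmup=3, iterations=10):
    """Return the median wall-clock time of run() in milliseconds."""
    # Warm-up passes are executed but not timed.
    for _ in range(warmup):
        run()
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        run()
        samples.append(1e3 * (time.perf_counter() - start))
    return statistics.median(samples)


def best_median_ms(run, repeats=5, **kwargs):
    """Return the lowest median seen over several independent measurement rounds."""
    return min(median_forward_ms(run, **kwargs) for _ in range(repeats))


if __name__ == '__main__':
    # Stand-in workload; replace with a real forward pass.
    print(best_median_ms(lambda: sum(range(100000))))
```

In the TensorFlow script, for example, the `run` function defined there could be passed as the callable. Timing each pass separately makes the result less sensitive to a single slow iteration than the mean over the whole loop.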
References
- OpenCV's deep learning module, https://github.com/opencv/opencv/tree/master/modules/dnn.
- Intel-Caffe, https://github.com/intel/caffe.
- clCaffe, https://github.com/01org/caffe.
- TensorFlow, https://www.tensorflow.org/.
- Torch, http://torch.ch/.
- Halide, http://halide-lang.org/.