GPU Benchmarks - ProGamerGov/neural-style-pt GitHub Wiki
The `neural_style_time.py` timing modification is used to accurately track the time it takes to complete a style transfer run.
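The timing itself can be reproduced with a simple wrapper; here is a minimal sketch, assuming a hypothetical `run_style_transfer()` entry point (the actual `neural_style_time.py` implementation may differ):

```python
import time

def time_run(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed seconds)."""
    start = time.perf_counter()  # monotonic, high-resolution clock
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

def run_style_transfer():
    # Hypothetical placeholder for the actual style transfer call
    return "done"

result, seconds = time_run(run_style_transfer)
```

`time.perf_counter()` is preferred over `time.time()` for benchmarking because it is monotonic and not affected by system clock adjustments.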
The following parameters are used to generate the timing data:

```
python3 neural_style_time.py -backend nn -optimizer lbfgs -num_iterations 500 -print_iter 0
python3 neural_style_time.py -backend nn -optimizer adam -num_iterations 500 -print_iter 0
python3 neural_style_time.py -backend cudnn -optimizer lbfgs -num_iterations 500 -print_iter 0
python3 neural_style_time.py -backend cudnn -optimizer adam -num_iterations 500 -print_iter 0
python3 neural_style_time.py -backend cudnn -cudnn_autotune -optimizer lbfgs -num_iterations 500 -print_iter 0
python3 neural_style_time.py -backend cudnn -cudnn_autotune -optimizer adam -num_iterations 500 -print_iter 0
```
- Each test is run 3 times, and the average of those 3 runs is rounded to the nearest second.
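The averaging step above can be sketched as follows (the run times shown are illustrative, not measured values):

```python
def average_seconds(runs):
    """Average a list of per-run times and round to the nearest second."""
    return round(sum(runs) / len(runs))

# Three illustrative runs of the same configuration
print(average_seconds([117.4, 116.8, 117.1]))
```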
Speed can vary a lot depending on the backend and the optimizer.
Here are some times for running 500 iterations with `-image_size 512` on a Tesla K80 with different settings:
- `-backend nn -optimizer lbfgs`: 117 seconds
- `-backend nn -optimizer adam`: 100 seconds
- `-backend cudnn -optimizer lbfgs`: 124 seconds
- `-backend cudnn -optimizer adam`: 107 seconds
- `-backend cudnn -cudnn_autotune -optimizer lbfgs`: 109 seconds
- `-backend cudnn -cudnn_autotune -optimizer adam`: 91 seconds
Here are the same benchmarks on a GTX 1080:
- `-backend nn -optimizer lbfgs`: 56 seconds
- `-backend nn -optimizer adam`: 38 seconds
- `-backend cudnn -optimizer lbfgs`: 40 seconds
- `-backend cudnn -optimizer adam`: 40 seconds
- `-backend cudnn -cudnn_autotune -optimizer lbfgs`: 23 seconds
- `-backend cudnn -cudnn_autotune -optimizer adam`: 24 seconds
Here are the same benchmarks on an NVIDIA GRID K520:
- `-backend nn -optimizer lbfgs`: 236 seconds
- `-backend nn -optimizer adam`: 209 seconds
- `-backend cudnn -optimizer lbfgs`: 226 seconds
- `-backend cudnn -optimizer adam`: 200 seconds
- `-backend cudnn -cudnn_autotune -optimizer lbfgs`: 226 seconds
- `-backend cudnn -cudnn_autotune -optimizer adam`: 200 seconds
Here are the same benchmarks on a Tesla T4 with different settings:
- `-backend nn -optimizer lbfgs`: 72 seconds
- `-backend nn -optimizer adam`: 66 seconds
- `-backend cudnn -optimizer lbfgs`: 48 seconds
- `-backend cudnn -optimizer adam`: 40 seconds
- `-backend cudnn -cudnn_autotune -optimizer lbfgs`: 51 seconds
- `-backend cudnn -cudnn_autotune -optimizer adam`: 43 seconds
Here are the same benchmarks on a Tesla P100-PCIE-16GB with different settings:
- `-backend nn -optimizer lbfgs`: 61 seconds
- `-backend nn -optimizer adam`: 47 seconds
- `-backend cudnn -optimizer lbfgs`: 37 seconds
- `-backend cudnn -optimizer adam`: 23 seconds
- `-backend cudnn -cudnn_autotune -optimizer lbfgs`: 39 seconds
- `-backend cudnn -cudnn_autotune -optimizer adam`: 25 seconds