Converting to the new GPU back-end (gpuarray)
MILA will stop developing Theano.
This page describes how to use the new GPU back-end (gpuarray) instead of the old one.
Installation:
- We strongly recommend that you use conda/anaconda to install Theano and pygpu, especially on Windows.
- Both are available with conda:

```
conda install theano pygpu
```

- RECOMMENDED: you can install the latest beta, release candidate or release like this:

```
conda install -c mila-udem -c mila-udem/label/pre theano pygpu
```
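To confirm that both packages are importable after the install, a minimal check like this can help (nothing is assumed beyond the install above):

```python
# Minimal post-install sanity check: both imports must succeed for the
# new back-end to be usable.
import pygpu
import theano
print(theano.__version__)
```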
Windows cleanup:
- Remove any previous install of gcc/mingw.
- Remove Visual C++ for python (or any MSVC that you installed for Theano).
- Remove previous installs of Theano and Python.
Note that we only support a clean install with conda/anaconda on Windows. You are welcome to try other configurations, but we won't help you make them work.
Code changes:
- If you use `conv3d2d` or `dnn_conv3d`, replace it with the new 3d abstract conv: `theano.tensor.nnet.conv3d()` (see the conv sketch after this list).
- If you use `dnn_conv2d`, replace it with the 2d abstract conv: `theano.tensor.nnet.conv2d()`.
- If you use `dnn_pool`, replace it with the new 3d pooling interface (it wasn't useful for 2d pooling): `theano.tensor.signal.pool.pool_{2d,3d}` (see the pooling sketch after this list).
- If you use `dnn_batch_normalization_train()` or `dnn_batch_normalization_test()`, use `theano.tensor.nnet.bn.batch_normalization_{train,test}` instead (see the batch-norm sketch after this list).
- grep for `import theano.sandbox.cuda` in your files. If you find such imports, they will need to be converted. In many cases, you can stop using a GPU interface and use the CPU interface instead; this will make your code work with both CPU and GPU back-ends.
  - All convolutions are now available on the CPU.
  - All pooling operations are now available on the CPU.
  - If there are others, check the CPU interface first; otherwise, you can probably change `theano.sandbox.cuda` to `theano.gpuarray`.
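For illustration, here is a minimal sketch of the abstract conv interface mentioned above; the variable names and shapes are illustrative, not part of the API. Theano's optimizer selects the back-end implementation (e.g. the cuDNN one) at compilation time.

```python
# Sketch of the 2d/3d abstract conv interface (replaces dnn_conv2d,
# dnn_conv3d and conv3d2d).
import theano.tensor as T
from theano.tensor.nnet import conv2d, conv3d

imgs2d = T.tensor4("imgs2d")    # (batch, channels, rows, cols)
kerns2d = T.tensor4("kerns2d")  # (filters, channels, k_rows, k_cols)
out2d = conv2d(imgs2d, kerns2d, border_mode="valid", subsample=(1, 1))

imgs3d = T.tensor5("imgs3d")    # (batch, channels, depth, rows, cols)
kerns3d = T.tensor5("kerns3d")  # (filters, channels, k_depth, k_rows, k_cols)
out3d = conv3d(imgs3d, kerns3d, border_mode="valid", subsample=(1, 1, 1))
```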
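Likewise, a sketch of the pooling interface; the window sizes here are illustrative:

```python
# Sketch of the pooling interface that replaces dnn_pool.
import theano.tensor as T
from theano.tensor.signal import pool

x2d = T.tensor4("x2d")  # (batch, channels, rows, cols)
p2d = pool.pool_2d(x2d, (2, 2), ignore_border=True, mode="max")

x3d = T.tensor5("x3d")  # (batch, channels, depth, rows, cols)
p3d = pool.pool_3d(x3d, (2, 2, 2), ignore_border=True, mode="max")
```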
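And a sketch of the batch-normalization replacement; the per-activation axes and the row-shaped parameters are illustrative choices:

```python
# Sketch of the interface replacing dnn_batch_normalization_{train,test}.
import theano.tensor as T
from theano.tensor.nnet import bn

x = T.matrix("x")                            # (batch, features)
gamma, beta = T.row("gamma"), T.row("beta")  # broadcastable over the batch

# Training mode: normalizes with mini-batch statistics.
out, batch_mean, batch_invstd = bn.batch_normalization_train(
    x, gamma, beta, axes="per-activation")

# Inference mode: normalizes with given (e.g. running) statistics.
mean, var = T.row("mean"), T.row("var")
out_test = bn.batch_normalization_test(
    x, gamma, beta, mean, var, axes="per-activation")
```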
Config changes:
- The following Theano config keys/sections don't have any effect on the new back-end and should be removed:
  - `nvcc.*`
  - `cuda.root`
  - `lib.cnmem` (replaced by `gpuarray.preallocate`). Important: the default changed to be faster, but it causes more memory fragmentation. To keep the speed and remove the fragmentation, use the flag `gpuarray.preallocate=1` (or any value greater than 0; see the doc). To get the old Theano default, use the flag `gpuarray.preallocate=-1` (flag examples follow this list).
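For example, you can pass the flag on the command line (the script name below is illustrative):

```
THEANO_FLAGS=device=cuda,gpuarray.preallocate=1 python train.py
```

or set it permanently in your `.theanorc`:

```
[gpuarray]
preallocate = 1
```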
Safety checks:
- Check that it still trains and has the same speed (we don't expect problems, but it is better to be safe!)
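A minimal timing sketch for the speed check, using a stand-in compiled function `f` (replace it with your own model's function):

```python
# Time a compiled Theano function to compare old vs. new back-end speed.
import time
import numpy as np
import theano
import theano.tensor as T

x = T.matrix("x")
f = theano.function([x], T.nnet.sigmoid(x).sum())  # stand-in for your model

data = np.random.rand(1000, 1000).astype(theano.config.floatX)
f(data)  # warm-up call (compilation, first GPU transfer)
t0 = time.time()
for _ in range(100):
    f(data)
print("mean call time: %f s" % ((time.time() - t0) / 100))
```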
What to expect?
- Maybe a small runtime speed-up (0-10%).
- Maybe a runtime slowdown if you use one of the ops not yet ported (98+% have been ported).
- A compilation speed-up.
- Support for multiple dtypes, including float16 for many ops.
- A cuDNN RNN wrapper (you need to use it manually).
- float16 for storage (computation is done in float32 for now, so it works even on non-Pascal GPUs); see https://github.com/Theano/Theano/issues/2908 for the exact status (a short float16 sketch follows).
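A hedged float16 storage sketch, assuming the new back-end is active (e.g. `THEANO_FLAGS=device=cuda`); the shapes and names are illustrative:

```python
# float16 storage; per the status above, computation happens in float32
# for now, so this works even on non-Pascal GPUs.
import numpy as np
import theano
import theano.tensor as T

w = theano.shared(np.zeros((128, 128), dtype="float16"), name="w")
x = T.matrix("x", dtype="float16")
f = theano.function([x], T.dot(x, w))
print(f(np.ones((4, 128), dtype="float16")).dtype)  # float16
```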