# Comparison of existing implementations

| functionality | gpu nd array (python interface) | Theano CudaNdarray | GPUmat GPU (single/double) |
| --- | --- | --- | --- |
| backend | cuda/opencl | cuda | cuda |
| dtype | float32, {u}int{8,16,32,64}, complex64 (float64 and complex128 possible) | float32 | float32, complex32, float64, complex64 |
| ndim | generic | generic | generic |
| memory layout | generic | generic | generic |
| contiguous transfer to/from gpu | Yes | Yes | Yes |
| non-contiguous transfer to/from gpu | copy if needed | copy if needed | copy if needed |
| ascontiguousarray | Yes | No | No |
| asfortranarray | Yes | No | No |
| copy | Yes | Yes | Yes: clone() |
| zeros | Yes | Yes | Yes |
| empty | Yes | No | Yes: GPUsingle(); setSize(); GPUallocVector() |
| len | Yes | Yes | Yes: length() |
| subtensor (var[…]) | Yes | Yes | Yes |
| subtensor (var[N]) | Yes | Yes | Yes |
| subtensor (slice with step) | Yes | Yes | Yes |
| subtensor (slice with negative start/stop/step) | Yes | Yes | Yes |
| subtensor (tuple mixing slice, integer and numpy.int64) | Yes | Yes | No |
| elemwise | generic, with dimension collapsing and mixed dtypes | as gpu nd array | as gpu nd array |
| elemwise with broadcasting | Yes | Yes | Yes |
| reduction | sum/prod, generic for any ndim and any combination of reduced axes | sum only, with these patterns (one digit per axis, 1 = that axis is reduced): 1, 11, 10, 01, 001, 010, 100, 110, 011, 111, 0011, 0101, 0111, 1011, 1111; the all-1 patterns use only one block | sum |
| `__setitem__` | Yes (with broadcast if necessary) | the value must be a CudaNdarray (no broadcasting done); when the destination is C-contiguous the value can be 0 (memset) or an ndarray (transfer) | Yes: subsasgn(), assign() |
| reshape | Yes (copy when numpy would copy) | Yes (copy if not c_contiguous) | Yes: setSize(), reshape() |
| n-dim transpose | Yes | Yes (can add dims of shape 1 at the same time) | No |
| dot/gemm | Yes\* | Theano op | Yes: times(), GPUtimes() |
| gemv | Yes\* | Theano op | ? |

\* It needs an external BLAS, which ships with CUDA (cuBLAS). For the OpenCL back-end you can use clMath, but clMath support isn't good on Mac and Windows.
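
For reference, the numpy behaviour that several rows above compare against (contiguity helpers, stepped and negative slicing, mixed-index tuples, broadcasting `__setitem__`, reshape copy rules) looks like this. This is plain numpy, not any of the three GPU libraries:

```python
import numpy as np

a = np.arange(12, dtype=np.float32).reshape(3, 4)

# ascontiguousarray / asfortranarray: force a given memory layout.
c = np.ascontiguousarray(a.T)   # C-contiguous copy of a non-contiguous view
f = np.asfortranarray(a)        # Fortran-ordered copy

# subtensor with a step and negative start/stop/step: a view, no copy.
s = a[::-1, 1:4:2]

# tuple mixing slice, integer and numpy.int64 (the row GPUmat lacks).
m = a[1:3, np.int64(2)]

# __setitem__ with broadcasting (gpu nd array: Yes; CudaNdarray requires a
# value that already has the destination's shape).
a[1:3, :] = np.float32(0)       # scalar broadcast over a 2x4 block

# reshape copies only when numpy would copy.
r = a.T.reshape(-1)             # forces a copy; a.reshape(-1) would not
```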

Not done, but planned in gpu nd array:

| functionality | gpu nd array (python interface) | Theano CudaNdarray | GPUmat GPU (single/double) |
| --- | --- | --- | --- |
| ones | No | Theano op only | Yes |
| subtensor with a list of indices, var[[1,2,3,4]] (part of numpy advanced indexing) | No | in a branch | Yes: slice(A, {[1,2,3,4]}) |
| reduction (max, min, argmax) | No | No | No |
| ger | No | Theano op | ? |
| flatten | No (you can use reshape for this) | Yes | ? |
| random | No | Theano op only, with our own implementation | Yes: GPUrand(), GPUrandn() |
| join | No | Theano op | ? |
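
The two planned features that need the most explanation are list indexing and the missing reductions; in numpy terms, the target behaviour is:

```python
import numpy as np

a = np.arange(10, dtype=np.float32) * 2

# "subtensor with a list of indices": numpy advanced indexing, which always
# copies the selected elements rather than returning a view.
picked = a[[1, 2, 3, 4]]

# the reductions none of the three libraries had: max, min, argmax over an
# arbitrary axis.
b = a.reshape(2, 5)
print(b.max(axis=1), b.argmax(axis=1))
```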

Other Theano ops: CrossentropySoftmaxArgmax1HotWithBias, CrossentropySoftmax1HotWithBiasDx, Softmax, SoftmaxWithBias, DownsampleFactorMax, GpuImages2Neibs, Dot22Scalar, GpuEye, ErfinvGPU
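
These ops are not called directly; they are substituted by Theano's graph optimizer when a function is compiled with device=gpu and floatX=float32 in THEANO_FLAGS. A minimal sketch using the standard Theano API (Softmax taken from the list above):

```python
import numpy as np
import theano
import theano.tensor as T

# Build a symbolic graph; with device=gpu, the optimizer replaces the
# softmax node with its GPU implementation at compilation time.
x = T.matrix('x')
f = theano.function([x], T.nnet.softmax(x))

print(f(np.random.rand(2, 3).astype('float32')))
```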

gnumpy:

- module functions: `as_garray`, `as_garray_or_scalar`, `as_numpy_array`, `tile` (the same as numpy?), `rand`, `randn`, `empty`, `zeros`, `ones`, `seed_rand`, `dot` (0d with \*d, 1d with 1d, 1d with 2d, 2d with 1d, 2d with 2d, and a1.ndim >= 2 with a2.ndim >= 2 via reshape and transpose; transpose done by a loop?), `outer`, `concatenate`, `where`, `nonzero`, newaxis support?, `eye`, `diagflat`, `tensordot`, reductions (`all`, `any`, `sum`, `mean`, `max`, `min`; `prod` and `std` cpu only), elemwise (`abs`, `exp`, `isinf`, `isnan`, `log`, `log_1_plus_exp`, `logistic`, `negative`, `sign`, `sqrt`, `tanh`; cpu only: `log10`)
- `gnumpy.garray` methods: `as_numpy_array`, `astype`, `ravel` (calls `self.reshape(-1)`), `item` (transfers to cpu), `sort` (cpu only), `reshape_2d`, `T`, `transpose`, `shiftAxesRight`, `copy`, `diagflat`, `diagonal`, `diag`, `all_real`, `isinf`, `isreal`, `isnan`, `isnumber`, `abs`, `as_bool`, `exp`, `log`, `log_1_plus_exp`, `logistic`, `sigmoid`, `sign`, `sqrt`, `tanh`, `sum`, `mean`, `max`, `argmax` (cpu), `argmin` (cpu), `min`, `all`, `any`, `all2`, `any2`, `rand`, `euclid_norm`, `dot`, `where`, `nonzero`, `__lt__`, `__gt__`, `__le__`, `__ge__`, `__ne__`, `__eq__`, `__sub__`, `__div__`, `__rmul__`, `__radd__`, `__rsub__`, `__rdiv__`, `__rpow__`, `__pos__`, `__neg__`, `__iadd__`, `__imul__`, `__isub__`, `__idiv__`, `__imod__`, `__ipow__`, `__len__`, `__getitem__`, `__iter__`, `__setitem__`
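
A short sketch of how the gnumpy names above fit together, assuming the package is importable as `gnumpy`; it uses only functions and methods from the list:

```python
import numpy as np
import gnumpy as gnp

# module-level constructors and converters
a = gnp.randn(3, 4)                               # garray of N(0,1) samples
b = gnp.as_garray(np.ones((4, 2), dtype=np.float32))

# dot(2d, 2d), elemwise tanh, a reduction, and transfer back to numpy
c = gnp.dot(a, b).tanh()
print(c.sum(), c.as_numpy_array())
```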
