Home - abergeron/compyte GitHub Wiki
Goal of compyte
Make a common GPU ndarray (a matrix/tensor of n dimensions) that can be reused by all projects.
Mailing list
- Development/user mailing list: http://lists.tiker.net/listinfo/gpundarray
- Announce mailing list (low volume): http://lists.tiker.net/listinfo/gpundarray-announce
Comparison of existing implementations
Branch
- Current development, which includes a real C back-end and full support for OpenCL, is in this branch: https://github.com/abergeron/compyte/tree/reorg
Motivation
- Currently there are at least 6 different GPU arrays in Python
- CudaNdarray(Theano), GPUArray(pycuda), CUDAMatrix(cudamat), GPUArray(pyopencl), Clyther, Copperhead, ...
- There are even more if we include other languages.
- They are incompatible
- None have the same properties and interface
- All of them are a subset of numpy.ndarray on the gpu!
Lack of Standard Creates Problems:
- Duplicates work
- GPU code is harder and slower to write correctly and efficiently than CPU/Python code
- Harder to port/reuse code
- Harder to find/distribute code
- Divides development work
Pitfalls to Avoid
- Starting alone
- We need different people/groups to "adopt" the new GpuNdArray
- Too simple - other projects won't adopt
- Too general - other projects will implement "light" versions... and not adopt
- Having an easy way to convert arrays and check conditions, as numpy does, could alleviate this.
The preferred option is a general version with easy checks and conversions, so that a project can support only a subset!
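As a rough illustration of what such a check could look like, the sketch below tests whether an array described by its shape and strides is C-contiguous, which is the kind of condition a project supporting only a subset might verify before accepting an array. The helper name `is_c_contiguous` and the plain-tuple representation are assumptions made for this example; they are not part of compyte.

```python
def is_c_contiguous(shape, strides, itemsize):
    """Return True if (shape, strides) describes a C-contiguous layout.

    Strides are in bytes, as in numpy. Hypothetical helper, not compyte API.
    """
    # Arrays with no elements are treated as contiguous.
    if 0 in shape:
        return True
    expected = itemsize
    # Walk the dimensions from last to first: each stride must equal the
    # size in bytes of everything to its right.
    for dim, stride in zip(reversed(shape), reversed(strides)):
        # The stride of a size-1 dimension is never used, so ignore it.
        if dim != 1 and stride != expected:
            return False
        expected *= dim
    return True
```

A project that only handles contiguous data could run a check like this and fall back to a copy (conversion) when it fails, instead of rejecting the general array type.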
Design Goals
- Make it VERY similar to numpy.ndarray
- Easier to attract other people from python community
- Have the base object in C to allow collaboration with more projects.
- We want people from C, C++, Ruby, R, ... to all use the same base GPU ndarray.
- Be compatible with CUDA and OpenCL
Current behavior not wanted
- No CPU code generated from the python interface (for PyOpenCL and PyCUDA). Gpu code is OK.
Implementation plan
All of the basic C code is done. Work is currently focused on elementwise functionality, in anticipation of a PyOpenCL/PyCUDA integration.
Sketch of the file structure and the reasoning behind it
This section details the file structure and gives you a hint of what to expect if you intend to ship a project that integrates this code. It applies to the code in the reorg branch, which will become the mainline soon. It is located here: http://github.com/abergeron/compyte/tree/reorg
Some of these files are not in the repository yet, which means that this functionality is being worked on.
The main files are:
- ndarray/compyte_buffer.h:
- Defines the base compyte_buffer object
- Also defines the structure for GpuArray and GpuKernel
- ndarray/compyte_buffer_cuda.c:
- Implements the CUDA version of the compyte_buffer API
- ndarray/compyte_buffer_opencl.c:
- Implements the OpenCL version of the compyte_buffer API
- ndarray/pygpu_ndarray.pyx
- Defines a Cython wrapper that exposes the GpuArray object and a couple of functions to mimic the interface of numpy.ndarray
- elemwise.py:
- Supports running arbitrary elementwise kernels on GpuArrays of arbitrary memory layout (Python-only).
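To make "arbitrary memory layout" concrete: an elementwise operation has to locate each element from the array's shape, strides and offset rather than assume contiguous storage. The sketch below shows that index arithmetic in plain Python on the CPU. It is not the code in elemwise.py; the function name `elemwise`, the flat list standing in for device memory, and the use of element (rather than byte) strides are assumptions for illustration.

```python
from itertools import product

def elemwise(func, src, shape, strides, offset=0):
    """Apply func to every element of a strided view of a flat buffer.

    src     -- flat list standing in for device memory (assumption)
    strides -- in elements, not bytes, to keep the sketch short
    Returns a new flat list holding the results in C order.
    """
    out = []
    for index in product(*(range(n) for n in shape)):
        # Position of this element in the flat buffer.
        pos = offset + sum(i * s for i, s in zip(index, strides))
        out.append(func(src[pos]))
    return out

buf = list(range(12))  # a 3x4 array stored in C order
# Transposed view of the same buffer: shape (4, 3), strides (1, 4).
doubled_t = elemwise(lambda x: 2 * x, buf, (4, 3), (1, 4))
```

The same loop handles contiguous, transposed and sliced views without copying, which is the property the bullet above describes.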
These files serve as support for the functionality above:
- ndarray/compyte_types.{c,h}:
- Generated by ndarray/gen_types.py
- Serves as a type table for operations that need information about the types involved
- ndarray/compyte_util.{c,h}:
- Some generally useful functions that don't really fit anywhere else.
- ndarray/setup.py:
- Builds the Python module implemented in pygpu_ndarray.pyx along with all the supporting code
These files serve for portability (mainly to support Windows):
- ndarray/compyte_compat.h
- ndarray/compyte_mkstemp.c
- ndarray/compyte_strl.c
- ndarray/wincompat/*
Some tests for the Python interface (that also test the underlying C code):
- ndarray/test_gpu_ndarray.py (tests basic functionality: init, copy, indexing, ...)
- tests/test_elemwise.py (tests that the numpy-like elemwise operations on arrays work correctly)
Some gotchas and differences from numpy
- Like numpy, we have the updateifcopy flag, but it is always False and the code expects it to be False.
- Buffer offsets (such as those produced by a[1:3]) are only partially supported under OpenCL 1.0: you cannot run kernels on such views without copying them first.
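The offset issue can be pictured with a small CPU-only model: slicing returns a view that shares the parent's buffer and records a nonzero offset, while a copy gets its own buffer starting at offset 0. The `View` class below is purely hypothetical and only mirrors the behaviour described above; it is not the GpuArray API.

```python
class View:
    """Hypothetical 1-d view: a shared buffer plus an element offset and size."""

    def __init__(self, data, offset, size):
        self.data, self.offset, self.size = data, offset, size

    def __getitem__(self, s):
        # Only simple slices with step 1 are handled, enough for a[1:3].
        start, stop, step = s.indices(self.size)
        if step != 1:
            raise NotImplementedError("only step-1 slices in this sketch")
        return View(self.data, self.offset + start, max(stop - start, 0))

    def copy(self):
        # Materialise the view in a fresh buffer so the offset becomes 0.
        fresh = self.data[self.offset:self.offset + self.size]
        return View(fresh, 0, self.size)

a = View([10, 20, 30, 40], 0, 4)
b = a[1:3]    # shares a.data, offset 1
c = b.copy()  # own buffer, offset 0
```

In this model, `b` corresponds to the kind of array that cannot be handed to a kernel directly under OpenCL 1.0, and `c` to the copy made beforehand.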