Possible coding projects within Cython

This is a list of possible larger and reasonably isolated projects within Cython. These could be done as a summer project (e.g. Google Summer of Code), or perhaps as an undergraduate project or master's thesis. Please contact the developer mailing list if you are interested. At the University of Oslo, Norway, Simula Research Laboratory could potentially host such a master's project (contact Dag Seljebotn ([email protected]) for more information about that).

All of these projects indicate a general area of work. For each project it should be possible to break off in the middle and still have something that would be genuinely useful (if the development process is done right). Also, further improvements beyond what is described should be possible on all projects. In other words, the amount of work can be adjusted to suit your needs.

About Cython: Cython compiles Python scripts into C. However, it does not convert Python code into a pure C program; instead, it generates C code that makes full use of the CPython libraries and runtime environment. Writing code using Cython is a bit like writing in Python and C at the same time -- most of Python is available as usual, but in addition one can call C functions directly, or type variables for extra speed (for some loop-intensive examples a 500x-1000x speed increase over pure Python is not unheard of). The long-term goal for Cython is 100% Python compatibility, while keeping some extra syntax which can be used to gain speed or call external C/C++(/Fortran) libraries. Cython has an ever-growing user base (some projects using Cython).

Project: Control flow analysis

What can we know about the possible values of a variable at a given point in a program? If Cython can know that it is impossible for a variable to be None in a given situation, the check for None can safely be dropped; likewise, bounds checks can safely be dropped if it is known that a given array access will always be within the array bounds.

This project has big consequences for the user-friendliness of Cython. Currently, one must make a choice between safety and speed, between checked access and fast access. This project would give the best of both worlds: one could insert checks for safety but avoid expensive redundant checking, making the "safe mode" potentially almost as quick as the fast mode. Once this is achieved, one could also look more closely at other kinds of optimizations which can come from the same kind of program analysis, such as removing unnecessary reference counting, or allocating objects on the stack if references to them are never given away.

This project is mostly algorithmic in nature, and can be implemented as a "black box" with less interaction with other parts of the Cython code base than the other projects. One could tackle a few simple cases first (and simply say "not known" in the more complex cases). Then one can gradually improve the algorithm to handle more and more cases.
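The "simple cases first, otherwise not known" approach can be sketched in a few lines of plain Python. This is only an illustration and not Cython's actual design: the statement tuples (`assign_obj`, `use`, `if_not_none`, ...), the three-valued facts and the `analyse` function are all hypothetical names invented for this example.

```python
# Hypothetical three-point lattice of facts about a variable.
NONE, NOT_NONE, UNKNOWN = "none", "not_none", "unknown"


def join(a, b):
    """Merge facts from two control-flow paths; disagreement means 'not known'."""
    return a if a == b else UNKNOWN


def analyse(block, env, checks):
    """Walk a list of statement tuples, updating env (variable -> fact) in place.

    Each ("use", var, label) statement records in checks[label] whether a
    runtime None check is still required at that point.
    """
    for stmt in block:
        kind = stmt[0]
        if kind == "assign_none":
            env[stmt[1]] = NONE
        elif kind == "assign_obj":
            env[stmt[1]] = NOT_NONE
        elif kind == "assign_unknown":  # e.g. the result of an arbitrary call
            env[stmt[1]] = UNKNOWN
        elif kind == "use":
            _, var, label = stmt
            checks[label] = env.get(var, UNKNOWN) != NOT_NONE
        elif kind == "if_not_none":  # models "if var is not None: ... else: ..."
            _, var, then_block, else_block = stmt
            then_env = dict(env)
            then_env[var] = NOT_NONE
            else_env = dict(env)
            else_env[var] = NONE
            analyse(then_block, then_env, checks)
            analyse(else_block, else_env, checks)
            # After the branch, keep only what both paths agree on.
            for name in set(then_env) | set(else_env):
                env[name] = join(then_env.get(name, UNKNOWN),
                                 else_env.get(name, UNKNOWN))
        else:
            raise ValueError("unsupported statement: %r" % (kind,))
    return checks


program = [
    ("assign_unknown", "x"),
    ("use", "x", "before_test"),          # nothing known yet: check required
    ("if_not_none", "x",
        [("use", "x", "inside_then")],    # guarded by the test: check redundant
        [("assign_obj", "x")]),           # else branch replaces None by an object
    ("use", "x", "after_if"),             # both paths agree: check redundant
]
checks = analyse(program, {}, {})
```

Loops, exceptions and aliasing are deliberately left out; a fuller analysis would iterate to a fixed point over a control flow graph, falling back to "unknown" wherever it cannot decide.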

Project: Improve Python compatibility

Make Cython as close as possible to 100% Python compatible. The list of tasks includes but is not limited to:
  • Inner functions (this is partially started already)
  • Generators (we know how it can be done)
  • Proper Python scoping, if possible (at least consider if it can be done)
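As a rough illustration of the three items above, here is plain Python code of the kind a fully compatible compiler would have to accept unchanged. The function names are made up for this example, and it uses Python 3 syntax (`nonlocal`), which is an assumption about the target language level.

```python
def make_counter(start):
    """Inner function: `step` closes over the local variable `count`."""
    count = start

    def step(n=1):
        nonlocal count
        count += n
        return count

    return step


def squares(limit):
    """Generator: execution is suspended and resumed at each `yield`."""
    for i in range(limit):
        yield i * i


def scoping_demo():
    """Scoping: in Python 3 the comprehension variable does not leak out."""
    x = "outer"
    values = [x for x in range(3)]
    return x, values


counter = make_counter(10)
first = counter()       # count goes from 10 to 11
second = counter(5)     # count goes from 11 to 16
```

Snippets like these could also serve as small regression tests while working through the list.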

Project: More numerical features

There's still a lot of room for improvement in numerical programming. This project would consist of a collection of different items, with the overall aim of improving the end-user experience for users from a numerical Python background. It would continue the work already done for numerical programming in Cython.

Examples:
  • Numerical looping utilities: Make higher-level constructs for looping through arrays in a way that is as numerically efficient as possible -- in a way that doesn't depend on the number of dimensions in the array. ndenumerate, zip and so on can serve as examples.
  • Implement support for NumPy array slicing
  • Currently, every call to a function in the NumPy and SciPy libraries incurs Python call overhead -- eliminate this overhead for parts of these libraries. Some of this can be done immediately, while some requires agreement with the NumPy and SciPy projects about conventions etc.
  • Perhaps Blitz++-like features: Turn code like i,j = indices(2); A[i,j] = B[i] * C[j] * i * j into a double for-loop calculating all elements of A. (Must also be voted on by the Cython community first.)
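To make two of the examples above concrete, the following pure-Python sketch shows what the generated behaviour would amount to. It uses nested lists instead of NumPy arrays so that it runs without dependencies, and the helper names `outer_weighted` and `nd_enumerate` are hypothetical, not existing Cython or NumPy functions.

```python
def outer_weighted(B, C):
    """Explicit double loop equivalent to A[i,j] = B[i] * C[j] * i * j."""
    A = [[0.0] * len(C) for _ in B]
    for i in range(len(B)):
        for j in range(len(C)):
            A[i][j] = B[i] * C[j] * i * j
    return A


def nd_enumerate(array, prefix=()):
    """Yield (index_tuple, value) pairs for nested lists of any depth.

    This mimics the idea behind ndenumerate: the calling loop does not
    need to know the number of dimensions.
    """
    for k, item in enumerate(array):
        if isinstance(item, list):
            yield from nd_enumerate(item, prefix + (k,))
        else:
            yield prefix + (k,), item


A = outer_weighted([1.0, 2.0, 3.0], [4.0, 5.0])
pairs = list(nd_enumerate([[1, 2], [3, 4]]))
```

In Cython the point would be to emit the equivalent C loops directly, so that neither the index objects nor the generator exist at run time.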

Project: Enabling parallelism in C space

There are several ways to enable parallel code execution below the thread level. OpenMP is one example; parallel operations on arrays are another. Parallelism can be enabled by explicit syntax in the code, by special variable typing constructs, or by exploiting some kind of "obvious" parallelism in the code.

There may be space for a more friendly and "Fortran-like" syntax for working with arrays, where arrays are a first-class primitive in the language rather than just an optimization of Python syntax. There is also a separate Wiki page about parallel execution syntax proposals in a more general context.

Any syntax changes must be voted on by the Cython community first.

Further ideas

The enhancement specification list also contains some ideas. For instance, metaprogramming and type-generic programming might make interesting projects. Also, getting Python compatibility to 100% is always a stated goal.