CEP 506 - Parametrized types

Status: Rejected in favour of a host of other CEPs (specific PEP 3118 buffer support, specific C++ template support, and /enhancements/fusedtypes).

Description

Parametrizing types has lots of potential uses:

C++ template compatibility

Declare "variable-wide" modifications to how code should be generated for a type. This includes declaring expectations of NumPy arrays as being of a certain type and dimension, or even more obscure parameters like (contrived example warning) declaring in what way a float should be automatically rounded.

Easy implementation of a more Pythonic syntax for arrays and pointers: carray(int, 4) foo, ptr(int) my_ptr

Syntax user side

A type name followed by parentheses containing the arguments. Each argument must be either a compile-time expression as documented on the Pyrex website, resolving to a value of a primitive type, or a Cython type.

All arguments are named and have an order (this is forward-compatible with keyword-only arguments etc.), and can be optional or mandatory.

#!python
DEF ND = 4
def foo(numpy.ndarray(float, ND) bar1):
    cdef cpp.map(str, int) my_cppmap

Syntax declarator side

By adding new keywords liberally, one could end up with something like the following, although it is only a suggestion. The main point is that it is declarative. It is also meant to be specified in the .pxd part of the declaration. Any declaration that leads to declaring a new C type (i.e. not simply an "object") can take arguments.

#!python
cdef class Foo:
    cdef int objectvar

    typearguments:
        # Providing defaults makes the arguments optional
        cython.type dtype
        int strategy = 0

ctypedef Foo Bar # Bar takes same arguments

Subclasses can only append arguments, not remove or override anything in the parent list. This restriction might be relaxed in time if needed (for example, it may become possible to change the ordering and make new arguments come first).
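The append-only rule for subclasses can be modelled in plain Python. This is a minimal sketch rather than anything from the Cython code base: `TypeParam`, `extend_params` and the extra `ndim` argument are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass

_REQUIRED = object()  # sentinel: no default was given, so the argument is mandatory


@dataclass(frozen=True)
class TypeParam:
    """One declared type argument (hypothetical helper, not a Cython API)."""
    name: str
    default: object = _REQUIRED

    @property
    def mandatory(self):
        return self.default is _REQUIRED


def extend_params(parent, new):
    """Return the subclass argument list: the parent's list with `new` appended.

    Raises TypeError if a new argument reuses an existing name, because
    inherited arguments may be neither overridden nor removed.
    """
    seen = {p.name for p in parent}
    for p in new:
        if p.name in seen:
            raise TypeError(f"cannot override type argument {p.name!r}")
        seen.add(p.name)
    return tuple(parent) + tuple(new)


# Mirrors the Foo declaration: dtype is mandatory, strategy defaults to 0.
foo_params = (TypeParam("dtype"), TypeParam("strategy", 0))
# A hypothetical subclass appends one more optional argument.
sub_params = extend_params(foo_params, [TypeParam("ndim", 1)])
```

Because the parent list is always a prefix of the subclass list, positional arguments written for the parent type keep their meaning when applied to a subclass.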

Effect

The effect is that the parser expects all mandatory arguments to be specified and raises an error otherwise. The arguments are stored in the type (which will be a subtype of a special "unparametrized" root type) and can be retrieved in different ways throughout Cython's compilation process.
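The binding step described here behaves much like binding arguments to a function signature, so it can be sketched with the standard library's `inspect.Signature`. The helper names `make_signature` and `parametrize` are assumptions made for this example. The text does not say whether an omitted optional argument takes its default or stays unspecified; this sketch assumes the default is filled in.

```python
import inspect

REQUIRED = inspect.Parameter.empty  # marks a mandatory type argument


def make_signature(params):
    """Build a Signature from ordered (name, default) pairs."""
    return inspect.Signature([
        inspect.Parameter(name, inspect.Parameter.POSITIONAL_OR_KEYWORD,
                          default=default)
        for name, default in params
    ])


def parametrize(base_name, signature, *args, **kwargs):
    """Bind type arguments and return (base_name, {argument name: value}).

    Signature.bind raises TypeError when a mandatory argument is missing;
    that stands in for the error the parser would report.
    """
    bound = signature.bind(*args, **kwargs)
    bound.apply_defaults()  # assumption: omitted optional arguments get defaults
    return base_name, dict(bound.arguments)


# Foo takes a mandatory dtype and an optional strategy, as declared above.
foo_sig = make_signature([("dtype", REQUIRED), ("strategy", 0)])

print(parametrize("Foo", foo_sig, "float"))
print(parametrize("Foo", foo_sig, "float", strategy=2))
```

The returned pair plays the role of the parametrized type: the base name identifies the unparametrized root, and the dictionary holds the stored arguments for later compilation phases to query.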

Type compatibility

Some rules must be made with respect to how it is possible to convert types. These are the default rules; overrides can always be provided through overloaded coercion operators. Suggestion:

Conversion "up" towards lesser specification is always OK. That way, foo(a=2, b=3) will be automatically converted to foo(a=2), foo(b=3) and foo (here a and b are both optional parameters).
Conversion "down" towards higher specification requires an explicit cast. While a pure assignment would not be unclear with automatic conversion, this example must also be considered:

   #!python
   def myfunc(foo(a=2,b=4) arg): ...

   cdef foo(a=2) myfoo
   myfoo = ...
   myfunc(myfoo) # we probably want this to produce error, not automatic conversion

Requiring an explicit cast seems to be most in line with C++ rules too.
Coercion from object to any parametrized type is automatic (and happens by coercion to the base type).
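The "up" and "down" rules amount to a subset test on the specified arguments. The following is a minimal sketch that represents a parametrized type only by a dictionary of its specified arguments; `is_implicitly_convertible` is a hypothetical name, and overloaded coercion operators and coercion from object are left out.

```python
def is_implicitly_convertible(src, dst):
    """Return True if a value typed with arguments `src` may be used where
    arguments `dst` are expected without an explicit cast.

    Conversion "up" is allowed: every argument that `dst` specifies must be
    specified with the same value in `src`. Anything else (for instance
    `dst` pinning down more than `src` does) is not implicit.
    """
    return all(name in src and src[name] == value
               for name, value in dst.items())


full = {"a": 2, "b": 3}                      # foo(a=2, b=3)

print(is_implicitly_convertible(full, {"a": 2}))   # up: foo(a=2)
print(is_implicitly_convertible(full, {}))         # up: plain foo
# down: foo(a=2) passed where foo(a=2, b=4) is expected, as in myfunc above
print(is_implicitly_convertible({"a": 2}, {"a": 2, "b": 4}))
```

Under this model the call `myfunc(myfoo)` from the example is rejected unless the caller casts explicitly, which is the behaviour the proposal asks for.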

Usecases

Any specific use-case is considered outside the scope of this spec itself; however here are some examples and ideas for usage:

Overloaded methods

Since self is a parameter to member functions in Cython, a form of function overloading could provide different functionality depending on the parametrized type of self, while making it perfectly clear that the code operates on instances of the same run-time class.

#!python
cdef class Allocator:
    typearguments:
        int strategy

    def __init__(self, name): self.name = name

    cdef object newobj(Allocator(1) self):
        print "Strategy 1", self.name
        ...

    cdef object newobj(Allocator(2) self):
        print "Strategy 2", self.name
        ...

#!python
>>> cdef Allocator(1) x
>>> x = Allocator("instance A")
>>> x.newobj()
Strategy 1 instance A
>>> cdef Allocator(2) y = x
>>> y.newobj()
Strategy 2 instance A

This could also be combined with method templates.
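The Allocator example can be emulated in plain Python to show what the compiler would resolve statically. This is only a sketch: `TypedRef` is a hypothetical stand-in for a variable declared with a parametrized type, and the dictionary lookup stands in for overload resolution at compile time.

```python
class Allocator:
    def __init__(self, name):
        self.name = name


# One implementation of newobj per value of the `strategy` type argument.
def _newobj_strategy1(self):
    return f"Strategy 1 {self.name}"


def _newobj_strategy2(self):
    return f"Strategy 2 {self.name}"


NEWOBJ_OVERLOADS = {1: _newobj_strategy1, 2: _newobj_strategy2}


class TypedRef:
    """Stands in for a variable declared as Allocator(strategy)."""

    def __init__(self, obj, strategy):
        self.obj = obj
        self.strategy = strategy

    def newobj(self):
        # A compiler would pick the overload from the declared type; here
        # the choice is made by a lookup on the stored type argument.
        return NEWOBJ_OVERLOADS[self.strategy](self.obj)


instance = Allocator("instance A")
x = TypedRef(instance, 1)
y = TypedRef(x.obj, 2)  # same run-time object, different declared type

print(x.newobj())
print(y.newobj())
```

Both references wrap the same `Allocator` instance; only the declared type argument differs, which is the property the overloading idea relies on.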

C++ template support

The type arguments would take the role of providing a way to instantiate the templates Cython-side. For outputting C++ code using templates one would need special support for it in the Cython compiler and extra syntax to "use" the type arguments as C++ template arguments.

Discussion

Stefan Behnel: Why not use Py3k type annotations instead of introducing yet another new syntax?

http://www.python.org/dev/peps/pep-3107/

Dag Sverre: The two are separate things. In fact, if anything, this proposal is a prerequisite for a good use of that PEP in Cython. It specifies what constitutes a type, not in what position the type is declared. It would not make sense to try to cram every possible future feature into this spec, and so I use example code with the currently supported syntax: one spec per change, and PEP-3107 belongs to another spec (in fact the code lines that need to change for the two won't overlap at all).

I totally agree with making use of PEP-3107 (while keeping the old syntax for backwards compatibility).

Hierarchy

Robertwb: Note that there are different levels of type specification. For example, if I have a numpy array, I may know just the type, or the type and dimension, or the type and dimension and size at compile time. We should be able to handle all of these cases. (Essentially, we can think of numpy.ndarray(numpy.uint8), numpy.ndarray(numpy.uint8, dim=2), and numpy.ndarray(numpy.uint8, 2, 10) as distinct types.)

Syntax discussion

Parametrized types with () can potentially be confusing. Does numpy.ndarray(2) mean an array with 2 dimensions, or an attempt to pass 2 to the ndarray constructor? It depends on context. And what should one do to pass something to the constructor of a parametrized type?

In most cases considered, this is not a problem for Cython, as there is a difference between "type context" and "runtime context". However it could become a problem if for instance C++ templates are wrapped; how would one call the constructor of a C++ vector of ints, specifying 10 elements? cpp.vector(int)(10)?

Also, from a pure usability perspective, perhaps the () is more difficult to learn because it looks like constructor syntax? When the () syntax was used in an example for the NumPy community, the initial response was that the numpy.ndarray constructor should take the same arguments as usual (though after an explanation that it appears in a type context, the person in question was perfectly OK with it).

Options:

Use double (), like this: myarray(int)(4, 4). The first () resolves the type, and the second () goes to the constructor. For an empty type parameter list, one still has to call (), i.e. type_with_optional_params()(4,4).
Use [] instead for type arguments. This is in line with discussions in the Python community on generic types (and comes from Guido using [] in a syntax example on his blog; however, this is not an official Python direction, see below). So one could allocate using myarray[int](4,4). A consequence is that cdef variables use [] instead:

cdef numpy.ndarray[numpy.int64, 2] myarr = numpy.ndarray([2,2])
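The [] option can be tried out in ordinary Python, where `__class_getitem__` receives the type arguments and the following () goes to the constructor. The sketch below uses a toy `MyArray` class, not numpy.ndarray, and its `dtype` and `ndim` keywords are assumptions made for the example.

```python
from functools import partial


class MyArray:
    """Toy container used to show type arguments vs. constructor arguments."""

    def __init__(self, *shape, dtype=object, ndim=None):
        # Constructor arguments: the run-time shape.
        if ndim is not None and len(shape) != ndim:
            raise ValueError(f"expected {ndim} dimensions, got {len(shape)}")
        self.shape = shape
        self.dtype = dtype

    def __class_getitem__(cls, params):
        # Type arguments: MyArray[int, 2] passes (int, 2), MyArray[int] passes int.
        if not isinstance(params, tuple):
            params = (params,)
        dtype, *rest = params
        ndim = rest[0] if rest else None
        # Return a constructor with the type arguments already fixed.
        return partial(cls, dtype=dtype, ndim=ndim)


arr = MyArray[int, 2](4, 4)  # [] resolves the type, () calls the constructor
print(arr.shape, arr.dtype)
```

Here there is no ambiguity about which arguments belong to the type and which to the constructor, and a mismatch such as `MyArray[int, 2](4)` is reported as an error.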

Guido's blog post: http://www.artima.com/weblogs/viewpost.jsp?thread=86641 followed up with http://www.artima.com/weblogs/viewpost.jsp?thread=87182

DagSverreSeljebotn: I must say I very much prefer the latter one -- it is always clear what is a type, what is a call to a constructor, and what it means to declare something of a type vs. calling a type for construction.

robertwb: Hmm... the type(params) seems more natural to me, but in this context it does have issues. The call syntax can accept keyword arguments as valid Python code. On the other hand, we have settled on square brackets for C++ and numpy parameterization.

Implementation

Given that static binding is one of the primary benefits of Cython, it seems clear that we want to implement this via specialization. There are several issues with this.

Code bloat. One problem with C++ templates is that they have to be re-compiled for every type. In Cython this would mean that they would have to be compiled for every used type in every module, and the implementation code would have to sit in the .pxd file. As an alternative, the set of types could be fixed ahead of time, for example:

#!python
cdef class MyWrapper[T]:
    T x
    cdef T get()
    cdef void set(T)

# These will be compiled as part of the module
MyWrapper[int]
MyWrapper[void*]
MyWrapper[object]

Whole-program compilation may help here as well. If each module does compile in each specialization, it may make sense to have a global dict to re-use types across modules. Subclasses of object could all be (mostly?) the same underlying type.

Runtime-generated types? This may be possible for Python types. They could be accessed via __getitem__ on the "generic type" object.
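For Python types, run-time generation with a shared registry can be sketched using a metaclass whose `__getitem__` creates each specialization once and then reuses it. `GenericMeta`, its registry and the `type_args` attribute are hypothetical names; this shows the idea only and is not how Cython implements anything.

```python
class GenericMeta(type):
    """Metaclass that makes GenericType[args] return a cached specialization."""

    # Shared registry: (generic class, type arguments) -> specialized class.
    _specializations = {}

    def __getitem__(cls, args):
        if not isinstance(args, tuple):
            args = (args,)
        key = (cls, args)
        if key not in GenericMeta._specializations:
            arg_names = ", ".join(getattr(a, "__name__", repr(a)) for a in args)
            name = f"{cls.__name__}[{arg_names}]"
            # Create the specialized type at run time as a subclass of the
            # generic one, remembering its type arguments.
            GenericMeta._specializations[key] = GenericMeta(
                name, (cls,), {"type_args": args})
        return GenericMeta._specializations[key]


class MyWrapper(metaclass=GenericMeta):
    type_args = ()

    def __init__(self, x):
        self.x = x

    def get(self):
        return self.x

    def set(self, value):
        self.x = value


IntWrapper = MyWrapper[int]
print(IntWrapper.__name__, IntWrapper is MyWrapper[int])
```

Because the registry is keyed on the generic class and its arguments, every module asking for `MyWrapper[int]` receives the same type object, which is the reuse across modules suggested above.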

Work so far

A primitive patch can be downloaded from http://heim.ifi.uio.no/dagss/cython-typeargs1.diff. It makes some small changes to the parser:

Pulls in type arguments after types (for any and all types), and sets the typeargs attribute of the corresponding CSimpleBaseTypeNode in the parse tree.
In the type inference phase, the default behaviour of CSimpleBaseTypeNode is to complain loudly if the argument list isn't empty (so that a compiler error will be given on the above example, because int doesn't take arguments).
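The two steps can be pictured with a toy parse-tree node. This sketch does not reproduce the patch or Cython's real CSimpleBaseTypeNode: `ToyBaseTypeNode`, `CompileError` and the `parametrizable` set are hypothetical, chosen only to show arguments being stored at parse time and rejected by default during analysis.

```python
class CompileError(Exception):
    """Raised where a real compiler would report an error to the user."""


class ToyBaseTypeNode:
    """Toy parse-tree node carrying a type name and its type arguments."""

    # Types allowed to take arguments in this toy model (an assumption).
    parametrizable = frozenset({"numpy.ndarray"})

    def __init__(self, name, typeargs=()):
        self.name = name
        self.typeargs = tuple(typeargs)  # stored by the parser, unchecked

    def analyse(self):
        """Default behaviour: reject arguments on types that take none."""
        if self.typeargs and self.name not in self.parametrizable:
            raise CompileError(f"type {self.name!r} does not take arguments")
        return self.name, self.typeargs


print(ToyBaseTypeNode("numpy.ndarray", ["float", 4]).analyse())
try:
    ToyBaseTypeNode("int", [4]).analyse()
except CompileError as exc:
    print("error:", exc)
```

Keeping the check in the analysis step rather than in the parser means the grammar accepts arguments after any type, and each type decides later whether they are valid.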

Known issues:

Note: This only does the parsing and leaves the info in the parse tree -- probably, the type hierarchy in PyrexTypes.py should also be modified to export the parameters. This should be done in the type inference phase of CSimpleBaseTypeNode.
There are syntax errors in the existing code base that I don't have time to fix that come into play now and then.