GSoC2017 - Theano/Theano GitHub Wiki

This is a list of ideas for the Google Summer of Code 2017. You may also propose your own ideas. In all cases, discuss them on the theano-dev mailing list, to make yourself known to our community and to understand the ideas better. This is important for your application; it should demonstrate that you understand what needs to be done.

We will try to participate through an umbrella organization, probably the Python Software Foundation. For more information on how to apply, you should read the Python SoC 2017 page. They require that each candidate student have at least one merged PR. You can look at tickets marked as easy fix.

The current potential mentors are Frédéric Bastien, Pascal Lamblin, Arnaud Bergeron and Gijs van Tulder.

Theano Organization

Theano is a software library written in Python, C and CUDA (we also have the beginnings of an OpenCL back-end). It is a domain-specific compiler: a user writes Python code to build a computation graph and asks Theano to compile it. Theano optimizes the graph and generates mostly C and/or CUDA code to perform the computation efficiently. It mainly supports computation on numpy.ndarray, but Theano also supports sparse matrices.
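To illustrate the build-then-compile workflow described above, here is a toy sketch (this is not Theano's API; all class and function names are invented for illustration): the user composes symbolic expression nodes in Python, a trivial "optimization" pass rewrites the graph, and "compiling" turns it into a fast callable.

```python
# Toy sketch of a symbolic graph compiler (NOT Theano's actual API).
class Expr:
    def __add__(self, other): return Op("+", self, other)
    def __mul__(self, other): return Op("*", self, other)

class Var(Expr):
    def __init__(self, name): self.name = name
    def source(self): return self.name

class Const(Expr):
    def __init__(self, value): self.value = value
    def source(self): return repr(self.value)

class Op(Expr):
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right
    def source(self):
        return "(%s %s %s)" % (self.left.source(), self.op, self.right.source())

def simplify(e):
    # One example rewrite, mimicking graph optimization: x * 1 -> x
    if isinstance(e, Op):
        e = Op(e.op, simplify(e.left), simplify(e.right))
        if e.op == "*" and isinstance(e.right, Const) and e.right.value == 1:
            return e.left
    return e

def compile_fn(inputs, output):
    # "Compilation": emit Python source for the optimized graph
    body = simplify(output).source()
    names = ", ".join(v.name for v in inputs)
    return eval("lambda %s: %s" % (names, body))

x, y = Var("x"), Var("y")
f = compile_fn([x, y], x * y + x * Const(1))
print(f(3, 4))  # (3 * 4) + 3 = 15
```

Real Theano does the same at a much larger scale: the rewrite passes number in the hundreds, and the emitted code is C or CUDA rather than Python.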

Theano is mostly developed by the MILA institute, which works in machine learning, but Theano isn't restricted to machine learning: its capacity for optimizing computation makes it useful for many applications that rely on large amounts of numerical computation. There are also many Theano contributors outside of the MILA.

As you probably know, deep learning is changing the world and Theano is one of the main libraries that support this field!

Contacting us

The main communication channel with the developers is our developer mailing list, theano-dev; please use it for GSoC-related questions or discussion of projects.

We mainly reside in the Eastern Standard Time zone so you will usually receive replies faster during our work day. Some of us, however, frequently work outside normal work hours or reside in other time zones.

For GSoC students, we prefer that discussions stay public as much as possible; the mailing list or GitHub is great for that. For more interactive discussion, other means can be used: last year, we used g-chat. This needs to be discussed with the mentors.

Highlighted ideas:

  • Faster optimization phase during compilation (gh-2494)

    • Difficulty: Easy/medium
    • Skills needed: Python only. An understanding of algorithmic complexity (O(1) vs O(n)) is useful, and some experience with Theano helps to get started faster.
    • Problem: Theano's graph-optimization phase during compilation is slow for big graphs. This makes Theano hard to use with big graphs, especially while the user is developing their model.
    • Ask users on the mailing list for slow cases, then profile them and find/fix the bottlenecks.
    • This approach works because all of the slow cases found so far were due to how the optimizations are implemented, their order, their algorithms, or how they get applied. Only real use cases can reveal the real bottlenecks.
    • Change algorithms to scale better.
      • For example, use a stricter algorithm instead of cycle detection for the inplace optimization.
    • Mentors: Frédéric and Arnaud
    • Note: some work was done during last year's GSoC, but more work is needed.
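Profiling such slow cases can start with the standard library profiler. A minimal sketch, where `build_graph` is a hypothetical stand-in for a user's slow graph-construction step; with a real reproduction script you would profile the `theano.function(...)` call instead:

```python
# Minimal profiling sketch: find where compilation time goes.
import cProfile
import io
import pstats

def helper(n):
    # Stand-in for an expensive inner routine (e.g. one optimization pass)
    return sum(i * i for i in range(n))

def build_graph():
    # Dummy workload standing in for slow graph building/optimization
    return [helper(1000) for _ in range(50)]

profiler = cProfile.Profile()
profiler.enable()
build_graph()
profiler.disable()

# Sort by cumulative time to see which high-level steps dominate
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(10)
report = out.getvalue()
print(report)
```

Sorting by cumulative time is the right view for this project: it shows which optimization phases dominate, which is the information needed to decide where a better-scaling algorithm would pay off.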
  • Faster linker phase during compilation

    • Difficulty: medium
    • Skill needed: Python and C code.
    • Problem: The first time we compile a Theano function, we compile many C shared libraries, which is time-consuming. As we cache them, it is less of a problem for later calls, but since it can still take ~1h in some cases, an improvement here would be very useful.
    • Compilation
      • Compile thunks in parallel with Python threads. This mainly helps a single job with an empty cache. https://github.com/Theano/Theano/issues/3340
      • A lock-less compilation cache, or partial locks, to enable compilation in parallel from multiple processes. This mainly helps many concurrent jobs started at about the same time with an empty cache. We could replace the random directory names with names derived from the hard part of the cache key. Deterministic directory names would make it easy to see whether someone else has already started compiling a given unit.
      • Compile the not-yet-cached thunks of one Theano function into a single module.
    • One way is to compile fewer shared modules (currently we compile about 5k modules for the Theano tests)
      • Using the new mechanism from https://github.com/Theano/Theano/pull/5612, which will be finished before the GSoC, will help with this issue.
      • Check the content of the Theano cache for the Theano tests, then find ways to combine many cases together. Some cases already found:
        • The indexing operations could be made more generic without losing execution speed.
        • The cuDNN descriptors should not be stored as attributes, but passed as inputs of the ops.
        • The cuDNN conv and pool ops could use op params instead of hardcoding some properties.
      • (Started) Make the elemwise C code generate code for many dtypes at the same time. Elemwise is the op with the highest number of generated C modules.
    • Mentors: Frédéric, Pascal or Arnaud
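The deterministic-directory idea above can be sketched as follows (the hashing scheme and names here are hypothetical, not Theano's actual cache layout): hashing the stable part of a module's cache key yields the same directory name in every process, so concurrent jobs can detect that another process has already started compiling the same unit, instead of each picking a random name.

```python
# Sketch (not Theano's actual implementation): deterministic cache
# directory names derived from the "hard" part of a module's cache key.
import hashlib
import os
import tempfile

def cache_dir_for(key, cache_root):
    # Same key -> same digest -> same directory in every process
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
    return os.path.join(cache_root, "mod_" + digest)

root = tempfile.mkdtemp()
a = cache_dir_for("elemwise{add}(float64,float64)", root)
b = cache_dir_for("elemwise{add}(float64,float64)", root)
assert a == b  # deterministic, unlike a random directory name

# exist_ok=True: another process may have created it first; that is
# exactly the signal that the unit is already being compiled.
os.makedirs(a, exist_ok=True)
```

A real implementation would still need a completion marker (e.g. writing the compiled module under a temporary name and renaming it atomically) so that readers never see a half-written module.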
  • Include more operations from optimized GPU libraries

    • Difficulty: medium
    • Skills needed: Python and CUDA
    • Problem:
      • cuSPARSE could be wrapped for faster operations.
      • cuSOLVER could also be wrapped for faster operations (gh-3027 for the interface and gh-3028 for the GPU operations).
      • CTC: Baidu released open-source code. A wrapper was started outside Theano, but it isn't finished. We should complete it and move it inside Theano.
      • Spatial transformer ops (from cuDNN): https://github.com/Theano/Theano/issues/5622
    • Mentors: Frédéric, Arnaud, Pascal
  • Reviving the meta-optimizer
  • More convolution operations

  • Better handling of large graph

    • Difficulty: medium/hard
    • Skills needed: Python, algorithmic understanding (graph traversal, asymptotic complexity)
    • Problem: Theano has trouble handling large graphs (large number of nodes) and deep graphs (long chain between inputs and outputs), which can lead to crashes or long compilation times.
      • One issue is the use of recursive algorithms for graph traversal, which can make Theano hit Python's recursion limit. Recursion is used in particular when computing gradients and when cloning the graph, and maybe during the optimization phase as well. https://github.com/Theano/Theano/issues/3607
      • Another issue is that Python's pickle also uses a recursive algorithm to serialize and de-serialize objects. We should investigate alternatives. This may be fixed in Python 3.
      • Also, the time spent optimizing graphs scales supra-linearly with the number of nodes when using the full optimizer (fast_run). This should be investigated further: can we reduce it by cutting some optimization phases? How could we organize the optimizations differently?
    • Mentors: Frederic, Pascal
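The recursion issue can be avoided by rewriting traversals with an explicit stack. An illustrative sketch on a toy graph structure (`Node` is an invented stand-in, not Theano's actual class): this iterative topological sort handles chains far deeper than Python's default recursion limit of about 1000 frames.

```python
# Replacing recursive depth-first traversal with an explicit stack,
# so deep graphs no longer hit Python's recursion limit.
class Node:
    def __init__(self, name, inputs=()):
        self.name = name
        self.inputs = list(inputs)

def toposort_iterative(outputs):
    """Return nodes in dependency order (inputs first), without recursion."""
    order, seen = [], set()
    # Each stack entry is (node, children_done): a node is emitted only
    # after all of its inputs have been emitted.
    stack = [(node, False) for node in outputs]
    while stack:
        node, children_done = stack.pop()
        if children_done:
            order.append(node)
            continue
        if id(node) in seen:
            continue
        seen.add(id(node))
        stack.append((node, True))
        for inp in node.inputs:
            stack.append((inp, False))
    return order

# A chain much deeper than the default recursion limit still works:
node = Node("x")
for i in range(5000):
    node = Node("op%d" % i, [node])
order = toposort_iterative([node])
print(len(order))  # 5001
```

A recursive version of the same function would raise `RecursionError` on this 5000-deep chain; the explicit-stack version is bounded only by memory.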
  • Lower peak memory usage

    • Difficulty: medium
    • Skill needed: Python, algorithmic understanding (O(1) vs O(n))
    • Problem: We can't run some models on the GPU because the GPU doesn't have enough memory.
      • We need a mechanism in the VM that allows it to free some temporary variables during execution and recompute them later, only when needed. The recomputation is partially supported by the current lazy-evaluation mechanism, so when we run out of memory on the GPU we could fall back to that.
    • Problem: We currently compare our actual peak memory usage against the theoretical minimum peak, but computing the theoretical minimum peak is too slow for many basic cases, so it needs to be sped up. We have seen cases where we use more than the theoretical minimum peak, so it would be useful to know whether more typical cases are also affected. https://github.com/Theano/Theano/issues/2111
      • As that algorithm is too slow, we can try approximations during profiling: random search, and a fast algorithm that gives the exact result when the graph is a tree (this would be an approximation, as Theano graphs are DAGs).
      • After comparing those two algorithms during profiling, make Theano functions use them. This will require that the user pass the expected shapes of the inputs.
    • Mentors: Frédéric, Pascal and Arnaud
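To make the peak-memory question concrete, here is a rough sketch (toy graph and sizes, not Theano's profiler) of estimating the peak for one fixed execution order: walk the schedule, count each node's output as live, and free an input as soon as its last consumer has run. The hard problem discussed above is minimizing this peak over all valid orders, which is why approximations are proposed.

```python
# Estimate peak temporary memory for one execution order of a toy graph.
def peak_memory(schedule, sizes, consumers):
    """schedule: node names in execution order; sizes: name -> bytes;
    consumers: name -> list of node names that read this value."""
    remaining = {n: len(c) for n, c in consumers.items()}
    live, peak = 0, 0
    for node in schedule:
        live += sizes[node]          # allocate this node's output
        peak = max(peak, live)
        for inp, cons in consumers.items():
            if node in cons:         # this node consumed `inp`
                remaining[inp] -= 1
                if remaining[inp] == 0:
                    live -= sizes[inp]  # last consumer ran: free it
    return peak

# Chain a -> b -> c: "a" (100 bytes) is freed once "b" has run, so the
# peak happens while "a" and "b" are live together.
chain_peak = peak_memory(
    ["a", "b", "c"],
    {"a": 100, "b": 50, "c": 10},
    {"a": ["b"], "b": ["c"], "c": []},
)
print(chain_peak)  # 150
```

On a tree, greedily ordering children by their memory demand gives the optimal peak; on general DAGs (Theano's case) the problem is much harder, matching the tree-approximation idea above.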
  • Lower Theano function call overhead

    • Difficulty: easy
    • Skill needed: Python and C code.
    • Problem: Each call to a function compiled by Theano has some overhead. If the graph does not contain much computation (e.g., if it works on scalars), this overhead can be significant.
    • Create a Filter op and reuse it to move the logic that validates/converts the inputs into the Theano graph, instead of in the wrapping Python code.
      • Write C code for this op to remove the Python overhead.
    • Split the Python code that handles a call into 2 layers: one with a fixed number of inputs and no keyword arguments or default values, and one that handles those.
    • Move elemwise computation on NumPy 0-d arrays to Theano scalar ops (which represent the C type, so there is no object allocation).
    • Disable the garbage collector for NumPy 0-d arrays?
    • Mentors: Frédéric, Arnaud

Ideas with a lower priority

  • Generate a shared library (a proof of concept is available as a starting point)

    • Difficulty: medium
    • Skill needed: Python and C.
    • Problem: It would be very useful to generate a shared library from a Theano function. This would make it easier to reuse in other programs and on embedded systems.
    • Bring the prototype to a working version without adding new features.
    • Document it.
    • Add support for scalar constant values in the graph.
    • Make a configuration option to enable/disable GC of intermediate results.
    • Make an interface to support shared variables.
    • (If time permits) To make it work on Windows, we need to back-port some C code that uses C99 features.
    • Mentors: Frédéric Bastien, Arnaud and Pascal
  • Add more linear algebra operation here, here and here

    • Difficulty: easy
    • Skill needed: Python
    • Problem: There are still many operations in numpy.* that we do not have under theano.tensor.*. We get requests for some of them from time to time. We should provide those operations, and also implement the infer_shape and grad methods when possible.
    • Mentors: Frédéric, Arnaud and Pascal.
  • Bridge Theano with other compiler and library (Dask, Numba, Cython, Parakeet, ...)

    • Difficulty: medium
    • Skill needed: Python mostly, but knowing C would help some of them.
    • Problem: Many other systems have very optimized code for some cases, or make it easier to generate code faster than Python without writing C (like Numba and Cython). Making them easier to use with Theano would be very valuable.
    • Update the compilation system to compile against other libraries more easily by reusing the Theano compilation cache mechanism.
    • Make an easy-to-use interface for using Cython with Theano. We currently do it manually for the Scan op.
    • Make an easy-to-use interface for reusing Numba (we only provide an example for now).
    • Make Theano use the C interface of a Numba-compiled function.
    • Mentors: Frédéric, Arnaud and Pascal.
  • An example for Android

    • Difficulty: medium
    • Skill needed: Python, C. Knowing Android would help.
    • 2 possible cases:
      • Full Theano with dynamic compilation
      • Only the dynamic DLL from the point above. This would only need the first part of that project.
    • Mentors: Frédéric
  • OpenCL

    • Difficulty: medium.
    • Skill needed: Python and C. Understanding parallel computation a must. Knowing CUDA and/or OpenCL a plus.
    • Continue ongoing work in development branch to build OpenCL support
    • Port the current CUDA implementations to our format that supports both CUDA and OpenCL
    • Add OpenCL implementations for unsupported expression types
    • Tune existing OpenCL kernels for various operations
    • Mentors: Frédéric and Arnaud
  • Improve pickling of Theano objects

    • Difficulty: very hard.
    • Skill needed: Python.
    • Theano shared variable pickling, with and without a GPU.
    • Cache the compilation step in the compiledir (started, but needs to be finished; gh-)
    • Mentors: Frédéric, Arnaud and Pascal

Other ideas not sorted and not developed

  • IfElse (lazy evaluation): add C code, and allow it to be inplace on two inputs
  • Faster optimization phase (use a SAT solver?)
  • Allow memory profiling in the CVM (currently it requires the VM)
  • Re-write DebugMode to reuse the CVM or VM and simplify it
  • A less opaque theano.function()
  • Track how users use Theano, with their permission
    • This would allow finding bugs that would have affected them in the past too.