GSoC 2014 Organization Application - Theano/Theano GitHub Wiki

Our own note:

  • Mentors: Fred, Arnaud, James
  • Back-up mentors: Yann D. (Razvan as a last resort; David WF if his internship allows)

Describe your organization.

Theano is a critical component at the root of an ecosystem of machine learning projects, including PyLearn2[1], PyMC[2], HyperOpt[3], and the Deep Learning Tutorial [4]. It is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently, on GPU or CPU. Theano has been in development and active use since 2008, and has achieved significant adoption across a range of industrial and research settings (at least 6K downloads per month).

The core contributors are based at the University of Montreal, and they are complemented by an active user community that contributes code, issues and fixes via GitHub.

Some of Theano's key features include:

  • transparent GPU usage: run your code on the GPU for a speedup of up to 140x (float32 only).
  • automatic differentiation: Theano does your derivatives and produces an expression for the gradient automatically.
  • optimizations for speed and stability: get the right result for near-limit expressions such as log(1+x), even when x is very small.
  • dynamic C code generation: faster evaluation of your expressions.
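
To make the stability point concrete, here is a minimal sketch of why an expression like log(1+x) needs special care. It is not Theano code and does not show how Theano implements its optimization; it uses only Python's standard `math` module, and the function names `naive_log1p` and `stable_log1p` are made up for this illustration.

```python
import math


def naive_log1p(x):
    # Literal translation of log(1 + x). For tiny x, the sum 1.0 + x
    # rounds to exactly 1.0 in double precision, so the result is 0.0.
    return math.log(1.0 + x)


def stable_log1p(x):
    # Mathematically equivalent form that avoids forming 1.0 + x,
    # so the small value of x is not lost to rounding.
    return math.log1p(x)


x = 1e-20
print(naive_log1p(x))   # 0.0: the contribution of x has been lost
print(stable_log1p(x))  # very close to x itself, as expected for small x
```

A stability optimization of this kind replaces the first form with the second on the user's behalf, so the expression can be written in its natural mathematical form.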

[1] http://deeplearning.net/software/pylearn2/

[2] https://github.com/pymc-devs/pymc/tree/pymc3

[3] http://hyperopt.github.io/hyperopt

[4] http://deeplearning.net/tutorial/

Why is your organization applying to participate in Google Summer of Code 2014? What do you hope to gain by participating?

Theano has grown to be quite popular, and with that popularity comes a broader diversity of use cases than the original designers anticipated. Theano is being pushed to its limits in several respects, and we have identified a few specific development efforts that could be useful to a lot of people. The support of the GSoC program would allow us to attract some talented people to improve internal code organization, improve code documentation, scale to larger programs and extend support for OpenCL to complement our current CUDA code generation. This will help us to grow our base of knowledgeable developers and ensure continued good health of our project.

Of course, working with smart people can also lead to all sorts of unexpected interesting outcomes, and we welcome those too!

Has your organization participated in past Google Summer of Codes? (yes/no)

No

If your organization has not previously participated in Google Summer of Code, have you applied in the past? If so, for what year(s)?

No

What Open Source Initiative approved license(s) does your project use?

BSD 3-clause

What is the URL for your Ideas list? This is the most important part of your proposal. Please make sure we can access it and it is complete when you submit this proposal. “Placeholder” or inaccessible ideas pages will be grounds for an automatic rejection for participation in Google Summer of Code 2014.

https://github.com/Theano/Theano/wiki/Gsoc2014

What is the main development mailing list for your organization?

[email protected]

What is the main IRC channel for your organization?

Instead of IRC, we use a user mailing list: [email protected]

Who will be your backup organization administrator?

Arnaud Bergeron

What criteria did you use to select the mentors? Please be as specific as possible.

We selected mentors on the basis of their familiarity with the code, familiarity with particular application areas, and their stability / reliability. Frederic Bastien is a full-time employee of the University, and has overseen all aspects of Theano's development for several years. Arnaud Bergeron is also a full-time employee, and has worked extensively with Theano for the last few years, especially on the nascent OpenCL support. James Bergstra was one of the original designers of the library, and has continued to be active in the design of Theano and related libraries in the Python machine-learning space.

What is your plan for dealing with disappearing students? Please be as specific as possible.

What we have seen from other organizations is that students who communicate openly with the community have more success and stay longer with the project. We will therefore push in that direction, both during selection and with the selected student(s). Regular contact with them will help prevent them from disappearing. We will ask for at least a weekly update on the status of the project, and we will encourage even more frequent contact.

We will also give preference during selection to students who are able to keep in frequent communication with us, for example when discussing their project proposal or in pull requests.

We will ask students to put their code on GitHub so that we can review it frequently and guide them better.

In the case where contact with a student becomes difficult, infrequent or nonexistent, we will schedule a meeting via Hangout or some other real-time system. We will try to have a real discussion about what is not working and see what can be done to solve the problem. The goal is to fix the situation and have a good contribution at the end of the project.

If a student stops being responsive, we will ask them for daily reports to make sure their project doesn't fail. A lone, inexperienced student working without communication will rarely produce good work.

If this doesn't fix the problem, we will raise it on the GSoC mailing list, as other organizations have already faced this situation and could have good suggestions to help us. As a last resort, we will need to fail the student. We will make sure the student is not surprised by this, by explicitly warning them about this possibility after failed attempts to fix the issue.

Working in a large research lab with contributors around the world has given us a lot of experience in dealing with various communication gaps in constructive ways.

What is your plan for dealing with disappearing mentors? Please be as specific as possible.

The machine learning research lab that supports Theano's development has several full-time faculty and postdocs and tens of students, and two of the proposed mentors are full-time employees of the lab. In the unlikely event that the mentors disappear, there are several other qualified people who are capable of and interested in standing in to ensure the GSoC projects are successful.

Students will be encouraged to ask for help on the mailing list when they need something. If a mentor becomes less available, even temporarily, the rest of the community will be there to help.

What steps will you take to encourage students to interact with your project's community before, during and after the program?

We will strongly encourage students to submit a PR before their proposal. For this first year, making this a hard requirement would be tough, as potential students might not have seen Theano as a possible mentoring organization and will not have as much time to do this as they would with a veteran organization. We will still push in that direction, even if the contribution is very small (typos, trivial fixes). For students without a code contribution, we will ask for another source from which to evaluate their programming skills.

Doing even a trivial PR will force communication between the applying student and the mentors. Students who aren't able to do a minimal PR will probably spend too much time learning the basics to produce good work during the summer.

We will ask students to blog every week about their work. This will get them into the habit of communicating with the full community and not just their mentor. We will also ask that they communicate well with their mentor about their progress, and we will strongly encourage this student/mentor communication to be open to the full community. Other organizations have found that students who communicate in this fashion stay longer with their projects after the summer, so we are going in that direction. This will also allow other people in the community to stay up to date and possibly offer other suggestions.

We will also prefer applicants who have a vested interest in using Theano afterward (e.g. for their research) and applicants who have worked on related projects, and want to build bridges.

Are you a new organization who has a Googler or other organization to vouch for you? If so, please list their name(s) here.

We have 5!!!!

(we need names ...)

Are you an established or larger organization who would like to vouch for a new organization applying this year? If so, please list their name(s) here.

N/A

What will you do to encourage that your accepted students stick with the project after Google Summer of Code concludes?

Reports from other organizations indicate that including students in the community during the summer helps to keep them contributing. So we will try to make them feel they are part of the community.

Another important factor is that we evaluate and recognize contributions and suggestions in our community from everybody. That way people feel welcome to stay and contribute.

We will also tell students that we would appreciate it if they stayed in our community and continued to contribute. We can't force that (and it would be bad to try), but making it clear that it will be highly appreciated can influence them to stay. They will see a future for themselves in our community.