GSoC 2012 Report Sergiu Ivanov: Category Theory Module - sympy/sympy GitHub Wiki

========

About Me

My name is Sergiu Ivanov and, at the beginning of the Google Summer of Code 2012 program, I was studying mathematics and informatics (with a bias for algebra). Almost immediately after the end of the program I moved to France to start my PhD on biologically-inspired models of computing. I am interested in functional programming, cell biology, abstract algebra and, of course, category theory :-)

Project Overview

This summer I was accepted with an idiosyncratic project of my own: starting a category theory module in SymPy, under the mentorship of Tom Bachmann. I was (and still am) deeply grateful to the SymPy community for their exceptionally open attitude to side ideas!

The full proposal can be found here; the proposal also includes a short theoretical introduction which should be sufficient for grasping the essential concepts behind the project idea. I will therefore only sketch the general roadmap that was in plan at the beginning of summer.

As it can be easily inferred from the name, the goal of my GSoC 2012 project was to start a category theory module in SymPy. Obviously, category theory is a relatively large and complex domain of mathematics, so I had no expectations to cover all of it, or even to go too far beyond the very basics. The initial plan was to implement the following three major components:

  1. The base classes, i.e., the classes which would model some of the fundamental category theoretic notions: objects, morphisms, categories, and diagrams. Note that implementing functors was not planned; instead, I left functors for later stages of development of the category theory module.

  2. Automated diagram plotting, i.e., the functionality that would automatically construct a graphical representation of a supplied diagram. Instead of actually producing a graphics file, I decided to produce TikZ code. That would allow me to rely on the higher-level graphical abstractions offered by TikZ, but also to have a nice embeddability into scientific articles.

  3. Automated proof of commutativity basing on the fact of commutativity of a set of diagrams. This basically refers to writing the code that would start with a set diagrams known to be commutative, and that would then be able to decide whether a given diagram is commutative in the given circumstances or not.

In the following sections I will try to describe how the actual development workflow went, restricting the amount of technical details pertaining to the actual project, and instead focusing on the actual development activity.

The GSoC Workflow

The process of working on my GSoC project has largely gone through two phases. In the first phase, I implemented the base classes and I've also got automated diagram drawing done. I was one or two weeks late on schedule, but everything was basically going as expected, the retard being mostly caused by slight time mismanagement and overenthusiastic attitude to polishing certain details. The second phase started at the beginning of August, when I understood that I had actually badly botched the Diagram class and that automated proof of commutativity wasn't just complicated, but very complicated :-)

I will describe the development workflow in two subsections, in accordance with the two phases I mentioned in the previous paragraph. To get a more immersive perspective, take a look at the SymPy-related articles on my blog.

Base Classes and Diagram Drawing

Sketching the base classes and submitting the first large pull request was really straightforward and easy. I essentially followed the proposal, adding the not-described-but-necessary bits of functionality as I went. However, reviewing this pull request took much more time than I initially expected: instead of working a half of a day, I actually spend about five full days implementing the suggested changes. The reason was that, even though I thought I had done sufficiently much object-oriented modelling before, it turned out that I was quite a noob and that I missed several important points, both in modelling the mathematical notions themselves and in being compliant with the SymPy class structure. With the kind help of my mentor and SymPy people, I managed to refactor the original class Morphism into IdentityMorphism, NamedMorphism, and CompositeMorphism, which better represented the corresponding concepts and proved to be much easier and clearer both to write and to use. More importantly, having ideas grouped in tighter entities actually enforced very sharp semantics on the classes, often forcing the user to do things right.

Having base classes done, I took up the next task of implementing automated diagram drawing. Even though I had sketched the algorithm in my proposal, a lot of details were missing which I had to figure out during the development process itself. An important change was the decision to use Xy-pic instead of TikZ. The reason was that Xy-pic is specifically meant for typesetting category-theory-like diagrams, while TikZ is more generic, and therefore more stuff would have to be done by hand. So far, the consequences of the change have been largely beneficial, especially in what concerns the complexity of the output code.

I split the actual work on diagram drawing into two subtasks: object layout and generation of actual Xy-pic code. These two bits of functionality turned out to be sufficiently independent to submit in separate pull requests even. I spent quite a lot of time on object layout because it influences the final aspect of the diagram so much. Also, I wanted to give the user some control over how the diagram is laid out, so I added the possibility to explicitly require the objects to be arranged in a linear-like fashion, as well as the feature to specify groups of objects. The trick with groups is that each group is laid out independently from the others, and then they are laid out together as opaque entities, forming a condensed diagram. I believe this feature gives better control over the positioning of objects, but it has not been that thoroughly tested yet.

The generation of actual Xy-pic code seemed rather straightforward at the beginning, and indeed, I was able to produce plots of simple diagrams within a day or two after having finished the object layout part. However, it was the details that required much more attention: I spent about a week on getting curved arrows curve nicely and not intersect too much with other arrows. I've spent another long timespan on re-positioning morphism names in order to minimise the number of ugly overlaps of labels and arrows.

I am very glad to state that both pull requests, the one concerning object layout and the other one, dealing with generation of Xy-pic code have already been merged into SymPy's master branch. Diagram drawing has not yet been thoroughly tested, though. Furthermore, while working on my other tasks I discovered a lot of room for improvement (i.e., I tried some diagrams and some of them were laid out rather uglily). I do believe though that the heuristics I have implemented can be extended or plugged sufficiently enough to get decent automated diagram plots.

Automated Proof of Diagram Commutativity

I started working on automatic proof of diagram commutativity at the beginning of August. While this meant I was lagging behind my proposed schedule rather seriously, I wasn't too nervous, because I believed that reasoning about commutativity was rather well described in the proposal and only some minor details would have to be added to that description to get a decent implementation. It turned out I was wrong; very wrong :-)

First of all, before starting to write the code, I had to better formalise my goals. It suddenly turned out to be far from being easy. After some hours of meditation, I arrived at the conclusion that the original implementation of Diagram was badly screwed: it looked like I had stuffed two concepts into a single class: the concept of a commutative diagram and the concept of a commutative implication. When I realised this, I quickly split Diagram into two classes, which has made the code clearer, of course.

However, writing the commutativity code meant that Diagram would have to model diagrams instead of just being a container for morphisms, as it used to be in the diagram drawing period. I wasn't too much concerned about this; yet, having implemented an algorithm for find embeddings of diagrams, I discovered I had some strange hash-randomisation-related failures. When I tracked down the source of error, I found that Diagram was still badly screwed: it was supposed to store all the morphisms of a diagram, but instead it would only add some of them; different morphisms every time. It felt like slapping myself on the face and fixing a simple bug would have solved the issue; however, when I actually fixed the addition of morphisms and Diagram would automatically add all morphisms to itself on creation, it obviously failed in the cases where there were loops or cycles. I desperately tried to fix everything in an ad-hoc way, but that turned to be impossible. Now note that I realised this one week before the pencils-down date :-)

So I re-thought and re-implemented Diagram from scratch. This resulted in a completely different, much more intelligent class, which was still largely compatible with the old Diagram, so the fundamental change of internal workings did not affect the existing code almost at all. After that I had to implement embeddings of all kinds of diagrams (including infinite diagrams). I started working on this one day before the pencils-down date and I've spent two days on it, having written what I believe to be the most sophisticated and well commented piece of code I've ever done, to only throw it away :-) It turned out that I could use the concept of expanded generators which I introduced myself to deal with infinite diagrams as if they were finite. I have proved a couple lemmas on the way; they are all in my notebook, but I will put them on my blog soon. I will not go into details here; appropriate comments will be soon available in the docstring of the new Diagram class which will hopefully be merged in a couple weeks (that mostly depends on how fast I fix the issues my mentor Tom has pointed out).

Very happy with my diagram embedding functionality, I started writing the code for doing the actual proofs. I thought very little about optimisation in order to keep things as simple as possible. Unfortunately, this has resulted in prohibitively slow code. This meant that everything was so slow that I could not even check whether it worked or not. I am very seriously committed to optimising the existing approach and getting it to work though. See the "Further Plans" sections for more details.

Miscellaneous Remarks

I would like to state that this year's GSoC experience was extremely exciting, mainly because of the friendly atmosphere in the SymPy community. I found it very useful to post on the blog every week, although I was always late with my blog post :-)

I am very excited with having rediscovered test-driven development! I found out that this made it possible to write good tests, which depended very little on the structure of the code that implemented the actual functionality. And yet more fundamentally, with test-driven development one can understand much better the functionality that one eventually describes in code.

I would also like to thank my mentor Tom Bachmann for his tolerant attitude towards my always being late and eventually towards my failure to implement the reasoning about commutativity in time. I would totally like to express my corresponding appreciation and thankfulness by actually finalising the said task and by having it merged into SymPy's master.

Further Plans

My further plans include responding to Tom's suggestions about the pull request which introduces the new Diagram class, as well as bumping up test coverage of diagram drawing. I also intend to find out why some diagrams are typeset in an ugly way and fix the issues.

In parallel with that, I will start posting to my blog the proofs necessary to explain why my idea about proving the commutativity of a diagram will actually work, just as I have promised before. After that I will work on optimising the reasoning procedure so that it runs much more efficiently.

I'd also plan to work a little bit on automated diagram drawing and implement the possibility to force certain coordinates on some objects in the layout of a diagram, as well as on extending the part which produces actual Xy-pic code.