GSoC 2011 Report Tom Bachmann: Definite Integration - sympy/sympy GitHub Wiki

GSoC 2011 Report Tom Bachmann: Definite Integration

This page is a report on my work on SymPy as part of the "Google Summer of Code" program.

About Me

My name is Tom Bachmann, and I study mathematics at the University of Cambridge, England. I will enter part II of the Mathematical Tripos in october 2011, which obviously will be my third year of undergraduate study ;). My primary interests are arithmetic, analytic and algebraic geometry, which is why I applied to implement a heuristic symbolic definite integration algorithm ;).

Introduction

I know it is advantageous to do (lots of) internships or similar. However, I had to sort of settle on the conclusion that there are just no internships in pure maths. I have known about the existence of GSoC for a long time, and sort of by accident one day I found myself browsing through the accepted organisations and proposed projects for 2011. As it turns out, there indeed seemed to be one organisation offering projects that I was vaguely interested in: mathematical computations not involving floating point numbers. Indeed there was a proposed project in symbolic integration, and even though I knew nothing about how to implement this it had always seemed intriguing.

I am not going to describe my application process in great length. Needless to say, I looked at the literature, read up on the theory involved, and thought hard about what I believed I would be able to do. Then I wrote a draft proposal and kept bugging people on IRC to tell me what else they wanted to know; special thanks go to Ronan and Aaron (the latter also my mentor) for comments at this stage. During this time and the ensuing waiting period I familiarised myself with SymPy by fixing bugs. I started with random stuff tagged EasyToFix. Eventually I hit an EasyToFix issue that turned out to involve rewriting half of the gruntz algorithm (for computing limits)...

Eventually my proposal was accepted and I spent the summer coding.

Looking back, I'm surprised to say that I implemented pretty much what I proposed (although the proposal was admittedly vague about the second half). My contributions can be split into four categories:

    1. Miscellaneous improvements.
    1. Addition of new special functions.
    1. The hyperexpand() function.
    1. The meijerint family of functions.

GSoC Review

GSoC started well before the end of my academic term. For this reason, I of course began coding even before the official GSoC start ;). In all seriousness, in cambridge you actually get about six weeks of preparation time for exams with no lectures at all! So while it was clear that I could not be working full time during term time, indeed not at all during examination period, it was also clear to me that I would have significant amounts of free time during the preparation period to work on SymPy. (I'm the first the admit that tripos exams are extremely challenging, but how exactly are you supposed to prepare full time for six weeks?)

I began by adding bessel functions (2) and then I started the hyperexpand() function (3). Then I had exams. I also submitted my first pull request (creatively named "gsoc"). After that I started meijerint (4). This was of course the bulk of my project, so I spent a long time on it. I did lots of minor and not-so-minor improvements across many parts of sympy (1), and I also kept improving hyperexpand (3) until I felt my code produced halfway-satisfactory results. I submitted a pull request ("gsoc-2") and started further polishing. I added lerch transcendent (2, 3), improved the meijerint heuristics and performance (4) and added exponential integrals (2). As I suppose happens to everyone, I kept doing minor improvements "all over the place" (1). All code is submitted for inclusion in SymPy master ("gsoc-3").

With this rough outline of my actual working in place, in the next subsections I will describe each of the four categories mentioned in the introduction separately. This seems more advantageous than fleshing out the actual order of implementation sketched above, which as can be easily seen was a mess :-). In any case, a collection of much more lively snapshots of the development history is provided by my blog.

Miscellaneous Improvements

It would be a lenghty and unenlightening process to describe many details here. So let me just illustrate one highlight: polar numbers. There are functions defined on (say) the positive real line which, for various purposes, one might wish to extend to the complex numbers. It is clear (principle of isolation of zeros) that under reasonable circumstances there is at most one way to do this, and for many functions, e.g. (sin{x}), such continuations are well known. On the other hand there are functions, like (sqrt{x}), which have no such extension. One can define (sqrt{x}) on a cut plane (mathbb{C} setminus mathbb{R}_{le 0}), but it is impossible to (continuously) extend it further. Of course the right thing to do is to define (sqrt{x}) on a double cover of (mathbb{C} setminus {0}), any section over a simply connected subset of the base will then yield such an extension with "branch cuts" as before.

While this is a very neat solution, it is very difficult to use on a computer. This is because there is no practical uniform description (I am aware of) of all Riemann surfaces which arise in this way. Polar numbers are a sort of middle ground: they represent elements of the Riemann surface of the logarithm, i.e. the universal cover (mathcal{S}) of (mathbb{C} setminus {0}). In particular the (for this GSoC project) all-important Meijer G-functions are defined as (discontinuous) functions on (mathcal{S}).

In down-to-earth terms, then, polar numbers are complex numbers in polar form, i.e. in the form (re^{i theta}), where (theta) is remembered in full, and not modulo (2 pi). There are variety of functions implemented to deal with these numbers. exp_polar(x) is the most direct way to create a polar number, and it's meaning should be self-evident. polar_lift(z) can be used to convert an ordinary complex expression into a polar expression by lifting to the standard branch. The function unbranched_argument(z) can be used to represent/evaluate the argument of a polar number (note that this argument can of course be any real number). Symbols can be given an assumption "polar" to express that they are polar numbers.

Various identities that hold between positive reals but not in general between complex numbers hold between all positive numbers. Examples of this are (log{ab} = log{a} + log{b}) and ((xy)^a = x^a y^a). All of these things work (or should work) if the correct assumptions are in place.

New Special Functions

In the course of my work I added the Bessel (J, I, Y, K) and Hankel (H^{(1)}, H^{(2)}) functions, the hypergeometric and Meijer G-functions, the exponential integrals Ei, (E_n), the trigonometric integrals Si, Ci, the hyperbolic integrals Shi, Chi, the polylogarithm and the lerch transcendent. All come with extensive docstrings and various basic functionality such as differentiation and rewriting. I also improved numerous other special functions, like the error function and the incomplete gamma functions. In particular many of these functions come with "branching prescriptions", i.e. if the relevant function is branched at the origin and passed a polar argument outside the standard branch, it will be reduced. For example:

In [1]: Ci(x*exp_polar(2*I*pi))

Out[1]: Ci(x) + 2⋅ⅈ⋅π

The hyperexpand() Function

The hypergeometric and Meijer G-functions are very general in that they can represent the vast majority of commonly occuring special functions. They are also very convenient in that they allow us to describe and understand these special functions in a uniform way. They are inconvenient in that they are somewhat "obscure". In particular it is often possible (though far from obvious how) to express hypergeometric or Meijer G-functions in terms of more common named special functions. This is the sole purpose of the function hyperexpand(). It implements an algorithm by Kelly Roach that uses a combination of properties of said functions and lookup tables. The results are impressive and underlie any integral (or sum) computed with my code.

The meijerint Family of Functions

The real reason for my project was of course to compute integrals. In particular, by rewriting the integrand using G-functions and then employing integral theorems, many non-elementary definite integrals can be evaluated. It would be too much to go into details here, they can be found in documentation files. Let me just say that this way we can compute some indefinite integrals, some definite integrals over finite intervals, and fairly many integrals over the positive reals or the entire real line (in particular if the integrand is moderately simple). The latter category of course includes all common integral transforms, and also integrals from probability theory. There is special support for fourier, laplace and mellin transforms, and also for their inverses (which in the case of mellin and laplace transforms are computed using different trickery involving G-functions). Let me just give a very small sample to convey the flavour of the results of my work: http://pastebin.com/raw.php?i=C59eKRMk. For many more interesting and complicated examples, refer to my blog posts.

Outlook

The branch "gsoc" has been merged into master and "gsoc-2" is under active review by Aaron. This is very thorough and I feel my code is improving a lot this way. In general Aaron has been a wonderful mentor. When this is merged "gsoc-3" will have to be reviewed. I contend it may take quite some time until everything is in; I will do my best to speed up the process (in particular since at least one other GSoC branch depends on my code).

When all of my code is merged, I intend to stay with sympy. After all, a free (as in libre) symbolic CAS that I understand thoroughly sounds like an awesome item in a mathematician's tool bag. I intend to keep fixing issues related to my code and do enhancements as asked-for. However, I am somewhat exhausted and need a break from all the code I stared at for the last months, so there probably won't be a ton of innitiative from my part. Moreover, when term starts again, my amount of free time will diminish drastically.

On the other hand, maybe I will get around to starting a computational algebraic geometry module for sympy :-).

⚠️ **GitHub.com Fallback** ⚠️