GSoC 2015 Application Shivam Vats : Series expansions - sympy/sympy GitHub Wiki

About me

Contact Information

Name : Shivam Vats
University : Indian Institute of Technology, Kharagpur
email-id : [email protected]
github username : shivamvats
Blog : shivamvats.github.io
Time-zone : UTC + 5:30

Personal Information

My name is Shivam Vats, a second year undergraduate student at IIT Kharagpur, India. I am pursuing a degree in Mathematics and Computing.

I use Ubuntu 14.04 LTS on my workstation and Vim (with some really cool extensions) as my primary text editor. I've been using Winpdb for debugging and understanding the code flow, for some time now, and think it's really awesome. I've been coding in C and C++ for over 3 years and in Python for under a year and am proficient in all three of them now. I was introduced to Cython while working on CSymPy and I find it very interesting.

I am very much interested in robotics and work for the Autonomous Ground Vehicle group of my college. I have worked on and implemented Bayesian filters (mostly Kalman and Extended Kalman) to solve the problem of localization of a car.

My mathematical background is sufficient for the project, with relevant courses taken being Discrete Mathematics, Computational Number Theory, Linear Algebra, Numerical Methods, ODE and PDE. I am also familiar with real and complex analysis and abstract algebra.

Contributions

I was introduced to Sympy by Harsh Gupta in October, 2014 and have been contributing ever since. Here is a list of all the merged and unmerged pull requests in chronological order.

###Sympy

  • (Merged) Corrected as_leading_term in hyperbolic functions (fixes #8455 ) #8456. Caught the bug while working on issue #8369

  • (Merged) Fixes simplification of more than two summations #8608

  • (Merged) Added sech and csch under ReciprocalHyperbolic class #8401. Despite being my first pull request, it is my most important contribution to Sympy. I was introduced to the Sympy workflow while working on this PR and hence it probably took a much longer time than it should have. It refactored the hyperbolic functions to create a ReciprocalHyperbolicFunction class and then added sech and csch to it. I also caught and raised #8421 while working on it.

  • (Unmerged) Added is_even and is_odd methods to Min and Max #8646. This was made redundant by a better patch #8652

  • (Merged) Added line_color option to plot_implicit #8693. I almost read the whole of plotting code while working on this. I compiled and posted a list of possible improvements to the plotting module at Improving the plotting module.

  • (Unmerged) Modified default range to interpret the assumptions of free_vars #8701. This was an attempt to integrate assumptions with plotting, with a hack. However, the hack was not good enough and wrong in some sense.

CSymPy

I liked the idea of having a fast C++ library and started working on CSymPy in December, 2014

  • (Merged) Fixed typos in documentation #349 A minor doc fix.

  • (Merged) Fixed wrong printing of pow when base is Mul #359.

  • (Merged) Added the method symbols #361. Ported the code of symbols from Sympy to CSymPy's wrappers. It led to a discussion on the best way to test Python wrappers.

  • (Merged) Allows expand to work with pow instances with negative powers #373

  • (Unmerged) [WIP] Implementation of the basic structure of Polynomial class #406. It is an initial structure of the proposed Polynomial class.

Theory

A series is the sum of the terms of a sequence, usually involving the powers of a variable.

Every function can be represented in the form of a series in the powers of one of its variables. That is called its series expansion.

An analytic function is one that is equal to its series expansion at every point in its domain. Series expansion is calculated about a specified point lying in the domain of the function. Every function has a unique series expansion at a given point and is mainly of one of three types:

  1. Taylor Series : Let us say, we want to calculate the series expansion of a function at a in its domain D. If the function is analytic in a disc centered at a, the function will have a Taylor series expansion. A Taylor series has only positive exponents of the variables and is calculated as follows:
    Taylor
  2. Laurent Series : If the function has singularities, then Taylor series is not always possible. If the function is analytic in an annulus between two circles C1 and C2, centered at a and with radii r1 and r2, respectively, then there exists a unique series expansion for the function in terms of positive and negative powers of (x-a). Thus, Laurent series is an extension of Taylor series, that allows negative powers.
  3. Puiseux Series : It is even more general in that it also allows fractional powers of the variable. A Puiseux series in indeterminate T can be considered to be a Laurent series in t*1/n, where n is positive.
    Puiseux

Series expansion and series manipulation are of fundamental importance when dealing with most mathematical operations. In most practical situations, an exact solution is not possible and we need to find a sufficiently approximate solution. An immediate example is evaluating exp(x):
exp(x)

We approximate the function using its series expansion. We can get as close to the actual value as we want, by simply considering more terms of the expansion.

For many common functions like exp(x), sin(x), log(1+x), we have a formula, which gives us as many terms of the series as we want.

Even though the final series is infinite, manipulations of such series is easy as we only need to do the same operation on the formulas. However, such a closed form is not always possible. In such situations, when we have to manipulate two infinite series, we can deal with only a part of the series. So, depending on the accuracy we want, we operate on polynomials of some chosen degree n. For example:
p(x)

By doing so, we are effectively converting the problem to one of polynomial arithmetic.

A polynomial is an expression consisting of variables (or indeterminates) and coefficients, where the variable can have only non-negative integer exponents. eg:
p(x)

Some Definitions

There are two ways to store polynomials:

  • Dense representation: Storing an array of all the coefficients even if it is zero, starting from the highest degree down to the constant term.
  • Sparse representation: Storing the coefficients and powers in a dictionary or hashtable. eg. x as {(5, 2, 3): 3, (1, 0, 0):1}.
  • Kronecker's trick: Kronecker's substitution trick is way to convert polynomial multiplication into a problem of integer multiplication. It packs the coefficients of the polynomial into a large enough integer. After multiplying, the integer can be unpacked to get back the coefficients. Consider the problem of squaring p. The result is this polynomial: p(a) Now take p evaluated at 10^5 and square this integer to get 1156038080844014856161641404008100. The coefficients can be easily read off from the integer (by modulo operation).

The Project

My proposal is to implement a fast, class-based, sparse series expansion infrastructure in CSymPy and Sympy, along with a fast Polynomial module for CSymPy, that can handle series related operations. The proposed series and Polynomials module for CSymPy will be among the fastest and comparable in speed with or faster GiNac, Mathematica, Maple. The new Series class in Sympy will significantly improve the performance of series expansion.

Motivation

Series-expansion

Most advanced algorithms for computation of limits, including Gruntz[4] (that Sympy uses), make heavy use of series expansion. Further, applications are in solving differential equations, partial differential equations, integration, and others. Projects such as PyDy need very fast trigonometric calculations and series expansions. A slow series expansion, written in Python, will always be a speed bottleneck for it. A much faster solution is to implement optimal algorithms in C++ and then use Python wrappers for use with PyDy, which is what I (and CSymPy in general) am proposing.

Why series expansion in CSymPy

Sage currently uses Pynac as its main symbolic engine. The core of Pynac, in turn is GiNac.

Following is the benchmark[10] tests for CSymPy vs Sage on some of the benchmarks from http://wiki.sagemath.org/symbench:

                                                                         Factor
Time for R1     CSymPy :    0.00066423 s     Sage   :   0.0044119358 s   6.64218087108 
Time for R2     CSymPy :    0.02706814 s     Sage   :   0.2184009552 s   8.06856160785
Time for R3     CSymPy :    0.00001597 s     Sage   :   0.0002391338 s   14.9739386349
Time for R5     CSymPy :    0.00000691 s     Sage   :   0.0281481743 s   4073.54186686
Time for S1     CSymPy :    0.01834798 s     Sage   :   0.1464669704 s   7.9827300008
Time for S2     CSymPy :    0.02388787 s     Sage   :   0.2697858810 s   11.2938441561
Time for S3     CSymPy :    0.02070594 s     Sage   :   0.1418960094 s   6.85291319303
Time for S3a    CSymPy :    2.09762287 s     Sage   :  14.9792959690 s   7.14108154675

CSymPy clearly outperforms Sage in all the tests by large margins.

Here are some benchmarks[10] against GiNac core:

                   CSymPy          GiNac         Factor
Time for R1 :    0.002495569    9.078088352    3637.68276974
Time for R2 :    0.013004237    0.015011056    1.15432039573
Time for R3 :    0.000003073    0.000003073    1.0
Time for R4 :    0.000005518    0.000031359    5.68303733237
Time for S1 :    0.018381039    0.037988210    2.06670634886
Time for S2 :    0.026061345    0.075487462    2.89652978386
Time for S3 :    0.021999655    0.027604139    1.25475326772
Time for S3a :   2.234145347    2.170938707    0.971708805748

CSymPy outperforms even GiNac core in almost all benchmark tests. Thus, CSymPy is being written with single-minded focus on speed. Moreover, the option of optional thin wrappers makes the speed easily accessible from a number of languages.

The goal of CSymPy is to be have a speed comparable or better than other CAS, including Mathematica, Ginac and Maple. All of them have robust Series and Polynomial modules. For CSymPy to be able to compete with or replace them, it has to have a robust and fast series module. A big goal of CSymPy is to replace Pynac in Sage[8]. However, compared to GiNaC, CSymPy is missing specialized polynomial manipulation and series expansion and pattern matching. My proposal will greatly further CSymPy's candidature to replace Pynac.

  • Series expansion and polynomials are the core modules of any CAS, over which rest of the functionality is built. My proposal will build the core of CSymPy and prepare the base for addition of more advanced modules, all of which will benefit from the fast core. Effectively, it will give Sage a much faster alternative for symbolic computation.

Polynomials

A polynomial module is the backbone of any CAS. It finds application in solving equations, factorization, series expansion, mechanics, Physics, and so on. Series expansion, when used for functions evaluations and approximation, is effectively a polynomial. Thus, the core of series expansion will be CSymPy's polynomial module. All the optimized algorithms will be implemented in this module. The series module will merely be its client. This way, as and when Polynomials are improved, series expansion will improve automatically. Sympy's current poly module is quite robust and extensive and hence should serve as a model for CSymPy too.

Audit of other CAS

Series Representation

  • Mathematica uses a class called SeriesData to represent a class. It uses dense representation and stores all the coefficients of the series in a list (even if some of them are zero), series variable and the expansion point.
  • GiNac uses sparse representation and stores the series in a class called pseries in a list of (coefficient, power) pairs. The class also stores other information about the series like expansion point, etc. They have a function that lets the user to convert a series into a normal expression and vice versa.

It is clear that most popular CAS favour a class based representation of series. However, there is no consensus on whether to use dense or sparse representation.

Polynomial representation

  • Piranha is quite similar to what we want Polynomial to be capable of, in CSymPy. Piranha uses templates everywhere in its code. Consequently, its polynomials are totally generic, i.e, one can create a polynomial over any ring and with any type of exponents. It uses Kronecker's substitution trick for polynomial multiplication and also has a special static implementation of vectors that is faster than std::vector.

Proposed Design

Sparse Polynomials

However, Polynomial representation is predominantly dense in Sympy. Using a dense representation makes sense only for univariate polynomials. The number of terms of a dense multivariate polynomial is huge. Consider a completely dense polynomial in k variables of total degree n_1..n_k. Then the number of terms of this polynomial is as large as frac. Suppose k = 5 and n = 50, assuming that coefficients are native 32–bit integers and are stored in an array, then we would need almost 80 GiB of memory to hold this kind of polynomial in a true dense representation[7]. Though such a purely dense polynomial is rarely encountered, a sparse representation will be faster and more efficient than a dense one. Even for polynomials of small degrees, sparse representation gives comparable speed[5].

Moreover, multiplication algorithms in Sympy are not optimised, and in many cases are naive. Kronecker substitution can give significant speed gains.

A speed comparison
expr = (2*x + 1)**1000
R.dup_mul(expr, expr)   #Sympy's univariate multiplication, using Karatsuba
Time : 1.30838489532
R.kronecker_mul(expr, expr)   # Univariate multiplication using Kronecker substitution
Time : 0.579982995987

See Kronecker Substitution for its Python code. The timings are with an unoptimised Kronecker substitution implementation, but does manage to show why Kronecker is better.

Hence, a combination of sparse representation and Kronecker substitution should give the fastest possible result in CSymPy[6].

Class for Series Expansion

The current approach in Sympy in which a series is simply a core object, i.e, a sum of core objects, has worked quite well till now. Its design is very simple, yet it supports a wide range of operations. However, the current design puts a lot of constraints on it, the most important being, slow speed. Due to this, the Sympy series design cannot be used in CSymPy. Following are the major issues with it and my comments about them:

1. All types of series are grouped under the same name

The user never knows what series to expect when, he calls the series function and he has absolutely no information about the properties of the series from its data type. There are different types of series, with markedly different properties. In many cases, it is advantageous to know what series to expect, so that one can make series specific optimizations in one's code. For example, if one knows that one should expect a Taylor series, on calling a function, he need not bother about handling negative exponents, if he, let's assume, wants to transfer the series to a different data structure.

2. Series expansion is slow and naive.
In [51]: %timeit s = (1/cos(y/log(y))).series(y,0, 10)
1 loops, best of 3: 2.6 s per loop

It takes 2.5 seconds for just 10 terms!

%timeit t = exp(y).series(y, 0, 10)
1 loops, best of 3: 580 ms per loop

Now, 0.58 s might not seem much, but when used in an algorithm that needs repeated calls for series expansion, for example PyDy, the total time taken is exorbitant.

Why series expansion is slow

All elementary functions like, log, exp, sin, etc have their series expansion calculated by naively using their Taylor expansion formula. However, much more efficient algorithms do exist. In fact, rs_log and rs_exp are already implemented in ring_series and are much faster than the naive methods that are currently used. A comparison:

In [53]: %timeit t = log(1+y).series(y, 0, 100)
1 loops, best of 3: 3.71 s per loop

In [54]: %timeit s = rs_log(1+x, x, 100)
100 loops, best of 3: 10.6 ms per loop

A factor of 2300 is simply too much to ignore. Similar efficient methods to calculate power series of elementary functions do exist.

3. Series expansions are stored as sum of core objects

I will implement sparse representation for series, on top of polynomial rings in ring_series. ring_series already uses dictionaries and lists, hence it won't be difficult. For univariate monomials the key is just the exponent of the monomial, while for multivariate polynomials, a tuple of their exponents is used. Kronecker's trick will give significant speed improvement, especially for univariate polynomials.

In [48]: from sympy.polys.ring_series import rs_exp, rs_mul
In [49]: q = rs_exp(x**2, x, 101)  #Series of exp(x**2) stored in a sparse dictionary
In [50]: t1 = clock()
    ...: b = rs_mul(q,q,y,101)     #Sparse multiplication
    ...: t2 =clock()
    ...: t2-t1
    ...: 
Out[50]: 0.0010459423065185547
In [56]: p = exp(y**2).series(y, 0, 100)  #Series of exp(y**2) stored as a core object
In [57]: t1 = clock()
    ...: a = expand(p*p) 
    ...: t2 =clock()
    ...: t2-t1
    ...: 
Out[53]: 0.0587308406829834    

Manipulation over core objects is slow by a factor of 50. This, of course is a slightly sparse series, however sparse multiplication will still be faster with dense series.

Sparse representation based power series based polynomial rings is partially implemented in polys/ring_series.py based on the work of Mario Pernici. However, a greater part of his work is unmerged (https://github.com/sympy/sympy/pull/609/files). The best approach is to build on his representation and complete it. Such an approach will give the best speed[2].

4. Core has to deal with order.

Core should not bother about the order of the series. Core does not have specialized data structures and algorithms to handle the Order term. Hence, we can treat the Order term as another series, on which the regular rules of series expansion work. When required to be printed, the series can be printed along with its order term in the proper format.

Implementation Details

I plan to divide my project into three phases. At the end of each phase, we'll have something that can be merged directly into the code base.

Phase I

I'll start by implementing a basic fast polynomial module in CSymPy.

The polynomial module will have the following features/operations:

  1. A class based extensible structure compatible with Sympy
  2. Use of sparse representation
  3. Fast basic operations including addition, subtraction, multiplication, division

There are two common ways to implement sparse multivariate polynomials-

  • The tuple representation using std::vector or std::array where we have a map in which the vector/array stores the powers of the variables and an int or rational stores the coefficient.
  • Hashtable implementation using Kronecker substitution: We pack the exponents into one integer, e.g. if each requires 8 bits, then we need a total of 24 bits and we can use up to 64bit integers (or perhaps 128bit using the __int128 type in C++). So we pack them into one integer, and then use that integer as the key in the hashtable. Then, multiplying two monomials means, the final key is got by adding the keys of the operands. For example, if we have a monomial monomial, we can use 502030 as its key in the hashtable that maps (5,2,3) with (3).

Corresponding to these two methods, there will be two layers of implementation:

The interface layer

CSymPy uses Teuchos::RCP smart pointer for reference counting. To make Polynomials work with Sympy, it needs to use RCP and hence will be subclassed from Basic. Further, this will use std::vector to store the exponents. Here is a working implementation of a univariate Polynomial class. It broadly follows the proposed structure, except that we don't need a vector for univariate polynomials

namespace CSymPy {

    class Polynomial : public Basic{
    public:
        int degree;
        std::string var_;
        map_uint_integer dict_;  //Will be changed to use fast integer class
    public:
        IMPLEMENT_TYPEID(POLYNOMIAL)
        Polynomial(){}
        Polynomial(const std::string& var, map_uint_integer&& dict);
        std::size_t __hash__() const {return 0;}
        bool __eq__(const Basic &o) const
        {
            const Polynomial &s = static_cast<const Polynomial&>(o);
            return this->var_ == s.var_;
        }
        int compare(const Basic &o) const
        {
            const Polynomial &s = static_cast<const Polynomial&>(o);
            if (this->var_ == s.var_) return 0;
            return this->var_ < s.var_ ? -1 : 1;
        }

        virtual vec_basic get_args() const {return {};}
        virtual void accept(Visitor &v) const;
    };  //Polynomial
    
    RCP<const Polynomial> add_poly(const Polynomial &a, const Polynomial &b);   
    RCP<const Polynomial> sub_poly(const Polynomial &a, const Polynomial &b);
    RCP<const Polynomial> mul_poly(const Polynomial &a, const Polynomial &b);
    RCP<const Polynomial> div_poly(const Polynomial &a, const Polynomial &b);
    inline RCP<const Polynomial> polynomial(std::string i, map_uint_integer&& dict)
    {
        return rcp(new Polynomial(i, std::move(dict)));
    }
}  //CSymPy
  • The coefficients of the polynomial may belong to a ring. There are two ways to allow them to be generic:
    • The polynomial class can store that information in a ring member variable, so that operations specific to that ring can be dispatched on the specific ring. This will be less flexible but simpler to implement and read.
    • We will use templates to allow for any ring to be used and make polynomials fully generic, i.e, control the types of both the exponents and the coefficients. Polynomial<int, Rational> f; Creates a polynomial with integer coefficients and rational exponents. However, use of templates leads to a less readable code. I will decide the best approach to implement by discussing with my mentor.
  • The Polynomial class will store the polynomial, its degree and will have arithmetic operations add, sub, mul, etc defined for polynomials. This will be the layer visible to the user. However, it is currently slow due to overhead of RCP. Further, to be able to store multivariate polynomials, it needs to use std::vector. However, std::vector allocates memory on the heap and is hence slow.
  • A faster vector that uses static allocation will solve this issue. See Piranha's static vector for reference.
  • The dict_ will eventually use a fast integer type that switches between machine int and mpz, depending on the size.Implementation of smart integer class
  • The class will have user functions like eval, invert, poly_from_list. More functions will be added to give this module a similar level of functionality as Piranha
The fast layer:

It will basically be a wrapper around a fast hashtable implemented using Kronecker substitution. Consequently, it will be much faster.

vector<long int, fast_int> 
poly(map_vecint_int dict){    //vecint stores the exponents of the monomials in a 
    vector<long int, fast_int> htable;  //vector
    convert_to_kronecker(a, htable);  //Converts the sequence of exponents of 
                                      //every monomial into an integer which 
    return htable                     // can be used as the key this hashtable
}

Now, all the keys are integer. Multiplication is an addition of keys and multiplication of coefficients. Addition is multiplication of coefficients. Finally, exponents can be extracted using modulo operation.

One possible issue with Kronecker substitution is that the exponents may be too large to be packed in __int64 or __int128 or there might be an overflow due to operations. In those cases, an exception will be raised with the help of a guard bit that gets switched to 1 and the user will need to use the vector implementation.

Phase II

Implement Series class in Sympy.

The best way to represent a series class is debatable and not so straightforward. One goal of this phase is to discuss and try different possible class structures for series. I have not yet decided which structure to use, but I do have the following as a starting point:

    FormalSeries     
          FormalPowerSeries   
          FormalLaurentSeries    
          FormalPuiseuxSeries     

The FormalSeries is an abstract type and different types of series are subclassed to it. To me, it seems to be the most intuitive structure as addition of other series like, FourierSineSeries etc, is seamless. I have added a basic Python implementation based on this structure a little later in my proposal.

The reason I want to implement it in Sympy first, is twofold. First, experimenting with code is much easier in Python, than in C++. So, a lot of variations can be tried. Once, we agree on a particular design, I will port it to CSymPy. And secondly, series expansion in Sympy does suffer from a number of issues.

A semi-working example involving univariate series on QQ:

class FormalSeries(object):
    def __init__(cls, data):
        pass

class FormalPowerSeries(FormalSeries):
    def __init__(self, data):
        if isinstance(data, PolyElement):
            self.series = data
            self.ring = data.ring
        else:
             self.ring, self.series = sring(data, domain=QQ)

    def __repr__(self):
        from sympy.printing import sstr
        return sstr(self.series, order=None)

    def __str__(self):
        from sympy.printing import sstr
        return sstr(self.series, order=None)

    def __add__(self, other):
        return FormalPowerSeries(self.series + other.series)

    def __mul__(self, other):
        if(self.ring.ngens == 1 and other.ring.ngens == 1):
            x = ((self.ring).gens)[0]
            prec = min((self.series).degree(), (other.series).degree()) + 1
            return FormalPowerSeries(rs_mul(self.series, other.series, x, prec))

    def __pow__(self, n):
        if(self.ring.ngens == 1):
            x = ((self.ring).gens)[0]
            prec = (self.series).degree() + 1
            return FormalPowerSeries(rs_pow(self.series, n, x, prec))

    def __div__(self, other):
        if(self.ring.ngens == 1 and other.ring.ngens == 1):
            x = ((self.ring).gens)[0]
            return self * other**(-1)

class FormalLaurentSeries(FormalSeries):
    def __init__(self, data):          #Converts a laurent series into a power series by multiplying the 
        x = (data.atoms(Symbol)).pop() #  whole series by the lowest negative exponent.
        lead = data.extract_leading_order(x)
        if lead < 0:
            self.ring, self.series = sring(data * (x**(-lead)), domain=QQ)
        else FormalPowerSeries.__init__(data)

Ultimately we should be able to initialise the series as

x = Symbol('x')
expr = x**2 + 2*x + 1
f = FormalPowerSeries(expr, gens=x, prec=3, domain=QQ) 

or

R, x = Ring("x", QQ)
f = FormalPowerSeries(x**2 + 2*x + 1, prec=3)

Hence, in this phase I will:

  1. Add efficient algorithms for series of the following functions:

    • sin, cos, tan
    • sinh, cosh, tanh
    • lambert
      The starting point for all these functions will be pernici's unmerged pull-request[13], which includes implementations of all these functions. For further speed gains, I intend to use argument reduction and series splitting, wherever possible. Argument reduction consists of three steps:
    • Transform the argument into a reduced argument
    • Calculate individual series
    • Combine the series using functional identities, eg. elementary addition formulas of sin, cos, etc.

I plan to implement them by using Modern Computer Arithmetic by Richard P. Brent and Paul Zimmermann for reference. Even porting of [13] will result in significant speed-up. I can add more functions at this stage if time permits. It can be decided by discussing with the mentor.

  1. Add more user functions to finish ring_series.
  2. Add methods to the series class so that it can use the functions of ring_series

Phase III

By the time I reach this phase of my project, I'll already have implemented a fast polynomial module in CSymPy and figured out the right approach to implement class based series expansion.

Port series expansion to C++

The series class written in Phase II will serve as a model for CSymPy. It is much easier to experiment with code structure in Python. However, I will also make use of the speed benefit of working in C++

  • In Python, we can not customise Python dictionary and list to become faster. In C++, there is no such restriction. Series will be stored in a fast hashtable that uses Kronecker's trick with multivariate series.
  • Series will use the fast vector of polynomial module and the fast integer class.

Write Sage wrappers

To allow CSymPy to be used with Sage, I will write Python wrappers. This won't be particularly difficult as the basic steps will be similar to interfacing SymPy with CSymPy, for which wrappers have already been written:

  • Convert Sage data types to CSymPy's types
  • Add _sage_ methods
  • Add missing methods required by Sage objects, like as_real_imag, etc
    Part of this stage will also be to write extensive tests as all corner cases will need to be covered. The wrapper will allow sage to use the fast core of CSymPy.

Timeline

  • Note 1: I have no major plans for summer. Contributing 40-50 hours a week will not be a problem as I will try my best to learn as much as I can, during the project. My summer vacation starts on 30th April. Hence, I will start coding a month earlier than the GSoC coding period, effectively giving me 4 months to complete the project. Though my classes start in mid July, it won't be an issue as I won't have any tests or exams then.

  • Note 2: I will be writing the tests and documentation for every class and function as I implement it.

###Community Bonding Period (1st May to 24th May) I have been contributing to Sympy-CSymPy for over 6 months now. I am very much familiar with the community, their code base and review process. Hence, I won't need to spend time learning the basics and will be productive from Day 1. A number of design decisions have to be made in my project. I will discuss and finalize some of them during this period.

  • Decide which framework to use with Polynomials
  • Decide what should be added to ring_series
  • Get in touch with Mario and other contributors for their views
  • Implement static vector

Community Bonding Period Week 1 & 2

  • Get the interface-layer of polynomials ready for merge
  • Finish fast integer implementation
  • Modify the polynomial class to use fast integers and add functions

Week 3

  • Implement Kronecker substitution along with fast polynomials.

Week 4

  • Benchmark the fast poly and optimise it
  • Write Python wrappers

Week 5

  • Complete ring_series

Week 6 and 7

  • Finalize series class design and set up basic structure
  • Buffer to finish all pending tasks for mid term evaluation

Week 8

  • Implement basic series functionality in Sympy and get it merged

Week 9 and 10

  • Port the basic series structure to C++

Week 11

  • Implement additional functions
  • Benchmark the implementation against existing systems

Week 12 and 13

  • Write Sage wrappers
  • Finish pending tasks
  • Buffer period

Post GSoC

There will still be a lot of work left, especially in CSymPy's polynomial module. I would help in fixing bugs and reviewing pull requests to Sympy and CSymPy.

How I fit in

I have been involved with the Sympy community for over 6 months and with CSymPy for over 4 months. During this period I have worked on a wide variety of issues and a number of my patches have been accepted. Hence, I am very much familiar with the code and community guidelines. On a number of occasions, whenever I have needed help on issues, the community has been very forthcoming. Consequently, I am familiar with a number of modules and the challenges in building an efficient new code structure.

My project will build entire modules for series expansion and polynomials from scratch in CSymPy. My work in Sympy is also more of a code structure design and optimization problem. While doing so, my experience with Sympy and CSymPy code will complement my proficiency in C++ and Python. I don't claim to have to have covered all the loose ends, given that part of the project is also to identify and implement the most efficient series class structure. But I do have a clear understanding of what the requirements are and have spent time in designing initial code structures. More than anything else, I want to read and learn new things, as and when they come up.

References

  1. http://www.cs.berkeley.edu/~fateman/papers/polysbyGMP.pdf
  2. https://groups.google.com/forum/#!searchin/sympy/series/sympy/hiRuIHa8ImA/JLOBsMr9yUcJ
  3. http://link.springer.com/chapter/10.1007%2F978-3-319-02297-0_8#page-1: Contains definitions and algorithm for multiplication of DMP (Distributed Multivariate Polynomials)
  4. http://www.cybertester.com/data/gruntz.pdf
  5. https://mattpap.github.io/masters-thesis/html/src/conclusions.html#polynomial-arithmetics
  6. https://groups.google.com/forum/#!topic/sympy/TVEp72mZ3Uo
  7. https://mattpap.github.io/masters-thesis/html/src/internals.html
  8. https://groups.google.com/forum/#!topic/sage-gsoc/WbmAJAaGlhs
  9. http://arxiv.org/pdf/0712.4046.pdf?origin=publication_detail
  10. https://github.com/sympy/csympy/pull/374
  11. Mailing list discussion on my proposal
  12. Discussions on gitter with Ondrej
  13. Pull request by Mario
  14. https://github.com/sympy/csympy/pull/408