GSoC 2017 Application Prateek Singh: Polynomials and the rest of SymEngine - sympy/sympy GitHub Wiki

Table of Contents

Personal Background

Details

Name : Prateek Singh
University : Vellore Institute of Technology, Chennai
Email : [email protected]
Github : prateeksingh0001
Time-zone : UTC +5:30
Age : 18

About Me

I am a second year undergraduate student pursuing Bachelors in Computer Science and Engineering at Vellore Institute of Technology, Chennai. I have a keen interest in computer science and physics, which stems from my affinity for mathematics. I work on Linux(Ubuntu 16.04) with Clion as my primary IDE for C/C++ and Pycharm as the primary IDE for python. I use Clion and Pycharm IDEs because they provide with functionalities like searching for functions and classes between files, integrated debuggers and most importantly I’m very comfortable with them which allows me to focus on the important aspect of development like the ideas to be developed upon and their implementational plan. Programming Experience I have been programming for 2 years now, on my introduction to computer science in college, I started with python with which I understood the basics of computer programming and I developed an digital version of the board game battleship on it. I learnt to code in C and C++ about an year ago and I have mostly used it for competitive programming.

I am also a machine learning enthusiast, during my internship I worked on a system, for recommending users best places to visit in their city based on their mood and other preferences like budget, time etc.

As a project in mathematics, during my first semester, I built a system in MATLAB to predict the next occurrences of eclipses among two or more planets and their corresponding star(s), based on their angular velocity and current trajectory equations. I learnt javascript while developing the college website for the annual technical fest of my college which used Angular JS and Node.js. Apart from this I am also familiar with makefiles, PHP, Latex, git and github. In the course of 2 years I have studied Linear algebra, Discrete mathematics, Algorithms: Analysis and Design, Calculus, Object Oriented Programming Differential equations, Statistics, Introduction to computer programming, Operating Systems, Theory of computation and compiler design and Networks and communication.

Contributions to Sympy

(Open) Matrix Printing: Fix printing errors in Matrix Symbol and subs() #12272

Contributions to SymEngine

(Ready to merge) Print E as exp(1) as well. #1225

Project Overview

To achieve its aim of being the fastest CAS ever, it is very crucial for SymEngine to have a seamless interoperation between the different types in it, Thus the main goal of my project is to enhance the interoperability in SymEngine and create a faster implementation for the multivariate polynomial module. SymEngine currently has two univariate polynomial classes which were developed last year by Srajan Garg and Nishant Nikhil but it lacks the functionalities like gcd and lcm of polynomials and functions like set_coeff, is_zero etc. The two multivariate polynomial classes currently use the hashmaps with integer vectors to store the coefficient of the polynomials as a result, during operations on these classes, the vectors of all entries are updated to a common format which makes them slow.

I intend to amend these shortcoming of the engine by implementing these primary and stretch goals Primary goals:

  1. Implementing functions for interoperability and a coercion framework for the univariate and multivariate polynomial classes in SymEngine.
  2. Implementing hash maps from ordered maps for storing the multivariate polynomials to increase the speed of the operations on this class.
  3. Benchmarking and writing the tests for the new implementation of the polynomial class.
  4. Writing Piranha wrappers for the multivariate Polynomial class.
  5. Adding the missing functionalities of gcd, lcm and the convenience functions for the existing univariate polynomial class.
  6. Writing test for the above mentioned methods.
  7. Generating documentation for the above mentioned classes and functions.

If I finish with my primary goals in time I will try to implement these stretch goals:

  1. Implementing the Groebner basis for polynomials.
  2. Implementing the square free decomposition of polynomials.

Having basic functions like gcd and lcm would help us in implementing other features like the groebner base, square free decomposition functions, and having a fast polynomial module would help us in achieving a fast series module, and finally having a seamless interoperability between the different classes and types in SymEngine would ultimately add to the efficiency and speed of the system.

Why this project

I have chosen to work under this project because of my keen interest in mathematics. I was introduced Sympy in February 2017 and immediately I was fascinated by the system, on understanding the mechanisms that come in play behind this computer algebra system. Having used other CAS like Maple, I always wondered about the internal working of these systems, the data structures that are used to represent the mathematical structures and the algorithms for computing the result of mathematical problems, fed to them. This I was able to experience first hand in Sympy. I shifted to SymEngine from Sympy as I feel that working under the former would not only help me increase my mathematical knowledge by improving the core modules which are the foundations of any CAS, but also would allow me to see the real life use of various data structures and algorithms that I have studied during my lectures and while doing competitive programming. Overall the reasons for choosing this project is that it would be a great learning curve for me and I would have the opportunity of interacting with some of the most intelligent minds of the open source community.

Project Description

The coercion framework and interoperability with the rest of SymEngine

For the being a superfast, it is of utmost importance for the the different types in SymEngine to have a efficient and fast interoperability among themselves as this would lead to efficient and fast calculations and at the same time would ensure the use of proper data structures to represent the results of the calculations performed on the engine.

In the discussion on the SymEngine mailing list regarding the coercion framework to enable polynomial and series module work with the other types in SymEngine, Isuru Fernando mentioned that the polynomial class needs a method for converting symbolic expressions to polynomials, and also stated some rule that would form the basis of the coercion framework that would be working with the polynomial and series classes. I will follow these rules and develop a coercion framework in alignment with these rules according to which:

  • An operation between a symbolic and polynomial would return a polynomial as the result.
  • An operation between a symbolic and series would return a series as the result, and
  • An operation between a polynomial and series would return a series as the result.

For example, if an operation is performed between a Symbolic and Polynomial, the resulting expression must checked by the engine and represented in the form of a polynomial and not any other type. Example: mul(2, (x**2 + 3*x +4)) should be stored in UIntPoly class and not in an Basic class.

Previously there has been some work done on this by Isuru Fernando. In a pull request (#1058) initiated by him he has laid down the key requirements that need to be completed in order to make the polynomial and series classes work with each other. I would be continuing my work from where it was left by Isuru. Currently in SymEngine the mul() method(multiplication) has been incorporated with a coercion framework and the add function has the same functionality for series, but currently this functionality of add method has to be extended for polynomials also which I plan to do. Furthermore I also plan to incorporate the same coercion framework in the power function.

For extending the functionality of the add() method I plan to modify the addVisitor classes for also incorporating polynomial addition and further I will overload the add_impl method which is already present in SymEngine so that it works efficiently with the polynomial(both integer and expression) types also.

Function Prototype(for add_impl):
inline RCP<const Basic> add_impl(const UIntPoly &a, const UIntPoly &b)

For the Pow I intend on defining the respective visitor classes, and dispatch methods so that they can work with the other Symbolic types present in SymEngine.

Function Prototype(for pow):
pow(const UExprPoly &a, const Basic &b)

Apart from this I also plan to refactor the Number class and complete the unimplemented functions in the same and in other files for example: expr2poly() (in rings.cpp) throws a Not Implemented Error when called for types other than addition, also in addition it throws the same error for types other than integer.

Multivariate Class Improvements

Currently the multivariate class in SymEngine uses hashmaps with vectors to store polynomials. But this poses some serious challenges to the speed of operations performed on these objects as during the operations the vectors of these objects have to be combined to a common format which is basically the union of the symbol sets of the two operands. This operation is very costly since every time a monomial in the polynomial is to be searched we have to iterate through all the monomials, the elements of the vector containing the exponents of all the different variables and their corresponding coefficients.

To eliminate this inefficiency I plan to implement the multivariable polynomial (MIntPoly and MExprPoly) classes using a hashmap with ordered maps.

The exponents of the variables in the hashmap would be used to create the hash key for the hashing function and the the coefficients would be stored in an ordered map with the corresponding exponents of each variable as the keys for the map and the coefficient of the monomial as the value of the ordered map. In case of a hash collision we can use chaining to store the key value pair. For example: The polynomial P(x, y, z) = 3 + 5z + 7z3 + 11y + 9yz + 13xyz + 8x2z2 + 9x4 would be represented in the hash map as

This representation of multivariate polynomials would be very efficient as it would take O(1) access time in average case and O(R) access time in worst case, where R is the number of monomials in the given polynomial.

On reimplementation the class will have the following structure:

  • Constructor and other relevant methods which would be needed to be reimplemented so as to conform to the new architecture of the polynomial class.
  • Ordered Map: This map will store the exponents of the corresponding monomial term as its key and the coefficient of that monomial term as its value for each term of the polynomial.
  • Variables : It would contain the number of variables and the symbols associated with each variable, because these variables will be indexed there will be a specific order in the variable which is also followed in the keys of the ordered map.

The helper function which keeps the track of the terms that vanish while performing subtraction, division and derivatives also have to be reimplemented to be able to work seamlessly with the new architecture of the polynomial class

While performing any operations between two objects of the multivariate polynomial class the program would first check if the number of variables and the type of variables in both the polynomials are same. If the above results to be true then the corresponding terms would be searching via inputting the exponents in the hash function and finding the key value pair that matches the term, upon finding which the operation would be performed else the function will handle accordingly. After the finishing of the implementation I also plan to write the tests and the benchmarks for the polynomial class.

Univariate Class Improvements

GCD and LCM functions for polynomials

In algebra, the greatest common divisor of two polynomials is a polynomial, of the highest possible degree, that is a factor of both the two original polynomials. This concept is analogous to the greatest common divisor of two integers.

The most basic algorithm for finding the gcd of two polynomials is the Euclid’s algorithm, which makes use of the euclid’s division lemma.

    if deg(f) < deg(g):
      swap f, g
    if g = 0
      return f
    else
      return gcd(g, f mod g)

However, in the case of polynomial inputs this algorithm runs in O(n^2) complexity and it would be naive to use this algorithm as it would work very slowly for large inputs. This paper [1] discusses an heuristic algorithm for the calculation of polynomial gcd which instead of solving a complicated problem right from the start, first finds a related simpler problem and from the solution of the simpler problem tries to find the solution for the original problem. This algorithm evaluates the polynomial at specific points and consider the gcd of the results in the coefficient domain, it then reconstructs the gcd based on the values obtained above and finally checks if the result is the correct GCD. Since gcd calculations in the coefficient domain are cheaper, this can lead to a sizeable speed-up.

It is to be noted that there is no guarantee that the heuristic algorithm will always succeed in finding the gcd, thus as a failsafe we need another algorithm which might not be as fast but sure shot gives the gcd of the inputs. But if the heuristic algorithm works, it is usually faster. For this we can use the extended euclidean gcd algorithm with pseudo-division property. It uses pseudo division with remainder to extend euclidean algorithm to polynomial rings. This will have to carefully benchmarked to find the number of iterations that heuristic algorithm must go through before switching to the fallback Euclid’s algorithm.

LCM can be calculated easily once we have obtained the gcd by using the formula:

LCM(a, b) = (a*b)/GCD (a,b)

Function prototype:

RCP <const UIntPoly> gcd (const UIntPoly &a, const UIntPoly &a)

The Flint and Piranha wrapper for the gcd and lcm are already implemented in SymEngine.

Convenience functions

The current module of univariate polynomials( UIntPoly ) in SymEngine lack functions like set_coeff() for setting the coefficients of a monomial in the polynomial, is_zero(), is_one() etc. Although the later two functions are implemented for the univariate expression polynomial class, still their functionality needs to be extended for the univariate integer polynomial class.

Presently we have the following function in the UExprPoly class whose functionality needs to be extended for the UIntPoly class, which I am planning to implement:

  • is_zero()
  • is_one()
  • is_minus_one()
  • is_integer()
  • is_symbol()
  • is_mul()
  • is_pow()
  • is_poly()
  • eval()
    The class UIntPoly has eval_bit() method to find the value of polynomial for any power of two we can overload the eval() function to automatically call eval_bit() when the input is given as a power of 2.

is_zero()

Returns true if a polynomial is a zero polynomial.
Prototype:
bool UExprPoly::is_zero() const
Example:

In[]:cout<<r->is_zero()
Out[]: 0

is_one()

Returns true if the polynomial is equal to 1.
Prototype:
bool UExprPoly::is_one() const
Example:

In[]:cout<<r->is_one()
Out[]: 0

is_minus_one()

Returns true if the polynomial is equal to -1.
Prototype:
bool UExprPoly::is_minus_one() const
Example:

In[]:cout<<r->is_minus_one()
Out[]: 0

is_integer()

Returns true if the polynomial is just an integer.
Prototype:
bool UExprPoly::is_integer() const
Example:

In[]:cout<<r->is_integer()
Out[]: 0

is_symbol( )

Returns true if the polynomial is a symbol.
Prototype: bool UExprPoly::is_symbol() const
Example:

In[]:cout<<r->is_symbol()
Out[]: 1

is_mul()

Return true if the polynomial is the result of multiplication of two or more polynomials.
Prototype: bool UExprPoly::is_mul() const
Example:

In[]: RCP<const UExprPoly> r2 = uexpr_poly(x, {{7, 5}, {-1, 3}, {0, 1}, {4, 2}})
In[]: RCP<const UExprPoly> r3 = mul_upoly(*a, *b)
In[]:cout<<r->is_mul()
Out[]: 1

is_pow()

Returns true if the polynomial is an exponentiation.
Prototype:
bool UExprPoly::is_pow() const
Example:

In[]: RCP <const UExprPoly> a = pow(x, integer(2)) ;
In[]:  cout<<a->is_pow() 
Out[]: 1

is_poly()

Returns true if the given input is a polynomial.
Prototype:
bool UExprPoly::is_poly() const Example:

In[]: RCP<const UExprPoly> r = uexpr_poly(x, {{-2, 5}, {-1, 3}, {0, 1}, {1, 2}})
In[]:cout<<r->is_poly()
Out[]: 1

eval()

Evaluates the value of the given polynomial at a given point and outputs that value.
Prototype:
bool UExprPoly::eval(RCP <const Basic> &a) const
Example:

In[]: RCP<const UExprPoly> r = uexpr_poly(x, {{3, 5}, {2, 3}, {0, 1}, {1, 2}})
In[]:cout<<r->eval(2).get_basic->__str__()
Out[]: 57  

Multivariate bindings for Piranha

Once the reimplementation of the multivariate class is finished I will write the bindings for the multivariate polynomial in Piranha. For this I would need to further study the codebase for Piranha and the Piranha wrappers that have been implemented for the univariate polynomial classes, so as to get an understanding of writing the Piranha bindings. Currently SymEngine has Piranha bindings for univariate polynomials, but I intend to extend this to multivariate polynomials and also uses Piranha for it SeriesBase class.

Piranha main polynomial class piranha::polynomial also happens to be its class for representing multivariate polynomials.


  Cf  : The coefficient type. We may choose to use either
        piranha::integer, piranha::rational or SymEngine::integer_class
        depending on the polynomial to be constructed.

  Key : The monomial to be used
        piranha::monomial or piranha::kronecker_monomial

Piranha also has the implementation for multiplication, division and exponentiation of multivariate polynomials, it has overloaded operators for managing its classes also support in term of many convenience functions for its polynomial class.[3] some of which are:

Integrate()				//Supports indefinite integrals
find_cf(std::initializer_list< T > l)	//To find the coefficients of a term
degree()				//Return the degree of the polynomial
subs (const std::string &name, const T &x) //For substitutions in the polynomial
empty () const			//Check if the polynomial is non-zero

Once I’ve completed the above goals successfully and if time permits I will proceed forward with the implementation of the stretch goals

Groebner basis

A Gröbner basis is a particular kind of generating set of an ideal in a polynomial ring over a field K[x1, ..,xn]. A Gröbner basis allows many important properties of the ideal and the associated algebraic variety to be deduced easily, such as the dimension and the number of zeros when it is finite. This has been implemented in Sympy as groebner() . For implementing groebner basis I will be going to use the algorithm that was proposed by Sumith Kulal in his proposal for GSoC 2015[5] according to which for a given a set of polynomials G, one can check if G is a Groebner basis in a finite number of steps using the generalized division algorithm implemented.

Given two polynomials f and g, to test whether they form Grobner basis.
Find s_polynomial() of f and g.
Find remainder of s_polynomial with respect to [f, g] using reduced().
If zero, then f and g form groebner basis.
If not, then [f, g, s_polynomial(f, g)] form groebner basis which can be shown by taking s_polynomial pairwise and computing its remainder with the extended array and all turn out to be zero.
This can be extended for larger arrays.

Algorithm

Methods needed for the algorithm:
LM(): Returns the leading monomial.
LT(): Returns the leading term.
We then define the notion of s_polynomial. Function prototype:

Polynomial s_polynomial(const Polynomial a, const Polynomial b){
	return expand(lcm(LM(f), LM(g))*(1/LT(f)*f - 1/LT(g)*g));
}

Square free factorization

The square-free factorization of a polynomial a has the form a = D1, D22, . . ., Dnn where each Di has no repeated factors and for all i j: GCD(Di, Dj) = 1. Such a factorization may be used in partial fraction decomposition, simplification and wherever computing a complete factorization would be too expensive. It is also the first step in factoring a polynomial into irreducible factors.
It would consist of an unordered map defined from UIntPoly to int which will be returned(let this data structure be called dictionary for now)
For Example:

RCP<const UIntPoly> f = Poly(84*x**6 + 56*x**5 + 21*x**3 + 14*x**2);
dictionary d = sqf(f) ;
cout << d->__str__() ;

would print the output : {x**2 : 7, 4*x**3 + 1 : 1, 3*x + 2 : 1}

Commitment to GSoC 2017 and SymEngine

My semester breaks for summer vacations are from 25th May and go on until 12th July. During the break I have not other commitments and I can contribute around 45-50 hours per week or even more towards completing my proposal. I will be committed to GSoC throughout. I will be starting with the next semester in the third week of July and I wouldn't be involved in anything that could possibly be a hindrance for my project. I may slow down in mid August due to exams, to around 30-35 hours per week. I will try my level best pick up momentum again towards the end.

Timeline

Pre GSoC (Upto May 4th)

  • Although I already studied most of the polynomial module I would like to study the rest of the SymEngine source code to thoroughly understand how to above mentioned data structures and algorithms work.
  • Solve at least 9-10 issues regarding SymEngine to get acquainted with the coding style and the codebase.
  • Start discussion with mentors on design and implementation details of the project.

Community Bonding Period (5th May - 28th May)

  • Discuss with mentors on choice and implementation of algorithms.
  • Continue with the solving of issues
  • Develop the pseudo code of the major complex algorithms and data structure(for multivariate class reimplementation) and verify them so that there is no delay during the coding period.

Week 1 (29th May - 4th June)

  • Implement the gcd algorithm for the integral domain.
  • Implement the lcm method based on the gcd method and write the tests for these functions. Send in a PR

Week 2, 3 & 4 (5th June - 25th June)

  • Implement the following in the coercion framework:
    • Add functionality for polynomials
    • Defining the visitor classes and the dispatch methods for Pow
    • Refactor the code for Number class and implement the unimplemented functions.
    • Finish with the implementation of the other unimplemented functions(like expr2poly())
    • Write the tests and benchmarks for the above implemented methods.
  • Discuss with mentors and fix errors/bugs(if any). Send in a PR

Week 5, 6 & 7 (26th June - 16rd July)

  • Implement the multivariate class improvements.
  • Implement the hash function and the representation of monomials as maps also start with the modification of functions to conform to the new architecture.
  • Complete the modification of the functions and implement the benchmarks and test for the reimplemented polynomial class. Send in PR

Week 8 (17rd July - 23th July)

  • Complete if any unfinished work from previous weeks.
  • Discuss with mentors for any errors/bugs in the previous PR’s, solve them else, start with the execution for the convenience functions for the univariate polynomial class.
  • Start with the execution of the convenience functions by week 8. Send in PR

Week 9 (24th July - 30st July)

  • Complete the implementation of the methods is_zero(), is_one(), is_minus_one(), is_integer(), is_symbol() is_mul(), is_pow() is_poly() and eval(), and implement the test for these functions by week 9. Send in PR

Week 10 & 11 (31st July - 13th August)

  • Implement the Piranha wrappers by week 10.
  • Once finished with the wrappers start with implementation of the stretch goals by week 11. Send in PR.

Week 12 & 13 (14th August - 27th August )

  • Buffer week for getting the PR’s merged and implementing the unfinished details, and generating the documentation for the code generated so far.
  • Try to finish the stretch goals, get them merged and generate documentation for them. Send in PR

Post GSoC

  • First I would finish the unfinished stretch goals. Implement them, test them, benchmark them and get them merged into the SymEngine code.
  • One thing that SymEngine lacks is a good and self-sustaining documentation. Thus my primary aim after completion of gsoc would be to start the expansion of the existing documentation. This would be relatively easy for me at this stage because I would be very familiar with the codebase.

Note: I have organized the tasks according to the decreasing order of difficulty, I have ensured that the most important and time consuming tasks are finished first during my summer vacation when I will be on my optimum performance and later on time is left to finish off the easy tasks and focus on fixing bugs and errors(if any) and completing the documentation.

General Questions:

  • Have you participated in GSOC previously?
    No. This is the first time I’m participating in GSOC.

  • Are you submitting proposal to any other organisation except us?
    No. I am not submitting proposal to any other organisation in GSoC 2017.

References

[1] Computer Algebra in Particle Physics (Page 23 section 3.3)
[2] Storage of multivariate polynomials
[3] Piranha’s Documentation for polynomial class
[4] Sumith Kulal’s proposal for GSoC 2015
[5] Srajan Garg’s proposal for GSoC 2016
[6] Square free factorization of polynomial over multivariate fields
[7] En route to Polynomial
[8] Univariate Polynomials in SymEngine
[9] Functionalities of Univariable & Multivariable Polynomials and Series
[10] Discussion for the implementation of the coercion framework
[11] Previous work on coercion framework by Isuru Fernando

⚠️ **GitHub.com Fallback** ⚠️