GSoC 2016 Application Nishant Nikhil: Implementing Finite Fields and Set module in SymEngine - sympy/sympy GitHub Wiki

About Me

Username and Contact Information

Name : Nishant Nikhil

University : Indian Institute of Technology, Kharagpur

email : [email protected]

blog : www.nishnik.github.io

github username : nishnik

Time-zone :

		- UTC + 5:30 (India)
		- UTC + 1:00 (Germany)

##Personal Information

I am Nishant Nikhil, a second year undergraduate student at IIT Kharagpur, India majoring in Mathematics and Computing.

I use Ubuntu 15.04 on my workstation. I use Sublime Text for development. With the cool extensions it has, like direct access to git and automatic indentation. I have been programming for the previous four years and have used Turbo C++ as a compiler(with the feel of a geeky blue screen). I do competitive programming and am quite comfortable with C++ STL and a lot of algorithms. I was introduced to python, two years back. For the first time, I used it for scraping my institute's website to get subject wise marks distribution. It was cool and I dived a little deeper and got introduced to the book, "Violent Python". Now, I am a firm believer that if it takes more than 5 minutes to do a work, you should write a script for it.

I am a robotics enthusiast and have been part of FIRA's MiroSot bronze winning team, Kharagpur Robosoccer Students' Group (KRSSG) as an Artificial Intelligence team member. I have worked on Humanoid simulation and have implemented kicking and dynamic positioning using vornoi and delaunay triangulation.

Apart from it, I know web programming and have developed the back-end of Code-O-Soccer's website using PHP and MySQL and am quite comfortable in scraping any website with python, BeautifulSoup and regex coming to my rescue.

I am also familiar with git for version control and have successfully completed large projects, like creating an android app with a team of six people.

My educational background is sufficient for the project. I have taken the following mathematical courses: Discrete Mathematics, Linear Algebra, Numerical Methods, Probability and Statistics, ODE and PDE. programming courses: Programming and Data Structures, Design and Analysis of Algorithms. Furthermore, I am a crypto-enthusiast and have knowledge of finite fields (used in Rijndael(AES)).

#Contributions to SymEngine

  • (Merged) Bug fix in polynomial #873 It fixed a small bug in UnivariateIntPolynomial's function sub_poly.

  • (Open) Added KroneckerDelta derivatives #870 Implemented derivative of KroneckerDelta function.

  • (Merged) Typo Fix #860 It was a minor typo fix, I encountered while implementing sets.

  • (Merged) Added basic functionality for Intervals #846 I started this for intervals only, but now this PR has work on Set class, EmptySet class and Interval class.

  • (Open) Bug fix in pow of zero #840 While solving #824, it occurred to me that for 0**(negative number) we are returning 0, but it should be an exception throw.

  • (Open) Working whitespace check #838 It was to fix the trailing whitespace of the code. I suggested using git diff HEAD^1 --check, but as this returns error only if the trailing whitespace occurred in previous commit, which might not be the case always. So, I am still searching for a better way to handle this.

  • (Merged) Added exception for division by zero #824 This arose from #821, where tan(PI/2) returned 0. So, added exception throw.

  • (Merged) Added derivative of beta function and tests #818 Doing this, I got more acquainted with the code base.

  • (Merged) Bug fix in rational.cpp #805 This PR introduced throwing exception when zeroth root of a Number is queried and while reading the code, I also got a bug where, in Rational, we were checking for denominator equal to zero, which is not possible.

  • (Closed) Bug fix in rational #804 Somehow, I mixed up previous commits of my local branch and pushed the code, so had to close this, so superseded by #805

  • (Merged) Trigonometric table tests #803 This introduced me to the use of catch.

  • (Closed) Added tests for cos_table #801 Added tests for cos_table, then was asked to add tests for all the trigonometric tables. The use of tables made me realise how much concerned we are for speed.

  • (Closed) Added cos_table #800 Actually, I misunderstood #672, But this led to #801 and #803 and also made me realise how to approach a problem.

Contribution to SymPy

  • (Merged) bug fix in solveset #10709 Fixed a bug in solveset, got acquainted with solveset and solved #10671

#Project Proposal Information

Proposal Abstract

Polynomial factorization is one of the fundamental tools of the computer algebra systems. And in symbolic mathematics, it is one of the basic requirement over which other algorithms can be implemented. Currently, SymEngine has the implementation of Univariate Polynomial class, which provides us the basic functionality to add, multiply and subtract two polynomials.

Now, comes the problem of factoring the polynomials and then other modular arithmetics and, gcd and lcm.

Therefore, we need to have a module, which can convert the coefficients of polynomial into a finite field (or Galois Field), do basic operations of add, mul, div etc. on them and then the algorithms for factorization like:

After doing factorization, we will be able to find roots, which in turn will help us to solve univariate polynomial equations. Then for returning solution, Set are the simplest and most intuitive way. Like in real pen-paper exam, we are supposed to return solutions in form of sets. So, here also, following the way sympy has been adapting to solveset, we should develop a basic infrastructure, over which solveset can be developed.

Some Definitions

  • Finite Field: A finite field or Galois field is a field with a finite order (number of elements). The order of a finite field is always a prime or a power of prime.
  • Monic Polynomial: In algebra, a monic polynomial is a univariate polynomial in which the leading coefficient (the nonzero coefficient of highest degree) is equal to 1.
  • Square Free Polynomial: In mathematics, a square-free polynomial is a polynomial defined over a field that is not a multiple of the square of a non unit factor. In general terms, they can be thought as a polynomial with no repeated roots.

Galois Field

GF(P) is simply the ring of integers modulo p. We can perform operations (Addition, Subtraction, Multiplication) using the usual operations on integers followed by reduction modulo p. For example, 4*5 in GF(11) will be 20%11 i.e. 9. Finite field with p^n elements is denoted GF(p^n). Elements of GF(p^n) can be represented as polynomials of degree strictly less than n over GF(p). Operations are then performed modulo R where R is an irreducible polynomial of degree n over GF(p). The addition of two polynomials P and Q is done as usual, then compute remainder modulo R.

Help from here

Factorization

For factorization, firstly we will be converting coefficients of a polynomial to a finite field of suitable domain, do factorization using the specific algorithm, then perform hensel lifting after which we can return the solution.

By, suitable domain, we mean an integral domain, i.e. A ring is said to be an integral domain if and only if there are no non-trivial zero divisors. That is, if a,b ∈ R, such that ab = R. Then either a = 0 or b = 0. Like ring of integer modulo 6 is not an integral domain, as 6 = 2*3. Actually for non-integral domain, factorization is not unique. Example in modulo 8 we have:

X**2 - 1 = (X - 1)(X + 1) = (X - 3)(X + 3)

Now, factorization is the first step to a Solver module. And for factorization, firstly we need to make the polynomial square free over the field.

If two or more factors of a polynomial are identical to each other, then the polynomial is a multiple of the square of this factor. In the case of univariate polynomials, this results in multiple roots. In this case, then the multiple factor is also a factor of the polynomial's derivative. Yun's algorithm exploits this to factorize efficiently the polynomial into factors that are not multiple of a square and are therefore called square-free. To factorize the initial polynomial, it suffices to factorize each square-free factor. Square-free factorization is therefore the first step in most polynomial factorization algorithms.

Then we need to perform Distinct Degree Factorization and Equal Degree Factorization.
Cantor–Zassenhaus algorithm and Berlekamp's algorithm are the most used algorithms in this field.

In short, a factorization algorithm will have three stages:

  • Square free factorization
  • Distinct Degree factorization
  • Equal Degree factorization

Factorization in finite field

Firstly, we make the polynomial square free over the field, by repeatedly performing:

do {
g(x) = gcd(f(x), f'(x))
f(x) = f(x)/g(x)
} while (g(x) != 1)

It will mean that f(x) now has only one instance of its factors, i.e. square free.

Here, we have exploited the fact that if there are multiple roots, both f(x) and f'(x) will have the instance of that root, and hence it will occur in there gcd. Dividing by gcd will make the polynomial square free.

Furthermore, we have:

Theorem

g(X) = gcd(f, X^p − X)
Then if α was any root of f, then clearly since α also satisfies (X^p − X), α will be a root of g as well. And more importantly, since (X^p − X) has no repeated roots,g will retain the same property as well. Having removed all degree 1 factors, we can all extract degree 2 factors by taking the gcd with ((X^p)^2 − X). And repeat so on. But this will be a naive implementation as computing gcd(f, X^p^i − X) for large p will take about O(p) time. We improve it by looking at gcd algorithm and as first step is to compute (X^p^i − X) mod f, we break it to: (X^p^i mod fX mod f), and do it by repeated squaring, lowering the bound to O(log(p)).

So, now we have Distinct degree factors of f(x).

Then we need to find, the factorization of these Distinct degree factors (i.e. Equal Degree Factorization).Using a bit of Chinese Remainder Theorem, this reduces to finding Zero-Divisors in Fp(x)/(f(X)).

Cantor Zassenhaus's is an probabilistic method for factoring polynomials. It approaches the problem of finding zero divisors by taking some random polynomial, a(X) of degree less than n. Checks if we are lucky by computing gcd(a,f) and checking if it is 1. Else, compute a(X)^(p^d−1)/2 mod f(X) using repeated squaring. Now according to its mapping being -1 or 1 we compute a+1 or a-1 in the mapping mod f(X). And that way, repeatedly doing for all the distinct degrees, we get the factorization.

Berlekamp's Algorithm reduces the factorization of a polynomial of degree m over GF(q) to the solution of about m(q − 1)/q linear equations in as many unknowns over GF(q). In Berlekamp's Algorithm, instead of a probabilistic approach, we follow a deterministic approach and construct the matrix of transformation B, corresponding to map a <-> a^p - a (Frobenius Map). B is the Berlekamp's subalgebra. Then we use gaussian elimination to find a basis for the Berlekamp's subalgebra. And using Chinese Remainder Theorem, we find out the factors. It also needs gcd computation.

Factorization of polynomials

If f(x) is a univariate polynomial over the integers, assumed to be square-free, one starts by computing a bound B such that any factor g(x) will have coefficients of absolute value bounded by B. This way, if m is an integer larger than 2B, and if g(x) is known modulo m, then g(x) can be reconstructed from its image mod m.

The factorization proceeds as follows. First, choose a prime number p such that the image of f(x) mod p remains square-free, and of the same degree as f(x). Then factor f(x) mod p. This produces integer polynomials f_1(x),...,f_r(x) whose product matches f(x) mod p. Next, apply Hensel lifting, this updates the fi(x) in such a way that now their product matches f(x) mod p^a, where a is chosen in such a way that p^a is larger than 2B.

Hensel Lifting

Hensel's Lemma:

Hensel's lemma states that if a polynomial equation has a simple root modulo a prime number p, then this root corresponds to a unique root of the same equation modulo any higher power of p, which can be found by iteratively "lifting" the solution modulo successive powers of p.

Using the lemma we lift the root of a polynomial from GF(p^k) to GF(p^(k+1)). Actually a root s in GF(p^(k+1)), also satisfies the polynomial equation in GF(p^k). That means: r ≡ s mod p^k Now as the new root s is congruent to r, we have: derivative The lifting will happen peacefully iff r is a simple root. But if we have a root mod p^k at which the derivative mod p equals 0, then there is not a unique lifting of a root mod p^k to a root mod p^(k+1): either there is no lifting to a root mod p^(k+1) or there are multiple choices.

To illustrate the algorithm, I take help of an example: Suppose we have to find, square root of 2 in GF(7), ie we have to solve x^2 -2 = 0, We get a solution at r = 3. Then, f(r1) And 7 mod 7 equals 0. img2
img3
img4
that is,
img5
img6
img7

That is, we get a root at x = 10 for GF(7^2), i.e. 10^2 - 2 mod 49 = 98 mod 49 = 0.

I will be implementing these algorithms for SymEngine’s own polynomial class. For flint polynomial class which already has inbuilt factorization module and finite field implementation, there will be wrapper to call its own inbuilt functions, but for piranha polynomials until piranha has its own factorization module, I will write a wrapper which will convert piranha polynomials’ coefficients’ to vec_int so that they can also be processed the same way.

Implementing Sets:

From sympy's experience, it is quite clear that to maintain consistency, sets should be used for returning solutions, parallel to what we do in mathematics: solving equations means returning the set for which the equation holds true. Now as sets can be of different types. Based on number of elements we can classify them as:

  • Empty set.
  • Finite set.
  • Infinite set.

Apart from it we also have:

  • Universal Set
  • Interval
  • Product Set

Apart from it we can classify sets based on their domain, like:

  • Natural Numbers.
  • Complex Numbers.
  • Integers.
  • Rational Numbers.
  • Real Numbers.
  • Irrationals.

So, we got to implement all of them. I have already developed a Set class, from which all the other types will be derived. The Set class had been derived from Basic class. Set will have virtual member functions as:

  • _intersection
  • _union
  • is_subset
  • is_proper_subset
  • is_super_set
  • is_proper_super_set
  • is_disjoint
  • contains
  • complement
  • power_set (This list can be modified as per needs).

For implementing type of sets which are kind of constant like Empty Set, Natural Numbers, Complex Numbers etc. I propose to use Singleton classes. Then on the lines of SymPy, which has a separate Intersection and Union class derived from Set, I will implement those classes.

On the lines SymPy, which has already a robust sets module I have already implemented basic architecture of Set(PR #846), and have got a basic idea of implementation.

I plan to implement all this in two independent phases:

  • Implementation of Galois field and related algorithms of factorization.
  • Implementation of Set module.

Previous implementation in SymPy

Sympy has a robust polynomial and sets module. Polynomial module has been implementted at sympy.polys and Galois field has been implemented at galoistools.py while integer factorization have been implemented at factortools.py. It has been implemented in the duration of 2009-2011, Mateusz Paprocki being the chief contributor. There, factorization has been implemented for dense polynomials. And the following algorithms have been implemented for factorization (for both univariate and multivariate polynomials):

  • Trial Division
  • Zassenhaus's Algorithm
  • Cyclotomic Polynomial factorization
  • Wang's Algorithm

And, the following finite field related algorithm have been implemented:

  • Chinese Remainder Theorem
  • Zassenhaus's Algorithm
  • Euclidean Algorithm for gcd
  • Extended Euclidean Algorithm
  • Ben-Or's irreducibility test
  • Rabin's irreducibility test
  • Square free decomposition
  • Berlekamp's Algorithm
  • Kaltofen-Shoup's Algorithm
  • Linear Congruence
  • Differentiation, Integration, Addition, Subtraction, Multiplication etc

Sets module have been implemented at sympy.sets and following classes have been implemented:

  • Product Set
  • Interval
  • Union
  • Intersection
  • Complement
  • EmptySet
  • Universal Set
  • Finite Set
  • Symmetric Difference
  • Image set
  • Condition Set
  • Naturals
  • Integers
  • Reals
  • Range
  • Complexes

Timeline (tentative)

I have my summer vacations starting from 30th April. I am already well-acquainted with the community and will be productive from the day1. In between, I have no other commitments and will be easily able to give 40 - 50 hours a week, but in the 6th week, I have to travel to Germany for 4 days(30th June to July 4), and I will be able to give only 30 hours in that week, but will make up for it by working more after July 4. Furthermore, I am a crypto enthusiast and this project will be intuitive for me to work on, and being a mathematics student, it will be easy for me to implement this.

Community Bonding period (23rd April - 22nd May)

I have my summer vacation during a great part of this period. I will implement all the domains(Real Numbers, Complex Numbers etc.). They will all be Singleton class like EmptySet. Before actual program starts, I aim to send a PR implementing all the Singleton Classes. And along with it I will implement division, gcd and lcm for univariate polynomials.

Week 1 to mid-term evaluation

  • Week 1
    It will go in implementing the wrappers for Galois Field

     vec_int intpoly_to_vec(RCP<const UnivariateIntPolynomial> &a);
    

    And implementations like:

     vec_int gf_from_vec(const vec_int &a, int p)
     {
     	vec_int b;
     	for (auto i : a) {
     		b.push_back(i % p);
     	}
     	return b;
     }
    

    and implementing basic operation in field, like:

    • add
     gf_add(7x, 5x, 11)
       x 
    
    • subtract
     gf_sub(x**2,9x**2, 11)
       3x**2
    
    • divide
     gf_div(x**3 + x + 1, x**2 + x, 11)
       [x + 10, 2x + 1] // It will return quotient, remainder
    
    • multiply
     gf_mul(3x**2 + 2x + 4, 2x**2 + 2x + 2, 5)
       x**4 + 3x**2 + 2x + 3
    
    • negate
     gf_neg(3x**2 + 2x + 4, 5)
       2x**2 + 3x + 1
    
    • pow
     gf_pow(x**2 + x + 1, 3, 5)
       x**6 + 3x**5 + x**4 + 2x**3 + x**2 + 3x + 1
    

    And other basic functions.

  • Week 2
    It will go in implementing a little advanced algorithms related to finite field like:

    • finding out irreducible polynomial from a field of degree n
     gf_random(4, 5) // deg, characteristic
       x**4 + x**3 + x**2 + 2x
    
    • finding gcd
     gf_gcd(4x**2 + 2x + 4, 2x**2 + 3, 5)
       x + 4
    
    • finding lcm
    • extended euclid algorithm
    • computing monic polynomial (Polynomial whose leading coeffecient is 1)
    • differentiate
    • evaluate
    • composition
  • Week 3
    Implementing a little more advanced algorithm, like:

    • checking if a polynomial is square free in a field
    • returning square free polynomial from a given polynomial (Yun's Algorithm)
    • all the other algorithms related to square free polynomials.
  • Week 4
    Implementing Berlekamp algorithm

  • Week 5 - 6
    Implementing Cantor Zassenhaus's algorithm and deciding benchmark for the Berlekamp and Cantor Zassenhaus's as Zassenhaus's can tend to exponential time as well.

  • Week 7
    Do we have to port all the algorithms of factorisation from SymPy Gathen-Shoup

  • Week 8 and 9
    Applying Hensel Lifting and applying Cantor Zassenhaus's to Polynomials. Make sure that it works for flint polynomials and piranha polynomials also.

  • Week 10
    Implementing finite sets and universal set.
    Universal set will be a Singleton class like Empty Set, and Finite Set will be a Set with a private member variable which is vector of RCP<const Number>.

  • Week 11
    Implementing Union class and Intersection Class.
    Union class will be able to take union of any two set variables, of any derived type. And furthermore, will help in special cases, when there can't be a single variable representing the union. Like:
    union(interval(0, 2), interval(1, 3)) is interval(0, 3)
    But
    union(interval(0, 2), interval(3, 5)) is interval(0, 2) U interval(3, 5)
    This will be an instance of union object. It will have a vector of RCP<const Set> as a private member. Similar is the case for intersection.

  • Week 12
    Implementing Complement class, Symmetric Difference class and Range.
    Range will have three private members, start, end and step and will be used to represent
    [2, 4, 6, 8, 10] is range(2, 10, 2) Prototype being range(start, end, step) .
    Symmetric Difference and Complement class will have RCP<const Union> and RCP<const Intersection> as private members.

  • Week 13
    Buffer Time.

References

⚠️ **GitHub.com Fallback** ⚠️