GSoC 2013 Application Katja Sophie Hotz: Faster Algorithms for Polynomials over Algebraic Number Fields - sympy/sympy GitHub Wiki

Table of Contents

About me

My name is Katja Sophie Hotz and I am studying Technical Mathematics at the Vienna University of Technology in Austria. I am in the final year of my master's and have specialized in Mathematics in Computer Science.

  • Email: katja.sophie.hotz (at) NOSPAM.ac.at (replace NOSPAM with student.tuwien)
  • IRC: katjasophie (registered on freenode)
  • GitHub: katjasophie

Programming and mathematical skills

I acquired basic programming skills in C++, Java and Matlab at the university by taking an introductory course on programming and courses on algorithms, data structures, computer graphics and computer mathematics. My experience with Python is limited, but I am eager to learn more.

At university, I took courses on Linear Algebra (two semesters), Algebra (two semesters), Universal Algebra, Algebraic Number Theory and Error Correcting Codes, so I feel comfortable working with algebraic extension fields and polynomial rings.

I use git for versioning my master thesis, which is written using LaTeX and is about equidissectable polytopes. Other than that, I have submitted two patches to SymPy via github (see below). Summing up, I am confident that I know how to handle everyday tasks in git.

My main operating system has been Ubuntu for a few years now, so this is where I will do my coding. For Python programming I am currently using gedit, since it is the default editor and supports most of the features I need.

The project

The goal of this project is the implementation of faster greatest common divisor (gcd) and factorization algorithms for univariate and multivariate polynomials over algebraic number fields. This is important for symbolic integration of rational functions and simplification of expressions among other things.

I would like to start by briefly discussing the basic mathematics involved.

Let <math>K</math> be a field. A field <math>L \supseteq K</math> is called an extension of <math>K</math> and <math>K</math> a subfield of <math>L</math>. For a finite subset <math>V \subset L</math>, <math>K(V)</math> denotes the smallest subfield of <math>L</math> which contains <math>K</math> and <math>V</math>. If <math>V</math> only contains one element, then <math>K(\alpha)</math> is called a simple extension of <math>K</math>.

An element <math>\alpha \in L</math> is said to be algebraic over <math>K</math>, if there exists a nonzero polynomial <math>p(x) \in K[x]</math>, i.e. with coefficients in <math>K</math>, such that <math>p(\alpha) = 0</math>. If every element of <math>L</math> is algebraic over <math>K</math>, then <math>L</math> is called an algebraic extension of <math>K</math>. For example, the simple extension <math>\mathbb{Q}(\sqrt{2}) \subseteq \mathbb C</math> of <math>\mathbb Q</math> is algebraic, but <math>\mathbb{Q}(\pi) \subseteq \mathbb C</math> is not.

Now if we have a simple algebraic extension <math>K(\alpha)</math>, there exists a unique irreducible monic polynomial <math>m_\alpha(x) \in K[x] \setminus \{0\}</math> with <math>m_\alpha(\alpha) = 0</math>, the minimal polynomial of <math>\alpha</math> over <math>K</math>. It can be shown that the factor ring <math>K[x]/m_\alpha(x)</math> is isomorphic to <math>K(\alpha)</math>, where the isomorphism works by mapping <math>p(x)</math> to <math>p(\alpha)</math>. Therefore elements of <math>K(\alpha)</math> can be viewed as polynomials with degree less than <math>\operatorname{deg}(m_\alpha(x))</math>, where the arithmetic operations have to be carried out modulo <math>m_\alpha(x)</math>. This allows us to reduce calculations in <math>K(\alpha)</math> to calculations in <math>K</math>.

The primitive element theorem states that every finite algebraic extension <math>K(V)</math> over a field <math>K</math> with characteristic zero is generated by a single element <math>\alpha</math>. This means that, in theory, it suffices to be able to work with simple extensions. However, this is not always optimal for computational purposes.

Currently, SymPy provides basic arithmetics for algebraic extensions of <math>\mathbb Q</math>. Computing the gcd of two polynomials with coefficients in an algebraic extension of <math>\mathbb Q</math> is done by a generic gcd algorithm for polynomials over an arbitrary field, but this does not take advantage of the specific structure of an extension field. Factorization of such polynomials is done by Trager's algorithm. This works by constructing a polynomial of higher degree with coefficients in <math>\mathbb Q</math> and then factorizing it in <math>\mathbb{Q}[x]</math>. However, for multivariate polynomials this sometimes leads to extremely large polynomials, making this approach impractical; see [3] for an explicit example.

I propose to implement the modular gcd algorithm and the factorization algorithm for polynomials over algebraic number fields as described in [1,4] and [3].

The modular gcd algorithm works by choosing appropriate primes <math>p</math>, computing the gcd modulo <math>p</math> and using rational reconstruction to get the desired result. This is a generalization of Brown's modular gcd algorithm for integer polynomials, which is (for integer polynomials) potentially faster than the currently used subresultant method. Therefore I would start by implementing an improved version of Brown's algorithm for <math>\mathbb{Z}[x]</math> as described in [4]. This would be followed by adding support for the multivariate case. Here the polynomials are recursively evaluated at certain points to reduce the number of variables until it is a univariate problem. Next, I plan to work on the corresponding versions for <math>\mathbb Q</math>. The final step would be the implementation of the algorithm for polynomials over algebraic number fields.
Although some of the substeps may have to be repeated using different primes and evaluation points, the algorithms are not heuristic and will always return the correct result.

The factorization algorithm from [3] avoids the problem with multivariate polynomials described above by carrying out multiple univariate factorizations. Here techniques like Hensel lifting and sparse <math>p</math>-adic lifting are used to reconstruct the multivariate factorization. Since this is a very sophisticated algorithm I anticipate that the implementation will take a considerable amount of time.

Timeline

  • Week 1-2: Implement the modular gcd algorithm for <math>\mathbb{Z}[x]</math> and <math>\mathbb{Z}[x_1,\]</math>.
  • Week 3: Implement the modular gcd algorithm for <math>\mathbb{Q}[x]</math> and <math>\mathbb{Q}[x_1,\]</math>, building on the work from the first two weeks.
  • Week 4: Write tests and documentation, fix issues and integrate the algorithms in the current SymPy framework. Prepare and submit the first pull request.
  • Week 5-6: Implement the modular gcd algorithm for <math>\mathbb{Q}(\alpha)[x]</math> and <math>\mathbb{Q}(\alpha)[x_1,\]</math>.
  • Week 7: Write tests and documentation, fix issues and integrate the algorithms in the current SymPy framework. Prepare and submit the second pull request.
  • Week 8-11: Implement the factorization algorithm for <math>\mathbb{Q}(\alpha)[x_1,\]</math>.
  • Week 12: Write tests and documentation, fix issues and integrate the algorithm in the current SymPy framework. Prepare and submit the final pull request.
  • Week 13: Buffer week to compensate for possible delays.
Should I finish these tasks earlier than expected, I would work on implementing gcd and factorization algorithms for multiple algebraic extensions of <math>\mathbb Q</math> without resorting to calculating a primitive element; see [1]. Another possibility would be to generalize the algorithms to algebraic function fields; see [2] and [3].

Since I am not planning to go on any vacation, I will be free to work 40 hours per week from June to September.

Contributions to SymPy

I have so far submitted two patches. The first one fixes a bug with minpoly() and the second one implements the function hyper.rewrite(Sum). Both are already merged into master. In addition, I filed a bug regarding the combsimp() function.

References

  • [1] M. van Hoeij and M. Monagan, A modular GCD algorithm over number fields presented with multiple extensions, Proceedings of ISSAC 2002, pp. 109-116, ACM, 2002.
  • [2] M. van Hoeij and M. Monagan, Algorithms for Polynomial GCD Computation over Algebraic Function Fields, Proceedings of ISSAC 2004, pp. 297-304, ACM, 2004.
  • [3] S. M. M. Javadi and M. Monagan, On factorization of multivariate polynomials over algebraic number and function fields, Proceedings of ISSAC 2009, pp. 199-206, ACM, 2009.
  • [4] M. Monagan and A. Wittkopf, On the Design and Implementation of Brown’s Algorithm over the Integers and Number Fields, Proceedings of ISSAC 2000, pp. 225-233, ACM, 2000.
⚠️ **GitHub.com Fallback** ⚠️