GSoC 2022 : Improving Series Expansions and Limit Computations Report by Anutosh Bhat - sympy/sympy GitHub Wiki

This report summarizes my work during GSoC 2022, on the project Improving Series Expansions and Limit Computations with SymPy. The links to the PRs are in chronological order.

About Me

I am Anutosh Bhat, a fourth-year undergraduate student pursuing a major in Data Science from the Indian Institute of Technology (IIT) Madras. I am open source enthusiast always looking forward to contribute to project involving any scientific/mathematical computation.

Project Synopsis

The primary goal of the project could be divided into 4 - 5 parts.

Refactoring/Implementing leading term & series method for functions involving branch cuts/ branch points such as pow, log, inverse trig and inverse hyperbolic.
Adding leading term and series support to elementary and special functions mainly revolving around bessel, gamma and error type functions.
Fixing limit errors through gruntz algorithm and improving performance of the algorithm overall by refactoring blocks in some functions like mrv_leadterm.
Addressing miscellaneous bugs and implementation ideas related to the series module.
Refactoring written code for order.py and extending functionality of the order term. (Was proposed but the goal couldn't be achieved due to time constraints and hence this section has been shifted to Future Work,)

Pull Requests

Merged

#23492: Refactored leading term method for log Function: This Pr was essentially the foundation for the Phase 1 of my project. As the limit calculations for functions involving logarithms weren't performing well at points lying on branch cuts, the _eval_as_leading_term helper method for the log function needed to be refactored/re-written. This would eventually be helping us with addressing series and limits of other functions like hyperbolic and inverse hyperbolics that can be expressed using the log function. My mentors and I engineered a 5 step algorithm that was able to tackle all cases from basics to the complex ones. The basis of the algorithm is that, in leading term methods the parameter x is expected to tend to 0 along the half-ray x = cdir*t where t is real and positive .Hence we can define a positive dummy variable t and substitute cdir*t for x in the argument of log: z = arg0.subs(x, cdir*t). The algorithm is further explained in the Pr description. Apart form this, some notable things/issues which were fixed are -

Fixed buggy leading term method for Ei, Chi and bessely functions.
Implemented branch cut handling for leading term methods of inverse trigonometric functions.
#21227 : Series expansions for nested logarithms involving the logx parameter were fixed

def test_issue_21227():
    f = log(-log(x))

    assert f.nseries(x, logx=y) == log(-y)
    assert f.nseries(x, logx=-x) == log(x)

    f = log(log(x))

    assert f.nseries(x, logx=y) == log(y)
    assert f.nseries(x, logx=-x) == log(-x)
    assert f.nseries(x, logx=x) == log(x)

    f = log(log(log(1/x)))

    assert f.nseries(x, logx=y) == log(log(-y))
    assert f.nseries(x, logx=-y) == log(log(y))
    assert f.nseries(x, logx=x) == log(log(-x))
    assert f.nseries(x, logx=-x) == log(log(x))

Currently no new issues for limit calculations of functions involving logarithms have been reported after the new algorithm was introduced . If any issues will be discovered , those would be looked into. Overall 5 issues were closed through this pull request.

#23715: Implemented/Fixed a few series methods for inverse trig/hyperbolic functions: This PR fixes a long-standing issue (#3663) from 2008 where SymPy isn't able to calculate correct series expansions for inverse hyperbolic functions. Hence it wasn't unknown that SymPy doesn't perform well while calculating limits and series expansions for inverse trig/hyperbolic functions at points lying on branch cuts and otherwise. To tackle these changes, the taylor_term methods for all 12 inverse trig/hyperbolic functions was looked into where the buggy ones were fixed and in some cases a new method was introduced. After this, the _eval_as_leading_term methods for all 6 inverse hyperbolic functions were re-written and specific code blocks in _eval_as_leading_term methods for all 6 inverse trig functions were refactored. Once these methods were correctly framed, heavy testing was added catering to points from all over the axis (branch points, points lying on branch cuts and general points). We also made sure to test leading terms being calculated from different directions of approach (cdir being real or imaginary). Apart from this, some other notable changes introduced in this Pr are -

#23752 : Fixed branch cut handling for pow._eval_as_leading_term and pow._eval_nseries method where new direction (ndir) being calculated with respect to base of the pow expression is real.

def test_issue_23752():
    expr1 = sqrt(-I*x**2 + x - 3)
    expr2 = sqrt(-I*x**2 + I*x - 3)
    assert limit(expr1, x, 0, '+') == -sqrt(3)*I
    assert limit(expr1, x, 0, '-') == -sqrt(3)*I
    assert limit(expr2, x, 0, '+') == sqrt(3)*I
    assert limit(expr2, x, 0, '-') == -sqrt(3)*I

def test_nseries():
    x = Symbol('x')
    # test issue 23752
    assert sqrt(-I*x**2 + x - 3)._eval_nseries(x, 4, None, 1) == -sqrt(3)*I + sqrt(3)*I*x/6 - \
    sqrt(3)*I*x**2*(-S(1)/72 + I/6) - sqrt(3)*I*x**3*(-S(1)/432 + I/36) + O(x**4)
    assert sqrt(-I*x**2 + x - 3)._eval_nseries(x, 4, None, -1) == -sqrt(3)*I + sqrt(3)*I*x/6 - \
    sqrt(3)*I*x**2*(-S(1)/72 + I/6) - sqrt(3)*I*x**3*(-S(1)/432 + I/36) + O(x**4)
    assert cbrt(-I*x**2 + x - 3)._eval_nseries(x, 4, None, 1) == -(-1)**(S(2)/3)*3**(S(1)/3) + \
    (-1)**(S(2)/3)*3**(S(1)/3)*x/9 - (-1)**(S(2)/3)*3**(S(1)/3)*x**2*(-S(1)/81 + I/9) - \
    (-1)**(S(2)/3)*3**(S(1)/3)*x**3*(-S(5)/2187 + 2*I/81) + O(x**4)
    assert cbrt(-I*x**2 + x - 3)._eval_nseries(x, 4, None, -1) == -(-1)**(S(2)/3)*3**(S(1)/3) + \
    (-1)**(S(2)/3)*3**(S(1)/3)*x/9 - (-1)**(S(2)/3)*3**(S(1)/3)*x**2*(-S(1)/81 + I/9) - \
    (-1)**(S(2)/3)*3**(S(1)/3)*x**3*(-S(5)/2187 + 2*I/81) + O(x**4)

Currently no new issues for faulty leading terms or taylor terms for inverse trig/hyperbolic functions have been reported after the new methods were introduced . If any issues will be discovered , those would be looked into. Overall 5 issues were closed through this pull request.

#23798: Fixing/Implementing nseries methods for inverse hyperbolic/trig and log functions: The last Pr as we know was focused on fixing/implementing the taylor_term and _eval_as_leading_term methods, hence by obvious choice the following Pr should work towards correcting _eval_nseries for these functions. Hence _eval_nseries methods for all 6 inverse hyperbolic functions was implemented and specific code blocks (those responsible for handling points lying on branch cuts) from the method were refactored in case of all 6 inverse trigonometric functions. As these functions can be expressed in terms of logarithms, the Pr also had to make sure that log._eval_nseries() was working perfectly for all cases, as we would be using the self.rewrite(log)._eval_nseries() call for some specific cases in these functions. Hence log._eval_nseries() method was refactored too by introducing a positive dummy variable - t and replacing x with cdir*t . A block based on the Heaviside function was added and we would be subtracting a factor of 2*I*S.Pi once a set of conditions are met. Good amount of testing was added for the same.

def test_log_nseries():
    p = Symbol('p')
    assert log(1/x)._eval_nseries(x, 4, logx=-p, cdir=1) == p
    assert log(1/x)._eval_nseries(x, 4, logx=-p, cdir=-1) == p + 2*I*pi
    assert log(2*x + (3 - I)*x**2)._eval_nseries(x, 3, None, 1) == log(2) + log(x) + \
    x*(S(3)/2 - I/2) + x**2*(-1 + 3*I/4) + O(x**3)
    assert log(2*x + (3 - I)*x**2)._eval_nseries(x, 3, None, -1) == -2*I*pi + log(2) + \
    log(x) - x*(-S(3)/2 + I/2) + x**2*(-1 + 3*I/4) + O(x**3)
    assert log(-2*x + (3 - I)*x**2)._eval_nseries(x, 3, None, 1) == -I*pi + log(2) + log(x) + \
    x*(-S(3)/2 + I/2) + x**2*(-1 + 3*I/4) + O(x**3)
    assert log(-2*x + (3 - I)*x**2)._eval_nseries(x, 3, None, -1) == -I*pi + log(2) + log(x) - \
    x*(S(3)/2 - I/2) + x**2*(-1 + 3*I/4) + O(x**3)
    assert log(sqrt(-I*x**2 - 3)*sqrt(-I*x**2 - 1) - 2)._eval_nseries(x, 3, None, 1) == -I*pi + \
    log(sqrt(3) + 2) + I*x**2*(-2 + 4*sqrt(3)/3) + O(x**3)
    assert log(-1/(1 - x))._eval_nseries(x, 3, None, 1) == I*pi + x + x**2/2 + O(x**3)
    assert log(-1/(1 - x))._eval_nseries(x, 3, None, -1) == I*pi + x + x**2/2 + O(x**3)

Apart from this, some other notable changes introduced in this Pr are -

Refactored series method in expr.py to use the cdir parameter effectively. As a result of this change SymPy can now cater to a more vaster cases of interest.

def test_log_series():
    # Note Series at infinities other than oo/-oo were introduced as a part of
    # pull request 23798. Refer https://github.com/sympy/sympy/pull/23798 for
    # more information.
    expr1 = log(1 + x)
    expr2 = log(x + sqrt(x**2 + 1))

    assert expr1.series(x, x0=I*oo, n=4) == 1/(3*x**3) - 1/(2*x**2) + 1/x + \
    I*pi/2 - log(I/x) + O(x**(-4), (x, oo*I))
    assert expr1.series(x, x0=-I*oo, n=4) == 1/(3*x**3) - 1/(2*x**2) + 1/x - \
    I*pi/2 - log(-I/x) + O(x**(-4), (x, -oo*I))
    assert expr2.series(x, x0=I*oo, n=4) == 1/(4*x**2) + I*pi/2 + log(2) - \
    log(I/x) + O(x**(-4), (x, oo*I))
    assert expr2.series(x, x0=-I*oo, n=4) == -1/(4*x**2) - I*pi/2 - log(2) + \
    log(-I/x) + O(x**(-4), (x, -oo*I))

I had encountered an issue based on asymptotic series expansions(#23843), when I was doing some basic testing after the Pr was merged. This issue will be looked into soon and other than this everything with respect to series expansions and limit calculations for inverse trig/hyperbolic functions looks good. Overall 1 issue was closed through this pull request.

#23844: Implemented/Fixed a few series methods for bessel functions: This PR was meant to fix/implement _eval_as_leading_term and _eval_nseries methods for bessel and modified bessel functions but ended up fixing a lot more than what was expected. Good amount of testing was added for limit of all 4 bessel functions at both real and imaginary infinities. Apart from this ,some notable changes introduced were

Refactored mrv_leadterm function of gruntz.py to make use of the leadterm method rather than the calculate_series method . This refactoring essentially deserved another pull request but a couple of issues based on bessel functions like #19154 would require the gruntz refactoring if we would like to fix that, hence code for that was added . This eventually also led to deprecation of calculate_series function and the compute_leading_term function from the codebase.
Fixed limits/leading term methods for Chi, Shi,polygamma and the harmonic function.
Series support was added for pow._eval_nseries() where the exponent is variable and the number of terms in the series expansion can be expressed in terms of the exponent.

>>> n, k = symbols('n, k', positive=True)
>>> ((1 + 1/n)**k)._eval_nseries(n, -k + 3, None)
n**(-k) + k*n**(2 - k)*(k - 1)/2 + k*n**(1 - k) + O(n**(3 - k))
>>> ((1 + 1/n)**k)._eval_nseries(n, 4, None)
n**(-k) + k*n**(3 - k)*(k - 2)*(k - 1)/6 + k*n**(2 - k)*(k - 1)/2 + k*n**(1 - k) + O(n**4)

All limit and series issues based on bessel functions have now been addressed from the issues page . If any issues turn up in future, those would be looked into. Overall 6 issues were closed through this pull request.

Work in Progress

Ongoing

#24020 : Impelementing leading term and nseries methods for frac function
#24027 : Implementing some leading term and series methods for gamma functions ( uppergamma, lowergamma and loggamma) and error functions (expint, Si, Shi and Chi).

To be done

Fix Piecewise._eval_as_leading_term() . SymPy's limit handling for Piecewise functions is not up to the mark as of now (at points where the piece changes or at points of discontinuity). Some related issues which need to be addressed are the following : #23836, #23299, #6171, #23299
Implementing leading term and series methods for the following functions - zeta, lerchphi, polygamma, beta, elliptical, inverse error functions like erfcinv, erfinv. There are issues scattered on SymPy's issues page which demands for implementation of these methods, a recent one being #23635 . Also aseries support should also be added for gamma, factorial and bessel functions.

Issues Opened

Future Work

A lot of special functions now have an _eval_as_leading_term and _eval_nseries methods correctly implemented, but a few remain. Support for the remaining few should be added and also asymptotic expansions through _eval_aseries could be added if they exist for the respective function.
Improving documentation throughout the series/limits module - SymPy's series module makes use of lot of helper methods like taylor_term, _eval_nseries, _eval_lseries, _eval_aseries and _eval_as_leading_term but there isn't any clear guideline or documentation regarding what each of these do or the significance of these methods. We might also need to highlight the important constraints to keep in mind while implementing any of these methods for a particular function from the functions module. These documentation changes are being tracked in #23625
Refactor Order code and order.py to handle more cases. This was proposed in my proposal but couldn't be acheived due to time constraints. Some notable issues which should be addressed are the following
- #22836 - Possible improvements for Order of expressions involving factorials.
- #22247 - Refactoring Order for Multivariate expressions and improving individual if-else blocks to handle specific cases.
- #19120, #10290, #7872 - Issues related to _eval_subs method in order.py.
- #22590 - Issues related to convergence tests (particularly Dirichlet and Raabe test) and order term.
Extending ring_series module.

Conclusion

I am quite impressed with the current state of the series and limits module right now . A good number of issues which were pending since a long time have now been resolved. Earlier, the gruntz algorithm was responsible for most of the limit computations for functions of type pow, log, inverse trig and inverse hyperbolic at points lying on branch cuts. The algorithm being somewhat recursive was slow in many cases and as the calculate_series method was fragile and buggy, the algorithm would often return wrong results . Now that the _eval_as_leading_term and _eval_nseries methods have been refactored or completely re-written in most cases we not only have more control on the correctness of the results being returned but also on the speed behind these computations. The calculate_series method also stands deprecated as of SymPy version 1.12. As the series module reaches a good extent of development, extending the faster ring_series module should be the next target for development.

I had been hearing amazing things about GSoC since the past 2 - 3 years and my GSoC tenure exactly lived upto its expectation. I had an exceptional experience past summer working for SymPy. Apart from improving my coding skills, I got a gist of how documentation and testing driven software development works which would surely be of great help when I move to the industry in the upcoming years. I am grateful to SymPy for this experience. I am highly grateful to my mentors, Kalevi Suominen and Sidharth Mundhra for always being there whenever and wherever I was stuck and have guided me in every possible way. I would also like to thank the org admins Aaron Meurer and Oscar Benjamin for being readily available throughout the Summer and pitching in for code reviews whenever they had time.

I plan to keep contributing to SymPy, trying to finish the work proposed in the Future Work after addressing the issues pointed out in the Work in Progress section. I would also like to pitch in as a co-mentor for modules such as the series, concrete and stats module for GSoC 2023 if required.