GSoC 2020 Application Moses Paul R: Advancing SymPy Gamma

Advancing SymPy Gamma

To introduce Natural Language Processing (NLP) to these projects to help them better translate expressions and queries into mathematical symbols.
To extend the parsing support of SymPy Gamma to cover syntaxes from languages such as Julia, MATLAB, and Lua, which would improve the way SymPy Gamma queries are interpreted.
To allow TeX input to be interpreted by SymPy Gamma.


Basic Information


Bio and Programming experience

I am a Physics Undergraduate at Madras Christian College.

My interests lie at the intersection of Physics, Computer Science and Mathematics. I do research in Non-Linear Dynamics and Complex Systems. So most of my ventures seek to improve my understanding in all these fields.

I use a MacBook to write most of my programs. I also use a couple of Linux variants that I boot from my HDD. I program extensively in Python, a language I've been using for nearly 4 years. I use VSCode as my IDE because of its vast collection of extensions, which prove quite useful when learning new languages and frameworks, and because it supports version control.

I have been programming for 5 years now. I can write or understand code in about 20 programming languages. I’ve used many kinds of programming languages, from Assembly to Mathematica.

I actively participate in programming competitions such as Google Hash Code.
I was a Finalist in Google Code-in 2017 and subsequently a mentor for both Google Code-in and Google Summer of Code.

Python is one of my favourite programming languages. I love how versatile it is, and that it is an interpreted language. I also use JupyterLab extensively because I do a lot of scientific programming and Jupyter offers me a lot of flexibility. I’ve used Python for a whole array of projects: writing simple games, building machine learning models, creating web applications with Django, exploit development, and much more.

When I am not coding, I can be found playing the Piano or the Guitar. I find that music excites my mind in a similar fashion to the release one obtains from solving an algorithmic challenge.

I have assumed many roles when it comes to computers. I’ve been a Systems Architect, an App Developer, Penetration tester, Operations Research programmer and so on. I have had experience developing projects spanning from Operating Systems to Artificial Intelligence and most things in between including Mobile Applications, Cloud Computing, and Web Development.

Topics in which I have a formal academic background include:

  • Optimization Algorithms
  • Distributed Computing
  • Formal Languages and Automata Theory
  • Data Structures and Algorithms
  • Quantum Computing
  • Discrete Mathematics
  • Graph Theory and Combinatorics

One of the features of SymPy that I fell in love with is how accessible it makes symbolic programming. I realised just how powerful SymPy was while writing a program that used Newton’s Forward Interpolation to fit a polynomial. I would have moved on to SageMath if it were not for SymPy. I wrote a recursive implementation of polynomial fitting, and it was incredibly easy to implement symbolically.

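from sympy import factorial, symbols

def f(x):
    # Sample polynomial whose coefficients we want to recover
    return x**2 + 3*x + x**4 + x**6 + x**9

# Sample f at the equally spaced nodes 0, 1, ..., 9
b = [f(k) for k in range(10)]

def fwdif(l, n):
    # Return the n-th forward differences of l, or [0] once no terms are left
    l = l[:]
    for _ in range(n):
        l = [l[i + 1] - l[i] for i in range(len(l) - 1)]
        if len(l) == 0:
            return [0]
    return l

def s(n, i):
    # Nested form of Newton's forward interpolation formula, built recursively
    res = fwdif(b, i)[0] / factorial(i)
    # Stop at the leading coefficient (order n) instead of assuming it equals 1
    if i == n:
        return res
    return res + (p - i) * s(n, i + 1)

# Find the first difference order at which every entry vanishes
deg = 0
while any(d != 0 for d in fwdif(b, deg)):
    deg += 1
print("{}th degree Eqn".format(deg - 1))

p = symbols("p")
a = s(deg - 1, 0)
# Check that the interpolated polynomial matches f symbolically
print(f(p).equals(a.simplify()))
a.simplify()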

Code Output
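For reference, the recursion in the listing above corresponds to the nested form of Newton's forward-difference formula on the equally spaced nodes 0, 1, ..., n. This is the standard textbook formula; the names s_i below are my own notation, chosen to mirror the function s in the code.

```latex
f(p) = \sum_{i=0}^{n} \frac{\Delta^{i} f(0)}{i!}\, p\,(p-1)\cdots(p-i+1),
\qquad
s_i = \frac{\Delta^{i} f(0)}{i!} + (p-i)\, s_{i+1},
\qquad
s_n = \frac{\Delta^{n} f(0)}{n!},
\qquad
f(p) = s_0 .
```

Unrolling s_0 = c_0 + p (c_1 + (p-1)(c_2 + ...)), with c_i = Δ^i f(0) / i!, recovers the sum on the left term by term.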

I am also quite familiar with version control tools such as Git and with GitHub.


Contributions to SymPy

The following are the contributions I have made to the SymPy repository.

Merged PRs

  • #18953 : Implemented a function called rot90 that rotates the matrix by 90 degrees.
  • #19040 : Fixes Typo in factortools and removes XFail for test_issue_5786
  • #19182 : Fix Mul.is_integer
  • #19147 : Add support for binom in latex
  • #19137 : Update cse_opts.py

Open PRs

  • #18960 : Added a Class called multinomial, a generalisation of the binomial.
  • #19236 : fix holonomic caching error
  • #19223 : Fix Travis build config warnings

Issues raised

  • #18986 : Inconsistencies for expand_func and .expand for binomial
  • #19000 : Binomial for -ve values
  • #19030 : values passed to binomial().rewrite(gamma) || (factorial) don't agree with binomial()
  • #19174 : test('sympy/integrals/tests/test_integrals.py') fails on master ?
  • #19222 : 'sympy/holonomic/tests/test_holonomic.py' fails on master ?
  • #19067 : unable to get a limiting value of gamma

Why Me

I want to advance SymPy Gamma because I’ve found tools like WolframAlpha and Symbolab extremely useful and exciting in my journey as a budding Physicist and Mathematician.

I have always envied Stephen Wolfram for his WolframAlpha computational engine, and the chance to help create something similar excites me very much. I’ve had experience with parsers and have played around with NLP queries. My experience with programming languages such as Julia, Mathematica, and JavaScript would be incredibly useful when it comes to parsing expressions written in the syntaxes of other programming languages into SymPy expressions.

I’ve used the Google App Engine and am very comfortable with building web applications. In addition I’ve got experience using Natural Language Processing and have a strong mathematical background on related topics. This could very well be useful when it comes to making SymPy Gamma ‘intelligent’.

There seems to be no work done so far on NLP for understanding queries. I’ve seen lots of issues that would greatly benefit from a system that parses queries using NLP.

The Project

SymPy Gamma is a simple web application based on Google App Engine that executes and displays the results of SymPy expressions as well as additional related computations, in a fashion similar to that of Wolfram|Alpha. I'd like to implement an NLP layer that converts textual queries such as square root of 2 into SymPy expressions such as sqrt(2) or 2**(1/2).
I'd also like to extend SymPy Gamma so that it can map mathematical expressions written in other programming languages to SymPy expressions, for example f(x) = exp(-x) * cos(x); q = quadgk(f, 0, pi); to integrate(exp(-x)*cos(x), (x, 0, pi)).
I intend to use the work of Nikhil Mann and the work of others who have written parsers for various languages.
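To make the cross-language idea concrete, here is a minimal sketch that handles only the single quadgk-style input shape quoted above. The regular expression and the function name `quadgk_to_sympy` are hypothetical and exist purely for illustration; a real implementation would rely on a proper grammar for each supported language.

```python
import re
from sympy import Symbol, integrate, sympify

# Hypothetical pattern for one narrow input shape:
#   f(x) = <body>; q = quadgk(f, <lower>, <upper>);
QUADGK = re.compile(
    r"^\s*(?P<fn>\w+)\((?P<var>\w+)\)\s*=\s*(?P<body>[^;]+);"
    r"\s*\w+\s*=\s*quadgk\(\s*(?P=fn)\s*,\s*(?P<lo>[^,]+),\s*(?P<hi>[^)]+)\)\s*;?\s*$"
)

def quadgk_to_sympy(query):
    """Translate the quadgk-style query into a SymPy definite integral."""
    match = QUADGK.match(query)
    if match is None:
        raise ValueError("unsupported query: " + query)
    var = Symbol(match.group("var"))
    body = sympify(match.group("body"))      # integrand as a SymPy expression
    lower = sympify(match.group("lo"))
    upper = sympify(match.group("hi"))
    return integrate(body, (var, lower, upper))

result = quadgk_to_sympy("f(x) = exp(-x) * cos(x); q = quadgk(f, 0, pi);")
```

Matching on the function name with a backreference keeps the sketch from silently integrating a different function than the one defined in the query.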

To allow TeX to be interpreted by SymPy Gamma. At present, if you input \frac{1}{2} to SymPy Gamma, you get an error; I'd like it to output 1/2 instead.

There are packages like mathparse which parse sentences like one hundred times fifty four. SymPy also already uses latex2sympy, an ANTLR-based package that can parse LaTeX into SymPy expressions.

The idea is not to implement an NLP package from scratch, as that would be a huge undertaking, but to make use of existing technologies such as BERT and ANTLR4 to expedite the process.

After the NLP component is implemented, the next step would be to integrate it into SymPy, either as a separate function or as a class of functions.

SymPy Gamma could then call it, so that the input it receives is converted into SymPy expressions by the implemented functions.
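As a rough illustration of the "separate function" option, the sketch below rewrites a handful of English phrases into SymPy syntax and then hands the result to `sympify`. The rule table and the name `text_to_sympy` are assumptions made for this example; it is a toy rule-based stand-in, not the NLP approach the project would finally adopt.

```python
import re
from sympy import sympify

# Hypothetical rewrite rules, applied in order; a real system would need far more.
RULES = [
    (re.compile(r"square root of (\w+)"), r"sqrt(\1)"),
    (re.compile(r"(\w+) squared"), r"\1**2"),
    (re.compile(r"\bplus\b"), "+"),
    (re.compile(r"\bminus\b"), "-"),
    (re.compile(r"\btimes\b"), "*"),
]

def text_to_sympy(query):
    """Rewrite a few English phrases into SymPy syntax, then parse the result."""
    expr = query.lower().strip()
    for pattern, replacement in RULES:
        expr = pattern.sub(replacement, expr)
    return sympify(expr)

root_two = text_to_sympy("square root of 2")
quadratic = text_to_sympy("x squared plus 1")
```

A function with this shape could be called by SymPy Gamma before its usual parsing step, falling back to the existing parser when no rule applies.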

Task 1 :

I plan to use resources such as Text2Math, Deep Learning for Symbolic Math, Querying using NLP, and MathParse to accomplish this. This will involve a lot of reading up and refreshing my knowledge base.

Task 2 :

Start working on an appropriate class of functions to handle the parsing of queries. Try varying approaches to solve this problem. Implement a couple of approaches and benchmark them. Choose the most efficient approach.
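A benchmark harness for comparing candidate parsers could be as small as the sketch below. It times two existing SymPy entry points, `sympify` and `parse_expr`, purely as placeholders for the approaches that would actually be compared; the query list and the helper name `bench` are assumptions.

```python
import timeit
from sympy import sympify
from sympy.parsing.sympy_parser import parse_expr

# Placeholder queries; a real benchmark would use queries collected from users.
QUERIES = ["x**2 + 3*x", "sin(x)/x", "sqrt(2)"]

def bench(parser, queries=QUERIES, repeat=20):
    """Return the average time in seconds to parse the whole query list once."""
    total = timeit.timeit(lambda: [parser(q) for q in queries], number=repeat)
    return total / repeat

timings = {"sympify": bench(sympify), "parse_expr": bench(parse_expr)}
fastest = min(timings, key=timings.get)
```

Speed is only one criterion; a comparison would also need to track how many queries each approach parses correctly.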

Task 3 :

Upgrade the web framework that SymPy Gamma uses. The Django version currently in use is outdated and does not support Python 3, which makes it impractical to use the latest SymPy release now that Python 2.7 is deprecated.

Task 4 :

Write tests for the parsing functions and test them rigorously. Put everything together by configuring the upgraded SymPy Gamma framework to work with the newly created parsing functions. Fix bugs, find the limits of usage, and optimize for the best user interaction.
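Tests for the parsing functions could be table-driven, pairing each input string with the expression it should produce. The sketch below uses SymPy's existing `parse_expr` with the `convert_xor` and `implicit_multiplication_application` transformations as a stand-in for the new parsing functions; the case list and the helper `run_cases` are hypothetical.

```python
from sympy import Symbol, sin
from sympy.parsing.sympy_parser import (
    convert_xor,
    implicit_multiplication_application,
    parse_expr,
    standard_transformations,
)

# Accept "^" for powers and implicit multiplication/function application.
TRANSFORMATIONS = standard_transformations + (
    convert_xor,
    implicit_multiplication_application,
)

x = Symbol("x")
# (input text, expected SymPy expression)
CASES = [
    ("2 x", 2 * x),
    ("x^2 + 1", x**2 + 1),
    ("sin x", sin(x)),
]

def run_cases(cases=CASES):
    """Return a list of (text, got, expected) for every case that fails."""
    failures = []
    for text, expected in cases:
        got = parse_expr(text, transformations=TRANSFORMATIONS)
        if got != expected:
            failures.append((text, got, expected))
    return failures

failures = run_cases()
```

Collecting failures instead of stopping at the first one makes it easier to see the limits of a parser in a single run.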


Timeline

This is a tentative timeline that I intend to follow during Google Summer of Code. I don’t plan to take any major vacations during the programme, so I will be available to work on my proposal. I might fall behind for a week during my semester exams, and I am more than ready to make up for it by putting in extra hours.

Community Bonding Period

I plan to use this time to get to know my mentors and go through the Codebase intending to get a clearer picture of how my proposal might be implemented in the smoothest way possible.

Week 1, 2

  • Read existing work done on NLP for parsing mathematical expressions.

  • Go through the existing codebase to figure out what technologies are most likely to help in the realization of this proposal's goals.

  • Finalise the timeline for writing code and submitting PRs.
    (Task 1)

Week 3, 4

  • Start working on the parsing framework and functions.

  • Look at more suitable web frameworks for SymPy Gamma, since the current one is outdated, and start upgrading it. (Task 2) (Task 3)

Week 5, 6, 7

  • Finish upgrading the web framework and make the UI optimal for end users.

  • Present a workable rough solution for the parsing functions.

  • Start writing tests for said parsing framework.
    (Task 3)

Week 8, 9, 10

  • Integrate the parsing framework into SymPy, run tests and check compatibility issues.

  • Integrate the upgraded SymPy Gamma web framework with the parsing framework, run tests and check compatibility issues.

  • Deploy to App-Engine and test it extensively.
    (Task 3)

Week 11, 12

  • Consolidate work. Fix CI Errors and adhere to PEP8 / coding standards.

  • Complete Documentation and Testing.

  • Finalize Commits for Merging.
    (Task 4)

Post GSoC period

I do plan to continue contributing to SymPy after GSoC is complete, maybe even become a member in the future.


Contingency Plan

Priority 1:

Introduce Natural Language Processing to these projects to help them better translate expressions and queries to mathematical symbols.

Priority 2:

To add new query syntaxes for SymPy enabling it to support a wide array of inputs.

I chose these priorities because NLP, if implemented successfully, would have a much larger impact than the other task.


03.31.2020
Moses Paul R