GSoC 2017 Application Abdullah Javed Nesar: Rubi Integrator - sympy/sympy GitHub Wiki

About Me

Basic Information

Name: Abdullah Javed Nesar

University: Indian Institute of Technology, Kharagpur

Email: [email protected]

Github/IRC : Abdullahjavednesar

Blog : #TODO

Time Zone : IST (UTC + 5:30)

Personal Background

I am Abdullah Javed Nesar, a third year undergraduate student at IIT Kharagpur, India. I have taken courses on Probability and Statistics, Advanced Engineering Mathematics, Operation Research, Intelligent Tutoring System, Mechanics of Materials, Programming and Data Structures and Solid Mechanics.

Programming Details

I have recently switched on to Ubuntu 15.10 from window 8.1 with Sublime Text 3 as my primary editor. I like Sublime Text 3 because of its simplicity, user-friendly GUI and other cool features.

I was introduced to Object Oriented Programming at an early age and I have been coding for more than 7 years now. I started coding in Java and then followed it up with C. I've been introduced to Python in the recent past and it makes thing more simpler and short. Features in Python help me focus more on the algorithms an its implementation. There are a number of things that makes coding in Python more easy and versatile. Many open source Standard libraries in Python adds on to its simplicity. PEP 008, which emphasizes on coding style in Python.

I am familiar with using Git version control system and Github. I have been using them for a long time now and learnt them through practice and in case I get stuck, I read more about it at Stack Overflow, and come up with a solution. I am familiar with SymPy's workflow and have set my development environment.

Contributions to SymPy

I started contributing to SymPy in early November 2016 by solving Easy-to-Fix bugs and here is a list of all my contributions so far in chronological order:

  • (Closed) Improved computation of matrix power to integer #11808 #11808 I couldn't successfully finish this (due to my semester exams) so was later continued and merged in #11874 Increase performance of matrix power for 2-dim matrices #11874. This was my first experience in SymPy, although I couldn't completely fix it but, I learnt a lot.

  • (Merged) Function gamma pprint issue fix #11919 #11919

  • (Merged) Matrix hstack/vstack improved #11951 #11951

  • (Merged) Q.prime returns None, Integer x Integer can be Prime #11952 #11952

  • (Open) Derivative is_real if Function is_real #12041 #12041 Needs Decision

  • (Merged) Fixes code quality in tests_sums_products.py #12061 #12061

  • (Merged) Improve singular matrix power performance #12099 #12099

  • (Merged) Fixes a Typo #12138 #12138

  • (Merged) Fixes typos in holonomic.py, singularityfunction.py and diophantine.py #12184 #12184

  • (Merged) Minor spellchecks in finite_diff.py #12244 #12244

The Project

Overview

Sympy is a pure python library which aims to provide a full featured Computer Algebra System (CAS). Presently there are a number of algorithms that handles integration:

  • sympy.integrals.risch.risch_integrate() - Uses Risch algorithm works for transcendental equations with exp() and log().

  • sympy.integrals.manualintegrate.manualintegrate() - Similar to manual integration (i.e., as done by hand). Algorithm for manualintegrate is similar to Rubi, it also does pattern matching against some rules.

  • sympy.integrals.trigonometry.trigintegrate() - Integrates trigonometric functions. Also uses pattern matching.

  • sympy.integrals.rationaltools.ratint() - Integrates rational functions.

  • sympy.integrals.meijerint.meijerg_definite() and sympy.integrals.meijerint.meijerg_indefinite() - Integration using the Meijer G algorithm (roughly, by translating the integral to a Meijer G-function, integrating, then translating back).

  • sympy.integrals.heurisch.heurisch() - The heuristic Risch algorithm. It can be very slow (sometimes hanging the integrator), but there are cases where only it can produce an answer.

Rubi will be an entirely new inclusion into SymPy. It utilizes a decision tree for pattern matching making things faster, more organised and easy to maintain. Although some work is in progress to include some rules to SymPy(cos, sinh and cosh rules see PR #8036 initiated by Ondřej Čertík).

Advantages of a rule based system

A rule based system is aware of the possible cases and hence handles it accordingly. Here are examples how Rubi helps making expressions more simpler, symmetric and hence machine and human friendly taking advantage of its general rules.

This is the output we get in Rubi.

Output here is simplified and consists of just three terms, where as other CASs like, Mathematica and Maple provide more sophisticated outputs which are not simplified.

Rubi handles this taking advantage of its general rule.

Maple ends up with a wrong result.

Motivation

The following summarizes the advantages of storing mathematical knowledge in the form of a repository based on properly defined transformation rules:

  • Human and machine readable - Since rules are defined using mathematical formulas rather than procedural programming constructs, they express a self-contained mathematical fact that can be attractively displayed in standard two-dimensional mathematical notation.

  • Able to show simplification steps - The successive application of rules exactly corresponds to the steps required to simplify an expression. Thus when a rule is applied, it can display itself in standard mathematical notation as justification for the step, and then suspend further simplification so the partially simplified result is returned.

  • Mechanical rule verification - Since the right side of a properly defined rule is just a mathematical expression, the rules validity can often be mechanically verified. For example, the right side of integration rules can be differentiated to see if they equal the integrand on the left.

  • Facilitates program development - The fact that properly defined rules are inherently self-contained and free of side-effects makes it easy to test the effect on the system of selectively adding, modifying or deleting rules. Although collections of rules may be highly recursive, each individual rule must be able to stand on its own, thus making it possible to test it on examples before adding it to the collection.

  • Platform independent - Since properly defined transformation rules consists only of mathematical expressions and pattern matching specifications, the translation of rules from the syntax of one computer algebra system to another is relatively straight-forward.

  • White box transparency - For the most part, computer algebra systems appear as mysterious black boxes to users with little or no explanation given as to how results are obtained. However, if the source file of rules on which a CAS is based were included with the system, it becomes a transparent white box, making it possible for users to modify existing rules and even add new ones.

The Plan

Note - This is a basic plan which I propose according to the discussions in #12233, #7748 and #7749, but may vary as my mentors suggest. The important structure of planning out this project includes:

  • An efficient Pattern Matching Technique and Transformation Rules, this would be the engine of Rubi, better the Pattern Matcher better is the performance.

  • Implementing almost 6000 rules into SymPy from Rubi 4.10, using the Pattern Matcher the given integrand's expression is broken into subexpression and traversing through the Matching tree a rule is concluded else no output is derived. Once into a rule the respected output is presented.

New Pattern Matching

Overview

Currently, the are two algorithms providing pattern matching in SymPy, the .match( ) provided by SymPy's core and the unify module. .match( ) is mathematically aware, while the unification module provides a structural matching.

In the unification module there is no distinction between expression and pattern, there are just two expressions and wildcards can be placed in both of them. Unification provides a substitution for the wildcards such as the two expressions become the same, while in ordinary pattern matching wildcards can be only in the pattern, not in the expression.

In both modules of SymPy, the matching acts on the whole expression, not on subexpressions. Wolfram Mathematica has a powerful matching and term rewriting system, which acts on expression trees by matching subtrees of expressions.

Drawbacks of current Pattern Matcher - The current Pattern Matching algorithm is not powerful enough and fail very often, some instances of failure are:

In [1]: b = Wild('b',exclude=[x])

In [2]: m = Wild('m',exclude=[x])

In [3]: expr = (4*x)**2

In [4]: expr.match((b*x)**m)

expected output: {b_: 4, m_: 2} got None

In [78]: a=Wild('a')

In [79]: b=Wild('b')

In [80]: m=Wild('m')

In [81]: p=Wild('p')

In [82]: expr = ((q*log(y))/5)**2

In [83]: expr.match((((b*a)/m))**p)
Out[83]: {m_: log(y)**(-2), p_: 1, a_: 1/25, b_: q**2}

expected a simpler output: {m_: 5, p_: 2, a_: log(y), b_: q}

The New Pattern Matcher is expected to overcome such flaws.

Basic Idea

The idea here is to use an if-then-else tree for pattern matching which would be more robust and fast, a new pattern matching is to provide matching on subtrees, enabling more options to be available on wildcards (such as optionality, types allowed to be matched, assumptions on wildcard, filtering conditions, information on whether it has to be a node or a subset of a node's children). Benefits of using a tree:

  • It can match the wildcards to subtree in the expression.
  • Extract expressions an and apply transformation to arrive at the correct match.

Wildcard objects

Create wildcard objects such as WildNode and WildSequence to enhance the usability of the tree while matching.

  • WildNode - matches a tree node.
  • WildSequence - matches a tuple of node siblings.

For example

>>> wn1 = WildNode('wn1')
>>> wn2 = WildNode('wn2')
>>> ws = WildSequence('ws')
>>> match(f(x, y, z), wn1(x, y, wn2))
{wn1: f(x, y, z), wn2: z}
>>> match(f(x, y, z), ws)
{ws: f(x, y, z)}
>>> match(f(x, g(y, z), w), g(ws))
{ws: (y, z)}
>>> wn1 = WildNode('wn1', exclude=[x, y])
>>> match(f(x, y, z), f(ws, wn1))
{ws: (x, y), wn1: z}

Example: replace all occurrences of w_*conjugate(w_) by abs(w_)**2

>>> replace(sin(u*conjugate(u))+exp(a+z*o*conjugate(o)), {wn1*conjugate(wn1): abs(wn1)**2})
sin(abs(u)**2) + exp(a+z*abs(o)**2)

WildNode just matches a single node, not a sequence:

>>> replace(u*o*conjugate(u*o), {wn1*conjugate(wn1): abs(wn1)**2})
u*o*conjugate(u*o)

WildSequence would work in this case:

>>> replace(u*o*conjugate(u*o), {ws*conjugate(ws): abs(ws)**2})
abs(u*o)**2

Checking Pattern Form

Test if the expression is integer:

>>> match[12345, _Integer]
>>> True	

Test if an expression is a sum of two or more terms:

>>> match[(x - 1)*(1 + 2*x + 3*x**2), Add[_, __]]
>>> False	
>>> match[Expand[x*(1 + 2*x + 3*x**2)], Add[_, __]] 
>>> True

After expansion, x*(1 + 2*x + 3*x**2) becomes (x + 2*x**2 + 3*x**3) and tree representation is Add(Mul(Integer(3), Pow(Symbol('x'), Integer(3))), Mul(Integer(2), Pow(Symbol('x'), Integer(2))), Symbol('x')) therefore returns True.

Multiple patterns

A list of pattern could be provided:

>>> match(f(x), [f(wn1), wn2(wn1)])
{matched: 0, wn1: x}                            # {wn1: x, wn2: f(x)} has lesser priority
>>> match(g(x), [f(wn1), wn2(wn1)])
{matched: 1, wn1: x, wn2: g(x)}

Or conditions

PatternOr function, creates many possible matching patterns, splitting the pattern at the node it is found:

>>> match(f(x), [PatternOr(f(wn1), g(wn1))])
{wn1: x}

Optional wilds

Wilds that need to be matched:

>>> match(f(x), f(wn1, wn2), optional=(wn2,))
{wn1: x, wn2: None}

Default values for unmatched wilds

>>> match(f(x), f(wn1, wn2), default={wn2: S.Zero})
{wn1: x, wn2: 0}

Type wilds can match

TODO: decide how to distinguish subtypes and exact type matches.

>>> match(f(x), wn1, typeis={wn1: Function})
{wn1: f(x)}
>>> match(f(x), wn1, typeis={wn1: Symbol})
{wn1: x}

Pattern dispatch

Needs Decision

Using decorators to define a function f to be dispatched according to pattern matching rules:

@patterndispatch(g(wn1))
def f(expr, matches):
    ...

@patterndispatch(wn1)
def f(expr, matches):
    ...

So

>>> f(g(x))  # calls first function, with expr=g(x) and matches={wn1: x}
>>> f(3)     # calls second function, with expr=3 and matches={wn1: 3}

Predicate dispatch

Needs Decision

By predicate dispatch it is meant a pattern dispatch endowed of conditions.

Predicate dispatch example:

In[1]:= f[x_] = 1
Out[1]= 1

In[2]:= f[x_Integer] = 2
Out[2]= 2

In[3]:= f[x_Integer] = 3 /; x> 5
Out[3]= 3 /; x > 5

In[4]:= Definition[f]
Out[4]= f[x_Integer] /; x > 5 = 3
        
        f[x_Integer] = 2
        
        f[x_] = 1

Here, f has been defined three times. In the definition of f, the matching set is sorted to allow more specialized matches to have priority over less specialized ones.

It is possible to see how this dispatching selects the correct function:

In[8]:= f[3/2]
Out[8]= 1

In[9]:= f[2]
Out[9]= 2

In[10]:= f[7]
Out[10]= 3

This case is ambiguous, it is set to the last definition:

In[15]:= g[1, 1]
Out[15]= 2

In general, it is better to raise an exception when trying to call a function with an ambiguous dispatch.

Orderless attribute

Orderless is an attribute that can be assigned to a symbol f to indicate that the elements ei in expressions of the form f[e1,e2,…] should automatically be sorted into canonical order. This property is accounted for in pattern matching.

>>> set_attributes[f, Orderless];
>>> f[b, a]
>>> f[a, b]
>>> a = Symbol('a', Integer = True)
>>> match[2 + x, x + a]
>>> True

Implementing Rubi Rules

There are more than 6000 rules in the current version of Rubi that needs to be implemented into Sympy. These rules are distributed into:

  • Algebraic
  • Exponential
  • Logarithmic
  • Trigonometric
  • Inverse trigonometric
  • Hyperbolic
  • Inverse hyperbolic
  • Special functions
  • Miscellaneous

Further these rules are divided and subdivided into different categories see here. Here is a small example how rules will be implemented into Rubi Module in SymPy.

def Algebraic_1_1(Expr)               # rule Algebraic_1_1 indicates Algebraic > Linear Product > (a + b*x)**m category
    a = wild('a', exclude=[x])
    b = wild('b', exclude=[x])
    m = wild('m', exclude=[x])
    type = match(Expr, [((a + b*x)**m), ((b*x)**m)])
        if type[matched] == 1:
            a = 0
        if (m == -1):
            return log(a + b*x)/b
        else: 
            return Piecewise((log(a + b*x)/b, (m == -1)),((a + b*x)**(m+1)/(b*(m+1)), True))