GSoC 2020 proposal Arpan Chattopadhyay: Improving SymEngine Sympy Integration and SymPy core to SymEngine - sympy/sympy GitHub Wiki

Google Summer of Code 2020 - Proposal

Improving SymEngine - SymPy Integration by working on SymPy Modules

Me, the person:

Name : Arpan Chattopadhyay

University : Birla Institute of Technology and Science, Pilani

Email : [email protected]

GitHub : Arpan612 (Arpan Chattopadhyay)

Time-zone : IST (UTC+5:30)

Age : 19

I am Arpan Chattopadhyay, a second-year student pursuing a B.E. (Hons.) in Electrical and Electronics at Birla Institute of Technology and Science, Pilani (BITS Pilani). I am highly interested in the fields of Symbolic Mathematics, Applied Mathematics, Machine Learning, and Mathematical Modeling. I would love to continue working on open-source projects and hopefully, one day, build my own team of programmers to create an open-source project as big as SymPy. I am fluent in English and love interacting with people. Apart from coding, I love watching films, particularly mysteries. I play and follow football as well. I believe I have been a good fit for the community culture and I hope to keep contributing after the GSoC program is over.

Me, the programmer:

I was introduced to programming in Class 11, where we were taught C++ as part of the course curriculum. I started learning Python a month after joining college. Around that time, I was introduced to the world of open source and the limitless possibilities it held for me. I was fortunate enough to take part in a Study Oriented Project which involved Symbolic Mathematics and the use of SymPy in Python. I found out about the various tasks SymPy can do and was very impressed. Thus, I decided to finally give wings to my dream of working and collaborating in a large open-source project, and SymPy was a natural choice, given the interest it sparked in me.

OS : Ubuntu 16.04

Hardware Configuration : Intel i7 8th generation

Python : Version 3.7.4

Editor : Atom and Anaconda (Version 4.7.12)

Python Projects Created:

Gateway Interface with Google Accounts
Blogging Application
Polling Application
Implementable Google O-Auth Gateway
Biometric Verification Application
Algorithmic Trading Application
Algorithms for Stock Market Trading with Python
Automated Trading Platform with Python and C++
Retina Sensor Detection Algorithms

Internship and Research Experience with Python:

Winter Intern as a Quantitative Research Analyst at Veda Capital, Opera House, Mumbai

Responsible for designing algorithms for trading in the Indian equity and commodity markets, managing and mining F&O data using MySQL, and coding the algorithms and alphas in Python with Pandas and NumPy for backtesting and live testing.

Laboratory and Study Project on Application of Machine and Deep Learning in Communication System and Spectrum Sensing

This project is carried out under Shishir Maheshwari, Department of Electrical and Electronics, BITS Pilani. We apply machine learning techniques in Python to spectrum sensing when the primary user has multiple transmit powers. We also investigate and build a machine learning model for spectrum sensing in cognitive radio networks.

The simplicity and ease of handling complex mathematics is an awesome feature I feel SymPy has.

Contributions in SymPy:

Merged PRs:

#18946: Issue 18921: “Extra Strong” Lucas Pseudoprimes
#18945: Issue 18666: Move Matrix out of test_matrices.py
#18961: Issue 18959: Give ‘digits’ a ‘bits’ argument
#18928: Issue 18891: Improper docstring in codegen.array_utils recognize_matrix_expression

Open PRs:

#18973: Issue 18963: ibin should raise OverflowError if too few bits are requested; negative arg

Closed PRs:

#18944: Issue 18666: Move Matrix out of test_matrices.py
#18943: Issue 18666: Move Matrix out of test_matrices.py

Reviews and Discussion:

Issue #18889: Running sympy in Brython (Python in the browser)
PR #18681: Issue 16234: Refine should simplify symmetric matrix element

My Project:

In a competitive world of rapidly expanding technology, the speed at which tasks execute is very important. SymEngine can be used to achieve that speed in SymPy.

SymEngine was initially developed to become part of the core of the SymPy CAS, and it has since matured enough to be used as a symbolic backend. Using SymEngine can significantly speed up various symbolic operations, which increases the value of SymPy in applications requiring brisk computation, since users get the option to tap into SymEngine's routines. Using SymEngine from SymPy is also easy, which should attract more people to it.

Aim and Objective:

The ultimate objective is to speed up SymPy. An effective way to do so is to use SymEngine, which provides a very fast implementation of the core symbolic algorithms. The theme of this project is to extend SymEngine support to more modules in SymPy and to implement features of the SymPy core that are missing from SymEngine. The project also explores the changes SymEngine needs to become more efficient. It builds on the commendable work done by @ShikharJ in GSoC 2017 and also addresses various problems SymEngine faces.
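The speed difference can be checked informally with a small timing harness. The following is a minimal sketch, not a rigorous benchmark: the helper name `time_expand` and the test expression are illustrative choices, and it assumes that both `sympy` and `symengine` expose top-level `symbols` and `expand` functions. A backend that is not installed is reported as `None`.

```python
import importlib
import time


def time_expand(module_name, power=15):
    """Time one expansion of (x + y + z + 1)**power with the given backend.

    Returns the elapsed wall-clock time in seconds, or None when the
    requested backend is not installed.
    """
    try:
        backend = importlib.import_module(module_name)
    except ImportError:
        return None
    x, y, z = backend.symbols("x y z")
    expr = (x + y + z + 1) ** power
    # A single run is timed on purpose: SymPy caches many results, so
    # repeating the same call would mostly measure cache lookups.
    start = time.perf_counter()
    backend.expand(expr)
    return time.perf_counter() - start


if __name__ == "__main__":
    for name in ("sympy", "symengine"):
        seconds = time_expand(name)
        if seconds is None:
            print(f"{name}: not installed")
        else:
            print(f"{name}: {seconds:.4f} s")
```

The absolute numbers depend on the machine and the installed versions, so only the ratio between the two backends is meaningful.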

Motivation:

I believe that the most important problems of this world are solved by Mathematics. In particular, calculus was a revolutionary discovery that accelerated the study of Mathematics and the development of the world at an astonishing pace. What I like most about calculus is the vast range of applications it has and how it is used in almost all practical tasks we do.

Importance of This Project:

We want SymPy and SymEngine to gain wider acceptance and use. One of the biggest challenges in the way is slow operations. Integrating SymEngine into SymPy is essential for increasing speed, and for that we need to increase the number of SymPy modules integrated with SymEngine.

Time Available and Other Commitments:

I have my end-semester examinations from May 1st to 15th. This falls entirely within the community bonding period, so I will complete the tasks planned for those days in the pre-community bonding period. Other than that, I will devote 45-55 hours a week, as I have no other commitments during the GSoC period and have a great interest in the project I want to do with SymPy.

Documentation:

Good documentation is an integral part of any successful project. It also helps a project gain wider acceptance from the developer community. After working on the project for some time, I will write documentation covering the various aspects of the project.

Communication:

I will be in regular contact with mentors using email. In case I am stuck somewhere, I would reach out to them via Gitter Chat and Mailing List. If selected, I also plan to make a blog where I will be giving regular updates about the work completed. I believe that the problems faced by me and their solutions would certainly help other fellow developers later just like I have received great help from various blogs on the internet.

Importance of Benchmarking:

Benchmarking is essential in any project. Alongside its other goals, this project works on ASV benchmarking, since we need to identify the parts of the code that make it slow. This problem arises especially when heavy computation is needed for multivariable problems. The project will add benchmarks for modules of SymPy and also set up a benchmarking environment.
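As a rough illustration of the convention ASV expects, a benchmark is a class whose `setup` method prepares the inputs and whose methods prefixed with `time_` are timed. The class name, attribute names and expression below are hypothetical; this is a sketch of the style, not a benchmark taken from the existing benchmarks repository. Raising `NotImplementedError` in `setup` is ASV's way of marking a benchmark as skipped.

```python
class TimeExpandPolynomial:
    """Hypothetical ASV-style benchmark: expand a multivariate power."""

    # ASV runs the benchmark once per parameter value.
    params = [5, 10]
    param_names = ["power"]

    def setup(self, power):
        try:
            import sympy
        except ImportError:
            # ASV treats NotImplementedError in setup as "skip this benchmark".
            raise NotImplementedError("sympy is not installed")
        x, y, z = sympy.symbols("x y z")
        self.expr = (x + y + z + 1) ** power
        self.expand = sympy.expand

    def time_expand(self, power):
        # Only the body of a time_* method is measured.
        self.expand(self.expr)
```

Keeping the expression construction in `setup` means the measured time covers the expansion alone.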

Phase 2 Modules:

Combinatorics
Assumptions
Polynomials
Calculus
Integrals

Phase 3 Modules:

Number Theory
Geometry
Series
Stats
Vectors

Project Details:

Phase 0: (Community Bonding)

In this phase, we explore the existing work done on the integration of SymEngine with SymPy. We look at the changes made in the modules that already have SymEngine integration. We renovate and improve the documentation explaining why those changes were necessary. We also keep track of the difference in speed these changes have brought about and include it in the documentation. I have already started exploring SymEngine and I believe there is ample scope for testing and benchmarking. As I will mention in the timeline, this phase is almost completely inside the community bonding period. Hence, I will continue my conversation with the mentors at SymEngine, get information on the additional issues SymEngine faces, and work on them.

For benchmarking and speed testing the following issues and PRs are relevant:

Speedup tests compilation time
Consolidated tests

The idea here is to have just one main executable, and all other test files just get linked to it. This should vastly improve the recompilation time if a test gets modified, as well as the overall time to compile tests. I will discuss the feasibility of this task with the mentors, and if we are able to come up with a plan, I will try to finish this task in this phase itself.

After this, I will explore the work done on the Polynomial Module and Wrappers and discuss with the mentors what changes and additions we can introduce. I will then draw up a plan, which will be implemented in Phase 2. For the Polynomial Module and Wrappers the following issues and PRs are relevant:

Polynomial Module
Polynomial Wrappers
Polynomial Final Work

I will also work on the issue Remove all non-trivial global constants · Issue #1595.

Phase 1:

We are inside the coding period now. We start by working on the issues discussed with the mentors and on existing issues in the SymEngine repository. Some of the existing issues are below:

Inconsistent substitution behaviors · Issue #1600
Implement matrix expressions · Issue #912 · symengine/symengine
Implement apart() · Issue #1324

As Aaron Meurer Sir has said before, in a large organization it happens that issues get neglected. I have placed this task before implementing anything new because I feel it is better to perfect the work already done before working on other modules.

After that, we come to the main part of this phase, which is implementing missing features from the SymPy core in SymEngine. The SymPy core is not completely isolated; it depends on Python and on the rest of SymPy to some extent. Another aspect that makes this implementation challenging is its complex structure. First, we classify the SymPy classes and files belonging to the core, and then we create a module which will define the API to the core. This module is used to import things from the current core.

All client code (that is, the rest of SymPy that uses the core) will access things from the core through this module only, e.g.:

from sympy.core import Add

will be changed to

from sympy.old_core_api import Add

After this step, we start porting symbols from the modules one by one, taking a symbol and building a new wrapper that has the same methods and arguments. The work done here is crucial for the entire implementation. After this, the whole SymPy test suite should pass. Validation is done against the SymPy core and minor changes are made as mentioned in the Timeline. The one-by-one approach will consume time. Though I believe that giving about 50 hours a week will be enough to finish before the Phase 1 deadline, I will keep a 3-day buffer in Phase 2 for this task.
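The validation step can be pictured as a wrapper that keeps the same call signature as the original, forwards each call to both the reference implementation and the new one, and fails loudly when the two disagree. The sketch below only illustrates that idea: `validated` is a hypothetical helper, and plain Python functions stand in for a SymPy core routine and its SymEngine-backed replacement.

```python
import functools
import math


def validated(reference):
    """Decorator: run `reference` alongside the new implementation and compare."""
    def decorator(candidate):
        @functools.wraps(candidate)
        def wrapper(*args, **kwargs):
            expected = reference(*args, **kwargs)
            result = candidate(*args, **kwargs)
            if result != expected:
                raise AssertionError(
                    f"{candidate.__name__}{args}: got {result!r}, expected {expected!r}"
                )
            return result
        return wrapper
    return decorator


# Stand-ins: math.factorial plays the "old core", the loop plays the port.
@validated(math.factorial)
def fast_factorial(n):
    product = 1
    for k in range(2, n + 1):
        product *= k
    return product


print(fast_factorial(10))  # 3628800
```

Once every ported symbol passes under such a check, the decorator can be dropped, which corresponds to the "remove the validation" step in the timeline.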
I will use the Python reference implementation of the core; that way it can remain the default core in SymPy, and people can optionally switch to SymEngine. I would like to use SymEngine's API for these tasks. During this period, if my mentors feel that some other aspects of SymEngine have higher priority, I am open to working on those as well.
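One way to keep the pure-Python core as the default while letting users opt in to SymEngine is a single API module that decides which backend to import from. The following sketch makes several assumptions that are not part of the proposal: the function name `load_core`, the environment variable `CORE_BACKEND`, and the fallback behaviour are all hypothetical.

```python
import importlib
import os


def load_core(default="sympy", env_var="CORE_BACKEND"):
    """Return the backend module named by the environment, else the default.

    Falls back to `default` when the requested backend cannot be imported.
    """
    requested = os.environ.get(env_var, default)
    try:
        return importlib.import_module(requested)
    except ImportError:
        return importlib.import_module(default)


if __name__ == "__main__":
    # Run with CORE_BACKEND=symengine in the environment to opt in.
    try:
        core = load_core()
        print("using backend:", core.__name__)
    except ImportError:
        print("no backend installed")
```

Client code would then take names such as `Add` from the returned module instead of importing them from a fixed package, so switching the backend does not touch the call sites.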

An interesting thing I have observed with SymPy is that people often compare the speed of one library against another. The following issue highlights it:

Investigate RE-flex for the tokenizer Issue #1589

I want to spend a day exploring how other libraries tackle the issue of speed, going through their open-source GitHub repositories to find out what they do differently from us. I believe this activity can bring great value, as we might find something we were unaware of.

Phase 2:

We first need to import the changes for the functions/classes already implemented in the previous modules. We will then move the few benchmarks for the Phase 2 modules from the main repo into the Benchmarks repo, because the ASV benchmarking style follows a particular convention. After this, I will start working on the Phase 2 modules.

For the Combinatorics module, I think there is good scope of implementation in permutations and utilities.

For the Assumptions module, significant work has already been done, so I would like to build on it. I will improve the capabilities of the 'new assumptions' system to make it faster and more powerful, and reorganise the old assumption system through a handler mechanism. I will address issues arising over the definition of assumptions (mostly in the old system). Since this is a separate GSoC project in itself, I will coordinate with my mentors to do as much as possible before moving on.

For the Polynomials module, the plan discussed in the Community Bonding period will be implemented.

For the Calculus module, I think there is scope of implementation in all its classes. I will start off with Finite Difference Weights. After this, we implement the changes in Euler and Utilities. In utilities, special attention will be given to accumulation bounds.
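To make concrete what finite difference weights are, the sketch below computes them by solving the defining linear system with exact rational arithmetic. This is a deliberately simple technique chosen for illustration, not the recursive scheme a production routine would normally use, and the function name `finite_diff_weights_sketch` is hypothetical. It assumes the grid points are distinct.

```python
from fractions import Fraction
from math import factorial


def finite_diff_weights_sketch(order, points, x0=0):
    """Weights w_j such that sum_j w_j * f(points[j]) approximates f^(order)(x0)."""
    n = len(points)
    if order >= n:
        raise ValueError("need more points than the derivative order")
    offsets = [Fraction(p) - Fraction(x0) for p in points]
    # Row k encodes: sum_j w_j * offsets[j]**k == k! if k == order else 0,
    # i.e. the formula is exact for every monomial of degree below n.
    rows = [
        [off ** k for off in offsets] + [Fraction(factorial(k) if k == order else 0)]
        for k in range(n)
    ]
    # Gauss-Jordan elimination on the augmented matrix.
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pivot_value = rows[col][col]
        rows[col] = [value / pivot_value for value in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[j][n] for j in range(n)]


# Central differences on the grid -1, 0, 1:
print(finite_diff_weights_sketch(1, [-1, 0, 1]))  # weights -1/2, 0, 1/2
print(finite_diff_weights_sketch(2, [-1, 0, 1]))  # weights 1, -2, 1
```

The two calls reproduce the familiar central-difference formulas for the first and second derivative with unit spacing.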

Next we move on to the Integrals module. I think implementing SymEngine support will be difficult here due to the module's complexity, so chalking out the plan of implementation with the mentors before proceeding will be a good idea. The starting point will be the various kinds of transforms, i.e. Mellin, Fourier, Laplace, sine, cosine, etc. These transforms are the most used part of the integrals module. They play a key role in signal and control system analysis, along with sampling: the Fourier transform is used in signal analysis and the Laplace transform in system analysis. I would implement the missing properties of these transforms and also add documentation of their uses. I am studying both Control Systems and Signals and Systems as coursework this semester in college, and hence have a deep understanding of this area. We have practical sessions with MATLAB, and I would like to implement the extra features provided there here as well. I will be working on the ASV benchmarks and the corresponding code for the integration of this module in SymPy with SymEngine. The main focus will be to implement functionalities that are unavailable right now. This task is expected to be heavy and time-consuming. We will need to implement routines in SymEngine as well as update the Python wrapper with the latest development.
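For reference, the three named integral transforms are usually defined as follows. The Fourier transform is written here in the ordinary-frequency convention; other conventions differ by factors of $2\pi$.

```latex
\begin{aligned}
\text{Laplace:} \quad & F(s) = \int_{0}^{\infty} e^{-st} f(t)\, dt \\
\text{Fourier:} \quad & F(k) = \int_{-\infty}^{\infty} f(x)\, e^{-2\pi i x k}\, dx \\
\text{Mellin:} \quad & F(s) = \int_{0}^{\infty} x^{s-1} f(x)\, dx
\end{aligned}
```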

Phase 3:

I would like to keep a four-day buffer, as the work in Phase 2 is time-consuming. These days can also be used for debugging and for better documentation of the work done in the previous modules. I also expect my mentors to suggest some extra work we can do to make this implementation innovative, so the time may be used there as well.

After this, I would like to spend a day on ASV benchmarking of all the modules in which SymEngine has been implemented thus far, as the main objective of this project is for our code to be fast. I will then update the Python wrapper with the latest development.

After giving the finishing touches to the previous modules, we shift our focus to the Phase 3 modules.

For the Number Theory module, generate, multinomial and primetest have the highest usage, and they will be my starting point of implementation. I have also observed that plenty of useful number theory relations taught to us in college are not implemented here. I will make sure those relations are implemented.

For the Geometry module, I have observed maximum usage of ellipse (due to inclusion of a circle in it) and polygon and it will be my starting and main area of implementation.

For the Series module, I think a lot of implementations are possible. I will be focusing first on both Discrete and Continuous Time Fourier Series, then sequences. Both singleton and compound sequences have applications and I will implement functionalities based on popular usage.

For the Stats module, Bernoulli, Poisson, Chi and Maxwell are very important. There is extensive usage of these in Data Analysis and Simulation applications in multiple fields.

For the Vectors module, no particular area has significant weightage. The areas to focus on will be discussed with the mentors and then implemented.

After each module, update the python wrapper with the latest development.

Like before, I will be working on the code for the integration of this module in SymPy with SymEngine regarding the implementation of the mentioned unavailable functionalities. Since we have worked in a similar manner in Phase 2, I believe I will be able to complete the work at least five days before the deadline. I would like to implement some additional functionalities in some of the previous modules. The last five days will be used for removing bugs, solving issues and improving documentation of the work done in GSoC.

Timeline: (As per the latest schedule sent by GSoC authorities due to COVID 19)

Community Bonding Period: (Present - May 31)

Exploring existing work done in integration of SymEngine and SymPy
Figuring out with the mentors the additional work we can do, and deciding the timeline for it as well
Renovate and improve documentation of existing SymEngine integrated modules of SymPy
Start fixing issues in SymEngine (May 4 onwards)
PR for all issues fixed by May 17
Discussion of the idea of benchmarking and speed testing with the mentors
Continue and complete the work done in Consolidated tests
PR for complete work by May 22
Discussion of the remaining work to be done on the Polynomial Module and Wrappers by May 24
Work on the issue Remove all non-trivial global constants · Issue #1595
PR for the issue by May 26
Start Phase 1 work from May 26 if no other tasks remain

Phase 1:

June 1 to June 7: (Week 1)

Complete working on issues:
Inconsistent substitution behaviors · Issue #1600 · symengine/symengine
PR for the issue by June 2 latest.
Implement matrix expressions · Issue #912 · symengine/symengine
PR for the issue by June 5 latest.

June 8 to June 14: (Week 2)

Complete working on issues:
Implement apart() · Issue #1324 · symengine/symengine
PR for the above issue by June 8 latest.
Creation of the module which will define the API to the core and be used to import things from the current core, by June 12 latest.

June 15 to June 21: (Week 3)

Complete porting symbols from Algebras up to Functions and build a new wrapper that has the same methods and arguments by June 15. The order of modules in the document below is used:
Welcome to SymPy's documentation! (SymPy 1.5.1 documentation)
Complete porting symbols from Geometry up to Number Theory and build a new wrapper that has the same methods and arguments by June 20.

June 22 to June 28: (Week 4)

Complete porting symbols from ODE up to Vectors and build a new wrapper that has the same methods and arguments by June 24.
Remove the validation and remove the parts of SymPy's core that are not used at this point. Tests must still pass.
Remove the SymPy to SymEngine to SymPy conversions. Simply accept SymEngine.py objects and return SymEngine.py objects.
After all tasks are done, the whole of SymPy is ported on top of SymEngine.
PR for the tasks by June 28 latest.

Exploring how other libraries tackle the issue of speed by going through their open-source GitHub repositories.

June 29 to July 1: (Week 5)

Import changes for pre-implemented functions/classes in the previous modules.
Move the few benchmarks in the main sympy repo into the Sympy-Benchmarks repo. This is done because the ASV benchmarking style follows a particular convention.

Phase 1 evaluation submission by July 1 latest

July 2 to July 5: (Week 5)

Benchmarking the work done in phase 1 as per requirements.
Import changes for pre-implemented functions/classes in the previous modules.
Start implementation on permutations in the Combinatorics module.

Phase 2:

July 6 to July 19: (Week 6 and 7)

Implementation of SymEngine on permutations in the Combinatorics module.
PR for the work done by July 6 latest.
Implementation of SymEngine on utilities and other classes in the Combinatorics module. Add additional functionalities.
PR for the work done by July 9 latest.
Reorganisation of the old assumption system through a handler mechanism.
Address issues popping up over the definition of assumptions.
Add additional functionalities. Work on anything else suggested by the mentors.
PR for the work done by July 13 latest.
Functionalities will be wrapped up in SymEngine.py. Testing to be done as required.

July 20 to August 2: (Week 8 and 9)

Implementation of SymEngine on the Polynomials module as per the plan discussed in the Community Bonding period.

PR for the work done on the Polynomials module by July 16 latest.
Implementation of SymEngine on Finite Difference Weights in the Calculus module.
PRs for the work done by July 18 latest.
Implementation of SymEngine on Euler and Utilities in the Calculus module. Add additional functionalities.
PRs for the work done by July 21 latest.
Implementation of SymEngine on transforms in the Integrals module.
PRs for the work done by July 24 latest.
Add additional functionalities as mentioned in project details.
PRs for the work done by July 30 latest.
All the remaining functionalities will be wrapped up in SymEngine.py. Testing to be done as per requirement.
Update the Python wrapper with the latest development.

Phase 2 evaluation submission by July 30 latest

Phase 3:

August 3 to August 9: (Week 10)

Debugging and Documentation for Phase 2 modules.
Benchmarking all previous modules where SymEngine is used.
The above tasks to be completed by August 3 latest.
Start working on the Phase 3 Modules and their corresponding classes. Special attention should be given to classes where the scope of SymEngine implementation is maximum.
Implementation of SymEngine on generate and multinomial in Number Theory module.
PRs for the work done by August 7 latest.
Implementation of SymEngine on primetest and addition of relations in Number Theory module.
PRs for the work done by August 9 latest.
Implementation of SymEngine on ellipse and polygon in the Geometry module.
PRs for the work done by August 11 latest.

August 10 to August 23: (Week 12 and 13)

Implementation of SymEngine on Discrete and Continuous Time Fourier Series in Series module.
PRs for the work done by August 13 latest.

Implementation of SymEngine on sequences in Series module.
PRs for the work done by August 15 latest.
Implementation of SymEngine in all areas of Stats module discussed in project details.
PRs for the work done by August 18 latest.
Implementation of SymEngine in areas of Vectors module as discussed with the mentors.
PRs for the work done by August 22 latest.
Additional work depending on the discussion with mentors.
All the remaining functionalities will be wrapped up in SymEngine.py. Testing to be done as per requirement.
Update the Python wrapper with the latest development.

August 24 to August 31: (Week 14)

Finishing up documentation and blogs.
Checking for any unattended issues or conflicts.
Final benchmarking and updating it to the SymPy wiki.
Completion of above tasks by August 28 latest.
Submission for Final Evaluation by August 28 latest.

I plan on implementing SymEngine in some modules of SymPy, across all functions and classes in them that are compatible with SymEngine. For additional functionalities, I will explore packages and languages like NumPy and MATLAB to find the extra functions they offer and implement the same here. I will be in contact with some of the computer science and mathematics teachers in my college, who have promised to give regular suggestions on functionalities we can implement based on their own research experience.

I expect to finish the tasks a few days earlier than the timeline above. The extra community bonding time given by Google improves my chances of doing so, as I will hopefully be able to start Phase 1 earlier than the timeline states. I would like to use this additional time to work on the additional goals below. If I don't get the additional time, I will request the mentors to allow me to work a few more days on them. Some of these goals were previously proposed on the wiki but not implemented, mostly due to time constraints. If any aspect of the project is left out, I will discuss it with the mentors during the community bonding period itself.

Additional Goals:

Explore the “Logic” and “ODE” modules, find functions where SymEngine can be implemented, and work on the same.
Exploring codecov.io and its role in efficiency tests.
Exploring and suggesting steps for thread safety in SymEngine.

References:

ShikharJ/GSoC-2017-Work-Report: Google Summer Of Code 2017 Work Report
GSoC 2017 Application Shikhar Jaiswal: Improving SymEngine's Python Wrappers and SymPy SymEngine Integration · sympy/sympy Wiki
https://github.com/sympy/sympy/wiki/GSoC-2020-Ideas#improve-sympy-integration
SymPy core upgrade to SymEngine · symengine/symengine Wiki
Beginner Contributor Guide
Design of SymEngine · symengine/symengine Wiki
Building SymEngine · symengine/symengine Wiki
Improving Assumptions
Other Proposals on Wiki

Acknowledgment:

I would like to thank Isuru Fernando and Shikhar Jaiswal for guiding me in this project proposal. I am grateful for the help extended by Aaron Meurer, S.Y. Lee and Gagandeep Singh till now and I hope all these people will continue to help me during GSoC.

Thank You So Much!