GSoC 2020 Application Omar Wagih: Benchmarks and performance - sympy/sympy GitHub Wiki

GSoC 2020 Proposal Omar Wagih: Benchmarks and performance

Personal Details

Name: Omar Wagih

University: Faculty of Engineering, Cairo University

Field: Computer Engineering

Email: [email protected]

Personal Background

I'm a final-year student at the Faculty of Engineering, Cairo University, Egypt. I have been coding for almost 4 years in different languages and paradigms and across different interests. I have done a little bit of everything, from web and cross-platform development with JavaScript to x86 assembly programming to machine learning and scripting with Python. Most of my programming in the last couple of years has been in Python, exploring and developing machine learning, web scraping and scientific computing projects.

Me as a Programmer

I currently dual boot Ubuntu 18.04 and Windows 10. I use Windows 10 whenever I can, but some projects run much better on Ubuntu, so I use it for those. I use VSCode on both platforms. I think it's a very cool editor with fairly advanced extensions that make the development environment much easier: if you're doing web development you can start a live server from it, it can automatically strip trailing whitespace, and there's an extension that automatically applies the PEP 8 style guide.

I have about 4 years of varied programming experience. One of my recent projects that I'm proud of was Arabic OCR on a dataset of different Arabic texts. It was done entirely in Python, and I went through the entire process: data cleansing, data pre-processing, applying different image processing techniques, and finally classification and post-processing. One of the things I really like about Python is its flexibility: it is a language designed for code readability and for being friendly to non-programmers, yet it is powerful enough to be used for almost anything you can imagine. The most advanced features I have used are probably threading/multiprocessing and lambda functions.

I haven't used SymPy extensively, but one of my favorite features, although not the most complex, is the fibonacci function. We all know the importance of the golden ratio in several fields, and being able to calculate and use both the golden ratio and the Fibonacci sequence is awesome.

I have used git in personal projects, group projects and internships, mostly to collaborate within a team of four people and sometimes within a team of 16 people. I have also used it to contribute to SymPy over the past couple of days.

Performance and Benchmarking

Benchmarking is really important, almost as important as testing. In testing you make sure your functions work under any circumstances, and in benchmarking you make sure that a function is not holding the rest of your code back by being unnecessarily slow. This is especially important when the code is used intensively, for example when solving equations with hundreds of variables or performing huge matrix operations. That is why I got interested in this project. Currently SymPy's test coverage is high, but its benchmark coverage is not. Adding more benchmarks and setting up a suitable benchmarking environment would be a really good experience for me, as it would allow me to go deep into SymPy, understand new mathematical concepts and measure the performance of its code. It would also allow future SymPy developers to get real-time benchmarking comparisons between new and old code.

My Qualifications and Research

I've used Python in several courses, including:

  • Numerical Analysis
  • Advanced Programming Techniques
  • Advanced Database
  • Data Structures and Algorithms

Benchmarking was not a new concept to me, but writing benchmarks was. In the past few weeks I wrote benchmarks for SymPy and for other personal code to practice, and I also familiarized myself with ASV, the Python benchmarking library used in the sympy_benchmarks repo.
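For readers unfamiliar with ASV, here is a minimal sketch of the shape a benchmark takes: ASV times methods whose names start with `time_`, and an optional `setup` method prepares the inputs outside the measured region. The class name and the expression being expanded are only illustrative choices, not taken from the existing benchmark suite.

```python
from sympy import expand, symbols


class TimeExpand:
    """Illustrative ASV-style suite; the name and workload are hypothetical."""

    def setup(self):
        # Runs before each timed method and is not included in the timing.
        x, y = symbols("x y")
        self.expr = (x + y) ** 20

    def time_expand(self):
        # ASV measures how long this method takes to run.
        expand(self.expr)
```

Outside of ASV the class can be exercised by hand by calling `setup()` and then `time_expand()` on an instance.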

Previous efforts on this topic live in the sympy_benchmarks repo, with some in the main SymPy repo. These efforts benchmark several important areas such as the basic algebraic functions, matrices, physics and several others. There are still some modules that are largely untouched, like the geometry module, and some that are not fully benchmarked, like the integration module.

Time Plan

During the summer I'm prepared to work about 40 hours per week as requested. Under the current schedule, the first month will include some of my final examinations, which will reduce my availability by about 8-10 hours per week (I am prepared to work during examinations, just not as much, as I will be preoccupied), but I am willing to increase my working hours after this period to cover the difference. I also expect to do some work during the community bonding period as I learn more about the community and the functions I need to benchmark.

Time Schedule

The program is going to last about 13 weeks. The following schedule lists the work expected to be completed in these weeks. The exact functions listed here are only preliminary, as our priorities function-wise could change after thorough discussions with my mentor. Every listed implementation will be accompanied by testing code, and a pull request will be created after each sub-task is done, roughly every week.

May 18 - June 1 (Week 1,2)

The first requirement of the project is to move the benchmarks that are still in the main SymPy repo into the sympy_benchmarks repo. This will be somewhat of a rewrite of the benchmarks in the SymPy repository, as ASV requires benchmarks to be written in a certain way.
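As a rough sketch of what such a rewrite could look like, the example below places an assumed legacy-style benchmark (a module-level function with a `timeit_` prefix) next to an ASV-style equivalent. The function and class names are hypothetical; the real benchmarks in the SymPy repo may be organized differently.

```python
from sympy import Symbol

x = Symbol("x")
y = Symbol("y")


# Assumed legacy style: a plain function with a "timeit_" prefix.
def timeit_symbol_eq():
    x == y


# ASV style: the same workload as a "time_"-prefixed method,
# with the inputs built in setup() so they are not part of the timing.
class TimeSymbol:
    def setup(self):
        self.x = Symbol("x")
        self.y = Symbol("y")

    def time_eq(self):
        self.x == self.y
```

The body of the benchmark is unchanged; only the naming and the way inputs are prepared differ.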

June 1 - June 15 (Week 3,4)

Add benchmarks for the geometry module. The geometry module is currently not benchmarked at all except for the Polygon class, so during these two weeks I'll be adding benchmarks for the other classes in the module.
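A minimal sketch of what geometry benchmarks for classes other than Polygon might look like is shown below. The choice of classes (Circle, Line, Triangle), the operations and the input values are assumptions for illustration; the actual set would be agreed with the mentor.

```python
from sympy import Circle, Line, Point, Triangle


class TimeGeometry:
    """Hypothetical ASV-style suite for a few geometry entities."""

    def setup(self):
        self.circle = Circle(Point(0, 0), 5)
        self.line = Line(Point(-10, 0), Point(10, 3))
        self.triangle = Triangle(Point(0, 0), Point(4, 0), Point(0, 3))

    def time_circle_line_intersection(self):
        self.circle.intersection(self.line)

    def time_triangle_area(self):
        self.triangle.area

    def time_triangle_incircle(self):
        self.triangle.incircle
```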

Phase 1 Evaluation

June 15 - June 29 (Week 5,6)

Add benchmarks for the series module. The series module covers limits, sequences, series expansions, Fourier series and formal power series. Some of these will have multiple benchmarks, including edge cases, so that they are representative of a typical user's inputs.
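The sketch below shows how a few of these areas could be exercised in an ASV-style suite. The specific expressions and truncation orders are arbitrary illustrative inputs, not a proposal for the final benchmark set.

```python
from sympy import Symbol, exp, fourier_series, fps, limit, pi, series, sin


class TimeSeries:
    """Hypothetical ASV-style suite touching several parts of the series module."""

    def setup(self):
        self.x = Symbol("x")

    def time_limit_sin_over_x(self):
        limit(sin(self.x) / self.x, self.x, 0)

    def time_taylor_exp(self):
        series(exp(self.x), self.x, 0, 10)

    def time_fourier_series(self):
        fourier_series(self.x, (self.x, -pi, pi)).truncate(5)

    def time_formal_power_series(self):
        fps(sin(self.x), self.x).truncate(8)
```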

June 29 - July 13 (Week 7,8)

Add benchmarks for the parsing and functions submodules.

Phase 2 Evaluation

July 13 - July 27 (Week 9,10)

Add benchmarks for the ntheory, sets and vector submodules.
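To give an idea of the kind of operations these three submodules could contribute, here is a small sketch in the same ASV style. The inputs (a Mersenne prime, a composite number, two intervals, two vectors) are illustrative assumptions only.

```python
from sympy import Interval, factorint, isprime
from sympy.vector import CoordSys3D


class TimeNtheorySetsVector:
    """Hypothetical ASV-style suite; in practice each submodule would get its own file."""

    def setup(self):
        self.mersenne = 2**61 - 1
        self.composite = 600851475143
        self.a = Interval(0, 10)
        self.b = Interval(5, 15)
        N = CoordSys3D("N")
        self.v = N.i + 2 * N.j + 3 * N.k
        self.w = 4 * N.i - N.j + 2 * N.k

    def time_isprime(self):
        isprime(self.mersenne)

    def time_factorint(self):
        factorint(self.composite)

    def time_interval_intersection(self):
        self.a.intersect(self.b)

    def time_vector_cross(self):
        self.v.cross(self.w)
```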

July 27 - August 10 (Week 11,12,13)

Discuss how to set up a stable hardware machine that runs the ASV benchmarks on each commit, and write any scripts required to run the benchmarks on every commit in a stable manner and raise warnings if a decline in speed is detected. If this leaves free time, benchmarks for the unify and stats modules could be written.
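ASV itself offers an `asv continuous` command that compares two commits and reports slowdowns beyond a given factor, so a custom script may not be needed. Purely to illustrate the idea of raising a warning on a decline in speed, here is a small sketch; the function name, the dictionary format and the 10% threshold are all assumptions and not part of ASV.

```python
def find_regressions(baseline, current, factor=1.1):
    """Return {benchmark name: slowdown ratio} for benchmarks that got slower.

    `baseline` and `current` map benchmark names to timings in seconds.
    A benchmark is flagged when current > baseline * factor.
    """
    regressions = {}
    for name, old_time in baseline.items():
        new_time = current.get(name)
        # Skip benchmarks that are missing from the new run or have no usable baseline.
        if new_time is None or old_time <= 0:
            continue
        ratio = new_time / old_time
        if ratio > factor:
            regressions[name] = ratio
    return regressions


if __name__ == "__main__":
    old = {"time_expand": 1.0, "time_limit": 2.0}
    new = {"time_expand": 1.05, "time_limit": 3.0}
    for name, ratio in find_regressions(old, new).items():
        print(f"WARNING: {name} is {ratio:.2f}x slower than the baseline")
```

A threshold above 1.0 leaves room for measurement noise, which is also why a stable, dedicated machine matters for this step.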

If the schedule goes as planned, these modules should be covered (to a great extent) by the end of the 3 months:

  • Geometry
  • Series
  • Functions
  • Parsing
  • Vector
  • Sets
  • Ntheory
  • Unify
  • Stats

The specific modules are up for discussion with the mentor and other members of the SymPy community. Naturally, there could be more widely used or more important modules that members of the community would rather see benchmarked than the ones I mentioned.

My Contributions to SymPy

Pull Requests

The following is a list of pull requests I created in SymPy, with more to be added before the end of the application period.

#18954 Testing: Adding Tests for Geometry module. (Open)

#18908 Printing: Added Airy functions to the SciPyPrinter class. (Merged)

#18883 Benchmarking: Added benchmarks for the Polygon class in the geometry module. (Closed)

#66 Benchmarking: Added benchmarks for the Polygon class in the geometry module using the ASV package. (Merged)