GSoC 2024: Adding the NumPy Module to pocketpy by Anurag Bhat
This report summarizes my work during GSoC 2024, on the project Adding the NumPy Module with core feature set to pocketpy.
About Me
I am Anurag Bhat, a fourth-year undergraduate student majoring in Computer Science at the Indian Institute of Technology (IIT) Jodhpur. I'm particularly interested in mathematics, algorithms and fast computation. I usually work on open source PyData repositories and would love to contribute and collaborate on interesting projects.
Overview
pocketpy is a lightweight Python interpreter primarily designed for game scripting, with a feature set that covers many Python modules. Its competitive performance and elegant syntax have led users to choose it for scientific computations. To enhance its capabilities, this project focuses on adding the core feature set from the NumPy library to pocketpy.
For an in-depth understanding of the idea, you can go through my submitted proposal. The project focused on adding the core routines of NumPy, namely the array and random modules, to pocketpy.
Phase by Phase Synopsis
1. Community Bonding
- Go through the pocketpy codebase and documentation.
- Understand the relevant libraries required for my project:
a. xtensor
b. pybind11
2. Phase 1
- Prepare a C++ NumPy-equivalent API by wrapping xtensor classes.
- Figure out how the C++ template implementations will be exposed to the Python side, which is essentially a dispatch mechanism.
3. Phase 2
- Write bindings for a common target, referred to as the ndarray_base class.
- Implement detailed classes for each numpy dtype, deriving from the common base.
- Write robust and isolated tests for each implementation.
4. Phase 3
- Make custom implementations to optimize disk space and improve xtensor results.
- Resolve bugs which come up during testing and prepare final pull requests for pocketpy v1.x / pocketpy v2.
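The dispatch idea from Phases 1 and 2 (one common base, one concrete class per dtype, and a lookup that picks the class at array creation time) can be illustrated in plain Python. This is only a minimal sketch under stated assumptions: apart from ndarray_base, every name here (ndarray_int64, infer_dtype, _DTYPE_CLASSES and so on) is hypothetical, the arrays are one-dimensional lists, and the real implementation lives in C++ templates on top of xtensor rather than in Python.

```python
# Illustrative sketch only. Apart from ndarray_base, all names are
# hypothetical and do not mirror the actual pocketpy bindings.


class ndarray_base:
    """Common interface shared by every dtype-specific array class."""

    dtype = "base"  # overridden by each subclass

    def __init__(self, values):
        # Each subclass decides how a raw value is converted to its dtype.
        self._data = [self._cast(v) for v in values]

    def _cast(self, value):
        raise NotImplementedError("dtype-specific subclasses define the cast")

    @property
    def size(self):
        return len(self._data)

    def tolist(self):
        return list(self._data)

    def sum(self):
        return sum(self._data)

    def astype(self, dtype):
        # Re-dispatch to the class registered for the requested dtype.
        return array(self._data, dtype=dtype)


class ndarray_bool(ndarray_base):
    dtype = "bool"

    def _cast(self, value):
        return bool(value)


class ndarray_int64(ndarray_base):
    dtype = "int64"

    def _cast(self, value):
        return int(value)


class ndarray_float64(ndarray_base):
    dtype = "float64"

    def _cast(self, value):
        return float(value)


# Dispatch table: dtype name -> concrete class.
_DTYPE_CLASSES = {
    cls.dtype: cls for cls in (ndarray_bool, ndarray_int64, ndarray_float64)
}


def infer_dtype(values):
    """Pick the narrowest dtype in this sketch that can hold all values."""
    if not values:
        return "float64"
    if all(isinstance(v, bool) for v in values):
        return "bool"
    if all(isinstance(v, int) for v in values):
        return "int64"
    return "float64"


def array(values, dtype=None):
    """Create an array by dispatching on an explicit or inferred dtype."""
    values = list(values)
    if dtype is None:
        dtype = infer_dtype(values)
    try:
        cls = _DTYPE_CLASSES[dtype]
    except KeyError:
        raise TypeError(f"unsupported dtype: {dtype!r}") from None
    return cls(values)
```

In this sketch, array([1, 2.5]).dtype evaluates to "float64", and array([1, 2, 3]).astype("float64").tolist() gives [1.0, 2.0, 3.0]. Adding a new dtype only requires a new subclass and a new entry in the dispatch table.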
Functionality
Name | List | Supported |
---|---|---|
Dtype | bool, int8/int16/int32/int64, float32/float64, double | ✅ |
Classes | ndarray, random | ✅ |
Properties | dtype, ndim, size, shape | ✅ |
Binary Op. | add, sub, mul, truediv, pow, matmul, getitem, setitem, len, eq, ne | ✅ |
Bitwise Op. | and, or, xor, invert | ✅ |
Representation | repr, str | ✅ |
Boolean Op. | all, any | ✅ |
Aggregation | sum, prod, min, max, mean, std, var | ✅ |
Search / Sort | argmin, argmax, sort, argsort | ✅ |
Shape Manipulation | reshape, resize, repeat, squeeze, transpose, flatten | ✅ |
Miscellaneous | astype, copy, tolist | ✅ |
Array Creation | array, ones, zeros, identity, full, arange, linspace | ✅ |
Trigonometry | sin, cos, tan, arcsin, arccos, arctan | ✅ |
Exponent | exp, log, log2, log10 | ✅ |
Rounding | ceil, floor, round | ✅ |
Miscellaneous | abs, concatenate, allclose | ✅ |
Random No. Gen. | rand, randn, randint, uniform | ✅ |
Contributions
- Issues - These are the issues I faced throughout my coding period. You can have a look at them here.
- Pull Requests - These are the PRs I made throughout my coding period. You can have a look at them here.
You can have a look at the final code submission to pocketpy main here.
Future Work
1. Write bindings for dtypes which are already supported in our C++ API. These include the following:
- Unsigned datatypes: uint8, uint16, uint32, uint64.
- Complex datatypes: complex64, complex128.
These dtypes are present in the numpy.hpp include file and work with most of the implemented functionality. Although I ran out of time before I could expose them to Python, I believe this would be a relatively easy task. I plan to attempt this soon.
2. Investigate slower builds on Windows MSVC machines. Some potential reasons are:
- Slow xtensor implementations or inclusion of many interlinked external headers.
- Our dispatch mechanism logic can be slow. We can try dispatching on the Python side directly and check whether that reduces the build time.
- The custom pybind implementation in pocketpy is slow. Since the numpy code was first implemented for the original pybind11 and then modified to work with our pybind, this can be tested and ruled out with relative ease.
The most accurate way to figure this out is to use build profilers. Throughout GSoC I did my development on macOS, so I never ran into slower builds, but I do plan to get my hands on a Windows MSVC machine and try this out. If there are any bottlenecks, I would like to report them and get them fixed.
3. Add pressure tests to check the robustness of numpy and find possible memory leaks in pybind11.
Conclusion
I am quite satisfied with the current state of the numpy project. My overall target of adding an end-to-end numerical module to pocketpy by the end of this summer seems closer than ever. The next phase of development should target improving build speed, low-level optimizations and heavy testing.
My tenure lived up to my expectations. I had an amazing experience this past summer working on pocketpy. Coming from a Python-heavy skillset, I got to learn C++ from the ground up. Learning the nitty-gritty of CMake and building executables was definitely the most fun and challenging part.
I am highly grateful to my mentor, blueloveTH, for being flexible and allowing me to deviate from the proposed project plan whenever needed. I would also like to thank other contributors like ykiko for being readily available and for always teaching me rather than simply handing me the answers to my questions.
I plan to keep contributing to pocketpy and to finish the work listed under Future Work. I would also like to pitch in as a co-mentor for GSoC'25.