GSoC 2019 Report Nikhil Maan: Creating a C and Fortran parser for SymPy - sympy/sympy GitHub Wiki

About Me

My name is Nikhil Maan and I'm an undergraduate at Amity University, majoring in Computer Science and Engineering. My project was to create a parser for SymPy that converts C and Fortran source code to SymPy syntax.

About the Project

The project aimed to create a parser that can convert C and Fortran source code to SymPy syntax. The Application Program Interface (API) for the parsers was to be integrated into the SymPy API.

Project Link: Creating a C and Fortran Parser for SymPy

The Plan

The plan for the project was to use Clang to extract the Abstract Syntax Tree (AST) for any provided C source code. I would then create a parser to convert that AST into SymPy expressions.

Similarly, LFortran would be used to generate and extract the Abstract Semantic Representation (ASR) from Fortran source code, and the parser would convert the ASR to SymPy expressions.

The parsers were to be implemented in SymPy and work under SymPy's API.

More details about the project and the initial plan can be found in the proposal: C and Fortran Parser Proposal

Work Done

The parsers have been created and implemented in SymPy as per the plan. They can be found under SymPy's parsing module. The C parser is in the c submodule and the Fortran parser is in the fortran submodule. sym_expr contains SymPyExpression, which acts as the superclass for the parsers and handles the front-end and API for them.

You can check the daily progress of the project on Check-Ins

The C Parser

The C parser uses Clang's Python bindings to parse C source code and generate an AST for it. The parser consists of visitor functions, which visit every node of the AST and process it. The node visitors extract the important information from each Clang AST node and create a corresponding Codegen AST node with that information. The newly created nodes are compiled into a list of SymPy expressions, arranged in the order in which they appear in the source.
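The visitor approach described above can be sketched with a toy tree. This is only an illustration of the dispatch pattern: the Node class, the node kinds and the tuple output are hypothetical stand-ins, not Clang's cursor API or SymPy's Codegen AST classes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Toy stand-in for an input AST node (hypothetical, not a Clang cursor)."""
    kind: str
    spelling: str = ""
    type_name: str = ""
    children: List["Node"] = field(default_factory=list)


class ToyVisitor:
    """Visits every node and collects one output record per top-level statement."""

    def __init__(self):
        self.result = []

    def visit(self, node):
        # Dispatch on the node kind: "var_decl" is handled by visit_var_decl, etc.
        handler = getattr(self, "visit_" + node.kind, None)
        if handler is None:
            raise NotImplementedError("unsupported node kind: " + node.kind)
        return handler(node)

    def visit_translation_unit(self, node):
        # Children are visited in order, so the output keeps the source order.
        for child in node.children:
            self.result.append(self.visit(child))
        return self.result

    def visit_var_decl(self, node):
        # In this toy tree the initializer, if present, is the first child.
        init = self.visit(node.children[0]) if node.children else None
        return ("Declaration", node.spelling, node.type_name, init)

    def visit_integer_literal(self, node):
        return int(node.spelling)


# Toy tree corresponding to: int a = 2; float b;
tree = Node("translation_unit", children=[
    Node("var_decl", spelling="a", type_name="int",
         children=[Node("integer_literal", spelling="2")]),
    Node("var_decl", spelling="b", type_name="float"),
])

exprs = ToyVisitor().visit(tree)
print(exprs)
```

Looking up the handler by node kind keeps each visitor small, and an unsupported construct fails loudly instead of being silently dropped.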

The C parser currently supports the following features for conversion:

  • Variable Declarations (integers and reals)
  • Assignment (using integer & floating literals and function calls)
  • Function Definitions and Declarations
  • Function Calls
  • Compound statements & Return statements

The Fortran Parser

The Fortran parser works similarly. LFortran is used to generate an ASR for the provided Fortran source code. The ASR is then traversed by the parser's visitor functions, which create new Codegen AST nodes based on the information retrieved from each ASR node. The new expression nodes are then compiled into a list.

The Fortran Parser currently supports the following features:

  • Variable Declarations (integers and reals)
  • Function Definitions
  • Assignments and Basic Binary Operations

SymPyExpression

SymPyExpression stores and manages the SymPy expressions generated by the parsers. It also acts as the front-end for the parsers: its API is used to call the parsers on the given source code, and it stores the resulting list of SymPy expressions. Either the method convert_to_expr or the initializer can be used to generate expressions.

Users can retrieve the expressions using the method return_expr and work with them. They can also generate source code in other languages from the expressions using the following methods, which use SymPy's code printers.

  • convert_to_c can be used to convert to C source code
  • convert_to_fortran can be used to convert to Fortran source code
  • convert_to_python can be used to convert to Python source code

Here is an example of how to use the parsers:

>>> from sympy.parsing.sym_expr import SymPyExpression
>>> expr = SymPyExpression('int a = 2;', 'c')
>>> expr.return_expr()
[Declaration(Variable(Symbol('a'), type=IntBaseType(String('integer')), value=Integer(2)))]
>>> expr.convert_to_python()
['a = 2']
>>> expr.convert_to_expr('real :: a', 'f')
>>> expr.convert_to_c()
['double a = 0.0']

External Dependencies

Since the parsers use Clang and LFortran to extract information from the given source code, they are dependent on modules which are imported as external dependencies. The dependencies are as follows:

  • C Parser: Clang
  • Fortran Parser: LFortran
  • SymPyExpression: LFortran and Clang

Each parser works normally if its dependency is installed and raises ImportError if the required dependency is not installed.
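The behaviour described above follows a common pattern for optional dependencies. The sketch below shows one way to write it with the standard library only. The helper name require_module and the module names used in the demonstration are hypothetical, and this is not necessarily how SymPy implements the check.

```python
import importlib


def require_module(module_name, feature):
    """Import and return module_name, or raise ImportError naming the feature."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            feature + " requires the '" + module_name
            + "' module, which is not installed"
        ) from exc


# A module that is present is imported and returned as usual.
json_module = require_module("json", "demo feature")

# A missing module produces an ImportError that names the feature needing it.
try:
    require_module("surely_not_an_installed_module_xyz", "hypothetical parser")
except ImportError as err:
    message = str(err)
    print(message)
```

Raising at the point where the parser is requested means users who never call a given parser do not need its dependency installed.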

Documentation and Testing

All the modules have been documented according to the needs of expected users of the respective module. I have documented sym_expr and SymPyExpression according to the needs of end-users including multiple examples of how to use it. The c and fortran modules containing the parser implementations have been documented according to the needs of developers that use them. The docstrings for these modules explain how the parsers and the visitor functions work, and what each visitor does.

I have written tests for all the modules. When the dependencies are installed, the parsers are tested for multiple strings of source code by expression comparison to test as much of the visitors as possible. When the dependencies are not installed, the raise statements are tested. The code coverage for the parsers is as follows:

  • sym_expr.py 88%
  • c_parser.py 80%
  • fortran_parser.py 90%
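The two testing modes described above (comparing parser output when the dependency is present, and checking the raise statement when it is absent) can be sketched with unittest. The dependency name, the toy parse function and the test class are all hypothetical; SymPy's actual test suite is organised differently.

```python
import importlib.util
import unittest

# Hypothetical dependency name, used only for this illustration.
DEPENDENCY = "some_parser_backend_xyz"
HAVE_DEP = importlib.util.find_spec(DEPENDENCY) is not None


def parse(source):
    """Toy parser entry point that needs the optional dependency."""
    if not HAVE_DEP:
        raise ImportError(DEPENDENCY + " is required to parse source code")
    return [source.strip()]


class ParserTests(unittest.TestCase):
    @unittest.skipUnless(HAVE_DEP, "dependency not installed")
    def test_parse_when_installed(self):
        # With the dependency available, compare the parser output directly.
        self.assertEqual(parse(" int a; "), ["int a;"])

    @unittest.skipIf(HAVE_DEP, "dependency installed")
    def test_raises_when_missing(self):
        # Without the dependency, only the raise statement can be exercised.
        with self.assertRaises(ImportError):
            parse("int a;")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ParserTests)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

Exactly one of the two tests runs in any given environment, so the suite passes both with and without the optional dependency.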

Pull Requests

Both the parsers have been merged into SymPy and are usable in the current development version. The Pull requests with the necessary changes are:

The Meetings:

Blogs:

Future Work

The first stage of future work will be to test the parsers extensively and to let the community use and test them. The parsers can then be improved by implementing the community's feedback.

The parsers can also be extended to include some nodes that could not be implemented this summer. The first on the list are numbers for the Fortran parser, which LFortran did not yet support, and strings for the C parser.

The next step after improving the parsers and making them more useful can be to add support for another language like Julia. The parser can further be extended to support Natural Language Processing in the future.