GSoC 2013 Application Shravas Rao: Coding Theory - sympy/sympy GitHub Wiki

Hi, I'm Shravas Rao, I'm a senior at MIT studying mathematics and computer science, and will be starting the PhD program in computer science at NYU in the fall.

Contact Information

E-mail: [email protected]

Github username: shravas

Irc handle: shravas

Programming and Mathematical Experience

I have significant experience programming in C, Java, and Python. I first learned C in middle school, using it to compete in various programming competitions. I first learned Java in high school, as it was the language used in the AP Computer Science curriculum. I later used Java in a software engineering class I took in college. Finally, I learned Python my freshman year in school, using it for a bioinformatics research project. I later used Python in my school's introductory computer science course.

My favorite feature of python are list comprehensions. For instance, given two lists of integers, x and y, a list containing all sums of a number from x and a number from y can be obtained by the following.

[a+b for a in x for b in y]

In cases like these, list comprehensions make the code look more elegant, than if nested for loops were used. In sympy itself, I especially enjoy the Permutations module, and in particular, I think it is really cool how permutations can be made to act on any list.

I usually use my laptop which runs on Window 7, but I will often use the computers in my school's computer labs, which are Ubuntu. My preferred editor is Notepad++, though sometimes I will used IDLE if I only need to make a few small changes. I have used the version control system svn in various cases, including my freshman year research project, my software engineering class, and while editing lecture notes as a part of an assignment for another class. I am fairly new to git, but have been getting used to it while I also find my way around sympy.

I also have a fair bit of background in mathematics. Courses that I have taken that are also relevant to this project include Linear Algebra, Algebra I, Algebra II, and Algebraic Combinatorics. I am also currently taking a course on Coding Theory, which is what sparked my interest in the subject and this project.

Finally, if selected, I plan on spending the vast majority of my summer working on this project. In particular, because I start school in the end of August, I will try to work extra (maybe around 50-60 hours a week) before then to make up for the lost time I will have after school starts again. However, I do plan on sticking with SymPy once this summer is over to help keep up this project, and also contribute in other ways.

Contributions to SymPy:

The following are my contributions to SymPy thusfar

  • Pull Request 2059: Implement prev_lexicographic and next_lexicographic

Project:

The overarching goal of my project is to add coding theory functionality to sympy, which does not exist now. Coding theory is the study of Error Correcting Codes, which can be thought of as a function f from F^k to F^n, for some field F, so that no two elements in the image are very "similar" to each other. More specifically, two vectors from F^n are similar if they agree on many of their coordinates. Such a code can give a way to send messages that are vectors in F^k through a noisy channel, so that it's not ambiguous what the message is. Coding theory has many applications and is widely used in things like music CDs and telephone transmissions. Because of this, and interest from SymPy in implementing coding theory, I hope that this project will be useful.

For this project, I would like to focus on linear codes. Linear codes are functions that are linear transformations. This means that if f(a) and f(b) are codewords, then so is f(a+b), and if k is an element of F, then kf(a) and kf(b) are also codewords. Linear codes are often easier to work with, as we can use ideas from linear algebra to work with them. On the other hand, linear codes can be very powerful, and are used in practice. There should be enough work relating to just linear codes to make a reasonable project for the summer.

The original inspiration for this project comes from the course I'm currently taking on Coding Theory. In addition to its practical uses, I've found that many of the constructions and algorithms are very nice for their own sake, and fun to work with. Although this was not a project listed on the ideas page, there seems to be some interest for this type of project after discussing it on the mailing list.

Implementation Details:

The work involved with my project will be divided up into three different parts, including implementing finite fields, implementing general linear codes, and implementing more specific linear codes.

  • Finite Fields: As mentioned previously, linear codes are use finite fields. Currently, SymPy only supports cyclic finite fields, so it will be helpful to implement arbitrary finite fields. Although these would be developed to use for this project, I hope that implementing arbitrary finite fields will be useful in many other contexts. Finally, once finite fields are implemented as domains, matrix algorithms and operations on matrices whose entries are from finite fields will become available to be used. These are also needed for this project. In case matrices do not support domains by the time the summer starts, there are a few workarounds suggested to be able to proceed on working on this project (one of these was to make these classes a subclass of Basic).

In particular, I plan to implement non-cyclic finite fields by creating a class analogous to polys/domains/modularinteger.py, but for a quotient ring of the polynomial ring over a cyclic finite field instead. Using this class, I plan to modify polys/domains/finitefield.py to incorporate non-cyclic finite fields. Because non-cyclic finite fields can be defined as F[x]/f for any irreducible polynomial, it will be allowed for the f to be provided. If only the size of the finite field is provided, then such an f can be chosen arbitrarily. Because non-cyclic finite fields are constructed slightly differently from cyclic finite fields, it might even be useful to split these up into two different classes. Finally, NZMATH [2] provides an implementation of both cyclic and noncyclic finite fields in Python, so it will be useful to consult that when working on this part of the project.

  • Linear Codes: There will be a class for linear codes, which will serve as the base for my project. This class will support any linear code, and will include the following.

    • Constructor: The constructor will take in the k by n generator matrix M representing the linear transformation from F^k to F^n that defines the code. If given, the constructor will also take in the minimum distance between any two codewords, and otherwise will calculate it on its own.
    • Encoding: Given a vector v in F^k, will return the encoding of v according to M, which is just vM
    • Checking codewords: Givien a vector v in F^n, will decide if v is a codeword or not
    • Decoding: Given a vector v in F^n, will return the codeword v' with the closest distance to v

All of the above should be straight forward to implement, besides trying to decode a vector from F^n. For general linear codes, there does not seem to be a good way to do this. This might just be implemented using a brute force algorithm, just so that some decoding function exists, rather than none at all. In addition to the above, I plan to add at least the following.

- Outputting basic properties of a code, including the length, dimension, minimum distance between two codewords, and the finite field used
- Outputting more involved properties of a code, including the parity check matrix, 
- Operations on codes, including concatention and direct sums
- Auxillary methods that might have uses outside of this project, including a function that returns the hamming weight of a vector
  • Specific Linear Codes: There are a few specific linear codes, such as the Hadamard, Hamming, Reed Muller, Reed Solomon, and more, that are often used. In these cases, there are often more efficient algorithms for the tasks described above, especially in the case of decoding. Additionally, it should be possible to have constructors for these codes that do not require the generator matrix. For instance, the Hadamard code can just be defined by its length, or by its dimension, and there will be constructors that will take this into account.

Interface:

The following should give an idea as to how I expect this will work. Matrices do not yet support domains in SymPy, but it should be assumed that all of the examples listed below are working over F_2.

#Create a basic repetition code of length 6 and dimension 2 directly from the matrix
>>> C = LinearCode(Matrix([ [1,1,1,0,0,0],[0,0,0,1,1,1] ]))
LinearCode([1,1,1,0,0,0]
[           ]
[0,0,0,1,1,1])

#Using C, we can encode messages
>>> C.encode(Matrix([ [0, 1] ]))
[0, 0, 0, 1, 1, 1]

#Check if a given word is a codeword
>>> C.isCodeWord(Matrix([ [1,1,1,0,0,0] ]))
True

>>> C.isCodeWord(Matrix([ [1,1,1,1,1,0] ]))
False

#We can also decode messages
>>> C.decode(Matrix([ [1,1,1,0,0,0] ]))
[1, 0]

>>> C.decode(Matrix([ [1,1,1,1,1,0] ]))
[1, 1]

#Some basic properties of the code
>>> C.length
6

>>> C.dimension
2

>>> C.distance
3

#Gives the parity check matrix
>>> C.parityCheckMatrix
[1, 0]
[    ]
[1, 0]
[    ]
[1, 0]
[    ]
[0, 1]
[    ]
[0, 1]
[    ]
[0, 1]

#Creates a Hadamard code by just giving the length
>>> D = HadamardCode(4)
HadamardCode(4)

#We can also encode, check if something is a codeword, and decode for this code
>>> D.encode(Matrix([ [1, 1] ]))
[0, 1, 0, 1]

>>> D.isCodeWord(Matrix([ [1, 0, 0, 0] ]))
False

>>> D.decode(Matrix([ [1, 0, 0, 0] ]))
[0, 0]

#Create a code that is the direct sum of C and D
>>> E = C.directSum(D)

#This code can also encode, check if something is a codeword, and decode
>>> E.encode(Matrix([ [0, 1, 1, 1] ]))
[0, 0, 0, 1, 1, 1, 0, 1, 0, 1]

#Gives the Hamming weight of a vector 
>>> hammingWeight(Matrix([ [1, 0, 0, 1] ]))
2

Timeline

May 27th - June 17th: This is the community bonding period, and as suggested by Google, I'll take this time to learn more about how SymPy goes about things, but also to learn more about the code, and to understand where my project will fit in. In addition, I will use a lot of this time to finalize my design for this project.

June 17th - June 4th: Implement and test non-cyclic finite fields. My first pull request would be around this time.

June 4th - July 11th: Being working on the main linear codes class, and focus on coding the first four functions described above.

July 11th - July 22th: Finish working on the main linear codes class, and start adding some of the other functions mentioned. My second pull request would be around this tim.

July 22th - August 1st: Implement a subclass for Hadamard codes, andS a subclass for Hamming codes

August 1st - August 15th: Implement subclasses for Reed Muller and Reed Solomon codes. My third pull request would be around this time.

August 15th - August 22nd: This mostly exists as a buffer in case I get behind schedule, but if time exists, look over the functionality of the preceding classes, and add anything useful that is not already added.

August 22nd - September 23rd: Use this time to clean up code and work on finishing up testing and documentation leftover from the summer. I start school around here, so I will not be able to devote as much time, which is why this period is much longer than recommended.

References

[1] - Essential Coding Theory Notes http://people.csail.mit.edu/madhu/ST13/

[2] - NZMATH - http://tnt.math.se.tmu.ac.jp/nzmath/