GSoC 2012 Application Aleksandar Makelov: Group theory - sympy/sympy GitHub Wiki

About me

Hi, my name is Aleksandar Makelov and I'm currently a freshman at Harvard University. I haven't declared my concentration (major) yet, but I'm certain my main focus will be on mathematics (my primary interest), with strong focus of computer science and physics (my other scientific interests).

I'm from Bulgaria, and my background in the sciences includes many national and international competitions in mathematics and physics (most notably, the International Mathematical Olympiad and the International Physics Olympiad) - and of course the theoretical preparation behind them. Also, from time to time during the last five years I taught myself various concepts from computer science (recursive/iterative algorithms, complexity analysis, data structures,...).

Please don't hesitate to contact me by email ([email protected], [email protected]), GitHub (amakelov), or Skype (petko_voivoda).

My coding skills

I've done coding in several high-level languages (C, C++, C#, Python, Scheme); What I like about Python, as a mathematician, is the simplicity and readability of the language which allow for what I find to be very elegant programs. In the fall of the academic year I took a formal introductory course in computer science taught in C which was in part review and in part new to me (especially the part about web programming). The course culminated in an open-ended project, and I chose to write a PHP-based website aimed at extracting information solely from the number of Google/Yahoo/Bing search results for a given query by cross-comparing it with numbers for related queries. I found the idea extremely interesting and immensely enjoyed implementing it!

For over half a year now I've been coding on a Linux workstation (in a virtual machine running under Windows). I've been doing work mainly in gedit recently; several years ago when I was learning Python I was working under Windows using IDLE; I've also done coding in Dr. Racket and Microsoft Visual Studio.

I cannot say I'm considerably fluent in Python, and I see my level somewhere around intermediate. I'm counting on my background in computer science and mathematics, and my experience in other programming languages, and I'm eager to learn how to write programs that are really "pythonic".

I'd heard of git before, but I started using it because of my involvement in the sympy project. I'm still learning the little tricks, but now that I've submitted my first patch I feel more confident about using it.

My project

My intention is to create a module for computation with (exclusively finite and finitely presented) groups. My proposal was motivated by the following discussion on the mailing list:

http://groups.google.com/group/sympy/browse_thread/thread/999f0ad35dd85994?tvc=2

This idea was not originally on the list, but there is some work already done in this direction in sympy (the combinatorics module). I chose the subject because I enjoy abstract algebra and combinatorics and I'm interested in learning more about these subjects and the efficient ways they can be handled by a computer. Also, this academic year I convinced myself that computations, even in rather "small" groups, can become impossible for a human to carry out manually. That's why I'd be extremely happy to develop a software for this purpose that I completely understand and I can change as I find appropriate. The foremost open-source project for computations with groups, GAP, is written in C so I think that a group-theoretical module would be a valuable addition to sympy (and to all the python code in the world :) ).

I feel I have the mathematical background for the project - this year I'm taking rigorous high-level treatments of Abstract algebra covering Galois theory and Representation theory (see http://www.registrar.fas.harvard.edu/Courses/Mathematics.html : Mathematics 55 and Mathematics 123), and I acquired a solid understanding of many combinatorial concepts through my participation in math Olympiads. Thus I think what makes me a suited candidate is my strong mathematical and not as strong but still present CS background and my motivation for the subject at hand.

Implementation details

The group-theoretical framework will be developed by itself (i.e., without relation to other more general abstract-algebraic objects like semigroups or even magmas as is done in GAP), because groups are fairly general and appear in practice more frequently; if there is need to generalize the notion at a later stage this won't be too hard to do.

The group class

Constructor

There will be a Group class - indeed, the central class of the module - with a constructor accepting different presentations of the group in the sort of obvious way:

  • if given a list of permutations on the natural numbers, the group object returned will be the group generated by them. It'll be nice to have some basic groups (Sn, An, Cn, Dn, ...) hardcoded as a list of generators so that they can be called by name. This presentation is very convenient because it makes problems that may be generally hard (or even undecidable) like membership testing relatively easy. Even though any finite group can be realized as a permutation group, we'll refer to the groups presented using permutations as permutation groups.
  • if given a list of matrices, the group object returned will be the group generated by them (by multiplication). There are many reservations here. Firstly, if the field (if we use matrix multiplication as the operation, a ring won't work in most cases) in which the matrix entries live is infinite the group may be infinite as well. I was advised on the mailing list that sticking with finite groups only would be a better idea - what would you suggest? Secondly, if the field is finite, the group is guaranteed to be finite as well -- but finite fields are not implemented in sympy. Would it be hard to do along the way? There isn't much theory involved, and I'm familiar with it; but the "Handbook on computational group theory" says that computation in finite fields F_q can be made far more efficient by storing certain information in lookup tables for q up to 10^6. Any suggestions here? The usual approach for small matrix groups will be according to [1], either reducing them to permutation groups via a faithful permutation representation, or by extending the notion of a base and a strong generating set that makes the permutation presentation so convenient in the first place.
  • if given a list of generators and relations between them, e.g. [(a,b),(a^5,b^2,baba^-1)] meaning generators a,b such that a^5,b^2 and baba^-1 are the identity element, the group object returned will be the corresponding group. This enters the realm of finitely presented groups which are not necessarily finite; so I'll probably stick with the simpler case when one is sure the group given is finite, and if I have time I might add some functionality for infinite finitely presented groups as well. A powerful technique for analyzing these objects is the Todd-Coxeter algorithm which can, among other things, be used to describe the permutation representation of a group G over the set of cosets with respect to a subgroup H

Properties

Important information about the group will be stored in it once computed, to avoid computing it again. This includes:

  • basic boolean properties: Abelian, (finite), symmetric, alternating, dihedral, cyclic, polycyclic; also, the type of presentation - permutation, generators&relations, matrix
  • size of the group
  • a generating set for the group, in particular one that has nice properties, such as the one produced from the Schreier-Sims algorithm (as a list) - when the group is presented by permutations
  • a base for the group, corresponding to the strong generating set from the previous point
  • references to group objects representing subgroups already computed.
  • ...

Storing the group as the set of all its elements will be avoided at all costs, as it is almost never necessary for successful computing and it is highly inefficient for large groups (and impossible for sufficiently large groups). For permutation groups, the strong generating set from the Schreier-Sims algorithm has the advantage that group elements can be expressed as reasonably short words in this set [1].

Methods

These will be exclusively presentation dependent, since different things can be done with different presentations, and different algorithms are needed. Basics will include:

  • Calculation of orbits and stabilizers (permutation groups)
  • Order of a group (permutation groups / finite finitely presented groups)
  • Order of an element (permutation groups / finite finitely presented groups - by the number of cosets of the cyclic group generated by the element)
  • Outputting all elements of a group as a list (which makes sense only for fairly small groups)
  • Membership testing for permutation groups
  • Testing if a permutation group is "large" (S_n or A_n)
  • Testing for subgroups (easy by membership testing for permutation groups)
  • Testing for normal subgroups (using normal closure)
  • Finding the normal closure of a subgroup (done in [1], optimized for groups of large index by adjoining random conjugates; membership testing necessary)
  • Commutator subgroups & derived series
  • Cosets and transversals for subgroups (done in [1])
  • Quotients: for permutation groups this can be done by considering the quotient as a permutation group acting on the set of cosets; for finitely presented groups, this is more complicated (but done in [1])
  • Algorithms specific for Abelian groups: decomposition according to the structure theorem, optimized algorithms for Abelian groups. Presently the problem with this is that my references don't have a lot to say about Abelian groups specifically, so I might need other sources (or, in the worst case, come up with some naive algorithm (or, in the best case, come up with some efficient algorithm)). Though in [1] there is a broad section for algorithms optimized for polycyclic groups, of which Abelian groups form a subclass.

External methods

Some operations treat several groups in a symmetric fashion and thus it doesn't make much sense for them to be associated with one group object.

  • Direct product of several groups: this can be implemented for permutation groups, but the references don't address this operation explicitly. I came up with one possible way of doing it, but it may not be the most efficient one. Further inquiry is needed here.

Other data structures

Apart from that, the concept of a Schreier vector will be helpful. It encodes information about the orbit of an element in {1,2,...,n} under the action of a permutation group on that set, and in logical terms is nothing more than an indexed list.

Another important object is the so-called permutation word. When we have a permutation group on a large set generated by a small number of permutations, it is more convenient to store group elements as "words" in the generators rather than complete permutations, evaluating them only when we need to, which motivates this simple data structure. Multiplication of words is done by concatenating the corresponding "letter". Words come up in optimized versions of the Schreier-Sims algorithm [1] and are used to provide a "normal" expression in the strong generating set for each group element.

Yet another helpful structure will be the coset table used during the coset enumeration in the Todd-Coxeter algorithm; it holds information about the cosets found so far.

All these can be effectively implemented using python lists.

Representation theory and character tables

On a separate note, I'd like to implement character tables and have at least 'manual' functionality for computing with them. By manual I mean being able to manually (or by reading from a file, if the table is large) input sizes of conjugacy classes and characters of representations and do various things with them (tensoring, direct-summing and decomposing representations into irreducible ones, for starters). I have already implemented this to some extent. Of course, having the group structure somehow embedded in the table will be great, as it will make it possible to automate, for example, the computation of the characters of symmetric powers of a representation (or indeed anything involving the various dependencies between conjugacy classes).

Interface

I'd like to have an interface that is as simple and concise as possible. For example,

# Define the alternating group A4 using a generating set of permutations 
>>> A4 = Group((1,2,3),(2,3,4)) 
Group((1,2,3),(2,3,4))

# Or define the same (well-known) group by name 
>>> A4 = Group(A,4)
Group(A,4)

# Get the order of the group 
>>> A4.order
12

# Determine if the group is Abelian
>>> A4.isAbelian 
False

# Membership testing
>>> A4.member((1,2,3,4)) # not an even permutation
False

# Full list of elements of a group
# GAP output currently; hopefully sympy output will look as simple as that
>>> A4.list
[ (), (2,3,4), (2,4,3), (1,2)(3,4), (1,2,3), (1,2,4), (1,3,2), (1,3,4), 
(1,3)(2,4), (1,4,2), (1,4,3), (1,4)(2,3) ] 

# Define another group, the symmetric group S4
>>> S4 = Group((1,2,3,4),(1,2))
Group((1,2,3,4),(1,2))

# Subgroup & Normality testing
>>> S4.subgroup(A4)
True

>>> S4.normalSubgroup(A4)
True


# And now for something completely different: finitely presented groups.
# This creates the Dihedral group D6 of order 6 (pretend we don't know it's D6)
>>> Spam = Group(a**3,b**2,b*a*b*~a)
# This creates the Subgroup of rotations
>>> Eggs = Group(a**3)
# The index of Eggs in Spam (Todd-Coxeter):
>>> Spam.index(Eggs)
2


# Character tables
# Create a character table for the group S4 with conjugacy classes ordered as
# (), (12), (123), (1234), (12)(34) with respective sizes 1, 6, 8, 6, 3
>>> TS4 = CharacterTable([1,6,8,6,3])

# Add some representations
>>> u = Representation([1,1,1,1,1], 'Trivial')
>>> uprime = Representation([1,-1,1,-1,1], 'Alternating')
>>> v = Representation([3,1,0,-1,-1], 'Standard')

# Append them to the table
>>> TS4.append(u,uprime,v)

# Tensor representations, test for irreducibility & append to table
>>> vprime = v*uprime
>>> vprime.name = 'Another'
>>> v.isIrreducible
True
>>> TS4.append(v)

# Decompose according to table so far (it is still missing one irreducible representation)
# For example, take the canonical representation on the edges of a cube
>>> e = Representation([12,2,0,0,0])
>>> TS4.decompose(e)
[('Trivial', 1), ('Alternating', 0), ('Standard', 2), ('Another', 1)]

and so on...

Testing

The GAP software provides powerful tools for analyzing groups, so it won't be very hard to verify the correctness of the code with some fairly sophisticated examples.

General focus

I feel there is a lot more computational group theory that can go into sympy in the future, and so it is vital that these first steps establish considerably optimized versions of the fundamental algorithms discussed, and indeed of the entire infrastructure. The books [1],[2] offer powerful randomized versions which definitely should be used. Some work has already been done on permutations by Saptarshi Mandal, and it needs a bit of revision to make it more efficient (for example, improving the presentation of permutations in cycle notation). Also, Saptarshi has a pull request involving an implementation of permutation groups that can be used as a basis for subsequent work.

On the whole, in the end I want to have a project that is as extensible and efficient as possible given the time I have.

(A pessimistic) timeline: more permutation groups, less finitely presented groups, no matrix groups

  • Pre-coding (until May 21st)
    • Getting familiar with the books/software resources (e.g. GAP, [1], [2]) and doing more research on the subject
    • Finding a way to represent groups in sympy that would facilitate eventual extensions of the functionality and the incorporation of groups into a more abstract classification of algebraic objects - along the lines discussed above
  • Week 1 (May 21st - May 28th): Beginning of coding
    • Basic functionality: implementing the group object and the basic methods like handling different representations, and placeholders for the basic group properties like Abelian, cyclic... that will be used later & pull request
    • Redesigning the permutation class for better performance - for example, excluding singleton cycles from the cycle decomposition; redesigning the perm_group class in the pull request by Saptarshi to serve the new, more general infrastructure & pull request
  • Weeks 2 & 3 (May 28th - June 11th)
    • Powers and orders of elements, orbits and stabilizers of the action on {1,2,...,n} - probably the most basic algorithms, but they are needed all the time in advanced computations & pull request
    • Schreier vectors and the (randomized) Schreier-Sims algorithm, base & strong generating sets for a group of permutations & pull request
  • Weeks 4 & 5 (June 11th - June 25th)
  • Consequences of the Schreier-Sims algorithm for permutation groups: membership testing and the "rewriting" algorithm (used to provide a canonical expression for a group element in terms of a strong generating set), order of a group, optimized order of an element (the advantage of knowning the order of the group reduces the possibilities), subgroup testing,... & pull request
  • Weeks 6 & 7 (June 25th - July 9th)
  • Coset enumeration by the Todd-Coxeter algorithm, and consequences: index of a subgroup of a (finite) finitely presented subgroup & pull request
  • Weeks 8 & 9 (July 9th - July 23rd)
  • More on permutation groups, and more consequences of the Schreier-Sims algorithm: outputting all elements of a group, the center, pointwise stabilizers & pull request
  • Week 10 (July 23rd - July 30th)
  • More on (finite) finitely presented groups: computing quotients & pull request
  • Week 11 (July 30th - Aug 6th)
  • More on permutation groups: Primitivity of a transitive action & finding block systems for a group action; backtrack searches for elements satisfying certain properties
  • Week 12 (July 6th - Aug 13th): (suggested) Last week of actual coding
  • Catching up with any potentially unfinished work Or otherwise, if everything is OK,
  • Integrating some sort of database of known groups, like the ones used in GAP & pull request
  • Week 13 (Aug 13th - Aug 20th):
  • Look back at the entire code and make improvements & stylistic changes if necessary; more sophisticated testing, benchmarks (if not done already); add details to documentation
  • Week 14 (Aug 20th - Aug 27th):
    • Final evaluations

(An optimistic) timeline: even more permutation groups, a bit more finitely presented groups, no matrix groups

  • Pre-coding (until May 21st)
    • Getting familiar with the books/software resources (e.g. GAP, [1], [2]) and doing more research on the subject
    • Finding a way to represent groups in sympy that would facilitate eventual extensions of the functionality and the incorporation of groups into a more abstract classification of algebraic objects - along the lines discussed above
  • Week 1 (May 21st - May 28th): Beginning of coding
    • Basic functionality: implementing the group object and the basic methods like handling different representations, and placeholders for the basic group properties like (finite), finitely presented, Abelian, ... that will be used later & pull request
    • Redesigning the permutation class for better performance - for example, excluding singleton cycles from the cycle decomposition; redesigning the perm_group class in the pull request by Saptarshi to serve the new, more general infrastructure & pull request
  • Weeks 2 & 3 (May 28th - June 11th)
    • Powers and orders of elements, orbits and stabilizers of the action on {1,2,...,n} - probably the most basic algorithms, but they are needed all the time in advanced computations & pull request
    • Schreier vectors and the randomized Schreier-Sims algorithm, base & strong generating sets for a group of permutations & pull request
  • Weeks 4 & 5 (June 11th - June 25th)
  • Consequences of the Schreier-Sims algorithm for permutation groups: membership testing and the "rewriting" algorithm (used to provide a canonical expression for a group element in terms of a strong generating set), action on the cosets of a subgroup, transversals and quotient groups, order of a group, optimized order of an element (the advantage of knowning the order of the group reduces the possibilities), subgroup testing,... & pull request
  • Weeks 6 & 7 (June 25th - July 9th)
  • Coset enumeration by the Todd-Coxeter algorithm, and consequences: index of a subgroup of a (finite) finitely presented subgroup, presentations of subgroups H < G of finite index in terms of the presentation of G & pull request
  • Weeks 8 & 9 (July 9th - July 23rd)
  • More on permutation groups, and more consequences of the Schreier-Sims algorithm: Primitivity of a transitive action & finding block systems for a group action; backtrack searches for elements satisfying certain properties, outputting all elements of a group, Sylow subgroups, the center, pointwise stabilizers, change of base, product groups... & pull request
  • Week 10 (July 23rd - July 30th)
  • More on (finite) finitely presented groups: computing quotients; switching from a permutation presentation to generators & relations & pull request
  • Week 11 (July 30th - Aug 6th)
  • More on permutation groups: the p-core and the solvable radical
  • Week 12 (July 6th - Aug 13th): (suggested) Last week of actual coding
  • Integrating some sort of database of known groups, like the ones used in GAP
  • Week 13 (Aug 13th - Aug 20th):
  • Look back at the entire code and make improvements & stylistic changes if necessary; more sophisticated testing, benchmarks (if not done already); add details to documentation
  • Week 14 (Aug 20th - Aug 27th):
    • Final evaluations

(A maybe too optimistic) timeline: like the optimistic one, but

  • Sometime
  • (Efficient? We'd be toast without efficiency) finite field arithmetic in sympy & pull request
  • Some other time after that time
  • Matrix groups and some basic methods for them, for example reducing them to permutation groups by left action on the group itself -- terribly inefficient. Hopefully implement some algorithm for finding permutation representations of reasonable size for matrix groups.

Patch Requirement & work not yet merged

Here's a link to my first pull request that got merged to sympy: https://github.com/sympy/sympy/pull/1146#

Currently I'm working on a small module for manual manipulation of character tables (entering characters, taking direct sums, tensor products and duals of representations, decomposing representations into irreducible ones,...) which was motivated by a math course I'm taking this spring. Also, I have to brush up an even smaller module for determination of the Galois group of polynomials with rational coefficients of degrees up to 4.

References:

[1] Derek F. Holt, Bettina Eick, Bettina, Eamonn A. O'Brien, "Handbook of computational group theory", Discrete Mathematics and its Applications (Boca Raton). Chapman & Hall/CRC, Boca Raton, FL, 2005. ISBN 1-58488-372-3

[2] Permutation Group Algorithms, Ákos Seress, Cambridge University Press