SymPy Group Theory (GSoC 2017) gsoc, sympy, computational group theory

The order method

Last week and some of this one I was working on changing the order() method of FpGroups. Currently SymPy attempts to perform coset enumeration on the trivial subgroup and, if it terminates, the order of the group is the length of the coset table. A somewhat better way, at least theoretically, is to try and find a subgroup of finite index and compute the order of this subgroup separately. The function I’ve implemented only looks for a finite index subgroup generated by a subset of the group’s generators with a pseudo-random element thrown in (this can sometimes give smaller index subgroups and make the computation faster). The PR is here.

The idea is to split the list of generators (with a random element) into two halves and try coset enumeration on one of the halves. To make sure this doesn’t go on for too long, it is necessary to limit the number of cosets that the coset enumeration algorithm is allowed to define. (Currently, the only way to set the maximum number of cosets is by changing the class variable CosetTable.coset_table_max_limit which is very large (4096000) by default - in the PR, I added a keyword argument to all functions relevant to coset enumeration so that the maximum can be set when calling the function.) If the coset enumeration fails (because the maximum number of cosets was exceeded), try the other half. If this doesn’t succeed, double the maximum number of cosets and try again. Once (if) a suitable subgroup is found, the order of the group is just the index times the order of the subgroup. The latter is computed in the same way by having order() call itself recursively.

The implementation wasn’t hard in itself but I did notice that finding the subgroup’s presentation was taking far too long in certain cases (specifically when the subgroup’s index wasn’t small enough) and spent a while trying to think of a way around it. I think that for cyclic subgroups, there is a way to calculate the order during coset enumeration without having to find the presentation explicitly but I couldn’t quite work out how to do that. Perhaps, I will eventually find a way and implement it. For now, I left it as it is. For large groups, coset enumeration will take a long time anyway and at least the new way will be faster in some cases and may also be able to tell if a group is infinite (while coset enumeration on the trivial subgroup wouldn’t terminate at all).

Now I am going to actually start working on homomorphisms which will be the main and largest part of my project. I’ll begin by writing the GroupHomomorphism class in this coming week. This won’t actually become a PR for a while but it is easier to do it first because I still have exams in the next several days. After that I’ll implement the Knuth-Bendix algorithm for rewriting systems (I might make a start on this later this week as I’ll have more time once the exams are over). Then I’ll send a PR with rewriting systems and once that’s merged, the GroupHomomorphism class one, because it would depend on rewriting systems.

The Smith Normal Form

Last week I was working on implementing the Smith Normal Form for matrices over principal ideal domains. I’m still making corrections as the PR is being reviewed. I used the standard algorithm: use invertable (in the ring) row and column operations to make the matrix diagonal and make sure the diagonal entries have the property that an entry divides all of the entries that come after it (this is described in more detail on wikipedia for example). I ran into trouble when trying to determine the domain of the matrix entries if the user hadn’t explicitly specified one. Matrices in SymPy don’t have a .domain attribute or anything similar, and can contain objects of different types. So, if I did attempt to find some suitable principal ideal domain over which to consider all of them, the only way would be to try a few of the ones that are currently implemented until something fits, and that would have to be extended every time a new domain is implemented and generally sounds tedious to do. I’ve asked on the Group Theory channel if there was a better way and that started a discussion about changing the Matrix class to have a .domain or a .ring attribute and have the entries checked at construction. In fact, this has been brought up by other people before as well. Unfortunately, adding this attribute would require going over the matrix methods implemented so far and making sure they don’t assume anything that might not hold for general rings (especially non-commutative ones: like the determinant in its traditional form wouldn’t even make sense; however, it turns out there are several generalisations of determinants to non-commutative rings like quasideterminants and Dieudonné determinant). And this would probably take quite a while and is not directly related to my project. So in the end we decided to have the function work only for matrices for which a .ring attribute has been added manually by the user. For example,

>>> from sympy.polys.solvers import RawMatrix as Matrix
>>> from sympy.polys.domains import ZZ
>>> m = Matrix([0,1,3],[2,4,1])
>>> setattr(m, "ring", ZZ)

Hopefully, at some point in the future matrices will have this attribute by default.

The Smith Normal Form was necessary to analyse the structure of the abelianisation of a group: abelian groups are modules over integers which is a PID and so the Smith Normal form is applicable if the relators of the abelianisation are written in the form of a matrix. If 0 is one of the abelian invariants (the diagonal entries of the Smith Normal form), then the abelianisation is infinite and so must be the whole group. I’ve added this test to the .order() method for FpGroups and now it is able to terminate with the answer oo (infinity) for certain groups for which it wouldn’t terminate previously. I hope to extend this further with another way to evaluate the order: trying to find a finite index cyclic subgroup (this can be achieved by generating a random word in the group and considering the coset table corresponging to it), and obtain the order of the group by multiplying the index with the order of the cyclic subgroup. The latter could be infinite, in which case the whole group is. Of course, this might not always terminate, but it will terminate in more cases than coset enumeration applied directly to the whole group. This is also what’s done in GAP.

I have begun working on it but this week is the last before my exams, and I feel that I should spend more time revising. For this reason, I probably wouldn’t be able to send a PR with this new test by the end of the week. However, it would most likely be ready by the end of the next one, and considering that the only other thing I planned to do until the first evaluation period was to write (the main parts of) the GroupHomomorphism class assuming that the things it depends on (e.g. rewriting systems) are already implemented, I believe I am going to stay on schedule.

The start

GSoC officially starts tomorrow but I’ve already begun working on my project because of exams later in June. So far everything is going according to plan. I’ve implemented the subgroup() method for PermutationGroups and FpGroups as well as is_subgroup() for FpGroups. That was straight-forward except this one moment where I wanted to avoid creating a new instance of FreeGroup and rewriting the relators returned by reidemeister_presentation() in terms of this new instance because it seemed inelegant. I spend quite a while trying to come up with a better way and even started thinking of reimplementing the FreeGroup class but after a discussion with one of my mentors on SymPy’s Group Theory channel decided to create a new instance after all. It shouldn’t affect the performance too much anyway and reimplementing a whole class would be quite extreme.

Then I looked into improving the techniques for simplifying the presentation of subgroups. I hadn’t been sure about whether there would be much to do but actually I found that the code had quite a few redundant bits and there were some more serious potential problems with it, like a loop that deleted elements of the list over which it iterated. I ended up rewriting it to be more readable and making sure the list of relators is traversed properly. Additionally, the reidemeister_presentation() code used to apply the simplification techniques 20 times in a for loop which in some cases would be excessive and in others not enough - I replaced it with a while loop so that the representaion is simplified until no further improvement is achieved.

While I was at it, I found myself modifying some of the FreeGroupElement methods either to extend their functionality or because I noticed that their implementation would lead to bugs in some special cases. I also added a couple of new methods for manipulation of group words. Intuitively, FreeGroupElement words are just symbolic expressions on non-commutative (in the general case) symbols that only admit integer powers. But their implementation in SymPy is different from the regular symbolic expressions which are instances of the Expr class. As a result, the Expr methods, e.g. subs() or simplify() can’t be used with them directly (though of course, theoretically, one could build an equivalent symbolic expression out of a FreeGroupElement, apply the desired Expr method, make sure no fractional powers appear and go back) so equivalent methods have to be written separately. I have extended the group word substitution to probably all (or at least most) cases one might need. Simplification, on the other hand, is still missing. That said, the simplify() method of the Exrp class doesn’t currently do much in terms of simplification of non-commutative expressions so it couldn’t be used anyway (perhaps, some other method would be appropriate but I’m not sure which). However, unlike instances of Expr, group elements only involve one binary operation so the problem is somewhat easier (conceptually). The only real simplification one can make is finding powers of subwords and collecting them into one thing. Say, if we have the word x*y*x*y*x*y, a simpler way to write it would be (x*y)**3 and it would be good to have that done automatically. I might work on implementing that sort of thing somewhere along the way because then simplifications for reidemeister_presentation() can be improved even further, and the output relators will be more readable. Plus it’s just a nice feature to have, though at present I’m not sure about how exactly I would go about that efficiently and how to best store the simplification so that it doesn’t need to be done again.

The PRs I’ve sent so far are here and here and are currently still being reviewed.

The next thing I’m going to work on are a couple of methods that could be used to determine if a given FpGroup is infinite. The one I plan to do first is considering the abelianisation of the group, writing the relators in the form of an integer matrix and essentially finding its Smith Normal form. This will probably occupy me for the next week.

The plan

This blog will follow the progress of my GSoC 2017 project for SymPy. My primary goal is to implement group homomorphisms and set it up in such a way that certain finite presentation groups (implemented as an FpGroup class in SymPy) could be considered as permutation groups via an isomorphism. That would be good because SymPy already has many group-theoretic algorithms implemented for perm groups. To get this to work, I will also need to implement rewriting systems for FpGroups. You can see my application for details.

This post is a revised timeline for the project. I am currently in the middle of a term at uni and I’m going to have exams at the end of it: that’s 12-21 June (with 3 exam-free days in that period). For that reason I won’t be fully productive until 21 June but I’ll do my best (and certainly will catch up afterwards). As such, I think I should do a bit of coding before 30 May. I’ll start with some functionality unrelated to homomorphisms, as explained in my application. My current plan until the end of the First Evaluation period (26-30 June) is as follows:

  • By 30 May: Add a .subgroup() method and a .parent attribute to FpGroups and PermutationGroups. Probably make some improvements in the Reidemeister-Schreier elimination techniques. Submit a PR.

  • 31 May - 11 June: Implement better checks for infinite FpGroups. That will include the abelian_invariants function somewhere in the rings module (I’ll do it for integers unless there is an obvious way to make the same algorithm work for general rings) and a search for a finite cyclic subgroup (which might not terminate but it’s better to give it a try since if a group is infinite, this might works but coset_enumeration is unlikely to terminate). Create a PR by the end of it if it’s finished.

  • 12-25 June My exams start. I’ll start writing the GroupHomomorphism class and its functions with the assumption that rewriting systems and other things I might need are already implemented (which I will actually do later on). I’ll also start writing my evaluation at the end of this period.