I have to say that I am quite pleased with how the workshop played out. We planned the whole thing to be hands on: people were strongly encouraged to work on projects, i.e., to write code preferably together, in addition to attending talks. Those who attended a Sage Days workshop in the past, will know what workshop format I am referring to. Continue reading “Report: Workshop on Efficient Linear Algebra for Gröbner Basis Computations”
If you care about the stuff I care about (and why else would you read this blog?) you might get excited about a few changes in Sage.
Efficient linear algebra for
The very first non-trivial patch I ever produced for Sage was about interfacing with NTL for dense linear algebra over (I was interested in algebraic attacks against AES at the time). Here’s William’s reply:
Your NTL patch worked perfectly for me first try. I tried more benchmarks (on Pentium-M 1.8Ghz).
This is pretty good; vastly better than what’s was in SAGE by default, and way better than PARI. Note that MAGMA is much faster though (nearly 8 times faster):
MAGMA uses (1) […] and (2) a totally different algorithm for computing the echelon form. […] As far as I know, the MAGMA method is not implemented anywhere in the open source world But I’d love to be wrong about that… or even remedy that.
Well, that was 2006. Fast forward to the year 2011 and we get the following timings for computing the reduced row echelon form of a 1,000 x 1,000 matrix over : Sage 4.7.2 takes 36.53 seconds , NTL 5.4.2 takes 31.06 seconds and Magma 2.15 does it in 0.87 seconds. So essentially, the situation didn’t change at all for the better.
With Sage 4.8 this situation changes dramatically and we get that Sage performs this computation in 0.08 seconds, that’s 450 times faster than Sage 4.7.2. This is because M4RIE was merged in Sage 4.8. Hence, Sage is now (in some cases by far) fastest system to do linear algebra with dense matrices over for and usually also for .
Efficient linear algebra for
One can tell a similar story for for, say, small to medium sized primes . In Sage 4.7.2 it took 1.12* seconds to multiply two 1,000 x 1,000 matrices over (although you always had the option to call LinBox explicitly which was way faster but took more memory). With Sage 4.8 the same computation takes *0.16 seconds. For comparison, Magma 2.15 takes 0.22 seconds. So here again Sage moved from poor performance to best in class performance between 4.7.2 and 4.8 simply by making proper use of available libraries.
Viable Alternative yet?
Overall, the story for dense linear algebra in Sage for small finite fields is as follows.
|M4RI||Fastest implementation or equal performance depending on platform|
|LinBox||Decent performance, but faster implementations are known in the literature. Also, Magma is a bit faster on my machine.|
|prime <||LinBox||Fastest implementation or equal performance depending on platform.|
|for||Generic||Very poor performance, but some work is being done.|
So, once we fix that last row Sage finally achieves “viable alternative” quality when it comes to dense linear algebra over if is .
by Claude-Pierre Jeannerod, Clément Pernet, Arne Storjohann is now available on the archive. I like this paper a lot and we also referenced it in both the M4RI elimination paper and the M4RIE paper so three cheers that it’s now available.
Abstract: Transforming a matrix over a field to echelon form, or decomposing the matrix as a product of structured matrices that reveal the rank profile, is a fundamental building block of computational exact linear algebra. This paper surveys the well known variations of such decompositions and transformations that have been proposed in the literature. We present an algorithm to compute the CUP decomposition of a matrix, adapted from the LSP algorithm of Ibarra, Moran and Hui (1982), and show reductions from the other most common Gaussian elimination based matrix transformations and decompositions to the CUP decomposition. We discuss the advantages of the CUP algorithm over other existing algorithms by studying time and space complexities: the asymptotic time complexity is rank sensitive, and comparing the constants of the leading terms, the algorithms for computing matrix invariants based on the CUP decomposition are always at least as good except in one case. We also show that the CUP algorithm, as well as the computation of other invariants such as transformation to reduced column echelon form using the CUP algorithm, all work in place, allowing for example to compute the inverse of a matrix on the same storage as the input matrix.
Finally, we finished our paper about Gaussian elimination in the M4RI library.
Abstract: In this work we describe an efficient implementation of a hierarchy of algorithms for Gaussian elimination upon dense matrices over the field with two elements (). We discuss both well-known and new algorithms as well as our implementations in the M4RI library, which has been adopted into Sage. The focus of our discussion is a block iterative algorithm for PLE decomposition which is inspired by the M4RI algorithm. The implementation presented in this work provides considerable performance gains in practice when compared to the previously fastest implementation. We provide performance figures on x86_64 CPUs to demonstrate the alacrity of our approach.
Update: arXiv link.
“A Sage Days workshop around the theme of Algorithms in Number Theory and FLINT.”
See http://wiki.sagemath.org/SageFlintDays for more information and registration.
PS: I’ll be talking about M4RI(E) … big surprise.
Now, that we have a decent PNG reader/writer in M4RI, it’s much easier to get some challenge matrices out of the library. Below, I list and link a few such matrices as they appear during Gröbner basis computations.
|HFE 25 matrix 5 (5.1M)||12307 x 13508||0.07600||1.03||0.59||0.81|
|HFE 30 matrix 5 (16M)||19907 x 29323||0.06731||4.79||2.70||4.76|
|HFE 35 matrix 5 (37M)||29969 x 55800||0.05949||19.33||9.28||19.51|
|Mutant matrix (39M)||26075 x 26407||0.18497||5.71||3.98||2.10|
|random n=24, m=26 matrix 3 (30M)||37587 x 38483||0.03832||20.69||21.08||19.36|
|random n=24_ m=26 matrix 4 (24M)||37576 x 32288||0.04073||18.65||28.44||17.05|
|SR(2,2,2,4) compressed, matrix 2 (328K)||5640 x 14297||0.00333||0.40||0.29||0.18|
|SR(2,2,2,4) compressed, matrix 4 (2.4M)||13665 x 17394||0.01376||2.18||3.04||2.04|
|SR(2,2,2,4) compressed, matrix 5 (2.8M)||11606 x 16282||0.03532||1.94||4.46||1.59|
|SR(2,2,2,4) matrix 6 (1.4M)||13067 x 17511||0.00892||1.90||2.09||1.38|
|SR(2,2,2,4) matrix 7 (1.7M)||12058 x 16662||0.01536||1.53||1.93||1.66|
|SR(2,2,2,4) matrix 9 (36M)||115834 x 118589||0.00376||528.21||578.54||522.98|
The first three rows are from GB computations for the hidden field equations cryptosystem (those matrices were provided by Michael Brickenstein). The “mutant” row is a matrix as it appears during a run of the MXL2 algorithm on a random system (I believe). It was contributed by Wael Said. The rows “random n=25,m=26” are matrices as they appear during a GB computation with PolyBoRi for a random system of equations in 24 variables and 26 equations. The remaining rows are matrices from PolyBoRi computations on small scale AES instances. Those rows which have “compressed” in their description correspond to systems where “linear variables” were eliminate before running the Gröbner basis algorithm.
The last three columns give running times (quite rough ones!) for computing an echelon form (not reduced) using (a) the M4RI algorithm, (b) PLE decomposition and (c) a first implementation of the TRSM for trivial pivots trick. As you can see, currently it’s not straight-forward to pick which strategy to use to eliminate matrices appearing during Gröbner basis computations: the best algorithm to pick varies between different problems and the differences can be dramatic.
I had the pleasure to attend a meeting of the LinBox developers this week in Raleigh, NC. One of the question that came up was how to exchange and store matrices. For example, matrices for whose “class” one wants to find/implement dedicated algorithms. To give a more concrete example: matrices appearing during Gröbner basis computations have a special structure which allows to reduce them faster than random matrices (cf. this paper or this code). Hence, we’d like to have some sort of format to store such matrices such that we can work on these dedicated algorithms. Of course, such a format should be flexible, human readable (preferably writable as well) and reasonably simple. I’m not sure how the LinBox crew would think about me blogging about details of their meeting, hence I’ll only use this opportunity to plug my own proposal (at least for reasonably dense matrices): PNG images.
The PNG file format definitely is readable, since there exists many viewers for it (you are probably using one right now). The format can also be edited thanks to GIMP and friends. That is, we can draw matrices! Furthermore, PNG allows to use lossless compression, i.e., it can compress the actual image data using GZIP. Finally, it is pretty flexible: it supports between one (grayscale) and four (RGB + alpha) channels and various bit depths per channel (1,2,4,8,16). Hence, it can store between 1 bit and 64 bits per pixel. Thus, for finite fields up cardinality we can store each entry simply as a pixel. If we pick our colour assignment right, the pictures even make sense (using the convention that darker is larger as an integer for example).
Of course, the whole format is fundamentally biased towards dense matrices. In fact, we’ve been using 1-bit PNG images as data storage for M4RI matrices for a while now: both Sage and PolyBoRi use the GD library to write M4RI matrices to disk. However, using GD has some shortcomings such as high memory requirements (the whole image is constructed in RAM before being written to disk). This week, I implemented reading/writing of PNG 1-bit images feature directly in the M4RI library using libpng directly. This allows to save a lot of memory and for some cool other features such as control over the GZIP compression level, custom comments, “unknown chunks” as attachments, etc.
I’ve also conducted some experiments to get an impression how well this format works in terms of storage space and loading time:
|file size||matrix dimension||density||loading time on 2.6Ghz i7|
|2.8M||11606 x 16282||0.03532||0.190s|
|5.1M||12307 x 13508||0.07600||0.205s|
|16M||19907 x 29323||0.06731||0.619s|
|30M||37587 x 38483||0.03832||1.565s|
|36M||115834 x 118589||0.00376||12.132s|
|37M||29969 x 55800||0.05949||1.685s|
|39M||26075 x 26407||0.18497||0.847s|
To me, the above table – which lists some matrices from Gröbner basis computations over GF(2) – suggests that the format is reasonably efficient. However, I don’t really have anything to compare with, so my sense might be off. Still, compared to some ASCII based formats out there, it’s pretty competitive, as far as I can tell. Note, however, that the above file sizes were produced using GZIP compression level 9 which takes pretty long to write. Using a lower level (such as the default) produces slightly larger files (about 10%-20% depending on the structure).
Finally, wouldn’t it be very awesome if we could use these pictures when debugging our code? So who speaks GDB’s macro language?