M4RI 20121224

I have just pushed the button to release M4RI 20121224. The main feature of this release is a considerable performance improvement. It all started with Fast matrix decomposition in F2 by Enrico Bertolazzi and Anna Rimoldi showing up on the arXiv. Here’s the abstract

In this work an efficient algorithm to perform a block decomposition (and so to compute the rank) of large dense rectangular matrices with entries in F2 is presented. Depending on the way the matrix is stored, the operations acting on rows or block of consecutive columns (stored as one integer) should be preferred. In this paper, an algorithm that completely avoids the column permutations is given. In particular, a block decomposition is presented and its running times are compared with the ones adopted into SAGE.

… and that comparison made M4RI (which realises this functionality in Sage) look pretty bad. I did’t (and still don’t) share the implicit assumption that avoiding column swaps was the key ingredient in making this code so much faster than ours. I assume the impressive timings are due to a very efficient base case implementation. Anyway, we sat down  and looked for performance bottlenecks the result of which is 20121224. I actually have no idea whether we caught up to the code described in Enrico’s and Anna’s pre-print as they did not publish their sources.

Still, the performance improvements over 20120613 were worth the trouble. Below two plots of the (normalised) leading constants giving the leading constants for multiplication and elimination respectively (more plots on imgur) That is, it plots the running time divided by n^{2.807} \cdot 10^9. In theory these plots should all have slope 0.

Multiplication on Intel Core i7
PLE on Intel Core i7

Finally, here’s the plot for Fast matrix decomposition in F2 which starts very small but has a rather large slope. That’s why I concluded that the performance stems from a very efficient base case. I should get in touch with Enrio and Anna about this.

M4RIE: support for finite fields up to degree 16 added

I committed support for finite fields up to degree 16 to M4RIE a few days ago. Furthermore, the dependency on Givaro for constructing finite fields was dropped.

Don’t get me wrong. Givaro is a fine library, much better than what I wrote for M4RIE. However, it is a C++ library while M4RIE is a C library and the little functionality of finite field arithmetic needed in M4RIE was not that hard to add natively. In the past M4RIE relied on Givaro for running tests and benchmarks, the core library was always free of C++. However, as we plan to add support for high-degree polynomials over matrices over\mathbb{F}_2, we need the ability to create finite extensions of \mathbb{F}_2 on the fly in the core library.

Continue reading “M4RIE: support for finite fields up to degree 16 added”

Linear Algebra for Gröbner Bases over GF(2): LELA

The Efficient Linear Algebra for Gröbner Basis Computations workshop in Kaiserslautern two weeks ago was a welcome opportunity to finally test out LELA, a library specifically written for linear algebra for Gröbner basis computations including for GF(2). The library implements the “Faugère-Lachartre” algorithm (a similar trick, though less developed, appeared before in PolyBoRi) and uses M4RI for dense parts over GF(2).

So, I ran my benchmark matrices through LELA, discovered a bug in the process, then Bradford returned the favour and discovered a bug in M4RI … Finally, below are the timings. The column PLE is the PLE algorithm as implemented in M4RI, M4RI is the M4RI algorithm as implemented in M4RI, GB is a very naive variant of the algorithm LELA uses and LELA is, well, LELA.

problem m n density PLE M4RI GB LELA
HFE 25 12307 13508 0.076 1.0 0.5 0.8 0.56
HFE 30 19907 29323 0.067 4.7 2.7 4.7 3.42
HFE 35 29969 55800 0.059 19.3 9.2 19.5 13.92
Mutant 26075 26407 0.184 5.7 3.9 2.1 12.07
n=24, m=26 37587 38483 0.038 20.6 21.0 19.3 7.72
n=24, m=26 37576 32288 0.040 18.6 28.4 17.0 4.09
SR(2,2,2,4) c 5640 14297 0.003 0.4 0.2 0.1 0.40
SR(2,2,2,4) c 13665 17394 0.013 2.1 3.0 2.0 1.78
SR(2,2,2,4) c 11606 16282 0.035 1.9 4.4 1.5 0.81
SR(2,2,2,4) 13067 17511 0.008 1.9 2.0 1.3 1.45
SR(2,2,2,4) 12058 16662 0.015 1.5 1.9 1.6 1.01
SR(2,2,2,4) 115834 118589 0.003 528.2 578.5 522.9 48.39

What this table means is that one can expect more than an order of magnitude of speed-up when using LELA – which is dedicated to these computations – instead of M4RI – which does not have the specialised algorithm implemented yet. For very small matrices sometimes M4RI/PLE win, but then not by a large margin. The only row where LELA doesn’t do so good is Mutant, which – btw. – is not an F4 matrix but comes from the MXL2 algorithm.  It is possible that LELA’s sparse data structures are not that well equipped to deal with this rather dense matrix.

I am in the process of implementing the algorithm LELA uses in M4RI and will report updated timings here.

Report: Workshop on Efficient Linear Algebra for Gröbner Basis Computations

As you may know, today is the last day of the wokshop on Efficient Linear Algebra for Gröbner Basis Computations that Christian Eder, Burcin Eröcal, Alexander Dreyer and I organised.

I have to say that I am quite pleased with how the workshop played out. We planned the whole thing to be hands on: people were strongly encouraged to work on projects, i.e., to write code preferably together, in addition to attending talks. Those who attended a Sage Days workshop in the past, will know what workshop format I am referring to. Continue reading “Report: Workshop on Efficient Linear Algebra for Gröbner Basis Computations”

Efficient Linear Algebra for Gröbner Basis Computations – June 4-8, 2012 – Kaiserslautern, Germany

Linear algebra plays an important role in modern efficient implementations of Gröbner basis algorithms. Consequently, a number of groups aim at developing linear algebra packages for these computations: we mention the HPAC project, LELA by the Singular team, the FGB package by Jean-Charles Faugère, the M4RI libraries, specialised linear algebra routines in PolyBoRi as well as non-public projects. In this workshop we want to bring researchers interested in this problem and developers of these packages together to discuss and develop solutions. The format of this workshop will be a mixture of talks, coding sprints and design discussions.

Topics will include but are not limited to:

  • presentation of existing software packages and solutions for linear algebra suitable for Gröbner basis computations
  • presentation of scientific results on linear algebra for Gröbner basis computations
  • modular approaches to Gröbner basis computations which allow to swap linear algebra packages
  • approaches to parallelization of linear algebra routines on multicore machines, multiple machines and GPUs.
  • suitable benchmark and test matrices, ideals and their format.

Invited Speakers

  • Brice Boyer (Grenoble, France)
  • Michael Brickenstein (Oberwolfach, Germany)
  • Daniel Cabarcas (Darmstadt, Germany)
  • Jean-Charles Faugère (Paris, France)
  • Bradford Hovinen (Munich, Germany)
  • Sylvain Lachartre (Paris, France)
  • Emmanuel Thomé (Nancy, France)

The workshop will feature mathematical talks, presentations on software and coding sprints.

Registration

There is no registration fee for the workshop. Please email the organizers beforehand if you intend to participate.

It is strongly recommended that participants bring their own laptop for use during the coding sprints.

More information regarding this event is available at http://wiki.lmona.de/events/elagb

Summer School on Tools :: Mykonos, Greece :: 28.5 – 1.6.

Slightly redacted announcement for the 2012 Summer School on Tools below.

Following the success of the ECRYPT Workshop on Tools for Cryptanalysis 2010,the ECRYPT II Symmetric Techniques Virtual Lab (SymLab) is pleased to announce the 2012 Summer School on Tools. Covering selected topics in both symmetric and asymmetric cryptography, this summer school will provide a thorough overview of some of the most important cryptographic tools that emerged in recent years. While the summer school is aimed primarily at postgraduate students, attendance is open to all. Continue reading “Summer School on Tools :: Mykonos, Greece :: 28.5 – 1.6.”

Sage 4.8 is out

If you care about the stuff I care about (and why else would you read this blog?) you might get excited about a few changes in Sage.

Efficient linear algebra for \mathbb{F}_{2^e}

The very first non-trivial patch I ever produced for Sage was about interfacing with NTL for dense linear algebra over \mathbb{F}_{2^e} (I was interested in algebraic attacks against AES at the time). Here’s William’s reply:

Your NTL patch worked perfectly for me first try. I tried more benchmarks (on Pentium-M 1.8Ghz).

[…]

This is pretty good; vastly better than what’s was in SAGE by default, and way better than PARI. Note that MAGMA is much faster though (nearly 8 times faster):

[…]

MAGMA uses (1) […] and (2) a totally different algorithm for computing the echelon form. […] As far as I know, the MAGMA method is not implemented anywhere in the open source world But I’d love to be wrong about that… or even remedy that.

Well, that was 2006. Fast forward to the year 2011 and we get the following timings for computing the reduced row echelon form of a 1,000 x 1,000 matrix over \mathbb{F}_{256}: Sage 4.7.2 takes 36.53 seconds , NTL 5.4.2 takes 31.06 seconds and Magma 2.15 does it in 0.87 seconds. So essentially, the situation didn’t change at all for the better.

With Sage 4.8 this situation changes dramatically  and we get that Sage performs this computation in 0.08 seconds, that’s 450 times faster than Sage 4.7.2. This is because M4RIE was merged in Sage 4.8. Hence, Sage is now (in some cases by far) fastest system to do linear algebra with dense matrices over \mathbb{F}_{2^e} for 1 \leq e \leq 8  and usually also for 9 \leq e \leq 10.

Efficient linear algebra for \mathbb{F}_{p}

One can tell a similar story for \mathbb{F}_p for, say, small to medium sized primes p. In Sage 4.7.2 it took 1.12* seconds to multiply two 1,000 x 1,000 matrices over \mathbb{F}_{251} (although you always had the option to call LinBox explicitly which was way faster but took more memory). With Sage 4.8 the same computation takes *0.16 seconds. For comparison, Magma 2.15 takes 0.22 seconds. So here again Sage moved from poor performance to best in class performance between 4.7.2 and 4.8 simply by making proper use of available libraries.

Viable Alternative yet?

Overall, the story for dense linear algebra in Sage for small finite fields \mathbb{F}_q  is as follows.

q Implementation Comments
2 M4RI Fastest implementation or equal performance depending on platform
3,5,7 \dots LinBox Decent performance, but faster implementations are known in the literature. Also, Magma is a bit faster on my machine.
prime < 2^{23} LinBox Fastest implementation or equal performance depending on platform.
2^e for 2 \leq e \leq 8 M4RIE Fastest
p^e for p>2 Generic Very poor performance, but some work is being done.

So, once we fix that last row Sage finally achieves “viable alternative” quality when it comes to dense linear algebra over \mathbb{F}_{q} if q is q < 2^{16}.

Rank-profile revealing Gaussian elimination and the CUP matrix decomposition

by Claude-Pierre Jeannerod, Clément Pernet, Arne Storjohann is now available on the archive. I like this paper a lot and we also referenced it in both the M4RI elimination paper and the M4RIE paper so three cheers that it’s now available.

Abstract: Transforming a matrix over a field to echelon form, or decomposing the matrix as a product of structured matrices that reveal the rank profile, is a fundamental building block of computational exact linear algebra. This paper surveys the well known variations of such decompositions and transformations that have been proposed in the literature. We present an algorithm to compute the CUP decomposition of a matrix, adapted from the LSP algorithm of Ibarra, Moran and Hui (1982), and show reductions from the other most common Gaussian elimination based matrix transformations and decompositions to the CUP decomposition. We discuss the advantages of the CUP algorithm over other existing algorithms by studying time and space complexities: the asymptotic time complexity is rank sensitive, and comparing the constants of the leading terms, the algorithms for computing matrix invariants based on the CUP decomposition are always at least as good except in one case. We also show that the CUP algorithm, as well as the computation of other invariants such as transformation to reduced column echelon form using the CUP algorithm, all work in place, allowing for example to compute the inverse of a matrix on the same storage as the input matrix.

http://arxiv.org/abs/1112.5717

Sage/FLINT Days aka Sage Days 35

I am writing this while waiting for my taxi to leave Sage Days 35. Although, I didn’t get much actual coding done, it was great fun and very useful. I met a lot of old friend, new faces and managed to put faces to e-mail addresses.

In terms of coding projects, first, I tried to speed up linear algebra mod p where p is a 32 or 64 bit prime. But it turns out that any trick I could think of could not improve on Frederik’s code. So that didn’t lead anywhere but I allowed me to read some code of FLINT2 (very readable) and admire how carefully it is written.

My other two projects both involved evaluate–pointwise-multiply–interpolate algorithms for fast matrix-matrix products over finite extension fields or for matrices with polynomial coefficients (over prime fields).  After my talk on M4RI(E) David Harvey worked out how to improve multiplication over \mathbb{F}_{2^6} from 17 multiplications over \mathbb{F}_2 to 15, which then lead to a general approach for \mathbb{F}_{2^m} with composite m. Much of it remains to be implemented (efficiently), but the \mathbb{F}_{2^6} example indeed shows a 10% speed-up as expected. The code is not clean yet, uses way too much memory and doesn’t deal with the more advanced finite field stuff appropriately. It should end up in M4RIE eventually though.

I also contributed a bit to #12177 which is about a “prime slice” implementation of matrices over \mathbb{F}_{p^k}. The idea is essentially to represent  these matrices as polynomials with matrix coefficients and to use fast polynomial multiplication algorithms for these polynomials. It turns out, this works very well even for small finite fields. Burcin Eröcal did all the coding, I only helped with some discussions. We need to polish the code a lot to be usable, so if you like matrices over \mathbb{F}_{p^k} head over to #12177 and help out.