I have just pushed the button to release M4RI 20121224. The main feature of this release is a considerable performance improvement. It all started with *Fast matrix decomposition in F2* by Enrico Bertolazzi and Anna Rimoldi showing up on the arXiv. Here’s the abstract

In this work an efficient algorithm to perform a block decomposition (and so to compute the rank) of large dense rectangular matrices with entries in F2 is presented. Depending on the way the matrix is stored, the operations acting on rows or block of consecutive columns (stored as one integer) should be preferred. In this paper, an algorithm that completely avoids the column permutations is given. In particular, a block decomposition is presented and its running times are compared with the ones adopted into SAGE.

… and that comparison made M4RI (which realises this functionality in Sage) look pretty bad. I did’t (and still don’t) share the implicit assumption that avoiding column swaps was the key ingredient in making this code so much faster than ours. I assume the **impressive** timings are due to a very efficient base case implementation. Anyway, we sat down and looked for performance bottlenecks the result of which is 20121224. I actually have no idea whether we caught up to the code described in Enrico’s and Anna’s pre-print as they did not publish their sources.

Still, the performance improvements over 20120613 were worth the trouble. Below two plots of the (normalised) leading constants giving the leading constants for multiplication and elimination respectively (more plots on imgur) That is, it plots the running time divided by . In theory these plots should all have slope 0.

Finally, here’s the plot for *Fast matrix decomposition in F2* which starts very small but has a rather large slope. That’s why I concluded that the performance stems from a very efficient base case. I should get in touch with Enrio and Anna about this.