# Looking Back and Forward for Open-Source Mathematics Software (2014)

When a year ends people make lists. I can only guess that several people are currently busy with writing “The 5 most revised papers on eprint ” and “The 8 best IACR flagship conference rump session presentations of 2014”. Since all the good lists are taken, my list has to be a little bit more personal. Alas, here is my list of stuff that happened in open-source computational mathematics in 2014 around me. That is, below I list what developments happened in 2014 and try to provide an outlook for 2015 (so that I can come back in a year to notice that nothing played out as planned).

If you are interested in any of the projects below feel invited to get involved. Also, if you are student and you are interested in working on one of the (bigger) projects listed below over the summer, get in touch: we could try to turn it into a Google Summer of Code 2015 project.

# Sage 5.10

Sage 5.10 was released earlier today. It has the following goodies I particularly care about:

## Faster Dense Linear Algebra over GF(2)

TL;DR: We updated M4RI to the most recent upstream release which is better suited for modern CPUs.

After Enrico Bertolazzi and Anna Rimoldi kicked out butts with their pre-print we went to work to re-tune M4RI. That is, I don’t agree with their premise that their new algorithm is the (main) cause of their impressive performance. As a result M4RI got considerably faster on modern CPUs.

Here’s a comparison of Sage 5.8 (which has the same performance characteristics as 5.9 for this stuff) and Sage 5.10. Sage 5.8 goes first:

sage: A = random_matrix(GF(2),2^14, 2^14)
sage: B = random_matrix(GF(2),2^14, 2^14)
sage: %time A*B
CPU times: user 4.46 s, sys: 0.02 s, total: 4.48 s
Wall time: 4.50 s

sage: %time A.echelonize()
CPU times: user 2.53 s, sys: 0.00 s, total: 2.53 s
Wall time: 2.54 s


Now Sage 5.10 which is 1.22 times faster for multiplication and 1.17 times faster for elimination in this particular benchmark.

sage: A = random_matrix(GF(2),2^14, 2^14)
sage: B = random_matrix(GF(2),2^14, 2^14)
sage: %time A*B
CPU times: user 3.61 s, sys: 0.04 s, total: 3.65 s
Wall time: 3.66 s

sage: %time A.echelonize()
CPU times: user 2.16 s, sys: 0.00 s, total: 2.17 s
Wall time: 2.17 s


For comparision, Magma 2.15-10 takes 4.5 seconds for this multiplication and Magma 2.18-7 takes 5 seconds on the same machine. See here for details on the M4RI update.

## Faster Dense Linear Algebra over GF(2^e)

TL;DR: Improvements over GF(2) have a knock-on effect on GF(2^e) and we upgraded M4RIE to the newest upstream release which extends the supported degree size up to $e \leq 16$

M4RIE recently dropped its dependency on Givaro and extended the degrees it supports up to 16. Sage 5.10 updates to this new release and extends the finite field size that is covered by M4RIE to $\mathbb{F}_{2^16}$. This means a huge performance improvements for dense linear algebra over $\mathbb{F}_{2^e}$ for $8 < e \leq 16$. Note, however, that these cases are not fully optimised yet, so that it’s not the fastest implementation  – in this range – yet. Sage 5.8 first:

sage: A = random_matrix(GF(2^8,'a'),10^4, 10^4)
sage: B = random_matrix(GF(2^8,'a'),10^4, 10^4)
sage: %time A*B
CPU times: user 32.07 s, sys: 0.48 s, total: 32.56 s
Wall time: 32.67 s
10000 x 10000 dense matrix over Finite Field in a of size 2^8

sage: A = random_matrix(GF(2^12,'a'),10^3, 10^3)
sage: B = random_matrix(GF(2^12,'a'),10^3, 10^3)
sage: %time A*B # Sage 5.8 uses generic Python code to do this
CPU times: user 339.02 s, sys: 0.70 s, total: 339.72 s
Wall time: 340.86 s
1000 x 1000 dense matrix over Finite Field in a of size 2^12


Now, Sage 5.10 which is 1.16 times and 1420 times faster respectively for these benchmarks.

sage: A = random_matrix(GF(2^8,'a'),10^4, 10^4)
sage: B = random_matrix(GF(2^8,'a'),10^4, 10^4)
sage: %time A*B # knock-on effect from GF(2) improvements
CPU times: user 27.42 s, sys: 0.62 s, total: 28.04 s
Wall time: 28.14 s
10000 x 10000 dense matrix over Finite Field in a of size 2^8

sage: A = random_matrix(GF(2^12,'a'),10^3, 10^3)
sage: B = random_matrix(GF(2^12,'a'),10^3, 10^3)
sage: %time A*B # new code in M4RIE
CPU times: user 0.23 s, sys: 0.01 s, total: 0.24 s
Wall time: 0.24 s
1000 x 1000 dense matrix over Finite Field in a of size 2^12


For comparison, Magma 2.15-10 takes 3.79 seconds and Magam 2.18-7 takes 0.16 seconds for the latter benchmark. This highlights that M4RIE isn’t what it should be yet in that range (see here for details).

# Matrix Multiplication over GF(p^e)

After my talk at Sage Days 35 in Warwick (that was in winter 2011) David Harvey had an idea on how to speed up matrix multiplication over $\mathbb{F}_{p^n}$. We spend some time on this in Warwick and developed this idea further (adding fun stuff like Mixed Integer Programming in the process) but did not get around to do much on this project in the mean time (I have explained the idea at the end of my talk in Mykonos, though).

Just now, in a conversation with Richard Parker I was reminded of this dormant project, i.e., the question of how many multiplications i $\mathbb{F}_p$ it takes to do a multiplication in $\mathbb{F}_{p^n}$. In particular, I recalled to have written some code for Sage which gives some upper bound to this answer which is better than Karatsuba.

Well, here’s an interactive demo … gosh, I love the Sage cell server.

# M4RIE: support for finite fields up to degree 16 added

I committed support for finite fields up to degree 16 to M4RIE a few days ago. Furthermore, the dependency on Givaro for constructing finite fields was dropped.

Don’t get me wrong. Givaro is a fine library, much better than what I wrote for M4RIE. However, it is a C++ library while M4RIE is a C library and the little functionality of finite field arithmetic needed in M4RIE was not that hard to add natively. In the past M4RIE relied on Givaro for running tests and benchmarks, the core library was always free of C++. However, as we plan to add support for high-degree polynomials over matrices over$\mathbb{F}_2$, we need the ability to create finite extensions of $\mathbb{F}_2$ on the fly in the core library.

# Report: Workshop on Efficient Linear Algebra for Gröbner Basis Computations

As you may know, today is the last day of the wokshop on Efficient Linear Algebra for Gröbner Basis Computations that Christian Eder, Burcin Eröcal, Alexander Dreyer and I organised.

I have to say that I am quite pleased with how the workshop played out. We planned the whole thing to be hands on: people were strongly encouraged to work on projects, i.e., to write code preferably together, in addition to attending talks. Those who attended a Sage Days workshop in the past, will know what workshop format I am referring to. Continue reading “Report: Workshop on Efficient Linear Algebra for Gröbner Basis Computations”

# Sage 4.8 is out

If you care about the stuff I care about (and why else would you read this blog?) you might get excited about a few changes in Sage.

## Efficient linear algebra for $\mathbb{F}_{2^e}$

The very first non-trivial patch I ever produced for Sage was about interfacing with NTL for dense linear algebra over $\mathbb{F}_{2^e}$ (I was interested in algebraic attacks against AES at the time). Here’s William’s reply:

Your NTL patch worked perfectly for me first try. I tried more benchmarks (on Pentium-M 1.8Ghz).

[…]

This is pretty good; vastly better than what’s was in SAGE by default, and way better than PARI. Note that MAGMA is much faster though (nearly 8 times faster):

[…]

MAGMA uses (1) […] and (2) a totally different algorithm for computing the echelon form. […] As far as I know, the MAGMA method is not implemented anywhere in the open source world But I’d love to be wrong about that… or even remedy that.

Well, that was 2006. Fast forward to the year 2011 and we get the following timings for computing the reduced row echelon form of a 1,000 x 1,000 matrix over $\mathbb{F}_{256}$: Sage 4.7.2 takes 36.53 seconds , NTL 5.4.2 takes 31.06 seconds and Magma 2.15 does it in 0.87 seconds. So essentially, the situation didn’t change at all for the better.

With Sage 4.8 this situation changes dramatically  and we get that Sage performs this computation in 0.08 seconds, that’s 450 times faster than Sage 4.7.2. This is because M4RIE was merged in Sage 4.8. Hence, Sage is now (in some cases by far) fastest system to do linear algebra with dense matrices over $\mathbb{F}_{2^e}$ for $1 \leq e \leq 8$  and usually also for $9 \leq e \leq 10$.

## Efficient linear algebra for $\mathbb{F}_{p}$

One can tell a similar story for $\mathbb{F}_p$ for, say, small to medium sized primes $p$. In Sage 4.7.2 it took 1.12* seconds to multiply two 1,000 x 1,000 matrices over $\mathbb{F}_{251}$ (although you always had the option to call LinBox explicitly which was way faster but took more memory). With Sage 4.8 the same computation takes *0.16 seconds. For comparison, Magma 2.15 takes 0.22 seconds. So here again Sage moved from poor performance to best in class performance between 4.7.2 and 4.8 simply by making proper use of available libraries.

## Viable Alternative yet?

Overall, the story for dense linear algebra in Sage for small finite fields $\mathbb{F}_q$  is as follows.

 $q$ Implementation Comments $2$ M4RI Fastest implementation or equal performance depending on platform $3,5,7 \dots$ LinBox Decent performance, but faster implementations are known in the literature. Also, Magma is a bit faster on my machine. prime < $2^{23}$ LinBox Fastest implementation or equal performance depending on platform. $2^e$ for $2 \leq e \leq 8$ M4RIE Fastest $p^e$ for $p>2$ Generic Very poor performance, but some work is being done.

So, once we fix that last row Sage finally achieves “viable alternative” quality when it comes to dense linear algebra over $\mathbb{F}_{q}$ if $q$ is $q < 2^{16}$.

# Rank-profile revealing Gaussian elimination and the CUP matrix decomposition

by Claude-Pierre Jeannerod, Clément Pernet, Arne Storjohann is now available on the archive. I like this paper a lot and we also referenced it in both the M4RI elimination paper and the M4RIE paper so three cheers that it’s now available.

Abstract: Transforming a matrix over a field to echelon form, or decomposing the matrix as a product of structured matrices that reveal the rank profile, is a fundamental building block of computational exact linear algebra. This paper surveys the well known variations of such decompositions and transformations that have been proposed in the literature. We present an algorithm to compute the CUP decomposition of a matrix, adapted from the LSP algorithm of Ibarra, Moran and Hui (1982), and show reductions from the other most common Gaussian elimination based matrix transformations and decompositions to the CUP decomposition. We discuss the advantages of the CUP algorithm over other existing algorithms by studying time and space complexities: the asymptotic time complexity is rank sensitive, and comparing the constants of the leading terms, the algorithms for computing matrix invariants based on the CUP decomposition are always at least as good except in one case. We also show that the CUP algorithm, as well as the computation of other invariants such as transformation to reduced column echelon form using the CUP algorithm, all work in place, allowing for example to compute the inverse of a matrix on the same storage as the input matrix.

http://arxiv.org/abs/1112.5717

# Sage/FLINT Days aka Sage Days 35

I am writing this while waiting for my taxi to leave Sage Days 35. Although, I didn’t get much actual coding done, it was great fun and very useful. I met a lot of old friend, new faces and managed to put faces to e-mail addresses.

In terms of coding projects, first, I tried to speed up linear algebra mod p where p is a 32 or 64 bit prime. But it turns out that any trick I could think of could not improve on Frederik’s code. So that didn’t lead anywhere but I allowed me to read some code of FLINT2 (very readable) and admire how carefully it is written.

My other two projects both involved evaluate–pointwise-multiply–interpolate algorithms for fast matrix-matrix products over finite extension fields or for matrices with polynomial coefficients (over prime fields).  After my talk on M4RI(E) David Harvey worked out how to improve multiplication over $\mathbb{F}_{2^6}$ from 17 multiplications over $\mathbb{F}_2$ to 15, which then lead to a general approach for $\mathbb{F}_{2^m}$ with composite $m$. Much of it remains to be implemented (efficiently), but the $\mathbb{F}_{2^6}$ example indeed shows a 10% speed-up as expected. The code is not clean yet, uses way too much memory and doesn’t deal with the more advanced finite field stuff appropriately. It should end up in M4RIE eventually though.

I also contributed a bit to #12177 which is about a “prime slice” implementation of matrices over $\mathbb{F}_{p^k}$. The idea is essentially to represent  these matrices as polynomials with matrix coefficients and to use fast polynomial multiplication algorithms for these polynomials. It turns out, this works very well even for small finite fields. Burcin Eröcal did all the coding, I only helped with some discussions. We need to polish the code a lot to be usable, so if you like matrices over $\mathbb{F}_{p^k}$ head over to #12177 and help out.

# The M4RIE library for dense linear algebra over small fields with even characteristic

I finally uploaded a pre-print of the M4RIE paper to the arXiv:

Abstract: In this work, we present the M4RIE library which implements efficient algorithms for linear algebra with dense matrices over $\mathbb{F}_{2^e}$ for $2 \leq e \leq 10$. As the name of the library indicates, it makes heavy use of the M4RI library both directly (i.e., by calling it) and indirectly (i.e., by using its concepts). We provide an open-source GPLv2+ C library for efficient linear algebra over $\mathbb{F}_{2^e}$ for e small. In this library we implemented an idea due to Bradshaw and Boothby which reduces matrix multiplication over $\mathbb{F}_{p^k}$ to a series of matrix multiplications over $\mathbb{F}_p$. Furthermore, we propose a caching technique – Newton-John tables – to avoid finite field multiplications which is inspired by Kronrod’s method (“M4RM”) for matrix multiplication over $\mathbb{F}_2$. Using these two techniques we provide asymptotically fast triangular solving with matrices (TRSM) and PLE-based Gaussian elimination. As a result, we are able to significantly improve upon the state of the art in dense linear algebra over $\mathbb{F}_{2^e}$ with $2 \leq e \leq 10$.

# Sage/FLINT Days in Warwick 17 – 22nd December 2011

“A Sage Days workshop around the theme of Algorithms in Number Theory and FLINT.”