M4RIE

My coding sprint project here at Sage Days 24 was/is to write a new library for dense linear algebra over small extensions of GF(2).

The main idea is to represent matrices over GF(2^e) internally as M4RI matrices over GF(2) in a row-major flat representation. This makes it possible to re-use many M4RI functions, such as addition of rows and matrices, stacking, augmenting, etc.
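To make the layout concrete, here is a tiny Python sketch of such a packing (the names are mine for illustration, not M4RIE's actual API): each entry of GF(2^e), encoded as e bits, occupies e consecutive GF(2) columns of the same row, so an m x n matrix over GF(2^e) becomes an m x (e*n) matrix over GF(2).

    # Illustrative only -- invented names, not M4RIE's actual API.
    def pack_gf2e_matrix(A, e):
        """Pack an m x n matrix over GF(2^e) (entries are integers in
        [0, 2^e) encoding polynomials over GF(2)) into an m x (e*n)
        list-of-lists matrix over GF(2), row-major and flat."""
        return [[(a >> k) & 1 for a in row for k in range(e)] for row in A]

    def unpack_gf2e_matrix(B, e):
        """Inverse of pack_gf2e_matrix."""
        return [[sum(row[j*e + k] << k for k in range(e))
                 for j in range(len(row) // e)] for row in B]

    A = [[0b11, 0b01], [0b10, 0b00]]   # 2 x 2 matrix over GF(2^2)
    B = pack_gf2e_matrix(A, 2)         # 2 x 4 matrix over GF(2)
    assert unpack_gf2e_matrix(B, 2) == A

Note that with this layout a row addition over GF(2^e) is exactly the XOR of the two packed GF(2) rows, which is why M4RI's row operations can be reused directly.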

For the elimination and multiplication base cases we use a simple trick inspired by M4RI: assume we have found a pivot; we need to rescale it and then add the right multiple of it to every row with a nonzero entry in the same column. For an n x n matrix we expect to perform n^2 multiplications and n^2 additions per pivot. However, if the field is small, we can precompute all possible multiples of the pivot row and store them in a table. Then we use this table as a lookup to perform the elimination of the other rows. This costs 2^e * n multiplications and n^2 additions per pivot. Note that M4RI has efficient functions to perform the lookup-and-elimination part.
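Here is a minimal Python sketch of that trick (illustrative names and a schoolbook GF(2^e) multiplication, not M4RIE's actual code), for GF(2^2) with modulus x^2 + x + 1; the pivot row is assumed to be already rescaled so that its pivot entry is 1:

    E, MODULUS = 2, 0b111   # GF(2^2) with modulus x^2 + x + 1

    def gf_mul(a, b):
        """Schoolbook carry-less multiplication modulo the field polynomial."""
        r = 0
        while b:
            if b & 1:
                r ^= a
            a <<= 1
            if a >> E:          # degree reached E, reduce once
                a ^= MODULUS
            b >>= 1
        return r

    def eliminate_column(rows, pivot_idx, col):
        """Clear column col in all non-pivot rows; rows[pivot_idx][col]
        is assumed to be 1 (the pivot row was rescaled already)."""
        # Precompute all 2^E multiples of the pivot row once:
        # 2^E * n multiplications instead of n per eliminated row.
        table = [[gf_mul(x, c) for c in rows[pivot_idx]] for x in range(2**E)]
        for i, row in enumerate(rows):
            if i != pivot_idx and row[col]:
                multiple = table[row[col]]                        # one lookup
                rows[i] = [a ^ b for a, b in zip(row, multiple)]  # one row add

Since addition in GF(2^e) is XOR on the bit-encoded entries, the row addition in the last line is exactly the operation M4RI already implements efficiently.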

We haven’t implemented any asymptotically fast methods yet, but I hope to finish Strassen multiplication this week. Asymptotically fast elimination requires slightly more infrastructure (TRSM, taking care of offsets which are not 0 mod 64, etc.). If anybody is interested in helping out, that would be greatly appreciated. Also, I guess one should try M4RI (the algorithm) for GF(2^2).
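For reference, the multiplication half is the textbook Strassen recursion; over GF(2) both addition and subtraction are XOR, which makes the seven-product scheme particularly simple. The following sketch works on plain Python lists of 0/1 entries and is only meant to show the recursion, not M4RIE's (word-packed) implementation:

    def mat_add(A, B):
        """Entry-wise addition over GF(2), i.e. XOR; doubles as subtraction."""
        return [[a ^ b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

    def naive_mul(A, B):
        n = len(A)
        return [[sum(A[i][k] & B[k][j] for k in range(n)) & 1
                 for j in range(n)] for i in range(n)]

    def strassen(A, B, cutoff=64):
        n = len(A)
        if n <= cutoff or n % 2:   # base case: cubic schoolbook product
            return naive_mul(A, B)
        h = n // 2
        quad = lambda M, r, c: [row[c:c+h] for row in M[r:r+h]]
        A11, A12, A21, A22 = (quad(A, 0, 0), quad(A, 0, h),
                              quad(A, h, 0), quad(A, h, h))
        B11, B12, B21, B22 = (quad(B, 0, 0), quad(B, 0, h),
                              quad(B, h, 0), quad(B, h, h))
        M1 = strassen(mat_add(A11, A22), mat_add(B11, B22), cutoff)
        M2 = strassen(mat_add(A21, A22), B11, cutoff)
        M3 = strassen(A11, mat_add(B12, B22), cutoff)
        M4 = strassen(A22, mat_add(B21, B11), cutoff)
        M5 = strassen(mat_add(A11, A12), B22, cutoff)
        M6 = strassen(mat_add(A21, A11), mat_add(B11, B12), cutoff)
        M7 = strassen(mat_add(A12, A22), mat_add(B21, B22), cutoff)
        C11 = mat_add(mat_add(M1, M4), mat_add(M5, M7))
        C12 = mat_add(M3, M5)
        C21 = mat_add(M2, M4)
        C22 = mat_add(mat_add(M1, M2), mat_add(M3, M6))
        return ([r1 + r2 for r1, r2 in zip(C11, C12)] +
                [r1 + r2 for r1, r2 in zip(C21, C22)])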

Some first results

In these plots, red means that Magma is faster by a factor of e^|v| and blue means that our code is faster by a factor of e^|v|, where v is the value encoded by the colour. More details can be found on the SD24 wiki page.

To get a sense for absolute numbers, here are some preliminary benchmarks (times in seconds) for computing row echelon forms of dense random n x n matrices over GF(2^4):

|| n    || Sage 4.5 || NTL/2  || Magma || M4RIE (old) || M4RIE (new) ||
|| 1000 ||    49.49 ||  18.84 || 0.090 || 0.097       || 0.046       ||
|| 2000 ||   429.05 || 149.11 || 0.510 || 0.529       || 0.28        ||
|| 3000 ||  1494.33 || 526.57 || 1.640 || 2.315       || 1.00        ||

Bottom line: it is much, much faster (more than 1,000x) than what we have in Sage right now (which is the generic implementation). Also note that switching to NTL would not improve things much.

I should point out that the strategy for multiplication in Tom and Robert’s paper Bitslicing and the Method of Four Russians Over Larger Finite Fields is likely to be better. Judging from the timings in that paper, we are about a factor of two behind them. I plan to implement/port their very cool trick for finite extension fields at some point in the future. As far as I understand it, the trick is limited to multiplication, thus it might still make sense to consider my representation for the elimination base case etc.

Cool New Stuff in Recent Sage Releases

Now that Sage 4.3 has been released, maybe it’s time to point out some of the cool recent developments. Of course, the following list is very, very biased.

  • libSingular functions interface. We now have some code which makes it possible to call any function available in Singular using the libSingular C wrapper directly, like this:
    sage: P = PolynomialRing(GF(127),10,'x')
    sage: I = Ideal(P.random_element() for _ in range(3000))
    sage: from sage.libs.singular.function import singular_function, lib
    sage: groebner = singular_function('groebner')
    sage: %time groebner(I)
    CPU times: user 0.07 s, sys: 0.00 s, total: 0.08 s
    Wall time: 0.08 s
    [1]

    For comparison, the Singular pexpect interface needs almost two seconds for the same task (due to string parsing on both ends, IPC, etc.):

    sage: %time I.groebner_basis()
    CPU times: user 0.96 s, sys: 0.24 s, total: 1.21 s
    Wall time: 1.92 s
    [1]

    Michael Brickenstein wrote a lot of this code, so three cheers to him!

  • linear algebra over F_2 got better. For one, we implemented vectors over F_2 on top of M4RI matrices (cf. #7715), which makes them much faster. Furthermore, we now call dedicated M4RI functions instead of the generic slow functions available for all fields (cf. #3684). Finally, asymptotically fast matrix factorisation got faster again. However, we still didn’t switch to it as the default implementation because of the slow-down for sparse-ish matrices; use the algorithm='pluq' option to force the new implementation (see the example after this list).
  • PolyBoRi was updated to version 0.6.3 and the interface received a considerable update too during a visit to Kaiserslautern. Please, please, please report any regressions either to me, to [sage-support], or to [polybori-discuss]. (Update: this didn’t make it into the release, cf. #7271.)
  • Linear Programming is now available in Sage (though it requires installing at least one optional package). Still, this opens up quite a few possibilities (cf. Nathann Cohen’s tutorial).
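For the record, forcing the new factorisation-based elimination mentioned above should look something like this (the exact timing is machine-dependent, so no output is shown):

    sage: A = random_matrix(GF(2), 4096, 4096)
    sage: A.echelonize(algorithm='pluq')   # in place; omit the option for the default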

M4RI API Change and Big Matrices

Finally, I found some time to work on M4RI again: I changed the internal representation of matrices to allow more than one malloc() call per matrix, i.e. each matrix is split into blocks of some maximal size. This makes it possible to deal with much bigger matrices under Linux, because the kernel often won’t give you 8GB mapped to consecutive addresses, but it will happily give you 1GB eight times. This limitation has bugged me for quite some time, since it also limited the kind of systems I could solve using PolyBoRi with M4RI enabled. The result is available at http://bitbucket.org/malb/m4ri/ but bear in mind that the API changed quite a bit on the way (e.g., I renamed packedmatrix to mzd_t).
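The idea is simple enough to sketch in a few lines of Python (the names are invented here, and M4RI’s real mzd_t packs rows into 64-bit words, so treat this purely as an illustration): rows live in several fixed-size allocations rather than one contiguous buffer, and row access first maps the global row index to a block.

    MAX_BLOCK_ROWS = 1 << 20   # cap on rows per allocation

    class BlockedMatrix:
        """Rows spread over several allocations, so no single malloc()
        needs to cover the whole matrix. Each row is one int bitmask."""
        def __init__(self, nrows, ncols):
            self.nrows, self.ncols = nrows, ncols
            self.blocks, left = [], nrows
            while left > 0:
                k = min(left, MAX_BLOCK_ROWS)
                self.blocks.append([0] * k)
                left -= k

        def row(self, i):
            # Map the global row index to (block, offset within block).
            return self.blocks[i // MAX_BLOCK_ROWS][i % MAX_BLOCK_ROWS]

        def set_row(self, i, value):
            self.blocks[i // MAX_BLOCK_ROWS][i % MAX_BLOCK_ROWS] = value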

64-bit Ubuntu, Xeon X7400 @ 2.66GHz:

|| Matrix Dimension  || Memory (expected) || M4RI/M4RI (64-bit, 1 core) || M4RI/PLUQ (64-bit, 1 core) ||
|| 100,000 x 100,000 || > 1.16 GB         || 1078.72 s                  || 429.32 s                   ||
|| 200,000 x 200,000 || > 4.65 GB         || 2298.30 s                  ||                            ||
|| 256,000 x 256,000 || > 7.63 GB         || 8979.33 s                  || 3709.25 s                  ||

The above table gives the time to compute the rank of random matrices of the given dimension using the given algorithms on http://geom.math.washington.edu.