Meataxe64 is a large software development project to produce programs for working at high performance with large matrices over finite fields.
At the lowest level, the aim is to work modulo primes (only), using grease (much like “four Russians”) to reduce the amount of work, to use vectorized code in x86 assembler (SSE/AVX) to do the basic operations and to have short rows and few columns so that matrices fit suitably into the various levels of cache. The objective is to run as fast as possible with as little use of real-memory bandwidth as possible.
At a middle level, the aim is to use linear functions to work with extension fields, and to chop the matrices up so that the lowest level can operate.
At a higher level, the aim is to make effective use of a multi-core environment, building on the advantage that the cache-friendly lower level provides to ensure that many cores can be used effectively. The thread-farm looks after the messy signals, locks and thread handling.
It is hoped soon that a layer will be added to take a matrix that fits on disk but not in memory to extend the possible scale of operations further.
Finally I dream that a fault-tolerant distributed system can be build on top of this to handle matrices of gargantuan proportions, but this lies some considerable way into the future.
Go read the development blog, I certainly learned a lot from Richard Parker whenever we talked.
Enrico made the code (if the link doesn’t work search for his name on Research Gate) for his LU factorisation code over GF(2) available online under the GPL. This is an implement of the algorithm described by Anna and him in Fast matrix decomposition in F2 and for which they give timings in that paper (discussed a bit here). I had to fix some includes to make it compile on my box, but nothing major. I can also confirm the impressive performance of their software (I ran testRankComputation).
Continue reading “Enrico Bertolazzi’s linear algebra code over GF(2) available”
Two days ago I wrote about LELA’s implementation of Gaussian elimination for Gröbner basis computations over . Yesterday, I implemented LELA’s algorithm (which is from Faugere & Lachartre paper) in M4RI. Continue reading “Linear Algebra for Gröbner Bases over GF(2): M4RI”
As you may know, today is the last day of the wokshop on Efficient Linear Algebra for Gröbner Basis Computations that Christian Eder, Burcin Eröcal, Alexander Dreyer and I organised.
I have to say that I am quite pleased with how the workshop played out. We planned the whole thing to be hands on: people were strongly encouraged to work on projects, i.e., to write code preferably together, in addition to attending talks. Those who attended a Sage Days workshop in the past, will know what workshop format I am referring to. Continue reading “Report: Workshop on Efficient Linear Algebra for Gröbner Basis Computations”
by Claude-Pierre Jeannerod, Clément Pernet, Arne Storjohann is now available on the archive. I like this paper a lot and we also referenced it in both the M4RI elimination paper and the M4RIE paper so three cheers that it’s now available.
Abstract: Transforming a matrix over a field to echelon form, or decomposing the matrix as a product of structured matrices that reveal the rank profile, is a fundamental building block of computational exact linear algebra. This paper surveys the well known variations of such decompositions and transformations that have been proposed in the literature. We present an algorithm to compute the CUP decomposition of a matrix, adapted from the LSP algorithm of Ibarra, Moran and Hui (1982), and show reductions from the other most common Gaussian elimination based matrix transformations and decompositions to the CUP decomposition. We discuss the advantages of the CUP algorithm over other existing algorithms by studying time and space complexities: the asymptotic time complexity is rank sensitive, and comparing the constants of the leading terms, the algorithms for computing matrix invariants based on the CUP decomposition are always at least as good except in one case. We also show that the CUP algorithm, as well as the computation of other invariants such as transformation to reduced column echelon form using the CUP algorithm, all work in place, allowing for example to compute the inverse of a matrix on the same storage as the input matrix.
Finally, we finished our paper about Gaussian elimination in the M4RI library.
Abstract: In this work we describe an efficient implementation of a hierarchy of algorithms for Gaussian elimination upon dense matrices over the field with two elements (). We discuss both well-known and new algorithms as well as our implementations in the M4RI library, which has been adopted into Sage. The focus of our discussion is a block iterative algorithm for PLE decomposition which is inspired by the M4RI algorithm. The implementation presented in this work provides considerable performance gains in practice when compared to the previously fastest implementation. We provide performance figures on x86_64 CPUs to demonstrate the alacrity of our approach.
The sources of this document are available on bitbucket. But I also compiled a PDF.
Update: arXiv link.
I guess one advantage of not finishing a paper that is “done” is that one can (a) change the name of the algorithm over and over again and (b) improve said algorithm. Continue reading “Block-iterative PLE decomposition”