I finally uploaded a pre-print of the M4RIE paper to the arXiv:
Abstract: In this work, we present the M4RIE library which implements efficient algorithms for linear algebra with dense matrices over $\mathbb{F}_{2^e}$ for $2 \leq e \leq 10$. As the name of the library indicates, it makes heavy use of the M4RI library both directly (i.e., by calling it) and indirectly (i.e., by using its concepts). We provide an open-source GPLv2+ C library for efficient linear algebra over $\mathbb{F}_{2^e}$ for $e$ small. In this library we implemented an idea due to Bradshaw and Boothby which reduces matrix multiplication over $\mathbb{F}_{p^k}$ to a series of matrix multiplications over $\mathbb{F}_p$. Furthermore, we propose a caching technique, Newton-John tables, to avoid finite field multiplications; it is inspired by Kronrod’s method (“M4RM”) for matrix multiplication over $\mathbb{F}_2$. Using these two techniques we provide asymptotically fast triangular solving with matrices (TRSM) and PLE-based Gaussian elimination. As a result, we are able to significantly improve upon the state of the art in dense linear algebra over $\mathbb{F}_{2^e}$ with $2 \leq e \leq 10$.
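To give a feel for the reduction to matrix multiplications over the prime field, here is a toy sketch for the smallest case, $e = 2$. It is not M4RIE code (the library is written in C and works on packed machine words); the function names, the encoding of field elements as integers 0..3, and the choice of the modulus $x^2 + x + 1$ are all assumptions made for this illustration. A matrix over $\mathbb{F}_4$ is split into two bit-slices $A = A_0 + xA_1$, and the product is assembled from three products over $\mathbb{F}_2$ using a Karatsuba-style formula.

```python
import random


def gf4_mul(a, b):
    """Multiply two elements of GF(4) = GF(2)[x]/(x^2 + x + 1).

    Elements are encoded as ints 0..3: bit 0 is the constant term,
    bit 1 is the coefficient of x (an encoding assumed for this sketch).
    """
    r = 0
    if b & 1:
        r ^= a
    if b & 2:
        r ^= a << 1
    if r & 4:  # reduce using x^2 = x + 1
        r ^= 0b111
    return r


def mul_gf4_naive(A, B):
    """Reference product: one GF(4) multiplication per inner-loop step."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                C[i][j] ^= gf4_mul(A[i][k], B[k][j])
    return C


def mul_gf2(A, B):
    """Matrix product over GF(2); entries are 0 or 1."""
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] & B[k][j] for k in range(m)) & 1
             for j in range(p)] for i in range(n)]


def add_gf2(A, B):
    """Matrix sum over GF(2), i.e. entrywise XOR."""
    return [[a ^ b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def slice_bit(A, bit):
    """Extract one coefficient slice of a GF(4) matrix as a GF(2) matrix."""
    return [[(a >> bit) & 1 for a in row] for row in A]


def mul_gf4_sliced(A, B):
    """GF(4) product computed with three GF(2) matrix products."""
    A0, A1 = slice_bit(A, 0), slice_bit(A, 1)
    B0, B1 = slice_bit(B, 0), slice_bit(B, 1)
    T0 = mul_gf2(A0, B0)
    T1 = mul_gf2(A1, B1)
    T2 = mul_gf2(add_gf2(A0, A1), add_gf2(B0, B1))
    # (A0 + x A1)(B0 + x B1) with x^2 = x + 1:
    #   constant term:    A0 B0 + A1 B1                = T0 + T1
    #   coefficient of x: A0 B1 + A1 B0 + A1 B1        = T2 + T0
    C0 = add_gf2(T0, T1)
    C1 = add_gf2(T2, T0)
    return [[c0 | (c1 << 1) for c0, c1 in zip(r0, r1)]
            for r0, r1 in zip(C0, C1)]


if __name__ == "__main__":
    rng = random.Random(0)
    A = [[rng.randrange(4) for _ in range(5)] for _ in range(4)]
    B = [[rng.randrange(4) for _ in range(3)] for _ in range(5)]
    print(mul_gf4_sliced(A, B) == mul_gf4_naive(A, B))
```

The point of the sliced version is that all field arithmetic disappears into products over $\mathbb{F}_2$, which is exactly where a fast library for that field pays off. For larger $e$ the same idea applies with more slices and a different reduction polynomial; the formulas above are specific to this toy case.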