During last weeks Sage Days 32 I worked on finishing a patch by Clément Pernet and Burcin Eröcal which switches Sage’s native matrices mod n (for ) to FFLAS/FFPACK. The purpose of this exercise was to make operations with dense matrices mod n both faster and more memory efficient (currently, Sage does a lot of copying around when calling any LinBox function). Well, we almost succeeded.
Without #4260 on my 2.6Ghz i7:
sage: A = random_matrix(GF(97),2000,2000) sage: %time A*A CPU times: user 9.66 s, sys: 0.12 s, total: 9.77 s Wall time: 9.82 s
With #4260 on my 2.6Ghz i7:
sage: A = random_matrix(GF(97),2000,2000) sage: %time A*A CPU times: user 1.32 s, sys: 0.00 s, total: 1.32 s Wall time: 1.35 s
Magma on my 2.6Ghz i7:
> A:=RandomMatrix(GF(97),2000,2000); > time C:=A*A; Time: 1.560
So far, so good. Also, currently all doctests pass in Sage with the new patch applied. However, before it can go in, we’ll have to take care of a performance regression introduced with #4260:
Without patch:
sage: MS = MatrixSpace(GF(101),2000,2000) sage: %time A = MS.random_element() CPU times: user 0.17 s, sys: 0.03 s, total: 0.20 s Wall time: 0.20 s sage: %time A.echelonize() CPU times: user 1.22 s, sys: 0.06 s, total: 1.28 s Wall time: 1.38 s
With patch:
sage: MS = MatrixSpace(GF(101),2000,2000) sage: %time A = MS.random_element() CPU times: user 0.19 s, sys: 0.03 s, total: 0.22 s Wall time: 0.22 s sage: %time A.echelonize() CPU times: user 1.87 s, sys: 0.00 s, total: 1.87 s Wall time: 1.92 s
That is, echelon forms actually got slower! In other words, this code:
Modular<Element> F(modulus); size_t * P=new size_t[nrows]; size_t * Q=new size_t[ncols]; size_t r = FFPACK::ReducedRowEchelonForm(F, nrows, ncols, matrix, ncols, P,Q); for (size_t i=0; i<nrows;++i){ for (size_t j=0; j<r; ++j){ *(matrix+i*ncols+j) = 0; } if (i<r) *(matrix + i*(ncols+1)) = 1; } FFPACK::applyP (F, FFLAS::FflasRight, FFLAS::FflasNoTrans, nrows, 0,r, matrix, ncols, Q); delete[] P; delete[] Q; return r;
is slower than this code:
ModInt F((double)modulus); EchelonFormDomain<ModInt> EF(F); BlasMatrix A( nrows, ncols); BlasMatrix E( nrows, ncols); mod_int* row; for (size_t i=0; i < nrows; i++) { row = matrix[i]; for (size_t j=0; j < ncols; j++) A.setEntry(i, j, (double)row[j]); } int rank = EF.rowReducedEchelon(E, A); for (size_t i=0; i < nrows; i++) { row = matrix[i]; for (size_t j=0; j < ncols; j++) row[j] = (mod_int)E.getEntry(i, j); } return rank;
… so close.