Dense matrices mod n via FFLAS/FFPACK

During last weeks Sage Days 32 I worked on finishing a patch by Clément Pernet and Burcin Eröcal which switches Sage’s native matrices mod n (for n<2^{23}) to FFLAS/FFPACK. The purpose of this exercise was to make operations with dense matrices mod n both faster and more memory efficient (currently, Sage does a lot of copying around when calling any LinBox function). Well, we almost succeeded.

Without #4260 on my 2.6Ghz i7:

sage: A = random_matrix(GF(97),2000,2000)
sage: %time A*A
CPU times: user 9.66 s, sys: 0.12 s, total: 9.77 s
Wall time: 9.82 s

With #4260 on my 2.6Ghz i7:

sage: A = random_matrix(GF(97),2000,2000)
sage: %time A*A
CPU times: user 1.32 s, sys: 0.00 s, total: 1.32 s
Wall time: 1.35 s

Magma on my 2.6Ghz i7:

> A:=RandomMatrix(GF(97),2000,2000);
> time C:=A*A;
Time: 1.560

So far, so good. Also, currently all doctests pass in Sage with the new patch applied. However, before it can go in, we’ll have to take care of a performance regression introduced with #4260:

Without patch:

sage: MS = MatrixSpace(GF(101),2000,2000)
sage: %time A = MS.random_element()
CPU times: user 0.17 s, sys: 0.03 s, total: 0.20 s
Wall time: 0.20 s
sage: %time A.echelonize()
CPU times: user 1.22 s, sys: 0.06 s, total: 1.28 s
Wall time: 1.38 s

With patch:

sage: MS = MatrixSpace(GF(101),2000,2000)
sage: %time A = MS.random_element()
CPU times: user 0.19 s, sys: 0.03 s, total: 0.22 s
Wall time: 0.22 s
sage: %time A.echelonize()
CPU times: user 1.87 s, sys: 0.00 s, total: 1.87 s
Wall time: 1.92 s

That is, echelon forms actually got slower! In other words, this code:

Modular<Element> F(modulus);

size_t * P=new size_t[nrows];
size_t * Q=new size_t[ncols];
size_t r = FFPACK::ReducedRowEchelonForm(F, nrows, ncols, matrix, ncols, P,Q);

for (size_t i=0; i<nrows;++i){
  for (size_t j=0; j<r; ++j){
    *(matrix+i*ncols+j) = 0;
  }
  if (i<r)
    *(matrix + i*(ncols+1)) = 1;
}
FFPACK::applyP (F, FFLAS::FflasRight, FFLAS::FflasNoTrans, nrows, 0,r, matrix, ncols, Q);
delete[] P;
delete[] Q;
return r;

is slower than this code:

ModInt F((double)modulus);
EchelonFormDomain<ModInt> EF(F);
BlasMatrix A( nrows, ncols);
BlasMatrix E( nrows, ncols);

mod_int* row;
for (size_t i=0; i &lt; nrows; i++) {
  row = matrix[i];
  for (size_t j=0; j &lt; ncols; j++)
    A.setEntry(i, j, (double)row[j]);
}
int rank = EF.rowReducedEchelon(E, A);
for (size_t i=0; i &lt; nrows; i++) {
  row = matrix[i];
  for (size_t j=0; j &lt; ncols; j++)
    row[j] = (mod_int)E.getEntry(i, j);
}
return rank;

… so close.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s