# BKW: Update

We have updated our pre-print titled “On the Complexity of the BKW Algorithm on LWE” on ePrint.

There are two main changes and the reasons why I am mentioning this update here.

1. We included a more thorough comparison with other approaches, in particular, with lattice reduction (reducing LWE to SIS). To our surprise, BKW is quite competitive even in relatively modest dimensions. For Regev’s and Lindner-Peikert’s parameter sets (as interpreted here) we get that BKW is at least as fast as BKZ starting in dimension $n \approx 250$, which I find very low (see Table 4 on page 19).
2. We also provide an alternative approximating for the running time of BKZ. The standard estimate due to Lindner-Peikert is $\log_2 T_{sec} = \log_2 1.8/\delta_0 - 110$ where $\delta_0$ is the targeted root hermit factor. Interpolating estimates from the BKZ 2.0 simulator and reflecting on the doubly exponential running time of BKZ in the blocksize $\beta$ we found: $\log_2 T_{sec} = \log_2 0.009/\delta^2_0 - 27$. However, since this might be controversial, we include estimates for both models.

# M4RI 20121224

I have just pushed the button to release M4RI 20121224. The main feature of this release is a considerable performance improvement. It all started with Fast matrix decomposition in F2 by Enrico Bertolazzi and Anna Rimoldi showing up on the arXiv. Here’s the abstract

In this work an efficient algorithm to perform a block decomposition (and so to compute the rank) of large dense rectangular matrices with entries in F2 is presented. Depending on the way the matrix is stored, the operations acting on rows or block of consecutive columns (stored as one integer) should be preferred. In this paper, an algorithm that completely avoids the column permutations is given. In particular, a block decomposition is presented and its running times are compared with the ones adopted into SAGE.

… and that comparison made M4RI (which realises this functionality in Sage) look pretty bad. I did’t (and still don’t) share the implicit assumption that avoiding column swaps was the key ingredient in making this code so much faster than ours. I assume the impressive timings are due to a very efficient base case implementation. Anyway, we sat down  and looked for performance bottlenecks the result of which is 20121224. I actually have no idea whether we caught up to the code described in Enrico’s and Anna’s pre-print as they did not publish their sources.

Still, the performance improvements over 20120613 were worth the trouble. Below two plots of the (normalised) leading constants giving the leading constants for multiplication and elimination respectively (more plots on imgur) That is, it plots the running time divided by $n^{2.807} \cdot 10^9$. In theory these plots should all have slope 0.

Multiplication on Intel Core i7

PLE on Intel Core i7

Finally, here’s the plot for Fast matrix decomposition in F2 which starts very small but has a rather large slope. That’s why I concluded that the performance stems from a very efficient base case. I should get in touch with Enrio and Anna about this.

# Rank-profile revealing Gaussian elimination and the CUP matrix decomposition

by Claude-Pierre Jeannerod, Clément Pernet, Arne Storjohann is now available on the archive. I like this paper a lot and we also referenced it in both the M4RI elimination paper and the M4RIE paper so three cheers that it’s now available.

Abstract: Transforming a matrix over a field to echelon form, or decomposing the matrix as a product of structured matrices that reveal the rank profile, is a fundamental building block of computational exact linear algebra. This paper surveys the well known variations of such decompositions and transformations that have been proposed in the literature. We present an algorithm to compute the CUP decomposition of a matrix, adapted from the LSP algorithm of Ibarra, Moran and Hui (1982), and show reductions from the other most common Gaussian elimination based matrix transformations and decompositions to the CUP decomposition. We discuss the advantages of the CUP algorithm over other existing algorithms by studying time and space complexities: the asymptotic time complexity is rank sensitive, and comparing the constants of the leading terms, the algorithms for computing matrix invariants based on the CUP decomposition are always at least as good except in one case. We also show that the CUP algorithm, as well as the computation of other invariants such as transformation to reduced column echelon form using the CUP algorithm, all work in place, allowing for example to compute the inverse of a matrix on the same storage as the input matrix.

http://arxiv.org/abs/1112.5717

# The M4RIE library for dense linear algebra over small fields with even characteristic

I finally uploaded a pre-print of the M4RIE paper to the arXiv:

Abstract: In this work, we present the M4RIE library which implements efficient algorithms for linear algebra with dense matrices over $\mathbb{F}_{2^e}$ for $2 \leq e \leq 10$. As the name of the library indicates, it makes heavy use of the M4RI library both directly (i.e., by calling it) and indirectly (i.e., by using its concepts). We provide an open-source GPLv2+ C library for efficient linear algebra over $\mathbb{F}_{2^e}$ for e small. In this library we implemented an idea due to Bradshaw and Boothby which reduces matrix multiplication over $\mathbb{F}_{p^k}$ to a series of matrix multiplications over $\mathbb{F}_p$. Furthermore, we propose a caching technique – Newton-John tables – to avoid finite field multiplications which is inspired by Kronrod’s method (“M4RM”) for matrix multiplication over $\mathbb{F}_2$. Using these two techniques we provide asymptotically fast triangular solving with matrices (TRSM) and PLE-based Gaussian elimination. As a result, we are able to significantly improve upon the state of the art in dense linear algebra over $\mathbb{F}_{2^e}$ with $2 \leq e \leq 10$.

# Efficient dense Gaussian elimination over the field with two elements

Finally, we finished our paper about Gaussian elimination in the M4RI library.

Abstract: In this work we describe an efficient implementation of a hierarchy of algorithms for Gaussian elimination upon dense matrices over the field with two elements ($\mathbb{F}_2$). We discuss both well-known and new algorithms as well as our implementations in the M4RI library, which has been adopted into Sage. The focus of our discussion is a block iterative algorithm for PLE decomposition which is inspired by the M4RI algorithm. The implementation presented in this work provides considerable performance gains in practice when compared to the previously fastest implementation. We provide performance figures on x86_64 CPUs to demonstrate the alacrity of our approach.

The sources of this document are available on bitbucket. But I also compiled a PDF.