The M4RIE library for dense linear algebra over small fields with even characteristic

I finally uploaded a pre-print of the M4RIE paper to the arXiv:

Abstract: In this work, we present the M4RIE library which implements efficient algorithms for linear algebra with dense matrices over \mathbb{F}_{2^e} for 2 \leq e \leq 10. As the name of the library indicates, it makes heavy use of the M4RI library both directly (i.e., by calling it) and indirectly (i.e., by using its concepts). We provide an open-source GPLv2+ C library for efficient linear algebra over \mathbb{F}_{2^e} for e small. In this library we implemented an idea due to Bradshaw and Boothby which reduces matrix multiplication over \mathbb{F}_{p^k} to a series of matrix multiplications over \mathbb{F}_p. Furthermore, we propose a caching technique – Newton-John tables – to avoid finite field multiplications which is inspired by Kronrod’s method (“M4RM”) for matrix multiplication over \mathbb{F}_2. Using these two techniques we provide asymptotically fast triangular solving with matrices (TRSM) and PLE-based Gaussian elimination. As a result, we are able to significantly improve upon the state of the art in dense linear algebra over \mathbb{F}_{2^e} with 2 \leq e \leq 10.

“Solving Circuit Optimisation Problems in Cryptography and Cryptanalysis”

Recently, Nicolas Courtois sent me a revised version of my PRESENT bitslice implementation which improves the representation of the S-Box and hence the performance considerably. A paper describing the techniques used to arrive at this new S-box representation is now available on eprint:

Solving Circuit Optimisation Problems in Cryptography and Cryptanalysis

Nicolas T. Courtois, Daniel Hulme and Theodosis Mourouzis

Abstract: One of the hardest problems in computer science is the problem of gate-eficient implementation. Such optimizations are particularly important in industrial hardware implementations of standard cryptographic algorithms. In this paper we focus on optimizing some small circuits such as S-boxes in cryptographic algorithms. We consider the notion of Multiplicative Complexity, a new important notion of complexity introduced in 2008 by Boyar and Peralta and applied to find interesting optimizations for the S-box of the AES cipher. We applied this methodology to produce a compact implementation of several ciphers. In this short paper we report our results on PRESENT and GOST, two block ciphers known for their exceptionally low hardware cost. This kind of representation seems to be very promising in implementations aiming at preventing side channel attacks on cryptographic chips such as DPA. More importantly, we postulate that this kind of minimality is also an important and interesting tool in cryptanalysis.

Bitslicing and the Method of the Four Russians over Larger Finite Fields

Tom Boothby’s and Robert Bradshaw’s paper on the “Method of the Four Russian” multiplication algorithm over F_3, F_5, F_7, F_{2^2} and F_{2^3} is available as pre-print on the arXiv. If you’re into fast exact linear algebra I highly recommend reading it since it has some really nice ideas in it and is well written.

Abstract. “We present a method of computing with matrices over very small finite fields of size larger than 2. Specifically, we show how the Method of Four Russians can be efficiently adapted to these larger fields, and introduce a row-wise matrix compression scheme that both reduces memory requirements and allows one to vectorize element operations. We also present timings which confirm the efficiency of these methods and exceed the speed of the fastest implementations the authors are aware of.”