## A nice little trick

It is well known that polynomial system solving algorithms (in particular Gröbner basis algorithm) are more efficient if the polynomial system is overdefined. Thus, a standard approach when trying to cryptanalyse block ciphers using algebraic techniques is to make the system more overdefined. One approach is to use differential characteristics; another one is to use higher-order differential cryptanalysis methods to produce a system where many state variables can be identified.

In order to speed up the computations in practice it is often beneficial to remove “redundant” variables, for instance one can remove $y$ and replace it by $x + 1$ if the polynomial $y + x + 1$ is in the system. This is, in fact, what is done by the Sage function

F.eliminate_linear_variables()


However, this only works well if the variable which ought to be replaced is actually the leading term of a polynomial. If we consider for example algebraic higher-order differential techniques we have quite a few polynomials of the form $x_{i,0} + k_0 (+ 1)$ for each of the $0 \leq i < n$ plaintext-ciphertext pairs (those encode the first key addition). Those allow us to replace $x_{i,0}$ by $k_0 (+1)$ but not to identify $x_{0,0}$ and $x_{1,0}$.

Now, note that, e.g.,

$(x_{0,0} + k_0 + p_{0,0}) + (x_{1,0} + k_0 + p_{1,0}) = x_{0,0} + x_{1,0} + p_{0,0} + p_{1,0}$

(where $p_{0,0}, p_{1,0}$ are constants) which is exactly of the form $x_{0,0} + x_{1,0} (+ 1)$ needed by the simplification rules applied by the  eliminate_linear_variables() function.

Thus, it might be beneficial to not consider a polynomial system of multiple plaintext-ciphertext pairs but instead to consider a system for $P_0, C_0$ and $(P_i - P_0), (C_i - C_0)$ for $i > 0$.

To apply this trick to my algebraic integral attackpolynomial system generator for PRESENT, replace the code:

for j in range(min_round,len(rounds)):
L.append(rounds[j].gens())


by

for j in range(min_round,len(rounds)):
if i == 0:
L.append(rounds[j].gens())
else:


Using the straightforward modelling for five rounds of PRESENT we end up with an equation system with 30423 polynomials in 2410 variables. PolyBoRi takes 1370.19 seconds on my Macbook Pro 6,2.

Using the little trick we end up with an equation system with 19704 polynomials in 1802 variables which PolyBoRi solves in 458.89 seconds on my machine. In both cases the same key and plaintexts were used.

Here’s the testcode:

set_random_seed(0)
F,s = present_ia(PRESENT(80,5))
t = cputime()
F = F.eliminate_linear_variables()
gb = F.groebner_basis(prot=True)
print "%s; CPU Time: %7.2f"%(F, cputime(t))


## Final Version of my PhD thesis

I passed my PhD defence on Friday. I’ve uploaded the final version of my thesis titled Algorithmic Algebraic Techniques and their Application to Block Cipher Cryptanalysis. Not much changed since the last draft, which I also posted on this blog, except a few typos and errors.

## SAT Solving Pointers

This is just a quick note to point out two SAT-solving sources relevant for cryptography.

Have fun.

## Algebraic Attacks and CNF

Since the seminal papers [1] and [2] by Bard, Courtois and Jefferson it seems accepted wisdom that the right thing to do for constructing a CNF representation of a block cipher is to construct an algebraic system of equations first (cf. [3]). This system of equations is then converted to CNF using some ANF to CNF converted (e.g. [4]) which deals with the negative impact of the XORs just introduced via the ANF. On the other hand, it is straight forward to compute some CNF for a given S-Box directly by considering its truth table. Sage now contains code which does this for you:

sage: sr = mq.SR(1,1,1,4,gf2=True,polybori=True)
sage: S = sr.sbox()
sage: print S.cnf()

[(1, 2, 3, 4, 5), (1, 2, 3, 4, 6), (1, 2, 3, 4, 7), (1, 2, 3, 4, 8), (1, 2, 3, 4, 5), (1, 2, 3, 4, 6), (1, 2, 3, 4, 7), (1, 2, 3, 4, 8), (1, 2, 3, 4, 5), (1, 2, 3, 4, 6), (1, 2, 3, 4, 7),(1, 2, 3, 4, 8), (1, 2, 3, 4, 5), (1, 2, 3, 4, 6), (1, 2, 3, 4, 7), (1, 2, 3, 4, 8), (1, 2, 3, 4, 5), (1, 2, 3, 4, 6), (1, 2, 3, 4, 7), (1, 2, 3, 4, 8), (1, 2, 3, 4, 5), (1, 2, 3, 4, 6), (1, 2, 3, 4, 7), (1, 2, 3, 4, 8), (1, 2, 3, 4, 5), (1, 2, 3, 4, 6), (1, 2, 3,4, 7), (1, 2, 3, 4, 8), (1, 2, 3, 4, 5), (1, 2, 3, 4, 6), (1, 2, 3, 4, 7), (1, 2, 3, 4, 8), (1, 2, 3, 4, 5), (1, 2, 3, 4, 6), (1, 2, 3, 4, 7), (1, 2, 3, 4, 8), (1, 2, 3, 4, 5), (1, 2, 3, 4, 6), (1, 2, 3, 4, 7), (1, 2, 3, 4, 8), (1, 2, 3, 4, 5), (1, 2, 3, 4, 6), (1, 2, 3, 4, 7), (1, 2, 3, 4, 8), (1, 2, 3, 4, 5), (1, 2, 3, 4, 6), (1, 2, 3, 4, 7), (1, 2, 3, 4, 8), (1, 2, 3, 4, 5), (1, 2, 3, 4, 6), (1, 2, 3, 4, 7), (1, 2, 3, 4, 8), (1,2, 3, 4, 5), (1, 2, 3, 4, 6), (1, 2, 3, 4, 7), (1, 2, 3, 4, 8), (1, 2, 3, 4, 5), (1, 2, 3, 4, 6), (1, 2, 3, 4, 7), (1, 2, 3, 4, 8), (1, 2, 3, 4, 5), (1, 2, 3, 4,6), (1, 2, 3, 4, 7), (1, 2, 3, 4, 8)]

I am not claiming that this naive approach produces an optimal representation, it seems more compact than what ANF to CNF converters produce, though.