A nice little trick

It is well known that polynomial system solving algorithms (in particular Gröbner basis algorithm) are more efficient if the polynomial system is overdefined. Thus, a standard approach when trying to cryptanalyse block ciphers using algebraic techniques is to make the system more overdefined. One approach is to use differential characteristics; another one is to use higher-order differential cryptanalysis methods to produce a system where many state variables can be identified.

In order to speed up the computations in practice it is often beneficial to remove “redundant” variables, for instance one can remove y and replace it by x + 1 if the polynomial y + x + 1 is in the system. This is, in fact, what is done by the Sage function

F.eliminate_linear_variables()

However, this only works well if the variable which ought to be replaced is actually the leading term of a polynomial. If we consider for example algebraic higher-order differential techniques we have quite a few polynomials of the form x_{i,0} + k_0 (+ 1) for each of the 0 \leq i < n plaintext-ciphertext pairs (those encode the first key addition). Those allow us to replace x_{i,0} by k_0 (+1) but not to identify x_{0,0} and x_{1,0}.

Now, note that, e.g.,

(x_{0,0} + k_0 + p_{0,0}) + (x_{1,0} + k_0 + p_{1,0}) = x_{0,0} + x_{1,0} + p_{0,0} + p_{1,0}

(where p_{0,0}, p_{1,0} are constants) which is exactly of the form x_{0,0} + x_{1,0} (+ 1) needed by the simplification rules applied by the  eliminate_linear_variables() function.

Thus, it might be beneficial to not consider a polynomial system of multiple plaintext-ciphertext pairs but instead to consider a system for P_0, C_0 and (P_i - P_0), (C_i - C_0) for i > 0.

To apply this trick to my algebraic integral attackpolynomial system generator for PRESENT, replace the code:

for j in range(min_round,len(rounds)):
 L.append(rounds[j].gens())

by

for j in range(min_round,len(rounds)):
 if i == 0:
 L.append(rounds[j].gens())
 else:
 L.append(map(add,zip(L[j],rounds[j])))

Using the straightforward modelling for five rounds of PRESENT we end up with an equation system with 30423 polynomials in 2410 variables. PolyBoRi takes 1370.19 seconds on my Macbook Pro 6,2.

Using the little trick we end up with an equation system with 19704 polynomials in 1802 variables which PolyBoRi solves in 458.89 seconds on my machine. In both cases the same key and plaintexts were used.

Here’s the testcode:

set_random_seed(0)
F,s = present_ia(PRESENT(80,5))
t = cputime()
F = F.eliminate_linear_variables()
gb = F.groebner_basis(prot=True)
print "%s; CPU Time: %7.2f"%(F, cputime(t))
Advertisements

7 thoughts on “A nice little trick”

  1. So as far as I understand you take the first system and then for the second system you add not the system itself, but XOR of all its equations with the corresponding equations of the first system. The same with all following systems. Right? I was thinking the following. May be it makes sense to XOR only linear equations, which actually give you the needed property, and not all. For example if you XOR S-Box equations, then the result get more complex.

  2. Agreed, some improvement like this might make sense. Note however that if one uses an explicit representation of the S-Box: y_i = p(x_i) where p(x_i) is some polynomial in the input variables to the S-Box x_i then (y_0 + p(x_0)) + (y_1 + p(x_1)) will simplify to (y_0 + y_1) if x_0 and x_1 are identified, thus the simplification moves through S-boxes if one is careful about the construction of equations.
    Also it might make sense to include both: the XOR and the original equation which might be simpler.

  3. In my rather unscientific quick test I get a polynomial System with 54679 polynomials in 1773 Variables which is solved in 954.85 seconds if I add both the sums described above and the original equations for (Pi,Ci). If I add the direct representation of the S-boxes (y_i = p(x_i)) I end up with a polynomial system with 22523 polynomials in 1754 Variables which is solved in 588.83 seconds.

  4. Nice 🙂 Some remarks:

    1) As far as I remember, using eliminate_linear_variables() is a *big* pain, because it doesn’t actually tell what was equal to what — it just eliminates. So, if it discovers that “a = b”, it replaces “a” with “b”, but then doesn’t tell me that in fact “a = b”, just gives me back a system of equations without “b”. In practical terms, this renders this function utterly useless.

    2) It is very-very interesting that such a simple technique as using (Pi-P0), (Ci-C0) gives a measurable speed increase (2x, one bit!), says a lot about the number of low-hanging fruits in the area of algebraic crypto.

    As for why (2) is true, I think we are lacking the tools to get the area really rolling. Sage is nice, but you need at least 100 measurements to get some real average time of how difficult an instance is (I guess that PolyBori is randomised, like SAT solvers). And then you need to carry this measurement out for all keysizes (i.e. less and less fixed keybits), to get an idea how difficult it really would be to crack the whole cipher. At this point, you _really_ get annoyed with the speed of Python — just to build such an equation set (~20’000 polys) probably takes >30 seconds, with eliminate_linear_variables(), probably upwards of 150 secs, and then you haven’t even launched your solver, PolyBori/SAT/F4/F5.

    1. Hi,

      1) ” In practical terms, this renders this function utterly useless.”

      I disagree: it is trivial to recover the values for the replaced variables once the modified system is solved, just plug in your solutions from the modified system into the big system and collect the solutions for the remaining variables. Remember, no variable renaming is performed by this function.

      2) “I guess that PolyBori is randomised, like SAT solvers”

      No it isn’t.

      3) “At this point, you _really_ get annoyed with the speed of Python — just to build such an equation set (~20’000 polys) probably takes >30 seconds”

      15 seconds my computer, but it could be improved.

      4) “with eliminate_linear_variables(), probably upwards of 150 secs, and then you haven’t even launched your solver, PolyBori/SAT/F4/F5”

      I consider “eliminate_linear_variables()” to be part of the solving step and thus I’ve included the time this function call takes in the timings above.

  5. Hi,

    Oops! Your solution to (1) is actually quite nice 🙂 The fact that PolyBori is not randomised probably doesn’t affect the fact that you need to run it many times with different plaintext/ciphertext pairs to get some average time to solve, though. However, 500 secs sounds very reasonable as a first shot (putting it on a cluster is always possible anyhow). Building 20’000 polys in 15 secs is quite reasonable, too. I did not, however, talk into thin air when I said that Pyhton is sometimes too slow. I have observed with some of my scripts that when trivial things must be done repeatedly, Python is in fact very slow (loop unrolling is probably missing, for example). As for (4), your solution is quite a good one, I must say, but someone in the lab (Luk Bettale) already developed a very fast version of this elimination method, and with this hindsight, I would say that the system developed by Luk is more user-friendly: it gives the eliminated variables as a table, and also replaces very fast. However, as a general-purpose replacer, eliminate_linear_variables() is probably a good choice. I didn’t want my comment to come through as some sort of Sage-bashing: I think Sage is brilliant. I re-wrote most of Grain-of-Salt in it in 2 days, which took me 3-4 months to write in C++ (!). However, I do feel its shortcomings sometimes.

  6. ” The fact that PolyBori is not randomised probably doesn’t affect the fact that you need to run it many times with different plaintext/ciphertext pairs to get some average time to solve, though.”

    Definitely, the running times vary from instance to instance.

    “I did not, however, talk into thin air when I said that Pyhton is sometimes too slow. I have observed with some of my scripts that when trivial things must be done repeatedly, Python is in fact very slow (loop unrolling is probably missing, for example).”

    The beauty of Sage is that you can use the elegance of Python and still get C performance in your inner loop by using Cython, see for example http://sagemath.blogspot.com/2010/11/cython-sage-and-need-for-speed.html

    ” As for (4), your solution is quite a good one, I must say, but someone in the lab (Luk Bettale) already developed a very fast version of this elimination method, and with this hindsight, I would say that the system developed by Luk is more user-friendly: it gives the eliminated variables as a table, and also replaces very fast.”

    Let’s have a race tomorrow 🙂

    “I didn’t want my comment to come through as some sort of Sage-bashing”

    I didn’t take ’em like that.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s