Cryptanalysis of “Fully Homomorphic Encryption over the Binary Polynomials”

Turns out, I’m not he only one who was inspired to adapt the Fully Homomorphic Encryption over the Integers scheme by van Dijk, Gentry, Halevi and Vaikuntanathan to polynomials.  Gu Chunsheng posted a pre-print on the IACR eprint server this week which essentially instantiates the integer scheme over univariate polynomials over \mathbb{F}_2.  Below is my implementation (in Sage) of his somewhat homomorphic scheme:

class BinPolySHE:
    def __init__(self, n):
        n = n
        tau = n # choice here
        P = PolynomialRing(GF(2),'x')

        x = P.gen()

        s = P.random_element(degree=2*n+1)
        while not (s.is_irreducible() and s.degree()==2*n+1):
            s = P.random_element(degree=2*n+1)

        b = []

        a0 = P.random_element(2*n+1)
        if a0.degree() < 2*n+1:
            a0 += x**(2*n+1)
        e0 = P.random_element(degree=n-1)
        b0 = a0*s + x*e0 # deg: 4*n+2

        b.append(b0)

        for i in range(1,tau):
            ai = P.random_element(degree=n) # choice here
            ei = P.random_element(degree=n-1)
            bi = ai*s + x*ei # deg 3*n+1
            bi = bi % b0
            b.append(bi)

        self.n = n
        self.pk = b
        self.sk = s
        self.P = P

    def encrypt(self, m):
        T = []

        for i in range(1, len(self.pk)):
            if random() <= 0.5: # choice here
                T.append(i)

        c = self.P(m%2)

        x = self.P.gen()

        for i in T:
            e = self.P.random_element(degree=self.n-1)
            c += self.pk[i] + x*e

        return c % self.pk[0]

    def decrypt(self, c):
        x = self.P.gen()
        return (c % self.sk) % x

Regular readers of this blog might have noticed that the scheme looks like a bit like a univariate specialisation of this PollyCracker scheme. Indeed, just like this first PollyCracker scheme, Gu’s scheme is badly broken. Below, the source code of my attack:

    def attack(self, c):

        A = Matrix(GF(2),len(self.pk),4*self.n+2)

        x = self.P.gen()

        for i,b in enumerate(self.pk):
            for j in b.exponents():
                A[i,A.ncols()-j-1] = 1
        E = A.echelon_form(reduced=False)

        pk2 = []
        for i,r in enumerate(A.rows()):
            b = 0
            for j in range(A.ncols()):
                b += E[i,A.ncols()-j-1]*x**j
            pk2.append(b)

        for b in pk2:
            if c[b.degree()]:
                c -= b
        return c % x

The attack proceeds pretty much as discussed here: we can compute a triangular basis for the span of the public key and use that to perform all eliminations. Since the noise does not grow with each addition and does not affect the constant coefficient (which holds the message), we can essentially ignore it.

Cryptanalysis of my Somewhat Homomorphic PollyCracker Scheme

About 5 months ago I wrote about a homomorphic scheme based on integers and an adaptation of this schemes to use multivariate polynomials. At the very end I wrote: 

“Of course, I assume that my adaptation above can still be broken somehow since that’s what tends to happen with multivariate crypto schemes. Also, I’m really really not an expert on public-key cryptography. Hey, this is a blog post, not a research paper … so break it in the comments :)

Well, with some delay, here is how to break it. Recall, that the secret is some Gröbner basis g_0,\dots,g_{n-1} in \mathbb{F}[x_0,\dots,x_{n-1}] (assume \mathbb{F}_2 for simplicity) and that a ciphertext is constructed as

c = \sum_{i=0}^{n-1} f_ig_i + m' + m

where m' has constant coefficient 0 and f_i are random polynomials. Now, let \deg(f_i) = Q, \deg(g_i) = P, \deg(m') = N with N < P to ensure correct decryption (i.e. no interference of the noise with the normal form computation).

To break the scheme simply request n^(P+Q) encryptions of zero and compute the row echelon form on the linearisation of those polynomials of degree P+Q without elimination above the pivot (i.e. don’t compute the reduced row echelon form). Throw away any element of degree \leq N and call the resulting list S. This list S allows to decrypt any ciphertext c by reducing c modulo S.

Below, the attack in Sage:

class PolynomialHomomorphic:
    def __init__(self, l):
        self.l = l
        K = GF(2) # pick some field
        # we choose a ring with l unknowns, which
        # should make any GB computation at least
        # as hard as 2^l if we pick the ideal sufficiently
        # random.
        R = PolynomialRing(K, l, 'x', order='degrevlex')
        self.K = K
        self.R = R
        # our small ideal, defines the message space.
        self.b = Ideal([x for x in R.gens()])

        # these parameters are pretty arbitrary from a
        # security perspective!
        self.N = 1
        self.P = 2
        self.Q = 1
        self.key_gen()

    def key_gen(self):
        b,l = self.b, self.l
        K, R = self.K, self.R
        # we pick a random ideal with l element of degree P
        p = [R.gen(i)**self.P + R.random_element(degree=self.P-1)
              for i in range(l)]
        self.p = Ideal(p)

    def encrypt(self, m):
        # we pick some m' which encodes the
        # same plaintext but is bigger
        m = self.R(m)
        mprime = self.R.random_element(degree=self.N)
        mprime -= mprime.constant_coefficient()
        mprime += m.constant_coefficient()

       # adding a random ideal element
        for f in self.p.gens():
            mprime += self.R.random_element(degree=self.Q)*f
        return mprime

    def decrypt(self, c):
        # decryption is just as in the integer case.
        return c.reduce(self.p).reduce(self.b)

def break_cpa(pc, challenge):
    ciphertexts = [pc.encrypt(0) for _ in xrange(2*pc.l**(pc.Q+pc.P))]
    F = Sequence(ciphertexts)
    A,v = F.coefficient_matrix(sparse=False)
    E = A.echelon_form(full=False)
    s = dict([(f.lm(),f) for f in (E*v).list() if f.degree() >= pc.P])

    while True:
        print challenge
        if challenge.lm() in s:
            challenge = challenge - s[challenge.lm()]
        else:
            if challenge.degree() > pc.P:
                raise ValueError("Ah, snap! It didn't work.")
            else:
                return challenge.constant_coefficient()

This works because the noise is constructed in such a way as to never interfere with elimination: it does not affect any leading monomial of the ideal ever. Thus, we don’t need to consider it during the elimination and simply ignore the lower terms once we are done.

Note that this attack does not work against the integer homomorphic scheme by van Dijk et al. because there additions are not free: When we perform elimination of the higher order bits in the integer scheme also the noise accumulates; eventually it surpasses the size of the prime p leaving us with no information. Put differently, we can attack schemes that are inspired by the integer-based scheme if additions are free. Thus, while it might seem tempting to replace integers by, say, univariate polynomials which are often considered essentially computationally equivalent to integers and which would provide free additions  it would break the security of the scheme.

Mutants are people too

Despite being proven to be a redundant variant of the F4 algorithm, the XL algorithm still receives a lot of attention from the cryptographic community. This is partly because XL is considered to be conceptually much simpler than Gröbner basis algorithms. However, in doing so the wealth of theory available to understand algorithms for polynomial system solving is largely ignored.

The most recent and perhaps promising variant of the XL algorithm  is the family of MXL algorithms which are based around the concept of Mutants. Assume in some iteration the XL algorithm finds elements of degree k while considering degree D > k. In a nutshell, the idea of the MutantXL algorithm is to continue the XL algorithm at the degree k+1 instead of D+1 which is what the XL algorithm would do. The natural question to ask is thus what Mutants are in terms of Gröbner basis theory; are they something new or are they a concept which is already known in the symbolic computing world under a different name?

I was in Darmstadt this week visiting the group which mainly drives the effort behind the MXL family of algorithms. As part of my visit I gave a talk about the relation of the Mutant strategy and the normal strategy used in Gröbner basis algorithms for selecting critical pairs called … the Normal Selection Strategy. In the talk we show that the Mutant strategy is a redundant variant of the Normal Selection Strategy. Also, I talked quite a bit about S-polynomials and how they can be used to account for every single reduction that happens in XL-style algorithms. Finally, I briefly touched on the “partial enlargement strategy” which was introduced with MXL2 showing that it is equivalent to selecting a subset of S-polynomials in each iteration of F4.

Unfortunately, there’s no full paper yet, so the presentation has to suffice for now.

Update: It was pointed out to me that a better way of phrasing the relationship is to state that the Mutant selection strategy can be understood as a redundant variant of the Normal selection strategy when used in F4. This way is better because our statement is strictly about an algorithmic relation and not about why did what first knowing what … which is how one could read the original phrasing.

Cold Boot Key Recovery by Solving Polynomial Systems with Noise

Carlos and I finally managed to put our paper on polynomial system solving with noise and its application to the cold boot problem out.

Abstract: A method for extracting cryptographic key material from DRAM used in modern computers has been recently proposed in [9]; the technique was called Cold Boot attacks. When considering block ciphers, such as the AES and DES, simple algorithms were also proposed in [9] to recover the cryptographic key from the observed set of round subkeys in memory (computed via the cipher’s key schedule operation), which were however subject to errors due to memory bits decay. In this work we extend this analysis to consider key recovery for other ciphers used in Full Disk Encryption (FDE) products. Our algorithms are also based on closest code word decoding methods, however apply a novel method for solving a set of non-linear algebraic equations with noise based on Integer Programming. This method should have further applications in cryptology, and is likely to be of independent interest. We demonstrate the viability of the Integer Programming method by applying it against the Serpent block cipher, which has a much more complex key schedule than AES. Furthermore, we also consider the Twofish key schedule, to which we apply a dedicated method of recovery.

Btw. an older version of our code for Sage for solving polynomial systems with errors is available on bitbucket.org (… yes, I should update it to the most recent version). Here’s an example from my talk at the Tools for cryptanalysis workshop 2010:

sage: p = PRESENT(Nr=1,sbox_representation='lex')
sage: F = present_dc(,r=1,return_system=True,characteristic=True)
sage: H = F.gens()[:-64]
sage: S = F.gens()[-64:]
sage: S[:9]
(Y00100 + Y10100, Y00101 + Y10101, Y00102 + Y10102,  Y00103 + Y10103, Y00104 + Y10104, Y00105 + Y10105,  Y00106 + Y10106, Y00107 + Y10107 + 1, Y00108 + Y10108)

sage: F_prob = ProbabilisticMPolynomialSystem(F.ring(),H,S)
sage: s,t = F_prob.solve_mip(solver='SCIP')
Writing problem data to '/home/malb/.sage//temp/road/16007//tmp_1.mps'6605 records were writtenCPU Time: 0.20  Wall time: 25.95, Obj:  3.00

Not that this was a good way of attacking a blockcipher, but you get the idea.

A nice little trick

It is well known that polynomial system solving algorithms (in particular Gröbner basis algorithm) are more efficient if the polynomial system is overdefined. Thus, a standard approach when trying to cryptanalyse block ciphers using algebraic techniques is to make the system more overdefined. One approach is to use differential characteristics; another one is to use higher-order differential cryptanalysis methods to produce a system where many state variables can be identified.

In order to speed up the computations in practice it is often beneficial to remove “redundant” variables, for instance one can remove y and replace it by x + 1 if the polynomial y + x + 1 is in the system. This is, in fact, what is done by the Sage function

F.eliminate_linear_variables()

However, this only works well if the variable which ought to be replaced is actually the leading term of a polynomial. If we consider for example algebraic higher-order differential techniques we have quite a few polynomials of the form x_{i,0} + k_0 (+ 1) for each of the 0 \leq i < n plaintext-ciphertext pairs (those encode the first key addition). Those allow us to replace x_{i,0} by k_0 (+1) but not to identify x_{0,0} and x_{1,0}.

Now, note that, e.g.,

(x_{0,0} + k_0 + p_{0,0}) + (x_{1,0} + k_0 + p_{1,0}) = x_{0,0} + x_{1,0} + p_{0,0} + p_{1,0}

(where p_{0,0}, p_{1,0} are constants) which is exactly of the form x_{0,0} + x_{1,0} (+ 1) needed by the simplification rules applied by the  eliminate_linear_variables() function.

Thus, it might be beneficial to not consider a polynomial system of multiple plaintext-ciphertext pairs but instead to consider a system for P_0, C_0 and (P_i - P_0), (C_i - C_0) for i > 0.

To apply this trick to my algebraic integral attackpolynomial system generator for PRESENT, replace the code:

for j in range(min_round,len(rounds)):
 L.append(rounds[j].gens())

by

for j in range(min_round,len(rounds)):
 if i == 0:
 L.append(rounds[j].gens())
 else:
 L.append(map(add,zip(L[j],rounds[j])))

Using the straightforward modelling for five rounds of PRESENT we end up with an equation system with 30423 polynomials in 2410 variables. PolyBoRi takes 1370.19 seconds on my Macbook Pro 6,2.

Using the little trick we end up with an equation system with 19704 polynomials in 1802 variables which PolyBoRi solves in 458.89 seconds on my machine. In both cases the same key and plaintexts were used.

Here’s the testcode:

set_random_seed(0)
F,s = present_ia(PRESENT(80,5))
t = cputime()
F = F.eliminate_linear_variables()
gb = F.groebner_basis(prot=True)
print "%s; CPU Time: %7.2f"%(F, cputime(t))

Algebraic Attacks and CNF

Since the seminal papers [1] and [2] by Bard, Courtois and Jefferson it seems accepted wisdom that the right thing to do for constructing a CNF representation of a block cipher is to construct an algebraic system of equations first (cf. [3]). This system of equations is then converted to CNF using some ANF to CNF converted (e.g. [4]) which deals with the negative impact of the XORs just introduced via the ANF. On the other hand, it is straight forward to compute some CNF for a given S-Box directly by considering its truth table. Sage now contains code which does this for you:

sage: sr = mq.SR(1,1,1,4,gf2=True,polybori=True)
sage: S = sr.sbox()
sage: print S.cnf()

[(1, 2, 3, 4, 5), (1, 2, 3, 4, 6), (1, 2, 3, 4, 7), (1, 2, 3, 4, 8), (1, 2, 3, 4, 5), (1, 2, 3, 4, 6), (1, 2, 3, 4, 7), (1, 2, 3, 4, 8), (1, 2, 3, 4, 5), (1, 2, 3, 4, 6), (1, 2, 3, 4, 7),(1, 2, 3, 4, 8), (1, 2, 3, 4, 5), (1, 2, 3, 4, 6), (1, 2, 3, 4, 7), (1, 2, 3, 4, 8), (1, 2, 3, 4, 5), (1, 2, 3, 4, 6), (1, 2, 3, 4, 7), (1, 2, 3, 4, 8), (1, 2, 3, 4, 5), (1, 2, 3, 4, 6), (1, 2, 3, 4, 7), (1, 2, 3, 4, 8), (1, 2, 3, 4, 5), (1, 2, 3, 4, 6), (1, 2, 3,4, 7), (1, 2, 3, 4, 8), (1, 2, 3, 4, 5), (1, 2, 3, 4, 6), (1, 2, 3, 4, 7), (1, 2, 3, 4, 8), (1, 2, 3, 4, 5), (1, 2, 3, 4, 6), (1, 2, 3, 4, 7), (1, 2, 3, 4, 8), (1, 2, 3, 4, 5), (1, 2, 3, 4, 6), (1, 2, 3, 4, 7), (1, 2, 3, 4, 8), (1, 2, 3, 4, 5), (1, 2, 3, 4, 6), (1, 2, 3, 4, 7), (1, 2, 3, 4, 8), (1, 2, 3, 4, 5), (1, 2, 3, 4, 6), (1, 2, 3, 4, 7), (1, 2, 3, 4, 8), (1, 2, 3, 4, 5), (1, 2, 3, 4, 6), (1, 2, 3, 4, 7), (1, 2, 3, 4, 8), (1,2, 3, 4, 5), (1, 2, 3, 4, 6), (1, 2, 3, 4, 7), (1, 2, 3, 4, 8), (1, 2, 3, 4, 5), (1, 2, 3, 4, 6), (1, 2, 3, 4, 7), (1, 2, 3, 4, 8), (1, 2, 3, 4, 5), (1, 2, 3, 4,6), (1, 2, 3, 4, 7), (1, 2, 3, 4, 8)]

I am not claiming that this naive approach produces an optimal representation, it seems more compact than what ANF to CNF converters produce, though.