GSW13: 3rd Generation Homomorphic Encryption from Learning with Errors

This week our reading group studied Homomorphic Encryption from Learning with Errors: Conceptually-Simpler, Asymptotically-Faster, Attribute-Based by Craig Gentry, Amit Sahai and Brent Waters: a 3rd generation fully homomorphic encryption scheme.

The paper is partly motivated by the observation that multiplication in previous schemes was complicated, or at least not natural. Let’s take the BGV scheme, where ciphertexts are simply LWE samples a_i, b_i = -\langle a_i, s\rangle + \mu_i \cdot \lceil q/2\rfloor + e_i for a_i \in \mathbb{Z}_q^n and b_i \in \mathbb{Z}_q, with \mu_i \in \{0,1\} being the message bit and e_i some “small” error. Let’s write this as c_i = (a_i, b_i) \in \mathbb{Z}_q^{n+1} because it simplifies some notation down the line. In this notation, multiplication can be accomplished by c_1 \otimes c_2 because \langle c_1 \otimes c_2, s \otimes s\rangle \approx \mu_1 \cdot \mu_2. However, we now need to map s \otimes s back to s using “relinearisation”; this is the “unnatural” step.
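As a plain-Python sanity check, here is a toy symmetric version of such a scheme. Everything here (q, n, the error range) is made up for illustration and offers no security whatsoever:

```python
import random

# Toy BGV-style symmetric encryption of a single bit mu as an LWE sample
# (a, b) with b = -<a, s> + mu * q/2 + e. Parameters are made up; no security.
q, n = 2**10, 8
s = [random.randrange(q) for _ in range(n)]   # the secret

def enc(mu):
    a = [random.randrange(q) for _ in range(n)]
    e = random.choice((-2, -1, 0, 1, 2))      # "small" error
    b = (-sum(x*y for x, y in zip(a, s)) + mu * (q // 2) + e) % q
    return a, b

def dec(c):
    a, b = c
    x = (b + sum(x*y for x, y in zip(a, s))) % q   # = mu * q/2 + e (mod q)
    x = x - q if x > q // 2 else x                 # centre around zero
    return 1 if abs(x) > q // 4 else 0

assert all(dec(enc(mu)) == mu for mu in (0, 1))
```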

However, this is only unnatural in this particular representation. To see this, let’s rewrite a_i, b_i as a linear multivariate polynomial f_i = b_i - \sum_{j=1}^n a_{ij} \cdot x_j \in \mathbb{Z}_q[x_1,\dots,x_n]. This polynomial evaluates to \approx \mu on the secret s = (s_1,\dots,s_n). Note that evaluating a polynomial on s is the same as reducing it modulo the set of polynomials G = (x_1 - s_1,\dots, x_n - s_n).

Now, multiplying h = f_1 \cdot f_2 produces a quadratic polynomial. Its coefficients are exactly the terms of the tensor product c_1 \otimes c_2; in other words, the tensor product is just another way of writing the quadratic polynomial h. Evaluating h on s gives \approx \mu_1 \cdot \mu_2. To map this operation to \langle c_1 \otimes c_2, s \otimes s\rangle, note that evaluating h on s is the same as taking the inner product of the vector of coefficients of h with the vector (s_1^2, s_1\cdot s_2, \dots, s_{n-1}\cdot s_n, s_n^2, s_1, \dots, s_n, 1). For example, evaluating h = -x^2 + 3xy + y^2 + 5x + 2y + 1 at x=1 and y=-2 is the same as taking the inner product of (-1,3,1,5,2,1) and (1,-2,4,1,-2,1). That is, if we precompute all the products of x and y up to degree two, the remaining operation is just an inner product.
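The worked example can be checked in a few lines of plain Python:

```python
# Evaluating h = -x^2 + 3xy + y^2 + 5x + 2y + 1 at (x, y) = (1, -2),
# once directly and once as an inner product of the coefficient vector
# with the precomputed monomials (x^2, xy, y^2, x, y, 1).
x, y = 1, -2
coeffs = (-1, 3, 1, 5, 2, 1)
monomials = (x*x, x*y, y*y, x, y, 1)          # (1, -2, 4, 1, -2, 1)
inner = sum(c*m for c, m in zip(coeffs, monomials))
direct = -x**2 + 3*x*y + y**2 + 5*x + 2*y + 1
assert inner == direct == -1
```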

Still, h is a quadratic polynomial, we’d want a linear polynomial. In BGV this reduction is referred to as “relinearisation” (which is not helpful if you’re coming from a commutative algebra background). Phrased in terms of commutative algebra, what we’re doing is to reduce h modulo elements in the ideal generated by G.

Let’s look at what happens when we reduce h by some element in the ideal generated by G. We start with the leading term of h which is -x^2. To reduce this term we’d add x \cdot (x-1) to h, which is an element in the ideal generated by (x-1, y+2) since it is a multiple of x-1. This produces h = 3xy + y^2 + 4x + 2y + 1 which has a smaller leading term. Now, we’d add the appropriate element with leading term 3xy and so on.
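A quick plain-Python check that adding x \cdot (x-1), an element of the ideal, does not change the value at the point (1,-2) defined by G, even though it changes the polynomial everywhere else:

```python
def h(x, y):
    # the quadratic from above
    return -x**2 + 3*x*y + y**2 + 5*x + 2*y + 1

def h_reduced(x, y):
    # h + x*(x - 1): the x^2 term is gone
    return 3*x*y + y**2 + 4*x + 2*y + 1

assert h(1, -2) == h_reduced(1, -2) == -1   # equal on the "secret" point
assert h(2, 5) != h_reduced(2, 5)           # but different polynomials
```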

Of course, this process, as just described, assumes access to G=(x-1, y+2) which is the same as giving out our secret s. Instead, we want to precompute multiples with leading terms x^2, y^2 and xy and publish those. This is still insecure, but adding some noise, assuming circular security and doing a bit more trickery we can make this as secure as the original scheme. This is then what “relinearisation” does. There is also an issue with noise blow-up, e.g. multiplying a sample by a scalar like 3 makes its noise much bigger. Hence, we essentially publish elements with leading terms 2^{\lceil \log q\rceil} x^2, \dots, 8x^2, 4x^2, 2x^2, x^2, 2^{\lceil \log q\rceil} xy, \dots, 8xy, 4xy, 2xy, xy and so on, which allows us to avoid those multiplications. Before I move on to GSW13 proper, I should note that all this has been pointed out in 2011.

The Scheme

Ciphertexts in the GSW13 scheme are N \times N matrices over \mathbb{Z}_q with 0, 1 entries. The secret key is a vector of dimension N over \mathbb{Z}_q with at least one big coefficient v_i. Let’s restrict \mu \in \{0,1\}. We say that C encrypts \mu when C \cdot v = \mu \cdot v + e for some small e \in \mathbb{Z}_q^N. To decrypt we simply compute x = C \cdot v and check if there are large coefficients in x.

Ciphertexts in fully homomorphic encryption schemes are meant to be added and multiplied. Starting with addition, consider C_3 = C_1 + C_2. We have

C_3 \times v = (C_1 + C_2)\times v = C_1\times v + C_2\times v \approx \mu_1 \cdot v + \mu_2 \cdot v = (\mu_1 + \mu_2)\cdot v

Moving on to multiplication, we first observe that if matrices C_1 and C_2 have a common exact eigenvector v with eigenvalues \mu_1 and \mu_2, then their product has eigenvector v with eigenvalue \mu_1 \cdot \mu_2. But what about the noise?

Considering C_3 \times v = C_1 \times C_2 \times v:

  1. = C_1 \times (\mu_2 \cdot v + e_2)
  2. = C_1 \times \mu_2 \cdot v + C_1 \times e_2
  3. = \mu_2 \cdot (C_1 \times v) + C_1 \times e_2
  4. = \mu_2 \cdot (\mu _1 \cdot v + e_1) + C_1 \times e_2
  5. = \mu_2 \cdot \mu _1 \cdot v + \mu_2 \cdot e_1 + C_1 \times e_2
  6. \approx \mu_2 \cdot \mu_1 \cdot v

In the above expression C_1, e_2, \mu _2, e_1 are all small, v is not. In other words, the signal is not drowned by the noise and we have achieved multiplication. How far can we go with this, though?

Multiplicative Depth

Assume C_i, \mu_i, e_i are all \leq B in absolute value. After one multiplication the error of C_3 is bounded by (N+1)\cdot B^2. Hence, if we have L levels, the noise grows worse than B^{2^L}. We require B^{2^L} < q, otherwise the wrap-around will kill us. Also, q/B must be sub-exponential in N for security reasons: the bigger this gap, the easier the lattice problem we are basing security on, and if the gap is exponential then there is a polynomial-time algorithm for solving it, the famous LLL algorithm. As a consequence, the scheme as described only allows us to evaluate polynomials of degree sublinear in N.

To improve on this, we need a new notion. We call C strongly B bounded if the entries of C are bounded by 1 and the entries of its error vector e are bounded by B.

In what follows, we will only consider a NAND gate: C_3 = I_N - C_1 \cdot C_2. NAND is a universal gate, so we can build any circuit from it. However, in this context its main appeal is that it ensures that the messages \mu_i stay \in \{0,1\}: note the \mu_2 \cdot e_1 term above, which is big if \mu_2 is big. If C_1 and C_2 are strongly B bounded then the error vector of C_3 is bounded by (N+1)\cdot B instead of (N+1)\cdot B^2. Now, if we could make the entries of C_3 bounded by 1 again, C_3 itself would be strongly (N+1)\cdot B bounded and we could repeat the argument. In other words, the noise after L levels would be (N+1)^L \cdot B, which permits circuits of polynomial depth (instead of merely polynomial degree) when q/B is sub-exponential as above.
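The two growth regimes can be compared with a back-of-the-envelope sketch. The recurrences below are simplified (constant factors and the exact decryption margin are ignored) and the parameters are made up:

```python
def depth_squaring(B, q):
    # plain multiplication: the noise roughly squares at every level
    L, noise = 0, B
    while noise * noise < q:
        noise, L = noise * noise, L + 1
    return L

def depth_nand(N, B, q):
    # strongly bounded NAND: the noise grows by a factor (N + 1) per level
    L, noise = 0, B
    while (N + 1) * noise < q:
        noise, L = (N + 1) * noise, L + 1
    return L
```

For, say, N = 1023, B = 2 and q = 2^1000, the first model supports depth 9 while the second supports depth 99: roughly \log \log q versus \log q / \log N.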

Flatten

We’ll make use of an operation BitDecomp which splits a vector of integers modulo q into a vector \lceil \log q\rceil times longer holding the bits of those integers, least-significant bit first. For example, (7,2) \bmod 8 becomes (1,1,1,0,1,0), which has length \log_2 8 \cdot 2 = 3 \cdot 2 = 6. Here’s a simple implementation in Sage:

def bit_decomp(v):
    R = v.base_ring()
    k = ZZ((R.order()-1).nbits())
    w = vector(R, len(v)*k)
    for i in range(len(v)):
        for j in range(k):
            if 2**j & ZZ(v[i]):
                w[k*i+j] = 1
            else:
                w[k*i+j] = 0
    return w

We also need a function which reverses the process, i.e. adds up the appropriate powers of two: (1 + 2 + 4 = 7, 2). It is called \texttt{BitDecomp}^{-1} in the GSW13 paper, but I’ll simply call it … BitComp.

def bit_comp(v):
    R = v.base_ring()
    k = ZZ((R.order()-1).nbits())
    assert(k.divides(len(v)))
    w = vector(R, len(v)//k)
    for i in range(len(v)//k):
        for j in range(k):
            w[i] += 2**j * ZZ(v[k*i+j])
    return w

Actually, the definition of BitComp is a bit more general than just adding up bits. As defined above (following the GSW13 paper), it will add up arbitrary entries of v, each multiplied by the appropriate power of two. This is relevant for the next function we define, namely Flatten, which we define as BitDecomp(BitComp(⋅)).

flatten = lambda v: bit_decomp(bit_comp(v))

Finally we also define PowersOf2 which produces a new vector from v as (v_1, 2\, v_1, 2^2\,v_1, \dots, 2^{\lceil \log q\rceil - 1}\, v_1, v_2, \dots, 2^{\lceil \log q\rceil - 1}\, v_n):

def powers_of_two(v):
    R = v.base_ring()
    k = ZZ((R.order()-1).nbits())
    w = vector(R, len(v)*k)
    for i in range(len(v)):
        for j in range(k):
            w[k*i+j] = 2**j * v[i]
    return w

For example, the output of PowersOf2 on (3,1) for q = 8 is (3,6,4,1,2,4), computing modulo 8. Having defined these functions, let’s look at some identities. It holds that

\langle \texttt{BitDecomp}(v),\texttt{PowersOf2}(w)\rangle = \langle v,w\rangle

which can be verified by recalling integer multiplication. For example, 5 \cdot 3 = (1\cdot 2^2 + 0 \cdot 2^1 + 1\cdot 2^0) \cdot 3 = 1\cdot (2^2 \cdot 3) + 0\cdot (2^1 \cdot 3) + 1\cdot (2^0 \cdot 3) = 15. Another example, computed over the integers: \langle (7,2), (3,1)\rangle = 23 and

\langle (1,1,1,0,1,0), (3,6,12,1,2,4)\rangle = 3 + 6 + 12 + 2 = 23.

Or in the form of some Sage code:

q = 8
R = IntegerModRing(q)
v = random_vector(R, 10)
w = random_vector(R, 10)
v.dot_product(w) == bit_decomp(v).dot_product(powers_of_two(w))
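The same check in plain Python without Sage, hard-coding q = 8 and the little-endian bit order of the Sage code above:

```python
# Plain-Python check of <BitDecomp(v), PowersOf2(w)> = <v, w> mod q.
q, k = 8, 3

def bit_decomp(v):
    # least-significant bit first, k bits per entry
    return [(x >> j) & 1 for x in v for j in range(k)]

def powers_of_two(v):
    return [(x << j) % q for x in v for j in range(k)]

v, w = [7, 2], [3, 1]
lhs = sum(a*b for a, b in zip(bit_decomp(v), powers_of_two(w))) % q
rhs = sum(a*b for a, b in zip(v, w)) % q
assert lhs == rhs == 7   # 23 mod 8
```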

Furthermore, let a be an N dimensional vector and b be an N/\lceil\log q\rceil dimensional vector. Then it holds that

\langle a,\texttt{PowersOf2}(b)\rangle = \langle\texttt{BitComp}(a),b\rangle,

because \sum a_{ij} (2^{j} b_i) = \sum (a_{ij} 2^{j}) b_i.

Finally, we have

\langle \texttt{BitComp}(a),b\rangle = \langle \texttt{Flatten}(a),\texttt{PowersOf2}(b)\rangle,

by combining the previous two statements.

For example, let a = (2,3,1), b=(3,), q=8, recalling that our functions use a little-endian bit order.

  1. BitComp on a gives 2\cdot 2^0 + 3\cdot 2^1 + 1\cdot 2^2 = 12 \equiv 4 \pmod 8.
  2. BitDecomp on 4 gives (0,0,1), i.e. \texttt{Flatten}(a) = (0,0,1).
  3. For the left-hand side we have 4 \cdot 3 \bmod 8 = 4.
  4. For the right-hand side we have 0\cdot 2^0\cdot 3 + 0\cdot 2^1\cdot 3 + 1\cdot 2^2\cdot 3 = 12 \equiv 4 \pmod 8.

The same example in Sage:

q = 8
R = IntegerModRing(q)
a = vector(R, (2,3,1))
b = vector(R, (3,))

bit_comp(a).dot_product(b) == flatten(a).dot_product(powers_of_two(b))

Hence, running Flatten on C_3 produces a strongly (N+1)\cdot B bounded matrix C_3: its entries are bits again and, by the identities above, C_3 \cdot v is unchanged. Boom.1

Key Generation

It remains to sample a key and to argue why this construction is secure if LWE is secure for the choice of parameters.

To generate a public key, pick LWE parameters n, q and m = O(n \log q) and sample A, A\cdot s + e = c as usual.

The secret key is v = \texttt{PowersOf2}(s) of dimension N = (n+1)\cdot \lceil\log q\rceil. The public key is A,c, where we roll A,c into A, which now has dimension m \times (n+1); extending s accordingly, we have A \times s = e, which is small.

To encrypt, sample an N \times m matrix R with 0, 1 entries and compute R \times A, which is an N \times (n+1) matrix. That is, we do exactly what we would do with Regev’s original public-key scheme based on LWE: taking random 0,1 linear combinations of the rows of A to produce fresh samples. Run BitDecomp on the output to get an N \times N matrix. Finally, set

C = \texttt{Flatten}(\mu \cdot I_N + \texttt{BitDecomp}(R \times A)).

For correctness, observe:

  1. C \times v = \mu \cdot I_N \times v + \texttt{BitDecomp}(R \times A) \times v
  2. C \times v = \mu \cdot I_N \times v + R \times A \times s
  3. C \times v = \mu \cdot v + R \times e, which is small, since A \times s = e by construction.

Here the second step uses \langle \texttt{BitDecomp}(a), \texttt{PowersOf2}(s)\rangle = \langle a, s\rangle, and the outer Flatten does not change C \times v for the same reason.

The security argument is surprisingly simple. Consider C' = \texttt{BitComp}(C). Because C is already the output of Flatten it reveals nothing more than C'.

Unpacking C' to \texttt{BitComp}(\mu \cdot I_N) + R\times A, note that R\times A is statistically close to uniform by the leftover hash lemma when A is uniform. Finally, A is indistinguishable from uniform by the decisional LWE assumption. Boom.
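Putting the pieces together, here is a toy end-to-end sketch in plain Python rather than Sage. All parameters are made up for illustration, the error distribution is a caricature, and nothing here is secure; it merely mirrors the construction above with the little-endian bit order used throughout:

```python
import random

# Toy end-to-end GSW13 in plain Python. Parameters are made up and give
# no security; they are only large enough for decryption (and one NAND).
q, k = 2**16, 16            # modulus and bits per Z_q entry
n, m = 4, 8                 # LWE dimension and number of samples
N = (n + 1) * k             # ciphertext dimension

def bit_decomp(vec):        # little-endian bits of every entry
    return [(x >> j) & 1 for x in vec for j in range(k)]

def powers_of_two(vec):
    return [(x << j) % q for x in vec for j in range(k)]

def bit_comp(vec):          # inverse direction; accepts non-bit entries too
    return [sum(vec[i*k + j] << j for j in range(k)) % q
            for i in range(len(vec) // k)]

def flatten(vec):
    return bit_decomp(bit_comp(vec))

def keygen():
    t = [random.randrange(q) for _ in range(n)]
    A = [[random.randrange(q) for _ in range(n)] for _ in range(m)]
    e = [random.choice((-1, 0, 1)) for _ in range(m)]
    c = [(sum(A[i][j] * t[j] for j in range(n)) + e[i]) % q for i in range(m)]
    Ap = [A[i] + [c[i]] for i in range(m)]    # roll c into A: m x (n+1)
    s = [(-x) % q for x in t] + [1]           # Ap * s = e, which is small
    return Ap, powers_of_two(s)               # public key, secret key v

def encrypt(Ap, mu):        # C = Flatten(mu * I_N + BitDecomp(R * Ap))
    R = [[random.randrange(2) for _ in range(m)] for _ in range(N)]
    C = []
    for i in range(N):
        RA_i = [sum(R[i][l] * Ap[l][j] for l in range(m)) % q
                for j in range(n + 1)]
        row = bit_decomp(RA_i)
        row[i] += mu                          # Flatten copes with entry 2
        C.append(flatten(row))
    return C

def decrypt(C, v):
    i = n * k + (k - 1)     # v[i] = 2^(k-1) = q/2, a big coefficient of v
    x = sum(C[i][j] * v[j] for j in range(N)) % q
    x = x - q if x > q // 2 else x            # centre around zero
    return 1 if abs(x) > q // 4 else 0

def nand(C1, C2):           # C3 = Flatten(I_N - C1 * C2)
    C3 = []
    for i in range(N):
        row = [(-sum(C1[i][l] * C2[l][j] for l in range(N))) % q
               for j in range(N)]
        row[i] = (row[i] + 1) % q
        C3.append(flatten(row))
    return C3
```

With Ap, v = keygen(), decrypt(encrypt(Ap, b), v) recovers the bit b, and applying nand to two fresh ciphertexts decrypts to the NAND of their bits, since the resulting noise stays well below q/4.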

You promised 3rd Generation

So far, this scheme is not more efficient than previous ring-based schemes such as BGV, even asymptotically. However, an observation by Zvika Brakerski and Vinod Vaikuntanathan in Lattice-Based FHE as Secure as PKE changed this. This observation is that the order of multiplications matters. Let’s multiply four ciphertexts C_1, C_2, C_3, C_4.

The traditional approach would be:

  1. C_{12} = C_1 \times C_2,
  2. C_{34} = C_3 \times C_4
  3. C_{1234} = C_{12} \times C_{34}

In this approach the noise grows as follows:

  1. e_{12} = \mu _2 \cdot e_1 + C_1 \times e_2
  2. e_{34} = \mu _4 \cdot e_3 + C_3 \times e_4
  3. e_{1234} = \mu _{34} \cdot e_{12} + C_{12} \times e_{34}
  4. e_{1234} \leq e_1 + C_1 \times e_2 + C_{12} \times e_3 + C_{12} \times C_3 \times e_4
  5. e_{1234} \approx N^2 \cdot B

Note the C_{12} \times C_3 factor. That is, for \ell multiplications we get a noise of size \textrm{poly}(N)^{\log \ell}.

In contrast, consider this sequential approach:

  1. C_{34} = C_3 \times C_4,
  2. C_{234} = C_{34} \times C_2
  3. C_{1234} = C_{234} \times C_1

Under this order, the noise grows as:

  1. e_{34} = \mu _4 \cdot e_3 + C_3 \times e_4
  2. e_{234} = \mu _{2} \cdot e_{34} + C_{34} \times e_{2}
  3. e_{234} = \mu _{2} \cdot (\mu _4 \cdot e_3 + C_3 \times e_4) + C_{34} \times e_{2}
  4. e_{234} \leq e_3 + C_3 \times e_4 + C_{34} \times e_2
  5. e_{1234} = \mu _1 \cdot e_{234} + C_{234} \times e_1
  6. e_{1234} \leq e_3 + C_3 \times e_4 + C_{34} \times e_2 + C_{234} \times e_1
  7. e_{1234} \approx 4N \cdot B

Note that each 0,1 matrix is multiplied by some e_i only. Hence, here the noise grows linearly in the number of multiplications, i.e. as \ell \cdot \textrm{poly}(N).
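The two orderings can be sketched with simplified noise-bound recurrences (constants elided, parameters made up):

```python
N, B = 2**10, 1   # dimension and fresh-noise bound, both made up

def balanced(d):
    # multiply 2**d ciphertexts as a tree:
    # per level e' <= e + N*e = (N + 1)*e, since a flattened 0/1 matrix
    # of dimension N multiplies an accumulated error vector
    e = B
    for _ in range(d):
        e = (N + 1) * e
    return e

def sequential(ell):
    # multiply ell ciphertexts as a chain:
    # per step e' <= e + N*B, the 0/1 matrix only ever hits fresh noise
    e = B
    for _ in range(ell - 1):
        e = e + N * B
    return e
```

Multiplying 1024 ciphertexts, the chain ends with noise around 2^{20} \cdot B while the tree ends near N^{10} \cdot B \approx 2^{100} \cdot B.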

Footnotes:

1

I suggest people start writing Boom instead of qed.