## NTT Considered Harmful?

In a typical Ring-LWE-based public-key encryption scheme, Alice publishes $(a, b) = (a, a \cdot s + e) \in \mathbb{Z}_q[x]/(x^n+1)$

(with $n$ a power of two1) as the public key, where $s, e$ are both “small” and secret. To encrypt, Bob computes $(c_{0}, c_{1}) = (v \cdot a + e', v \cdot b + e'' + \textnormal{Encode}(m))$

where $v, e', e''$ are small, $m$ is the message $\in \{0,1\}^n$ and $\textnormal{Encode}(\cdot)$ some encoding function, e.g. $\sum_{i=0}^{n-1} \lfloor \frac{q}{2} \rfloor m_i x^i$ . To decrypt, Alice computes $c_{0} \cdot s - c_{1} = (v \cdot a + e')\cdot s - v \cdot (a\cdot s + e) + e'' + \textnormal{Encode}(m),$

which is equal to $e' \cdot s - v \cdot e + e'' + \textnormal{Encode}(m)$. Finally, Alice recovers $m$ from the noisy encoding of $m$ where $e' \cdot s - v \cdot e + e''$ is the noise. In the Module-LWE variant the elements essentially live in $\left(\mathbb{Z}_q[x]/(x^n+1)\right)^k$, e.g. $a$ is not a polynomial but a vector of polynomials.

Thus, both encryption and decryption involve polynomial multiplication modulo $x^n+1$. Using schoolbook multiplication this costs $\mathcal{O}(n^2)$ operations. However, when selecting parameters for Ring-LWE, we can choose $q \equiv 1 \bmod 2n$ which permits to use an NTT to realise this multiplication (we require $\equiv \bmod 2n$ to use the negacyclic NTT which has modular reductions modulo $x^n+1$ baked in). Then, using the NTT we can implement multiplication by

1. evaluation (perform NTT),
2. pointwise multiplication,
3. interpolation (perform inverse NTT).

Steps (1) and (3) take $\mathcal{O}(n \log n)$ operations by using specially chosen evaluation points (roots of one). Step (2) costs $\mathcal{O}(n)$ operations.

This is trick is very popular. For example, many (but not all!) Ring-LWE based schemes submitted to the NIST PQC competition process use it, namely NewHope, LIMA (go LIMA!), LAC, KCL, HILA5, R.EMBLEM, Ding Key-Exchange, CRYSTALS-KYBER, CRYSTALS-DILITHIUM (sorry, if I forgot one). Note that since steps (1) and (3) are the expensive steps, it makes sense to remain in the NTT domain (i.e. after applying the NTT) and only to convert back at the very end. For example, it is faster for Alice to store $s, e$ in NTT domain and, since the NTT maps uniform to uniform, to sample $a$ in NTT domain directly, i.e. to just assume that a random vector $a$ is already the output of an NTT on some other random vector.

This post is about two recent results I was involved in suggesting that this is not necessarily always the best choice (depending on your priorities.)

Warning: This is going to be one of those clickbait-y pieces where the article doesn’t live up to the promise in the headline. The NTT is fine. Some of my best friends use the NTT. In fact I’ve implemented and used the NTT myself.

## Large Modulus Ring-LWE and Module-LWE

Our paper Large Modulus Ring-LWE ≥ Module-LWE — together with Amit Deo — was accepted at AsiaCrypt 2017. Here’s the abstract:

We present a reduction from the module learning with errors problem (MLWE) in dimension $d$ and with modulus $q$ to the ring learning with errors problem (RLWE) with modulus $q^{d}$. Our reduction increases the LWE error rate $\alpha$ by a quadratic factor in the ring dimension $n$ and a square root in the module rank $d$ for power-of-two cyclotomics. Since, on the other hand, MLWE is at least as hard as RLWE, we conclude that the two problems are polynomial-time equivalent. As a corollary, we obtain that the RLWE instance described above is equivalent to solving lattice problems on module lattices. We also present a self reduction for RLWE in power-of-two cyclotomic rings that halves the dimension and squares the modulus while increasing the error rate by a similar factor as our MLWE to RLWE reduction. Our results suggest that when discussing hardness to drop the RLWE/MLWE distinction in favour of distinguishing problems by the module rank required to solve them.

Our reduction is an application of the main result from Classical Hardness of Learning with Errors in the context of MLWE. In its simplest form, that reduction proceeds from the observation that for $\mathbf{a}, \mathbf{s} \in \mathbb{Z}_{q}^{d}$ with $\mathbf{s}$ small it holds that $q^{d-1} \cdot \langle{\mathbf{a},\mathbf{s}}\rangle \approx \left(\sum_{i=0}^{d-1} q^{i} \cdot a_{i}\right) \cdot \left(\sum_{i=0}^{d-1} q^{d-i-1} \cdot s_{i}\right) \bmod q^{d} = \tilde{a} \cdot \tilde{s} \bmod q^{d}.$

Thus, if there exists an efficient algorithm solving the problem in $\mathbb{Z}_{q^d}$, we can use it to solve the problem in $\mathbb{Z}_{q}^d$.

In our paper, we essentially show that we can replace integers mod $q$ resp. $q^d$ with the ring of integers $R$ of a Cyclotomic field (considered mod $q$ resp. $q^d$). That is, we get the analogous reduction from $R_{q}^d$ (MLWE) to $R_{q^d}$ (RLWE). The bulk of our paper is concerned with making sure that the resulting error distribution is sound. This part differs from the Classical Hardness paper since our target distribution is in $R$ rather than $\mathbb{Z}$.

## Revisiting the Expected Cost of Solving uSVP and Applications to LWE

Our — together with Florian Göpfert, Fernando Virdia and Thomas Wunderer — paper Revisiting the Expected Cost of Solving uSVP and Applications to LWE is now available on ePrint. Here’s the abstract:

Reducing the Learning with Errors problem (LWE) to the Unique-SVP problem and then applying lattice reduction is a commonly relied-upon strategy for estimating the cost of solving LWE-based constructions. In the literature, two different conditions are formulated under which this strategy is successful. One, widely used, going back to Gama & Nguyen’s work on predicting lattice reduction (Eurocrypt 2008) and the other recently outlined by Alkim et al. (USENIX 2016). Since these two estimates predict significantly different costs for solving LWE parameter sets from the literature, we revisit the Unique-SVP strategy. We present empirical evidence from lattice-reduction experiments exhibiting a behaviour in line with the latter estimate. However, we also observe that in some situations lattice-reduction behaves somewhat better than expected from Alkim et al.’s work and explain this behaviour under standard assumptions. Finally, we show that the security estimates of some LWE-based constructions from the literature need to be revised and give refined expected solving costs.

Our work is essentially concerned with spelling out in more detail and experimentally verifying a prediction made in the New Hope paper on when lattice reduction successfully recovers an unusually short vector.

Denoting by $v$ the unusually short vector in some lattice $\Lambda$ of dimension $d$ (say, derived from some LWE instance using Kannan’s embedding), $\beta$ the block size used for the BKZ algorithm and $\delta_0$ the root-Hermite factor for $\beta$, then the New Hope paper predicts that $v$ can be found if $\sqrt{\beta/d} \|v\| \leq \delta_0^{2\beta-d} \, {\mathrm{Vol}(\Lambda)}^{1/d},$

under the assumption that the Geometric Series Assumption holds (until a projection of the unusually short vector is found).

The rationale is that this condition ensures that the projection of $v$ orthogonally to the first $d-\beta$ (Gram-Schmidt) vectors (denoted as $\pi_{d-\beta+1}(v)$) is shorter than the expectation for the $d-\beta+1$-th Gram-Schmidt vector $b_{d-\beta+1}^*$ under the GSA and thus would be found by the SVP oracle when called on the last block of size $\beta$. Hence, for any $\beta$ satisfying the above inequality, the actual behaviour would deviate from that predicted by the GSA. Finally, the argument can be completed by appealing to the intuition that a deviation from expected behaviour on random instances — such as the GSA — leads to a revelation of the underlying structural, secret information. In any event, such a deviation would already solve Decision-LWE.

In our work, we spell out this argument in more detail (e.g. how $v$ is recovered from $\pi_{d-\beta+1}(v)$) and throw 23k core hours at the problem of checking if the predicted behaviour, e.g. matches the observed behaviour, e.g. Just like for the above plots, the general answer is a clear “yes”.

## Pretty Pictures or GTFO

I forgot the most important bit. The behaviour of the BKZ algorithm on uSVP(-BDD) instances can be observed in this video.

You can observe the basis approaching the GSA until the SVP oracle finds the unusually short vector $\pi_{d-\beta+1}(v)$. From $\pi_{d-\beta+1}(v)$, $v$ is then immediately recovered using size reduction. The grey area is the currently worked on block. The notation in the legend isn’t consistent with the plots above or even internally ( $n$ v $d$), but the general idea should still be apparent. In case you’re wondering about the erratic behaviour of the tails (which occasionally goes all over the place), this is due to a bug in fpylll which has recently been fixed.