## NTT Considered Harmful?

In a typical Ring-LWE-based public-key encryption scheme, Alice publishes

$(a, b) = (a, a \cdot s + e) \in \mathbb{Z}_q[x]/(x^n+1)$

(with $n$ a power of two1) as the public key, where $s, e$ are both “small” and secret. To encrypt, Bob computes

$(c_{0}, c_{1}) = (v \cdot a + e', v \cdot b + e'' + \textnormal{Encode}(m))$

where $v, e', e''$ are small, $m$ is the message $\in \{0,1\}^n$ and $\textnormal{Encode}(\cdot)$ some encoding function, e.g. $\sum_{i=0}^{n-1} \lfloor \frac{q}{2} \rfloor m_i x^i$ . To decrypt, Alice computes

$c_{0} \cdot s - c_{1} = (v \cdot a + e')\cdot s - v \cdot (a\cdot s + e) + e'' + \textnormal{Encode}(m),$

which is equal to $e' \cdot s - v \cdot e + e'' + \textnormal{Encode}(m)$. Finally, Alice recovers $m$ from the noisy encoding of $m$ where $e' \cdot s - v \cdot e + e''$ is the noise. In the Module-LWE variant the elements essentially live in $\left(\mathbb{Z}_q[x]/(x^n+1)\right)^k$, e.g. $a$ is not a polynomial but a vector of polynomials.

Thus, both encryption and decryption involve polynomial multiplication modulo $x^n+1$. Using schoolbook multiplication this costs $\mathcal{O}(n^2)$ operations. However, when selecting parameters for Ring-LWE, we can choose $q \equiv 1 \bmod 2n$ which permits to use an NTT to realise this multiplication (we require $\equiv \bmod 2n$ to use the negacyclic NTT which has modular reductions modulo $x^n+1$ baked in). Then, using the NTT we can implement multiplication by

1. evaluation (perform NTT),
2. pointwise multiplication,
3. interpolation (perform inverse NTT).

Steps (1) and (3) take $\mathcal{O}(n \log n)$ operations by using specially chosen evaluation points (roots of one). Step (2) costs $\mathcal{O}(n)$ operations.

This is trick is very popular. For example, many (but not all!) Ring-LWE based schemes submitted to the NIST PQC competition process use it, namely NewHope, LIMA (go LIMA!), LAC, KCL, HILA5, R.EMBLEM, Ding Key-Exchange, CRYSTALS-KYBER, CRYSTALS-DILITHIUM (sorry, if I forgot one). Note that since steps (1) and (3) are the expensive steps, it makes sense to remain in the NTT domain (i.e. after applying the NTT) and only to convert back at the very end. For example, it is faster for Alice to store $s, e$ in NTT domain and, since the NTT maps uniform to uniform, to sample $a$ in NTT domain directly, i.e. to just assume that a random vector $a$ is already the output of an NTT on some other random vector.

This post is about two recent results I was involved in suggesting that this is not necessarily always the best choice (depending on your priorities.)

Warning: This is going to be one of those clickbait-y pieces where the article doesn’t live up to the promise in the headline. The NTT is fine. Some of my best friends use the NTT. In fact I’ve implemented and used the NTT myself.

Advertisements