Post-quantum oblivious PRFs from shallow PRFs and TFHE

We – Alex Davidson, Amit Deo, Daniel Gardham and me – have updated our pre-print Crypto Dark Matter on the Torus: Oblivious PRFs from shallow PRFs and TFHE. It has been around for a while, but I am now somewhat confident that we won’t squeeze more performance out of it for the time being, so this feels like the right time to blog about.

What is an OPRF and why should I care? Oblivious pseudorandom functions allow two parties to compute a pseudorandom function (PRF) z := F_k(x) together: a server supplying a key k and a user supplying a private input x. The server does not learn x or z and the user does not learn k. If the user can be convinced that z is correct (i.e. that evaluation is performed under the correct key) then the function is “verifiable oblivious” (VOPRF), otherwise it is only “oblivious” (OPRF). Both may be used in many cryptographic applications. Example applications include anonymous credentials (e.g. Cloudflare’s PrivacyPass), password-based key exchanges (e.g. OPAQUE) and Private Set Intersection (PSI) enabling e.g. privacy-preserving contact look-up on chat platforms.

Sounds good, seems solved! Despite the wide use of (V)OPRFs, most constructions are based on classical assumptions, such as Diffie-Hellman (DH), RSA or even pairing-based assumptions. Indeed, DH-based OPRFs are currently being standardised by the IETF. Their vulnerability to quantum adversaries makes it desirable to find post-quantum solutions, however, known candidates are much less efficient.

Oblivious PRF and FHE, I see where this is going … Indeed, given fully homomorphic encryption (FHE), there is a natural (P)OPRF candidate. The client FHE encrypts input x and sends it with tag t. The server then evaluates the PRF homomorphically or “blindly” using a key derived from t and its own secret key. Finally, the client decrypts the resulting ciphertext to obtain the PRF output. The first challenge with this approach is performance, PRFs tend to have sufficiently deep circuits that FHE schemes struggle to evaluate them efficiently. Even special purpose PRFs such as the LowMC construction require depth ten or more, making them somewhat impractical. More generally, in a binary circuit model we expect to require depth \Theta( \log \lambda) to obtain a PRF resisting attacks with complexity 2^{\Theta(\lambda)}.

Yet, if we expand our circuit model to arithmetic circuits with both mod p and mod q gates for p\neq q both primes, shallow proposals exist. The main proposal even has a kewl name: “Crypto Dark Matter PRF”. In particular, the (weak) PRF candidate is

z := \sum (\mathbf{A} \cdot \mathbf{x} \bmod 2) \bmod 3

where arithmetic operations are over the integers and \mathbf{A} is the secret key. That’s it! The same work also contains a proposal to “upgrade” this weak PRF, defined for uniformly random inputs \mathbf{x}, to a full PRF, taking any \mathbf{x}. Furthermore, the works already provide oblivious PRF candidates based on this PRF and MPC, but with non-optimal round complexity. Thus, a natural question to ask is if we can construct a round-optimal (or, 2 message) POPRF based on this PRF candidate using the FHE-based paradigm mentioned above.

So what did you actually do? We construct a novel POPRF from lattice assumptions and the “Crypto Dark Matter” PRF candidate in the random oracle model. At a conceptual level, our scheme exploits the alignment of this family of PRF candidates, relying on mixed modulus computations, and programmable bootstrapping in the torus fully homomorphic encryption scheme (TFHE). This allows us to construct an OPRF candidate using only one level of bootstrapping (the most expensive operation in a FHE computation). We also explore a cut-and-choose based strategy for adding verifiability to our OPRF.

Performance. For the core online OPRF functionality, we require amortised 10.0KB communication per evaluation and a one-time per-client setup communication of 2.5MB. I’d say his makes our OPRF practical size-wise. Client computation costs are also somewhat manageable but server computation costs are quite unattractive unless you’re willing to invest in some FHE co-processor. We have some early benchmarks (using tfhe-rs) running the server code in ~150ms on 64 cores. Let me stress that this does not account for “circuit privacy” which should add a factor of 5x to 10x (or the zk systems we need, but we assume those won’t add that much overhead in computation, we do include their sizes in the estimates above). Moreover, our relatively small sizes are the effect of aggressive packing, unpacked the key material should weight about ~2GB in RAM.

Implementation. We do have a somewhat complete implementation of our scheme, but it is in SageMath and thus extremely slow. I should mention, though, that this implementation, too, does not cover the zero-knowledge proof systems we rely on to achieve malicious security.

Leave a comment