@adlrocha - The “hidden” magic behind cryptographic accumulators

A new gadget for your crypto tool-belt

Originally published at BlockchainWorks. Read it also here!

Public blockchain networks and decentralized systems can often make the implementation of the most mundane things into a complete nightmare. You don’t just have to worry about making it to work, but about doing so in a secure, private, and verifiable manner. Fortunately, researchers are taking action and we are seeing the appearance of more and more cryptographic constructions to make our lives easier. You may have heard about Bloom Filters and Merkle Trees already (and if this is not the case, let me know in the comments so we can have follow-up publication describing these constructions), but did you know about the existence of cryptographic accumulators?

A new gadget for your crypto-toolbelt 🛠️

Cryptographic accumulators have a lot of interesting properties that can come pretty handy when developing decentralized systems. You can think of cryptographic accumulators as a hash function that works on sets. Hash functions take as input a message of any size, and outputs a fixed-size value. These deterministic functions are a really efficient way to generate proofs of integrity.

But what happens if what we want to validate is a proof of membership? Accumulators take as input a set of values of any size, and turn them into a single output of constant size. This constant-size value can be then used to prove if an item belongs to the set.

Let’s illustrate how their operation with an example: Alice has received a secret message from Charlie revealing his kind of pet (BTW, it’s a duck!). To certify that this message was successfully sent, Alice generates and publishes a proof of the message. At one point, Bob wants to verify if Alice indeed knows Charlie’s kind of pet. If the proof published by Alice is a hash of the full message, Alice needs to provide Bob with the full message in order for him to be able to validate that she knows the content of the message. 

Unfortunately, Charlie’s message to Alice not only revealed information about his kind of pet but also its name, and this is something Charlie didn’t want to publicly reveal. How can Alice show Bob that she knows that Charlie has a duck without leaking information about the duck’s name? Cryptographic accumulators to the rescue! 

Instead of using a hash function to build the public proof of the message, Alice can use an accumulator. As Bob already knows about Charlie’s duck, it can directly use the accumulator’s proof to validate that the word “duck” was originally in the message. Here Bob is validating a proof of membership over the accumulator.

This was an example of the use of cryptographic accumulators on a simple commit-reveal protocol. But as shown in this paper, accumulators can be used for a great gamut of other applications: time stamping purposes, membership testing, distributed signatures, accountable certificate management, authenticated dictionaries, homomorphic signatures, authenticated data structures, group signatures, privacy-preserving data outsourcing, etc. In short, it seems something worth learning about.

Accumulators for everyone!

We’ve seen how accumulators allow parties to prove if an element x is in a set S or not, regardless of the number of elements in S. Actually, in our toy example from above, I hid some of the details of how accumulators work. In order to prove the membership of x in S, it doesn’t suffice by generating the accumulator, a membership witness/proof also needs to be created so that when combined with x you get the set’s accumulator. In the end, accumulators are fixed-length digests that can be used to generate proofs of membership (or non-membership). 

According to their construction, accumulators can be dynamic (they allow elements to be efficiently added and removed from the set represented by the accumulator), they can support exclusively membership proofs, or they can support both, membership and non-membership proofs (universal accumulators). 

As shown in this paper, Accumulator constructions fall into the following categories: unknown-order e.g. [Sander97, BP97, BM93, CL02, LLX07, Lip12, BCDLRSY17, BBF18, OWWB19, DGS20], Bilinear Maps [Nguyen05, ATSM09, CKS08, DT08, Thakur19, VB20], and hash-based [CHKO08]. Each one offers tradeoffs between setup parameters, accumulator sizes, witness and proof performance and space.

RSA accumulators afford the most efficient tradeoff between the number of setup parameters, updating witness requirements, proof computation and succinctness. The accumulator value and witness are the size of an RSA modulus. When implemented, the modulus, accumulator value and witnesses are 256 bytes with 32 byte elements, and membership proofs are 800 bytes and only require 90 milliseconds(ms) to compute, uses 2KB of RAM and 30ms to verify. Witness updates can be computed in 5–10ms depending on the underlying system. Non-membership proofs require creating two proofs but the proofs can be computed in parallel which takes the same amount of time but twice the space with 1.6KB.” [source: https://blog.coinbase.com/zksnarks-and-cryptographic-accumulators-f840da0b61c6]

The math behind the magic 📐

Accumulators not only happen to be the most efficient and convenient, but they also are easier to explain in “plain English”, so let’s use them to understand the magic behind accummulators. 

Let’s consider a set with the following elements: {A, B, C, D} (Charlie’s message to Alice). The first thing we need to do to generate an RSA accumulator from this set is to transform each item into a prime number: {p1, p2, p3, p4}. The full set can therefore be uniquely represented by the multiplication of all the items in the set:

S = p1*p2*p3*p4. 

Let’s pick now a modulus N to generate a multiplicative group. N is the product of two large primes, and G will be the generator of the group (i.e. any other prime). For those of you who already know how RSA works all of this will sound familiar (RSA accumulators rely on the same hardness assumptions as RSA asymmetric encryption). 

Let’s choose now a set of values from the set u = {p1, p2} (Charlie’s pet name) that we want to keep secret, and r = {p3, p4} the items of the set that we don’t care about revealing (Charlie’s pet type). The acumulator for S is computed as: C = G ^ S % N (recall that S = p1*p2*p3*p4). C will be a number about as large as N (there you go, now we have the fixed-value that represents Alice’s accumulator, i.e. the commitment of Charlie’s message).

If we want to now build a proof of membership of the secret values over the set, we need to compute: P = G^U % N where U = p1*p2 (multiplication of the secret items). Then we can reveal r = {p3, p4} to Bob (though he may have knowledge of it already, because he already knows that Charlie has a duck). 

To validate this proof, Bob takes  r = {p3, p4} (Charlie’s pet type), and compute C’ = P^R % N where R = p3*p4 (check if Charlie revealed his pet type in the message he sent to Alice). Due to the properties of exponentiation in multiplicative groups, if the elements in {r} are in the original set (Charlie’s message), Bob will get:

C’ = C % N as: C’ = G^U^R = G^(U * R) = G ^ U = C % N.

Eureka! We built our first proof of membership using an RSA accumulator. With this, proofs of any of the elements of the set could be generated according to your needs.

Proofs of non-memberships are a bit trickier, but the idea behind them is to choose another set with elements that do not belong to our original set S, and use them to build a proof. If you want to go in depth into the internals of RSA accumulators, I highly recommend this paper.

Wait, there’s a catch!

Did you notice? N was chosen by the party generating the accumulator and the proof, but this value needs to also be known by the verifier for this scheme to work. Even more, the “proof generator” shouldn’t have any knowledge about the factors of N as this would allow her to forge fake (but seemingly valid) proofs. How can this be fixed? One option would be to run an interactive protocol in which the verifier chooses N and share it with the proof generator, forcing her to build the proofs with that N. 

Another option is to publicly share with all entities of the system the N to be used to generate all the proofs in the system. If we choose to publish this N we must ensure that it is sufficiently safe to prevent anyone from being able to decompose N into its prime factors. Fortunately, generating secure enough Ns is something that we’ve known how to do for a long time.

Time to play (and code)! 👩‍💻

I get it! Now you are dying to start testing the power of accumulators in your own application. If this is the case, here are a few examples of accumulator implementations in C++ and Rust

If, on the other hand, you are more of a blockchain folk, EIP-198 included modular exponentiation in Ethereum, so you already have all the tools you need to implement your own RSA accumulator verifier and embedded it in a smart contract in Ethereum. If this is something you may fancy, take a look here for some inspiration on how to do this. 

And that’s all for me (at least for now), ping me with requests for my new publication. See you next week!

Publication originally written for BlockchainWorks. If you want to see one of my publications premiered in your site (before making it to the newsletter), or you want to be featured in this newsletter with one or your pieces, do not hesistate to get in touch with me! 📲

PS: Thank you Porçu for your feedback in the correctness of the math behind this publication. They've been conveniently addressed 🙏