Trustless SMT accumulator
March 11, 2025 · View on GitHub
This is the base data structure for implementing Unicity's aggregation layer.
By using an appropriate cryptographic SNARK, the size of proof can be reduced to constant size. Proving time depends on the logarithm of capacity and max. addition batch size.
Definitions
Trustless append only accumulator is consistent, if during insertion of a batch of updates there were no changes or deletions of existing leaves. The data structure implements other usual functions like inclusion proofs and non-inclusion proofs.
The size of consistency proof depends on the size of addition batch and logarithm of capacity. If we denote batch size as and depth , then size of consistency proof is , where .
By using a cryptographic SNARK (a zero knowledge proof with certain properties), the size of consistency proof can be further reduced to constant size.
After every addition batch, the root of aggregation layer is certified by the BFT Finality Gadget, ensuring its uniqueness and immutability. This provides a useful trust anchor for consistency proofs, inclusion proofs, and non-inclusion proofs.
Proof of Consistency
We have th batch of insertions , where is an inserted item; all executed during a round of operation. Root hash before the round is , and after the round is . The accumulator is implemented as a Sparse Merkle Tree (SMT).
The consistency proof generation for batch works as follows:
- Insert the new SMT leaves in ,
- Starting from the newly inserted leaves, for each sibling hash necessary to compute the root of the tree, we record sibling's path and sibling's value as the proof. Let's denote the set as .
- Record .
Proof verification works as follows:
- authenticate
- Build an incomplete SMT tree: for each item in , we insert the value of empty leaf at appropriate position,
- All necessary siblings necessary to compute the root are available in . Compute the root, compare with , if not equal then proof is not valid.
- Build again an incomplete SMT tree, for each item in , we insert the value of each key into appropriate position.
- Compute the root based on siblings in . If root is not equal to then proof is not valid.
- Proof is valid if the checks above passed.
This shows that given authentic , the keys in were empty before the insertion batch, and after execution of insertion batch the values in were recorded at the positions defined by respective keys, and there were no other changes.
SNARK based Proof of Consistency
Statement to be proved is the verification algorithm sketched above. Instance is defined by the root of trust and insertions, . Witness is the secret in zero knowledge, but for our use-case, it is not necessary to keep the witness secret.
The statement is implemented as a constraint system using the CIRCOM domain specific language. The witness is generated based on , and supplemented by control wires defining how individual hashing blocks in the circuit are connected to the previous layer and to the inputs. If all constraints are satisfied, then the proof is valid.
The proving backend is Groth 2016\footnote{\url{https://eprint.iacr.org/2016/260}} with conveniently small proof size. The proving time depends on the depth of SMT and the maximum size of the insertion batch. Importantly, the proving effort does not depend on the total size/capacity of SMT, enabling fairly large instantiations.
If the layer above verifies proofs of consistency, then Unicity aggregation layer is trustless. Still, some redundancy (at least one node able to persist the data structure) is required for data availability.
Circuit Design
Preprocess the proof:
- flatten the SMT (hash forest containing proof + addition batch) to the left (root is up)
- sort by layers, leaves first, and then lexicographically
- add 'wiring' signal fed into muxes before each hasher node.
Let's denote maximum batch size , SMT depth .
Circuit has two halves, both controlled by the same wiring signal. First half connects all insertion batch indices to zero (the empty element), second half to actual input values in the batch. First half computes to the root hash before insertion batch. Second half -- after.
Sise of each half is
Every following hashing layer connects inputs either to an output from previous layer, or element from proof.
Wiring signal is a pre-computed part of witness and does not have to be public. This is a lots of wires, let's see the effect on proving time.

Each cell above is implemented as a template with 2 muxes and a 2:1 hasher:

The leaf layer, first half mux inputs are connected to a vector with
- 'empty' leaf ($0$)
- all the new leaves in the batch are connected to 'empty' ($0$)
- 'proof' or sibling hashes ()
The leaf layer, second half mux inputs are connected to a vector with
- 'empty' leaf ($0$)
- batch of new leaves ()
- identical 'proof' or sibling hashes ()
Internal layers' muxes are connected to a vector a with
- 'empty' leaf
- previous layer cell output hashes,
- 'proof' or sibling hashes ()
Both halves' muxes are controlled by the same wiring signal. The positions of batch elements and proof elements are encoded into the control wires during the pre-processing, that is, control signals are part of the witness.
Walkthrough
Install circom, snarkjs, optionally rapidsnark
See https://docs.circom.io/getting-started/installation/
Generate the circuit
circom ndproof.circom --O2 --r1cs --wasm --c
Note
generated c code works on amd64 only; wasm based witness generation is slow!
Note
generated wasm code works up to a certain circuit size
Perform the trusted setup ceremony (depends on circuit)
See https://docs.circom.io/getting-started/proving-circuits/
Note
the degree 12 there is ok for trivial circuits. $2^{\mathsf{degree}}$ should be larger than the number of constraints, as output by the next command.
Example for ~5 million constraints. Appropriate to run over a weekend.
export NODE_OPTIONS="--max-old-space-size=8192"
snarkjs powersoftau new bn128 23 pot23_0000.ptau -v
snarkjs powersoftau contribute pot23_0000.ptau pot23_0001.ptau --name="First contribution" -v -e="blaah"
snarkjs powersoftau prepare phase2 pot23_0001.ptau pot23_final.ptau -v
# following is circuit specific
snarkjs groth16 setup ndproof.r1cs pot23_final.ptau ndproof_0000.zkey
snarkjs zkey contribute ndproof_0000.zkey ndproof_0001.zkey --name="1st Contributor Name" -v -e="blablaah"
snarkjs zkey export verificationkey ndproof_0001.zkey verification_key.json
Produce input
pip3 install -r reqirements.txt
python3 ndsmt.py > input.json
Witness generation
snarkjs wtns calculate ndproof_js/ndproof.wasm input.json witness.wtns
or alternatively (x86 only):
cd ndproof_cpp
make
./ndproof ../input.json ../witness.wtns
Prove
# increase nodejs limits for large circuits
export NODE_OPTIONS="--max-old-space-size=8192"
snarkjs groth16 prove ndproof_0001.zkey witness.wtns proof.json public.json
Prove using rapidsnark
prover ndproof_0001.zkey witness.wtns proof.json public.json
Verify
snarkjs groth16 verify verification_key.json public.json proof.json
Optimization ideas
- Special mux with 2 outs and multiplexed control (minor effect on number of wires, removed)
- Quin Selector instead of circomlib's mux (negative effect, removed)
- multiplex control wire signals within a layer (TBD the effect: removes wires, but adds gates)
- Unlike depicted above, layers close to the root have $1, 2, 4, 8, \dots, k_{max}$ cells
- It is possible to pack in more inputs than if some inputs share hashing steps at the leaf layer (ie, are connected to the same cell), that is, dynamic batch size to fully fill the width of circuit (up to batch generator)
- Reduce depth (), ie, use indexed Merkle tree with fixed max. capacity instead of complete SMT
- Greater arity than 2?
- Remove the special hashing rule h(0, 0) -> 0 -- then at each non-leaf layer, "empty" element is not zero and has to be hardcoded
- Split up the input proof: left half, right half, maybe even/odd layers, etc; so that less wide muxes can be used. If the tree is well populated then proofs gets larger, than currently configured.
License
MIT