zk-SNARKs Explained with Bellman
Zero knowledge proofs (ZKP) is one of the the most exciting cryptographic inventions since Public-key cryptography. zk-SNARK is a specific type of ZKP which allows one party (the prover) to convince other parties (the verifiers) that the prover faithfully executed a program with some secret information, without conveying any knowledge about the secret information itself. zk-SNARK also generates succinct (hundreds of bytes) proofs that can be verified within the range of miliseconds even though the original computation might be much larger, which means that other than the obvious privacy benefits, zk-SNARK can also be used to trustlessly outsource expensive computation.
To apply zk-SNARK, a computation needs to be expressed in terms of algebraic circuit, converted to a constraint system called R1CS, and then finally a form called “quadratic arithmetic program” (QAP). The intuition of going through this complex transformation is that:
- A computation can be viewed as series of constraints from the inputs to the outputs, but just expressing a computation in terms of constraints neither make the computation more “succinct” nor make it easier to apply cryptographic algorithms.
- QAP in a way “compresses” those constraints into one using polynomials, a form which is not only more efficient to verify but also more suitable to apply cryptographic protocols such as Pinocchio or Groth16 to achieve soundness, completeness and zero-knowledgeness.
Steps of transformation for zk-SNARK, drawn by Eran Tromer
For more detailed explanations of the QAP transformation, please take a look at
Why and How zk-SNARK works by
Maksym Petkus or Quadratic Arithmetic Programs: from Zero to
Hero
by Vitalik Buterin. To use an example in Vitalik’s article, a simple
program x³ + x + 5 == 35
can be converted into the following QAP.
QAP for x³ + x + 5 == 35, as in Vitalik’s Quadratic Arithmetic Programs: from Zero to
Hero
This article assumes that the readers understand the process of QAP
transformation, specifically why proving A(τ)*B(τ)-C(τ) = H(τ)*Z(τ)
is important in terms of completenss and soundness. It also assumes
some high level understanding of the elliptic curve
paring. The
purpose of the article is to explain how zk-SNARK works by walking
through how it is implemented in
Bellman (0.7.0), a rust library
used by the first widespread zk-SNARK application
ZCash. Bellman currently implements the
Groth16 proving system, which
can be roughly divided into three parts: Trusted setup, Prover
and Verifier. This article discusses each part in details.
Trusted setup
Trusted setup is a process to create what’s called “Common Reference String” (CRS) between provers and verifiers. CRS are basically encrypted secrets that are required by both provers and verifiers to run the cryptographic protocols (e.g. Groth16) in zk-SNARK. Verifiers need to trust that the original secrets (a.k.a toxic waste) are destroyed after the setup process, otherwise the entire proof system breaks down.
First, let’s define some parameters for a computation. We can use m
to denote its number of variables, l
of which are public. Take x³ +
x + 5 = 35
as an example where the solution 3 is the secret. This
program can be broken up into the following 3 constraints in the form
of A * B = C
:
There are 4 variables in total (i.e. x
, x_squared
, x_cubed
and
out
), among which out
has a publicly known value of 35. Therefore
m
and l
should be 4 and 1 respectively.
Another important parameter for the computation is the number of
constraints n
, which determines the target polynomial Z(x)
. In the
above example, the value of n
should be 3.
Assuming that the program is converted to the QAP form, which means that
we know the variable polynomials Aᵢ(x)
, Bᵢ(x)
and
Cᵢ(x)
. If the prover knows the value of all variables wᵢ
(both the public ones and secret ones), then ultimately (s)he wants to prove the
following statement in such a way that reveals nothing about the secret
portion of wᵢ:
Groth16 protocol is way to achieve that. It requires two
paring friendly curves: G1
, G2
(with the paring domain
GT
). During the trusted setup, the first thing it does is to select
one random point on each curve: g1
and g2
.
Next, it generates the following 5 random field elements: α
(alpha),
β
(beta), γ
(gamma), δ
(delta) and τ
(tau). τ
is the secret
value at which all the polynomials are evaluated later on. With α
and β
, it also defines a polynomial Lᵢ(x)
:
With the information created so far, Groth16 generates the
following G1
and G2
points (values encrypted using g1
or
g2
) for the prover:
For the verifier, the following points on G1
and G2
are created,
along with a pre-computed value g₁ᵅ * g₂ᵝ
using elliptic curve
paring.
Including g₁ᵅ * g₂ᵝ in CRS improves the verification performance since
it is needed every time a proof is verified.
After all these information is created, it is mandatory that random
field elements α
, β
, γ
, δ
and τ
(a.k.a toxic waste) are
destroyed. In the Prover and Verifier section, we will discuss
how these information is used by both the prover and verifier in such
a way that the prover can convince the verifier that A(x) * B(x) - C(x) = H(x) *
Z(x)
is valid without revealing any secrets. For now let’s take a look
at how trusted setup is implemented in Bellman.
Implementation
The code that implements trusted setup in Bellman is located in
generator.rs,
specifically the
generate_random_parameters
function. The code that
generates
g1
, g2
, α
, β
, γ
, δ
and τ
is pretty straightforward.
In Bellman, a computation is expressed in “circuits” using the
Circuit
and
ConstraintSystem
abstraction. For x³ + x + 5 = 35
, which has the following
constraints:
We can express the 2nd constraint x_squared * x = x_cubed
in bellman like this:
cs
in the example above is an instance of a
ConstraintSystem. During
the trusted setup, circuits are converted to QAP using the
KeypairAssembly
ConstraintSystem.
When
alloc_input
or
alloc
is called to allocate a variable, an empty vector is pushed to
at_inputs
, bt_inputs
, ct_inputs
or at_aux
, at_aux
, at_aux
respectively. The index of the empty vector is used to keep track of
the newly allocated variables.
These empty vectors is used to store a sequence of (coefficient,
constraint_number)
tuple for each variable, basically
where (which constraint) and how (what coefficient) a particular
variable is used in the
circuit. enforce
function ensures that a sequence of these tuples is stored correctly for
each variable.
After the circuit is
synthesize
with KeypairAssembly (basically all the alloc
, alloc_input
and enforce
in
the circuit are executed), what we end up with is
essentially the lagrange representation of the public variable
polynomials stored in at_inputs
, bt_inputs
, ct_inputs
, and
secret variable polynomials in at_aux
, bt_aux
, ct_aux
. We now
have Aᵢ(x)
, Bᵢ(x)
and Cᵢ(x)
!
Recall that we also need τⁱ (i in 0..n-1) as both G1 and G2 points in
the trusted setup. This is called powers of tau, which is important
for evaluating polynomials blindly at value τ. Further more, we need
τⁱZ(τ)/δ
as well where Z(x)
is the target polynomial.
This is how powers of tau is calculated in bellman:
Z(τ)/δ
is
calculated
with
and coeff
is later on
multiplied
with powers of tau to get τⁱZ(τ)/δ
.
The last values we want to create during the trusted setup is Lᵢ(τ)/δ
for public variables and Lᵢ(τ)/γ
for secret variabes where
Lᵢ(x) = β * Aᵢ(x) + α * Bᵢ(x) + Cᵢ(x)
as we discussed
earlier. Since we already know α
, β
and Aᵢ(x)
, Bᵢ(x)
, Cᵢ(x)
at this point, we basically need to evaluate Lᵢ(x)
polynomial at the
point of τ
. This is where powers of tau (τⁱ) comes in handy. Let’s
say if Lᵢ(x)
is in the form of a*xᵘ+ b*xᵛ + c*xʷ
, we just need to
find the corresponding τᵘ, τᵛ and τʷ to calculate Lᵢ(τ) = a*τᵘ+
b*τᵛ + c*τʷ
. Since i
in τⁱ
is determined by the number of
constraints, we should have all the τⁱ
needed for the evaluation.
Here
is how Lᵢ(τ)/δ
is calculated at τ
in Bellman, specifically:
As we can see, in order to evaluate Lᵢ(τ)
, we need to evaluate
Aᵢ(τ)
, Bᵢ(τ)
and Cᵢ(τ)
as well. If we think about it, Lᵢ(τ)
,
Aᵢ(τ)
, Bᵢ(τ)
and Cᵢ(τ)
are the lagrange representation of
L(x)
, A(x)
, B(x)
and C(x)
. Prover only needs the secret
variable portion of the Lᵢ(τ)
, verifier only needs the public
variable portion of the Lᵢ(τ)
. Here is the
VerifyingKey
which is created
at the end of the trusted setup for the verifier:
Finally at this point, we have created all the parameters we need, here are all the Parameters that the trusted setup process returns:
Prover
The prover knows the value of all m
variables in the computation,
both public input variables (l
in total) and secret auxiliary
variables (m-l
in total). These variables can be represented by
wᵢ
. When i
between 0
to l
, wᵢ
represents input (public)
variables. When i
is between l+1
to m
, wᵢ
represents auxiliary
(secret) variables. With this knowledge, the prover wants to
construct three values: Aₚ, Bₚ and Cₚ, as defined below:
r
and s
are randomly generated field elements by the prover.
With Aₚ, Bₚ and Cₚ, the verifier can somehow verify that the statement
A(τ)*B(τ) - C(τ) = H(τ)*Z(τ)
is valid and therefore confirm that the
prover had executed the computation faithfully without revealing the
auxiliary variables. We will break down how this is achieved later
when we discuss verifiers. For now, let’s focus on how the prover
constructs Aₚ, Bₚ and Cₚ.
From the trusted setup, the prover knows the G1
elements α
, δ
and
G2
elements β
, δ
. The prover also knows the “computation”, which is
expressed as R1CS circuit and then converted to Quadratic Arithmetic
Program (QAP).
Concretely, the QAP program is expressed using A
, B
and C
where
Also remember that L
and H
are defined in terms of A
, B
, C
and T
, as follows
Knowing the value of all variables wᵢ
, the prover has everything
needed to construct Aₚ, Bₚ and Cₚ.
Implementation
The prover code in bellman is in located in prover.rs, specifically the create_proof function.
First of all, the circuit is synthesized using the ProvingAssignment data structure, which is also a type of ConstraintSystem.
When
alloc_input
or
alloc
is called, the actual value of the variable is pushed to
input_assignment
and aux_assignment
vector respectively. Variables
are tracked by their position (index) in those vectors.
When
enforce
is called, it evaluates A
, B
and C
for each operation (in the
form of A*B=C
) using the variable values in input_assignment
and aux_assignment
. The evaluation results are pushed to a
, b
and c
vectors respectively. After the entire circuit is
synthesized,
a
, b
and c
essentially becomes the lagrange representation of
A(x)
, B(x)
and C(x)
.
Aₚ is relatively easy to create, it is equal to α + A(τ) + r*δ
(prover:304):
Bₚ is computed in a very similiar
way. What
is more interesting is Cₚ since it involves a more complex
H(τ)*(Z(τ)/δ)
expression. Recall that H(τ) is basically
(A(t) * B(t) - C(t)) / Z(t)
and Z(x)
is the target polynomial that
is publicly known. Here is how H(τ)*(Z(τ)/δ)
gets
computed
(represented by h
in the code):
Now the prover knows H(τ)*(Z(τ)/δ), it becomes a bit easier to computed Cₚ:
As we can see in the code, Cₚ is actually computed as
L_aux(τ)/δ + H(τ)*(Z(τ)/δ) + r*B(τ) + s*A(τ) + r*s*δ + s*α + r*β
,
which is in fact equivalent to
L_aux(τ)/δ + H(τ)*(Z(τ)/δ) + s*Aₚ + r*Bₚ - r*s*δ
. Here is why:
At this point, Aₚ, Bₚ and Cₚ are computed and prover completes the proof.
Verifier
The goal of the verifier is to take the proof from the prover, i.e. Aₚ, Bₚ and Cₚ, and verify that the following equation holds:
To see what exactly does it mean if this equation holds, we can try to expand on both the left hand side (LHS) and the right hand side (RHS) of the equation.
LHS
Aₚ*Bₚ
is equivalent to A(τ)*B(τ) + REM
:
Click here to see a more detailed deduction
RHS
α*β + (L_input/γ)*γ + Cₚ*δ
is equivalent to C(τ) + H(τ)*Z(τ) + REM
:
Click to see a more detailed deduction
If we can prove LHS = RHS
, then after removing REM
on both sides,
we also proved that A(τ)*B(τ)-C(τ) = H(τ)*Z(τ)
, which is the end
goal of the verifier! Once this is valid, the verifier is confident
that the program is executed faithfully with correct values of the
auxiliary variables.
Implementation
The verification code in Bellman is located at
verifier.rs. As
explained by the comments, the original equation
Aₚ*Bₚ = α*β + (L_input(τ)/γ)*γ + Cₚ*δ
is converted to
Aₚ*Bₚ + (L_input(τ)/γ)*(-γ) + Cₚ*(-δ) = α*β
, which only requires one
final exponentiation (verifier.rs:44).
The E::final_exponentiation
and E::miller_loop
functions are two
steps needed to compute the elliptic curve
paring.
The paring of α*β
(pvk.alpha_g1_beta_g2
) is already pre-computed in
the setup phase.
Summary
In this article, we walked through how zk-SNARK with Groth16 protocol is implemented in Bellman, which actually powers a nearly 1 billion dollar cryptocurrency (at the time of writing). Instead of just relying on government regulations and the mercy of big internet companies, I hope zk-SNARK and ZKP in general can be part of the solution for our privacy issues in the future.