Plonky = Plonk + Halo + Rescue

mutourend

Article link: https://blog.csdn.net/mutourend/article/details/115435149

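1. Introduction

Plonky is implemented by the Mir Protocol team. The code repository is:

https://github.com/mir-protocol/plonky

Plonky is the Mir team's prototype of a fast recursive argument built on "Plonk + Halo + Rescue". Its core components are:

Plonk's permutation argument. (It requires opening several polynomial commitments at a challenge point $x$, and opening one of them at an additional point $y$.)
Halo's polynomial commitment scheme. (Multiple openings can be reduced to a single opening, but the Verifier must perform many expensive multi-scalar multiplications. Borrowing from Plonk's circuit model, Plonky introduces an elliptic curve gate to address this.)
Rescue's high-arity circuit model. (This addresses another performance bottleneck in recursive circuits: generating Fiat-Shamir challenges.)

By combining the three, Plonky can recursively verify an argument using around $2^{14}$ gates. It is not yet ready for real use, but it builds recursive proofs in about 9 seconds on a 6-core laptop, which is encouraging performance.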

2. Batch opening polynomial commitments

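Verifying a Plonk argument requires opening several polynomial commitments at a challenge point $x$, and also opening one of them at an additional point $y$.
The Halo paper describes an algorithm that reduces multiple openings to a single opening, but it adds a substantial amount of Verifier computation for each opening. That extra Verifier cost should be avoided in practice where possible.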

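Plonky instead borrows the batch polynomial opening algorithm from Multipoint, multipolynomial batched openings from inner-product arguments:
Let $G$ be a sequence of random generators, $X=(x^0,x^1,x^2,\cdots)$ and $Y=(y^0,y^1,y^2,\cdots)$.

Prover: sends each polynomial commitment $c(f_i)=\langle f_i,G\rangle$, together with the claimed evaluations $z_i=\langle f_i,X\rangle$ and $w_i=\langle f_i,Y\rangle$.
Verifier: sends random challenges $\alpha,\beta\in\mathbb{F}$ to the Prover and computes $Z=\sum_i\alpha^iz_i$, $W=\sum_i\alpha^iw_i$ and $c(F)=\sum_i\alpha^ic(f_i)$, where $F=\sum_i\alpha^if_i$.
The Prover must now show that $F(x)=Z$ and $F(y)=W$. With negligible loss of soundness, this reduces to showing $\langle F,X+\beta Y\rangle=Z+\beta W$.
Prover: uses an inner product argument to prove $\langle F,X+\beta Y\rangle=Z+\beta W$.
Verifier: must compute $\langle s,G\rangle$ and $\langle s,X+\beta Y\rangle$, where $s$ is defined in the Halo paper. In effect, $s$ is built from the challenges of each round of the inner product argument.
$\langle s,G\rangle$ can be computed with the algorithm from Halo, while $\langle s,X+\beta Y\rangle=\langle s,X\rangle+\beta\langle s,Y\rangle=g(x,u_i)+\beta g(y,u_i)$, so this part reduces to evaluating the polynomial $g$ at $x$ and $y$.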
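The identities used in the steps above can be checked numerically. The following is a minimal sketch over a toy prime field, not Plonky's code: the field size, the number of polynomials, and the random values are all assumptions chosen for illustration, and the commitments $c(f_i)$ are left out because they need a real group. It also assumes that $g$ has the product form $g(X,u_1,\ldots,u_k)=\prod_{i}(u_i+u_i^{-1}X^{2^{i-1}})$ and that $s$ is its coefficient vector.

```python
import random

P = 2**31 - 1                      # toy prime field (assumption)
rng = random.Random(0)

def inner(a, b):
    """Inner product of two vectors over F_P."""
    return sum(ai * bi for ai, bi in zip(a, b)) % P

def powers(x, n):
    """The vector (x^0, x^1, ..., x^(n-1))."""
    return [pow(x, j, P) for j in range(n)]

K = 3
N = 2**K                           # coefficients per polynomial
polys = [[rng.randrange(P) for _ in range(N)] for _ in range(3)]
x, y, alpha, beta = (rng.randrange(1, P) for _ in range(4))
X, Y = powers(x, N), powers(y, N)

# Prover's claimed evaluations z_i = <f_i, X> and w_i = <f_i, Y>.
z = [inner(f, X) for f in polys]
w = [inner(f, Y) for f in polys]

# Verifier's random linear combinations.
Z = sum(pow(alpha, i, P) * zi for i, zi in enumerate(z)) % P
W = sum(pow(alpha, i, P) * wi for i, wi in enumerate(w)) % P
F = [sum(pow(alpha, i, P) * f[j] for i, f in enumerate(polys)) % P
     for j in range(N)]
combined = [(xj + beta * yj) % P for xj, yj in zip(X, Y)]   # X + beta * Y

def g(point, u):
    """Assumed product form of g, evaluated with K multiplications."""
    acc = 1
    for i, ui in enumerate(u):
        acc = acc * (ui + pow(ui, -1, P) * pow(point, 2**i, P)) % P
    return acc

# s is the coefficient vector of g, built by expanding the product.
u = [rng.randrange(1, P) for _ in range(K)]
s = [1]
for ui in u:
    ui_inv = pow(ui, -1, P)
    s = [c * ui % P for c in s] + [c * ui_inv % P for c in s]

batched_ok = inner(F, combined) == (Z + beta * W) % P
shortcut_ok = inner(s, combined) == (g(x, u) + beta * g(y, u)) % P
```

The point of the last check is cost: $\langle s,X+\beta Y\rangle$ has $2^k$ terms, while the two evaluations of $g$ need only about $k$ multiplications each.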

3. The bottleneck in Halo: curve multiplication

In Halo, when verifying a polynomial opening, the Verifier must compute:
$Q=\sum_{j=1}^{k}([u_j^2]L_j)+P'+\sum_{j=1}^{k}([u_j^{-2}]R_j)$
where $L_j,R_j$ are supplied by the Prover and $u_j\in\{0,1\}^{\lambda}$ are random challenges.

If each curve multiplication $[r]P$ is computed with the simple double-and-add algorithm, it takes up to $\lambda$ additions and $\lambda$ doublings.
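To make the operation count concrete, here is a minimal sketch of double-and-add that counts group operations. It uses the integers modulo 1009 under addition as a stand-in group, which is an assumption made purely for illustration. A real implementation works on elliptic curve points.

```python
MODULUS = 1009   # toy additive group Z_1009 (assumption, not a curve)

def double_and_add(r, p, bits):
    """Compute [r]P by scanning the scalar from its most significant bit."""
    acc, adds, doubles = 0, 0, 0
    for i in reversed(range(bits)):
        acc = (acc + acc) % MODULUS      # one doubling per scalar bit
        doubles += 1
        if (r >> i) & 1:
            acc = (acc + p) % MODULUS    # one addition per set bit
            adds += 1
    return acc, adds, doubles

LAMBDA = 8
result, adds, doubles = double_and_add(0b11111111, 5, LAMBDA)
```

With an all-ones scalar, every bit triggers an addition, which is the worst case of $\lambda$ additions and $\lambda$ doublings.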

Halo uses a similar algorithm but processes the scalar two bits at a time. Each step adds one of $\{P,-P,\phi(P),\phi(-P)\}$, where $\phi$ is an endomorphism that is cheap to compute. Compared with simple double-and-add, this reduces the work to $\lambda/2$ additions and $\lambda/2$ doublings.

The formula for $Q$ above can also be viewed as a multi-scalar multiplication (MSM) with $2k$ terms, followed by a final addition of $P'$. A simple, circuit-friendly MSM optimization is to perform the doublings for all terms at once (see simultaneous squaring). This lowers the cost to $k\lambda$ additions and $\lambda/2$ doublings, or $(k+1/2)\lambda$ group operations in total.
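The operation count above can be reproduced with a small model. This is a sketch under stated assumptions: the group is the integers modulo 1009 under addition, the endomorphism is modelled as multiplication by a cube root of unity `ZETA`, and the accumulator starts at zero. The names `msm_shared_doubling` and `effective_scalar` are hypothetical, and Halo's actual scalar encoding and initial accumulator differ in detail.

```python
import random

MODULUS = 1009                       # toy additive group (assumption)
ZETA = next(z for z in range(2, MODULUS) if pow(z, 3, MODULUS) == 1)

def phi(p):
    """Toy endomorphism: multiplication by a cube root of unity."""
    return (ZETA * p) % MODULUS

def msm_shared_doubling(points, bit_pairs):
    """bit_pairs[t][i] is (r_high, r_low) for term t at iteration i."""
    acc, adds, doubles = 0, 0, 0
    for i in range(len(bit_pairs[0])):
        acc = (acc + acc) % MODULUS          # one doubling shared by all terms
        doubles += 1
        for p, pairs in zip(points, bit_pairs):
            r_high, r_low = pairs[i]
            step = phi(p) if r_high else p   # conditionally apply phi
            step = step if r_low else -step  # conditionally negate
            acc = (acc + step) % MODULUS
            adds += 1
    return acc, adds, doubles

def effective_scalar(pairs):
    """Scalar encoded by a sequence of (r_high, r_low) pairs in this model."""
    scalar = 0
    for r_high, r_low in pairs:
        digit = (ZETA if r_high else 1) * (1 if r_low else -1)
        scalar = (2 * scalar + digit) % MODULUS
    return scalar

K, LAMBDA = 2, 8                     # 2K terms, LAMBDA-bit scalars
rng = random.Random(1)
points = [rng.randrange(1, MODULUS) for _ in range(2 * K)]
bit_pairs = [[(rng.randrange(2), rng.randrange(2)) for _ in range(LAMBDA // 2)]
             for _ in range(2 * K)]
acc, adds, doubles = msm_shared_doubling(points, bit_pairs)
expected = sum(effective_scalar(bp) * p for bp, p in zip(bit_pairs, points)) % MODULUS
```

Each of the $\lambda/2$ iterations performs one shared doubling and one addition per term, which gives $2k\cdot\lambda/2=k\lambda$ additions.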

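4. Generalizing Plonk's circuit model

The standard Plonk circuit model is:

Each gate has 3 wires: $a, b, c$.
Each gate is given a constraint of the form $q_La+q_Rb+q_Oc+q_Mab+q_C=0$, where the $q$ values configure the gate. For example, the multiplicative constraint $ab=c$ is obtained by setting $q_M=1,q_O=-1$ and all other $q$ values to 0.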
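As an illustration of how the selector values configure a gate, the sketch below evaluates the standard constraint for a few selector settings. The field modulus and the helper names (`gate_constraint`, `const_gate`) are assumptions made for the example.

```python
P = 2**31 - 1   # illustrative prime field (assumption)

def gate_constraint(q, a, b, c):
    """Evaluate q_L*a + q_R*b + q_O*c + q_M*a*b + q_C; a satisfied gate gives 0."""
    q_l, q_r, q_o, q_m, q_c = q
    return (q_l * a + q_r * b + q_o * c + q_m * a * b + q_c) % P

MUL_GATE = (0, 0, -1, 1, 0)    # a*b - c = 0
ADD_GATE = (1, 1, -1, 0, 0)    # a + b - c = 0

def const_gate(value):
    """Gate that forces wire a to equal a fixed value: a - value = 0."""
    return (1, 0, 0, 0, -value)

mul_ok = gate_constraint(MUL_GATE, 3, 4, 12)
add_ok = gate_constraint(ADD_GATE, 3, 4, 7)
const_ok = gate_constraint(const_gate(5), 5, 0, 0)
mul_bad = gate_constraint(MUL_GATE, 3, 4, 13)
```

The same wire layout serves every gate type. Only the selector tuple changes.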

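This circuit model is simple and friendly, but it leads to fairly large gate counts. A curve operation, for example, takes roughly 7 gates, depending on the curve and completeness assumptions. Fortunately, Plonk's basic scheme is very flexible, and the gate count can be reduced in several ways:

Method 1: Use higher-arity gates. To express one curve operation with a single gate, for example, the gate arity can be set to 6. Each gate becomes more expensive for the Prover, but the number of gates needed for that operation drops sharply.
Method 2: Do not restrict a gate to a single constraint. At first glance, adding constraints would require more FFTs from the Prover. However, let $d$ be the highest degree of any constraint. If every polynomial is extended to that maximum degree $d$, all constraint arithmetic can be done in point-value form with no additional FFTs.
Method 3: Not every wire needs to be routed. "Advice" wires are useful for claimed inverses or intermediate results. They do not affect the degree of Plonk's permutation argument, which usually contributes the highest-degree polynomial in a Plonk-based construction.
Method 4: Besides the challenge point $x$, Plonk also opens each polynomial at a "shifted" point $gx$, so constraints can operate on the wires of the "next gate" in addition to the "local gate". This is the main observation behind TurboPlonk. Building on it, Plonky adds extra shifted openings to reduce the number of wires "copied" through Plonk's permutation argument.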

5. Elliptic curve gates

In the MSM (multi-scalar multiplication) described above, most steps consist of conditionally negating a point, conditionally applying the endomorphism $\phi((x,y))=(\zeta x,y)$, and then adding the modified point to an accumulator. The whole step can be written as:
$x_1\leftarrow (1+(\zeta-1)r_{high})x,$
$y_1\leftarrow (2r_{low}-1)y,$
$(x_3,y_3)\leftarrow (x_1,y_1)+(x_2,y_2).$
Here $(x,y)$ is the point being multiplied, $(x_1,y_1)$ is the point to be added to the accumulator, $(x_2,y_2)$ is the old state of the accumulator, $(x_3,y_3)$ is the updated state, and $r_{high},r_{low}$ are two consecutive bits of the scalar.

5.1 Explicit formulas

For a Weierstrass curve in affine coordinates, incomplete addition (ignoring special cases such as $x_1=x_2$) is given by:
$inv=1/(x_1-x_2),$
$\lambda = (y_1-y_2)\cdot inv,$
$x_3=\lambda^2-x_1-x_2,$
$y_3=\lambda(x_1-x_3)-y_1.$

A simple way to arithmetize this computation is to introduce advice wires for intermediate results such as $\lambda$. Concretely, the gate can be defined as follows:

Routed wires: $x,y,x_2,y_2,x_3,y_3,r_{low},r_{high}$
Advice wires: $x_1,y_1,inv,\lambda$
Constraints:
$r_{low}(r_{low}-1)=0,$
$r_{high}(r_{high}-1)=0,$
$x_1=(1+(\zeta-1)r_{high})x,$
$y_1=(2r_{low}-1)y,$
$(x_1-x_2)\cdot inv=1,$
$\lambda = (y_1-y_2)\cdot inv,$
$x_3=\lambda^2-x_1-x_2,$
$y_3=\lambda(x_1-x_3)-y_1.$
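The gate definition above can be exercised on a small curve. The sketch below is illustrative only: the curve $y^2=x^3+7$ over $\mathbb{F}_{103}$ is a hypothetical choice (any prime $p\equiv 1 \pmod 3$ admits the cube root of unity $\zeta$), and the function names `witness` and `constraint_values` are not taken from Plonky. The Prover side fills in the advice wires, and the Verifier side evaluates the eight constraints, each of which must be zero.

```python
P = 103        # toy prime with P % 3 == 1, so a cube root of unity exists
B = 7          # hypothetical curve y^2 = x^3 + B over F_P
ZETA = next(z for z in range(2, P) if pow(z, 3, P) == 1)

def on_curve(x, y):
    return (y * y - x**3 - B) % P == 0

POINTS = [(x, y) for x in range(P) for y in range(P) if on_curve(x, y)]

def witness(x, y, x2, y2, r_low, r_high):
    """Prover side: compute advice wires and the updated accumulator."""
    x1 = (1 + (ZETA - 1) * r_high) * x % P    # apply phi iff r_high == 1
    y1 = (2 * r_low - 1) * y % P              # negate iff r_low == 0
    inv = pow((x1 - x2) % P, -1, P)           # fails if x1 == x2
    lam = (y1 - y2) * inv % P
    x3 = (lam * lam - x1 - x2) % P
    y3 = (lam * (x1 - x3) - y1) % P
    return dict(x=x, y=y, x2=x2, y2=y2, x3=x3, y3=y3, r_low=r_low,
                r_high=r_high, x1=x1, y1=y1, inv=inv, lam=lam)

def constraint_values(w):
    """Verifier side: every entry must be 0 for a valid gate."""
    return [
        w["r_low"] * (w["r_low"] - 1) % P,
        w["r_high"] * (w["r_high"] - 1) % P,
        (w["x1"] - (1 + (ZETA - 1) * w["r_high"]) * w["x"]) % P,
        (w["y1"] - (2 * w["r_low"] - 1) * w["y"]) % P,
        ((w["x1"] - w["x2"]) * w["inv"] - 1) % P,
        (w["lam"] - (w["y1"] - w["y2"]) * w["inv"]) % P,
        (w["x3"] - (w["lam"] ** 2 - w["x1"] - w["x2"])) % P,
        (w["y3"] - (w["lam"] * (w["x1"] - w["x3"]) - w["y1"])) % P,
    ]

# Pick a point and an accumulator whose x-coordinate cannot collide with x1.
x, y = POINTS[0]
x2, y2 = next(pt for pt in POINTS if pt[0] not in (x, ZETA * x % P))
w = witness(x, y, x2, y2, r_low=1, r_high=1)
all_satisfied = all(v == 0 for v in constraint_values(w))
```

If $x_1=x_2$, the modular inverse does not exist and `witness` raises an error, which mirrors the fact that the constraint $(x_1-x_2)\cdot inv=1$ cannot be satisfied in that case.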

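This may look like a lot of constraints, but low-degree constraints have little impact on performance, because they only need to be checked at a single challenge point.

5.2 Special cases

Section 5.1 covers the case $x_1\neq x_2$. For Halo's MSM (multi-scalar multiplication), it has been pointed out that a malicious Prover can break the $x_1\neq x_2$ assumption by sending the same $L_j$ value in two adjacent rounds. For security we must therefore be able to verify that $x_1\neq x_2$, which the constraint $(x_1-x_2)\cdot inv=1$ accomplishes.

For an honest Prover, each round's $L_j$ is independently random, because it includes a random group element $[l_j]H$. The probability that $x_1=x_2$ is then about $2^{-|F|}$ (with $|F|$ the bit length of the field), so an implementation can simply disallow this case. The completeness error this introduces is negligible.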

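5.3 Further optimizations

The gate can be optimized further along the following lines:

Not that many advice wires are actually needed. Each wire requires the Prover to compute a polynomial commitment and increases the argument length, so $x_1,y_1,\lambda$ can be inlined instead.
Furthermore, by constraining the accumulator wires of the "next" gate, the accumulators $(x_2,y_2)$ and $(x_3,y_3)$ can be merged.
Finally, Plonky's actual implementation expresses the curve operations above with a single gate, which also verifies the decomposition of each scalar.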

6. Generating challenges with Rescue

Another performance bottleneck in recursive circuits is generating Fiat-Shamir challenges.
To reduce this cost, and following a suggestion from Daira Hopwood, Plonky uses a duplex construction with Rescue as the underlying permutation.

Let $M=\begin{pmatrix} A & B\\ C & D \end{pmatrix}$ be an MDS matrix. One round of the width-2 Rescue permutation can be defined as:
$step_{i,1}\left(\begin{bmatrix} x_1\\ x_2 \end{bmatrix}\right)=M\begin{bmatrix} x_1^{1/\alpha}\\ x_2^{1/\alpha} \end{bmatrix} + \begin{bmatrix} r_{i,1}\\ r_{i,2} \end{bmatrix}$
$step_{i,2}\left(\begin{bmatrix} x_1\\ x_2 \end{bmatrix}\right)=M\begin{bmatrix} x_1^{\alpha}\\ x_2^{\alpha} \end{bmatrix} + \begin{bmatrix} r_{i,3}\\ r_{i,4} \end{bmatrix}$
$round_i=step_{i,2}\circ step_{i,1}$
where $r_{i,1},\cdots,r_{i,4}$ are round constants.

Computing an $\alpha$-th root deterministically is expensive, so the Prover supplies the roots through additional advice wires $y_1,y_2$.
Let $z_1,z_2$ be the outputs of the round function. The corresponding gate can be defined as:

Routed wires: $x_1,x_2,z_1,z_2$
Advice wires: $y_1,y_2$
Constraints:
$x_1=y_1^{\alpha}$
$x_2=y_2^{\alpha}$
$z_1=A(Ay_1+By_2+r_{i,1})^{\alpha}+B(Cy_1+Dy_2+r_{i,2})^{\alpha}+r_{i,3}$
$z_2=C(Ay_1+By_2+r_{i,1})^{\alpha}+D(Cy_1+Dy_2+r_{i,2})^{\alpha}+r_{i,4}$
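The round function and its gate can be checked against each other with a short sketch. Everything concrete here is an assumption for illustration: the field $\mathbb{F}_{2^{31}-1}$, the exponent $\alpha=5$, the matrix entries, and the round constants are toy values, not Plonky's parameters. The Prover computes the roots $y_1,y_2$ with an expensive exponentiation, while the constraints only ever raise values to the power $\alpha$.

```python
P = 2**31 - 1                       # toy prime; gcd(ALPHA, P - 1) == 1
ALPHA = 5
ALPHA_INV = pow(ALPHA, -1, P - 1)   # exponent that computes alpha-th roots
A, B, C, D = 2, 1, 1, 1             # toy MDS matrix: entries and determinant nonzero

def rescue_round(x1, x2, r):
    """Prover side: one round; returns the outputs and the advice wires."""
    r1, r2, r3, r4 = r
    y1, y2 = pow(x1, ALPHA_INV, P), pow(x2, ALPHA_INV, P)   # alpha-th roots
    t1 = (A * y1 + B * y2 + r1) % P                         # output of step 1
    t2 = (C * y1 + D * y2 + r2) % P
    z1 = (A * pow(t1, ALPHA, P) + B * pow(t2, ALPHA, P) + r3) % P
    z2 = (C * pow(t1, ALPHA, P) + D * pow(t2, ALPHA, P) + r4) % P
    return (z1, z2), (y1, y2)

def gate_constraints(x1, x2, z1, z2, y1, y2, r):
    """Verifier side: every entry must be 0 for a valid Rescue gate."""
    r1, r2, r3, r4 = r
    t1 = (A * y1 + B * y2 + r1) % P
    t2 = (C * y1 + D * y2 + r2) % P
    return [
        (x1 - pow(y1, ALPHA, P)) % P,
        (x2 - pow(y2, ALPHA, P)) % P,
        (z1 - (A * pow(t1, ALPHA, P) + B * pow(t2, ALPHA, P) + r3)) % P,
        (z2 - (C * pow(t1, ALPHA, P) + D * pow(t2, ALPHA, P) + r4)) % P,
    ]

ROUND_CONSTANTS = (11, 22, 33, 44)  # placeholder round constants
(z1, z2), (y1, y2) = rescue_round(123, 456, ROUND_CONSTANTS)
checks = gate_constraints(123, 456, z1, z2, y1, y2, ROUND_CONSTANTS)
```

Because $x\mapsto x^{\alpha}$ is a bijection on this field, any advice value other than the true root violates the first or second constraint.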

Although several Rescue rounds could be implemented in a single gate, in practice the number of constants per gate is limited. A single Rescue round already uses 4 constants $r_{i,1},\cdots,r_{i,4}$; adding more would mean more preprocessed polynomials which must be opened at a challenge point.

As a further optimization, gates that need no constants, such as curve operation gates, can be interleaved with Rescue gates, so that each Rescue gate can use its neighbors' unused constant slots. Alternatively, a different key schedule that needs fewer constants could be used.

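7. A unified constraint set

So far, constraints have been defined separately for each gate type, but Plonk has only a single set of constraints. The constraints are therefore combined using "filters" valued 0 or 1, which select the constraints that apply to a given gate. Plonky's unified constraint set is simply the sum of these constraints, each multiplied by its filter.

The simplest approach is to define one constant polynomial per gate type, for example is_rescue_gate with is_rescue_gate(g^i)=1 if and only if gate i is a Rescue gate.
Many custom gate types are possible, but in practice the number of gate types is kept small to limit the number of constant polynomials.

Plonky organizes its gates in a binary tree:
[Figure: binary tree of gate types]
A constant polynomial $C_i$ is introduced for each level of the tree, taking the value 0 or 1 depending on whether the gate lies in the left or right branch at that level.
For example, if an arithmetic gate has path $1001$, the filter for its constraints is $C_1(x)(1-C_2(x))(1-C_3(x))C_4(x)$.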
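The filter construction can be sketched as follows. This is an illustrative model, not Plonky's code: the gate paths, the constraints attached to them, and the names `filter_value` and `unified_constraint` are assumptions. The values of $C_1,C_2,\ldots$ at a given gate are passed in as a list.

```python
P = 2**31 - 1   # illustrative prime field (assumption)

def filter_value(path, constants):
    """Product of C_i for '1' bits and (1 - C_i) for '0' bits along the path."""
    value = 1
    for bit, c in zip(path, constants):
        value = value * (c if bit == "1" else 1 - c) % P
    return value

def unified_constraint(gate_types, constants, wires):
    """Sum over gate types of filter * constraint, evaluated at one gate."""
    return sum(filter_value(path, constants) * constraint(*wires)
               for path, constraint in gate_types) % P

# Hypothetical gate types with prefix-free tree paths.
GATE_TYPES = [
    ("1001", lambda a, b, c: (a * b - c) % P),   # multiplication gate
    ("0",    lambda a, b, c: (a + b - c) % P),   # addition gate, shallow path
]

mul_gate_constants = [1, 0, 0, 1, 1, 0]   # C_1..C_4 select path 1001
add_gate_constants = [0, 1, 1, 0, 0, 0]   # C_1 = 0 selects path 0

selected = filter_value("1001", mul_gate_constants)
not_selected = filter_value("1001", add_gate_constants)
mul_result = unified_constraint(GATE_TYPES, mul_gate_constants, (3, 4, 12))
add_result = unified_constraint(GATE_TYPES, add_gate_constants, (3, 4, 7))
```

In this sketch the shallow path `0` yields a degree-1 filter, while the path `1001` yields a degree-4 filter.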

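Different gates sit at different depths in the binary tree, for two main reasons:

A smaller depth corresponds to a lower-degree filter polynomial. To keep the maximum degree as small as possible, gates with higher-degree constraints should be given lower-degree filters.
Any constant that a gate's filter polynomial does not use is available for gate configuration. For example, if an arithmetic gate's filter involves only $C_1,\cdots,C_4$, the gate can use $C_5,C_6$ to define its arithmetic type.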

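8. Future optimizations

Besides optimizing the recursive circuit size, the Plonky team is also working on the performance of Plonky's primitives.
Plonky is written in pure Rust. It would benefit from carry chain optimizations on x86, but the Rust compiler does not currently support them.

Proving time in Plonky is dominated by multi-scalar multiplication. Plonky's MSM implementation uses a variant of Yao's algorithm, which outperforms Pippenger's algorithm, particularly for the IPA reduction, which involves variable-base MSMs.

Daira Hopwood has also suggested another potential optimization: instead of applying the endomorphism zero or one times based on a bit of the scalar, apply it zero, one, or two times based on a base-3 limb of the scalar. This would reduce the number of iterations from 64 to 50 while keeping the mapping one-to-one, although the corresponding proof may be more complicated.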
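One way to see where the figures 64 and 50 could come from is a quick count of how much of the scalar each iteration consumes. This is a back-of-the-envelope sketch resting on two assumptions that the text does not state: that the scalar carries 128 bits of information (64 iterations at 2 bits each), and that each iteration still makes a sign choice alongside the base-3 choice, giving 6 options per iteration instead of 4.

```python
import math

CHALLENGE_BITS = 128   # assumption: 64 iterations * 2 bits per iteration

def iterations(choices_per_iteration):
    """Iterations needed when each one selects among this many options."""
    return math.ceil(CHALLENGE_BITS / math.log2(choices_per_iteration))

base2 = iterations(2 * 2)   # sign bit and one endomorphism bit
base3 = iterations(2 * 3)   # sign bit and one base-3 limb
```

Under these assumptions each iteration absorbs about 2.58 bits instead of 2, which is consistent with the reduction quoted above.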

References

[1] Mir team blog: Fast recursive arguments based on Plonk and Halo
[2] https://github.com/mir-protocol/plonky
[3] arkworks discussion: implementations of Plonk, TurboPLONK, PLOOKUP, etc.
