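https://github.com/haskell-crypto/cryptonite is a cryptography library implemented in Haskell. Its elliptic-curve operations used throughout cryptographic code, such as point addition (pointAdd) and scalar multiplication (pointMul), live mainly in the source files cryptonite/Crypto/PubKey/ECC/Prim.hs and cryptonite/Crypto/PubKey/ECC/P256.hs.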
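This article has three parts:
- the implementation of the operations in Prim.hs
- the implementation of the operations in P256.hs
- a performance comparison of pointAdd and pointMul in the two modules (notably, P256.pointAdd takes 1.131 ms, while the Prim.hs version, benchmarked as ECC.pointAdd, takes only 3.961 μs)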
1 Operations in Prim.hs
The main functions exported by Prim.hs are:
- scalarGenerate
- pointAdd
- pointNegate
- pointDouble
- pointBaseMul
- pointMul
- pointAddTwoMuls
- isPointAtInfinity
- isPointValid
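To make the semantics of these functions concrete, here is a self-contained toy model on a tiny curve. The curve y² = x³ + 2x + 3 over F_97, the point (3, 6), and helpers such as `inv`, `powMod` and `bitLen` are hypothetical choices for illustration only. The algorithms (right-to-left double-and-add for pointMul, Shamir's trick for pointAddTwoMuls) are standard textbook techniques, not necessarily what Prim.hs does internally, so treat this as a sketch rather than a reading of the library.

```haskell
import Data.Bits (testBit)

-- Affine point or the point at infinity (mirrors the Point/PointO shape used by Prim.hs).
data Point = Point Integer Integer | PointO
  deriving (Eq, Show)

-- Hypothetical toy curve y^2 = x^3 + a*x + b over F_pr, stored as Curve pr a b.
data Curve = Curve Integer Integer Integer

-- Toy parameters for illustration only (NOT a cryptographic curve): y^2 = x^3 + 2x + 3 over F_97.
toy :: Curve
toy = Curve 97 2 3

-- Modular exponentiation by squaring.
powMod :: Integer -> Integer -> Integer -> Integer
powMod _ 0 m = 1 `mod` m
powMod b e m
  | even e    = let h = powMod b (e `div` 2) m in h * h `mod` m
  | otherwise = b * powMod b (e - 1) m `mod` m

-- Inverse modulo a prime via Fermat's little theorem (x must be non-zero mod pr).
inv :: Integer -> Integer -> Integer
inv pr x = powMod (x `mod` pr) (pr - 2) pr

isPointAtInfinity :: Point -> Bool
isPointAtInfinity PointO = True
isPointAtInfinity _      = False

isPointValid :: Curve -> Point -> Bool
isPointValid _ PointO = True
isPointValid (Curve pr a b) (Point x y) =
  (y * y - (x * x * x + a * x + b)) `mod` pr == 0

pointNegate :: Curve -> Point -> Point
pointNegate _ PointO = PointO
pointNegate (Curve pr _ _) (Point x y) = Point x (negate y `mod` pr)

-- Tangent-line doubling; a point with y = 0 doubles to infinity.
pointDouble :: Curve -> Point -> Point
pointDouble _ PointO = PointO
pointDouble (Curve pr a _) (Point x y)
  | y `mod` pr == 0 = PointO
  | otherwise =
      let s  = (3 * x * x + a) * inv pr (2 * y) `mod` pr
          xr = (s * s - 2 * x) `mod` pr
          yr = (s * (x - xr) - y) `mod` pr
      in Point xr yr

-- Chord-line addition with the same case split as the pointAdd source shown below.
pointAdd :: Curve -> Point -> Point -> Point
pointAdd _ PointO q = q
pointAdd _ p PointO = p
pointAdd c@(Curve pr _ _) p@(Point xp yp) q@(Point xq yq)
  | p == q               = pointDouble c p
  | p == pointNegate c q = PointO
  | otherwise =
      let s  = (yp - yq) * inv pr (xp - xq) `mod` pr
          xr = (s * s - xp - xq) `mod` pr
          yr = (s * (xp - xr) - yp) `mod` pr
      in Point xr yr

-- n * p by right-to-left double-and-add (about log2 n doublings).
pointMul :: Curve -> Integer -> Point -> Point
pointMul c n p
  | n < 0     = pointMul c (negate n) (pointNegate c p)
  | otherwise = go n p PointO
  where
    go 0 _ acc = acc
    go k q acc =
      let acc' = if odd k then pointAdd c acc q else acc
      in go (k `div` 2) (pointDouble c q) acc'

-- Bit length of a non-negative Integer.
bitLen :: Integer -> Int
bitLen 0 = 0
bitLen x = 1 + bitLen (x `div` 2)

-- n * p + k * q via Shamir's trick: one shared doubling chain with p + q
-- precomputed (assumes n, k >= 0).
pointAddTwoMuls :: Curve -> Integer -> Point -> Integer -> Point -> Point
pointAddTwoMuls c n p k q = go (max (bitLen n) (bitLen k) - 1) PointO
  where
    pq = pointAdd c p q
    go i acc
      | i < 0 = acc
      | otherwise =
          let d = pointDouble c acc
              acc' = case (testBit n i, testBit k i) of
                (True, True)  -> pointAdd c d pq
                (True, False) -> pointAdd c d p
                (False, True) -> pointAdd c d q
                _             -> d
          in go (i - 1) acc'
```

Loaded into GHCi, `pointMul toy 2 (Point 3 6) == pointDouble toy (Point 3 6)` evaluates to `True`. Shamir's trick shares a single doubling chain between both scalars, which is why a combined n·P + k·Q can be cheaper than two separate pointMul calls followed by a pointAdd.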
Taking the most frequently used one, pointAdd, as an example, its source is:
-- excerpt: prime-field (CurveFP) case
pointAdd :: Curve -> Point -> Point -> Point
pointAdd _ PointO PointO = PointO
pointAdd _ PointO q = q
pointAdd _ p PointO = p
pointAdd c p q
    | p == q = pointDouble c p          -- P + P: use the doubling formula
    | p == pointNegate c q = PointO     -- P + (-P): point at infinity
pointAdd (CurveFP (CurvePrime pr _)) (Point xp yp) (Point xq yq)
    = fromMaybe PointO $ do
        -- slope s = (yp - yq) / (xp - xq) mod pr; divmod yields Nothing if no inverse exists
        s <- divmod (yp - yq) (xp - xq) pr
        let xr = (s ^ (2::Int) - xp - xq) `mod` pr
            yr = (s * (xp - xr) - yp) `mod` pr
        return $ Point xr yr
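This is a direct transcription of the textbook affine formulas, with a modular inversion (divmod) on every addition and no further algorithmic optimization. A single pointAdd is still cheap (about 4 μs in the benchmark in Section 3), but pointMul, which chains hundreds of such additions and doublings for a 256-bit scalar, takes milliseconds.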
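For reference, these are the affine formulas the code transcribes, written with the code's own names (`pr` is the field prime and `s` the slope); division means multiplication by the modular inverse, which is what `divmod` computes. The doubling case is the standard textbook formula that pointDouble is expected to implement; its source is not quoted here, so treat that part as an assumption.

```latex
\begin{aligned}
&\text{Curve: } y^2 \equiv x^3 + a x + b \pmod{\mathit{pr}} \\[4pt]
&\text{Addition } (x_p \neq x_q):\quad s = \frac{y_p - y_q}{x_p - x_q} \bmod \mathit{pr} \\
&\quad x_r = s^2 - x_p - x_q \bmod \mathit{pr}, \qquad y_r = s\,(x_p - x_r) - y_p \bmod \mathit{pr} \\[4pt]
&\text{Doubling } (P = Q,\ y_p \neq 0):\quad s = \frac{3x_p^2 + a}{2y_p} \bmod \mathit{pr} \\
&\quad x_r = s^2 - 2x_p \bmod \mathit{pr}, \qquad y_r = s\,(x_p - x_r) - y_p \bmod \mathit{pr}
\end{aligned}
```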
2 Operations in P256.hs
P256.hs exports considerably more functions than Prim.hs:
- pointBase
- pointAdd
- pointNegate
- pointMul
- pointDh
- pointsMulVarTime
- pointIsValid
- toPoint
- pointToIntegers
- pointFromIntegers
- pointToBinary
- pointFromBinary
- unsafePointFromBinary
- scalarGenerate
- scalarZero
- scalarIsZero
- scalarAdd
- scalarSub
- scalarInv
- scalarCmp
- scalarFromBinary
- scalarToBinary
- scalarFromInteger
- scalarToInteger
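Most of the scalar functions above are arithmetic modulo the order n of the P-256 base point. The sketch below models that arithmetic with plain Integers. The `Scalar` newtype, `toScalar` and `powMod` are hypothetical stand-ins (as far as I can tell, cryptonite keeps scalars as 32-byte buffers and does the arithmetic in C), and the inverse via Fermat's little theorem is one possible technique, not necessarily the one cryptonite uses.

```haskell
-- Order n of the P-256 (secp256r1) base point; n is prime.
curveOrder :: Integer
curveOrder = 0xFFFFFFFF00000000FFFFFFFFFFFFFFFFBCE6FAADA7179E84F3B9CAC2FC632551

-- Hypothetical model of P256.Scalar as a plain Integer kept in [0, n).
newtype Scalar = Scalar Integer
  deriving (Eq, Show)

-- Hypothetical constructor: reduce any Integer modulo n.
toScalar :: Integer -> Scalar
toScalar x = Scalar (x `mod` curveOrder)

scalarZero :: Scalar
scalarZero = Scalar 0

scalarIsZero :: Scalar -> Bool
scalarIsZero (Scalar x) = x == 0

scalarAdd :: Scalar -> Scalar -> Scalar
scalarAdd (Scalar a) (Scalar b) = Scalar ((a + b) `mod` curveOrder)

-- Subtraction wraps around modulo n.
scalarSub :: Scalar -> Scalar -> Scalar
scalarSub (Scalar a) (Scalar b) = Scalar ((a - b) `mod` curveOrder)

scalarCmp :: Scalar -> Scalar -> Ordering
scalarCmp (Scalar a) (Scalar b) = compare a b

-- Modular exponentiation by squaring.
powMod :: Integer -> Integer -> Integer -> Integer
powMod _ 0 m = 1 `mod` m
powMod b e m
  | even e    = let h = powMod b (e `div` 2) m in h * h `mod` m
  | otherwise = b * powMod b (e - 1) m `mod` m

-- Inverse via Fermat's little theorem: a^(n-2) mod n (valid since n is prime and a /= 0).
scalarInv :: Scalar -> Scalar
scalarInv (Scalar a) = Scalar (powMod a (curveOrder - 2) curveOrder)
```

Because n is prime, every non-zero scalar has an inverse, which is what signature schemes such as ECDSA rely on when they divide by a nonce.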
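Again taking pointAdd as the example (toPoint, which multiplies the base point by a scalar, is shown as well). On the Haskell side these are thin FFI wrappers around C: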
-- toPoint s computes s * G, where G is the P-256 base point
toPoint :: Scalar -> Point
toPoint s
    | scalarIsZero s = error "cannot create point from zero"
    | otherwise      =
        withNewPoint $ \px py -> withScalar s $ \p ->
            ccryptonite_p256_basepoint_mul p px py

-- | Add a point to another point
pointAdd :: Point -> Point -> Point
pointAdd a b = withNewPoint $ \dx dy ->
    withPoint a $ \ax ay -> withPoint b $ \bx by ->
        ccryptonite_p256e_point_add ax ay bx by dx dy
The C function it calls:

/* this function is not part of the original source
   add 2 points together. so far untested.
   probably vartime, as it use point_add_or_double_vartime
*/
void cryptonite_p256e_point_add(
    const cryptonite_p256_int *in_x1, const cryptonite_p256_int *in_y1,
    const cryptonite_p256_int *in_x2, const cryptonite_p256_int *in_y2,
    cryptonite_p256_int *out_x, cryptonite_p256_int *out_y)
{
    felem x1, y1, z1, x2, y2, z2, px1, py1, px2, py2;
    const cryptonite_p256_int one = P256_ONE;

    /* convert the affine inputs to Montgomery form */
    to_montgomery(px1, in_x1);
    to_montgomery(py1, in_y1);
    to_montgomery(px2, in_x2);
    to_montgomery(py2, in_y2);

    /* obtain Jacobian (x, y, z) coordinates by computing 1 * P with a full
       scalar multiplication; this likely dominates the cost of the function */
    scalar_mult(x1, y1, z1, px1, py1, &one);
    scalar_mult(x2, y2, z2, px2, py2, &one);

    /* the addition itself, in Jacobian coordinates */
    point_add_or_double_vartime(x1, y1, z1, x1, y1, z1, x2, y2, z2);

    /* back to affine coordinates, then out of Montgomery form */
    point_to_affine(px1, py1, x1, y1, z1);
    from_montgomery(out_x, px1);
    from_montgomery(out_y, py1);
}
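Field elements are kept in Montgomery form (to_montgomery / from_montgomery), and the addition itself works in Jacobian coordinates, converted back with point_to_affine. Note, however, that the two affine inputs are lifted into Jacobian coordinates by calling scalar_mult with the scalar 1, i.e. two full scalar multiplications per addition. That is the likely reason P256.pointAdd takes around a millisecond in the benchmark in Section 3, even slower than P256.pointMul.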
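Montgomery form replaces each field element a by aR mod p, so that the product of two such values can be reduced with REDC, which needs only multiplications, masks and shifts by R, instead of a division by p. The sketch below shows the idea with plain Integers on the P-256 prime. The radix R = 2^256 and all function names are assumptions for illustration; the C code in cryptonite works on fixed-size limb arrays (felem) with its own representation and constants.

```haskell
-- P-256 field prime p = 2^256 - 2^224 + 2^192 + 2^96 - 1.
fieldPrime :: Integer
fieldPrime = 2 ^ (256 :: Int) - 2 ^ (224 :: Int) + 2 ^ (192 :: Int) + 2 ^ (96 :: Int) - 1

-- Montgomery radix R (assumed to be 2^256 in this sketch).
rBits :: Int
rBits = 256

radix :: Integer
radix = 2 ^ rBits

-- Extended Euclid: inverse of a modulo m (assumes gcd a m == 1).
modInv :: Integer -> Integer -> Integer
modInv a m = let (_, s, _) = egcd (a `mod` m) m in s `mod` m
  where
    -- egcd u v = (g, s, t) with s * u + t * v == g
    egcd 0 v = (v, 0, 1)
    egcd u v = let (g, s, t) = egcd (v `mod` u) u
               in (g, t - (v `div` u) * s, s)

-- p' = -p^{-1} mod R, precomputed once.
pPrime :: Integer
pPrime = negate (modInv fieldPrime radix) `mod` radix

-- REDC: for 0 <= t < p * R, return t * R^{-1} mod p.
-- The `mod` / `div` by R stand in for masks and shifts in a real implementation.
redc :: Integer -> Integer
redc t =
  let m = ((t `mod` radix) * pPrime) `mod` radix
      u = (t + m * fieldPrime) `div` radix  -- exact: t + m*p is divisible by R
  in if u >= fieldPrime then u - fieldPrime else u

toMontgomery :: Integer -> Integer
toMontgomery a = (a * radix) `mod` fieldPrime

fromMontgomery :: Integer -> Integer
fromMontgomery = redc

-- (aR)(bR) reduced by REDC gives abR mod p, i.e. the product stays in Montgomery form.
montMul :: Integer -> Integer -> Integer
montMul a b = redc (a * b)
```

Conversion in and out costs one multiplication each, so the form only pays off when many multiplications happen in between, as in a scalar multiplication.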
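3 Performance comparison of pointAdd and pointMul: Prim.hs vs P256.hs
The benchmark code (criterion; Prim.hs and the curve types are imported qualified as ECC):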
import Criterion.Main
import Crypto.Error (CryptoFailable (..))
import qualified Crypto.PubKey.ECC.P256 as P256
import qualified Crypto.PubKey.ECC.Prim as ECC
import qualified Crypto.PubKey.ECC.Types as ECC

-- Driver: the group names produce the "ECC/..." and "P256/..." labels in the results
main :: IO ()
main = defaultMain
    [ bgroup "ECC" benchECC
    , bgroup "P256" benchP256
    ]

benchECC :: [Benchmark]
benchECC =
    [ bench "pointAddTwoMuls-baseline"  $ nf run_b (n1, p1, n2, p2)
    , bench "pointAddTwoMuls-optimized" $ nf run_o (n1, p1, n2, p2)
    , bench "pointAdd-ECC"              $ nf run_c (p1, p2)
    , bench "pointMul-ECC"              $ nf run_d (n1, p2)
    ]
  where
    run_b (n, p, k, q) = ECC.pointAdd c (ECC.pointMul c n p)
                                        (ECC.pointMul c k q)
    run_o (n, p, k, q) = ECC.pointAddTwoMuls c n p k q
    run_c (p, q) = ECC.pointAdd c p q
    run_d (n, p) = ECC.pointMul c n p
    c  = ECC.getCurveByName ECC.SEC_p256r1
    r1 = 7
    r2 = 11
    -- p1 = ECC.pointBaseMul c r1
    -- p2 = ECC.pointBaseMul c r2
    p1 = ECC.pointBaseMul c n1
    p2 = ECC.pointBaseMul c n2
    n1 = 0x2ba9daf2363b2819e69b34a39cf496c2458a9b2a21505ea9e7b7cbca42dc7435
    n2 = 0xf054a7f60d10b8c2cf847ee90e9e029f8b0e971b09ca5f55c4d49921a11fadc1

benchP256 :: [Benchmark]
benchP256 =
    [ bench "pointAddTwoMuls-P256" $ nf run_p (n1, s, n2, t)
    , bench "pointAdd-P256"        $ nf run_q (s, t)
    , bench "pointMul-P256"        $ nf run_t (n1, s)
    ]
  where
    run_p (n1, s, n2, t) = P256.pointAdd (P256.pointMul n1 s) (P256.pointMul n2 t)
    run_q (s, t) = P256.pointAdd s t
    run_t (n1, s) = P256.pointMul n1 s
    xS = 0xde2444bebc8d36e682edd27e0f271508617519b3221a8fa0b77cab3989da97c9
    yS = 0xc093ae7ff36e5380fc01a5aad1e66659702de80f53cec576b6350b243042a256
    xT = 0x55a8b00f8da1d44e62f6b3b25316212e39540dc861c89575bb8cf92e35e0986b
    yT = 0x5421c3209c2d6c704835d82ac4c3dd90f61a8a52598b9e7ab656e9d8c8b24316
    s = P256.pointFromIntegers (xS, yS)
    t = P256.pointFromIntegers (xT, yT)
    r1 =
        case P256.scalarFromInteger 7 of
            CryptoFailed err    -> error ("cannot convert scalar: " ++ show err)
            CryptoPassed scalar -> scalar
    r2 =
        case P256.scalarFromInteger 11 of
            CryptoFailed err    -> error ("cannot convert scalar: " ++ show err)
            CryptoPassed scalar -> scalar
    -- s = P256.pointMul r1 P256.pointBase
    -- t = P256.pointMul r2 P256.pointBase
    n1 =
        let a = 0x2ba9daf2363b2819e69b34a39cf496c2458a9b2a21505ea9e7b7cbca42dc7435
        in case P256.scalarFromInteger a of
               CryptoFailed err    -> error ("cannot convert scalar: " ++ show err)
               CryptoPassed scalar -> scalar
    n2 =
        let b = 0xf054a7f60d10b8c2cf847ee90e9e029f8b0e971b09ca5f55c4d49921a11fadc1
        in case P256.scalarFromInteger b of
               CryptoFailed err    -> error ("cannot convert scalar: " ++ show err)
               CryptoPassed scalar -> scalar
The benchmarks are run with:

stack bench

The benchmark results:
benchmarked ECC/pointAddTwoMuls-baseline
time 5.404 ms (4.558 ms .. 6.660 ms)
0.826 R² (0.757 R² .. 0.992 R²)
mean 5.183 ms (4.837 ms .. 5.757 ms)
std dev 1.291 ms (822.2 μs .. 1.849 ms)
variance introduced by outliers: 91% (severely inflated)
benchmarked ECC/pointAddTwoMuls-optimized
time 2.543 ms (2.432 ms .. 2.654 ms)
0.985 R² (0.972 R² .. 0.995 R²)
mean 3.422 ms (2.941 ms .. 4.578 ms)
std dev 2.547 ms (983.8 μs .. 5.538 ms)
variance introduced by outliers: 98% (severely inflated)
benchmarked ECC/pointAdd-ECC
time 3.961 μs (3.943 μs .. 3.979 μs)
1.000 R² (0.999 R² .. 1.000 R²)
mean 3.988 μs (3.974 μs .. 4.013 μs)
std dev 61.50 ns (40.65 ns .. 100.1 ns)
benchmarked ECC/pointMul-ECC
time 2.227 ms (2.148 ms .. 2.295 ms)
0.990 R² (0.979 R² .. 0.998 R²)
mean 2.578 ms (2.404 ms .. 2.878 ms)
std dev 823.7 μs (461.4 μs .. 1.175 ms)
variance introduced by outliers: 96% (severely inflated)
benchmarked P256/pointAddTwoMuls-P256
time 2.903 ms (1.650 ms .. 3.975 ms)
0.675 R² (0.595 R² .. 0.979 R²)
mean 2.349 ms (2.020 ms .. 3.197 ms)
std dev 1.698 ms (762.4 μs .. 3.235 ms)
variance introduced by outliers: 98% (severely inflated)
benchmarked P256/pointAdd-P256
time 1.131 ms (798.2 μs .. 1.462 ms)
0.742 R² (0.680 R² .. 0.964 R²)
mean 845.9 μs (778.8 μs .. 974.9 μs)
std dev 300.5 μs (154.5 μs .. 482.7 μs)
variance introduced by outliers: 97% (severely inflated)
benchmarked P256/pointMul-P256
time 796.3 μs (541.4 μs .. 1.078 ms)
0.633 R² (0.472 R² .. 0.772 R²)
mean 730.9 μs (634.6 μs .. 837.6 μs)
std dev 317.4 μs (254.0 μs .. 402.4 μs)
variance introduced by outliers: 97% (severely inflated)
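To put the highlighted numbers side by side, the ratios of criterion's `time` point estimates can be computed directly. The values below are copied from the run above and converted to microseconds; the names `timings` and `ratio` are illustrative. Keep in mind that the P256 runs report R² between about 0.63 and 0.74 and 97 to 98% outlier variance, so these ratios are rough.

```haskell
-- criterion "time" point estimates from the run above, in microseconds.
timings :: [(String, Double)]
timings =
  [ ("ECC/pointAddTwoMuls-baseline", 5404)
  , ("ECC/pointAddTwoMuls-optimized", 2543)
  , ("ECC/pointAdd-ECC", 3.961)
  , ("ECC/pointMul-ECC", 2227)
  , ("P256/pointAddTwoMuls-P256", 2903)
  , ("P256/pointAdd-P256", 1131)
  , ("P256/pointMul-P256", 796.3)
  ]

-- How many times slower the first benchmark is than the second
-- (Nothing if either name is missing).
ratio :: String -> String -> Maybe Double
ratio slow fast = (/) <$> lookup slow timings <*> lookup fast timings

-- The comparisons discussed in the text.
summary :: [(String, Maybe Double)]
summary =
  [ ("P256.pointAdd vs Prim pointAdd", ratio "P256/pointAdd-P256" "ECC/pointAdd-ECC")
  , ("Prim pointMul vs P256.pointMul", ratio "ECC/pointMul-ECC" "P256/pointMul-P256")
  , ("baseline vs pointAddTwoMuls", ratio "ECC/pointAddTwoMuls-baseline" "ECC/pointAddTwoMuls-optimized")
  ]
```

On this run, P256.pointAdd is roughly 285 times slower than the Prim.hs pointAdd, while P256.pointMul is roughly 2.8 times faster than the Prim.hs pointMul, and the optimized pointAddTwoMuls in Prim.hs is about 2.1 times faster than the two-pointMul baseline.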