Haskell crypto库性能简析

最新推荐文章于 2024-06-05 09:37:06 发布

mutourend

最新推荐文章于 2024-06-05 09:37:06 发布

阅读量392

点赞数

本文链接：https://blog.csdn.net/mutourend/article/details/95365692

版权

https://github.com/haskell-crypto/cryptonite 为haskell 实现的密码学库，其中对密码学常用到的加法和乘法等运算（如pointAdd等），主要集中在cryptonite/Crypto/PubKey/ECC/Prim.hs 和cryptonite/Crypto/PubKey/ECC/P256.hs源文件内。

本文分为三部分内容：

Prim.hs中的密码学运算实现
P256.hs中的密码学运算实现
两者的pointAdd和pointMul性能对比（值得留意的是P256.pointAdd---- 1.131 ms VS ECC.pointAdd----3.961 μs）

1 Prim.hs中的密码学运算实现

Prim.hs中主要暴露的接口有：

scalarGenerate
pointAdd
pointNegate
pointDouble
pointBaseMul
pointMul
pointAddTwoMuls
isPointAtInfinity
isPointValid

以最常用的pointAdd为例，其源码实现为：

pointAdd :: Curve -> Point -> Point -> Point
pointAdd _ PointO PointO = PointO
pointAdd _ PointO q = q
pointAdd _ p PointO = p
pointAdd c p q
  | p == q = pointDouble c p
  | p == pointNegate c q = PointO
pointAdd (CurveFP (CurvePrime pr _)) (Point xp yp) (Point xq yq)
    = fromMaybe PointO $ do
        s <- divmod (yp - yq) (xp - xq) pr
        let xr = (s ^ (2::Int) - xp - xq) `mod` pr
            yr = (s * (xp - xr) - yp) `mod` pr
        return $ Point xr yr

直接为数学公式计算，并未做算法优化，性能表现为ms级别。
在这里插入图片描述

2 P256.hs中的密码学运算实现

P256.hs中暴露的接口比Prime.hs中更多：

pointBase
pointAdd
pointNegate
pointMul
pointDh
pointsMulVarTime
pointIsValid
toPoint
pointToIntegers
pointFromIntegers
pointToBinary
pointFromBinary
unsafePointFromBinary
scalarGenerate
scalarZero
scalarIsZero
scalarAdd
scalarSub
scalarInv
scalarCmp
scalarFromBinary
scalarToBinary
scalarFromInteger
scalarToInteger

仍然以最常用的pointAdd为例，其源码实现为：

toPoint :: Scalar -> Point
toPoint s
    | scalarIsZero s = error "cannot create point from zero"
    | otherwise      =
        withNewPoint $ \px py -> withScalar s $ \p ->
            ccryptonite_p256_basepoint_mul p px py

-- | Add a point to another point
pointAdd :: Point -> Point -> Point
pointAdd a b = withNewPoint $ \dx dy ->
    withPoint a $ \ax ay -> withPoint b $ \bx by ->
        ccryptonite_p256e_point_add ax ay bx by dx dy


/* this function is not part of the original source
   add 2 points together. so far untested.
   probably vartime, as it use point_add_or_double_vartime
 */
void cryptonite_p256e_point_add(
    const cryptonite_p256_int *in_x1, const cryptonite_p256_int *in_y1,
    const cryptonite_p256_int *in_x2, const cryptonite_p256_int *in_y2,
    cryptonite_p256_int *out_x, cryptonite_p256_int *out_y)
{
    felem x1, y1, z1, x2, y2, z2, px1, py1, px2, py2;
    const cryptonite_p256_int one = P256_ONE;

    to_montgomery(px1, in_x1);
    to_montgomery(py1, in_y1);
    to_montgomery(px2, in_x2);
    to_montgomery(py2, in_y2);

    scalar_mult(x1, y1, z1, px1, py1, &one);
    scalar_mult(x2, y2, z2, px2, py2, &one);
    point_add_or_double_vartime(x1, y1, z1, x1, y1, z1, x2, y2, z2);

    point_to_affine(px1, py1, x1, y1, z1);
    from_montgomery(out_x, px1);
    from_montgomery(out_y, py1);
}

采用了Montgomery算法进行了优化。

3 Prim.hs和P256.hs的pointAdd和pointMul性能对比

对应的bench代码为：

benchECC =
    [ bench "pointAddTwoMuls-baseline"  $ nf run_b (n1, p1, n2, p2)
    , bench "pointAddTwoMuls-optimized" $ nf run_o (n1, p1, n2, p2)
    , bench "pointAdd-ECC" $ nf run_c (p1, p2)
    , bench "pointMul-ECC" $ nf run_d (n1, p2)
    ]
  where run_b (n, p, k, q) = ECC.pointAdd c (ECC.pointMul c n p)
                                            (ECC.pointMul c k q)

        run_o (n, p, k, q) = ECC.pointAddTwoMuls c n p k q
        run_c (p, q) = ECC.pointAdd c p q
        run_d (n, p) = ECC.pointMul c n p

        c  = ECC.getCurveByName ECC.SEC_p256r1
        r1 = 7
        r2 = 11
        -- p1 = ECC.pointBaseMul c r1
        -- p2 = ECC.pointBaseMul c r2
        p1 = ECC.pointBaseMul c n1
        p2 = ECC.pointBaseMul c n2
        n1 = 0x2ba9daf2363b2819e69b34a39cf496c2458a9b2a21505ea9e7b7cbca42dc7435
        n2 = 0xf054a7f60d10b8c2cf847ee90e9e029f8b0e971b09ca5f55c4d49921a11fadc1

benchP256 =
    [ bench "pointAddTwoMuls-P256"  $ nf run_p (n1, s, n2, t)
    , bench "pointAdd-P256"  $ nf run_q (s, t) 
    , bench "pointMul-P256"  $ nf run_t (n1, s) 
    ]
  where run_p (n1, s, n2, t) = P256.pointAdd (P256.pointMul n1 s) (P256.pointMul n2 t)
        run_q (s, t) = P256.pointAdd s t
        run_t (n1, s) = P256.pointMul n1 s

        xS = 0xde2444bebc8d36e682edd27e0f271508617519b3221a8fa0b77cab3989da97c9
        yS = 0xc093ae7ff36e5380fc01a5aad1e66659702de80f53cec576b6350b243042a256
        xT = 0x55a8b00f8da1d44e62f6b3b25316212e39540dc861c89575bb8cf92e35e0986b
        yT = 0x5421c3209c2d6c704835d82ac4c3dd90f61a8a52598b9e7ab656e9d8c8b24316
        s = P256.pointFromIntegers (xS, yS)
        t = P256.pointFromIntegers (xT, yT)
        r1 = 
            case P256.scalarFromInteger 7 of 
                CryptoFailed err    -> error ("cannot convert scalar: " ++ show err)
                CryptoPassed scalar -> scalar

        r2 = 
            case P256.scalarFromInteger 11 of 
                CryptoFailed err    -> error ("cannot convert scalar: " ++ show err)
                CryptoPassed scalar -> scalar

        -- s = P256.pointMul r1 P256.pointBase
        -- t = P256.pointMul r2 P256.pointBase
        n1 = 
            let a = 0x2ba9daf2363b2819e69b34a39cf496c2458a9b2a21505ea9e7b7cbca42dc7435
             in case P256.scalarFromInteger a of
                            CryptoFailed err    -> error ("cannot convert scalar: " ++ show err)
                            CryptoPassed scalar -> scalar
        n2 = 
            let b = 0xf054a7f60d10b8c2cf847ee90e9e029f8b0e971b09ca5f55c4d49921a11fadc1
             in case P256.scalarFromInteger b of
                            CryptoFailed err    -> error ("cannot convert scalar: " ++ show err)
                            CryptoPassed scalar -> scalar

stack bench

对应的bench运行结果为：

benchmarked ECC/pointAddTwoMuls-baseline
time                 5.404 ms   (4.558 ms .. 6.660 ms)
                     0.826 R²   (0.757 R² .. 0.992 R²)
mean                 5.183 ms   (4.837 ms .. 5.757 ms)
std dev              1.291 ms   (822.2 μs .. 1.849 ms)
variance introduced by outliers: 91% (severely inflated)

benchmarked ECC/pointAddTwoMuls-optimized
time                 2.543 ms   (2.432 ms .. 2.654 ms)
                     0.985 R²   (0.972 R² .. 0.995 R²)
mean                 3.422 ms   (2.941 ms .. 4.578 ms)
std dev              2.547 ms   (983.8 μs .. 5.538 ms)
variance introduced by outliers: 98% (severely inflated)

benchmarked ECC/pointAdd-ECC
time                 3.961 μs   (3.943 μs .. 3.979 μs)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 3.988 μs   (3.974 μs .. 4.013 μs)
std dev              61.50 ns   (40.65 ns .. 100.1 ns)

benchmarked ECC/pointMul-ECC
time                 2.227 ms   (2.148 ms .. 2.295 ms)
                     0.990 R²   (0.979 R² .. 0.998 R²)
mean                 2.578 ms   (2.404 ms .. 2.878 ms)
std dev              823.7 μs   (461.4 μs .. 1.175 ms)
variance introduced by outliers: 96% (severely inflated)

benchmarked P256/pointAddTwoMuls-P256
time                 2.903 ms   (1.650 ms .. 3.975 ms)
                     0.675 R²   (0.595 R² .. 0.979 R²)
mean                 2.349 ms   (2.020 ms .. 3.197 ms)
std dev              1.698 ms   (762.4 μs .. 3.235 ms)
variance introduced by outliers: 98% (severely inflated)

benchmarked P256/pointAdd-P256
time                 1.131 ms   (798.2 μs .. 1.462 ms)
                     0.742 R²   (0.680 R² .. 0.964 R²)
mean                 845.9 μs   (778.8 μs .. 974.9 μs)
std dev              300.5 μs   (154.5 μs .. 482.7 μs)
variance introduced by outliers: 97% (severely inflated)

benchmarked P256/pointMul-P256
time                 796.3 μs   (541.4 μs .. 1.078 ms)
                     0.633 R²   (0.472 R² .. 0.772 R²)
mean                 730.9 μs   (634.6 μs .. 837.6 μs)
std dev              317.4 μs   (254.0 μs .. 402.4 μs)
variance introduced by outliers: 97% (severely inflated)