Privacy Computing (Part 5): Private Set Intersection and SecretFlow PSI, Introduction and Development Practice

小墨Sang

Last modified 2024-03-27 15:22:06


Tags: privacy protection

First published 2024-03-27 15:21:44

Article link: https://blog.csdn.net/xmosang/article/details/137078516


Introduction to PSI in SPU

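PSI: Private Set Intersection (PSI)

A special kind of secure multi-party computation (MPC) protocol
Alice holds a set X and Bob holds a set Y
By running the PSI protocol, Alice and Bob obtain the intersection X∩Y
No information other than the intersection is leaked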

PSI categories

2-Party/Multi-Party PSI
Balanced/Unbalanced PSI
Semi-honest/Malicious PSI
PSI with computation:
- PSI-CA (Cardinality)
- PSI-Payload Analytics
- Circuit PSI
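As a rough illustration of how these variants differ, the sketch below computes in plaintext the output each one is meant to reveal, as if a trusted third party held both sets (the ideal functionality). The function names and the sum-of-payloads example are hypothetical; a real protocol produces the same output without either party seeing the other's input. Circuit PSI is left out because its output is secret-shared rather than revealed.

```python
from typing import Dict, Set


def ideal_psi(x: Set[str], y: Set[str]) -> Set[str]:
    """Plain PSI: the intersection X ∩ Y itself is revealed."""
    return x & y


def ideal_psi_ca(x: Set[str], y: Set[str]) -> int:
    """PSI-CA: only the cardinality |X ∩ Y| is revealed."""
    return len(x & y)


def ideal_psi_payload_sum(x: Set[str], y_payload: Dict[str, int]) -> int:
    """PSI with payload analytics (hypothetical example): only the sum of
    Bob's payload values over the intersection is revealed."""
    return sum(v for k, v in y_payload.items() if k in x)


if __name__ == "__main__":
    alice = {"u1", "u2", "u3"}
    bob = {"u2": 10, "u3": 5, "u4": 7}
    print(sorted(ideal_psi(alice, set(bob))))  # ['u2', 'u3']
    print(ideal_psi_ca(alice, set(bob)))       # 2
    print(ideal_psi_payload_sum(alice, bob))   # 15
```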

SecretFlow PSI functional layers

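PSI protocols implemented in SPU

Semi-honest model
- Two-party
  - ecdh, kkrt16, bc22 (pcg-psi)
  - ec-oprf PSI (unbalanced PSI)
  - dp-psi
- Multi-party
  - ecdh-3-party (extensible to more parties)
Malicious model
- mini-PSI (suitable for small datasets)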

ecdh-PSI

Simple to understand and to explain (to your managers)
Simple to implement
Lowest communication cost, high computation cost
Easy to extend
Can be modified to compute only the intersection size (PSI-CA), as in Google Private Join and Compute and Facebook Private-ID
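The commutative-blinding idea behind ECDH-PSI fits in a few lines: each party raises its hashed items to its own secret key, and because (H(x)^a)^b = (H(x)^b)^a, the doubly blinded values match exactly on common items. Assumptions: to stay self-contained, the sketch replaces the elliptic curve with exponentiation modulo the prime 2^127 - 1 and hash-to-curve with SHA-256 reduced modulo that prime, and it simulates both parties in one process. It is a toy for following the message flow, not a secure implementation, and the function names are hypothetical.

```python
import hashlib
import math
import secrets
from typing import Iterable, List, Set

# Toy group: integers modulo the Mersenne prime 2^127 - 1.
# ASSUMPTION: stands in for the elliptic curve (e.g. Curve25519) used by real
# ECDH-PSI. It only illustrates the message flow and is NOT secure.
P = 2**127 - 1


def hash_to_group(item: str) -> int:
    """Toy stand-in for hash-to-curve: SHA-256 reduced modulo P (never 0)."""
    h = int.from_bytes(hashlib.sha256(item.encode()).digest(), "big") % P
    return h or 1


def keygen() -> int:
    """Secret exponent coprime to P - 1, so v -> v^k is a bijection."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k


def blind(values: Iterable[int], key: int) -> List[int]:
    return [pow(v, key, P) for v in values]


def ecdh_psi(alice_set: Set[str], bob_set: Set[str]) -> Set[str]:
    """Simulates both parties in one process; Alice (the receiver) learns the result."""
    a, b = keygen(), keygen()
    alice_items = sorted(alice_set)
    # Round 1: each party sends H(item)^own_key to the other.
    alice_once = blind((hash_to_group(x) for x in alice_items), a)  # Alice -> Bob
    bob_once = blind((hash_to_group(y) for y in bob_set), b)        # Bob -> Alice
    # Round 2: Bob raises Alice's values to b and returns them in the same order;
    # Alice raises Bob's values to a.
    alice_twice = blind(alice_once, b)   # H(x)^(ab), Bob -> Alice
    bob_twice = set(blind(bob_once, a))  # H(y)^(ba), computed by Alice
    # Exponentiation commutes, so H(x)^(ab) == H(y)^(ba) exactly when x == y
    # (barring hash collisions).
    return {x for x, v in zip(alice_items, alice_twice) if v in bob_twice}


if __name__ == "__main__":
    print(sorted(ecdh_psi({"a", "b", "c"}, {"b", "c", "d"})))  # ['b', 'c']
```

Each element crosses the network only a couple of times as a single group element, which is why communication is low, while every item costs two exponentiations per party, which is where the high computation cost comes from. In a real run Bob would also shuffle his blinded values before sending them, and each party would hold only its own key.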

KKRT16-PSI

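Main contributions of the paper
- Proposes a novel extension of the IKNP and KK OT-extension protocols that achieves 1-out-of-n OT for arbitrarily large n
- Batched, related-key OPRF (BaRK-OPRF)
Advantages: fast running time
- 3.8 s for 2^20 (about one million) items
- 1 min for 2^24 (about 16 million) items; this is the baseline that recent PSI papers compare against
Disadvantages:
- High memory usage
- High communication cost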
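The OPRF-based structure that KKRT16 follows can be sketched as below. Assumptions: HMAC-SHA256 plays the role of the PRF F_k, and the OPRF step (which KKRT16 realises with OT extension so that the sender never sees the receiver's items) is replaced by a direct call on the sender's key. The sketch therefore shows only the data flow and is not oblivious; the class and function names are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Iterable, List, Set


class IdealOprfSender:
    """Holds the PRF key k.

    ASSUMPTION: HMAC-SHA256 plays the role of the PRF F_k. In KKRT16 the receiver
    obtains F_k(x) through OT extension without revealing x; here prf() is called
    directly, so this sketch is NOT oblivious and only shows the data flow.
    """

    def __init__(self) -> None:
        self._key = secrets.token_bytes(32)

    def prf(self, item: str) -> bytes:
        return hmac.new(self._key, item.encode(), hashlib.sha256).digest()


def oprf_psi(receiver_items: Iterable[str], sender_items: Iterable[str]) -> Set[str]:
    sender = IdealOprfSender()
    receiver_list: List[str] = list(receiver_items)
    # Step 1 (OPRF phase, idealised): the receiver learns F_k(x) for each of its items.
    receiver_outputs = [sender.prf(x) for x in receiver_list]
    # Step 2: the sender evaluates F_k on its own items and sends the outputs.
    sender_outputs = {sender.prf(y) for y in sender_items}
    # Step 3: the receiver compares PRF outputs locally.
    return {x for x, v in zip(receiver_list, receiver_outputs) if v in sender_outputs}


if __name__ == "__main__":
    print(sorted(oprf_psi(["a", "b", "c"], ["b", "c", "d"])))  # ['b', 'c']
```

In the actual protocol the receiver's items are first placed into cuckoo-hash bins, and the sender sends PRF outputs for each of its items under every candidate bin, which contributes to the communication cost listed above.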

BC22 PCG-PSI

BaRK-OPRF based on SVOLE
Generalized Cuckoo Hash
Permutation-Based Hashing
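Both KKRT16 and BC22 rely on cuckoo hashing so that the receiver has at most one item per bin and the two parties only compare within matching bins. A sketch of basic cuckoo hashing follows, assuming three SHA-256 derived hash functions and no stash; the generalized and permutation-based variants listed above are not shown, and the function names are hypothetical.

```python
import hashlib
import random
from typing import List, Optional

NUM_HASHES = 3  # ASSUMPTION: three hash functions, a common choice


def bin_index(item: str, i: int, num_bins: int) -> int:
    """The i-th hash function, derived from SHA-256 over (i, item)."""
    digest = hashlib.sha256(f"{i}|{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_bins


def cuckoo_table(items: List[str], num_bins: int,
                 max_kicks: int = 500, seed: int = 0) -> List[Optional[str]]:
    """Receiver side: each item goes into one of its candidate bins, at most one per bin."""
    rng = random.Random(seed)
    table: List[Optional[str]] = [None] * num_bins
    for item in items:
        current: Optional[str] = item
        for _ in range(max_kicks):
            # Use a free candidate bin if there is one.
            for i in range(NUM_HASHES):
                b = bin_index(current, i, num_bins)
                if table[b] is None:
                    table[b] = current
                    current = None
                    break
            if current is None:
                break
            # All candidates are occupied: evict a random occupant and re-insert it.
            b = bin_index(current, rng.randrange(NUM_HASHES), num_bins)
            table[b], current = current, table[b]
        else:
            # Real schemes fall back to a stash or re-hash with new functions.
            raise RuntimeError("cuckoo insertion failed")
    return table


def simple_hash_bins(items: List[str], num_bins: int) -> List[List[str]]:
    """Sender side: every item goes into ALL of its (distinct) candidate bins."""
    bins: List[List[str]] = [[] for _ in range(num_bins)]
    for item in items:
        for b in {bin_index(item, i, num_bins) for i in range(NUM_HASHES)}:
            bins[b].append(item)
    return bins
```

If the receiver's x sits in bin b and the sender also holds x, then b is one of x's candidate bins, so simple hashing has put the sender's copy in bin b as well; comparing bin by bin therefore finds every common item.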

Unbalanced PSI: ec-oprf based
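The ec-oprf approach to unbalanced PSI can be sketched with a Diffie-Hellman style OPRF: the client blinds H(x) with a random exponent r, the server raises it to its key k, and the client removes r to get H(x)^k, which it looks up in the server's precomputed set {H(y)^k}. Assumptions: exponentiation modulo 2^127 - 1 replaces the elliptic curve and SHA-256 replaces hash-to-curve, so this is a toy and not secure; the class and function names are hypothetical. The precomputed set is the part that must be sent from the large-set party to the small-set party.

```python
import hashlib
import math
import secrets
from typing import Iterable, Set

# ASSUMPTION: integers modulo the Mersenne prime 2^127 - 1 stand in for an
# elliptic-curve group; toy parameters, NOT secure.
P = 2**127 - 1
ORDER = P - 1  # order of the multiplicative group modulo P


def hash_to_group(item: str) -> int:
    h = int.from_bytes(hashlib.sha256(item.encode()).digest(), "big") % P
    return h or 1


def random_unit() -> int:
    """Random exponent that is invertible modulo the group order."""
    while True:
        r = secrets.randbelow(ORDER - 2) + 2
        if math.gcd(r, ORDER) == 1:
            return r


class OprfServer:
    """Large-set holder; k is its OPRF key."""

    def __init__(self, items: Iterable[str]) -> None:
        self.k = random_unit()
        # Offline phase: H(y)^k for the whole large set. This set is sent to the
        # client once and can be reused across many queries.
        self.encoded_set = {pow(hash_to_group(y), self.k, P) for y in items}

    def evaluate(self, blinded: int) -> int:
        """Online phase: the server only ever sees H(x)^r, never H(x)."""
        return pow(blinded, self.k, P)


def client_query(server: OprfServer, items: Iterable[str]) -> Set[str]:
    """Small-set holder; server.encoded_set models the set downloaded offline."""
    found = set()
    for x in items:
        r = random_unit()
        blinded = pow(hash_to_group(x), r, P)             # client -> server
        evaluated = server.evaluate(blinded)              # server -> client
        unblinded = pow(evaluated, pow(r, -1, ORDER), P)  # = H(x)^k
        if unblinded in server.encoded_set:
            found.add(x)
    return found


if __name__ == "__main__":
    server = OprfServer(f"user{i}" for i in range(1000))
    print(sorted(client_query(server, ["user5", "user999", "alien"])))  # ['user5', 'user999']
```

Unblinding works because (H(x)^(rk))^(r^-1 mod ORDER) = H(x)^k in a group of order ORDER. Note that pow with a negative exponent and a modulus needs Python 3.8 or later.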

Unbalanced PSI: SHE-based

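Comparison of APSI with ec-oprf PSI:

Advantages:

No need to transfer the large-set party's data to the small-set party
Disadvantages:

Heavy computation and long running time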

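ECDH-based three-party PSI protocol

Protocol flow

Alice and Bob interact first and obtain their shuffled two-party intersection
Alice sends the shuffled two-party intersection to Charlie
Charlie's encrypted data is encrypted again by Bob and then by Alice
Charlie compares the ciphertexts and obtains the intersection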
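The four steps above can be traced with a toy version of ECDH commutative blinding. Assumptions: exponentiation modulo 2^127 - 1 stands in for the elliptic curve, SHA-256 stands in for hash-to-curve, all three parties are simulated in one function, and the names are hypothetical; this illustrates the message flow only and is not SecretFlow's implementation.

```python
import hashlib
import math
import secrets
from typing import Iterable, List, Set

P = 2**127 - 1  # ASSUMPTION: toy modulus standing in for an elliptic curve; NOT secure


def h2g(item: str) -> int:
    """Toy hash-to-group."""
    h = int.from_bytes(hashlib.sha256(item.encode()).digest(), "big") % P
    return h or 1


def keygen() -> int:
    """Secret exponent coprime to P - 1, so blinding is a bijection."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k


def enc(values: Iterable[int], key: int) -> List[int]:
    return [pow(v, key, P) for v in values]


def three_party_psi(alice: Set[str], bob: Set[str], charlie: Set[str]) -> Set[str]:
    a, b, c = keygen(), keygen(), keygen()
    # Step 1: Alice and Bob run two-party ECDH-PSI.
    alice_items = sorted(alice)
    alice_ab = enc(enc([h2g(x) for x in alice_items], a), b)  # H(x)^(ab), returned by Bob
    bob_ba = set(enc(enc([h2g(y) for y in bob], b), a))       # H(y)^(ba), computed by Alice
    ab_intersection = [v for v in alice_ab if v in bob_ba]
    secrets.SystemRandom().shuffle(ab_intersection)  # Alice shuffles before sending
    # Step 2: Charlie receives the shuffled H(x)^(ab) and adds his key c.
    ab_c = set(enc(ab_intersection, c))
    # Step 3: Charlie's H(z)^c goes to Bob (adds b), then to Alice (adds a), then back.
    charlie_items = sorted(charlie)
    charlie_cba = enc(enc(enc([h2g(z) for z in charlie_items], c), b), a)
    # Step 4: Charlie compares ciphertexts and learns which of his items are in all three sets.
    return {z for z, v in zip(charlie_items, charlie_cba) if v in ab_c}


if __name__ == "__main__":
    print(sorted(three_party_psi({"a", "b", "c", "d"}, {"b", "c", "d", "e"}, {"c", "d", "f"})))
    # ['c', 'd']
```

The length of ab_intersection is visible to Charlie, which is exactly the leak of the Alice and Bob intersection size listed under the disadvantages.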

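Advantages:

Built on ecdh-psi, so the protocol is simple and easy to implement

Disadvantages:

Leaks the size of the intersection between Alice and Bob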

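SPU PSI scheduling wrapper

SecretFlow psi_csv Python API

Unified entry point
- Entry function: bucket_psi
Bucketed intersection
- Supports large-scale data (up to 1 billion rows) through bucketing
Input processing
- Check whether the data in the intersection id columns is complete
- Check for duplicate entries
Output processing
- Support sorting by the intersection id columns
- Output the complete label columns

bucket_psi: high-level API that supports massive data through hash bucketing and covers the full production workflow (duplicate checking, bucketed intersection, result broadcasting, result sorting).
mem_psi: low-level API offering algorithm-kernel-level performance behind a unified, easy-to-use interface.
Operator: algorithm integration layer that exposes a unified interface upward to the engineering wrappers; protocols are registered through a factory pattern, which makes integrating new protocols more efficient.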
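The hash-bucketing idea behind bucket_psi can be sketched as follows: both parties hash every id with the same function into a fixed number of buckets, so equal ids always land in the same bucket, and each pair of buckets can then be intersected independently with bounded memory. This is a hypothetical illustration, not SecretFlow's code: the function names and the num_buckets default are assumptions, buckets are kept in memory rather than written to disk, and a plain set intersection stands in for the per-bucket PSI protocol.

```python
import hashlib
from collections import Counter
from typing import Dict, List, Set


def bucket_of(key: str, num_buckets: int) -> int:
    """Same hash on both sides, so equal ids always fall into the same bucket."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def precheck(keys: List[str]) -> None:
    """Reject input that contains duplicate ids."""
    dups = [k for k, n in Counter(keys).items() if n > 1]
    if dups:
        raise ValueError(f"duplicate ids: {dups[:5]}")


def split_into_buckets(keys: List[str], num_buckets: int) -> Dict[int, Set[str]]:
    buckets: Dict[int, Set[str]] = {i: set() for i in range(num_buckets)}
    for k in keys:
        buckets[bucket_of(k, num_buckets)].add(k)
    return buckets


def bucketed_psi(alice_keys: List[str], bob_keys: List[str],
                 num_buckets: int = 4, sort: bool = True) -> List[str]:
    precheck(alice_keys)
    precheck(bob_keys)
    alice_buckets = split_into_buckets(alice_keys, num_buckets)
    bob_buckets = split_into_buckets(bob_keys, num_buckets)
    result: List[str] = []
    for i in range(num_buckets):
        # In a disk-backed implementation only bucket i of each side would be
        # loaded here, and a PSI protocol would replace the plain intersection.
        result.extend(alice_buckets[i] & bob_buckets[i])
    return sorted(result) if sort else result
```

Because the bucket index depends only on the id, no matching pair can be split across buckets, so the union of per-bucket intersections equals the full intersection.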

PSI execution steps

Start the Ray cluster

Alice starts her Ray cluster first. Note that this command starts a Ray head node.

ray start --head --node-ip-address="ip" --port="port" --include-dashboard=False --disable-usage-stats

Bob likewise starts his own Ray cluster as a head node.

ray start --head --node-ip-address="ip" --port="port" --include-dashboard=False --disable-usage-stats

Initialize SecretFlow

sf_cluster_config = {
    'parties': {
        'alice': {
            # replace with alice's real address.
            'address': 'ip:port of alice',
            'listen_addr': '0.0.0.0:port',
        },
        'bob': {
            # replace with bob's real address.
            'address': 'ip:port of bob',
            'listen_addr': '0.0.0.0:port',
        },
    },
    # set to 'alice' on alice's machine and 'bob' on bob's machine
    'self_party': 'bob',
}

tls_config = {
    "ca_cert": "ca root cert of other parties",
    "cert": "server cert of alice in pem",
    "key": "server key of alice in pem",
}

# on alice's machine:
sf.init(address='alice ray head node address', cluster_config=sf_cluster_config, tls_config=tls_config)
# on bob's machine:
sf.init(address='bob ray head node address', cluster_config=sf_cluster_config, tls_config=tls_config)

Start the SPU device

spu_cluster_def = {
    'nodes': [
        # <<< !!! >>> replace 192.168.0.1:12945 with alice node's local IP and a free port
        {'party': 'alice', 'address': '192.168.0.1:12945', 'listen_addr': '0.0.0.0:12945'},
        # <<< !!! >>> replace 192.168.0.2:12946 with bob node's local IP and a free port
        {'party': 'bob', 'address': '192.168.0.2:12946', 'listen_addr': '0.0.0.0:12946'},
    ],
    'runtime_config': {
        'protocol': spu.spu_pb2.SEMI2K,
        'field': spu.spu_pb2.FM128,
    },
}
spu = sf.SPU(spu_cluster_def)

Run PSI

reports = spu.psi_csv(
    key=select_keys,
    input_path=input_path,
    output_path=output_path,
    receiver='alice',  # the receiver gets the output file
    # other protocols: 'KKRT_PSI_2PC', 'BC22_PSI_2PC'
    protocol='ECDH_PSI_2PC',
    # one of 'CURVE_25519', 'CURVE_FOURQ', 'CURVE_SM2'
    curve_type='CURVE_25519',
    precheck_input=False,  # check the input files for duplicate entries
    sort=False,  # sort the intersection by key ids
    broadcast_result=False,  # if True, the receiver sends the intersection to the other parties
)

# Structure of each report:
#   int64 original_count = 1;      // total number of input rows
#   int64 intersection_count = 2;  // number of rows in the intersection

# PSI intersection output
output_path = {
    alice: '/data/psi_output.csv',  # output on node alice
    bob: '/data/psi_output_bob.csv',  # output on node bob
}

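PSI test

Initialize SecretFlow

Both the alice and bob nodes need to initialize SecretFlow. First, on each node choose an address that the other party can reach; make sure the port you pick is not already in use.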

import secretflow as sf
import spu
import os

network_conf = {
    "parties": {
        "alice": {
            "address": "alice:8000",
        },
        "bob": {
            "address": "bob:8000",
        },
    },
}

party = os.getenv("SELF_PARTY", "alice")
sf.shutdown()
sf.init(
    address="127.0.0.1:6379",
    cluster_config={**network_conf, "self_party": party},
    log_to_driver=True,
)

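Initialize SPU

For alice's address, use an address that bob can reach and choose an unused port; make sure it does not conflict with the Ray port.
alice's listen_addr can use the same port as alice's address.
For bob's address, use an address that alice can reach and choose an unused port; make sure it does not conflict with the Ray port.
bob's listen_addr can use the same port as bob's address.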

alice, bob = sf.PYU("alice"), sf.PYU("bob")
spu_conf = {
    "nodes": [
        {
            "party": "alice",
            "address": "alice:8001",
            "listen_addr": "alice:8001",
        },
        {
            "party": "bob",
            "address": "bob:8001",
            "listen_addr": "bob:8001",
        },
    ],
    "runtime_config": {
        "protocol": spu.spu_pb2.SEMI2K,
        "field": spu.spu_pb2.FM128,
        "sigmoid_mode": spu.spu_pb2.RuntimeConfig.SIGMOID_REAL,
    },
}
spu = sf.SPU(cluster_def=spu_conf)

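Private set intersection

SecretFlow provides the psi_csv function, which takes CSV files as input and writes CSV files containing the intersection. The default protocol is KKRT.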

current_dir = os.getcwd()

input_path = {
    alice: f"{current_dir}/payment.csv",
    bob: f"{current_dir}/record.csv",
}
output_path = {
    alice: f"{current_dir}/payment_output.csv",
    bob: f"{current_dir}/record_output.csv",
}
spu.psi_csv("uid", input_path, output_path, "alice")
sf.shutdown()
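To try the example above on a single machine, the two input files need a uid column. The sketch below writes small, made-up payment.csv and record.csv files and computes the expected intersection in plaintext so the PSI output can be checked. The file contents and helper names are hypothetical; in a real deployment each file stays on its own node and this plaintext check is not possible.

```python
import csv
import os
import tempfile
from typing import List, Set


def write_csv(path: str, header: List[str], rows: List[List[str]]) -> None:
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


def read_ids(path: str, key: str = "uid") -> Set[str]:
    """Collect the values of the key column from a CSV file with a header row."""
    with open(path, newline="") as f:
        return {row[key] for row in csv.DictReader(f)}


def make_demo_files(directory: str) -> Set[str]:
    """Write hypothetical payment.csv / record.csv and return the expected intersection."""
    payment = os.path.join(directory, "payment.csv")
    record = os.path.join(directory, "record.csv")
    write_csv(payment, ["uid", "amount"], [["u1", "10"], ["u2", "25"], ["u3", "7"]])
    write_csv(record, ["uid", "visits"], [["u2", "4"], ["u3", "1"], ["u4", "9"]])
    # Plaintext check: only possible because both files are on one machine.
    return read_ids(payment) & read_ids(record)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(sorted(make_demo_files(d)))  # ['u2', 'u3']
```

Pointing current_dir at the directory passed to make_demo_files lets the psi_csv call run against these files; assuming the output CSVs keep the uid column, read_ids on the output file can then be compared with the expected set.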
