Privacy Computing Training Camp, Season 2, Lecture 11: Component Introduction and Custom Development

Article link: https://blog.csdn.net/TianxiaZhu824/article/details/140046679

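First, I recommend another blogger's notes: "SecretFlow Course Study Notes 11: Component Introduction and Custom Development" (CSDN blog).

Official documentation: Introduction to the SecretFlow Open Specification | Open Specification v1.0.dev240328 | SecretFlow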

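01 SecretFlow Open Specification

The SecretFlow Open Specification is a protocol stack designed for privacy-preserving applications.

At present it covers the data, component, and node evaluation protocols; a workflow protocol is to be introduced soon.

The specification is used across the SecretFlow ecosystem, including:

SecretFlow: a unified framework for privacy-preserving data analysis and machine learning.
Kuscia: a K8s-based orchestration framework for privacy-preserving computing tasks.
SecretPad: a privacy-preserving computing web platform built on the Kuscia framework, designed to give easy access to privacy-preserving data intelligence and machine learning functions.
SCQL and the TEE engine will also adopt the SecretFlow Open Specification in the future.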

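Data

SecretFlow defines a standard data format for privacy-computing scenarios, DistData, which consists of two kinds of data:

(1) PublicData (data that may be disclosed), which carries name, type, meta, system_info, and similar fields; each type has its own corresponding meta;

(2) DataRef (a handle to remote data, which specifies the owner of the data and its URI). For example, the secret shares scattered across the computing nodes are a form of remote, encrypted-state data.

Data access process:

An instruction uses the owner and the URI to locate the corresponding data (a RemoteObject), and the processing is then carried out locally on the owner's side.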
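
The lookup described above can be sketched in plain Python. This is only an illustration under stated assumptions: the dataclasses are simplified stand-ins for the real protobuf messages, and `PARTY_STORAGE` and `resolve` are hypothetical names introduced here, not part of any SecretFlow API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataRef:
    """Handle to remote data: who owns it and where it lives."""
    uri: str
    party: str  # the owner of the data


@dataclass
class DistData:
    """Simplified stand-in: a public part plus references to remote data."""
    name: str
    type: str
    meta: dict = field(default_factory=dict)         # public, depends on `type`
    system_info: dict = field(default_factory=dict)
    data_refs: List[DataRef] = field(default_factory=list)


# Hypothetical per-party storage, keyed as {party: {uri: payload}}.
PARTY_STORAGE: Dict[str, Dict[str, str]] = {
    "alice": {"alice/part.csv": "id,x1\n1,0.5\n"},
    "bob": {"bob/part.csv": "id,y\n1,1\n"},
}


def resolve(ref: DataRef) -> str:
    """Find the payload using the owner and the URI, inside the owner's storage only."""
    return PARTY_STORAGE[ref.party][ref.uri]


table = DistData(
    name="demo_table",
    type="sf.table.vertical_table",
    data_refs=[DataRef("alice/part.csv", "alice"), DataRef("bob/part.csv", "bob")],
)
for ref in table.data_refs:
    print(ref.party, "->", resolve(ref).splitlines()[0])
```

Note that the public part (name, type, meta) can be read by anyone, while each payload is only reachable through its owner's storage.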

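Component

Component is the most complicated protocol in the SecretFlow Open Specification. A component represents a piece of an application that can be integrated into a workflow.

You can use ComponentDef to define a component:

domain: the namespace of the component. You can use this field to group components. For example, SecretFlow has 'ml.train', 'feature', and so on.

name: must be unique within a namespace. Components in different namespaces, however, may share the same name.

version: the version of the component.

attributes: the attributes of the component.

inputs: the input requirements of the component.

outputs: the output requirements of the component.

Together, domain, name, and version let a user locate exactly one component in the system.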
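
As a rough illustration of why the (domain, name, version) triple is enough to identify a component, here is a minimal registry sketch. `ComponentKey`, `registry`, and `register` are hypothetical names for this example only; they are not SecretFlow APIs.

```python
from typing import Dict, NamedTuple


class ComponentKey(NamedTuple):
    domain: str
    name: str
    version: str


# Hypothetical registry: one description per (domain, name, version).
registry: Dict[ComponentKey, str] = {}


def register(domain: str, name: str, version: str, desc: str) -> ComponentKey:
    """Add a component and reject a duplicate (domain, name, version) triple."""
    key = ComponentKey(domain, name, version)
    if key in registry:
        raise ValueError(f"component already registered: {key}")
    registry[key] = desc
    return key


# The same name may appear in two different domains.
register("ml.train", "demo", "0.0.1", "a training component")
register("feature", "demo", "0.0.1", "a feature component")
print(registry[ComponentKey("feature", "demo", "0.0.1")])
```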

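All attributes of a component are organized into an attribute tree.

The leaves of the tree are called atomic attributes. They are the concrete fields a user has to fill in, such as a bucket size or a learning rate, and appear in the figure as "a/b", "a/c/e/i", and "a/c/f/j".
The non-leaf nodes of the tree are called attribute groups. There are two kinds of attribute group:
- Struct Attribute Group: all children of the group must be filled in together. In the figure, for example, "a/c/f", "a/d", and "a/d/g".
- Union Attribute Group: the user must pick exactly one child of the group to fill in. In the figure, for example, "a/c" and "a/d/h".

A Struct Attribute Group resembles a block of related questions in a survey, such as personal information (name, age, and so on).

A Union Attribute Group resembles a branch in a survey: for example, different occupations lead to different follow-up questions.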
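
The survey analogy can be turned into a small validity check. The sketch below is an assumption-laden model, not the specification's own logic: `Attr`, `is_touched`, and `is_valid` are hypothetical helpers, and a union child counts as "selected" as soon as any path beneath it is filled in.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Attr:
    name: str
    kind: str = "atomic"  # "atomic", "struct", or "union"
    children: List["Attr"] = field(default_factory=list)


def is_touched(path: str, filled: Set[str]) -> bool:
    """True if the node at `path`, or anything below it, has been filled in."""
    return any(p == path or p.startswith(path + "/") for p in filled)


def is_valid(node: Attr, filled: Set[str], prefix: str = "") -> bool:
    path = f"{prefix}/{node.name}" if prefix else node.name
    if node.kind == "atomic":
        return path in filled
    if node.kind == "struct":
        # Every child must be filled in together.
        return all(is_valid(child, filled, path) for child in node.children)
    if node.kind == "union":
        # Exactly one child may be selected, and that child must be complete.
        chosen = [c for c in node.children if is_touched(f"{path}/{c.name}", filled)]
        return len(chosen) == 1 and is_valid(chosen[0], filled, path)
    raise ValueError(f"unknown attribute kind: {node.kind}")


survey = Attr("survey", "struct", [
    Attr("personal", "struct", [Attr("name"), Attr("age")]),
    Attr("occupation", "union", [
        Attr("student", "struct", [Attr("school")]),
        Attr("engineer", "struct", [Attr("company")]),
    ]),
])

answers = {
    "survey/personal/name",
    "survey/personal/age",
    "survey/occupation/student/school",
}
print(is_valid(survey, answers))
```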

IoDef specifies the requirements on a component's inputs or outputs.

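Node evaluation

A runtime instance of a component is called a node.

To evaluate a component of an application, you must provide the following:

StorageConfig: required so that the application can fetch the remote data specified by a DataRef.
NodeEvalParam: all the fields required by the ComponentDef.

The result is represented by the application's NodeEvalResult.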
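
To make the flow concrete, here is a toy model of a node evaluation. Everything in it is a simplified assumption: the three dataclasses only borrow the names of the real messages, the storage is an in-memory dictionary, and `upper_case`, `COMPONENTS`, and `evaluate` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class StorageConfig:
    """Hypothetical in-memory storage; a real one would point at actual storage."""
    files: Dict[str, str] = field(default_factory=dict)


@dataclass
class NodeEvalParam:
    domain: str
    name: str
    version: str
    attrs: Dict[str, Any]
    input_uris: List[str]
    output_uris: List[str]


@dataclass
class NodeEvalResult:
    output_uris: List[str]


def upper_case(param: NodeEvalParam, storage: StorageConfig) -> NodeEvalResult:
    """Toy component: read one input, upper-case it, write one output."""
    text = storage.files[param.input_uris[0]]
    if param.attrs.get("strip", False):
        text = text.strip()
    storage.files[param.output_uris[0]] = text.upper()
    return NodeEvalResult(output_uris=list(param.output_uris))


EvalFn = Callable[[NodeEvalParam, StorageConfig], NodeEvalResult]
COMPONENTS: Dict[Tuple[str, str, str], EvalFn] = {
    ("toy", "upper_case", "0.0.1"): upper_case,
}


def evaluate(param: NodeEvalParam, storage: StorageConfig) -> NodeEvalResult:
    """Locate the component by (domain, name, version) and run it as a node."""
    fn = COMPONENTS[(param.domain, param.name, param.version)]
    return fn(param, storage)


storage = StorageConfig(files={"in.txt": "  hello  "})
param = NodeEvalParam("toy", "upper_case", "0.0.1", {"strip": True}, ["in.txt"], ["out.txt"])
result = evaluate(param, storage)
print(result.output_uris, storage.files["out.txt"])
```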

Run report

A run report contains the following:

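02 SecretFlow Component List

SecretFlow ships with many components; see SecretFlow Component List | SecretFlow v1.6.1b0 | SecretFlow. A visual programming interface is also available, which makes the components easy to invoke and run.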

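03 Invoking SecretFlow Components

There are three different ways to invoke the PSI (private set intersection) component.

SecretFlow Component Guide | SecretFlow v1.7.0b0 | SecretFlow

The full code can be found at the link above.

Running the command produced an error:

The fix comes from the Stack Overflow question "shell - String comparison in bash. [[: not found". The "[[: not found" message appears when a script that uses the bash-only [[ construct is executed by a plain sh; running it with bash resolves it.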

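04 Adding a New SecretFlow Component

The brief steps to build a SecretFlow component are as follows:

1. Create a new file under the secretflow/component/ directory.

2. Create a component object with secretflow.component.component.Component: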

from secretflow.component.component import Component

# The component object that steps 3 and 4 attach attributes, inputs/outputs,
# and the eval function to.
train_test_split_comp = Component(
    "train_test_split",
    domain="preprocessing",
    version="0.0.1",
    desc="Split datasets into random train and test subsets.",
)

3. Define the attributes and the inputs/outputs.

from secretflow.component.component import IoType
from secretflow.component.data_utils import DistDataType

train_test_split_comp.float_attr(
    name="train_size",
    desc="Proportion of the dataset to include in the train subset.",
    is_list=False,
    is_optional=True,
    default_value=0.75,
    allowed_values=None,
    lower_bound=0.0,
    upper_bound=1.0,
    lower_bound_inclusive=True,
    upper_bound_inclusive=True,
)
train_test_split_comp.float_attr(
    name="test_size",
    desc="Proportion of the dataset to include in the test subset.",
    is_list=False,
    is_optional=True,
    default_value=0.25,
    allowed_values=None,
    lower_bound=0.0,
    upper_bound=1.0,
    lower_bound_inclusive=True,
    upper_bound_inclusive=True,
)
train_test_split_comp.int_attr(
    name="random_state",
    desc="Specify the random seed of the shuffling.",
    is_list=False,
    is_optional=True,
    default_value=1234,
)
train_test_split_comp.bool_attr(
    name="shuffle",
    desc="Whether to shuffle the data before splitting.",
    is_list=False,
    is_optional=True,
    default_value=True,
)
train_test_split_comp.io(
    io_type=IoType.INPUT,
    name="input_data",
    desc="Input dataset.",
    types=[DistDataType.VERTICAL_TABLE],
    col_params=None,
)
train_test_split_comp.io(
    io_type=IoType.OUTPUT,
    name="train",
    desc="Output train dataset.",
    types=[DistDataType.VERTICAL_TABLE],
    col_params=None,
)
train_test_split_comp.io(
    io_type=IoType.OUTPUT,
    name="test",
    desc="Output test dataset.",
    types=[DistDataType.VERTICAL_TABLE],
    col_params=None,
)
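
The bound-related arguments of `float_attr` are easier to read with a small check in hand. The function below is a plain-Python sketch of what such a range constraint means; `check_float_attr` is a hypothetical helper and is not how SecretFlow validates attributes internally.

```python
def check_float_attr(
    value: float,
    lower_bound: float,
    upper_bound: float,
    lower_bound_inclusive: bool = True,
    upper_bound_inclusive: bool = True,
) -> bool:
    """Return True if `value` lies inside the declared range."""
    above = value >= lower_bound if lower_bound_inclusive else value > lower_bound
    below = value <= upper_bound if upper_bound_inclusive else value < upper_bound
    return above and below


# With inclusive bounds on [0.0, 1.0], both end points are accepted.
print(check_float_attr(0.75, 0.0, 1.0))
print(check_float_attr(1.0, 0.0, 1.0, upper_bound_inclusive=False))
```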

4. Define the execution function.

from secretflow.spec.v1.data_pb2 import DistData

# Signature of eval_fn must be
#  func(*, ctx, attr_0, attr_1, ..., input_0, input_1, ..., output_0, output_1, ...) -> typing.Dict[str, DistData]
# All the arguments are keyword-only, so orders don't matter.
@train_test_split_comp.eval_fn
def train_test_split_eval_fn(
    *, ctx, train_size, test_size, random_state, shuffle, input_data, train, test
):
    # Please check more examples to learn component utils.
    # ctx includes some parsed cluster def and other useful meta.

    # The output of eval_fn is a map of DistDatas of which keys are output names.
    return {"train": DistData(), "test": DistData()}
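
The eval function above is only a skeleton. To show what the four attributes could drive, here is a plain-Python sketch of a train/test split over a list of rows. It is an illustration under assumptions, not SecretFlow's implementation: `split_rows` is a hypothetical helper, and it works on an in-memory list rather than a vertical table.

```python
import random
from typing import List, Sequence, Tuple


def split_rows(
    rows: Sequence[str],
    train_size: float = 0.75,
    test_size: float = 0.25,
    random_state: int = 1234,
    shuffle: bool = True,
) -> Tuple[List[str], List[str]]:
    """Split rows into disjoint train and test subsets."""
    if not (0.0 <= train_size <= 1.0 and 0.0 <= test_size <= 1.0):
        raise ValueError("train_size and test_size must lie in [0.0, 1.0]")
    if train_size + test_size > 1.0 + 1e-9:
        raise ValueError("train_size + test_size must not exceed 1.0")

    indices = list(range(len(rows)))
    if shuffle:
        # A seeded generator makes the shuffle reproducible.
        random.Random(random_state).shuffle(indices)

    n_train = int(len(rows) * train_size)
    n_test = int(len(rows) * test_size)
    train = [rows[i] for i in indices[:n_train]]
    test = [rows[i] for i in indices[n_train:n_train + n_test]]
    return train, test


rows = [f"row{i}" for i in range(8)]
train, test = split_rows(rows)
print(len(train), len(test))
```

Fixing `random_state` means two runs with the same inputs produce the same split, which matters when several parties have to agree on which rows go where.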

5. Add your new component to ALL_COMPONENTS in secretflow.component.entry.