What is the difference between guest and host in FATE, and what does each one do?
- Guest is the data application party; Host is the data provider.
- In vertical (hetero) federated learning, the guest is the party that holds the label (the target variable y), while the host holds only features and no label.
- The guest typically uses the data to build and train machine learning models; the host contributes its data to that model building and training.
- In FATE, the guest generally initiates the modeling job. The host does not initiate jobs itself, but it takes part in the model aggregation process assisted by the arbiter.
The arbiter mainly acts as a coordinator and neutral third party responsible for aggregating the parties' models. Its role can be summarized as follows (see the sketch after this list):
- Coordinator: the arbiter coordinates and facilitates cooperation and interaction among the participating parties.
- Neutral third party: the arbiter favors no single party, ensuring fairness throughout the collaboration.
- Model aggregation: this is the arbiter's core function; it aggregates the models contributed by the participating parties into a unified model or result.
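These roles are declared when a training pipeline is constructed. A minimal sketch, using the same party IDs as the config.yaml shown later in this guide:

from fate_client.pipeline import FateFlowPipeline

# guest holds the labels, the host provides features, the arbiter coordinates aggregation
pipeline = FateFlowPipeline().set_parties(guest="9999", host="10000", arbiter="10000")

With the roles in place, the first step is to upload the data. The scripts below register the example tables locally: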
from fate_client.pipeline import FateFlowPipeline

# Upload runs locally, with this machine acting as party "0"
pipeline = FateFlowPipeline().set_parties(local="0")
pipeline.set_site_role("local")
pipeline.set_site_party_id("0")

# Guest-side meta: includes the label column (motor_speed), unlike the host metas below
meta = {'delimiter': ',',
        'dtype': 'float64',
        'input_format': 'dense',
        'label_type': 'float64',
        'label_name': 'motor_speed',
        'match_id_name': 'idx',
        'match_id_range': 0,
        'tag_value_delimiter': ':',
        'tag_with_value': False,
        'weight_type': 'float64'}
pipeline.transform_local_file_to_dataframe(
    file="/data/projects/fate/examples/data/motor_hetero_guest.csv",
    meta=meta, head=True, extend_sid=True,
    namespace="experiment",
    name="motor_hetero_guest"
)
# Host-side meta: same settings as the guest's, but with no label fields
meta = {'delimiter': ',',
        'dtype': 'float64',
        'input_format': 'dense',
        'match_id_name': 'idx',
        'match_id_range': 0,
        'tag_value_delimiter': ':',
        'tag_with_value': False,
        'weight_type': 'float64'}
pipeline = FateFlowPipeline().set_parties(local="0")
pipeline.set_site_role("local")
pipeline.set_site_party_id("0")
pipeline.transform_local_file_to_dataframe(
    file="/data/projects/fate/examples/data/motor_hetero_host_1.csv",
    meta=meta, head=True, extend_sid=True,
    namespace="experiment",
    name="motor_hetero_host_1"
)
# Second host dataset, uploaded under a different table name
meta = {'delimiter': ',',
        'dtype': 'float64',
        'input_format': 'dense',
        'match_id_name': 'idx',
        'match_id_range': 0,
        'tag_value_delimiter': ':',
        'tag_with_value': False,
        'weight_type': 'float64'}
pipeline = FateFlowPipeline().set_parties(local="0")
pipeline.set_site_role("local")
pipeline.set_site_party_id("0")
pipeline.transform_local_file_to_dataframe(
    file="/data/projects/fate/examples/data/motor_hetero_host_2.csv",
    meta=meta, head=True, extend_sid=True,
    namespace="experiment",
    name="motor_hetero_host_2"
)
Uploading two host datasets simulates two host nodes. Next, modify config.yaml under the pipeline directory; this is the party-settings file read at training time.
parties: # parties default id
  guest:
    - '9999'
  host:
    - '10000'
    - '10001'
  arbiter:
    - '10000'
data_base_dir: ""  # path to project base where data is located
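For reference, a minimal sketch of how the training script below consumes these IDs (the inline values assume the config.yaml above):

from fate_client.pipeline.utils import test_utils

config = test_utils.load_job_config("../config.yaml")
guest = config.parties.guest[0]      # '9999'
host_1 = config.parties.host[0]      # '10000', used in the first task
host_2 = config.parties.host[1]      # '10001', used in the second task
arbiter = config.parties.arbiter[0]  # '10000'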
Next, modify the training script. A Reader instance locates each uploaded table by the name and namespace set at upload time; two tasks are configured so that each reads a different host's data.
import argparse

from fate_client.pipeline import FateFlowPipeline
from fate_client.pipeline.components.fate import CoordinatedLinR, PSI, Evaluation, Reader
from fate_client.pipeline.utils import test_utils


def main(config="../config.yaml", namespace=""):
    if isinstance(config, str):
        config = test_utils.load_job_config(config)
    parties = config.parties
    guest = parties.guest[0]
    host = parties.host[0]
    arbiter = parties.arbiter[0]

    # First task: train with the guest and the first host
    pipeline = FateFlowPipeline().set_parties(guest=guest, host=host, arbiter=arbiter)
    if config.task_cores:
        pipeline.conf.set("task_cores", config.task_cores)
    if config.timeout:
        pipeline.conf.set("timeout", config.timeout)

    # Reader locates each party's table by the name/namespace set at upload time
    reader_0 = Reader("reader_0", runtime_parties=dict(guest=guest, host=host))
    reader_0.guest.task_parameters(namespace=f"experiment{namespace}", name="motor_hetero_guest")
    reader_0.hosts[0].task_parameters(namespace=f"experiment{namespace}", name="motor_hetero_host_1")

    # PSI aligns sample IDs across the parties before training
    psi_0 = PSI("psi_0", input_data=reader_0.outputs["output_data"])
    linr_0 = CoordinatedLinR("linr_0",
                             epochs=10,
                             batch_size=100,
                             optimizer={"method": "rmsprop",
                                        "optimizer_params": {"lr": 0.01},
                                        "alpha": 0.001},
                             init_param={"fit_intercept": True},
                             train_data=psi_0.outputs["output_data"])
    # Evaluation runs on the guest only, since only the guest holds labels
    evaluation_0 = Evaluation("evaluation_0",
                              runtime_parties=dict(guest=guest),
                              default_eval_setting="regression",
                              input_datas=linr_0.outputs["train_output_data"])

    pipeline.add_tasks([reader_0, psi_0, linr_0, evaluation_0])
    pipeline.compile()
    # print(pipeline.get_dag())
    pipeline.fit()
    pipeline.deploy([psi_0, linr_0])

    # Second task: switch to the second host and train again
    host = parties.host[1]
    print(parties.host)
    predict_pipeline = FateFlowPipeline().set_parties(guest=guest, host=host, arbiter=arbiter)
    reader_1 = Reader("reader_1", runtime_parties=dict(guest=guest, host=host))
    reader_1.guest.task_parameters(namespace=f"experiment{namespace}", name="motor_hetero_guest")
    reader_1.hosts[0].task_parameters(namespace=f"experiment{namespace}", name="motor_hetero_host_2")
    psi_1 = PSI("psi_1", input_data=reader_1.outputs["output_data"])
    linr_1 = CoordinatedLinR("linr_1",
                             epochs=10,
                             batch_size=100,
                             optimizer={"method": "rmsprop",
                                        "optimizer_params": {"lr": 0.01},
                                        "alpha": 0.001},
                             init_param={"fit_intercept": True},
                             train_data=psi_1.outputs["output_data"])
    predict_pipeline.add_tasks([reader_1, psi_1, linr_1])
    predict_pipeline.compile()
    # print(predict_pipeline.compile().get_dag())
    # predict_pipeline.predict()
    # Note: fit() here retrains with host_2's data rather than predicting with the deployed model
    predict_pipeline.fit()


if __name__ == "__main__":
    parser = argparse.ArgumentParser("PIPELINE DEMO")
    parser.add_argument("--config", type=str, default="../config.yaml",
                        help="config file")
    parser.add_argument("--namespace", type=str, default="",
                        help="namespace for data stored in FATE")
    args = parser.parse_args()
    main(config=args.config, namespace=args.namespace)
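Assuming the script is saved as motor_linr_demo.py (a name chosen here for illustration), it can be run against the config above with:

python motor_linr_demo.py --config ../config.yaml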
[Screenshot: first aggregation]
[Screenshot: second aggregation]
Each training run's model is saved in the model folder under the fate_flow directory.
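As a quick check, assuming a deployment rooted at /data/projects/fate (the same root as the data paths above; adjust for your installation), the saved models can be listed with a snippet like:

import os

# Path is an assumption based on the deployment root used throughout this guide
model_dir = "/data/projects/fate/fate_flow/model"
print(os.listdir(model_dir))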
Next, upload the CSV files converted from the MNIST dataset:
from fate_client.pipeline import FateFlowPipeline

pipeline = FateFlowPipeline().set_parties(local="0")
pipeline.set_site_role("local")
pipeline.set_site_party_id("0")

# MNIST meta: integer pixel features with an int64 label column named "y"
meta = {'delimiter': ',',
        'dtype': 'int',
        'input_format': 'dense',
        'label_type': 'int64',
        'label_name': 'y',
        'match_id_name': 'idx',
        'match_id_range': 0,
        'tag_value_delimiter': ':',
        'tag_with_value': False,
        'weight_type': 'float64'}
pipeline.transform_local_file_to_dataframe(
    file="/data/projects/fate/examples/data/mnist_1_train.csv",
    meta=meta, head=True, extend_sid=True,
    namespace="experiment",
    name="mnist_homo_guest"
)
# In homo (horizontal) federation every party holds labels, so the host meta keeps the label fields
meta = {'delimiter': ',',
        'dtype': 'int',
        'input_format': 'dense',
        'label_type': 'int64',
        'label_name': 'y',
        'match_id_name': 'idx',
        'match_id_range': 0,
        'tag_value_delimiter': ':',
        'tag_with_value': False,
        'weight_type': 'float64'}
pipeline = FateFlowPipeline().set_parties(local="0")
pipeline.set_site_role("local")
pipeline.set_site_party_id("0")
pipeline.transform_local_file_to_dataframe(
    file="/data/projects/fate/examples/data/mnist_2_train.csv",
    meta=meta, head=True, extend_sid=True,
    namespace="experiment",
    name="mnist_homo_host"
)
Training with the homo_nn algorithm then raises an error: