Deploying a KubeFATE Cluster with Docker-Compose

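Contents

1 Goal

2 Prerequisites

2.1 Configuring the two virtual machines

2.2 Installing Docker and Docker-Compose

2.3 Network connectivity

2.4 Downloading the FATE component images

2.5 Installing KubeFATE

3 Deploying FATE with Docker Compose

3.1 Configuring the number of instances to deploy

3.2 Running the deployment script

3.3 Verifying the deployment

3.4 Verifying the Serving-Service function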


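I won't introduce FATE and KubeFATE at length, since plenty of articles about them are available online. The goal of this post is to deploy a working FATE from scratch. The FATE version is v1.8.0.

First, the official documentation: Deploying FATE with Docker Compose

Bilibili tutorial: "Introduction to Federated Learning, Its Applications, and the FATE Open-Source Framework", Lesson 2, in which our friends at VMware cover deployment models and development-environment setup

We follow the official documentation step by step and fill in the details it leaves out.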

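1 Goal

Two FATE instances that can communicate with each other, each containing all FATE components.

2 Prerequisites

  1. Two hosts (physical or virtual machines, both running CentOS 7);
  2. Docker 18+ installed on all hosts;
  3. Docker-Compose 1.24+ installed on all hosts;
  4. The deployment machine has Internet access, and all hosts can reach each other over the network;
  5. The running machines have already downloaded the FATE component images (to build the images offline, see the image-building documentation).

The sections below work through these prerequisites one by one: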

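2.1 Configuring the two virtual machines

See sections 2 and 3 of the article below. It is very detailed, so just follow it step by step.

Virtual machine configuration

The first virtual machine is "fate01", IP address 192.168.73.161; the second is "fate02", IP address 192.168.73.162. All later examples use these two machines.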

2.2 Installing Docker and Docker-Compose

Run the following on both 192.168.73.161 and 192.168.73.162:

Setting up the Docker environment

# firewalld conflicts with the Docker service, so uninstall firewalld
# iptables can be used in place of firewalld
systemctl stop firewalld && systemctl disable firewalld && yum -y remove firewalld

# Add the Docker repository
yum install -y yum-utils
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo

# Install Docker
yum install -y docker-ce

# Start the Docker service and enable it at boot
systemctl start docker && systemctl enable docker

Installing Docker-Compose

curl -L https://get.daocloud.io/docker/compose/releases/download/1.29.2/docker-compose-`uname -s`-`uname -m` > /usr/local/bin/docker-compose

chmod +x /usr/local/bin/docker-compose

This installs version 1.29.2, which meets the requirement. If you want the latest version, look up the current release.

Check the version numbers

# Print the docker version
docker --version


# Print the docker-compose version
docker-compose --version

If both commands print a version, the installation succeeded.
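
The version requirements from the prerequisites (Docker 18+, Docker-Compose 1.24+) can also be checked mechanically. The following is a minimal sketch, assuming GNU `sort -V` is available (it is on CentOS 7); the helper names `version_ge` and `extract_version` are made up for this example and are not part of Docker or KubeFATE.

```shell
# version_ge A B: succeed when version A >= version B (relies on GNU sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# extract_version: print the first dotted version number found on stdin,
# e.g. "docker-compose version 1.29.2, build 5becea4c" -> "1.29.2".
extract_version() {
  grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1
}

# Usage on each host:
#   version_ge "$(docker --version | extract_version)" 18.0 && echo "docker ok"
#   version_ge "$(docker-compose --version | extract_version)" 1.24 && echo "compose ok"
```

Comparing with `sort -V` avoids the trap of plain string comparison, where "1.9" would sort after "1.24".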

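2.3 Network connectivity

Two virtual machines on the same physical host should be able to reach each other. If that does not work, try configuring passwordless SSH login between them; see

4-11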

2.4 Downloading the FATE component images

Pull the images from Docker Hub with the following commands:

docker pull federatedai/eggroll:1.8.0-release
docker pull federatedai/fateboard:1.8.0-release
docker pull federatedai/python:1.8.0-release
docker pull federatedai/client:1.8.0-release
docker pull federatedai/serving-server:2.1.5-release
docker pull federatedai/serving-proxy:2.1.5-release
docker pull federatedai/serving-admin:2.1.5-release
docker pull bitnami/zookeeper:3.7.0 
docker pull mysql:8.0.28

Check that all images were pulled successfully.
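
One way to do that check without reading the list by eye is to compare the local image list against the nine images pulled above. This is only a sketch: `required_images` and `missing_images` are names invented for this example, and the function just filters text, so it needs the output of `docker images` piped into it.

```shell
# The images pulled in section 2.4, one per line.
required_images="federatedai/eggroll:1.8.0-release
federatedai/fateboard:1.8.0-release
federatedai/python:1.8.0-release
federatedai/client:1.8.0-release
federatedai/serving-server:2.1.5-release
federatedai/serving-proxy:2.1.5-release
federatedai/serving-admin:2.1.5-release
bitnami/zookeeper:3.7.0
mysql:8.0.28"

# missing_images: read "repository:tag" lines on stdin and print every
# required image that does not appear among them.
missing_images() {
  have="$(cat)"
  for img in $required_images; do
    printf '%s\n' "$have" | grep -xF "$img" > /dev/null || echo "$img"
  done
}

# Usage on each host (no output means nothing is missing):
#   docker images --format '{{.Repository}}:{{.Tag}}' | missing_images
```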

2.5 Installing KubeFATE

# Download the release package from GitHub
wget https://github.com/FederatedAI/KubeFATE/releases/download/v1.8.0-a/kubefate-docker-compose-v1.8.0-a.tar.gz

# Extract it
tar -xzf kubefate-docker-compose-v1.8.0-a.tar.gz

That completes the prerequisites. Next we deploy FATE.

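3 Deploying FATE with Docker Compose

3.1 Configuring the number of instances to deploy

The deployment script can deploy multiple FATE instances. In the example below we deploy onto two machines, each running one FATE instance; the two machines' IP addresses are 192.168.73.161 and 192.168.73.162.

Edit the configuration file kubeFATE/docker-deploy/parties.conf as needed.

Below is the modified file: the party 10000 cluster will be deployed on 192.168.73.161 and the party 9999 cluster on 192.168.73.162. To reduce the size of the images that must be pulled, KubeFATE by default uses the "python" container without neural-network support; to run neural-network algorithms, set enabled_nn to true in parties.conf.

Edit the parties.conf file: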

cd docker-deploy

vi parties.conf
user=fate
dir=/data/projects/fate
party_list=(10000 9999)
party_ip_list=(192.168.73.161 192.168.73.162)
serving_ip_list=(192.168.73.161 192.168.73.162)

# backend could be eggroll, spark_rabbitmq and spark_pulsar spark_local_pulsar
backend=eggroll

# true if you need python-nn else false, the default value will be false
enabled_nn=true

# default
exchangeip=

# modify if you are going to use an external db
mysql_ip=mysql
mysql_user=fate
mysql_password=fate_dev
mysql_db=fate_flow

name_node=hdfs://namenode:9000

# Define fateboard login information
fateboard_username=admin
fateboard_password=admin

# Define serving admin login information
serving_admin_username=admin
serving_admin_password=admin
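
A common slip when editing this file is giving party_list, party_ip_list and serving_ip_list different numbers of entries. The sketch below checks that before the configuration is generated. It assumes each list sits on a single line in the `key=(a b c)` form shown above; `conf_list` and `check_parties` are hypothetical helper names, not part of KubeFATE.

```shell
# conf_list FILE KEY: print the space-separated entries of a "KEY=(...)" line.
conf_list() {
  sed -n "s/^$2=(\(.*\))[[:space:]]*$/\1/p" "$1"
}

# check_parties FILE: succeed when there is at least one party and every
# party ID has exactly one party IP and one serving IP.
check_parties() {
  n_party=$(conf_list "$1" party_list | wc -w)
  n_ip=$(conf_list "$1" party_ip_list | wc -w)
  n_serving=$(conf_list "$1" serving_ip_list | wc -w)
  [ "$n_party" -gt 0 ] && [ "$n_party" -eq "$n_ip" ] && [ "$n_party" -eq "$n_serving" ]
}

# Usage, from the docker-deploy directory:
#   check_parties parties.conf && echo "parties.conf lists are consistent"
```

The lists are parsed with sed rather than sourced, so the check does not execute anything that happens to be in the file.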

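3.2 Running the deployment script

Go to the kubeFATE/docker-deploy directory, then run:

# Generate the deployment files
bash ./generate_config.sh

# Deploy training and serving separately

bash docker_deploy.sh all --training

bash docker_deploy.sh all --serving

The script prints the status of each component. If every component is in the running (Up) state, the deployment succeeded.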
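
The same check can be repeated later on either host with `docker ps`. The following is a rough sketch rather than anything shipped with KubeFATE: `not_up` is a made-up helper that lists containers whose status does not begin with "Up", and it assumes container names contain no spaces (true for the compose-generated names used here).

```shell
# not_up: read "name status..." lines on stdin (one container per line) and
# print the names of containers whose status does not start with "Up".
not_up() {
  awk '$2 != "Up" { print $1 }'
}

# Usage on each host (no output means every container is running):
#   docker ps -a --format '{{.Names}} {{.Status}}' | not_up
```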

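3.3 Verifying the deployment

Verification with toy_example

# Run the following commands on 192.168.73.161, where party 10000 is deployed
$ docker exec -it confs-10000_client_1 bash                        # enter the client container
$ flow test toy --guest-party-id 10000 --host-party-id 9999        # run the test

If the test passes, the screen shows a message reporting that the toy test job succeeded.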

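3.4 Verifying the Serving-Service function

This follows the example in the official documentation.

Host-side steps

Enter the party 10000 client container (on 192.168.73.161)

docker exec -it confs-10000_client_1 bash

Edit fateflow/examples/upload/upload_host.json

You would normally open the file in vim to edit it; the heredoc below writes the same content directly.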

cat > fateflow/examples/upload/upload_host.json <<EOF
{
  "file": "examples/data/breast_hetero_host.csv",
  "id_delimiter": ",",
  "head": 1,
  "partition": 10,
  "namespace": "experiment",
  "table_name": "breast_hetero_host"
}
EOF

Upload the host data

flow data upload -c fateflow/examples/upload/upload_host.json

Guest-side steps

Enter the party 9999 client container (on 192.168.73.162)

docker exec -it confs-9999_client_1 bash

Edit fateflow/examples/upload/upload_guest.json

cat > fateflow/examples/upload/upload_guest.json <<EOF
{
  "file": "examples/data/breast_hetero_guest.csv",
  "id_delimiter": ",",
  "head": 1,
  "partition": 4,
  "namespace": "experiment",
  "table_name": "breast_hetero_guest"
}
EOF

Upload the guest data

flow data upload -c fateflow/examples/upload/upload_guest.json

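On success, the command prints the job information, including a FATEBoard URL.

Replace the host name fateboard in that http URL with your own host's address, for example 192.168.73.162, and you can follow the job's progress visually in the browser.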
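
The host-name substitution can be done with a one-line sed filter. This is just an illustration: `board_url_for` is a hypothetical helper, and the URL in the usage comment is a shortened example of the shape such a link takes, not output copied from a real run.

```shell
# board_url_for HOST: rewrite "http://fateboard:PORT/..." read from stdin so
# that it points at HOST instead of the in-cluster name "fateboard".
board_url_for() {
  sed "s#http://fateboard:#http://$1:#"
}

# Usage:
#   echo 'http://fateboard:8080/index.html#/dashboard?job_id=<job_id>' \
#     | board_url_for 192.168.73.162
```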

Edit fateflow/examples/lr/test_hetero_lr_job_conf.json

cat > fateflow/examples/lr/test_hetero_lr_job_conf.json <<EOF
{
  "dsl_version": "2",
  "initiator": {
    "role": "guest",
    "party_id": 9999
  },
  "role": {
    "guest": [
      9999
    ],
    "host": [
      10000
    ],
    "arbiter": [
      10000
    ]
  },
  "job_parameters": {
    "common": {
      "task_parallelism": 2,
      "computing_partitions": 8,
      "task_cores": 4,
      "auto_retries": 1
    }
  },
  "component_parameters": {
    "common": {
      "intersection_0": {
        "intersect_method": "raw",
        "sync_intersect_ids": true,
        "only_output_key": false
      },
      "hetero_lr_0": {
        "penalty": "L2",
        "optimizer": "rmsprop",
        "alpha": 0.01,
        "max_iter": 3,
        "batch_size": 320,
        "learning_rate": 0.15,
        "init_param": {
          "init_method": "random_uniform"
        }
      }
    },
    "role": {
      "guest": {
        "0": {
          "reader_0": {
            "table": {
              "name": "breast_hetero_guest",
              "namespace": "experiment"
            }
          },
          "dataio_0": {
            "with_label": true,
            "label_name": "y",
            "label_type": "int",
            "output_format": "dense"
          }
        }
      },
      "host": {
        "0": {
          "reader_0": {
            "table": {
              "name": "breast_hetero_host",
              "namespace": "experiment"
            }
          },
          "dataio_0": {
            "with_label": false,
            "output_format": "dense"
          },
          "evaluation_0": {
            "need_run": false
          }
        }
      }
    }
  }
}
EOF

Edit fateflow/examples/lr/test_hetero_lr_job_dsl.json

cat > fateflow/examples/lr/test_hetero_lr_job_dsl.json <<EOF
{
  "components": {
    "reader_0": {
      "module": "Reader",
      "output": {
        "data": [
          "table"
        ]
      }
    },
    "dataio_0": {
      "module": "DataIO",
      "input": {
        "data": {
          "data": [
            "reader_0.table"
          ]
        }
      },
      "output": {
        "data": [
          "train"
        ],
        "model": [
          "dataio"
        ]
      },
      "need_deploy": true
    },
    "intersection_0": {
      "module": "Intersection",
      "input": {
        "data": {
          "data": [
            "dataio_0.train"
          ]
        }
      },
      "output": {
        "data": [
          "train"
        ]
      }
    },
    "hetero_feature_binning_0": {
      "module": "HeteroFeatureBinning",
      "input": {
        "data": {
          "data": [
            "intersection_0.train"
          ]
        }
      },
      "output": {
        "data": [
          "train"
        ],
        "model": [
          "hetero_feature_binning"
        ]
      }
    },
    "hetero_feature_selection_0": {
      "module": "HeteroFeatureSelection",
      "input": {
        "data": {
          "data": [
            "hetero_feature_binning_0.train"
          ]
        },
        "isometric_model": [
          "hetero_feature_binning_0.hetero_feature_binning"
        ]
      },
      "output": {
        "data": [
          "train"
        ],
        "model": [
          "selected"
        ]
      }
    },
    "hetero_lr_0": {
      "module": "HeteroLR",
      "input": {
        "data": {
          "train_data": [
            "hetero_feature_selection_0.train"
          ]
        }
      },
      "output": {
        "data": [
          "train"
        ],
        "model": [
          "hetero_lr"
        ]
      }
    },
    "evaluation_0": {
      "module": "Evaluation",
      "input": {
        "data": {
          "data": [
            "hetero_lr_0.train"
          ]
        }
      },
      "output": {
        "data": [
          "evaluate"
        ]
      }
    }
  }
}
EOF

Submit the job

flow job submit -d fateflow/examples/lr/test_hetero_lr_job_dsl.json -c fateflow/examples/lr/test_hetero_lr_job_conf.json

output:

Check the status of the training job

flow task query -r guest -j 202207140922339079250 | grep -w f_status

output:
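
Training is finished once every task reports an f_status of success. The sketch below wraps that check so it can be polled from a loop. `all_tasks_succeeded` is a name invented for this example, and it simply scans the text for `"f_status": "..."` fields rather than parsing the JSON properly.

```shell
# all_tasks_succeeded: read `flow task query` output on stdin; succeed only if
# at least one "f_status" field is present and every one equals "success".
all_tasks_succeeded() {
  statuses="$(grep -oE '"f_status": *"[^"]*"' | sed -E 's/.*"([^"]*)"$/\1/')"
  [ -n "$statuses" ] && ! printf '%s\n' "$statuses" | grep -vx 'success' > /dev/null
}

# Usage inside the guest client container (replace <job_id> with your job ID):
#   until flow task query -r guest -j <job_id> | all_tasks_succeeded; do
#     sleep 10
#   done
```

The loop as written never gives up, so a failed job would keep it spinning; in practice you would also stop when a status of failed appears.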

Deploy the model

flow model deploy --model-id arbiter-10000#guest-9999#host-10000#model --model-version 202207140922339079250
{
    "data": {
        "arbiter": {
            "10000": 0
        },
        "detail": {
            "arbiter": {
                "10000": {
                    "retcode": 0,
                    "retmsg": "deploy model of role arbiter 10000 success"
                }
            },
            "guest": {
                "9999": {
                    "retcode": 0,
                    "retmsg": "deploy model of role guest 9999 success"
                }
            },
            "host": {
                "10000": {
                    "retcode": 0,
                    "retmsg": "deploy model of role host 10000 success"
                }
            }
        },
        "guest": {
            "9999": 0
        },
        "host": {
            "10000": 0
        },
        "model_id": "arbiter-10000#guest-9999#host-10000#model",
        "model_version": "202207150151245432850"
    },
    "retcode": 0,
    "retmsg": "success"
}

Note: every model_version needed in the following steps is the one returned by this step, "model_version": "202207150151245432850".
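
Copying that value by hand into the load and bind configurations is error-prone, so here is a small sketch that extracts it and patches the files. Both helper names are hypothetical, the extraction assumes the version is a plain digit string as in the output above, and `sed -i.bak` leaves a `.bak` copy of each edited file.

```shell
# model_version_of: print the first "model_version" value found in the
# `flow model deploy` output read from stdin.
model_version_of() {
  grep -oE '"model_version": *"[0-9]+"' | head -n 1 | grep -oE '[0-9]+'
}

# set_model_version VERSION FILE...: overwrite the "model_version" value in
# each JSON config file in place.
set_model_version() {
  version="$1"; shift
  for file in "$@"; do
    sed -i.bak -E "s/(\"model_version\": *)\"[^\"]*\"/\1\"$version\"/" "$file"
  done
}

# Usage, once the two config files used in the following steps exist:
#   v="$(flow model deploy --model-id <model_id> --model-version <job_id> | model_version_of)"
#   set_model_version "$v" fateflow/examples/model/publish_load_model.json \
#                          fateflow/examples/model/bind_model_service.json
```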

Edit the model-loading configuration

cat > fateflow/examples/model/publish_load_model.json <<EOF
{
  "initiator": {
    "party_id": "9999",
    "role": "guest"
  },
  "role": {
    "guest": [
      "9999"
    ],
    "host": [
      "10000"
    ],
    "arbiter": [
      "10000"
    ]
  },
  "job_parameters": {
    "model_id": "arbiter-10000#guest-9999#host-10000#model",
    "model_version": "202207150151245432850"
  }
}
EOF

Load the model

flow model load -c fateflow/examples/model/publish_load_model.json

output:

Edit the model-binding configuration

cat > fateflow/examples/model/bind_model_service.json <<EOF
{
    "service_id": "test",
    "initiator": {
        "party_id": "9999",
        "role": "guest"
    },
    "role": {
        "guest": ["9999"],
        "host": ["10000"],
        "arbiter": ["10000"]
    },
    "job_parameters": {
        "work_mode": 1,
        "model_id": "arbiter-10000#guest-9999#host-10000#model",
        "model_version": "202207150151245432850"
    }
}
EOF

Bind the model

flow model bind -c fateflow/examples/model/bind_model_service.json

output:

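Online test

The official tutorial says:

Send the following request to the guest party's inference service at "{SERVING_SERVICE_IP}:8059/federation/v1/inference". The address 192.168.7.2 below comes from the official example; in this setup the guest party runs on 192.168.73.162.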

$ curl -X POST -H 'Content-Type: application/json' -i 'http://192.168.7.2:8059/federation/v1/inference' --data '{
  "head": {
    "serviceId": "test"
  },
  "body": {
    "featureData": {
        "x0": 1.88669,
        "x1": -1.359293,
        "x2": 2.303601,
        "x3": 2.00137,
        "x4": 1.307686
    },
    "sendToRemoteFeatureData": {
        "phone_num": "122222222"
    }
  }
}'

output:

{"retcode":0,"retmsg":"","data":{"score":0.018025086161221948,"modelId":"guest#9999#arbiter-10000#guest-9999#host-10000#model","modelVersion":"202111240318516571130","timestamp":1637743473990},"flag":0}

This request returned an error when I ran it myself, so I went through the fate-serving web interface instead:

http://192.168.73.162:8350/#/home/service

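and obtained the inference result there.

Removing the deployment

Run the following command on the deployment machine to stop all FATE clusters: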

bash ./docker_deploy.sh --delete all

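To completely remove FATE from the running machines, log in to each node and run:

cd /data/projects/fate/confs-<id>/  # <id> is the party ID, 10000 or 9999 in this example
docker-compose down
rm -rf ../confs-<id>/               # delete the docker-compose deployment files

That covers the whole deployment and the result tests. I hope it helps; feel free to discuss any problems you run into.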
