Installing Kubeflow 0.2.2: Jupyter and the core TensorFlow components
Install ksonnet
curl -o ks_0.9.2_linux_amd64.tar.gz http://kubeflow.oss-cn-beijing.aliyuncs.com/ks_0.9.2_linux_amd64.tar.gz
tar -xvf ks_0.9.2_linux_amd64.tar.gz
cp ks_0.9.2_linux_amd64/ks /usr/local/bin/
ks version
Prepare a GitHub token
Log in to https://github.com/settings/tokens and create a token. The token does not need any permissions.
export GITHUB_TOKEN=your_github_token
echo "export GITHUB_TOKEN=${GITHUB_TOKEN}" >> ~/.bashrc
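Running the echo line more than once appends a duplicate export to ~/.bashrc each time. A minimal sketch of a repeatable variant, assuming a hypothetical helper named persist_token (it is not part of ksonnet or any GitHub tooling):

```shell
# persist_token is a hypothetical helper, not part of ksonnet or GitHub tooling.
# Usage: persist_token TOKEN [RC_FILE]   (RC_FILE defaults to ~/.bashrc)
persist_token() {
    token=$1
    rc_file=${2:-"$HOME/.bashrc"}
    if [ -z "$token" ]; then
        echo "persist_token: token is empty" >&2
        return 1
    fi
    line="export GITHUB_TOKEN=$token"
    # -x matches the whole line, -F treats the pattern as a fixed string
    if ! grep -qxF -- "$line" "$rc_file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$rc_file"
    fi
    # Make the token available in the current shell as well
    export GITHUB_TOKEN="$token"
}
```

Calling persist_token your_github_token twice leaves a single export line in the rc file.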
Install Kubeflow. Here we first install the core components that support TensorFlow.
NAMESPACE=kubeflow
kubectl create namespace ${NAMESPACE}
VERSION=jupyterhub-alibaba-cloud
APP_NAME=my-kubeflow
ks init ${APP_NAME} --api-spec=version:v1.9.3
cd ${APP_NAME}
ks env set default --namespace ${NAMESPACE}
ks registry add kubeflow github.com/cheyang/kubeflow/tree/${VERSION}/kubeflow
ks registry list
ks pkg install kubeflow/core@${VERSION}
ks pkg install kubeflow/tf-serving@${VERSION}
ks pkg install kubeflow/tf-job@${VERSION}
ks generate kubeflow-core kubeflow-core
ks param set kubeflow-core cloud ack
ks param set kubeflow-core jupyterHubImage registry.aliyuncs.com/kubeflow-images-public/jupyterhub-k8s:1.0.1
ks param set kubeflow-core tfJobImage registry.cn-hangzhou.aliyuncs.com/kubeflow-images-public/tf_operator:v20180326-6214e560
ks param set kubeflow-core tfAmbassadorImage registry.aliyuncs.com/datawire/ambassador:0.34.0
ks param set kubeflow-core tfStatsdImage registry.aliyuncs.com/datawire/statsd:0.34.0
ks param set kubeflow-core jupyterNotebookRegistry registry.aliyuncs.com
ks param set kubeflow-core JupyterNotebookRepoName kubeflow-images-public
ks param set kubeflow-core jupyterHubServiceType LoadBalancer
ks param set kubeflow-core tfAmbassadorServiceType LoadBalancer
ks param set kubeflow-core tfJobUiServiceType LoadBalancer
ks apply default -c kubeflow-core
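The package installs above repeat one command per package. As a sketch, they could be wrapped in a loop; install_kubeflow_pkgs is a hypothetical function name, and it only issues the same `ks pkg install` calls used above:

```shell
# install_kubeflow_pkgs is a hypothetical wrapper around the `ks pkg install` calls above.
# Usage: install_kubeflow_pkgs VERSION PKG...
install_kubeflow_pkgs() {
    version=$1
    shift
    for pkg in "$@"; do
        # Stop at the first failure so later steps do not run against a partial install
        ks pkg install "kubeflow/${pkg}@${version}" || return 1
    done
}

# Example (run inside the ksonnet app directory):
# install_kubeflow_pkgs jupyterhub-alibaba-cloud core tf-serving tf-job
```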
After the commands finish, check the pods in the cluster
[root@master pipelines]# kubectl -n kubeflow get po
NAME                                READY   STATUS    RESTARTS   AGE
ambassador-cd476cb56-jk79h          2/2     Running   0          27h
ambassador-cd476cb56-md5x5          2/2     Running   0          27h
ambassador-cd476cb56-qr5l4          2/2     Running   0          27h
centraldashboard-7d45f8cbc8-vdksx   1/1     Running   0          27h
tf-hub-0                            1/1     Running   0          27h
tf-job-dashboard-9fd7d588-z9nc8     1/1     Running   0          27h
tf-job-operator-8d98cd89b-vbv97     1/1     Running   0          27h
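Scanning a listing like this by eye gets tedious as the pod count grows. A small sketch that filters the output, assuming a hypothetical helper named not_ready_pods (not a kubectl feature) that works on the plain text table:

```shell
# not_ready_pods is a hypothetical helper: it reads `kubectl get po` text output
# on stdin and prints the name of every pod whose READY column is not n/n
# or whose STATUS is not Running. The header line is skipped.
not_ready_pods() {
    awk 'NR > 1 {
        split($2, r, "/")
        if (r[1] != r[2] || $3 != "Running") print $1
    }'
}

# Example: kubectl -n kubeflow get po | not_ready_pods
```

An empty result means every listed pod is Running with all of its containers ready. Pods that finished normally (STATUS Completed) are also reported by this filter.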
Expose Jupyter for external access; here we use a LoadBalancer
kubectl -n kubeflow edit svc tf-hub-lb
Change type to LoadBalancer
spec:
  clusterIP: 10.104.28.245
  externalTrafficPolicy: Cluster
  ports:
  - name: hub
    nodePort: 32357
    port: 80
    protocol: TCP
    targetPort: 8000
  selector:
    app: tf-hub
  sessionAffinity: None
  type: LoadBalancer
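The same change can be made without opening an editor. A sketch using `kubectl patch`, which is a standard kubectl command; the wrapper name set_service_type is hypothetical:

```shell
# set_service_type is a hypothetical wrapper; it only builds a `kubectl patch` call
# that sets spec.type on an existing Service.
# Usage: set_service_type NAMESPACE SERVICE TYPE
set_service_type() {
    kubectl -n "$1" patch svc "$2" -p "{\"spec\":{\"type\":\"$3\"}}"
}

# Example: set_service_type kubeflow tf-hub-lb LoadBalancer
```

This form is easier to script and to repeat for other services than `kubectl edit`.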
Get the external address of Jupyter
[root@master pipelines]# kubectl get service -n kubeflow |grep tf-hub-lb
tf-hub-lb LoadBalancer 10.104.28.245 10.18.5.30 80:32357/TCP 28h
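To reuse the address in a script instead of copying it by hand, the EXTERNAL-IP column can be picked out of that output. A minimal sketch, assuming a hypothetical helper named external_ip that parses the plain text table:

```shell
# external_ip is a hypothetical helper: it reads `kubectl get service` text output
# on stdin and prints the EXTERNAL-IP column (4th field) of the named service.
external_ip() {
    awk -v svc="$1" '$1 == svc { print $4 }'
}

# Example: kubectl get service -n kubeflow | external_ip tf-hub-lb
```

While the load balancer has not been assigned an address yet, kubectl shows `<pending>` in that column, so the helper prints that value too.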
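In this test the address is 10.18.5.30. Open http://10.18.5.30 in a browser to reach Jupyter.
Log in with any username and password (this article uses the username mocktest). After logging in, click [Start My Server], fill in the server configuration, and select an image.
After confirming, wait for the server to be created; the page then redirects to Jupyter.
At this point, list the pods in the Kubernetes cluster to find the corresponding server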
[root@master pipelines]# kubectl get po -n kubeflow
NAME           READY   STATUS    RESTARTS   AGE
jupyter-mock   1/1     Running   0          97s
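Run a piece of Python code; it runs successfully
Training TensorFlow with Kubeflow
Expose the ambassador service the same way as tf-hub-lb. In this article the address is 10.18.5.32. Open http://10.18.5.32 in a browser to reach the tf-dashboard, choose Create, and fill in the required information; here I only filled in the master image.
Check the pod information in Kubernetes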
[root@master pipelines]# kubectl get pod
NAME                             READY   STATUS    RESTARTS   AGE
2020-02-12-master-0q6y-0-cy42r   0/1     Running   0          5m16s
The pod for the master has been created and the page shows the job as running, which shows that the installed Kubeflow supports TensorFlow.
Reference: https://blog.csdn.net/weixin_33849942/article/details/89699917