1. Installing Kubeflow
kustomize version: 3.3.0
Kubeflow version: 1.6
1.1 Download the official manifests repository
Install version 1.6.0:
mkdir /data1/kubeflow_file
cd /data1/kubeflow_file
wget https://github.com/kubeflow/manifests/archive/refs/tags/v1.6.0.zip
unzip v1.6.0.zip
1.2 Download and install kustomize
kustomize 3.3.0 is used here (the original guide used 3.2.0, which is no longer available for download).
cd /data1/kubeflow_file/
#curl -s "https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh" | bash
curl -o install_kustomize.sh "https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh"
sh install_kustomize.sh 3.3.0 .
kustomize version
Version: {Version:kustomize/v3.3.0 GitCommit:7050c6a7b692fdba6e831e63c7b83920ab03ad76 BuildDate:2019-10-24T17:54:30Z GoOs:linux GoArch:amd64}
Copy it into /bin so it can be run from anywhere:
cp kustomize /bin/
1.3 Identify images hosted on foreign registries and pull them in advance
List the images a manifest needs, so they can be downloaded ahead of time:
cd /data1/kubeflow_file/manifests-1.6.0
kustomize build example |grep 'image: gcr.io'|awk '$2 != "" { print $2}' |sort -u
Pull the images from foreign registries in advance. As in the example below, simply prefix the image name with m.daocloud.io:
docker pull m.daocloud.io/gcr.io/ml-pipeline/frontend:2.0.0-alpha.3
This convenience comes from the public-image-mirror project; many thanks to its maintainers:
DaoCloud/public-image-mirror
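For a longer image list, the pull-and-retag step can be scripted. This is a minimal sketch: the image names below are only examples, so substitute the list produced by the kustomize build | grep command above. The docker commands are echoed as a dry run; drop the echo to actually execute them.

```shell
#!/bin/sh
# Pull each gcr.io image through the m.daocloud.io mirror, then tag it
# back to its original name so the unmodified manifests can find it.
MIRROR=m.daocloud.io
for img in \
    gcr.io/ml-pipeline/frontend:2.0.0-alpha.3 \
    gcr.io/ml-pipeline/api-server:2.0.0-alpha.3
do
    echo docker pull "$MIRROR/$img"
    echo docker tag "$MIRROR/$img" "$img"
done
```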
1.4 Prepare the StorageClass, PVs, and PVCs
Prepare storage for the Kubeflow components.
First, create the local directories (these must match the hostPath entries in the YAML below):
mkdir -p /data1/k8s/istio-authservice /data1/k8s/katib-mysql /data1/k8s/minio /data1/k8s/mysql-pv-claim
Relax permissions on the authservice directory:
chmod -R 777 /data1/k8s/istio-authservice/
Write kubeflow-storage.yaml; each hostPath must correspond one-to-one with the local directories above.
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: authservice
  namespace: istio-system
  labels:
    type: local
spec:
  storageClassName: local-storage
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/data1/k8s/istio-authservice"
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: katib-mysql
  namespace: kubeflow
  labels:
    type: local
spec:
  storageClassName: local-storage
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/data1/k8s/katib-mysql"
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: minio
  namespace: kubeflow
  labels:
    type: local
spec:
  storageClassName: local-storage
  capacity:
    storage: 20Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/data1/k8s/minio"
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mysql-pv-claim
  namespace: kubeflow
  labels:
    type: local
spec:
  storageClassName: local-storage
  capacity:
    storage: 20Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/data1/k8s/mysql-pv-claim"
Apply it:
kubectl apply -f kubeflow-storage.yaml
Modify the installation manifests to pull mirrored images
Add an images field to kustomization.yaml so that, at build time, the images we cannot pull from foreign registries are replaced with the mirrored copies downloaded earlier.
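A minimal sketch of what such an images override can look like in example/kustomization.yaml; the entries shown are illustrative, so build the real list from the grep output above:

```yaml
images:
  # name:    the image the manifests reference
  # newName: the mirrored copy to use instead
  - name: gcr.io/ml-pipeline/frontend
    newName: m.daocloud.io/gcr.io/ml-pipeline/frontend
    newTag: 2.0.0-alpha.3
  - name: gcr.io/ml-pipeline/api-server
    newName: m.daocloud.io/gcr.io/ml-pipeline/api-server
    newTag: 2.0.0-alpha.3
```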
Edit the following YAML files, adding storageClassName: local-storage to each PVC:
apps/katib/upstream/components/mysql/pvc.yaml
apps/pipeline/upstream/third-party/minio/base/minio-pvc.yaml
apps/pipeline/upstream/third-party/mysql/base/mysql-pv-claim.yaml
common/oidc-authservice/base/pvc.yaml
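For example, apps/katib/upstream/components/mysql/pvc.yaml would end up looking roughly like this after the edit (the surrounding field values are illustrative and may differ slightly in your checkout); the other three files get the same one-line addition:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: katib-mysql
  namespace: kubeflow
spec:
  storageClassName: local-storage   # added line: bind to the PVs created above
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```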
One-command install:
cd /data1/kubeflow_file/manifests-1.6.0
# while ! kubectl kustomize example | kubectl apply -f -; do echo "Retrying to apply resources"; sleep 10; done
while ! kustomize build example | kubectl apply -f -; do echo "Retrying to apply resources"; sleep 10; done
The install reported the error below. After trying a few connectivity checks, restarting the cert-manager webhook pod fixed it (see also: The Definitive Debugging Guide for the cert-manager Webhook Pod).
Error from server (InternalError): error when creating "STDIN":
Internal error occurred: failed calling webhook "webhook.cert-manager.io": failed to call webhook:
Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s":
dial tcp 10.96.20.99:443: connect: connection refused
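What worked here was simply restarting the webhook. Assuming a stock cert-manager install (the deployment name below is the default and may differ in your cluster), that can be done with:

```shell
# Restart the cert-manager webhook and wait for it to become ready again.
kubectl -n cert-manager rollout restart deployment cert-manager-webhook
kubectl -n cert-manager rollout status deployment cert-manager-webhook
# Then re-run the install loop; the webhook error should clear.
```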
Wait about half an hour, then check with kubectl get pods --all-namespaces. Some pods may still fail with image-pull errors; fix those with the mirroring method above. For pods stuck in other statuses, run kubectl describe pod pod_name -n namespace to see the details.
Accessing the Kubeflow Dashboard
[root@10 kubeflow_file] kubectl port-forward --address 0.0.0.0 svc/istio-ingressgateway -n istio-system 8080:80
# --address 0.0.0.0 allows access from external hosts; without it, only local access works
# port-forward forwards local port 8080 to port 80 of svc/istio-ingressgateway
Only HTTP access works; HTTPS currently has issues.
Default username and password:
user@example.com
12341234
References
Building a Machine Learning Platform with Kubeflow from Scratch (从零搭建机器学习平台Kubeflow)
Fun with Kubeflow (new series): installing version 1.3 from domestic mirrors, plus an introduction to Kubeflow components (玩转kubeflow(开新坑):1.3版本国内镜像安装及kubeflow组件介绍)
Kubeflow First Look, Part 2: Installing Kubeflow (kubeflow初探(二):kubeflow安装)