Kubeflow Part 2 -- Installation and Deployment (Kubernetes v1.18)

0 Preface

A note to readers:
This series is first and foremost my own set of notes, so it may run a little long, but it also records my learning and research process as it happens and will keep being updated. If it is of any help to you, nothing would make me happier~

1 Install Kubeflow with Outside Network (FAILED)

1.1 Checking Node Hardware

Memory

Check total memory:

user@node01:~$ free -m # -m shows values in MiB; -k / -g variants also exist
	              total        used        free      shared  buff/cache   available
	Mem:         128601        4026      121630         106        2945      123273
	Swap:             0           0           0

root@master:/home/hqc# free -m
	              total        used        free      shared  buff/cache   available
	Mem:          64089        3733       53620         202        6735       59561
	Swap:             0           0           0

INFO: node01 has about 125 GB of RAM in total; master has about 62 GB.

Check memory usage:

# Install htop
user@node01:~$ sudo apt install htop
# View
user@node01:~$ htop

(screenshot: htop output)

CPU

CPU model:

user@node01:~$ grep "model name" /proc/cpuinfo |awk -F ':' '{print $NF}'
	 Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz
	 Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz
	 Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz

INFO: the CPU is a Core i9 (i9-10980XE @ 3.00 GHz).

Detailed CPU info:

user@node01:~$ cat /proc/cpuinfo
	processor	: 22
	vendor_id	: GenuineIntel
	cpu family	: 6
	model		: 85
	model name	: Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz
	stepping	: 7
	microcode	: 0x5003102
	cpu MHz		: 1200.002
	cache size	: 25344 KB
	physical id	: 0
	siblings	: 36
	core id		: 4 
	cpu cores	: 18 # number of physical cores: 18
	apicid		: 9
	initial apicid	: 9
	fpu		: yes
	fpu_exception	: yes
	cpuid level	: 22
	wp		: yes
	flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti retpoline mba rsb_ctxsw tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req avx512_vnni
	bugs		: cpu_meltdown spectre_v1 spectre_v2
	bogomips	: 6000.00
	clflush size	: 64
	cache_alignment	: 64
	address sizes	: 46 bits physical, 48 bits virtual

Disk

user@node01:~$ sudo fdisk -l |grep "Disk /dev/sd"
	Disk /dev/sda: 465.8 GiB, 500107862016 bytes, 976773168 sectors


INFO: the disk is 465.8 GiB.

user@node01:~$ df -lh
	Filesystem      Size  Used Avail Use% Mounted on
	udev             63G     0   63G    0% /dev
	tmpfs            13G  2.6M   13G    1% /run
	/dev/nvme0n1p7  469G   43G  402G   10% /
	tmpfs            63G   46M   63G    1% /dev/shm
	tmpfs           5.0M  4.0K  5.0M    1% /run/lock
	tmpfs            63G     0   63G    0% /sys/fs/cgroup
	/dev/loop0      640K  640K     0  100% /snap/gnome-logs/103
	/dev/loop1       56M   56M     0  100% /snap/core18/2284
	/dev/loop2       62M   62M     0  100% /snap/core20/1405
	/dev/loop3      249M  249M     0  100% /snap/gnome-3-38-2004/99
	/dev/loop4      128K  128K     0  100% /snap/bare/5
	/dev/loop6       66M   66M     0  100% /snap/gtk-common-themes/1515
	/dev/loop5       45M   45M     0  100% /snap/snapd/15314
	/dev/loop7      2.5M  2.5M     0  100% /snap/gnome-calculator/884
	/dev/loop8      2.3M  2.3M     0  100% /snap/gnome-system-monitor/157
	/dev/loop9      640K  640K     0  100% /snap/gnome-logs/106
	/dev/loop10      44M   44M     0  100% /snap/snapd/15177
	/dev/loop11     219M  219M     0  100% /snap/gnome-3-34-1804/77
	/dev/loop12     219M  219M     0  100% /snap/gnome-3-34-1804/72
	/dev/loop13     768K  768K     0  100% /snap/gnome-characters/741
	/dev/loop14     768K  768K     0  100% /snap/gnome-characters/761
	/dev/loop15      62M   62M     0  100% /snap/core20/1376
	/dev/loop16     2.7M  2.7M     0  100% /snap/gnome-calculator/920
	/dev/loop17     2.7M  2.7M     0  100% /snap/gnome-system-monitor/174
	/dev/loop18      66M   66M     0  100% /snap/gtk-common-themes/1519
	/dev/loop19      56M   56M     0  100% /snap/core18/2344
	/dev/loop20     248M  248M     0  100% /snap/gnome-3-38-2004/87
	/dev/nvme0n1p5  735M  117M  565M   18% /boot
	/dev/nvme0n1p8  438G   33G  383G    8% /home
	/dev/sda1       256M   32M  225M   13% /boot/efi
	tmpfs            13G   16K   13G    1% /run/user/121
	tmpfs            13G   80K   13G    1% /run/user/1000


root@master:/home/hqc# df -lh
Filesystem       Size  Used Avail Use% Mounted on
udev              32G     0   32G    0% /dev
tmpfs            6.3G  2.9M  6.3G    1% /run
/dev/nvme0n1p6    29G  3.4G   24G   13% /
/dev/nvme0n1p10   94G   40G   49G   45% /usr
tmpfs             32G   50M   32G    1% /dev/shm
tmpfs            5.0M  4.0K  5.0M    1% /run/lock
tmpfs             32G     0   32G    0% /sys/fs/cgroup
/dev/nvme0n1p9   9.4G  123M  8.8G    2% /tmp
/dev/nvme0n1p7   946M  176M  706M   20% /boot
/dev/nvme0n1p11  9.4G  7.7G  1.2G   87% /var
/dev/nvme0n1p8    47G   20G   25G   45% /home

INFO: master has much less storage than node01; only /usr on master barely meets the requirement.

Check whether Ubuntu is 32- or 64-bit

user@node01:~$ getconf LONG_BIT
	64

Summary

(screenshot: Kubeflow minimum system requirements)
Kubeflow requirements:

  1. The cluster has at least one worker node. ☑
  2. Each node needs at least 4 CPU cores, 50 GB of storage, and 12 GB of memory; node01 has 18 cores, 465 GB+ of storage, and about 125 GB of memory, far above the minimum (see the sketch below). ☑
  3. Kubernetes must be newer than 1.11; this cluster runs 1.18. ☑
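
As a quick sanity check, the three minimums can be verified on any node with standard tools; a minimal sketch (the storage check below only looks at the root filesystem):

# Check CPU cores (>= 4), memory (>= 12 GB), and free storage (>= 50 GB)
nproc                                             # logical CPUs visible to the OS
free -g | awk '/^Mem/ {print $2 " GB RAM total"}'
df -BG / | awk 'NR==2 {print $4 " free on /"}'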

1.2 Checking Version Compatibility

Kubernetes/Kubeflow version compatibility:

(screenshot: Kubernetes/Kubeflow version compatibility matrix)
My cluster runs 1.18, for which there is no fully tested Kubeflow version, and upgrading Kubernetes was not an option. I still decided on the then-latest release, 1.2, since others had reported installing it successfully (reference link).

1.3 Downloading the Required Files

Two files are needed: kfctl and kfctl_k8s_istio.v1.2.0.yaml.
Download kfctl:
(screenshot: kfctl release download page)
YAML file address
Not knowing how to download that file directly, I simply copy-pasted its contents.
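
For reference, both files can also be fetched from the command line; a sketch (the tarball name matches the one extracted below, and the YAML URL is the v1.2-branch one that kfctl's own help text lists):

# kfctl v1.2.0 release tarball from the kubeflow/kfctl releases page
wget https://github.com/kubeflow/kfctl/releases/download/v1.2.0/kfctl_v1.2.0-0-gbc038f9_linux.tar.gz
# the KfDef config straight from the kubeflow/manifests repo
wget https://raw.githubusercontent.com/kubeflow/manifests/v1.2-branch/kfdef/kfctl_k8s_istio.v1.2.0.yaml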

1.4 Extracting the kfctl Archive

user@node01:~/Kubeflow$ tar -xvf kfctl_v1.2.0-0-gbc038f9_linux.tar.gz
	./kfctl
user@node01:~/Kubeflow$ ls
	kfctl  kfctl_k8s_istio.v1.2.0.yaml  kfctl_v1.2.0-0-gbc038f9_linux.tar.gz

1.5 Moving kfctl to /usr/bin

This way there is no need to set up PATH for it.

user@node01:~/Kubeflow$ sudo mv kfctl /usr/bin
	[sudo] password for user: 
user@node01:~/Kubeflow$ ls
	kfctl_k8s_istio.v1.2.0.yaml  kfctl_v1.2.0-0-gbc038f9_linux.tar.gz

1.6 Verifying kfctl

user@node01:~/Kubeflow$ which kfctl
	/usr/bin/kfctl
user@node01:~/Kubeflow$ ll /usr/bin/ | grep kfctl
	-rwxr-xr-x  1 user user     83424955 Nov 21  2020 kfctl*

1.7 Setting Environment Variables

Append these three lines at the end:

export KF_NAME=<a name you choose for the Kubeflow application> 
export BASE_DIR=<a base directory you choose>
export KF_DIR=${BASE_DIR}/${KF_NAME}  # path where the Kubeflow application lives

The complete file:

user@node01:~/Kubeflow$ sudo vi /etc/profile
	# /etc/profile: system-wide .profile file for the Bourne shell (sh(1))
	# and Bourne compatible shells (bash(1), ksh(1), ash(1), ...).
	
	if [ "${PS1-}" ]; then
	  if [ "${BASH-}" ] && [ "$BASH" != "/bin/sh" ]; then
	    # The file bash.bashrc already sets the default PS1.
	    # PS1='\h:\w\$ '
	    if [ -f /etc/bash.bashrc ]; then
	      . /etc/bash.bashrc
	    fi
	  else
	    if [ "`id -u`" -eq 0 ]; then
	      PS1='# '
	    else
	      PS1='$ '
	    fi
	  fi
	fi
	
	if [ -d /etc/profile.d ]; then
	  for i in /etc/profile.d/*.sh; do
	    if [ -r $i ]; then
	      . $i
	    fi
	  done
	  unset i
	fi
	export KF_NAME=Kubeflow1.2.0
	export BASE_DIR=~/Kubeflow
	export KF_DIR=${BASE_DIR}/${KF_NAME}  # path where the Kubeflow application lives

Reload the file so the variables take effect:

user@node01:~/Kubeflow$ source /etc/profile

1.8 Creating the ${KF_DIR} Directory

user@node01:~/Kubeflow$ mkdir -p ${KF_DIR}

After running this, the working directory exists at the specified path:
(screenshot: Kubeflow1.2.0 directory created under ~/Kubeflow)

1.9 Installing and Deploying

# Move kfctl_k8s_istio.v1.2.0.yaml into the working directory
user@node01:~/Kubeflow$ mv kfctl_k8s_istio.v1.2.0.yaml Kubeflow1.2.0/
user@node01:~/Kubeflow$ ls
	kfctl_v1.2.0-0-gbc038f9_linux.tar.gz  Kubeflow1.2.0

# Enter the working directory
user@node01:~/Kubeflow$ cd Kubeflow1.2.0/
user@node01:~/Kubeflow/Kubeflow1.2.0$ ls
	kfctl_k8s_istio.v1.2.0.yaml
# Run the installation; it errors out
user@node01:~/Kubeflow/Kubeflow1.2.0$ kfctl apply -V -f kfctl_k8s_istio.v1.2.0.yaml
	INFO[0000] No name specified in KfDef.Metadata.Name; defaulting to Kubeflow1.2.0 based on location of config file: kfctl_k8s_istio.v1.2.0.yaml.  filename="coordinator/coordinator.go:202"
	INFO[0000] 
	****************************************************************
	Notice anonymous usage reporting enabled using spartakus
	To disable it
	If you have already deployed it run the following commands:
	  cd $(pwd)
	  kubectl -n ${K8S_NAMESPACE} delete deploy -l app=spartakus
	
	For more info: https://www.kubeflow.org/docs/other-guides/usage-reporting/
	****************************************************************
	  filename="coordinator/coordinator.go:120"
	INFO[0000] Creating directory .cache                     filename="kfconfig/types.go:450"
	INFO[0000] Fetching https://github.com/kubeflow/manifests/archive/v1.2.0.tar.gz to .cache/manifests  filename="kfconfig/types.go:498"
	INFO[0004] Updating localPath to .cache/manifests/manifests-1.2.0  filename="kfconfig/types.go:569"
	INFO[0004] Fetch succeeded; LocalPath .cache/manifests/manifests-1.2.0  filename="kfconfig/types.go:590"
	INFO[0004] Processing application: namespaces            filename="kustomize/kustomize.go:569"
	INFO[0004] Creating folder kustomize/namespaces          filename="kustomize/kustomize.go:667"
	INFO[0004] Processing application: application           filename="kustomize/kustomize.go:569"
	INFO[0004] Creating folder kustomize/application         filename="kustomize/kustomize.go:667"
	INFO[0004] Processing application: istio-stack           filename="kustomize/kustomize.go:569"
	INFO[0004] Creating folder kustomize/istio-stack         filename="kustomize/kustomize.go:667"
	INFO[0004] Processing application: cluster-local-gateway  filename="kustomize/kustomize.go:569"
	INFO[0004] Creating folder kustomize/cluster-local-gateway  filename="kustomize/kustomize.go:667"
	INFO[0004] Processing application: istio                 filename="kustomize/kustomize.go:569"
	INFO[0004] Creating folder kustomize/istio               filename="kustomize/kustomize.go:667"
	INFO[0004] Processing application: cert-manager-crds     filename="kustomize/kustomize.go:569"
	INFO[0004] Creating folder kustomize/cert-manager-crds   filename="kustomize/kustomize.go:667"
	INFO[0004] Processing application: cert-manager-kube-system-resources  filename="kustomize/kustomize.go:569"
	INFO[0004] Creating folder kustomize/cert-manager-kube-system-resources  filename="kustomize/kustomize.go:667"
	INFO[0004] Processing application: cert-manager          filename="kustomize/kustomize.go:569"
	INFO[0004] Creating folder kustomize/cert-manager        filename="kustomize/kustomize.go:667"
	INFO[0004] Processing application: add-anonymous-user-filter  filename="kustomize/kustomize.go:569"
	INFO[0004] Creating folder kustomize/add-anonymous-user-filter  filename="kustomize/kustomize.go:667"
	INFO[0004] Processing application: metacontroller        filename="kustomize/kustomize.go:569"
	INFO[0004] Creating folder kustomize/metacontroller      filename="kustomize/kustomize.go:667"
	INFO[0004] Processing application: bootstrap             filename="kustomize/kustomize.go:569"
	INFO[0004] Creating folder kustomize/bootstrap           filename="kustomize/kustomize.go:667"
	INFO[0004] Processing application: spark-operator        filename="kustomize/kustomize.go:569"
	INFO[0004] Creating folder kustomize/spark-operator      filename="kustomize/kustomize.go:667"
	INFO[0004] Processing application: kubeflow-apps         filename="kustomize/kustomize.go:569"
	INFO[0004] Creating folder kustomize/kubeflow-apps       filename="kustomize/kustomize.go:667"
	INFO[0004] Processing application: knative               filename="kustomize/kustomize.go:569"
	INFO[0004] Creating folder kustomize/knative             filename="kustomize/kustomize.go:667"
	INFO[0004] Processing application: kfserving             filename="kustomize/kustomize.go:569"
	INFO[0004] Creating folder kustomize/kfserving           filename="kustomize/kustomize.go:667"
	INFO[0004] Processing application: spartakus             filename="kustomize/kustomize.go:569"
	INFO[0004] Creating folder kustomize/spartakus           filename="kustomize/kustomize.go:667"
	INFO[0004] .cache/manifests exists; not resyncing        filename="kfconfig/types.go:473"
	INFO[0004] namespace: kubeflow                           filename="utils/k8utils.go:433"
	INFO[0004] Creating namespace: kubeflow                  filename="utils/k8utils.go:438"
	Error: failed to apply:  (kubeflow.error): Code 500 with message: kfApp Apply failed for kustomize:  (kubeflow.error): Code 400 with message: couldn't create namespace kubeflow Error: Post "http://localhost:8080/api/v1/namespaces": dial tcp [::1]:8080: connect: connection refused
	Usage:
	  kfctl apply -f ${CONFIG} [flags]
	
	Flags:
	      --context string   Optional kubernetes context to use when applying resources. Currently not used by KFDef resources.
	  -f, --file string      Static config file to use. Can be either a local path:
	                         		export CONFIG=./kfctl_gcp_iap.yaml
	                         	or a URL:
	                         		export CONFIG=https://raw.githubusercontent.com/kubeflow/manifests/v1.0-branch/kfdef/kfctl_gcp_iap.v1.0.0.yaml
	                         		export CONFIG=https://raw.githubusercontent.com/kubeflow/manifests/v1.2-branch/kfdef/kfctl_istio_dex.v1.2.0.yaml
	                         		export CONFIG=https://raw.githubusercontent.com/kubeflow/manifests/v1.2-branch/kfdef/kfctl_aws.v1.2.0.yaml
	                         		export CONFIG=https://raw.githubusercontent.com/kubeflow/manifests/v1.2-branch/kfdef/kfctl_k8s_istio.v1.2.0.yaml
	                         	kfctl apply -V --file=${CONFIG}
	  -h, --help             help for apply
	  -V, --verbose          verbose output default is false
	
	kfctl exited with error: failed to apply:  (kubeflow.error): Code 500 with message: kfApp Apply failed for kustomize:  (kubeflow.error): Code 400 with message: couldn't create namespace kubeflow Error: Post "http://localhost:8080/api/v1/namespaces": dial tcp [::1]:8080: connect: connection refused
user@node01:~/Kubeflow/Kubeflow1.2.0$ 

Error: the kubeflow namespace could not be created.
One visible change: a kustomize folder was generated.
(screenshot: generated kustomize directory)
My first guess: the cluster needs to be up and running; the basic requirement is at least one worker node in the cluster, and so far I had only been working on node01 as a standalone machine.

After bringing the cluster up and re-running, the same error occurred..
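
The "connection refused" on localhost:8080 is the classic symptom of the Kubernetes client having no kubeconfig for the API server: kfctl uses the same configuration kubectl does, so it is worth confirming basic connectivity before re-running. A quick sanity check:

# If these fail the same way, the problem is API-server access, not Kubeflow
kubectl cluster-info
kubectl get nodes
# On a worker node, kubectl typically needs a copy of the admin kubeconfig,
# e.g. master's /etc/kubernetes/admin.conf placed at ~/.kube/config (kubeadm default paths)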

1.9.1 Installing Kubeflow on master (important note)

I realized the deployment probably has to run on master: on node01 it failed exactly at namespace creation, and namespaces are managed through the master. Let's try~

root@master:/home/hqc/Kubeflow/Kubeflow1.2.0# kfctl apply -V -f kfctl_k8s_istio.v1.2.0.yaml
	...
	application.app.k8s.io/knative-serving-crds created
	application.app.k8s.io/knative-serving-install created
	gateway.networking.istio.io/cluster-local-gateway created
	horizontalpodautoscaler.autoscaling/activator created
	image.caching.internal.knative.dev/queue-proxy created
	servicerole.rbac.istio.io/istio-service-role created
	servicerolebinding.rbac.istio.io/istio-service-role-binding created
	INFO[0332] Successfully applied application knative      filename="kustomize/kustomize.go:291"
	INFO[0332] Deploying application kfserving               filename="kustomize/kustomize.go:266"
	secret/kfserving-webhook-server-secret created
	configmap/inferenceservice-config created
	customresourcedefinition.apiextensions.k8s.io/inferenceservices.serving.kubeflow.org created
	clusterrole.rbac.authorization.k8s.io/kubeflow-kfserving-edit created
	clusterrole.rbac.authorization.k8s.io/kfserving-manager-role created
	clusterrole.rbac.authorization.k8s.io/kfserving-proxy-role created
	clusterrole.rbac.authorization.k8s.io/kubeflow-kfserving-admin created
	clusterrole.rbac.authorization.k8s.io/kubeflow-kfserving-view created
	clusterrolebinding.rbac.authorization.k8s.io/kfserving-manager-rolebinding created
	clusterrolebinding.rbac.authorization.k8s.io/kfserving-proxy-rolebinding created
	role.rbac.authorization.k8s.io/leader-election-role created
	rolebinding.rbac.authorization.k8s.io/leader-election-rolebinding created
	service/kfserving-controller-manager-metrics-service created
	service/kfserving-controller-manager-service created
	service/kfserving-webhook-server-service created
	statefulset.apps/kfserving-controller-manager created
	mutatingwebhookconfiguration.admissionregistration.k8s.io/inferenceservice.serving.kubeflow.org created
	validatingwebhookconfiguration.admissionregistration.k8s.io/inferenceservice.serving.kubeflow.org created
	application.app.k8s.io/kfserving created
	certificate.cert-manager.io/serving-cert created
	issuer.cert-manager.io/selfsigned-issuer created
	INFO[0333] Successfully applied application kfserving    filename="kustomize/kustomize.go:291"
	INFO[0333] Deploying application spartakus               filename="kustomize/kustomize.go:266"
	configmap/spartakus-config created
	serviceaccount/spartakus created
	clusterrole.rbac.authorization.k8s.io/spartakus created
	clusterrolebinding.rbac.authorization.k8s.io/spartakus created
	deployment.apps/spartakus-volunteer created
	application.app.k8s.io/spartakus created
	INFO[0333] Successfully applied application spartakus    filename="kustomize/kustomize.go:291"
	INFO[0333] Applied the configuration Successfully!       filename="cmd/apply.go:75"
root@master:/home/hqc/Kubeflow/Kubeflow1.2.0# 

The run succeeded, but plenty of errors appeared along the way, similar to the figure below:
(screenshot: transient errors during kfctl apply)

1.10 Checking Component Status

root@master:/home/hqc/Kubeflow/Kubeflow1.2.0# kubectl -n kubeflow get all
	NAME                                        READY   STATUS         RESTARTS   AGE
	pod/application-controller-stateful-set-0   0/1     ErrImagePull   0          6m38s
	
	NAME                                                   TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
	service/admission-webhook-service                      ClusterIP   10.103.120.39    <none>        443/TCP             71s
	service/application-controller-service                 ClusterIP   10.111.227.112   <none>        443/TCP             6m38s
	service/argo-ui                                        NodePort    10.101.170.70    <none>        80:30807/TCP        71s
	service/cache-server                                   ClusterIP   10.105.186.146   <none>        443/TCP             71s
	service/centraldashboard                               ClusterIP   10.104.172.133   <none>        80/TCP              71s
	service/jupyter-web-app-service                        ClusterIP   10.101.207.115   <none>        80/TCP              71s
	service/katib-controller                               ClusterIP   10.99.87.123     <none>        443/TCP,8080/TCP    71s
	service/katib-db-manager                               ClusterIP   10.100.152.113   <none>        6789/TCP            71s
	service/katib-mysql                                    ClusterIP   10.104.225.100   <none>        3306/TCP            71s
	service/katib-ui                                       ClusterIP   10.100.126.162   <none>        80/TCP              71s
	service/kfserving-controller-manager-metrics-service   ClusterIP   10.108.136.216   <none>        8443/TCP            70s
	service/kfserving-controller-manager-service           ClusterIP   10.96.9.106      <none>        443/TCP             70s
	service/kfserving-webhook-server-service               ClusterIP   10.106.99.36     <none>        443/TCP             70s
	service/kubeflow-pipelines-profile-controller          ClusterIP   10.96.157.140    <none>        80/TCP              71s
	service/metadata-db                                    ClusterIP   10.107.111.240   <none>        3306/TCP            71s
	service/metadata-envoy-service                         ClusterIP   10.97.104.91     <none>        9090/TCP            71s
	service/metadata-grpc-service                          ClusterIP   10.105.204.174   <none>        8080/TCP            71s
	service/minio-service                                  ClusterIP   10.106.7.185     <none>        9000/TCP            71s
	service/ml-pipeline                                    ClusterIP   10.96.27.108     <none>        8888/TCP,8887/TCP   71s
	service/ml-pipeline-ui                                 ClusterIP   10.102.174.60    <none>        80/TCP              71s
	service/ml-pipeline-visualizationserver                ClusterIP   10.97.229.74     <none>        8888/TCP            71s
	service/mysql                                          ClusterIP   10.108.49.231    <none>        3306/TCP            71s
	service/notebook-controller-service                    ClusterIP   10.110.173.10    <none>        443/TCP             71s
	service/profiles-kfam                                  ClusterIP   10.97.143.27     <none>        8081/TCP            71s
	service/pytorch-operator                               ClusterIP   10.108.162.192   <none>        8443/TCP            71s
	service/seldon-webhook-service                         ClusterIP   10.97.17.96      <none>        443/TCP             71s
	service/tf-job-operator                                ClusterIP   10.98.163.53     <none>        8443/TCP            71s
	
	NAME                                                    READY   UP-TO-DATE   AVAILABLE   AGE
	deployment.apps/admission-webhook-deployment            0/1     0            0           71s
	deployment.apps/argo-ui                                 0/1     0            0           71s
	deployment.apps/cache-deployer-deployment               0/1     0            0           71s
	deployment.apps/cache-server                            0/1     0            0           71s
	deployment.apps/centraldashboard                        0/1     0            0           71s
	deployment.apps/jupyter-web-app-deployment              0/1     0            0           71s
	deployment.apps/katib-controller                        0/1     0            0           71s
	deployment.apps/katib-db-manager                        0/1     0            0           71s
	deployment.apps/katib-mysql                             0/1     0            0           71s
	deployment.apps/katib-ui                                0/1     0            0           71s
	deployment.apps/kubeflow-pipelines-profile-controller   0/1     0            0           71s
	deployment.apps/metadata-db                             0/1     0            0           71s
	deployment.apps/metadata-envoy-deployment               0/1     0            0           71s
	deployment.apps/metadata-grpc-deployment                0/1     0            0           71s
	deployment.apps/metadata-writer                         0/1     0            0           71s
	deployment.apps/minio                                   0/1     0            0           71s
	deployment.apps/ml-pipeline                             0/1     0            0           71s
	deployment.apps/ml-pipeline-persistenceagent            0/1     0            0           71s
	deployment.apps/ml-pipeline-scheduledworkflow           0/1     0            0           71s
	deployment.apps/ml-pipeline-ui                          0/1     0            0           71s
	deployment.apps/ml-pipeline-viewer-crd                  0/1     0            0           71s
	deployment.apps/ml-pipeline-visualizationserver         0/1     0            0           71s
	deployment.apps/mpi-operator                            0/1     0            0           71s
	deployment.apps/mxnet-operator                          0/1     0            0           71s
	deployment.apps/mysql                                   0/1     0            0           71s
	deployment.apps/notebook-controller-deployment          0/1     0            0           71s
	deployment.apps/profiles-deployment                     0/1     0            0           71s
	deployment.apps/pytorch-operator                        0/1     0            0           71s
	deployment.apps/seldon-controller-manager               0/1     0            0           71s
	deployment.apps/spark-operatorsparkoperator             0/1     0            0           73s
	deployment.apps/spartakus-volunteer                     0/1     0            0           69s
	deployment.apps/tf-job-operator                         0/1     0            0           71s
	deployment.apps/workflow-controller                     0/1     0            0           71s
	
	NAME                                                               DESIRED   CURRENT   READY   AGE
	replicaset.apps/admission-webhook-deployment-5d9ccb5696            1         0         0       71s
	replicaset.apps/argo-ui-684bcb587f                                 1         0         0       71s
	replicaset.apps/cache-deployer-deployment-6667847478               1         0         0       71s
	replicaset.apps/cache-server-bd9c859db                             1         0         0       67s
	replicaset.apps/centraldashboard-895c4c768                         1         0         0       71s
	replicaset.apps/jupyter-web-app-deployment-6588c6f544              1         0         0       71s
	replicaset.apps/katib-controller-75c8d47f8c                        1         0         0       71s
	replicaset.apps/katib-db-manager-6c88c68d79                        1         0         0       71s
	replicaset.apps/katib-mysql-858f68f588                             1         0         0       69s
	replicaset.apps/katib-ui-68f59498d4                                1         0         0       71s
	replicaset.apps/kubeflow-pipelines-profile-controller-69c94df75b   1         0         0       71s
	replicaset.apps/metadata-db-757dc9c7b5                             1         0         0       71s
	replicaset.apps/metadata-envoy-deployment-6ff58757f6               1         0         0       71s
	replicaset.apps/metadata-grpc-deployment-76d69f69c8                1         0         0       71s
	replicaset.apps/metadata-writer-6d94ffb7df                         1         0         0       70s
	replicaset.apps/minio-66c9cd74c9                                   1         0         0       70s
	replicaset.apps/ml-pipeline-54989c9946                             1         0         0       70s
	replicaset.apps/ml-pipeline-persistenceagent-7f6bf7646             1         0         0       70s
	replicaset.apps/ml-pipeline-scheduledworkflow-66db7bcf5d           1         0         0       70s
	replicaset.apps/ml-pipeline-ui-756b58fb                            1         0         0       67s
	replicaset.apps/ml-pipeline-viewer-crd-58f59f87db                  1         0         0       69s
	replicaset.apps/ml-pipeline-visualizationserver-6f9ff4974          1         0         0       69s
	replicaset.apps/mpi-operator-77bb5d8f4b                            1         0         0       69s
	replicaset.apps/mxnet-operator-68b688bb69                          1         0         0       69s
	replicaset.apps/mysql-7694c6b8b7                                   1         0         0       68s
	replicaset.apps/notebook-controller-deployment-58447d4b4c          1         0         0       68s
	replicaset.apps/profiles-deployment-78d4549cbc                     1         0         0       68s
	replicaset.apps/pytorch-operator-b79799447                         1         0         0       68s
	replicaset.apps/seldon-controller-manager-5fc5dfc86c               1         0         0       68s
	replicaset.apps/spark-operatorsparkoperator-67c6bc65fb             1         0         0       73s
	replicaset.apps/spartakus-volunteer-6ddc7b6676                     1         0         0       65s
	replicaset.apps/tf-job-operator-5c97f4bf7                          1         0         0       67s
	replicaset.apps/workflow-controller-5c7cc7976d                     1         0         0       67s
	
	NAME                                                        READY   AGE
	statefulset.apps/admission-webhook-bootstrap-stateful-set   0/1     73s
	statefulset.apps/application-controller-stateful-set        0/1     6m38s
	statefulset.apps/kfserving-controller-manager               0/1     70s
	statefulset.apps/metacontroller                             0/1     73s

The pod's image pull fails, and none of the deployments or apps are ready. This usually points to an external network problem, but I had configured external access, so why?

root@master:/home/hqc/Kubeflow/Kubeflow1.2.0# kubectl get pod --all-namespaces
	NAMESPACE         NAME                                                         READY   STATUS              RESTARTS   AGE
	cert-manager      cert-manager-59b485c4cc-zmj68                                1/1     Running             0          38m
	cert-manager      cert-manager-cainjector-5bb487bcd-t8hsg                      1/1     Running             0          38m
	cert-manager      cert-manager-webhook-74b4bd9bcc-vxnsb                        1/1     Running             0          38m
	default           federated-deployment-655454d67c-kw4nd                        0/1     CrashLoopBackOff    1491       39d
	default           federated-deployment-655454d67c-q54hk                        0/1     CrashLoopBackOff    1492       39d
	istio-system      cluster-local-gateway-84bb595449-pjwjm                       0/1     Running             0          38m
	istio-system      istio-citadel-7f66ddfcfb-zmdwm                               0/1     ImagePullBackOff    0          38m
	istio-system      istio-galley-7976dd55cd-cspbv                                0/1     ContainerCreating   0          38m
	istio-system      istio-ingressgateway-c79f9f6f-cs8f4                          0/1     ContainerCreating   0          38m
	istio-system      istio-nodeagent-4ntfm                                        0/1     ImagePullBackOff    0          38m
	istio-system      istio-nodeagent-hpnpc                                        0/1     ImagePullBackOff    0          27m
	istio-system      istio-pilot-7bd96d69d9-xmt4f                                 0/2     ContainerCreating   0          38m
	istio-system      istio-policy-66b5d9887c-ltgcw                                0/2     ContainerCreating   0          38m
	istio-system      istio-security-post-install-release-1.3-latest-daily-ghgqk   0/1     ImagePullBackOff    0          38m
	istio-system      istio-sidecar-injector-56b6997f7d-jq5df                      0/1     ContainerCreating   0          38m
	istio-system      istio-telemetry-856f7bcff4-475l7                             0/2     ContainerCreating   0          38m
	istio-system      prometheus-65fdcbc857-d2hhs                                  0/1     ContainerCreating   0          38m
	knative-serving   activator-789bcb5644-txkqz                                   0/1     ImagePullBackOff    0          32m
	knative-serving   autoscaler-5888bf7697-bprrd                                  0/1     ImagePullBackOff    0          32m
	knative-serving   controller-7f646849cd-nfrtz                                  0/1     ImagePullBackOff    0          32m
	knative-serving   istio-webhook-7db84bf7bf-l62c9                               0/1     ImagePullBackOff    0          32m
	knative-serving   networking-istio-55d86868c6-8shwd                            0/1     ImagePullBackOff    0          32m
	knative-serving   webhook-579f9448c4-9pcw4                                     0/1     ImagePullBackOff    0          32m
	kube-system       coredns-66bff467f8-p8txx                                     1/1     Running             20         40d
	kube-system       coredns-66bff467f8-qqrn9                                     1/1     Running             20         40d
	kube-system       etcd-master                                                  1/1     Running             4          40d
	kube-system       kube-apiserver-master                                        1/1     Running             10337      40d
	kube-system       kube-controller-manager-master                               1/1     Running             23         40d
	kube-system       kube-flannel-ds-8gb4m                                        1/1     Running             21         40d
	kube-system       kube-flannel-ds-tpnlj                                        1/1     Running             11         40d
	kube-system       kube-proxy-vrcts                                             1/1     Running             19         40d
	kube-system       kube-proxy-w8sv8                                             1/1     Running             4          40d
	kube-system       kube-scheduler-master                                        1/1     Running             24         40d
	kubeflow          application-controller-stateful-set-0                        0/1     ImagePullBackOff    0          38m

None of the newly created Kubeflow-related pods came up..
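
To see the exact reason a pull fails (registry, image path, error message), describing a single failing pod is more informative than the list view. A sketch using one of the pod names above:

# The Events section at the end names the image and the pull error
kubectl -n kubeflow describe pod application-controller-stateful-set-0
# Pulling the image by hand on the node separates cluster problems from registry access
# (substitute the image path shown in the Events output)
docker pull <image-from-events>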

2 Install Kubeflow with Aliyun-local

2.1 Removing the Previously Deployed Kubeflow 1.2.0 Components First

kfctl delete -V -f kfctl_k8s_istio.v1.2.0.yaml

Deletion takes more than half an hour, and it ends with errors about resources not being fully cleaned up:

Error: couldn't delete KfApp:  (kubeflow.error): Code 500 with message: kfApp Delete failed for kustomize:  (kubeflow.error): Code 500 with message: error deleting kustomize manifests: [error evaluating kustomization manifest for knative: Timed out waiting for resource /knative-serving to be deleted. Error deleted resource is not cleaned up yet, error evaluating kustomization manifest for cert-manager: Timed out waiting for resource /cert-manager to be deleted. Error deleted resource is not cleaned up yet, error evaluating kustomization manifest for cluster-local-gateway: Timed out waiting for resource /istio-system to be deleted. Error deleted resource is not cleaned up yet, error evaluating kustomization manifest for istio-stack: Timed out waiting for resource /istio-system to be deleted. Error deleted resource is not cleaned up yet, error evaluating kustomization manifest for namespaces: Timed out waiting for resource /cert-manager to be deleted. Error deleted resource is not cleaned up yet, error evaluating kustomization manifest for namespaces: Timed out waiting for resource /kubeflow to be deleted. Error deleted resource is not cleaned up yet]
Usage:
  kfctl delete [flags]

Flags:
      --delete_storage   Set if you want to delete app's storage cluster used for mlpipeline.
  -f, --file string      The local config file of KfDef.
      --force-deletion   force-deletion output default is false
  -h, --help             help for delete
  -V, --verbose          verbose output default is false

This seems safe to ignore; apparently it happened because the node01 node had not rejoined the cluster, while master had scheduled the Kubeflow components onto node01 during installation.
Indeed, after node01 was brought back up, the pods that had been stuck in Terminating were removed.
As shown:

root@master:/home/hqc/Kubeflow/Kubeflow1.2.0# kubectl get pod --all-namespaces
	NAMESPACE     NAME                             READY   STATUS    RESTARTS   AGE
	kube-system   coredns-66bff467f8-p8txx         1/1     Running   20         40d
	kube-system   coredns-66bff467f8-qqrn9         1/1     Running   20         40d
	kube-system   etcd-master                      1/1     Running   4          40d
	kube-system   kube-apiserver-master            1/1     Running   10533      40d
	kube-system   kube-controller-manager-master   1/1     Running   24         40d
	kube-system   kube-flannel-ds-8gb4m            1/1     Running   22         40d
	kube-system   kube-flannel-ds-tpnlj            1/1     Running   11         40d
	kube-system   kube-proxy-vrcts                 1/1     Running   20         40d
	kube-system   kube-proxy-w8sv8                 1/1     Running   4          40d
	kube-system   kube-scheduler-master            1/1     Running   25         40d
root@master:/home/hqc/Kubeflow/Kubeflow1.2.0# kubectl -n kubeflow get all
	No resources found in kubeflow namespace.
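
For future reference: if a namespace ever stays stuck in Terminating even after its pods are gone, a common last-resort workaround is clearing its finalizers through the finalize subresource. A sketch (not needed here, since bringing node01 back resolved it):

# CAUTION: this skips cleanup logic; use only when nothing else works
kubectl get namespace kubeflow -o json \
  | sed 's/"kubernetes"//' \
  | kubectl replace --raw "/api/v1/namespaces/kubeflow/finalize" -f -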

2.2 Installing per the Tutorial

Aliyun community: one-click local Kubeflow installation

# Create a dedicated working folder
root@master:/home/hqc/Kubeflow# mkdir Kubeflow1.3
root@master:/home/hqc/Kubeflow# cd Kubeflow1.3/

# Clone the source project
root@master:/home/hqc/Kubeflow/Kubeflow1.3# git clone https://github.com/shikanon/kubeflow-manifests.git
	Cloning into 'kubeflow-manifests'...
	remote: Enumerating objects: 552, done.
	remote: Counting objects: 100% (552/552), done.
	remote: Compressing objects: 100% (358/358), done.
	remote: Total 552 (delta 201), reused 506 (delta 171), pack-reused 0
	Receiving objects: 100% (552/552), 571.84 KiB | 316.00 KiB/s, done.
	Resolving deltas: 100% (201/201), done.
# Enter the folder
root@master:/home/hqc/Kubeflow/Kubeflow1.3# cd kubeflow-manifests
# Run the one-click deployment script; it applies the YAML files one by one
root@master:/home/hqc/Kubeflow/Kubeflow1.3/kubeflow-manifests# python install.py
	kubectl apply -f ./manifest1.3/001-cert-manager-cert-manager-kube-system-resources-base.yaml
	b'role.rbac.authorization.k8s.io/cert-manager-cainjector:leaderelection created\nrole.rbac.authorization.k8s.io/cert-manager:leaderelection created\nrolebinding.rbac.authorization.k8s.io/cert-manager-cainjector:leaderelection created\nrolebinding.rbac.authorization.k8s.io/cert-manager-webhook:webhook-authentication-reader created\nrolebinding.rbac.authorization.k8s.io/cert-manager:leaderelection created\nconfigmap/cert-manager-kube-params-parameters created\n'
	kubectl apply -f ./manifest1.3/002-cert-manager-cert-manager-crds-base.yaml
	b'customresourcedefinition.apiextensions.k8s.io/certificaterequests.cert-manager.io created\ncustomresourcedefinition.apiextensions.k8s.io/certificates.cert-manager.io created\ncustomresourcedefinition.apiextensions.k8s.io/challenges.acme.cert-manager.io created\ncustomresourcedefinition.apiextensions.k8s.io/clusterissuers.cert-manager.io created\ncustomresourcedefinition.apiextensions.k8s.io/issuers.cert-manager.io created\ncustomresourcedefinition.apiextensions.k8s.io/orders.acme.cert-manager.io created\n'
	kubectl apply -f ./manifest1.3/003-cert-manager-overlays-self-signed.yaml
	b'namespace/cert-manager created\nserviceaccount/cert-manager created\nserviceaccount/cert-manager-cainjector created\nserviceaccount/cert-manager-webhook created\nclusterrole.rbac.authorization.k8s.io/cert-manager-edit created\nclusterrole.rbac.authorization.k8s.io/cert-manager-view created\nclusterrole.rbac.authorization.k8s.io/cert-manager-webhook:webhook-requester created\nclusterrole.rbac.authorization.k8s.io/cert-manager-cainjector created\nclusterrole.rbac.authorization.k8s.io/cert-manager-controller-certificates created\nclusterrole.rbac.authorization.k8s.io/cert-manager-controller-challenges created\nclusterrole.rbac.authorization.k8s.io/cert-manager-controller-clusterissuers created\nclusterrole.rbac.authorization.k8s.io/cert-manager-controller-ingress-shim created\nclusterrole.rbac.authorization.k8s.io/cert-manager-controller-issuers created\nclusterrole.rbac.authorization.k8s.io/cert-manager-controller-orders created\nclusterrolebinding.rbac.authorization.k8s.io/cert-manager-cainjector created\nclusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-certificates created\nclusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-challenges created\nclusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-clusterissuers created\nclusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-ingress-shim created\nclusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-issuers created\nclusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-orders created\nclusterrolebinding.rbac.authorization.k8s.io/cert-manager-webhook:auth-delegator created\nconfigmap/cert-manager-parameters created\nservice/cert-manager created\nservice/cert-manager-webhook created\ndeployment.apps/cert-manager created\ndeployment.apps/cert-manager-cainjector created\ndeployment.apps/cert-manager-webhook created\napiservice.apiregistration.k8s.io/v1beta1.webhook.cert-manager.io created\nclusterissuer.cert-manager.io/kubeflow-self-signing-issuer created\nmutatingwebhookconfiguration.admissionregistration.k8s.io/cert-manager-webhook created\nvalidatingwebhookconfiguration.admissionregistration.k8s.io/cert-manager-webhook created\n'
	...

The deployment ran without a single visible error. Amazing!!!
(Or so it seemed: there were errors near the end after all; with no color highlighting in the output, I simply missed them.)

...
kubectl apply -f ./manifest1.3/017-pipeline-env-platform-agnostic-multi-user.yaml
error: unable to recognize "./manifest1.3/017-pipeline-env-platform-agnostic-multi-user.yaml": no matches for kind "CompositeController" in version "metacontroller.k8s.io/v1alpha1"
...
...
Error from server (NotFound): error when deleting "./patch/data.yaml": deployments.apps "minio" not found
b''
b'deployment.apps/minio created\n'
b'envoyfilter.networking.istio.io "authn-filter" deleted\n'
b'envoyfilter.networking.istio.io/authn-filter created\n'
Error from server (NotFound): error when deleting "./patch/istio-ingressgateway.yaml": deployments.apps "istio-ingressgateway" not found
b''
b'deployment.apps/istio-ingressgateway created\n'
Error from server (NotFound): error when deleting "./patch/istiod.yaml": deployments.apps "istiod" not found
b'configmap "istio-sidecar-injector" deleted\n'
b'deployment.apps/istiod created\nconfigmap/istio-sidecar-injector created\n'
b'deployment.apps "jupyter-web-app-deployment" deleted\n'
b'deployment.apps/jupyter-web-app-deployment created\n'
b'image.caching.internal.knative.dev "queue-proxy" deleted\nconfigmap "config-deployment" deleted\nconfigmap "inferenceservice-config" deleted\n'
b'image.caching.internal.knative.dev/queue-proxy created\nconfigmap/config-deployment created\nconfigmap/inferenceservice-config created\n'
Error from server (NotFound): error when deleting "./patch/pipeline-env-platform-agnostic-multi-user.yaml": configmaps "kubeflow-pipelines-profile-controller-code-c2cd68d9k4" not found
Error from server (NotFound): error when deleting "./patch/pipeline-env-platform-agnostic-multi-user.yaml": configmaps "pipeline-install-config" not found
Error from server (NotFound): error when deleting "./patch/pipeline-env-platform-agnostic-multi-user.yaml": deployments.apps "workflow-controller" not found
Error from server (NotFound): error when deleting "./patch/pipeline-env-platform-agnostic-multi-user.yaml": deployments.apps "kubeflow-pipelines-profile-controller" not found
b''
b'configmap/kubeflow-pipelines-profile-controller-code-c2cd68d9k4 created\nconfigmap/pipeline-install-config created\ndeployment.apps/workflow-controller created\ndeployment.apps/kubeflow-pipelines-profile-controller created\n'
b'deployment.apps "tensorboards-web-app-deployment" deleted\n'
b'deployment.apps/tensorboards-web-app-deployment created\n'
b'deployment.apps "volumes-web-app-deployment" deleted\n'
b'deployment.apps/volumes-web-app-deployment created\n'
Error from server (NotFound): error when deleting "./patch/workflow-controller.yaml": configmaps "workflow-controller-configmap" not found
Error from server (NotFound): error when deleting "./patch/workflow-controller.yaml": deployments.apps "cache-server" not found
b'deployment.apps "workflow-controller" deleted\n'
b'configmap/workflow-controller-configmap created\ndeployment.apps/workflow-controller created\ndeployment.apps/cache-server created\n'

As shown below:
(screenshot: NotFound errors during patching) Mainly: NotFound errors for deployments.apps and configmaps when the script first tries to delete the YAML-defined resources.

A further check shows five remaining problem states: Pending, Not Ready, ContainerCreating, CrashLoopBackOff, and CreateContainerConfigError.
(screenshot: pods in the five problem states)

2.3 Troubleshooting

2.3.1 Pending

Based on earlier experience, Pending may be caused by missing PVs/PVCs, so that is a good angle to start from.

Generally speaking, Pending means the pod is in a suspended state: no physical node can be found to run it, so it cannot be scheduled onto any node.

# Inspect the pod details
root@master:/home/hqc/Kubeflow/Kubeflow1.2.0# kubectl describe pods authservice-0 -n istio-system
	...
	Events:
	  Type     Reason            Age                 From               Message
	  ----     ------            ----                ----               -------
	  Warning  FailedScheduling  93s (x49 over 66m)  default-scheduler  running "VolumeBinding" filter plugin for pod "authservice-0": pod has unbound immediate PersistentVolumeClaims
# Sure enough, it is related to persistent volume mounts

# Check the logs
root@master:/home/hqc/Kubeflow/Kubeflow1.2.0# kubectl logs authservice-0 -n istio-system
# No log output; a Pending pod has not started any container, so there is nothing to log
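
A pod Pending on "unbound immediate PersistentVolumeClaims" can be traced back to the claim itself; listing the non-Bound PVCs shows exactly which volumes are missing:

# Any PVC not in Bound state needs a matching PV (or a working StorageClass)
kubectl get pvc --all-namespaces | grep -v Bound
kubectl get pv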

2.3.2 Not Ready

root@master:/home/hqc/Kubeflow/Kubeflow1.2.0# kubectl describe pods cluster-local-gateway-d8688cfdd-m4znc -n istio-system
	...
	Events:
	  Type     Reason     Age                     From             Message
	  ----     ------     ----                    ----             -------
	  Warning  Unhealthy  78s (x4588 over 4h53m)  kubelet, node01  Readiness probe failed: HTTP probe failed with statuscode: 503
# What causes this is unknown for now

2.3.3 ContainerCreating

After re-running, a newly created pod stayed stuck in ContainerCreating. Inspecting it:

root@master:/home/hqc/Kubeflow/Kubeflow1.2.0# kubectl describe pods cluster-local-gateway-54568d47c5-2jk7s -n istio-system
	...
	Events:
	  Type     Reason       Age                   From               Message
	  ----     ------       ----                  ----               -------
	  Normal   Scheduled    60m                   default-scheduler  Successfully assigned istio-system/cluster-local-gateway-54568d47c5-2jk7s to node01
	  Warning  FailedMount  58m                   kubelet, node01    Unable to attach or mount volumes: unmounted volumes=[istio-token], unattached volumes=[config-volume podinfo cluster-local-gateway-service-account-token-6f4dv istio-envoy ingressgateway-ca-certs istio-token istio-data ingressgateway-certs istiod-ca-cert]: timed out waiting for the condition
	  Warning  FailedMount  55m                   kubelet, node01    Unable to attach or mount volumes: unmounted volumes=[istio-token], unattached volumes=[ingressgateway-ca-certs cluster-local-gateway-service-account-token-6f4dv istio-envoy istiod-ca-cert istio-token ingressgateway-certs config-volume istio-data podinfo]: timed out waiting for the condition
	  Warning  FailedMount  53m                   kubelet, node01    Unable to attach or mount volumes: unmounted volumes=[istio-token], unattached volumes=[istio-envoy istio-token podinfo cluster-local-gateway-service-account-token-6f4dv ingressgateway-certs ingressgateway-ca-certs config-volume istiod-ca-cert istio-data]: timed out waiting for the condition
	  Warning  FailedMount  51m                   kubelet, node01    Unable to attach or mount volumes: unmounted volumes=[istio-token], unattached volumes=[ingressgateway-certs cluster-local-gateway-service-account-token-6f4dv config-volume istio-token istio-data istio-envoy podinfo ingressgateway-ca-certs istiod-ca-cert]: timed out waiting for the condition
	  Warning  FailedMount  49m                   kubelet, node01    Unable to attach or mount volumes: unmounted volumes=[istio-token], unattached volumes=[istio-data podinfo ingressgateway-certs istiod-ca-cert ingressgateway-ca-certs cluster-local-gateway-service-account-token-6f4dv config-volume istio-token istio-envoy]: timed out waiting for the condition
	  Warning  FailedMount  46m                   kubelet, node01    Unable to attach or mount volumes: unmounted volumes=[istio-token], unattached volumes=[config-volume istio-data istiod-ca-cert istio-token ingressgateway-certs ingressgateway-ca-certs istio-envoy podinfo cluster-local-gateway-service-account-token-6f4dv]: timed out waiting for the condition
	  Warning  FailedMount  44m                   kubelet, node01    Unable to attach or mount volumes: unmounted volumes=[istio-token], unattached volumes=[istiod-ca-cert istio-envoy cluster-local-gateway-service-account-token-6f4dv config-volume istio-token istio-data podinfo ingressgateway-certs ingressgateway-ca-certs]: timed out waiting for the condition
	  Warning  FailedMount  42m                   kubelet, node01    Unable to attach or mount volumes: unmounted volumes=[istio-token], unattached volumes=[istio-envoy istio-token cluster-local-gateway-service-account-token-6f4dv istiod-ca-cert config-volume ingressgateway-certs istio-data podinfo ingressgateway-ca-certs]: timed out waiting for the condition
	  Warning  FailedMount  9m12s (x19 over 40m)  kubelet, node01    (combined from similar events): MountVolume.SetUp failed for volume "istio-token" : failed to fetch token: the API server does not have TokenRequest endpoints enabled
	  Warning  FailedMount  5m8s (x29 over 60m)   kubelet, node01    MountVolume.SetUp failed for volume "istio-token" : failed to fetch token: the API server does not have TokenRequest endpoints enabled
# Also related to volume mounts
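
The last two events are the real blocker: "the API server does not have TokenRequest endpoints enabled" means the kube-apiserver lacks the service-account token projection flags that Istio's projected istio-token volume needs. On a kubeadm cluster the usual fix is adding flags like these to /etc/kubernetes/manifests/kube-apiserver.yaml (a sketch with the default kubeadm paths; verify against your own cluster):

# Add under the kube-apiserver command in the static pod manifest:
#   --service-account-signing-key-file=/etc/kubernetes/pki/sa.key
#   --service-account-issuer=kubernetes.default.svc
#   --service-account-api-audiences=kubernetes.default.svc
# The static pod restarts on its own once the manifest is saved.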

Comparing with before, there is an extra copy of the same pod, which I took to be an artifact of the re-install, so I deleted the deployment:

root@master:/home/hqc/Kubeflow/Kubeflow1.2.0# kubectl get deployment -n istio-system
	NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
	cluster-local-gateway   0/1     1            0           5h25m
	istio-ingressgateway    0/1     1            0           5h25m
	istiod                  1/1     1            1           5h25m
root@master:/home/hqc/Kubeflow/Kubeflow1.2.0# kubectl delete deployment cluster-local-gateway -n istio-system
	deployment.apps "cluster-local-gateway" deleted
root@master:/home/hqc/Kubeflow/Kubeflow1.2.0# kubectl get deployment -n istio-system
	NAME                   READY   UP-TO-DATE   AVAILABLE   AGE
	istio-ingressgateway   0/1     1            0           5h25m
	istiod                 1/1     1            1           5h25m
root@master:/home/hqc/Kubeflow/Kubeflow1.2.0# kubectl get pod --all-namespaces
	NAMESPACE          NAME                                                         READY   STATUS                       RESTARTS   AGE
	cert-manager       cert-manager-cainjector-846b7c9f8c-7vmxf                     1/1     Running                      33         5h31m
	cert-manager       cert-manager-fbc979d45-rws9g                                 1/1     Running                      1          5h31m
	cert-manager       cert-manager-webhook-67956cb44b-hz6c4                        1/1     Running                      0          5h31m
	istio-system       authservice-0                                                0/1     Pending                      0          5h30m
	istio-system       istio-ingressgateway-84f6567479-4z9q4                        0/1     Running                      0          5h25m
	istio-system       istiod-5d6d848d84-8fwg8                                      1/1     Running                      0          5h25m
	knative-eventing   broker-controller-d675f7d9f-hb6bg                            1/1     Running                      0          5h29m

2.3.4 CrashLoopBackOff

root@master:/home/hqc/Kubeflow/Kubeflow1.2.0# kubectl describe pods katib-db-manager-755464ffcf-f4wl8 -n kubeflow
	...
	Events:
	  Type     Reason     Age                      From             Message
	  ----     ------     ----                     ----             -------
	  Warning  Unhealthy  9m24s (x184 over 5h10m)  kubelet, node01  Readiness probe failed: timeout: failed to connect service ":6789" within 1s
	  Warning  BackOff    4m24s (x630 over 5h8m)   kubelet, node01  Back-off restarting failed container
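
katib-db-manager serves gRPC on :6789 and usually crash-loops because it cannot reach its MySQL backend, so the next step is reading the logs of both it and katib-mysql (--previous shows the output of the crashed container):

kubectl -n kubeflow logs katib-db-manager-755464ffcf-f4wl8 --previous
kubectl -n kubeflow logs deploy/katib-mysql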

2.3.5 CreateContainerConfigError

root@master:/home/hqc/Kubeflow/Kubeflow1.2.0# kubectl describe pods kubeflow-pipelines-profile-controller-65c8c9dc9c-2g6pm -n kubeflow
	...
	Events:
	  Type    Reason  Age                     From             Message
	  ----    ------  ----                    ----             -------
	  Normal  Pulled  3m54s (x749 over 5h5m)  kubelet, node01  Container image "registry.cn-shenzhen.aliyuncs.com/tensorbytes/python:3.7-3a781" already present on machine
# Puzzling at first: the image is already on the machine, yet the error persists.
# (CreateContainerConfigError is unrelated to the image; it typically means a ConfigMap or Secret the container references is missing or invalid.)

root@master:/home/hqc/Kubeflow/Kubeflow1.2.0# kubectl describe pods minio-6f4c68d54f-q7mnl -n kubeflow
	...
	Events:
	  Type    Reason  Age                    From             Message
	  ----    ------  ----                   ----             -------
	  Normal  Pulled  80s (x784 over 5h10m)  kubelet, node01  Container image "registry.cn-shenzhen.aliyuncs.com/tensorbytes/ml-pipeline-minio:RELEASE.2019-08-14T20-37-41Z-license-compliance-290a7" already present on machine

2.3.6 Problem Summary

The problem pods are mainly these: authservice-0, cluster-local-gateway, istio-ingressgateway, katib-db-manager, katib-mysql, kubeflow-pipelines-profile-controller, and minio.

# Check the related components
root@master:/home/hqc/Kubeflow/Kubeflow1.2.0# kubectl -n kubeflow get all

The following also deserve attention: deployment.apps/cache-server, deployment.apps/katib-db-manager, deployment.apps/katib-mysql, deployment.apps/kubeflow-pipelines-profile-controller, deployment.apps/minio, deployment.apps/workflow-controller, plus the matching replicasets (replicaset.apps/cache-server, replicaset.apps/katib-db-manager, replicaset.apps/katib-mysql, replicaset.apps/kubeflow-pipelines-profile-controller, replicaset.apps/minio, replicaset.apps/workflow-controller-7b8f56f6c).

2.3.7 Fixing Part of the Problems (UI login works, but some components remain unhealthy)

To be able to log in to the UI, at a minimum everything in the istio-system and knative-eventing namespaces must be Running.
The situation so far: the istio-system/authservice-0 and kubeflow/katib-mysql components were Pending, which is generally related to missing PVs/PVCs.

# Create host directories for the two components
root@master:/home/hqc/Kubeflow/Kubeflow1.3# mkdir pv1
root@master:/home/hqc/Kubeflow/Kubeflow1.3# mkdir pv2

# Create pv.yaml
root@master:/home/hqc/Kubeflow/Kubeflow1.3# vim pv.yaml
	apiVersion: v1
	kind: PersistentVolume
	metadata:
	  name: pv-authservice
	spec:
	  capacity:
	    storage: 25Gi
	  accessModes:
	    - ReadWriteOnce
	  hostPath:
	    path: "/home/hqc/Kubeflow/Kubeflow1.3/pv1"
	
	---
	apiVersion: v1
	kind: PersistentVolume
	metadata:
	  name: pv-katib-mysql
	spec:
	  capacity:
	    storage: 25Gi
	  accessModes:
	    - ReadWriteOnce
	  hostPath:
	    path: "/home/hqc/Kubeflow/Kubeflow1.3/pv2"

# Apply
root@master:/home/hqc/Kubeflow/Kubeflow1.3# kubectl apply -f pv.yaml 
	persistentvolume/pv-authservice created
	persistentvolume/pv-katib-mysql created

# Check
root@master:/home/hqc/Kubeflow/Kubeflow1.3# kubectl get pvc --all-namespaces -o wide
	NAMESPACE      NAME              STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE   VOLUMEMODE
	istio-system   authservice-pvc   Bound    pvc-c3484c8f-169c-41fb-80bf-5e58935969fa   10Gi       RWO            local-path     40m   Filesystem
	kubeflow       katib-mysql       Bound    pv-katib-mysql                             25Gi       RWO                           16h   Filesystem

(screenshot: the pods are now Running) Running!

2.3.8 Logging In

# Check the ingress gateway service
root@master:/home/hqc/Kubeflow/Kubeflow1.3# kubectl get svc/istio-ingressgateway -n istio-system
	NAME                   TYPE       CLUSTER-IP     EXTERNAL-IP   PORT(S)                                                                      AGE
	istio-ingressgateway   NodePort   10.108.100.5   <none>        15021:31201/TCP,80:30000/TCP,443:32407/TCP,31400:30740/TCP,15443:31863/TCP   16h

The second mapping, 80:30000/TCP, shows that port 80 is exposed on NodePort 30000, so the login page can be reached at localhost:30000.
(screenshot: login page) After experimenting, it turns out the IP of any node in the k8s cluster works as well:
(screenshot: login page reached via a node IP)
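
The NodePort can also be read programmatically instead of eyeballing the table (standard kubectl jsonpath):

# Print the NodePort that fronts the gateway's port 80
kubectl -n istio-system get svc istio-ingressgateway \
  -o jsonpath='{.spec.ports[?(@.port==80)].nodePort}'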

Enter the credentials to log in; they can be changed via patch/auth.yaml. The default username is admin@example.com and the password is password.
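
For reference, dex stores static passwords as bcrypt hashes, so changing the password in patch/auth.yaml means generating a new hash first. A sketch, assuming the file uses dex staticPasswords as the upstream manifests do (requires pip install bcrypt):

# Generate a bcrypt hash for the new password, then paste it into patch/auth.yaml
python3 -c 'import bcrypt; print(bcrypt.hashpw(b"new-password", bcrypt.gensalt()).decode())'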

After logging in, the Kubeflow dashboard:
(screenshot: Kubeflow central dashboard)

2.4 Redeploying

Since the earlier deployment steps were somewhat messy, login works but quite a few unhealthy components remain:
(screenshot: remaining unhealthy pods) To keep them from complicating later deployments, I decided to redeploy from scratch~

2.4.1 Deleting All Related Components

## database-patch
root@master:/home/hqc/Kubeflow/Kubeflow1.3/kubeflow-manifests# kubectl delete -f database-patch/mysql-persistent-storage.yaml 
	deployment.apps "mysql" deleted

## local-path
root@master:/home/hqc/Kubeflow/Kubeflow1.3/kubeflow-manifests# kubectl delete -f local-path/local-path-storage.yaml 
	namespace "local-path-storage" deleted
	serviceaccount "local-path-provisioner-service-account" deleted
	clusterrole.rbac.authorization.k8s.io "local-path-provisioner-role" deleted
	clusterrolebinding.rbac.authorization.k8s.io "local-path-provisioner-bind" deleted
	deployment.apps "local-path-provisioner" deleted
	storageclass.storage.k8s.io "local-path" deleted
	configmap "local-path-config" deleted

## manifest1.3
root@master:/home/hqc/Kubeflow/Kubeflow1.3/kubeflow-manifests# kubectl delete -f manifest1.3/001-cert-manager-cert-manager-kube-system-resources-base.yaml 
	role.rbac.authorization.k8s.io "cert-manager-cainjector:leaderelection" deleted
	role.rbac.authorization.k8s.io "cert-manager:leaderelection" deleted
	rolebinding.rbac.authorization.k8s.io "cert-manager-cainjector:leaderelection" deleted
	rolebinding.rbac.authorization.k8s.io "cert-manager-webhook:webhook-authentication-reader" deleted
	rolebinding.rbac.authorization.k8s.io "cert-manager:leaderelection" deleted
	configmap "cert-manager-kube-params-parameters" deleted
...

## patch
root@master:/home/hqc/Kubeflow/Kubeflow1.3/kubeflow-manifests# kubectl delete -f patch/auth.yaml 
	profile.kubeflow.org "kubeflow-user-example-com" deleted
	Error from server (NotFound): error when deleting "patch/auth.yaml": configmaps "dex" not found
	Error from server (NotFound): error when deleting "patch/auth.yaml": deployments.apps "dex" not found
	Error from server (NotFound): error when deleting "patch/auth.yaml": configmaps "default-install-config-9h2h2b6hbk" not found
...

## pv.yaml
root@master:/home/hqc/Kubeflow/Kubeflow1.3# kubectl delete -f pv.yaml 
	persistentvolume "pv-authservice" deleted
	persistentvolume "pv-katib-mysql" deleted

2.4.2 Creating the PVs

root@master:/home/hqc/Kubeflow/Kubeflow1.3# kubectl apply -f pv.yaml
	persistentvolume/pv-authservice created
	persistentvolume/pv-katib-mysql created
# Check the PVs
root@master:/home/hqc/Kubeflow/Kubeflow1.3# kubectl get pv
	NAME             CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS   REASON   AGE
	pv-authservice   25Gi       RWO            Retain           Available                                   3m19s
	pv-katib-mysql   25Gi       RWO            Retain           Available                                   3m19s

2.4.3 Also Volume-Related (apparently this sets up the StorageClass)

root@master:/home/hqc/Kubeflow/Kubeflow1.3/kubeflow-manifests# kubectl apply -f local-path/local-path-storage.yaml 
	namespace/local-path-storage created
	serviceaccount/local-path-provisioner-service-account created
	clusterrole.rbac.authorization.k8s.io/local-path-provisioner-role created
	clusterrolebinding.rbac.authorization.k8s.io/local-path-provisioner-bind created
	deployment.apps/local-path-provisioner created
	storageclass.storage.k8s.io/local-path created
	configmap/local-path-config created
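
local-path here is Rancher's local-path-provisioner: it dynamically creates hostPath-backed PVs on a node for any PVC that requests the local-path StorageClass (which is how authservice-pvc got Bound earlier without a hand-written PV). If PVCs that specify no storageClassName should use it too, it can be marked as the default class; a standard kubectl patch:

kubectl patch storageclass local-path \
  -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'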

2.4.4 Deploying manifest1.3

root@master:/home/hqc/Kubeflow/Kubeflow1.3/kubeflow-manifests# kubectl apply -f manifest1.3/001-cert-manager-cert-manager-kube-system-resources-base.yaml 
	role.rbac.authorization.k8s.io/cert-manager-cainjector:leaderelection created
	role.rbac.authorization.k8s.io/cert-manager:leaderelection created
	rolebinding.rbac.authorization.k8s.io/cert-manager-cainjector:leaderelection created
	rolebinding.rbac.authorization.k8s.io/cert-manager-webhook:webhook-authentication-reader created
	rolebinding.rbac.authorization.k8s.io/cert-manager:leaderelection created
	configmap/cert-manager-kube-params-parameters created
......
# the 17th file still fails; it is pipeline-related
root@master:/home/hqc/Kubeflow/Kubeflow1.3/kubeflow-manifests# kubectl apply -f manifest1.3/017-pipeline-env-platform-agnostic-multi-user.yaml 
	error: unable to recognize "manifest1.3/017-pipeline-env-platform-agnostic-multi-user.yaml": no matches for kind "CompositeController" in version "metacontroller.k8s.io/v1alpha1"
......
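
Applying the numbered files one by one follows the same pattern throughout; a condensed sketch (assuming the NNN- prefixes sort in install order):

# Apply each manifest in numeric order and flag failures instead of aborting,
# so a single bad file (like 017 here) does not block the rest.
for f in manifest1.3/*.yaml; do
    kubectl apply -f "$f" || echo "FAILED: $f"
done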

The same error as before. At this point the cluster looks like this:
[screenshot]

2.4.5 Apply the patches

Some of the changes introduced by the patches only take effect after the affected pods restart, so each patch must be deleted first and then re-applied.
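Condensed, every patch goes through the same delete-then-apply cycle, as in the sketch below (the transcript that follows instead steps through them one at a time, to watch pod states in between):

# Delete and re-apply every patch so the pods it touches get recreated.
for p in patch/*.yaml; do
    kubectl delete -f "$p" --ignore-not-found
    kubectl apply  -f "$p"
done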

# delete
root@master:/home/hqc/Kubeflow/Kubeflow1.3/kubeflow-manifests# kubectl delete -f patch/auth.yaml 
	configmap "dex" deleted
	deployment.apps "dex" deleted
	configmap "default-install-config-9h2h2b6hbk" deleted
	profile.kubeflow.org "kubeflow-user-example-com" deleted
# apply
root@master:/home/hqc/Kubeflow/Kubeflow1.3/kubeflow-manifests# kubectl apply -f patch/auth.yaml 
	configmap/dex created
	deployment.apps/dex created
	configmap/default-install-config-9h2h2b6hbk created
	profile.kubeflow.org/kubeflow-user-example-com unchanged

# delete
kubectl delete -f patch/cluster-local-gateway.yaml 
	deployment.apps "cluster-local-gateway" deleted
# apply
kubectl apply -f patch/cluster-local-gateway.yaml 
	deployment.apps/cluster-local-gateway created

# delete
root@master:/home/hqc/Kubeflow/Kubeflow1.3/kubeflow-manifests# kubectl delete -f patch/data.yaml 
	Error from server (NotFound): error when deleting "patch/data.yaml": deployments.apps "minio" not found
# it did not exist before
# apply
root@master:/home/hqc/Kubeflow/Kubeflow1.3/kubeflow-manifests# kubectl apply -f patch/data.yaml 
	deployment.apps/minio created
	
# delete
root@master:/home/hqc/Kubeflow/Kubeflow1.3/kubeflow-manifests# kubectl delete -f patch/envoy-filter.yaml 
	envoyfilter.networking.istio.io "authn-filter" deleted
# apply
root@master:/home/hqc/Kubeflow/Kubeflow1.3/kubeflow-manifests# kubectl apply -f patch/envoy-filter.yaml 
	envoyfilter.networking.istio.io/authn-filter created
	
# delete
root@master:/home/hqc/Kubeflow/Kubeflow1.3/kubeflow-manifests# kubectl delete -f patch/istiod.yaml 
	deployment.apps "istiod" deleted
	configmap "istio-sidecar-injector" deleted
# apply
root@master:/home/hqc/Kubeflow/Kubeflow1.3/kubeflow-manifests# kubectl apply -f patch/istiod.yaml
	deployment.apps/istiod created
	configmap/istio-sidecar-injector created

At this point, the istiod component turns Running.
[screenshot]
Continuing~
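To see components flip to Running as each patch lands, a watch in a second terminal helps (generic kubectl, not part of the original transcript):

# Stream pod state changes in istio-system; Ctrl-C to stop.
kubectl -n istio-system get pods -w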

# delete
root@master:/home/hqc/Kubeflow/Kubeflow1.3/kubeflow-manifests# kubectl delete -f patch/istio-ingressgateway.yaml 
	deployment.apps "istio-ingressgateway" deleted
# apply
root@master:/home/hqc/Kubeflow/Kubeflow1.3/kubeflow-manifests# kubectl apply -f patch/istio-ingressgateway.yaml 
	deployment.apps/istio-ingressgateway created

At this point, both the cluster-local-gateway and istio-ingressgateway components turn Running.
The login page is now reachable, but the full UI is still not visible.
[screenshot]
Continuing~

# delete
root@master:/home/hqc/Kubeflow/Kubeflow1.3/kubeflow-manifests# kubectl delete -f patch/jupyter-web-app.yaml 
	deployment.apps "jupyter-web-app-deployment" deleted
# apply
root@master:/home/hqc/Kubeflow/Kubeflow1.3/kubeflow-manifests# kubectl apply -f patch/jupyter-web-app.yaml 
	deployment.apps/jupyter-web-app-deployment created

root@master:/home/hqc/Kubeflow/Kubeflow1.3/kubeflow-manifests# kubectl delete -f patch/kfserving.yaml 
	image.caching.internal.knative.dev "queue-proxy" deleted
	configmap "config-deployment" deleted
	configmap "inferenceservice-config" deleted
root@master:/home/hqc/Kubeflow/Kubeflow1.3/kubeflow-manifests# kubectl apply -f patch/kfserving.yaml 
	image.caching.internal.knative.dev/queue-proxy created
	configmap/config-deployment created
	configmap/inferenceservice-config created

root@master:/home/hqc/Kubeflow/Kubeflow1.3/kubeflow-manifests#kubectl delete -f patch/pipeline-env-platform-agnostic-multi-user.yaml 
	Error from server (NotFound): error when deleting "patch/pipeline-env-platform-agnostic-multi-user.yaml": configmaps "kubeflow-pipelines-profile-controller-code-c2cd68d9k4" not found
	Error from server (NotFound): error when deleting "patch/pipeline-env-platform-agnostic-multi-user.yaml": configmaps "pipeline-install-config" not found
	Error from server (NotFound): error when deleting "patch/pipeline-env-platform-agnostic-multi-user.yaml": deployments.apps "workflow-controller" not found
	Error from server (NotFound): error when deleting "patch/pipeline-env-platform-agnostic-multi-user.yaml": deployments.apps "kubeflow-pipelines-profile-controller" not found
root@master:/home/hqc/Kubeflow/Kubeflow1.3/kubeflow-manifests# kubectl apply -f patch/pipeline-env-platform-agnostic-multi-user.yaml 
	configmap/kubeflow-pipelines-profile-controller-code-c2cd68d9k4 created
	configmap/pipeline-install-config created
	deployment.apps/workflow-controller created
	deployment.apps/kubeflow-pipelines-profile-controller created
# the new pods hit CreateContainerConfigError

root@master:/home/hqc/Kubeflow/Kubeflow1.3/kubeflow-manifests# kubectl delete -f patch/tensorboard.yaml 
	deployment.apps "tensorboards-web-app-deployment" deleted
root@master:/home/hqc/Kubeflow/Kubeflow1.3/kubeflow-manifests# kubectl apply -f patch/tensorboard.yaml 
	deployment.apps/tensorboards-web-app-deployment created
# running

At this step, a lot of new components have appeared, and they are all Running!

root@master:/home/hqc/Kubeflow/Kubeflow1.3/kubeflow-manifests# kubectl delete -f patch/volumes-web-app.yaml 
	deployment.apps "volumes-web-app-deployment" deleted
root@master:/home/hqc/Kubeflow/Kubeflow1.3/kubeflow-manifests# kubectl apply -f patch/volumes-web-app.yaml 
	deployment.apps/volumes-web-app-deployment created
# running

root@master:/home/hqc/Kubeflow/Kubeflow1.3/kubeflow-manifests# kubectl delete -f patch/workflow-controller.yaml 
	deployment.apps "workflow-controller" deleted
	Error from server (NotFound): error when deleting "patch/workflow-controller.yaml": configmaps "workflow-controller-configmap" not found
	Error from server (NotFound): error when deleting "patch/workflow-controller.yaml": deployments.apps "cache-server" not found
root@master:/home/hqc/Kubeflow/Kubeflow1.3/kubeflow-manifests# kubectl apply -f patch/workflow-controller.yaml 
	configmap/workflow-controller-configmap created
	deployment.apps/workflow-controller created
	deployment.apps/cache-server created
# two or three minutes later, the components deployed in this step are still not ready

[screenshot]
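
For pods that stay unready like this (including the CreateContainerConfigError above), the usual way to find the missing piece is to inspect the pod and the recent events; a generic sketch, where <pod-name> is a placeholder:

# "describe" names the exact ConfigMap/Secret reference that cannot be resolved;
# the event stream gives the ordering of failures.
kubectl -n kubeflow describe pod <pod-name>
kubectl -n kubeflow get events --sort-by=.lastTimestamp | tail -n 20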

2.4.6 Check the UI

Logging in to the UI works normally, but there are a few problems:

  1. No Namespace
    [screenshot]
    There is a reference for solving this problem, but that method did not seem to work for me; the issue disappeared after reinstalling. Problem 2 remained, however.

  2. Invalid Page
    [screenshot]
    This is pipeline-related, so it is presumably caused by the 017 yaml file failing to deploy earlier; it may also be connected to the workflow components patched afterwards.
    The error message is:

# the 17th file still fails; it is pipeline-related
root@master:/home/hqc/Kubeflow/Kubeflow1.3/kubeflow-manifests# kubectl apply -f manifest1.3/017-pipeline-env-platform-agnostic-multi-user.yaml 
	error: unable to recognize "manifest1.3/017-pipeline-env-platform-agnostic-multi-user.yaml": no matches for kind "CompositeController" in version "metacontroller.k8s.io/v1alpha1"

Searching around suggests the API version declared in the resource file is deprecated, and that simply changing v1beta1 to v1 should fix it. Let's try~

root@master:/home/hqc/Kubeflow/Kubeflow1.3/kubeflow-manifests# kubectl apply -f manifest1.3/017-pipeline-env-platform-agnostic-multi-user.yaml 
	error: error validating "manifest1.3/017-pipeline-env-platform-agnostic-multi-user.yaml": error validating data: [ValidationError(CustomResourceDefinition.spec): unknown field "version" in io.k8s.apiextensions-apiserver.pkg.apis.apiextensions.v1.CustomResourceDefinitionSpec, ValidationError(CustomResourceDefinition.spec): missing required field "versions" in io.k8s.apiextensions-apiserver.pkg.apis.apiextensions.v1.CustomResourceDefinitionSpec]; if you choose to ignore these errors, turn validation off with --validate=false

Not solved!…
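
In hindsight, the validation error explains why the bare rename fails: apiextensions.k8s.io/v1 replaced the single version field with a required versions list, and each entry must carry a schema. A properly migrated CRD would look roughly like the sketch below, which is an illustration rather than the file's actual content:

cat <<'EOF' | kubectl apply -f -
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: compositecontrollers.metacontroller.k8s.io
spec:
  group: metacontroller.k8s.io
  scope: Cluster
  names:
    kind: CompositeController
    plural: compositecontrollers
    singular: compositecontroller
  versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:
        openAPIV3Schema:          # v1 requires a schema per served version
          type: object
          x-kubernetes-preserve-unknown-fields: true
EOF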

2.4.7 Solve the problems

This fixed both the undeployable 017 yaml file and the leftover patch issues; with those resolved, the UI problems went away as well.

Based on the earlier error message, no matches for kind "CompositeController" in version "metacontroller.k8s.io/v1alpha1", the problem must lie in the sections involving CompositeController.
I kept changing their apiVersion without success, so I decided to simply comment out the relevant sections (there are two of them) and try again. Since the sections in the file are independent of one another, commenting them out does not affect the other components.
[screenshot]
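
To locate the two offending sections before editing, a plain grep over the file works (generic shell, not from the original log):

# Print line numbers plus a little context around every CompositeController
# occurrence, so the spans to comment out are easy to find in vim.
grep -n -B 2 -A 6 "CompositeController" \
    manifest1.3/017-pipeline-env-platform-agnostic-multi-user.yaml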

# comment out the sections
root@master:/home/hqc/Kubeflow/Kubeflow1.3/kubeflow-manifests# vim manifest1.3/017-pipeline-env-platform-agnostic-multi-user.yaml

# deploy
root@master:/home/hqc/Kubeflow/Kubeflow1.3/kubeflow-manifests# kubectl apply -f manifest1.3/017-pipeline-env-platform-agnostic-multi-user.yaml 
	customresourcedefinition.apiextensions.k8s.io/clusterworkflowtemplates.argoproj.io created
	customresourcedefinition.apiextensions.k8s.io/controllerrevisions.metacontroller.k8s.io created
	customresourcedefinition.apiextensions.k8s.io/cronworkflows.argoproj.io created
	customresourcedefinition.apiextensions.k8s.io/decoratorcontrollers.metacontroller.k8s.io created
	customresourcedefinition.apiextensions.k8s.io/scheduledworkflows.kubeflow.org created
	customresourcedefinition.apiextensions.k8s.io/viewers.kubeflow.org created
	customresourcedefinition.apiextensions.k8s.io/workfloweventbindings.argoproj.io created
	customresourcedefinition.apiextensions.k8s.io/workflows.argoproj.io created
	customresourcedefinition.apiextensions.k8s.io/workflowtemplates.argoproj.io created
	serviceaccount/argo created
	serviceaccount/kubeflow-pipelines-cache created
	serviceaccount/kubeflow-pipelines-cache-deployer-sa created
	serviceaccount/kubeflow-pipelines-container-builder created
	serviceaccount/kubeflow-pipelines-metadata-writer created
	serviceaccount/kubeflow-pipelines-viewer created
	serviceaccount/meta-controller-service created
	serviceaccount/metadata-grpc-server created
	serviceaccount/ml-pipeline created
	serviceaccount/ml-pipeline-persistenceagent created
	serviceaccount/ml-pipeline-scheduledworkflow created
	serviceaccount/ml-pipeline-ui created
	serviceaccount/ml-pipeline-viewer-crd-service-account created
	serviceaccount/ml-pipeline-visualizationserver created
	serviceaccount/mysql created
	serviceaccount/pipeline-runner created
	role.rbac.authorization.k8s.io/argo-role created
	role.rbac.authorization.k8s.io/kubeflow-pipelines-cache-deployer-role created
	role.rbac.authorization.k8s.io/kubeflow-pipelines-cache-role created
	role.rbac.authorization.k8s.io/kubeflow-pipelines-metadata-writer-role created
	role.rbac.authorization.k8s.io/ml-pipeline created
	role.rbac.authorization.k8s.io/ml-pipeline-persistenceagent-role created
	role.rbac.authorization.k8s.io/ml-pipeline-scheduledworkflow-role created
	role.rbac.authorization.k8s.io/ml-pipeline-ui created
	role.rbac.authorization.k8s.io/ml-pipeline-viewer-controller-role created
	role.rbac.authorization.k8s.io/pipeline-runner created
	clusterrole.rbac.authorization.k8s.io/aggregate-to-kubeflow-pipelines-edit created
	clusterrole.rbac.authorization.k8s.io/aggregate-to-kubeflow-pipelines-view created
	clusterrole.rbac.authorization.k8s.io/argo-aggregate-to-admin created
	clusterrole.rbac.authorization.k8s.io/argo-aggregate-to-edit created
	clusterrole.rbac.authorization.k8s.io/argo-aggregate-to-view created
	clusterrole.rbac.authorization.k8s.io/argo-cluster-role created
	clusterrole.rbac.authorization.k8s.io/kubeflow-pipelines-cache-deployer-clusterrole created
	clusterrole.rbac.authorization.k8s.io/kubeflow-pipelines-cache-role created
	clusterrole.rbac.authorization.k8s.io/kubeflow-pipelines-edit created
	clusterrole.rbac.authorization.k8s.io/kubeflow-pipelines-metadata-writer-role created
	clusterrole.rbac.authorization.k8s.io/kubeflow-pipelines-view created
	clusterrole.rbac.authorization.k8s.io/ml-pipeline-persistenceagent-role created
	clusterrole.rbac.authorization.k8s.io/ml-pipeline-scheduledworkflow-role created
	clusterrole.rbac.authorization.k8s.io/ml-pipeline-ui created
	clusterrole.rbac.authorization.k8s.io/ml-pipeline-viewer-controller-role created
	clusterrole.rbac.authorization.k8s.io/ml-pipeline created
	rolebinding.rbac.authorization.k8s.io/argo-binding created
	rolebinding.rbac.authorization.k8s.io/kubeflow-pipelines-cache-binding created
	rolebinding.rbac.authorization.k8s.io/kubeflow-pipelines-cache-deployer-rolebinding created
	rolebinding.rbac.authorization.k8s.io/kubeflow-pipelines-metadata-writer-binding created
	rolebinding.rbac.authorization.k8s.io/ml-pipeline created
	rolebinding.rbac.authorization.k8s.io/ml-pipeline-persistenceagent-binding created
	rolebinding.rbac.authorization.k8s.io/ml-pipeline-scheduledworkflow-binding created
	rolebinding.rbac.authorization.k8s.io/ml-pipeline-ui created
	rolebinding.rbac.authorization.k8s.io/ml-pipeline-viewer-crd-binding created
	rolebinding.rbac.authorization.k8s.io/pipeline-runner-binding created
	clusterrolebinding.rbac.authorization.k8s.io/argo-binding created
	clusterrolebinding.rbac.authorization.k8s.io/kubeflow-pipelines-cache-binding created
	clusterrolebinding.rbac.authorization.k8s.io/kubeflow-pipelines-cache-deployer-clusterrolebinding created
	clusterrolebinding.rbac.authorization.k8s.io/kubeflow-pipelines-metadata-writer-binding created
	clusterrolebinding.rbac.authorization.k8s.io/meta-controller-cluster-role-binding created
	clusterrolebinding.rbac.authorization.k8s.io/ml-pipeline-persistenceagent-binding created
	clusterrolebinding.rbac.authorization.k8s.io/ml-pipeline-scheduledworkflow-binding created
	clusterrolebinding.rbac.authorization.k8s.io/ml-pipeline-ui created
	clusterrolebinding.rbac.authorization.k8s.io/ml-pipeline-viewer-crd-binding created
	clusterrolebinding.rbac.authorization.k8s.io/ml-pipeline created
	configmap/kubeflow-pipelines-profile-controller-code-c2cd68d9k4 created
	configmap/kubeflow-pipelines-profile-controller-env-5252m69c4c created
	configmap/metadata-grpc-configmap created
	configmap/ml-pipeline-ui-configmap created
	configmap/pipeline-api-server-config-dc9hkg52h6 created
	configmap/pipeline-install-config created
	configmap/workflow-controller-configmap created
	secret/mlpipeline-minio-artifact created
	secret/mysql-secret created
	service/cache-server created
	service/kubeflow-pipelines-profile-controller created
	service/metadata-envoy-service created
	service/metadata-grpc-service created
	service/minio-service created
	service/ml-pipeline created
	service/ml-pipeline-ui created
	service/ml-pipeline-visualizationserver created
	service/mysql created
	service/workflow-controller-metrics created
	persistentvolumeclaim/minio-pvc created
	persistentvolumeclaim/mysql-pv-claim created
	deployment.apps/cache-deployer-deployment created
	deployment.apps/cache-server created
	deployment.apps/kubeflow-pipelines-profile-controller created
	deployment.apps/metadata-envoy-deployment created
	deployment.apps/metadata-grpc-deployment created
	deployment.apps/metadata-writer created
	deployment.apps/minio created
	deployment.apps/ml-pipeline created
	deployment.apps/ml-pipeline-persistenceagent created
	deployment.apps/ml-pipeline-scheduledworkflow created
	deployment.apps/ml-pipeline-ui created
	deployment.apps/ml-pipeline-viewer-crd created
	deployment.apps/ml-pipeline-visualizationserver created
	deployment.apps/mysql created
	deployment.apps/workflow-controller created
	statefulset.apps/metacontroller created
	destinationrule.networking.istio.io/ml-pipeline created
	destinationrule.networking.istio.io/ml-pipeline-minio created
	destinationrule.networking.istio.io/ml-pipeline-mysql created
	destinationrule.networking.istio.io/ml-pipeline-ui created
	destinationrule.networking.istio.io/ml-pipeline-visualizationserver created
	virtualservice.networking.istio.io/metadata-grpc created
	virtualservice.networking.istio.io/ml-pipeline-ui created
	authorizationpolicy.security.istio.io/metadata-grpc-service created
	authorizationpolicy.security.istio.io/minio-service created
	authorizationpolicy.security.istio.io/ml-pipeline created
	authorizationpolicy.security.istio.io/ml-pipeline-ui created
	authorizationpolicy.security.istio.io/ml-pipeline-visualizationserver created
	authorizationpolicy.security.istio.io/mysql created
	authorizationpolicy.security.istio.io/service-cache-server created
# many new components are created

Most components come up Running after a while (be patient, 10min+), but at this point quite a few components are still abnormal.
Now the patches get to do their work!

# data.yaml
root@master:/home/hqc/Kubeflow/Kubeflow1.3/kubeflow-manifests# kubectl delete -f patch/data.yaml 
	deployment.apps "minio" deleted
root@master:/home/hqc/Kubeflow/Kubeflow1.3/kubeflow-manifests# kubectl apply -f patch/data.yaml 
	deployment.apps/minio created
	
# pipeline
root@master:/home/hqc/Kubeflow/Kubeflow1.3/kubeflow-manifests# kubectl delete -f patch/pipeline-env-platform-agnostic-multi-user.yaml 
	configmap "kubeflow-pipelines-profile-controller-code-c2cd68d9k4" deleted
	configmap "pipeline-install-config" deleted
	deployment.apps "workflow-controller" deleted
	deployment.apps "kubeflow-pipelines-profile-controller" deleted
root@master:/home/hqc/Kubeflow/Kubeflow1.3/kubeflow-manifests# kubectl apply -f patch/pipeline-env-platform-agnostic-multi-user.yaml 
	configmap/kubeflow-pipelines-profile-controller-code-c2cd68d9k4 created
	configmap/pipeline-install-config created
	deployment.apps/workflow-controller created
	deployment.apps/kubeflow-pipelines-profile-controller created
	
# workflow-controller
root@master:/home/hqc/Kubeflow/Kubeflow1.3/kubeflow-manifests# kubectl apply -f patch/workflow-controller.yaml 
	configmap/workflow-controller-configmap configured
	deployment.apps/workflow-controller unchanged

2.4.8 Check the results

root@master:/home/hqc/Kubeflow/Kubeflow1.3# kubectl get pod --all-namespaces
	NAMESPACE            NAME                                                         READY   STATUS    RESTARTS   AGE
	auth                 dex-bb655f999-nw98h                                          1/1     Running   1          2d1h
	cert-manager         cert-manager-cainjector-846b7c9f8c-4sgvn                     1/1     Running   16         2d1h
	cert-manager         cert-manager-fbc979d45-4nqpf                                 1/1     Running   3          2d1h
	cert-manager         cert-manager-webhook-67956cb44b-rxwfn                        1/1     Running   1          2d1h
	istio-system         authservice-0                                                1/1     Running   1          2d1h
	istio-system         cluster-local-gateway-d8688cfdd-c8zxp                        1/1     Running   1          2d1h
	istio-system         istio-ingressgateway-84f6567479-24v7c                        1/1     Running   1          2d1h
	istio-system         istiod-5d6d848d84-2k44j                                      1/1     Running   1          2d1h
	knative-eventing     broker-controller-d675f7d9f-8xlcw                            1/1     Running   1          2d1h
	knative-eventing     eventing-controller-8554597688-9k8vb                         1/1     Running   1          2d1h
	knative-eventing     eventing-webhook-5f7565f8dd-m66bn                            1/1     Running   1          2d1h
	knative-eventing     imc-controller-5fd9999fb-4nn4b                               1/1     Running   1          2d1h
	knative-eventing     imc-dispatcher-fc5555f96-748zp                               1/1     Running   1          2d1h
	knative-serving      activator-66c44ddffc-v98s2                                   1/1     Running   1          2d1h
	knative-serving      autoscaler-5c77d67c69-shpvv                                  1/1     Running   1          2d1h
	knative-serving      controller-b9bfbfb4f-qmgr6                                   1/1     Running   1          2d1h
	knative-serving      istio-webhook-5466fb9cd-kfgtk                                1/1     Running   1          2d1h
	knative-serving      networking-istio-94d878c4b-xsbvc                             1/1     Running   1          2d1h
	knative-serving      webhook-5d96ccb4fc-74g27                                     1/1     Running   1          2d1h
	kube-system          coredns-66bff467f8-p8txx                                     1/1     Running   20         45d
	kube-system          coredns-66bff467f8-qqrn9                                     1/1     Running   20         45d
	kube-system          etcd-master                                                  1/1     Running   4          45d
	kube-system          kube-apiserver-master                                        1/1     Running   11565      45d
	kube-system          kube-controller-manager-master                               1/1     Running   33         45d
	kube-system          kube-flannel-ds-8gb4m                                        1/1     Running   26         45d
	kube-system          kube-flannel-ds-tpnlj                                        1/1     Running   11         45d
	kube-system          kube-proxy-vrcts                                             1/1     Running   23         45d
	kube-system          kube-proxy-w8sv8                                             1/1     Running   4          45d
	kube-system          kube-scheduler-master                                        1/1     Running   34         45d
	kubeflow             admission-webhook-deployment-8678d7d5fc-w5llp                1/1     Running   1          2d1h
	kubeflow             cache-deployer-deployment-7cb5846cfb-w2zt4                   2/2     Running   1          11m
	kubeflow             cache-server-7d5679f47f-tp75j                                2/2     Running   0          11m
	kubeflow             centraldashboard-75466989b6-29hkv                            1/1     Running   1          2d1h
	kubeflow             jupyter-web-app-deployment-b9df56ff-nz8xx                    1/1     Running   1          2d1h
	kubeflow             katib-controller-b7b78dcf-v2pmf                              1/1     Running   1          2d1h
	kubeflow             katib-db-manager-755464ffcf-946nd                            1/1     Running   1          2d1h
	kubeflow             katib-mysql-f6b75dd75-5spgj                                  1/1     Running   1          2d1h
	kubeflow             katib-ui-7b997fd84f-hmn9d                                    1/1     Running   1          2d1h
	kubeflow             kfserving-controller-manager-0                               2/2     Running   2          2d1h
	kubeflow             kubeflow-pipelines-profile-controller-65c8c9dc9c-mktlk       1/1     Running   0          4m22s
	kubeflow             metacontroller-0                                             1/1     Running   0          11m
	kubeflow             metadata-envoy-deployment-5b8555884c-7g4j9                   1/1     Running   0          11m
	kubeflow             metadata-grpc-deployment-844fdd8f45-k5sr7                    2/2     Running   5          11m
	kubeflow             metadata-writer-7b889fb74d-fjzkm                             2/2     Running   2          11m
	kubeflow             minio-6f4c68d54f-tqvgm                                       2/2     Running   0          5m47s
	kubeflow             ml-pipeline-84bc5648fc-nz8rd                                 2/2     Running   4          11m
	kubeflow             ml-pipeline-persistenceagent-69d8f6d499-tcc6s                2/2     Running   1          11m
	kubeflow             ml-pipeline-scheduledworkflow-6cb4797f7f-nq2tx               2/2     Running   0          11m
	kubeflow             ml-pipeline-ui-56cc5c444b-kg7zp                              2/2     Running   0          11m
	kubeflow             ml-pipeline-viewer-crd-67f54547b4-b2gg5                      2/2     Running   1          11m
	kubeflow             ml-pipeline-visualizationserver-7b6ff7bf5f-qsqrs             2/2     Running   0          11m
	kubeflow             mpi-operator-6cd4967df-pwbdn                                 1/1     Running   3          2d1h
	kubeflow             mxnet-operator-65ddbb8bb7-kjh2f                              1/1     Running   3          2d1h
	kubeflow             mysql-79cb69477c-6d7lv                                       2/2     Running   0          11m
	kubeflow             notebook-controller-deployment-7fb67c4d4c-sfgmc              1/1     Running   1          2d1h
	kubeflow             profiles-deployment-6888b86fc8-8v2dv                         2/2     Running   2          2d1h
	kubeflow             pytorch-operator-5ccf6f746d-gt8xd                            2/2     Running   5          2d1h
	kubeflow             tensorboard-controller-controller-manager-85fbc9cb98-rzw4n   3/3     Running   23         2d1h
	kubeflow             tensorboards-web-app-deployment-75d87f8559-xxvvh             1/1     Running   1          2d1h
	kubeflow             tf-job-operator-7c79b5b65f-kffrp                             1/1     Running   16         2d1h
	kubeflow             volumes-web-app-deployment-64db74d95d-z2q2b                  1/1     Running   1          2d1h
	kubeflow             workflow-controller-9f444667d-6cgmf                          2/2     Running   2          4m22s
	kubeflow             xgboost-operator-deployment-7d8df579f5-jhx5g                 2/2     Running   6          2d1h
	local-path-storage   local-path-provisioner-7c6fcb5b5f-8cg9f                      1/1     Running   16         2d1h
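
As a quick sanity check across the whole table, a one-liner filter (plain kubectl/awk, not from the original run) prints only pods whose STATUS deviates from Running:

# Column 4 of "kubectl get pods -A" is STATUS; print any row that deviates.
kubectl get pods -A --no-headers | awk '$4 != "Running"'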

root@master:/home/hqc/Kubeflow/Kubeflow1.3# kubectl -n kubeflow get all
	NAME                                                             READY   STATUS    RESTARTS   AGE
	pod/admission-webhook-deployment-8678d7d5fc-w5llp                1/1     Running   1          2d1h
	pod/cache-deployer-deployment-7cb5846cfb-w2zt4                   2/2     Running   1          38m
	pod/cache-server-7d5679f47f-tp75j                                2/2     Running   0          38m
	pod/centraldashboard-75466989b6-29hkv                            1/1     Running   1          2d1h
	pod/jupyter-web-app-deployment-b9df56ff-nz8xx                    1/1     Running   1          2d1h
	pod/katib-controller-b7b78dcf-v2pmf                              1/1     Running   1          2d1h
	pod/katib-db-manager-755464ffcf-946nd                            1/1     Running   1          2d1h
	pod/katib-mysql-f6b75dd75-5spgj                                  1/1     Running   1          2d1h
	pod/katib-ui-7b997fd84f-hmn9d                                    1/1     Running   1          2d1h
	pod/kfserving-controller-manager-0                               2/2     Running   2          2d1h
	pod/kubeflow-pipelines-profile-controller-65c8c9dc9c-mktlk       1/1     Running   0          31m
	pod/metacontroller-0                                             1/1     Running   0          38m
	pod/metadata-envoy-deployment-5b8555884c-7g4j9                   1/1     Running   0          38m
	pod/metadata-grpc-deployment-844fdd8f45-k5sr7                    2/2     Running   5          38m
	pod/metadata-writer-7b889fb74d-fjzkm                             2/2     Running   2          38m
	pod/minio-6f4c68d54f-tqvgm                                       2/2     Running   0          32m
	pod/ml-pipeline-84bc5648fc-nz8rd                                 2/2     Running   4          38m
	pod/ml-pipeline-persistenceagent-69d8f6d499-tcc6s                2/2     Running   1          38m
	pod/ml-pipeline-scheduledworkflow-6cb4797f7f-nq2tx               2/2     Running   0          38m
	pod/ml-pipeline-ui-56cc5c444b-kg7zp                              2/2     Running   0          38m
	pod/ml-pipeline-viewer-crd-67f54547b4-b2gg5                      2/2     Running   1          38m
	pod/ml-pipeline-visualizationserver-7b6ff7bf5f-qsqrs             2/2     Running   0          38m
	pod/mpi-operator-6cd4967df-pwbdn                                 1/1     Running   3          2d1h
	pod/mxnet-operator-65ddbb8bb7-kjh2f                              1/1     Running   3          2d1h
	pod/mysql-79cb69477c-6d7lv                                       2/2     Running   0          38m
	pod/notebook-controller-deployment-7fb67c4d4c-sfgmc              1/1     Running   1          2d1h
	pod/profiles-deployment-6888b86fc8-8v2dv                         2/2     Running   2          2d1h
	pod/pytorch-operator-5ccf6f746d-gt8xd                            2/2     Running   5          2d1h
	pod/tensorboard-controller-controller-manager-85fbc9cb98-rzw4n   3/3     Running   23         2d1h
	pod/tensorboards-web-app-deployment-75d87f8559-xxvvh             1/1     Running   1          2d1h
	pod/tf-job-operator-7c79b5b65f-kffrp                             1/1     Running   16         2d1h
	pod/volumes-web-app-deployment-64db74d95d-z2q2b                  1/1     Running   1          2d1h
	pod/workflow-controller-9f444667d-6cgmf                          2/2     Running   2          31m
	pod/xgboost-operator-deployment-7d8df579f5-jhx5g                 2/2     Running   6          2d1h
	
	NAME                                                                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
	service/admission-webhook-service                                   ClusterIP   10.107.240.242   <none>        443/TCP             2d1h
	service/cache-server                                                ClusterIP   10.103.205.52    <none>        443/TCP             38m
	service/centraldashboard                                            ClusterIP   10.105.105.223   <none>        80/TCP              2d1h
	service/jupyter-web-app-service                                     ClusterIP   10.96.92.138     <none>        80/TCP              2d1h
	service/katib-controller                                            ClusterIP   10.104.105.114   <none>        443/TCP,8080/TCP    2d1h
	service/katib-db-manager                                            ClusterIP   10.104.242.135   <none>        6789/TCP            2d1h
	service/katib-mysql                                                 ClusterIP   10.104.127.193   <none>        3306/TCP            2d1h
	service/katib-ui                                                    ClusterIP   10.102.31.52     <none>        80/TCP              2d1h
	service/kfserving-controller-manager-metrics-service                ClusterIP   10.107.165.77    <none>        8443/TCP            2d1h
	service/kfserving-controller-manager-service                        ClusterIP   10.97.70.9       <none>        443/TCP             2d1h
	service/kfserving-webhook-server-service                            ClusterIP   10.101.18.48     <none>        443/TCP             2d1h
	service/kubeflow-pipelines-profile-controller                       ClusterIP   10.106.136.54    <none>        80/TCP              38m
	service/metadata-envoy-service                                      ClusterIP   10.103.159.117   <none>        9090/TCP            38m
	service/metadata-grpc-service                                       ClusterIP   10.109.160.91    <none>        8080/TCP            38m
	service/minio-service                                               ClusterIP   10.104.196.141   <none>        9000/TCP            38m
	service/ml-pipeline                                                 ClusterIP   10.105.62.168    <none>        8888/TCP,8887/TCP   38m
	service/ml-pipeline-ui                                              ClusterIP   10.103.205.88    <none>        80/TCP              38m
	service/ml-pipeline-visualizationserver                             ClusterIP   10.109.249.129   <none>        8888/TCP            38m
	service/mysql                                                       ClusterIP   10.103.78.198    <none>        3306/TCP            38m
	service/notebook-controller-service                                 ClusterIP   10.109.155.251   <none>        443/TCP             2d1h
	service/profiles-kfam                                               ClusterIP   10.111.128.33    <none>        8081/TCP            2d1h
	service/pytorch-operator                                            ClusterIP   10.109.153.150   <none>        8443/TCP            2d1h
	service/tensorboard-controller-controller-manager-metrics-service   ClusterIP   10.101.132.68    <none>        8443/TCP            2d1h
	service/tensorboards-web-app-service                                ClusterIP   10.102.18.212    <none>        80/TCP              2d1h
	service/tf-job-operator                                             ClusterIP   10.101.26.17     <none>        8443/TCP            2d1h
	service/volumes-web-app-service                                     ClusterIP   10.106.84.128    <none>        80/TCP              2d1h
	service/workflow-controller-metrics                                 ClusterIP   10.110.236.185   <none>        9090/TCP            38m
	service/xgboost-operator-service                                    ClusterIP   10.109.83.203    <none>        443/TCP             2d1h
	
	NAME                                                        READY   UP-TO-DATE   AVAILABLE   AGE
	deployment.apps/admission-webhook-deployment                1/1     1            1           2d1h
	deployment.apps/cache-deployer-deployment                   1/1     1            1           38m
	deployment.apps/cache-server                                1/1     1            1           38m
	deployment.apps/centraldashboard                            1/1     1            1           2d1h
	deployment.apps/jupyter-web-app-deployment                  1/1     1            1           2d1h
	deployment.apps/katib-controller                            1/1     1            1           2d1h
	deployment.apps/katib-db-manager                            1/1     1            1           2d1h
	deployment.apps/katib-mysql                                 1/1     1            1           2d1h
	deployment.apps/katib-ui                                    1/1     1            1           2d1h
	deployment.apps/kubeflow-pipelines-profile-controller       1/1     1            1           31m
	deployment.apps/metadata-envoy-deployment                   1/1     1            1           38m
	deployment.apps/metadata-grpc-deployment                    1/1     1            1           38m
	deployment.apps/metadata-writer                             1/1     1            1           38m
	deployment.apps/minio                                       1/1     1            1           32m
	deployment.apps/ml-pipeline                                 1/1     1            1           38m
	deployment.apps/ml-pipeline-persistenceagent                1/1     1            1           38m
	deployment.apps/ml-pipeline-scheduledworkflow               1/1     1            1           38m
	deployment.apps/ml-pipeline-ui                              1/1     1            1           38m
	deployment.apps/ml-pipeline-viewer-crd                      1/1     1            1           38m
	deployment.apps/ml-pipeline-visualizationserver             1/1     1            1           38m
	deployment.apps/mpi-operator                                1/1     1            1           2d1h
	deployment.apps/mxnet-operator                              1/1     1            1           2d1h
	deployment.apps/mysql                                       1/1     1            1           38m
	deployment.apps/notebook-controller-deployment              1/1     1            1           2d1h
	deployment.apps/profiles-deployment                         1/1     1            1           2d1h
	deployment.apps/pytorch-operator                            1/1     1            1           2d1h
	deployment.apps/tensorboard-controller-controller-manager   1/1     1            1           2d1h
	deployment.apps/tensorboards-web-app-deployment             1/1     1            1           2d1h
	deployment.apps/tf-job-operator                             1/1     1            1           2d1h
	deployment.apps/volumes-web-app-deployment                  1/1     1            1           2d1h
	deployment.apps/workflow-controller                         1/1     1            1           31m
	deployment.apps/xgboost-operator-deployment                 1/1     1            1           2d1h
	
	NAME                                                                   DESIRED   CURRENT   READY   AGE
	replicaset.apps/admission-webhook-deployment-8678d7d5fc                1         1         1       2d1h
	replicaset.apps/cache-deployer-deployment-7cb5846cfb                   1         1         1       38m
	replicaset.apps/cache-server-7d5679f47f                                1         1         1       38m
	replicaset.apps/centraldashboard-75466989b6                            1         1         1       2d1h
	replicaset.apps/jupyter-web-app-deployment-b9df56ff                    1         1         1       2d1h
	replicaset.apps/katib-controller-b7b78dcf                              1         1         1       2d1h
	replicaset.apps/katib-db-manager-755464ffcf                            1         1         1       2d1h
	replicaset.apps/katib-mysql-f6b75dd75                                  1         1         1       2d1h
	replicaset.apps/katib-ui-7b997fd84f                                    1         1         1       2d1h
	replicaset.apps/kubeflow-pipelines-profile-controller-65c8c9dc9c       1         1         1       31m
	replicaset.apps/metadata-envoy-deployment-5b8555884c                   1         1         1       38m
	replicaset.apps/metadata-grpc-deployment-844fdd8f45                    1         1         1       38m
	replicaset.apps/metadata-writer-7b889fb74d                             1         1         1       38m
	replicaset.apps/minio-6f4c68d54f                                       1         1         1       32m
	replicaset.apps/ml-pipeline-84bc5648fc                                 1         1         1       38m
	replicaset.apps/ml-pipeline-persistenceagent-69d8f6d499                1         1         1       38m
	replicaset.apps/ml-pipeline-scheduledworkflow-6cb4797f7f               1         1         1       38m
	replicaset.apps/ml-pipeline-ui-56cc5c444b                              1         1         1       38m
	replicaset.apps/ml-pipeline-viewer-crd-67f54547b4                      1         1         1       38m
	replicaset.apps/ml-pipeline-visualizationserver-7b6ff7bf5f             1         1         1       38m
	replicaset.apps/mpi-operator-6cd4967df                                 1         1         1       2d1h
	replicaset.apps/mxnet-operator-65ddbb8bb7                              1         1         1       2d1h
	replicaset.apps/mysql-79cb69477c                                       1         1         1       38m
	replicaset.apps/notebook-controller-deployment-7fb67c4d4c              1         1         1       2d1h
	replicaset.apps/profiles-deployment-6888b86fc8                         1         1         1       2d1h
	replicaset.apps/pytorch-operator-5ccf6f746d                            1         1         1       2d1h
	replicaset.apps/tensorboard-controller-controller-manager-85fbc9cb98   1         1         1       2d1h
	replicaset.apps/tensorboards-web-app-deployment-75d87f8559             1         1         1       2d1h
	replicaset.apps/tf-job-operator-7c79b5b65f                             1         1         1       2d1h
	replicaset.apps/volumes-web-app-deployment-64db74d95d                  1         1         1       2d1h
	replicaset.apps/workflow-controller-9f444667d                          1         1         1       31m
	replicaset.apps/xgboost-operator-deployment-7d8df579f5                 1         1         1       2d1h
	
	NAME                                            READY   AGE
	statefulset.apps/kfserving-controller-manager   1/1     2d1h
	statefulset.apps/metacontroller                 1/1     38m

Everything is Ready and running normally!

2.5 Successfully deployed!!!

[screenshot]
