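Metrics collection commands unavailable: kubectl top nodes does not work
Problem description
After the host was rebooted, all metrics-related operations stopped working. The logs are as follows:
# metrics-server logs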
I1113 06:30:38.255110 1 serving.go:312] Generated self-signed cert (/tmp/apiserver.crt, /tmp/apiserver.key)
I1113 06:30:41.082525 1 secure_serving.go:116] Serving securely on [::]:4443
E1113 06:30:55.501456 1 reststorage.go:160] unable to fetch pod metrics for pod istio-system/jaeger-collector-76bf54b467-z8smv: no metrics known for pod
E1113 06:30:55.506335 1 reststorage.go:160] unable to fetch pod metrics for pod istio-system/jaeger-collector-76bf54b467-z8smv: no metrics known for pod
E1113 06:31:09.079549 1 reststorage.go:160] unable to fetch pod metrics for pod istio-system/jaeger-collector-76bf54b467-z8smv: no metrics known for pod
E1113 06:31:09.083549 1 reststorage.go:160] unable to fetch pod metrics for pod istio-system/jaeger-collector-76bf54b467-z8smv: no metrics known for pod
E1113 06:31:25.568509 1 reststorage.go:160] unable to fetch pod metrics for pod istio-system/jaeger-collector-76bf54b467-z8smv: no metrics known for pod
E1113 06:31:25.593969 1 reststorage.go:160] unable to fetch pod metrics for pod istio-system/jaeger-collector-76bf54b467-z8smv: no metrics known for pod
E1113 06:31:38.873501 1 reststorage.go:160] unable to fetch pod metrics for pod istio-system/jaeger-collector-76bf54b467-z8smv: no metrics known for pod
E1113 06:31:38.877544 1 reststorage.go:160] unable to fetch pod metrics for pod istio-system/jaeger-collector-76bf54b467-z8smv: no metrics known for pod
E1113 06:39:41.179891 1 manager.go:111] unable to fully collect metrics: unable to fully scrape metrics from source kubelet_summary:node1: unable to get a valid timestamp for metric point for container "wait-mysql" in pod kubesphere-alerting-system/notification-deployment-67cd9b7985-xzll8 on node "61.155.5.52", discarding data: no non-zero timestamp on either CPU or memory
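Problem analysis
After going through the various logs and the related services, I found that the API simply could not be reached. I then wondered whether it was a network problem, and it really was. In my case the cause was Calico.
The actual problem: for reasons I still do not know, the host had an extra network interface. When Calico initialized, it picked up this virtual interface instead of the host's real NIC, which is why the node could not communicate with pods on other hosts!
A huge pitfall!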
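A quick way to confirm this kind of mismatch is to compare the address Calico registered for the node with the address on the NIC you expect it to use. The following is only a minimal sketch: the function names are made up for illustration, and it assumes Calico uses the Kubernetes API datastore (so that calico-node writes the projectcalico.org/IPv4Address annotation on the node) and that it is run on the node itself.

```shell
#!/bin/sh
# Hypothetical helpers; none of these names come from Calico or kubectl.

# Strip the prefix length from a CIDR string, e.g. 10.0.0.5/24 -> 10.0.0.5
strip_cidr() {
    printf '%s\n' "${1%%/*}"
}

# Address Calico recorded for a node, read from the node annotation
# projectcalico.org/IPv4Address (assumes the Kubernetes API datastore).
calico_node_ip() {
    kubectl get node "$1" \
        -o jsonpath='{.metadata.annotations.projectcalico\.org/IPv4Address}'
}

# First IPv4 address (with prefix length) configured on a local interface.
iface_ip() {
    ip -4 -o addr show dev "$1" | awk '{print $4; exit}'
}

# Report whether Calico's address for NODE matches local interface IFACE.
check_calico_iface() {
    node="$1"; iface="$2"
    calico="$(strip_cidr "$(calico_node_ip "$node")")"
    local_ip="$(strip_cidr "$(iface_ip "$iface")")"
    if [ "$calico" = "$local_ip" ]; then
        echo "OK: Calico uses $calico on $iface"
    else
        echo "MISMATCH: Calico registered $calico, but $iface has $local_ip"
        return 1
    fi
}
```

For example, on node1 you could run check_calico_iface node1 em1. A MISMATCH line suggests Calico autodetected a different interface than the one you intended.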
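Solution
Once the cluster network was working again, my bug was resolved.
As an aside, in my case the network was unavailable because Calico had not selected my actual NIC.
The fix was to specify the NIC that Calico should use, em1.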
# Specify the NIC in the Calico configuration file; em1 is used here
# Add this under env (of the calico-node container)
- name: IP_AUTODETECTION_METHOD
  value: "interface=em1"
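If you would rather not edit the manifest by hand, the same variable can be set with kubectl set env. This is a sketch under assumptions, not necessarily how the change was made here: it assumes a manifest-based install where the DaemonSet is named calico-node and lives in the kube-system namespace, and the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# Build the IP_AUTODETECTION_METHOD value from an interface name.
autodetect_value() {
    printf 'interface=%s\n' "$1"
}

# Set the variable on the calico-node DaemonSet and wait for the rollout.
# Assumes the DaemonSet is kube-system/calico-node; adjust for your install.
pin_calico_iface() {
    iface="$1"
    kubectl -n kube-system set env daemonset/calico-node \
        "IP_AUTODETECTION_METHOD=$(autodetect_value "$iface")" &&
    kubectl -n kube-system rollout status daemonset/calico-node
}
```

After running pin_calico_iface em1 and waiting for the calico-node pods to restart, kubectl top nodes should return data again once metrics-server has completed a scrape.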
That's all!