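My Kubernetes version is 1.17.0. On a locally deployed cluster, metrics-server could not collect any metrics, and the HPA showed <unknown>.
Running kubectl logs metrics-server-dc6fb55f4-z88lm -n kube-system shows errors like the following: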
E0225 02:30:52.433523 1 manager.go:111] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:k8s-n-2: unable to fetch metrics from Kubelet k8s-n-2 (k8s-n-2): Get https://k8s-n-2:10250/stats/summary?only_cpu_and_memory=true: dial tcp: lookup k8s-n-2 on 10.96.0.10:53: no such host, unable to fully scrape metrics from source kubelet_summary:k8s-m: unable to fetch metrics from Kubelet k8s-m (k8s-m): Get https://k8s-m:10250/stats/summary?only_cpu_and_memory=true: dial tcp: lookup k8s-m on 10.96.0.10:53: no such host, unable to fully scrape metrics from source kubelet_summary:k8s-n-1: unable to fetch metrics from Kubelet k8s-n-1 (k8s-n-1): Get https://k8s-n-1:10250/stats/summary?only_cpu_and_memory=true: dial tcp: lookup k8s-n-1 on 10.96.0.10:53: no such host]
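The cause is that node hostnames such as k8s-n-2 have no DNS records, so CoreDNS cannot resolve them. This normally does not happen on cloud platforms, because cloud hosts are generally registered automatically in the provider's internal DNS servers, and the provider's own DNS can resolve each hostname to its IP. The problem typically shows up in local deployments.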
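To see at a glance which node names fail to resolve, the log output can be filtered for the "no such host" errors. This is a minimal sketch; the function name unresolved_hosts is hypothetical, and it assumes the error text has the same shape as the log line above.

```shell
# Hypothetical helper: read metrics-server log text on stdin and print each
# hostname that appears in a "lookup <host> on <dns-server>: no such host"
# error, one per line, without duplicates.
unresolved_hosts() {
  grep -oE 'lookup [^ ]+ on [^ ]+: no such host' | awk '{print $2}' | sort -u
}

# Usage (pod name taken from the kubectl logs command above):
# kubectl logs metrics-server-dc6fb55f4-z88lm -n kube-system | unresolved_hosts
```

Every name it prints needs either a DNS record or the IP-based workaround described below.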
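There are two ways to fix this.
Option 1: install a DNS server such as dnsmasq to resolve the hostnames yourself, and configure the Master host to use that DNS server (CoreDNS automatically inherits the DNS configuration of the host it runs on). This is probably the more proper approach.
Option 2: modify the metrics-server Deployment and add the following command section: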
        command:
        - /metrics-server
        - --metric-resolution=30s
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP
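With these changes, metrics-server requests metrics from each kubelet by IP address instead of by hostname. The --kubelet-insecure-tls flag is needed because, once the IP is used, the original hostname-based certificate no longer matches (you would get an x509 certificate error), so certificate verification has to be skipped.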
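Since InternalIP is listed first in --kubelet-preferred-address-types, it can help to check which InternalIP each node actually reports. A minimal sketch, assuming kubectl is already configured for the cluster; the function name node_internal_ips is hypothetical.

```shell
# Hypothetical helper: print "<node name><TAB><InternalIP>" for every node,
# i.e. the address metrics-server should try first with the flags above.
node_internal_ips() {
  kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.addresses[?(@.type=="InternalIP")].address}{"\n"}{end}'
}

# Usage:
# node_internal_ips
```

Each listed IP must be reachable on port 10250 from the metrics-server pod for scraping to succeed.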
The complete modified metrics-server-0.3.6\deploy\1.8+\metrics-server-deployment.yaml is as follows:
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-server
  namespace: kube-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
  labels:
    k8s-app: metrics-server
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  template:
    metadata:
      name: metrics-server
      labels:
        k8s-app: metrics-server
    spec:
      serviceAccountName: metrics-server
      volumes:
      # mount in tmp so we can safely use from-scratch images and/or read-only containers
      - name: tmp-dir
        emptyDir: {}
      containers:
      - name: metrics-server
        image: k8s.gcr.io/metrics-server-amd64:v0.3.6
        imagePullPolicy: Always
        volumeMounts:
        - name: tmp-dir
          mountPath: /tmp
        command:
        - /metrics-server
        - --metric-resolution=30s
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP
Finally, re-apply this YAML and you are done.
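The apply step and a quick check might look like the sketch below. It assumes kubectl is configured for the cluster and that the edited file is passed as the first argument; the function name apply_and_check is hypothetical.

```shell
# Hypothetical helper: apply the edited manifest, wait for the new
# metrics-server pod to roll out, then query node metrics.
apply_and_check() {
  kubectl apply -f "$1" &&
  kubectl -n kube-system rollout status deployment/metrics-server &&
  kubectl top nodes
}

# Usage:
# apply_and_check metrics-server-deployment.yaml
```

The first kubectl top nodes call may still fail right after the rollout, because metrics-server needs to complete at least one scrape; retrying after a short wait and then checking kubectl get hpa should show real values instead of <unknown>.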