For the complete table of contents of the Kubernetes实录 series, see: Kubernetes实录-目录
Related links:
Metrics Server is an aggregator of resource usage data for a Kubernetes cluster. Several components depend on the resource metrics API (Metrics API), such as kubectl top and the Horizontal Pod Autoscaler (HPA); without it they cannot work. Since Kubernetes 1.8, resource usage metrics have been exposed through the Metrics API. Since Kubernetes 1.11, Heapster is deprecated and no longer used; metrics-server replaces it.
- Through the Metrics API you can get the current resource usage of a given node or pod (historical data is not available)
- The Metrics API path is /apis/metrics.k8s.io/
- Using the Metrics API requires a successfully deployed metrics-server in the cluster
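The API paths above can be sketched with a tiny helper that builds the raw request paths (the helper name is made up; the path patterns under /apis/metrics.k8s.io/v1beta1 are the standard ones):

```shell
# Hypothetical helper that builds Metrics API request paths.
metrics_path() {  # usage: metrics_path nodes|pods [namespace]
  local base="/apis/metrics.k8s.io/v1beta1"
  if [ -n "${2:-}" ]; then
    # namespaced resources (pods) live under /namespaces/<ns>/
    echo "$base/namespaces/$2/$1"
  else
    # cluster-scoped resources (nodes)
    echo "$base/$1"
  fi
}

metrics_path nodes               # → /apis/metrics.k8s.io/v1beta1/nodes
metrics_path pods kube-system    # → /apis/metrics.k8s.io/v1beta1/namespaces/kube-system/pods
```

Once metrics-server is deployed (section 2), such a path can be queried with kubectl get --raw "$(metrics_path nodes)".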
1. Kubernetes Environment
1.1 Kubernetes cluster information
Hostname | IP address | OS | Role | Software version | Notes |
---|---|---|---|---|---|
ejucsmaster-shqs-1 | 10.99.12.201 | CentOS 7.8 | proxy, master | ||
ejucsmaster-shqs-2 | 10.99.12.202 | CentOS 7.8 | proxy, master | ||
ejucsmaster-shqs-3 | 10.99.12.203 | CentOS 7.8 | proxy, master | ||
ejucsnode-shqs-1 | 10.99.12.204 | CentOS 7.8 | worker | ||
ejucsnode-shqs-2 | 10.99.12.205 | CentOS 7.8 | worker | ||
ejucsnode-shqs-3 | 10.99.12.206 | CentOS 7.8 | worker |
For how the cluster was deployed, see: Configuring an HA-mode Kubernetes cluster with kubeadm
2. Deploying metrics-server
2.1 Download the metrics-server deployment file
The latest version at the time of writing is 0.3.7; download the manifest from GitHub:
mkdir -p kubernetes/05_metrics-server
cd kubernetes/05_metrics-server
wget https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.3.7/components.yaml -O metrics-server.yaml
2.2 Modify the deployment YAML to fix cluster-specific issues
- Issue 1: by default metrics-server reaches each node's kubelet on port 10250 by hostname, but CoreDNS (10.96.0.10:53) has no records for node hostnames and cannot resolve them. Add the startup argument --kubelet-preferred-address-types=InternalIP so that the node IP address is used directly.
- Issue 2: kubelet's port 10250 speaks HTTPS, and the connection verifies the TLS certificate. Add the startup argument --kubelet-insecure-tls to skip verifying the kubelet's serving certificate.
- Issue 3: the image in the YAML, k8s.gcr.io/metrics-server/metrics-server:v0.3.7, is not reachable without a proxy. Other mirror registries generally only carry 0.3.6, and the new 0.3.7 is hard to find, so I synced an unmodified copy from k8s.gcr.io myself. Either pull it and retag it, or change the image field in the YAML.

The args, fixed for the three issues above, are as follows; nothing else was changed:
args:
  - --cert-dir=/tmp
  - --secure-port=4443
  - --metric-resolution=30s
  - --kubelet-insecure-tls
  - --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP
  - --logtostderr
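Rather than editing the manifest by hand, the two fix-up flags can also be spliced in mechanically. A sketch using awk, run here against a minimal stand-in for the Deployment's args section (file paths are examples; the real components.yaml has the same structure and indentation):

```shell
# Minimal stand-in for the args section of the downloaded components.yaml.
cat > /tmp/metrics-server.yaml <<'EOF'
        args:
          - --cert-dir=/tmp
          - --secure-port=4443
EOF

# Insert the two fix-up flags right after the "args:" line, keeping the indent.
awk '/^[[:space:]]*args:$/ {
       print
       print "          - --kubelet-insecure-tls"
       print "          - --kubelet-preferred-address-types=InternalIP"
       next
     } { print }' /tmp/metrics-server.yaml > /tmp/metrics-server.patched.yaml

grep -c 'kubelet' /tmp/metrics-server.patched.yaml   # → 2
```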
The complete YAML file follows (I reordered the resources; apart from the args above, nothing was changed):
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:aggregated-metrics-reader
  labels:
    rbac.authorization.k8s.io/aggregate-to-view: "true"
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
rules:
  - apiGroups: ["metrics.k8s.io"]
    resources: ["pods", "nodes"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:metrics-server
rules:
  - apiGroups:
      - ""
    resources:
      - pods
      - nodes
      - nodes/stats
      - namespaces
      - configmaps
    verbs:
      - get
      - list
      - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: metrics-server:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
  - kind: ServiceAccount
    name: metrics-server
    namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:metrics-server
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
  - kind: ServiceAccount
    name: metrics-server
    namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: metrics-server-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
  - kind: ServiceAccount
    name: metrics-server
    namespace: kube-system
---
apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
spec:
  service:
    name: metrics-server
    namespace: kube-system
  group: metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
  labels:
    k8s-app: metrics-server
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  template:
    metadata:
      name: metrics-server
      labels:
        k8s-app: metrics-server
    spec:
      serviceAccountName: metrics-server
      volumes:
        # mount in tmp so we can safely use from-scratch images and/or read-only containers
        - name: tmp-dir
          emptyDir: {}
      hostNetwork: true
      containers:
        - name: metrics-server
          # image: k8s.gcr.io/metrics-server/metrics-server:v0.3.7
          image: oyymmw/metrics-server:0.3.7
          imagePullPolicy: IfNotPresent
          args:
            - --cert-dir=/tmp
            - --secure-port=4443
            - --metric-resolution=30s
            - --kubelet-insecure-tls
            - --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP
            - --logtostderr
          ports:
            - name: main-port
              containerPort: 4443
              protocol: TCP
          securityContext:
            readOnlyRootFilesystem: true
            runAsNonRoot: true
            runAsUser: 1000
          volumeMounts:
            - name: tmp-dir
              mountPath: /tmp
      nodeSelector:
        kubernetes.io/os: linux
        kubernetes.io/arch: "amd64"
---
apiVersion: v1
kind: Service
metadata:
  name: metrics-server
  namespace: kube-system
  labels:
    kubernetes.io/name: "Metrics-server"
    kubernetes.io/cluster-service: "true"
spec:
  selector:
    k8s-app: metrics-server
  ports:
    - port: 443
      protocol: TCP
      targetPort: main-port
Note: with metrics-server on Kubernetes 1.19+, you need to add hostNetwork: true at the same level as containers (already included in the YAML above).
2.3 Deploy and start metrics-server
# kubectl apply -f kubernetes/05_metrics-server/metrics-server.yaml
serviceaccount/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
deployment.apps/metrics-server created
service/metrics-server created
# kubectl get pods -n kube-system -o wide
NAME                             READY   STATUS    RESTARTS   AGE   IP               NODE
metrics-server-9448b75bb-qrrsj   1/1     Running   0          45m   192.168.187.69   ejucsnode-shqs-3
2.4 Check the API resources
# kubectl api-versions
...
metrics.k8s.io/v1beta1    # this entry is new
3. Verification
3.1 Node resource usage (CPU, memory)
# kubectl top nodes
NAME                 CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
ejucsmaster-shqs-1   462m         1%     3731Mi          2%
ejucsmaster-shqs-2   374m         1%     3600Mi          2%
ejucsmaster-shqs-3   284m         1%     3545Mi          2%
ejucsnode-shqs-1     181m         0%     2905Mi          2%
ejucsnode-shqs-2     242m         1%     3063Mi          2%
ejucsnode-shqs-3     199m         0%     2102Mi          1%
3.2 Pod resource usage (and querying the Metrics API directly)
# kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"
{"kind":"NodeMetricsList","apiVersion":"metrics.k8s.io/v1beta1","metadata":{"selfLink":"/apis/metrics.k8s.io/v1beta1/nodes"},"items":[{"metadata":{"name":"ejucsnode-shqs-3","selfLink":"/apis/metrics.k8s.io/v1beta1/nodes/ejucsnode-shqs-3","creationTimestamp":"2020-07-28T03:19:55Z"},"timestamp":"2020-07-28T03:19:50Z","window":"30s","usage":{"cpu":"180595162n","memory":"2152768Ki"}},{"metadata":{"name":"ejucsmaster-shqs-1","selfLink":"/apis/metrics.k8s.io/v1beta1/nodes/ejucsmaster-shqs-1","creationTimestamp":"2020-07-28T03:19:55Z"},"timestamp":"2020-07-28T03:19:42Z","window":"30s","usage":{"cpu":"376722789n","memory":"3820568Ki"}},{"metadata":{"name":"ejucsmaster-shqs-2","selfLink":"/apis/metrics.k8s.io/v1beta1/nodes/ejucsmaster-shqs-2","creationTimestamp":"2020-07-28T03:19:55Z"},"timestamp":"2020-07-28T03:19:46Z","window":"30s","usage":{"cpu":"329415091n","memory":"3687180Ki"}},{"metadata":{"name":"ejucsmaster-shqs-3","selfLink":"/apis/metrics.k8s.io/v1beta1/nodes/ejucsmaster-shqs-3","creationTimestamp":"2020-07-28T03:19:55Z"},"timestamp":"2020-07-28T03:19:45Z","window":"30s","usage":{"cpu":"310631602n","memory":"3631768Ki"}},{"metadata":{"name":"ejucsnode-shqs-1","selfLink":"/apis/metrics.k8s.io/v1beta1/nodes/ejucsnode-shqs-1","creationTimestamp":"2020-07-28T03:19:55Z"},"timestamp":"2020-07-28T03:19:48Z","window":"30s","usage":{"cpu":"204152407n","memory":"2984168Ki"}},{"metadata":{"name":"ejucsnode-shqs-2","selfLink":"/apis/metrics.k8s.io/v1beta1/nodes/ejucsnode-shqs-2","creationTimestamp":"2020-07-28T03:19:55Z"},"timestamp":"2020-07-28T03:19:45Z","window":"30s","usage":{"cpu":"193832881n","memory":"3142804Ki"}}]}
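In the raw output, CPU usage is reported in nanocores (the n suffix) and memory in kibibytes (the Ki suffix). A quick sketch of converting them to the millicore/Mi units that kubectl top prints (helper names are made up; 1 millicore = 1,000,000 nanocores, 1 Mi = 1024 Ki):

```shell
# Convert Metrics API resource quantities to kubectl-top-style units.
nano_to_milli() { echo "$(( ${1%n} / 1000000 ))m"; }   # strip "n", nanocores → millicores
ki_to_mi()      { echo "$(( ${1%Ki} / 1024 ))Mi"; }    # strip "Ki", Ki → Mi

nano_to_milli 180595162n   # ejucsnode-shqs-3 CPU from the JSON above → 180m
ki_to_mi 2152768Ki         # ejucsnode-shqs-3 memory from the JSON above → 2102Mi
```

The memory value matches the 2102Mi shown for ejucsnode-shqs-3 by kubectl top nodes; the CPU differs slightly only because the samples were taken at different moments.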
# kubectl -n kube-system top pods
NAME                                         CPU(cores)   MEMORY(bytes)
calico-kube-controllers-578894d4cd-xclh6     5m           34Mi
calico-node-2hlmm                            36m          83Mi
calico-node-2swcp                            42m          85Mi
calico-node-8z74g                            42m          87Mi
calico-node-hdsjc                            40m          83Mi
calico-node-nqkxn                            46m          85Mi
calico-node-tvzhf                            41m          83Mi
coredns-66bff467f8-4hh52                     3m           18Mi
coredns-66bff467f8-xn6c2                     3m           19Mi
etcd-ejucsmaster-shqs-1                      57m          355Mi
etcd-ejucsmaster-shqs-2                      37m          429Mi
etcd-ejucsmaster-shqs-3                      45m          440Mi
glusterfs-public-fgl7c                       5m           43Mi
glusterfs-public-fxljn                       7m           44Mi
glusterfs-public-qk7nk                       6m           43Mi
glusterfs-system-lhwgz                       5m           65Mi
glusterfs-system-nrgqz                       4m           66Mi
glusterfs-system-pngsc                       6m           71Mi
heketi-7d8bd8cd86-xphrz                      1m           51Mi
kube-apiserver-ejucsmaster-shqs-1            64m          457Mi
kube-apiserver-ejucsmaster-shqs-2            44m          375Mi
kube-apiserver-ejucsmaster-shqs-3            41m          392Mi
kube-controller-manager-ejucsmaster-shqs-1   3m           24Mi
kube-controller-manager-ejucsmaster-shqs-2   25m          61Mi
kube-controller-manager-ejucsmaster-shqs-3   3m           24Mi
kube-proxy-8t2n7                             1m           20Mi
kube-proxy-cgjs5                             1m           22Mi
kube-proxy-d4bh5                             1m           22Mi
kube-proxy-mvt49                             1m           21Mi
kube-proxy-nz49z                             1m           19Mi
kube-proxy-p8q7m                             1m           20Mi
kube-scheduler-ejucsmaster-shqs-1            4m           26Mi
kube-scheduler-ejucsmaster-shqs-2            6m           28Mi
kube-scheduler-ejucsmaster-shqs-3            4m           24Mi
metrics-server-9448b75bb-qrrsj               1m           22Mi
traefik-ingress-controller-5fm5h             1m           30Mi
traefik-ingress-controller-9dpwk             2m           26Mi
traefik-ingress-controller-pblx8             1m           27Mi
3.3 Resource usage of a specific pod
# kubectl -n kube-system top pods metrics-server-9448b75bb-qrrsj
NAME                             CPU(cores)   MEMORY(bytes)
metrics-server-9448b75bb-qrrsj   2m           22Mi
4. Troubleshooting
If problems occur while fetching resource usage data, check the metrics-server logs. For example:

kubectl top nodes
error: metrics not available yet    # the setup has not taken effect yet

# For errors like the one above, check the logs:
kubectl logs -f metrics-server-9448b75bb-qrrsj -c metrics-server -n kube-system
I0728 03:23:49.720467 1 serving.go:312] Generated self-signed cert (/tmp/apiserver.crt, /tmp/apiserver.key)
I0728 03:23:50.463544 1 secure_serving.go:116] Serving securely on [::]:4443
While deploying metrics-server I did hit problems, such as node hostnames failing to resolve and certificate errors. The log output was not saved; for the fixes, see section 2.2 above.
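Since the original logs were not saved, here is an offline triage sketch. The error lines are illustrative stand-ins (not verbatim metrics-server output; exact wording varies by version) for the two failure modes from section 2.2, and grep spots their signatures:

```shell
# Stand-in log lines resembling the two failure modes (illustrative only).
cat > /tmp/metrics-server.log <<'EOF'
unable to fetch metrics from node ejucsnode-shqs-1: lookup ejucsnode-shqs-1: no such host
unable to fetch metrics from node ejucsnode-shqs-2: x509: certificate signed by unknown authority
EOF

# Hostname resolution failure → fix with --kubelet-preferred-address-types=InternalIP
grep -c 'no such host' /tmp/metrics-server.log   # → 1
# TLS verification failure → fix with --kubelet-insecure-tls
grep -c 'x509' /tmp/metrics-server.log           # → 1
```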