Kubernetes Cloud-Native Monitoring: Cluster Resource Monitoring with kube-state-metrics
Overview
Kubernetes cloud-native cluster monitoring mainly involves three categories of metrics: node (physical machine) metrics, pod & container resource metrics, and Kubernetes cluster resource metrics. Fairly mature solutions exist for all three, as shown in the figure below.
In the previous article we covered how to monitor cAdvisor container performance metrics; in this one we look at cluster resource monitoring.
While a Kubernetes cluster is running, we want visibility into the state of its services, and that is what kube-state-metrics provides: it focuses on the state of cluster resource objects such as Deployments, Services, and Pods.
Kube State Metrics is a simple service that generates metrics about the state of the various resources by listening to the Kubernetes API server.
cAdvisor comes integrated into Kubernetes by default; Kube State Metrics does not. So to get complete monitoring data for the cluster, we have to deploy the Kube State Metrics component into Kubernetes ourselves, which exposes the cluster's resource metrics so the different resources can be monitored.
Environment
My Kubernetes cluster setup is shown below; all subsequent demonstrations run on this cluster:
Installing kube-state-metrics
1. Choose a kube-state-metrics version compatible with your Kubernetes version (see the compatibility matrix at https://github.com/kubernetes/kube-state-metrics):
[root@master kube-state-metrics]# kubectl version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.5", GitCommit:"e338cf2c6d297aa603b50ad3a301f761b4173aa6", GitTreeState:"clean", BuildDate:"2020-12-09T11:18:51Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.5", GitCommit:"e338cf2c6d297aa603b50ad3a301f761b4173aa6", GitTreeState:"clean", BuildDate:"2020-12-09T11:10:32Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"linux/amd64"}
My k8s version is v1.19.5, so I chose kube-state-metrics v2.1.1.
2. On the master node, create a kube-state-metrics directory and copy the files from the examples/standard directory inside the unpacked kube-state-metrics-2.1.1.zip into it.
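A minimal sketch of fetching the release and preparing the directory (assuming GitHub is reachable from the master node and the archive unpacks into kube-state-metrics-2.1.1/):
[root@master ~]# wget https://github.com/kubernetes/kube-state-metrics/archive/refs/tags/v2.1.1.zip -O kube-state-metrics-2.1.1.zip
[root@master ~]# unzip kube-state-metrics-2.1.1.zip
[root@master ~]# mkdir kube-state-metrics
[root@master ~]# cp kube-state-metrics-2.1.1/examples/standard/*.yaml kube-state-metrics/
[root@master ~]# cd kube-state-metrics
Afterwards the directory contains the five standard manifests: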
[root@master kube-state-metrics]# ls -lah
total 20K
drwxr-xr-x. 2 root root 135 Jul 21 13:40 .
drwxr-xr-x. 5 root root 74 Jul 21 13:39 ..
-rw-r--r--. 1 root root 376 Jul 29 2021 cluster-role-binding.yaml
-rw-r--r--. 1 root root 1.6K Jul 29 2021 cluster-role.yaml
-rw-r--r--. 1 root root 1.2K Jul 29 2021 deployment.yaml
-rw-r--r--. 1 root root 192 Jul 29 2021 service-account.yaml
-rw-r--r--. 1 root root 405 Jul 29 2021 service.yaml
Since the Kube State Metrics component has to connect to kube-apiserver and call its APIs to fetch Kubernetes cluster data, it needs certain permissions for these operations to succeed. Kubernetes manages permissions with RBAC by default, so we need to create the corresponding RBAC resources for the component to use.
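As an abridged sketch of what those manifests grant (the shipped cluster-role.yaml lists many more resources), kube-state-metrics only needs read-only list/watch access:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-state-metrics
rules:
- apiGroups: [""]
  resources: ["configmaps", "secrets", "nodes", "pods", "services"]   ## trimmed for brevity
  verbs: ["list", "watch"]
- apiGroups: ["apps"]
  resources: ["daemonsets", "deployments", "replicasets", "statefulsets"]
  verbs: ["list", "watch"]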
Note the two ports exposed in deployment.yaml and what each is for:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-state-metrics
  labels:
    k8s-app: kube-state-metrics
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: kube-state-metrics
  template:
    metadata:
      labels:
        k8s-app: kube-state-metrics
    spec:
      serviceAccountName: kube-state-metrics
      containers:
      - name: kube-state-metrics
        image: k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.1.1
        securityContext:
          runAsUser: 65534
        ports:
        - name: http-metrics   ## port exposing the Kubernetes resource metrics
          containerPort: 8080
        - name: telemetry      ## port exposing kube-state-metrics' own telemetry metrics
          containerPort: 8081
3. Apply the manifests:
[root@master kube-state-metrics]# kubectl apply -f ./
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
deployment.apps/kube-state-metrics created
serviceaccount/kube-state-metrics created
service/kube-state-metrics created
4. Check the results:
# check that everything is running
[root@master kube-state-metrics]# kubectl get pod -n kube-system -owide |grep kube-state-metrics
kube-state-metrics-5f84848c58-v7v9z 1/1 Running 0 50m 10.100.166.135 node1 <none> <none>
[root@master kube-state-metrics]# kubectl get svc -n kube-system |grep kube-state-metrics
kube-state-metrics ClusterIP None <none> 8080/TCP,8081/TCP 50m
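Note that CLUSTER-IP is None, i.e. the Service is headless and DNS resolves straight to the pod IPs. To see which pod endpoints back it, the Endpoints object can be inspected (illustrative command; output omitted):
[root@master kube-state-metrics]# kubectl get endpoints kube-state-metrics -n kube-system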
Pay special attention to this image: because its prefix is k8s.gcr.io, it cannot be pulled in some network environments. The workaround is to pull bitnami/kube-state-metrics:2.1.1 instead and use docker tag to rename it to k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.1.1. Note the retag only takes effect on the node where it is performed, so on a multi-node cluster repeat it on every node that might schedule the pod (node1 here):
[root@node1 ~]# docker pull bitnami/kube-state-metrics:2.1.1
[root@node1 ~]# docker tag f0db7c5a6de8 k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.1.1
[root@node1 ~]# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
bitnami/kube-state-metrics 2.1.1 f0db7c5a6de8 Less than a second ago 121MB
k8s.gcr.io/kube-state-metrics/kube-state-metrics v2.1.1 f0db7c5a6de8 Less than a second ago 121MB
registry.aliyuncs.com/k8sxio/kube-proxy v1.19.5 6e5666d85a31 7 months ago 118MB
5. Verify that metrics are being collected by requesting the kube-state-metrics pod IP on port 8080:
[root@master kube-state-metrics]# curl 10.100.166.135:8080/metrics
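If the component is healthy, the curl returns metrics in the Prometheus text exposition format; an illustrative excerpt (the actual resource names will differ per cluster):
# HELP kube_configmap_info Information about configmap.
# TYPE kube_configmap_info gauge
kube_configmap_info{namespace="kube-system",configmap="coredns"} 1
The telemetry port 8081 can be probed the same way for kube-state-metrics' own metrics:
[root@master kube-state-metrics]# curl 10.100.166.135:8081/metrics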
Integrating with Prometheus
1. The Service created for kube-state-metrics is of type ClusterIP and can only be reached from inside the cluster, so Prometheus should be deployed on a cluster node, otherwise the pod IPs may be unreachable:
- job_name: 'kube-state-metrics'
  metrics_path: /metrics
  kubernetes_sd_configs:
  - role: pod
    api_server: https://apiserver.simon:6443
    # credentials used by service discovery to talk to the API server
    bearer_token_file: /tools/token.k8s
    tls_config:
      insecure_skip_verify: true
  # credentials used for the scrape requests themselves
  bearer_token_file: /tools/token.k8s
  tls_config:
    insecure_skip_verify: true
  relabel_configs:
  # record the pod name in a "pod" label
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: pod
  # map all pod labels onto the metric labels
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  # scrape each discovered pod IP on port 8080 (http-metrics)
  - source_labels: [__meta_kubernetes_pod_ip]
    regex: (.+)
    target_label: __address__
    replacement: ${1}:8080
  # keep only the kube-state-metrics container
  - source_labels: ["__meta_kubernetes_pod_container_name"]
    regex: "^kube-state-metrics.*"
    action: keep
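Before reloading Prometheus, the modified configuration can be sanity-checked with promtool, which ships alongside the Prometheus binary (paths assumed):
[root@master prometheus]# ./promtool check config prometheus.yml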
The list of scraped metrics (a couple of example PromQL queries follow the list):
kube_limitrange{}
kube_replicaset_created{}
kube_persistentvolumeclaim_status_phase{}
kube_pod_container_status_terminated{}
kube_secret_info{}
kube_service_info{}
kube_daemonset_status_observed_generation{}
kube_node_role{}
kube_persistentvolume_claim_ref{}
kube_pod_start_time{}
kube_configmap_info{}
kube_daemonset_created{}
kube_endpoint_address_not_ready{}
kube_node_created{}
kube_pod_init_container_status_waiting{}
kube_secret_metadata_resource_version{}
kube_pod_container_resource_requests{}
kube_pod_status_ready{}
kube_secret_created{}
kube_persistentvolume_capacity_bytes{}
kube_persistentvolumeclaim_info{}
kube_pod_status_reason{}
kube_secret_type{}
kube_deployment_spec_strategy_rollingupdate_max_unavailable{}
kube_deployment_status_condition{}
kube_pod_container_status_ready{}
kube_pod_created{}
kube_deployment_spec_replicas{}
kube_ingress_metadata_resource_version{}
kube_ingress_tls{}
kube_persistentvolumeclaim_resource_requests_storage_bytes{}
kube_deployment_status_replicas{}
kube_limitrange_created{}
kube_namespace_status_phase{}
kube_node_info{}
kube_endpoint_address_available{}
kube_ingress_labels{}
kube_pod_init_container_status_restarts_total{}
kube_daemonset_status_number_unavailable{}
kube_endpoint_created{}
kube_pod_status_phase{}
kube_deployment_spec_strategy_rollingupdate_max_surge{}
kube_deployment_status_replicas_available{}
kube_node_spec_unschedulable{}
kube_deployment_metadata_generation{}
kube_lease_renew_time{}
kube_node_status_capacity{}
kube_persistentvolumeclaim_access_mode{}
kube_daemonset_status_updated_number_scheduled{}
kube_namespace_created{}
kube_persistentvolume_status_phase{}
kube_pod_container_status_running{}
kube_daemonset_metadata_generation{}
kube_node_status_allocatable{}
kube_pod_container_resource_limits{}
kube_pod_init_container_status_terminated_reason{}
kube_configmap_created{}
kube_ingress_path{}
kube_pod_restart_policy{}
kube_replicaset_status_ready_replicas{}
kube_namespace_labels{}
kube_pod_status_scheduled_time{}
kube_configmap_metadata_resource_version{}
kube_pod_info{}
kube_pod_spec_volumes_persistentvolumeclaims_info{}
kube_replicaset_owner{}
kube_pod_owner{}
kube_pod_status_scheduled{}
kube_daemonset_labels{}
kube_deployment_created{}
kube_deployment_spec_paused{}
kube_persistentvolume_info{}
kube_pod_container_status_restarts_total{}
kube_pod_init_container_status_ready{}
kube_service_created{}
kube_persistentvolume_labels{}
kube_daemonset_status_number_available{}
kube_node_spec_taint{}
kube_pod_completion_time{}
kube_pod_container_info{}
kube_pod_init_container_status_running{}
kube_replicaset_labels{}
kube_daemonset_status_number_ready{}
kube_deployment_status_observed_generation{}
kube_ingress_info{}
kube_node_labels{}
kube_pod_container_status_terminated_reason{}
kube_pod_init_container_info{}
kube_daemonset_status_number_misscheduled{}
kube_deployment_status_replicas_updated{}
kube_endpoint_info{}
kube_endpoint_labels{}
kube_secret_labels{}
kube_deployment_status_replicas_unavailable{}
kube_lease_owner{}
kube_pod_container_status_waiting{}
kube_daemonset_status_current_number_scheduled{}
kube_ingress_created{}
kube_replicaset_metadata_generation{}
kube_deployment_labels{}
kube_node_status_condition{}
kube_pod_container_status_last_terminated_reason{}
kube_pod_init_container_status_terminated{}
kube_service_spec_type{}
kube_persistentvolumeclaim_labels{}
kube_pod_container_state_started{}
kube_pod_labels{}
kube_replicaset_status_observed_generation{}
kube_service_labels{}
kube_daemonset_status_desired_number_scheduled{}
kube_pod_spec_volumes_persistentvolumeclaims_readonly{}
kube_replicaset_status_replicas{}
kube_replicaset_spec_replicas{}
kube_replicaset_status_fully_labeled_replicas{}
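A couple of illustrative PromQL queries built on these metrics (metric and label names follow kube-state-metrics v2.x):
# count of pods per namespace that are not in the Running phase
sum by (namespace) (kube_pod_status_phase{phase!="Running"})
# deployments whose available replicas lag behind the desired spec
kube_deployment_spec_replicas - kube_deployment_status_replicas_available > 0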
2. Check on the Prometheus Targets page that the job has been scraped successfully:
Dashboard configuration
Import Grafana dashboard 14518, and the kube-state-metrics monitoring metrics are rendered on the template, as shown below: