For the complete table of contents of this Kubernetes hands-on series, see: Kubernetes实录-目录
Related posts:
The previous post covered deploying and configuring the various metrics exporters that feed monitoring data to the Prometheus subsystem inside the Kubernetes cluster. This post covers deploying Prometheus inside the cluster and configuring the jobs that scrape the exporters set up in that post.
一、Prometheus jobs for collecting monitoring metrics
The previous post covered the setup of cAdvisor, kubelet, node-exporter, kube-state-metrics, and blackbox-exporter; this post configures Prometheus to scrape the metrics of those exporters. The Prometheus monitoring system itself is not introduced further here; refer to the official documentation.
The job_names fall into four categories:
- Cluster component metrics, e.g. etcd, apiserver, controller-manager, scheduler, kube-proxy. Naming convention: kubernetes-component-*
- Node-level metrics, e.g. cAdvisor, kubelet, node-exporter. Naming convention: kubernetes-node-*
- Cluster system service metrics, e.g. kube-dns, blackbox-exporter, grafana, kube-state-metrics. Naming convention: for now all collected under kubernetes-service-endpoints
- Metrics of business applications deployed in the cluster, e.g. TCP probes, HTTP probes, and metrics actively exposed by the applications themselves. Naming convention: applications-service-*
1. job_name: kubernetes-component-apiservers
Purpose: collect metrics from the cluster's apiserver.
- job_name: 'kubernetes-component-apiservers'
  kubernetes_sd_configs:
  - role: endpoints
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
    action: keep
    regex: default;kubernetes;https
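For readers new to relabeling: Prometheus joins the values of source_labels with ';' (the default separator) and, for action: keep, retains only targets whose joined value fully matches the implicitly anchored regex. A minimal Python sketch of this rule (the label values are assumed examples):

```python
import re

def relabel_keep(labels, source_labels, regex, separator=";"):
    # Join the source label values and keep the target only on a full
    # match; Prometheus anchors relabel regexes implicitly (^...$).
    value = separator.join(labels.get(name, "") for name in source_labels)
    return re.fullmatch(regex, value) is not None

# The apiserver job keeps only the 'https' port of the 'kubernetes'
# service in the 'default' namespace:
target = {
    "__meta_kubernetes_namespace": "default",
    "__meta_kubernetes_service_name": "kubernetes",
    "__meta_kubernetes_endpoint_port_name": "https",
}
source = ["__meta_kubernetes_namespace",
          "__meta_kubernetes_service_name",
          "__meta_kubernetes_endpoint_port_name"]
print(relabel_keep(target, source, r"default;kubernetes;https"))  # True
```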
2. job_name: kubernetes-component-controller-manager
Purpose: collect metrics from the cluster's kube-controller-manager.
cat kubernetes-controller-manager-svc.yaml
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: controller-manager
  labels:
    k8s-app: kube-controller-manager
  annotations:
    prometheus.io/scrape: 'true'
spec:
  selector:
    component: kube-controller-manager
  type: ClusterIP
  clusterIP: None
  ports:
  - name: kubernetes-components-metrics
    port: 10252
    targetPort: 10252
    protocol: TCP
kubectl apply -f kubernetes-controller-manager-svc.yaml
service/controller-manager created
- job_name: 'kubernetes-component-controller-manager'
  kubernetes_sd_configs:
  - role: endpoints
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape, __meta_kubernetes_namespace, __meta_kubernetes_service_name]
    regex: true;kube-system;controller-manager
    action: keep
3. job_name: kubernetes-component-scheduler
Purpose: collect metrics from the cluster's kube-scheduler.
cat kubernetes-scheduler-svc.yaml
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-scheduler
  labels:
    k8s-app: kube-scheduler
  annotations:
    prometheus.io/scrape: 'true'
spec:
  selector:
    component: kube-scheduler
  type: ClusterIP
  clusterIP: None
  ports:
  - name: kubernetes-components-metrics
    port: 10251
    targetPort: 10251
    protocol: TCP
kubectl apply -f ../components/kubernetes-scheduler-svc.yaml
service/kube-scheduler created
- job_name: 'kubernetes-component-scheduler'
  kubernetes_sd_configs:
  - role: endpoints
  scheme: http
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape, __meta_kubernetes_namespace, __meta_kubernetes_service_name]
    regex: true;kube-system;kube-scheduler
    action: keep
4. job_name: kubernetes-component-proxy
Purpose: collect metrics from the cluster's kube-proxy.
kube-proxy's metrics port is 10249. In a kubeadm deployment it binds to 127.0.0.1 by default and must be changed to 0.0.0.0:
kubectl edit cm kube-proxy -n kube-system
metricsBindAddress: 0.0.0.0:10249
# then restart kube-proxy by deleting its pods: kubectl delete pod kube-proxy-xxxx
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-proxy
  labels:
    k8s-app: kube-proxy
  annotations:
    prometheus.io/scrape: 'true'
spec:
  selector:
    k8s-app: kube-proxy
  type: ClusterIP
  clusterIP: None
  ports:
  - name: kubernetes-components-metrics
    port: 10249
    targetPort: 10249
    protocol: TCP
kubectl apply -f ../components/kubernetes-proxy-svc.yaml
service/kube-proxy created
- job_name: 'kubernetes-component-proxy'
  kubernetes_sd_configs:
  - role: endpoints
  scheme: http
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape, __meta_kubernetes_namespace, __meta_kubernetes_service_name]
    regex: true;kube-system;kube-proxy
    action: keep
5. job_name: kubernetes-component-etcd
Category: cluster component metrics.
Test the etcd metrics endpoint:
curl https://10.99.12.201:2379/metrics -k --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key
# HELP etcd_debugging_mvcc_db_compaction_pause_duration_milliseconds Bucketed histogram of db compaction pause duration.
# TYPE etcd_debugging_mvcc_db_compaction_pause_duration_milliseconds histogram
etcd_debugging_mvcc_db_compaction_pause_duration_milliseconds_bucket{le="1"} 0
etcd_debugging_mvcc_db_compaction_pause_duration_milliseconds_bucket{le="2"} 0
etcd_debugging_mvcc_db_compaction_pause_duration_milliseconds_bucket{le="4"} 0
... (many more lines omitted)
In a Kubernetes cluster set up with kubeadm, etcd runs as a static pod, so there is no Service or Endpoints object through which the in-cluster Prometheus could reach it. Configure the following:
kubectl -n kube-system create secret generic etcd-certs --from-file=/etc/kubernetes/pki/etcd/server.crt --from-file=/etc/kubernetes/pki/etcd/server.key
# add a volume to the prometheus deployment that mounts the etcd certificates
cat prometheus-deployment.yaml
...
        volumeMounts:
        - mountPath: "/etc/etcd-certs"
          name: etcd-certs
...
      volumes:
      - name: etcd-certs
        secret:
          secretName: etcd-certs
cat kubernetes-etcd-svc.yaml
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: etcd-cluster
  labels:
    component: etcd
  annotations:
    prometheus.io/scrape: 'true'
spec:
  selector:
    component: etcd
  type: ClusterIP
  clusterIP: None
  ports:
  - name: kubernetes-components-metrics
    port: 2379
    targetPort: 2379
    protocol: TCP
kubectl apply -f kubernetes-etcd-svc.yaml
service/etcd-cluster created
kubectl get -f kubernetes-etcd-svc.yaml
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
etcd-cluster ClusterIP None <none> 2379/TCP 12s
Re-apply prometheus-configmap and prometheus-deployment with the updated definitions:
kubectl apply -f prometheus-configmap.yaml
kubectl apply -f prometheus-deployment.yaml
- job_name: 'kubernetes-component-etcd'
  kubernetes_sd_configs:
  - role: endpoints
  scheme: https
  tls_config:
    cert_file: /etc/etcd-certs/server.crt
    key_file: /etc/etcd-certs/server.key
    insecure_skip_verify: true
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape, __meta_kubernetes_namespace, __meta_kubernetes_service_name]
    regex: true;kube-system;etcd-cluster
    action: keep
6. job_name: kubernetes-node-cadvisor
Purpose: collect metrics from cAdvisor on each cluster node.
- job_name: 'kubernetes-node-cadvisor'
  kubernetes_sd_configs:
  - role: node
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - target_label: __address__
    # replace the address with the in-cluster apiserver service DNS name
    replacement: kubernetes.default.svc:443
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
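The last two relabel rules route every scrape through the apiserver proxy: __address__ becomes kubernetes.default.svc:443 and the node name is expanded into the metrics path. A small Python sketch of the path rewrite (the node name is an assumed example):

```python
import re

node_name = "node01"  # assumed example value of __meta_kubernetes_node_name
match = re.fullmatch(r"(.+)", node_name)
# ${1} in the Prometheus replacement corresponds to group 1 here:
metrics_path = match.expand(r"/api/v1/nodes/\1/proxy/metrics/cadvisor")
print(metrics_path)  # /api/v1/nodes/node01/proxy/metrics/cadvisor
```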
7. job_name: kubernetes-node-kubelet
Purpose: collect metrics from the kubelet on each cluster node.
- job_name: 'kubernetes-node-kubelet'
  kubernetes_sd_configs:
  - role: node
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - target_label: __address__
    replacement: kubernetes.default.svc:443
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics
8. job_name: kubernetes-node-exporters
Purpose: collect metrics from the node-exporter deployed on every cluster node.
- job_name: 'kubernetes-node-exporters'
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape, __meta_kubernetes_namespace, __meta_kubernetes_service_name]
    regex: true;kube-system;prometheus-node-exporter
    action: keep
  - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
    action: replace
    target_label: __address__
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    target_label: kubernetes_name
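The __address__ rewrite above deserves a closer look: the regex strips any existing port from the discovered address and substitutes the port given in the prometheus.io/port annotation. A Python sketch with assumed example values:

```python
import re

# __address__ and the prometheus.io/port annotation value, joined by ';'
joined = "10.244.1.23:4194;9100"  # assumed example values
match = re.fullmatch(r"([^:]+)(?::\d+)?;(\d+)", joined)
address = match.expand(r"\1:\2")  # $1:$2 in Prometheus syntax
print(address)  # 10.244.1.23:9100
```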
9. job_name: kubernetes-service-endpoints
Purpose: auto-discover services and collect their metrics. Any Service resource whose annotations include prometheus.io/scrape: 'true' is discovered and scraped by Prometheus.
Note: drop rules filter out the system component services (dedicated jobs collect those metrics).
- job_name: 'kubernetes-service-endpoints'
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
    action: replace
    regex: (https?)
    target_label: __scheme__
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
    action: replace
    target_label: __address__
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    target_label: kubernetes_name
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape, __meta_kubernetes_namespace, __meta_kubernetes_endpoint_port_name]
    # drop system component services here (dedicated jobs collect them)
    regex: true;kube-system;kubernetes-components-metrics
    action: drop
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape, __meta_kubernetes_namespace, __meta_kubernetes_service_name]
    # drop node-exporter endpoints here (the kubernetes-node-exporters job collects them)
    regex: true;kube-system;prometheus-node-exporter
    action: drop
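The combined effect of the keep and drop rules can be sketched in Python; rules run in order, keep discards non-matching targets, drop discards matching ones (the service names below are assumed examples):

```python
import re

def passes(labels, rules):
    # Minimal sketch of Prometheus keep/drop semantics: join source
    # label values with ';' and fully match the anchored regex.
    for action, source_labels, regex in rules:
        value = ";".join(labels.get(name, "") for name in source_labels)
        matched = re.fullmatch(regex, value) is not None
        if action == "keep" and not matched:
            return False
        if action == "drop" and matched:
            return False
    return True

rules = [
    ("keep", ["__meta_kubernetes_service_annotation_prometheus_io_scrape"], r"true"),
    ("drop", ["__meta_kubernetes_service_annotation_prometheus_io_scrape",
              "__meta_kubernetes_namespace",
              "__meta_kubernetes_service_name"],
     r"true;kube-system;prometheus-node-exporter"),
]

# node-exporter endpoints are annotated for scraping but dropped here,
# because the kubernetes-node-exporters job already collects them:
node_exporter = {
    "__meta_kubernetes_service_annotation_prometheus_io_scrape": "true",
    "__meta_kubernetes_namespace": "kube-system",
    "__meta_kubernetes_service_name": "prometheus-node-exporter",
}
grafana = {
    "__meta_kubernetes_service_annotation_prometheus_io_scrape": "true",
    "__meta_kubernetes_namespace": "kube-system",
    "__meta_kubernetes_service_name": "grafana",
}
print(passes(node_exporter, rules))  # False
print(passes(grafana, rules))        # True
```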
10. job_name: applications-service-metrics
Purpose: collect the metrics that applications deployed in the cluster actively expose themselves.
- job_name: 'applications-service-metrics'
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape, __meta_kubernetes_service_annotation_prometheus_io_app_metrics]
    regex: true;true
    action: keep
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_app_metrics_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__meta_kubernetes_pod_ip, __meta_kubernetes_service_annotation_prometheus_io_app_metrics_port]
    action: replace
    target_label: __address__
    regex: (.+);(.+)
    replacement: $1:$2
  - action: labelmap
    regex: __meta_kubernetes_service_annotation_prometheus_io_app_info_(.+)
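The labelmap action at the end copies every meta label matching the prefix onto the target, named after the regex's capture group (Kubernetes SD has already normalized the annotation keys to underscores). A Python sketch with hypothetical app-info annotations:

```python
import re

labels = {  # assumed, already-normalized meta labels of a target
    "__meta_kubernetes_service_annotation_prometheus_io_app_info_env": "prod",
    "__meta_kubernetes_service_annotation_prometheus_io_app_info_team": "ops",
    "__address__": "10.244.2.7:8080",
}
pattern = re.compile(r"__meta_kubernetes_service_annotation_prometheus_io_app_info_(.+)")
for name, value in list(labels.items()):
    match = pattern.fullmatch(name)
    if match:
        # the capture group becomes the new label name
        labels[match.group(1)] = value
print(labels["env"], labels["team"])  # prod ops
```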
11. job_name: applications-service-http-probe
The data comes from blackbox-exporter. The job definition is adapted from: https://blog.csdn.net/liukuan73/article/details/78881008
- job_name: 'applications-service-http-probe'
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: service
  metrics_path: /probe
  params:
    module: [http_2xx]
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape, __meta_kubernetes_service_annotation_prometheus_io_http_probe]
    regex: true;true
    action: keep
  - source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_namespace, __meta_kubernetes_service_annotation_prometheus_io_http_probe_port, __meta_kubernetes_service_annotation_prometheus_io_http_probe_path]
    action: replace
    target_label: __param_target
    regex: (.+);(.+);(.+);(.+)
    replacement: $1.$2:$3$4
  #- source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_http_probe_path]
  #  action: replace
  #  target_label: __param_target
  #  regex: (.+);(.+)
  #  replacement: $1$2
  - target_label: __address__
    replacement: blackbox-exporter:9115
  - source_labels: [__param_target]
    target_label: instance
  - action: labelmap
    regex: __meta_kubernetes_service_annotation_prometheus_io_app_info_(.+)
  #- source_labels: [__meta_kubernetes_namespace]
  #  target_label: kubernetes_namespace
  #- source_labels: [__meta_kubernetes_service_name]
  #  target_label: kubernetes_name
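How the probe target is assembled from the four source labels can be sketched in Python (service name, namespace, port, and path are assumed examples):

```python
import re

# service name, namespace, probe port and probe path, joined by ';'
joined = "myapp;default;8080;/healthz"  # assumed example annotation values
match = re.fullmatch(r"(.+);(.+);(.+);(.+)", joined)
target = match.expand(r"\1.\2:\3\4")  # $1.$2:$3$4 in Prometheus syntax
print(target)  # myapp.default:8080/healthz
# blackbox-exporter then probes this target via
# /probe?module=http_2xx&target=...
```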
12. job_name: applications-service-tcp-probe
The data comes from blackbox-exporter. The job definition is adapted from: https://blog.csdn.net/liukuan73/article/details/78881008
- job_name: 'applications-service-tcp-probe'
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: service
  metrics_path: /probe
  params:
    module: [tcp_connect]
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape, __meta_kubernetes_service_annotation_prometheus_io_tcp_probe]
    regex: true;true
    action: keep
  - source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_namespace, __meta_kubernetes_service_annotation_prometheus_io_tcp_probe_port]
    action: replace
    target_label: __param_target
    regex: (.+);(.+);(.+)
    replacement: $1.$2:$3
  #- source_labels: [__address__]
  #  target_label: __param_target
  - target_label: __address__
    replacement: blackbox-exporter:9115
  - source_labels: [__param_target]
    target_label: instance
  - action: labelmap
    regex: __meta_kubernetes_service_annotation_prometheus_io_app_info_(.+)
二、Deploying Prometheus inside the Kubernetes cluster
mkdir prometheus
cd prometheus
tree
├── prometheus-configmap.yaml    # configuration file
├── prometheus-deployment.yaml   # deployment definition
├── prometheus-ingress.yaml      # traefik proxy configuration
├── prometheus-rbac.yaml         # roles and permissions
└── prometheus-service.yaml      # service definition
2.1 prometheus-configmap
This configuration file contains the global settings and the parameters of every scrape job described above.
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: kube-system
data:
  prometheus.yml: |-
    global:
      scrape_interval: 10s
      evaluation_interval: 10s
    scrape_configs:
    - job_name: 'kubernetes-component-apiservers'
      ... # omitted; see above
    - job_name: 'kubernetes-component-controller-manager'
      ... # omitted; see above
    - job_name: 'kubernetes-component-scheduler'
      ... # omitted; see above
    - job_name: 'kubernetes-component-proxy'
      ... # omitted; see above
    - job_name: 'kubernetes-component-etcd'
      ... # omitted; see above
    - job_name: 'kubernetes-node-cadvisor'
      ... # omitted; see above
    - job_name: 'kubernetes-node-kubelet'
      ... # omitted; see above
    - job_name: 'kubernetes-node-exporters'
      ... # omitted; see above
    - job_name: 'kubernetes-service-endpoints'
      ... # omitted; see above
    - job_name: 'applications-service-metrics'
      ... # omitted; see above
    - job_name: 'applications-service-http-probe'
      ... # omitted; see above
    - job_name: 'applications-service-tcp-probe'
      ... # omitted; see above
2.2 Deploying the Prometheus monitoring service inside the Kubernetes cluster
2.2.1 prometheus-rbac.yaml
- Definition file

cat prometheus-rbac.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups:
  - extensions
  resources:
  - ingresses
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: kube-system

- Commands

kubectl apply -f prometheus-rbac.yaml
clusterrole.rbac.authorization.k8s.io/prometheus created
serviceaccount/prometheus created
clusterrolebinding.rbac.authorization.k8s.io/prometheus created
kubectl get -f prometheus-rbac.yaml
NAME                                               AGE
clusterrole.rbac.authorization.k8s.io/prometheus   12s
NAME                         SECRETS   AGE
serviceaccount/prometheus    1         12s
NAME                                                      AGE
clusterrolebinding.rbac.authorization.k8s.io/prometheus   11s
2.2.2 prometheus-configmap.yaml
- Definition file
See the description above.
- Commands

kubectl apply -f prometheus-configmap.yaml -n kube-system
configmap/prometheus-config created
kubectl get -f prometheus-configmap.yaml
NAME                DATA   AGE
prometheus-config   1      27s
2.2.3 prometheus-deployment.yaml
- Definition file

cat prometheus-deployment.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    name: prometheus-deployment
  name: prometheus
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
      - image: prom/prometheus:v2.7.1
        name: prometheus
        command:
        - "/bin/prometheus"
        args:
        - "--config.file=/etc/prometheus/prometheus.yml"
        - "--storage.tsdb.path=/prometheus"
        - "--storage.tsdb.retention=48h"
        ports:
        - containerPort: 9090
          protocol: TCP
        volumeMounts:
        - mountPath: "/prometheus"
          name: data
        - mountPath: "/etc/prometheus"
          name: config-volume
        - mountPath: "/etc/etcd-certs"
          name: etcd-certs
        resources:
          requests:
            cpu: 200m
            memory: 200Mi
          limits:
            cpu: 1000m
            memory: 4000Mi
      serviceAccountName: prometheus
      imagePullSecrets:
      - name: regsecret
      volumes:
      - name: data
        emptyDir: {}
      - name: config-volume
        configMap:
          name: prometheus-config
      - name: etcd-certs
        secret:
          secretName: etcd-certs

- Commands

kubectl apply -f prometheus-deployment.yaml -n kube-system
deployment.apps/prometheus created
kubectl get -f prometheus-deployment.yaml -n kube-system
NAME         READY   UP-TO-DATE   AVAILABLE   AGE
prometheus   1/1     1            1           98s
2.2.4 prometheus-service.yaml
- Definition file

cat prometheus-service.yaml
---
kind: Service
apiVersion: v1
metadata:
  labels:
    app: prometheus
  name: prometheus
  namespace: kube-system
spec:
  selector:
    app: prometheus
  ports:
  - name: monitor
    port: 9090
    targetPort: 9090

- Commands

kubectl apply -f prometheus-service.yaml
service/prometheus created
kubectl get -f prometheus-service.yaml
NAME         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
prometheus   ClusterIP   10.111.188.250   <none>        9090/TCP   12s
2.2.5 prometheus-ingress.yaml
traefik serves as the in-cluster reverse proxy.
- Definition file
cat prometheus-ingress.yaml
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: prometheus
  namespace: kube-system
spec:
  rules:
  - host: prometheus.ejuops.com
    http:
      paths:
      - path: /
        backend:
          serviceName: prometheus
          servicePort: monitor
- Commands
kubectl apply -f prometheus-ingress.yaml
ingress.extensions/prometheus created
kubectl get -f prometheus-ingress.yaml
NAME HOSTS ADDRESS PORTS AGE
prometheus prometheus.ejuops.com 80 14s
- Access configuration
Since prometheus.ejuops.com is not a real domain yet and traefik runs as a DaemonSet bound to port 80 on the hosts, access is configured through the local hosts file (C:\Windows\System32\drivers\etc\hosts):
...
10.99.12.201 prometheus.ejuops.com
- Browser access
Service discovery: under Status –> Targets you can immediately see all the endpoints in the Kubernetes cluster connected to Prometheus through service discovery, and the state (up) of every configured job_name.
Graphs and queries: click Graph and enter a metric name.
This completes the Prometheus setup; the next post configures grafana, the visualization tool for the monitoring data.
References:
(1) https://www.kancloud.cn/huyipow/prometheus/525005
(2) https://blog.csdn.net/liukuan73/article/details/78881008
(3) https://www.cnblogs.com/aguncn/p/9933204.html
(4) Kubernetes official documentation