Monitoring an External Kubernetes Cluster with Prometheus

Background

In practice, many organizations deploy Prometheus outside the cluster, sometimes even using a single instance to monitor several Kubernetes clusters. This is not recommended, because the volume of data Prometheus scrapes is large and consumes substantial resources. The recommended approaches are:

  1. Run a separate Prometheus instance per cluster, then aggregate them in a tool such as Grafana, adding each Prometheus as a data source.
  2. Build a generously resourced central Prometheus, run one instance in each cluster, and have every instance push its data to the central Prometheus.
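The second approach can be sketched with Prometheus's remote-write feature; the central endpoint URL below is a hypothetical placeholder:

```yaml
# Goes into each per-cluster Prometheus instance's configuration.
# The central endpoint URL is a hypothetical placeholder.
remote_write:
  - url: http://central-prometheus.example.com:9090/api/v1/write
```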

That said, monitoring an external Kubernetes cluster with Prometheus is still a perfectly valid requirement.

Setup Steps

Create the ServiceAccount and RBAC rules

The following manifest must be applied on the Kubernetes cluster that is to be monitored by Prometheus:

$ cat prometheus-rbac.yaml


apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: tools
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - services
  - endpoints
  - pods
  - nodes/proxy
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - "extensions"
  - "networking.k8s.io"
  resources:
    - ingresses
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - configmaps
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: tools

A pitfall here: after creating the sa (ServiceAccount), the following command may show no token:

$ kubectl describe secret prometheus -n tools

This is because Kubernetes auto-creates a Secret for each ServiceAccount only in clusters at version 1.23 or earlier. From 1.24 onward you must create the Secret yourself and bind it to the ServiceAccount:

apiVersion: v1
kind: Secret
type: kubernetes.io/service-account-token
metadata:
  name: prometheus
  namespace: tools
  annotations:
    kubernetes.io/service-account.name: "prometheus"

Apply the manifest:

[root@iZ2ze1ut8g7ndn5d2soajcZ ~]# kubectl apply -f prometheus-rbac.yaml
serviceaccount/prometheus created
clusterrole.rbac.authorization.k8s.io/prometheus created
clusterrolebinding.rbac.authorization.k8s.io/prometheus created

Related cleanup commands:

kubectl delete clusterrolebinding prometheus
kubectl delete serviceaccount prometheus -n tools
kubectl delete clusterrole prometheus

Retrieve the token

$ kubectl describe secrets  prometheus -n tools

Name:         prometheus-token-whlv8
Namespace:    tools
Labels:       <none>
Annotations:  kubernetes.io/service-account.name: prometheus
              kubernetes.io/service-account.uid: dcf1bf81-4636-4511-9332-293a320c3d60

Type:  kubernetes.io/service-account-token

Data
====
token:      eyJhbGciOiJSUzI1NiIsImtpZCI6IjdSMGhJMnhFaThNSjg3SFFsRGJ1bUljR0lMbWZCR0lGWUw3SjN3WVhPT1UifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJ0b29scyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJwcm9tZXRoZXVzLXRva2VuLXdobHY4Iiwia3ViZXJuZXRlcy5pby9zZXJ291bnQubmFtZSI6InByb21ldGhldXMiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiJkY2YxYmY4MS00Mi0yOTNhMzIwYzNkNjAiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6dG9vbHM6cHJvbWV0aGV1cyJ9.zFLLIddFuk5CfjEyFWCcNguzzmutllhtYtfuybuQdx47lQ1R_iUdMUifhySICMVJ_XcPBx1wSNVRzbikQ3DRVp4RfwxJH1vWpvX0msHa_aDzQrniEwOcg9zMNTzczJq3L8d8VengSb1_Lpri4Qnk23XlfFj2f3zgmG91nzgW276nCF4cWZfIRlHYoHgkWipqJak_GdII7dIpBpEIdy9F98uKeDwQ-meMZnBF-_KqAiQkKnsswITJV-Wn3Aofbxygqh6q1dCKJ1SrU7DMqpSKmgPFiuPSb4qxg
ca.crt:     1180 bytes
namespace:  5 bytes
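Before copying the token around, it can be sanity-checked locally by decoding the JWT payload (no signature verification) and confirming it belongs to the expected ServiceAccount. The token below is synthetic, built inline for illustration; a real one comes from the Secret above:

```python
import base64
import json

def jwt_claims(token: str) -> dict:
    """Decode the (unverified) payload segment of a JWT."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))

def _seg(d: dict) -> str:
    """Encode one JWT segment the way real tokens do (unpadded base64url)."""
    return base64.urlsafe_b64encode(json.dumps(d).encode()).rstrip(b"=").decode()

# Synthetic header.payload.signature token for illustration only.
claims = {
    "iss": "kubernetes/serviceaccount",
    "kubernetes.io/serviceaccount/namespace": "tools",
    "sub": "system:serviceaccount:tools:prometheus",
}
token = f'{_seg({"alg": "RS256"})}.{_seg(claims)}.signature'
print(jwt_claims(token)["sub"])  # system:serviceaccount:tools:prometheus
```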

Create k8s.token

On the Prometheus server (or in the Prometheus container), save the token into a k8s.token file:

[root@monitoring prometheus]# pwd
/opt/prometheus
[root@monitoring prometheus]# vim k8s.token 
[root@monitoring prometheus]# cat k8s.token 
eyJhbGciOiJSUzI1NiIsImtpZCI6IjdSMGhJMnhFaThNSjg3SFFsRGJ1bUljR0lMbWZCR0lGWUw3SjN3WVhPT1UifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJ0b29scyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJwcm9tZXRoZXVzLXRva2VuLXdobHY4Iiwia3ViZXJuZXRlcy5pby9zZXJ291bnQubmFtZSI6InByb21ldGhldXMiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiJkY2YxYmY4MS00Mi0yOTNhMzIwYzNkNjAiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6dG9vbHM6cHJvbWV0aGV1cyJ9.zFLLIddFuk5CfjEyFWCcNguzzmutllhtYtfuybuQdx47lQ1R_iUdMUifhySICMVJ_XcPBx1wSNVRzbikQ3DRVp4RfwxJH1vWpvX0msHa_aDzQrniEwOcg9zMNTzczJq3L8d8VengSb1_Lpri4Qnk23XlfFj2f3zgmG91nzgW276nCF4cWZfIRlHYoHgkWipqJak_GdII7dIpBpEIdy9F98uKeDwQ-meMZnBF-_KqAiQkKnsswITJV-Wn3Aofbxygqh6q1dCKJ1SrU7DMqpSKmgPFiuPSb4qxg
[root@monitoring prometheus]# 

Write prometheus-server.yml

global:
  evaluation_interval: 1m
  scrape_interval: 1m
  scrape_timeout: 10s

scrape_configs:
- job_name: prometheus
  static_configs:
  - targets:
    - localhost:9090
- job_name: "metrics-data"
  scrape_interval: 15s
  scrape_timeout: 15s
  metrics_path: '/metrics'
  static_configs:
  file_sd_configs:
  - files:
     - prometheus-metrics.yml



# API Server discovery
- job_name: "alik3-apiservers-monitor"
  kubernetes_sd_configs:
  - role: endpoints
    api_server: https://xx.xx.7.xx:6443
    tls_config:
      insecure_skip_verify: true
    bearer_token_file: /opt/prometheus/k8s.token
  scheme: https
  tls_config:
    insecure_skip_verify: true
  bearer_token_file: /opt/prometheus/k8s.token
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace,__meta_kubernetes_service_name,__meta_kubernetes_endpoint_port_name]
    action: keep
    regex: default;kubernetes;https

# Node discovery
- job_name: "alik3-nodes-monitor"
  scheme: https
  tls_config:
     insecure_skip_verify: true
  bearer_token_file: /opt/prometheus/k8s.token
  kubernetes_sd_configs:
  - role: node
    api_server: https://xx.xxx.xxx:xx
    tls_config:
      insecure_skip_verify: true
    bearer_token_file: /opt/prometheus/k8s.token
  relabel_configs:
  - source_labels: [__meta_kubernetes_node_label_failure_domain_beta_kubernetes_io_region]
    regex: "(.*)"
    replacement: "${1}"
    action: replace
    target_label: LOC
  - source_labels: [__meta_kubernetes_node_label_failure_domain_beta_kubernetes_io_region]
    regex: "(.*)"
    replacement: "NODE"
    action: replace
    target_label: Type
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)

# All pods in specified namespaces
- job_name: "alik3-发现指定namespace的所有pod"
  kubernetes_sd_configs:
  - role: pod
    api_server: https://xx.xx.7.xx:xxx
    tls_config:
      insecure_skip_verify: true
    bearer_token_file: /opt/prometheus/k8s.token
    namespaces:
      names:
      - kube-system
      - business
  relabel_configs:
  - action: keep
    regex: true
    source_labels:
    - __meta_kubernetes_pod_annotation_prometheus_io_scrape
  - action: drop
    regex: true
    source_labels:
    - __meta_kubernetes_pod_annotation_prometheus_io_scrape_slow
  - action: replace
    regex: (https?)
    source_labels:
    - __meta_kubernetes_pod_annotation_prometheus_io_scheme
    target_label: __scheme__
  - action: replace
    regex: (.+)
    source_labels:
    - __meta_kubernetes_pod_annotation_prometheus_io_path
    target_label: __metrics_path__
  - action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    source_labels:
    - __address__
    - __meta_kubernetes_pod_annotation_prometheus_io_port
    target_label: __address__
  - action: labelmap
    regex: __meta_kubernetes_pod_annotation_prometheus_io_param_(.+)
    replacement: __param_$1
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - action: replace
    source_labels:
    - __meta_kubernetes_namespace
    target_label: namespace
  - action: replace
    source_labels:
    - __meta_kubernetes_pod_name
    target_label: pod
  - action: drop
    regex: Pending|Succeeded|Failed|Completed
    source_labels:
    - __meta_kubernetes_pod_phase

# Pod discovery filtered by annotations
- job_name: "alik3-指定发现条件的pod"
  kubernetes_sd_configs:
  - role: pod
    api_server: https://xx.xx.7.xx:xxx
    tls_config:
      insecure_skip_verify: true
    bearer_token_file: /opt/prometheus/k8s.token
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: kubernetes_pod_name
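The `__address__` rewrite used in both pod jobs can be checked outside Prometheus. For `action: replace`, Prometheus joins the `source_labels` with `;`, requires the regex to match the whole joined string, and then substitutes the replacement. A small sketch of that behavior:

```python
import re

# The same pattern the pod jobs use to splice the prometheus.io/port
# annotation into __address__.
pattern = re.compile(r"([^:]+)(?::\d+)?;(\d+)")

def rewrite_address(address: str, annotated_port: str) -> str:
    # Prometheus joins source_labels with ';' and anchors the regex.
    joined = f"{address};{annotated_port}"
    m = pattern.fullmatch(joined)
    return m.expand(r"\1:\2") if m else address

# The container port (if any) is replaced by the annotated port.
print(rewrite_address("10.244.1.17:8080", "9100"))  # 10.244.1.17:9100
print(rewrite_address("10.244.1.17", "9100"))       # 10.244.1.17:9100
```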

For configuration references and further detail, see: Prometheus theory + practice

Prometheus configuration explained
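The pod jobs above only keep pods annotated with `prometheus.io/scrape: "true"`. A hypothetical pod template fragment that those relabel rules would pick up:

```yaml
# Hypothetical pod template metadata; only the annotations matter here.
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/scheme: "http"
    prometheus.io/path: "/metrics"
    prometheus.io/port: "9100"
```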

Restart the Prometheus service

Restart (or reload) the Prometheus service so that the new configuration takes effect.

Pitfalls

Problem 1: context deadline exceeded

Get "https://192.xx.xx.xx:5444/metrics": context deadline exceeded

Possible fix: the port may not be open; point the scrape at a port that is actually exposed.

Another possibility is that the clusters are on different networks. Prometheus can reach the api_server address, so the jobs above (alik3-apiservers-monitor, alik3-nodes-monitor, and the two pod-discovery jobs) do discover targets, but the discovered endpoints are internal cluster addresses. If the Prometheus host has no network route into the monitored cluster's internal network, every scrape will indeed fail with this error.
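To distinguish the two cases, test raw TCP reachability of a discovered target address from the Prometheus host. A minimal sketch (demonstrated here against a throwaway local listener, since the real target address depends on your network):

```python
import socket

def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if host:port accepts a TCP connection within timeout."""
    with socket.socket() as s:
        s.settimeout(timeout)
        return s.connect_ex((host, port)) == 0

# Demo: a local listener standing in for a node's metrics port.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
host, port = srv.getsockname()
print(tcp_reachable(host, port))  # True
srv.close()
```

If this returns False for a target that Prometheus discovered, the problem is routing, not the scrape configuration.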

References

Monitoring a k8s cluster and k8s application services with an external Prometheus
