Prometheus Operator

Prometheus Operator is a Kubernetes Operator that simplifies deploying, managing, and operating Prometheus and its related components on Kubernetes.

The Prometheus Operator provides a declarative way to define and manage Prometheus instances, ServiceMonitors, Alertmanagers, and other Prometheus-related resources. It extends the Kubernetes API with Custom Resource Definitions (CRDs) and uses a controller to manage the lifecycle of those resources.

Some of the Prometheus Operator's main features and concepts:

  • Prometheus instance management: you define and deploy Prometheus instances by creating Prometheus custom resources (the Prometheus CRD), specifying scrape targets, recording rules, alerting rules, and so on.
  • ServiceMonitor management: by creating ServiceMonitor custom resources you declare which Services and endpoints Prometheus should scrape; the Operator generates the corresponding Prometheus configuration automatically.
  • Alertmanager management: the Operator also creates and manages Alertmanager instances. The Alertmanager CRD describes the deployment itself, while alert routing and notification receivers are configured through the Alertmanager configuration (for example via AlertmanagerConfig resources).
  • Automatic discovery and configuration: the Operator builds on Kubernetes label selectors and service discovery, so monitoring targets are discovered and configured automatically; when Pods or Services with matching labels are added or removed, the generated Prometheus configuration is updated accordingly.
  • Scaling and high availability: the Prometheus and Alertmanager resources expose a replicas field, so multiple instances can run in the cluster for high availability; the Operator reconciles the underlying StatefulSets whenever the replica count changes.

Using the Prometheus Operator simplifies day-to-day operation of Prometheus and provides a Kubernetes-native way to manage and monitor applications. It makes deploying and managing Prometheus in a Kubernetes cluster more convenient, flexible, and reliable.

The Prometheus resource describes a Prometheus deployment, while the ServiceMonitor and PodMonitor resources describe the services and pods that Prometheus should monitor.

Supported versions

kube-prometheus version    Kubernetes version
release-0.4                1.16, 1.17
release-0.5                1.18
release-0.6                1.18, 1.19
release-0.7                1.19, 1.20
release-0.8                1.20, 1.21
release-0.9                1.21, 1.22
release-0.10               1.22, 1.23
release-0.11               1.23, 1.24
main                       1.24

Reference: https://github.com/prometheus-operator/kube-prometheus#compatibility

Quick start

Environment overview
[root@master app]# kubectl get node
NAME     STATUS   ROLES                  AGE   VERSION
master   Ready    control-plane,master   54d   v1.20.0
node2    Ready    <none>                 11m   v1.20.0
node3    Ready    <none>                 11m   v1.20.0
Download kube-prometheus
git clone  -b release-0.8  https://github.com/prometheus-operator/kube-prometheus.git
kube-prometheus]# ls
build.sh            DCO   example.jsonnet  experimental  go.sum  jsonnet           jsonnetfile.lock.json  LICENSE   manifests  README.md  sync-to-internal-registry.jsonnet  test.sh
code-of-conduct.md  docs  examples         go.mod        hack    jsonnetfile.json  kustomization.yaml     Makefile  NOTICE     scripts    tests
Deployment manifests
kube-prometheus]# tree manifests/ 
manifests/
├── alertmanager-alertmanager.yaml
├── alertmanager-podDisruptionBudget.yaml
├── alertmanager-prometheusRule.yaml
├── alertmanager-secret.yaml
├── alertmanager-serviceAccount.yaml
├── alertmanager-serviceMonitor.yaml
├── alertmanager-service.yaml
├── blackbox-exporter-clusterRoleBinding.yaml
├── blackbox-exporter-clusterRole.yaml
├── blackbox-exporter-configuration.yaml
├── blackbox-exporter-deployment.yaml
├── blackbox-exporter-serviceAccount.yaml
├── blackbox-exporter-serviceMonitor.yaml
├── blackbox-exporter-service.yaml
├── grafana-dashboardDatasources.yaml
├── grafana-dashboardDefinitions.yaml
├── grafana-dashboardSources.yaml
├── grafana-deployment.yaml
├── grafana-serviceAccount.yaml
├── grafana-serviceMonitor.yaml
├── grafana-service.yaml
├── kube-prometheus-prometheusRule.yaml
├── kubernetes-prometheusRule.yaml
├── kubernetes-serviceMonitorApiserver.yaml
├── kubernetes-serviceMonitorCoreDNS.yaml
├── kubernetes-serviceMonitorKubeControllerManager.yaml
├── kubernetes-serviceMonitorKubelet.yaml
├── kubernetes-serviceMonitorKubeScheduler.yaml
├── kube-state-metrics-clusterRoleBinding.yaml
├── kube-state-metrics-clusterRole.yaml
├── kube-state-metrics-deployment.yaml
├── kube-state-metrics-prometheusRule.yaml
├── kube-state-metrics-serviceAccount.yaml
├── kube-state-metrics-serviceMonitor.yaml
├── kube-state-metrics-service.yaml
├── node-exporter-clusterRoleBinding.yaml
├── node-exporter-clusterRole.yaml
├── node-exporter-daemonset.yaml
├── node-exporter-prometheusRule.yaml
├── node-exporter-serviceAccount.yaml
├── node-exporter-serviceMonitor.yaml
├── node-exporter-service.yaml
├── prometheus-adapter-apiService.yaml
├── prometheus-adapter-clusterRoleAggregatedMetricsReader.yaml
├── prometheus-adapter-clusterRoleBindingDelegator.yaml
├── prometheus-adapter-clusterRoleBinding.yaml
├── prometheus-adapter-clusterRoleServerResources.yaml
├── prometheus-adapter-clusterRole.yaml
├── prometheus-adapter-configMap.yaml
├── prometheus-adapter-deployment.yaml
├── prometheus-adapter-podDisruptionBudget.yaml
├── prometheus-adapter-roleBindingAuthReader.yaml
├── prometheus-adapter-serviceAccount.yaml
├── prometheus-adapter-serviceMonitor.yaml
├── prometheus-adapter-service.yaml
├── prometheus-clusterRoleBinding.yaml
├── prometheus-clusterRole.yaml
├── prometheus-operator-prometheusRule.yaml
├── prometheus-operator-serviceMonitor.yaml
├── prometheus-podDisruptionBudget.yaml
├── prometheus-prometheusRule.yaml
├── prometheus-prometheus.yaml
├── prometheus-roleBindingConfig.yaml
├── prometheus-roleBindingSpecificNamespaces.yaml
├── prometheus-roleConfig.yaml
├── prometheus-roleSpecificNamespaces.yaml
├── prometheus-serviceAccount.yaml
├── prometheus-serviceMonitor.yaml
├── prometheus-service.yaml
└── setup
    ├── 0namespace-namespace.yaml
    ├── prometheus-operator-0alertmanagerConfigCustomResourceDefinition.yaml
    ├── prometheus-operator-0alertmanagerCustomResourceDefinition.yaml
    ├── prometheus-operator-0podmonitorCustomResourceDefinition.yaml
    ├── prometheus-operator-0probeCustomResourceDefinition.yaml
    ├── prometheus-operator-0prometheusCustomResourceDefinition.yaml
    ├── prometheus-operator-0prometheusruleCustomResourceDefinition.yaml
    ├── prometheus-operator-0servicemonitorCustomResourceDefinition.yaml
    ├── prometheus-operator-0thanosrulerCustomResourceDefinition.yaml
    ├── prometheus-operator-clusterRoleBinding.yaml
    ├── prometheus-operator-clusterRole.yaml
    ├── prometheus-operator-deployment.yaml
    ├── prometheus-operator-serviceAccount.yaml
    └── prometheus-operator-service.yaml

1 directory, 83 files
Apply the manifests
kube-prometheus]# kubectl apply -f  manifests/setup/
kube-prometheus]# kubectl apply -f  manifests/

The namespace and the CRDs are created first (manifests/setup/) to avoid race conditions when the remaining components are applied.

Check pod status
kube-prometheus]# kubectl get pod -n monitoring
NAME                                   READY   STATUS             RESTARTS   AGE
alertmanager-main-0                    2/2     Running            0          97m
alertmanager-main-1                    2/2     Running            0          97m
alertmanager-main-2                    2/2     Running            0          97m
blackbox-exporter-55c457d5fb-nnzrj     3/3     Running            0          143m
grafana-9df57cdc4-vfrtk                1/1     Running            0          143m
kube-state-metrics-76f6cb7996-lbzfn    2/3     ImagePullBackOff   0          143m
node-exporter-ks772                    2/2     Running            0          143m
node-exporter-lpdxd                    2/2     Running            0          143m
node-exporter-z5mwv                    2/2     Running            0          143m
prometheus-adapter-59df95d9f5-kh82l    1/1     Running            18         143m
prometheus-adapter-59df95d9f5-kmmpt    1/1     Running            18         143m
prometheus-k8s-0                       2/2     Running            1          97m
prometheus-k8s-1                       2/2     Running            1          97m
prometheus-operator-7775c66ccf-nwbrw   2/2     Running            18         143m

The kube-state-metrics pod is in ImagePullBackOff because its image could not be pulled.

vim manifests/kube-state-metrics-deployment.yaml
Replace the unreachable image k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.0.0 with a mirror such as bitnami/kube-state-metrics.
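After the edit, the relevant excerpt might look like the following; the exact bitnami tag is an assumption, so pick whichever mirror tag corresponds to upstream v2.0.0:

# manifests/kube-state-metrics-deployment.yaml (excerpt)
      containers:
      - name: kube-state-metrics
        # image: k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.0.0   # unreachable from this cluster
        image: bitnami/kube-state-metrics:2.0.0                            # assumed mirror tag for v2.0.0
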
List the created CRDs
manifests]# kubectl get crd | grep monitoring.coreos.com 
alertmanagerconfigs.monitoring.coreos.com   2023-12-04T06:08:05Z
alertmanagers.monitoring.coreos.com         2023-12-04T06:08:05Z
podmonitors.monitoring.coreos.com           2023-12-04T06:08:05Z
probes.monitoring.coreos.com                2023-12-04T06:08:05Z
prometheuses.monitoring.coreos.com          2023-12-04T06:08:05Z
prometheusrules.monitoring.coreos.com       2023-12-04T06:08:05Z
servicemonitors.monitoring.coreos.com       2023-12-04T06:08:05Z
thanosrulers.monitoring.coreos.com          2023-12-04T06:08:06Z

CRD descriptions:

alertmanagerconfigs:

  • This CRD defines a piece of the Alertmanager configuration, mainly alert routing and receivers. A minimal example follows.
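A minimal sketch of an AlertmanagerConfig that routes alerts to a webhook receiver; the resource name, label, and webhook URL are placeholders, and the Alertmanager must be configured to select AlertmanagerConfig resources carrying this label:

apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: example-routing
  namespace: monitoring
  labels:
    alertmanagerConfig: example      # label the Alertmanager's alertmanagerConfigSelector can match
spec:
  route:
    receiver: webhook                # default receiver for this routing subtree
    groupBy: ['job']
    groupWait: 30s
    groupInterval: 5m
    repeatInterval: 12h
  receivers:
  - name: webhook
    webhookConfigs:
    - url: http://example.com/alerts   # placeholder endpoint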

alertmanagers:

  • This CRD defines the configuration of an Alertmanager running in the Kubernetes cluster and likewise offers many options, including persistent storage.
  • For each Alertmanager resource, the Operator deploys a StatefulSet with the corresponding configuration in the same namespace. The Alertmanager pods are configured to use a Secret named alertmanager-<name>, which stores the configuration file under the key alertmanager.yaml. A minimal example follows.
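A minimal Alertmanager sketch; the name main matches the kube-prometheus default, the storage class is an assumption for your cluster, and the Operator expects a Secret named alertmanager-main holding alertmanager.yaml:

apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: main                        # Operator creates StatefulSet alertmanager-main
  namespace: monitoring
spec:
  replicas: 3                       # three pods for high availability
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: standard  # assumption: use a StorageClass available in your cluster
        resources:
          requests:
            storage: 10Gi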

podmonitors:

  • This CRD defines how a dynamic set of pods is monitored; label selectors decide which pods are scraped. A minimal example follows.
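A minimal PodMonitor sketch; the app label, the target namespace, and the named container port metrics are assumptions about the workload being scraped:

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: example-app
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: example-app         # pods carrying this label are scraped
  namespaceSelector:
    matchNames:
    - default                  # namespace(s) searched for matching pods
  podMetricsEndpoints:
  - port: metrics              # named container port exposing /metrics
    interval: 30s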

probes:

  • This CRD defines how a set of Ingresses and static targets is monitored. Besides the targets, a Probe also needs a prober: the service that probes the targets and exposes their metrics to Prometheus, typically provided by the blackbox-exporter. A minimal example follows.
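A minimal Probe sketch using the blackbox-exporter deployed above as the prober; the probed URL is a placeholder, and the prober port is an assumption that should match whichever port of the blackbox-exporter Service serves plain HTTP (19115 appears in this stack's Service listing):

apiVersion: monitoring.coreos.com/v1
kind: Probe
metadata:
  name: example-probe
  namespace: monitoring
spec:
  jobName: http-probe
  prober:
    url: blackbox-exporter.monitoring.svc:19115   # prober service; port is an assumption
  module: http_2xx                                # module defined in the blackbox-exporter configuration
  targets:
    staticConfig:
      static:
      - https://example.com                       # placeholder target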

prometheuses:

  • This CRD declaratively defines the desired state of a Prometheus deployment in the Kubernetes cluster and provides options for replicas, persistence, alerting, and more.
  • For each Prometheus resource, the Operator deploys a StatefulSet with the corresponding configuration in the same namespace; the Prometheus pods mount a Secret named prometheus-<name> that holds the generated Prometheus configuration.
  • Label selectors on the resource determine which ServiceMonitors the Prometheus instance should pick up; the Operator then generates the scrape configuration from the selected ServiceMonitors and updates the Secret that holds it. A minimal example follows.
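A minimal Prometheus sketch; the name k8s, serviceAccountName, and ruleSelector labels follow the kube-prometheus defaults, while the serviceMonitorSelector label team: frontend is only an illustration of label-based selection:

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s                             # Operator creates StatefulSet prometheus-k8s
  namespace: monitoring
spec:
  replicas: 2
  serviceAccountName: prometheus-k8s
  serviceMonitorSelector:
    matchLabels:
      team: frontend                    # only ServiceMonitors with this label are picked up
  serviceMonitorNamespaceSelector: {}   # empty selector: search all namespaces
  ruleSelector:
    matchLabels:
      prometheus: k8s
      role: alert-rules                 # PrometheusRule objects with these labels are loaded
  resources:
    requests:
      memory: 400Mi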

prometheusrules:

  • Defines Prometheus rule files, including recording rules and alerting rules, which Prometheus loads automatically. A minimal example follows.
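A minimal PrometheusRule sketch with one recording rule and one alerting rule; the labels prometheus: k8s and role: alert-rules are chosen to match the ruleSelector used by the kube-prometheus Prometheus instance above:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-rules
  namespace: monitoring
  labels:
    prometheus: k8s
    role: alert-rules
spec:
  groups:
  - name: example.rules
    rules:
    - record: instance:node_cpu_busy:rate5m          # recording rule
      expr: rate(node_cpu_seconds_total{mode!="idle"}[5m])
    - alert: InstanceDown                            # alerting rule
      expr: up == 0
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Instance {{ $labels.instance }} has been down for 5 minutes"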

servicemonitors:

  • This CRD defines how a dynamic set of services is monitored; label selectors decide which Services are scraped.
  • For Prometheus to monitor any application inside Kubernetes, an Endpoints object must exist. An Endpoints object is essentially a list of IP addresses; it is normally populated by a Service, which matches pods via its label selector and adds them to the Endpoints object. A Service can expose one or more ports, backed by one or more Endpoints lists, and these endpoints usually point to individual pods.
  • Note: endpoints (lowercase) is a field of the ServiceMonitor CRD, while Endpoints (capitalized) is a Kubernetes object. A minimal example follows.
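A minimal ServiceMonitor sketch; the app label, namespace, and the named Service port web are assumptions about the Service being scraped, and the endpoints field here is the CRD field, not the Kubernetes Endpoints object:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  namespace: monitoring
  labels:
    team: frontend           # must match the Prometheus serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: example-app       # Services carrying this label are scraped
  namespaceSelector:
    matchNames:
    - default
  endpoints:
  - port: web                # named port of the Service
    interval: 30s
    path: /metrics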

thanosrulers:

  • This CRD defines the configuration of a Thanos Ruler component so that it can run in a Kubernetes cluster; with Thanos Ruler, recording and alerting rules can be evaluated across multiple Prometheus instances.
  • A ThanosRuler instance needs at least one queryEndpoint pointing to Thanos Queriers or Prometheus instances; queryEndpoints populate the --query flag of Thanos Ruler at runtime. A minimal example follows.
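A minimal ThanosRuler sketch; the Thanos image tag and the thanos-query Service name are assumptions about a Thanos deployment that is not part of this stack:

apiVersion: monitoring.coreos.com/v1
kind: ThanosRuler
metadata:
  name: example
  namespace: monitoring
spec:
  image: quay.io/thanos/thanos:v0.19.0   # assumption: pick a tag matching your Thanos deployment
  ruleSelector:
    matchLabels:
      role: thanos-rules                 # PrometheusRule objects evaluated by this ruler
  queryEndpoints:
  - dnssrv+_http._tcp.thanos-query.monitoring.svc.cluster.local   # assumption: your Thanos Querier service
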
Configure NodePort access

To reach Prometheus, Alertmanager, and Grafana from outside the cluster, change the type of the prometheus-k8s, alertmanager-main, and grafana Services to NodePort.

Modify the Prometheus Service

vim prometheus-service.yaml 
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 2.26.0
    prometheus: k8s
  name: prometheus-k8s
  namespace: monitoring
spec:
  type: NodePort # added
  ports:
  - name: web
    port: 9090
    targetPort: web
    nodePort: 30010 # added
  selector:
    app: prometheus
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: kube-prometheus
    prometheus: k8s
  sessionAffinity: ClientIP

vim alertmanager-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    alertmanager: main
    app.kubernetes.io/component: alert-router
    app.kubernetes.io/name: alertmanager
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 0.21.0
  name: alertmanager-main
  namespace: monitoring
spec:
  type: NodePort # added
  ports:
  - name: web
    port: 9093
    targetPort: web
    nodePort: 30020 # added
  selector:
    alertmanager: main
    app: alertmanager
    app.kubernetes.io/component: alert-router
    app.kubernetes.io/name: alertmanager
    app.kubernetes.io/part-of: kube-prometheus
  sessionAffinity: ClientIP

vim grafana-service.yaml 
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: grafana
    app.kubernetes.io/name: grafana
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 7.5.4
  name: grafana
  namespace: monitoring
spec:
  type: NodePort # added
  ports:
  - name: http
    port: 3000
    targetPort: http
    nodePort: 30030 # added
  selector:
    app.kubernetes.io/component: grafana
    app.kubernetes.io/name: grafana
    app.kubernetes.io/part-of: kube-prometheus

kubectl apply -f prometheus-service.yaml
kubectl apply -f alertmanager-service.yaml
kubectl apply -f grafana-service.yaml

manifests]# kubectl get svc -n monitoring
NAME                    TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
alertmanager-main       NodePort    10.99.238.236    <none>        9093:30020/TCP               170m
alertmanager-operated   ClusterIP   None             <none>        9093/TCP,9094/TCP,9094/UDP   125m
blackbox-exporter       ClusterIP   10.102.248.95    <none>        9115/TCP,19115/TCP           170m
grafana                 NodePort    10.104.107.12    <none>        3000:30030/TCP               170m
kube-state-metrics      ClusterIP   None             <none>        8443/TCP,9443/TCP            170m
node-exporter           ClusterIP   None             <none>        9100/TCP                     170m
prometheus-adapter      ClusterIP   10.111.186.113   <none>        443/TCP                      170m
prometheus-k8s          NodePort    10.100.29.54     <none>        9090:30010/TCP               170m
prometheus-operated     ClusterIP   None             <none>        9090/TCP                     125m
prometheus-operator     ClusterIP   None             <none>        8443/TCP                     171m
Prometheus, Alertmanager, and Grafana can now all be reached via a node IP and the configured NodePorts (30010, 30020, and 30030).
Connecting Grafana to Prometheus

In the Grafana UI, add (or verify) a Prometheus data source and click Test; "Data source is working" means the connection succeeded.
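For reference, kube-prometheus already provisions this data source via manifests/grafana-dashboardDatasources.yaml; a hand-written Grafana provisioning file for the same connection would look roughly like this sketch, where the URL assumes the default prometheus-k8s Service in the monitoring namespace:

# Grafana data source provisioning (sketch)
apiVersion: 1
datasources:
- name: prometheus
  type: prometheus
  access: proxy
  url: http://prometheus-k8s.monitoring.svc:9090   # in-cluster Prometheus Service
  isDefault: true
  editable: false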

Loading a dashboard in Grafana

Copy the dashboard ID from grafana.com, or download the dashboard's JSON file.

Then import the template in the Grafana console.

After the import succeeds, the result is shown in the figure below.
