Deploying Thanos in Sidecar Mode for Multi-Cluster Prometheus Management

Introduction to Thanos Sidecar mode

Thanos is an open-source, highly available Prometheus setup with long-term storage capabilities. Thanos is a CNCF incubating project.

Thanos provides a global query view across metrics, unlimited metric retention, and high availability of its components.

Thanos has two operating modes, Sidecar and Receiver. In Sidecar mode, Thanos uploads the TSDB blocks that Prometheus stores locally to object storage every two hours.

[Figure: Thanos Sidecar mode]

Official website: https://thanos.io/

Project repository: https://github.com/thanos-io/thanos

Official documentation: https://thanos.io/tip/components/sidecar.md/

Prometheus cluster planning

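Prepare three Kubernetes clusters. Using Thanos Sidecar mode, the metrics of all three clusters are collected centrally, displayed in a single Grafana instance, and uploaded to object storage for long-term retention. The component deployment plan is as follows: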

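  • cluster-observer: deploy the monitoring components and thanos-sidecar with kube-prometheus-stack, with Grafana enabled. Also deploy bitnami-thanos with Thanos and MinIO enabled.
  • cluster-A: deploy the monitoring components and thanos-sidecar with kube-prometheus-stack.
  • cluster-B: deploy the monitoring components and thanos-sidecar with kube-prometheus-stack.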

The component deployment plan in table form:

| Cluster | Node | Node IP | Deployed components |
|---|---|---|---|
| cluster-observer | node33 | 192.168.72.33 | prometheus-operator, prometheus, thanos-sidecar, alertmanager, kube-state-metrics, node-exporter; additionally: grafana, thanos, minio |
| cluster-A | node40 | 192.168.72.40 | prometheus-operator, prometheus, thanos-sidecar, alertmanager, kube-state-metrics, node-exporter |
| cluster-B | node41 | 192.168.72.41 | prometheus-operator, prometheus, thanos-sidecar, alertmanager, kube-state-metrics, node-exporter |

Check the node information of each cluster
cluster-observer cluster

root@node33:~# kubectl get nodes -o wide
NAME     STATUS   ROLES           AGE    VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
node33   Ready    control-plane   153d   v1.27.6   192.168.72.33   <none>        Ubuntu 22.04.2 LTS   5.15.0-76-generic   containerd://1.6.24

cluster-A cluster

root@node40:~# kubectl get nodes -o wide
NAME     STATUS   ROLES           AGE    VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
node40   Ready    control-plane   107d   v1.27.7   192.168.72.40   <none>        Ubuntu 22.04.2 LTS   5.15.0-76-generic   containerd://1.6.24

cluster-B cluster

root@node41:~# kubectl get nodes -o wide
NAME     STATUS   ROLES           AGE     VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
node41   Ready    control-plane   4d21h   v1.27.7   192.168.72.41   <none>        Ubuntu 22.04.2 LTS   5.15.0-76-generic   containerd://1.6.24

All three clusters must provide a default StorageClass for persistent pod storage, for example:

root@node33:~# kubectl get sc
NAME                         PROVISIONER        RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
openebs-hostpath (default)   openebs.io/local   Delete          WaitForFirstConsumer   false                  153d
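A minimal sketch, assuming a POSIX shell with awk, for checking this prerequisite on each cluster: the function reads the output of `kubectl get sc` from stdin and prints the StorageClass marked `(default)`. The function name is hypothetical; it only relies on the column layout shown above.

```shell
# Hypothetical helper: print the StorageClass(es) marked "(default)" in
# `kubectl get sc` output read from stdin; exit non-zero if there is none.
default_storageclass() {
  awk 'NR > 1 && $2 == "(default)" { print $1; found = 1 }
       END { exit (found ? 0 : 1) }'
}

# Usage on each cluster:
#   kubectl get sc | default_storageclass || echo "no default StorageClass" >&2
```

Parsing table output is convenient but tied to kubectl's display format; the underlying source of truth is the `storageclass.kubernetes.io/is-default-class` annotation on the StorageClass object.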

On each of the three clusters, add the Helm repository for kube-prometheus-stack:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts

Basic architecture
[Figure: basic architecture]
Metric collection logic
[Figure: metric collection logic]

Deploying Prometheus on cluster-A

Create the kube-prometheus-stack values.yaml file:

$ cat values.yaml
prometheus:
  service:
    type: NodePort
  prometheusSpec:
    replicas: 2
    retention: 12h
    disableCompaction: true
    thanos:
      objectStorageConfig:
        existingSecret:
          name: thanos-objstore
          key: objstore.yml
    externalLabels:
      cluster: cluster-A
    storageSpec:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi
  thanosService:
    enabled: true
    type: NodePort
    clusterIP: ""
  thanosServiceMonitor:
    enabled: true
  extraSecret:
    name: thanos-objstore
    data:
      objstore.yml: |
        type: S3
        config:
          bucket: "thanos"
          endpoint: "192.168.72.33:32000"
          access_key: "admin"
          secret_key: "minio123"
          insecure: true
alertmanager:
  enabled: true
  service:
    type: "NodePort"
  alertmanagerSpec:
    replicas: 2
    storage:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi
grafana:
  enabled: false

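Configuration parameters:

  • prometheus.prometheusSpec.replicas: runs two Prometheus replicas that scrape the same cluster, producing two identical copies of its metrics for high availability; Thanos deduplicates them at query time.
  • prometheus.prometheusSpec.storageSpec: enables persistent local storage for Prometheus. Keep at least 6h of metrics locally (retention is set to 12h here): the Thanos sidecar uploads data to object storage only every 2h, so up to 2h of not-yet-uploaded data could otherwise be lost, and Thanos still reads Prometheus's local data when querying recent metrics.
  • prometheus.prometheusSpec.disableCompaction: disables Prometheus's local compaction. Thanos provides its own compaction and deduplication, so local compaction is turned off to avoid conflicts.
  • prometheus.prometheusSpec.thanos.objectStorageConfig: the Thanos sidecar needs access to the object storage it uploads to. MinIO object storage is enabled later when Thanos is deployed; it can also be deployed separately.
  • prometheus.prometheusSpec.externalLabels: labels that identify the cluster and the Prometheus replica; Thanos uses externalLabels to tell apart the Prometheus instances of different clusters and replicas. The cluster label is set manually, while the replica label is added automatically by kube-prometheus-stack.
  • prometheus.thanosService: exposes the Thanos sidecar Service as a NodePort so that the Thanos Query component in the observer cluster can reach it; Ingress is recommended in production.
  • prometheus.extraSecret: has kube-prometheus-stack create the Secret for object storage access, which prometheus.prometheusSpec.thanos.objectStorageConfig references. For better security, you can remove this setting and create the Secret manually.
  • alertmanager: configures two Alertmanager replicas with persistent storage (optional).
  • grafana.enabled: disables Grafana; Grafana only needs to be enabled in the observer cluster.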
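The retention reasoning above can be checked mechanically before installing. A minimal sketch, assuming single-unit Prometheus durations such as `12h` or `2d` (compound values like `1h30m` are rejected), that verifies `retention` is at least the recommended 6h; both function names are hypothetical.

```shell
# Hypothetical helper: convert a single-unit Prometheus duration (m, h, d, w)
# such as "12h" to minutes; fails on anything else (e.g. "1h30m", "30s").
duration_to_minutes() {
  local d="$1" n unit
  n="${d%[mhdw]}"
  unit="${d#"$n"}"
  case "$n" in ''|*[!0-9]*) return 1 ;; esac
  case "$unit" in
    m) echo "$n" ;;
    h) echo $(( n * 60 )) ;;
    d) echo $(( n * 1440 )) ;;
    w) echo $(( n * 10080 )) ;;
    *) return 1 ;;
  esac
}

# Succeed when the retention (first argument) is at least the minimum
# (second argument, default 6h as recommended in the parameter list).
retention_ok() {
  local r m
  r="$(duration_to_minutes "$1")" || return 1
  m="$(duration_to_minutes "${2:-6h}")" || return 1
  [ "$r" -ge "$m" ]
}

# Usage:
#   retention_ok 12h && echo "retention covers the 2h upload interval with margin"
```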

Deploy kube-prometheus-stack:

helm install prometheus prometheus-community/kube-prometheus-stack \
  -n monitoring --create-namespace -f values.yaml
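The article installs the chart by logging in to each cluster in turn. If one workstation has kubeconfig contexts for all three clusters, the same install can be scripted with Helm's `--kube-context` flag. This sketch assumes, hypothetically, contexts named `cluster-A`, `cluster-B` and `cluster-observer` and per-cluster files named `values-<context>.yaml`; by default it only prints the commands.

```shell
# Hypothetical helper: install kube-prometheus-stack into each kubeconfig context
# given as an argument, using values-<context>.yaml. Prints the commands only,
# unless DRY_RUN=0 is set.
install_stack() {
  local ctx
  for ctx in "$@"; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "helm install prometheus prometheus-community/kube-prometheus-stack" \
        "--kube-context $ctx -n monitoring --create-namespace -f values-$ctx.yaml"
    else
      helm install prometheus prometheus-community/kube-prometheus-stack \
        --kube-context "$ctx" -n monitoring --create-namespace \
        -f "values-$ctx.yaml" || return
    fi
  done
}

# Usage:
#   install_stack cluster-A cluster-B cluster-observer            # print only
#   DRY_RUN=0 install_stack cluster-A cluster-B cluster-observer  # actually install
```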

List all created pods: the prometheus-operator, prometheus, thanos-sidecar, alertmanager, kube-state-metrics and node-exporter components have been deployed.

root@node40:~# kubectl -n monitoring get pods
NAME                                                     READY   STATUS    RESTARTS   AGE
alertmanager-prometheus-kube-prometheus-alertmanager-0   2/2     Running   0          15m
alertmanager-prometheus-kube-prometheus-alertmanager-1   2/2     Running   0          15m
prometheus-kube-prometheus-operator-85656676ff-vt7gm     1/1     Running   0          4d19h
prometheus-kube-state-metrics-6b6ffbfdd6-v7z9r           1/1     Running   0          4d19h
prometheus-prometheus-kube-prometheus-prometheus-0       3/3     Running   0          4d19h
prometheus-prometheus-kube-prometheus-prometheus-1       3/3     Running   0          4d19h
prometheus-prometheus-node-exporter-nkpph                1/1     Running   0          4d19h
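Rather than reading the READY column by eye, a small filter can flag pods that are not fully ready. A sketch assuming the default `kubectl get pods` column order (NAME, READY, STATUS, ...); pods of completed Jobs would also be reported, since their STATUS is not `Running`. The function name is hypothetical.

```shell
# Hypothetical helper: read `kubectl get pods` output on stdin and print the names
# of pods whose READY column is not n/n or whose STATUS is not Running.
# Exits non-zero if any such pod is found.
not_ready_pods() {
  awk 'NR > 1 {
         split($2, r, "/")
         if (r[1] != r[2] || $3 != "Running") { print $1; bad = 1 }
       }
       END { exit (bad ? 1 : 0) }'
}

# Usage:
#   kubectl -n monitoring get pods | not_ready_pods || echo "some pods are not ready"
```

kubectl also offers a built-in alternative that blocks until pods become Ready: `kubectl -n monitoring wait --for=condition=Ready pod --all --timeout=300s`.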

Inspect a Prometheus pod: a thanos-sidecar container has been added.

$ kubectl -n monitoring get pods prometheus-prometheus-kube-prometheus-prometheus-0 -o jsonpath='{.spec.containers[*].name}'
prometheus config-reloader thanos-sidecar
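Both replicas should carry the sidecar. A minimal sketch that compares the space-separated container list printed by the jsonpath query above against the expected names and prints any that are missing; the helper name is hypothetical and the pod names in the usage comment follow the output above.

```shell
# Hypothetical helper: read a space-separated container list on stdin and print
# each expected name (given as arguments) that is absent; exit non-zero if any is missing.
missing_containers() {
  local have want rc=0
  have=" $(tr -s '[:space:]' ' ') "
  for want in "$@"; do
    case "$have" in
      *" $want "*) ;;
      *) echo "$want"; rc=1 ;;
    esac
  done
  return "$rc"
}

# Usage, for both replicas:
#   for i in 0 1; do
#     kubectl -n monitoring get pod "prometheus-prometheus-kube-prometheus-prometheus-$i" \
#       -o jsonpath='{.spec.containers[*].name}' | missing_containers prometheus thanos-sidecar
#   done
```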

Check the Prometheus Services: a thanos-discovery Service has been created.

root@node40:~/kube-prometheus-stack# kubectl -n monitoring get svc
NAME                                          TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)                           AGE
prometheus-kube-prometheus-thanos-discovery   NodePort    10.96.3.135   <none>        10901:30901/TCP,10902:30902/TCP   66s
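The NodePort mapped to the sidecar's gRPC port 10901 (30901 here) is the address Thanos Query uses later. A sketch that extracts it from a PORT(S) value like the one shown above; the helper name is hypothetical.

```shell
# Hypothetical helper: given a target port and a PORT(S) value such as
# "10901:30901/TCP,10902:30902/TCP", print the NodePort mapped to that port.
nodeport_for() {
  local port="$1" ports="$2" entry
  local IFS=','
  for entry in $ports; do
    case "$entry" in
      "$port":*)
        entry="${entry#"$port":}"
        echo "${entry%%/*}"
        return 0 ;;
    esac
  done
  return 1
}

# Usage:
#   nodeport_for 10901 "10901:30901/TCP,10902:30902/TCP"
```

kubectl can also return the value directly with a jsonpath filter, for example `kubectl -n monitoring get svc prometheus-kube-prometheus-thanos-discovery -o jsonpath='{.spec.ports[?(@.port==10901)].nodePort}'`.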

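Open the configuration page of the Prometheus console to see the prometheus_replica label that was automatically added to external_labels on each replica:
[Figure: Prometheus console configuration page showing external_labels]
Refresh the page (requests may be served by either replica) to compare the label configuration of the two replicas:

# Replica 1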
global:
  external_labels:
    cluster: cluster-A
    prometheus: monitoring/prometheus-kube-prometheus-prometheus
    prometheus_replica: prometheus-prometheus-kube-prometheus-prometheus-0

# Replica 2
global:
  external_labels:
    cluster: cluster-A
    prometheus: monitoring/prometheus-kube-prometheus-prometheus
    prometheus_replica: prometheus-prometheus-kube-prometheus-prometheus-1

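The --query.replica-label flag of Thanos Query must be set to prometheus_replica, the replica label in external_labels, so that series from the two replicas are deduplicated. We will set the corresponding parameter later in the bitnami-thanos Helm chart values.yaml; the equivalent command-line configuration looks like this: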

thanos query \
  --store=<address_of_store_api> \
  --query.replica-label="prometheus_replica"
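With three clusters, the command needs one `--store` flag per sidecar. A sketch that prints the flags for a list of sidecar gRPC endpoints, one per line; the function name is hypothetical and the endpoints in the usage comment are the node IPs from the planning table combined with NodePort 30901.

```shell
# Hypothetical helper: print `thanos query` flags, one per line: a --store flag
# for every sidecar gRPC endpoint, then the replica label used for deduplication.
thanos_query_args() {
  local replica_label="$1" ep
  shift
  for ep in "$@"; do
    printf '%s\n' "--store=$ep"
  done
  printf '%s\n' "--query.replica-label=$replica_label"
}

# Usage (the arguments contain no spaces, so unquoted expansion is safe here):
#   thanos query $(thanos_query_args prometheus_replica \
#     192.168.72.33:30901 192.168.72.40:30901 192.168.72.41:30901)
```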

Deploying Prometheus on cluster-B

The configuration is similar to cluster-A; be sure to change the externalLabels parameter. Create the kube-prometheus-stack values.yaml file:

$ cat values.yaml
prometheus:
  service:
    type: NodePort
  prometheusSpec:
    replicas: 2
    retention: 12h
    disableCompaction: true
    thanos:
      objectStorageConfig:
        existingSecret:
          name: thanos-objstore
          key: objstore.yml
    externalLabels:
      cluster: cluster-B
    storageSpec:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi
  thanosService:
    enabled: true
    type: NodePort
    clusterIP: ""
  thanosServiceMonitor:
    enabled: true
  extraSecret:
    name: thanos-objstore
    data:
      objstore.yml: |
        type: S3
        config:
          bucket: "thanos"
          endpoint: "192.168.72.33:32000"
          access_key: "admin"
          secret_key: "minio123"
          insecure: true
alertmanager:
  enabled: true
  service:
    type: "NodePort"
  alertmanagerSpec:
    replicas: 2
    storage:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi
grafana:
  enabled: false
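Because this file differs from cluster-A's only in the `cluster` external label, it can be generated rather than edited by hand. A sketch assuming the cluster-A file is saved as `values-cluster-A.yaml` (a hypothetical name) and contains a single `cluster:` key, as above; the helper name is also hypothetical.

```shell
# Hypothetical helper: copy a values file (first argument), replacing only the
# value of the `cluster:` key with the second argument. The name must not contain "/".
derive_values() {
  sed -E "s/^([[:space:]]*cluster:[[:space:]]*).*/\1$2/" "$1"
}

# Usage:
#   derive_values values-cluster-A.yaml cluster-B > values-cluster-B.yaml
```

The cluster-observer file additionally enables Grafana, so it still needs manual edits beyond the label.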

Deploy kube-prometheus-stack:

helm install prometheus prometheus-community/kube-prometheus-stack \
  -n monitoring --create-namespace -f values.yaml

List all created pods: the prometheus-operator, prometheus, thanos-sidecar, alertmanager, kube-state-metrics and node-exporter components have been deployed.

root@node41:~# kubectl -n monitoring get pods
NAME                                                     READY   STATUS    RESTARTS   AGE
alertmanager-prometheus-kube-prometheus-alertmanager-0   2/2     Running   0          18m
alertmanager-prometheus-kube-prometheus-alertmanager-1   2/2     Running   0          18m
prometheus-kube-prometheus-operator-85656676ff-slrmb     1/1     Running   0          4d19h
prometheus-kube-state-metrics-6b6ffbfdd6-2tdfb           1/1     Running   0          4d19h
prometheus-prometheus-kube-prometheus-prometheus-0       3/3     Running   0          4d19h
prometheus-prometheus-kube-prometheus-prometheus-1       3/3     Running   0          4d19h
prometheus-prometheus-node-exporter-t5x4s                1/1     Running   0          4d19h

Deploying Prometheus on cluster-observer

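The configuration is similar to cluster-A; change the externalLabels parameter and additionally enable Grafana with Thanos as its data source. Create the kube-prometheus-stack values.yaml file: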

$ cat values.yaml
prometheus:
  service:
    type: NodePort
  prometheusSpec:
    replicas: 2
    retention: 12h
    disableCompaction: true
    thanos:
      objectStorageConfig:
        existingSecret:
          name: thanos-objstore
          key: objstore.yml
    externalLabels:
      cluster: cluster-observer
    storageSpec:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi
  thanosService:
    enabled: true
    type: NodePort
    clusterIP: ""
  thanosServiceMonitor:
    enabled: true
  extraSecret:
    name: thanos-objstore
    data:
      objstore.yml: |
        type: S3
        config:
          bucket: "thanos"
          endpoint: "192.168.72.33:32000"
          access_key: "admin"
          secret_key: "minio123"
          insecure: true
alertmanager:
  enabled: true
  service:
    type: "NodePort"
  alertmanagerSpec:
    replicas: 2
    storage:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi
grafana:
  enabled: true
  service:
    type: "NodePort"
  persistence:
    enabled: true
  sidecar:
    dashboards:
      multicluster:
        global:
          enabled: true
  datasources:
    datasources.yaml:
      apiVersion: 1
      datasources:
      - name: Thanos
        type: prometheus
        url: http://thanos-query-frontend.thanos:9090
        access: proxy
        isDefault: true

Deploy kube-prometheus-stack:

helm install prometheus prometheus-community/kube-prometheus-stack \
  -n monitoring --create-namespace -f values.yaml

List all created pods: the prometheus-operator, prometheus, thanos-sidecar, alertmanager, kube-state-metrics and node-exporter components have been deployed, plus Grafana.

root@node33:~# kubectl -n monitoring get pods
NAME                                                     READY   STATUS    RESTARTS        AGE
alertmanager-prometheus-kube-prometheus-alertmanager-0   2/2     Running   0               15m
alertmanager-prometheus-kube-prometheus-alertmanager-1   2/2     Running   0               15m
prometheus-grafana-ff75dc75c-7nxzj                       3/3     Running   0               15m
prometheus-kube-prometheus-operator-85656676ff-bzpbv     1/1     Running   0               4d19h
prometheus-kube-state-metrics-6b6ffbfdd6-rn2q4           1/1     Running   0               4d19h
prometheus-prometheus-kube-prometheus-prometheus-0       3/3     Running   0               4d19h
prometheus-prometheus-kube-prometheus-prometheus-1       3/3     Running   0               4d19h
prometheus-prometheus-node-exporter-zrz86                1/1     Running   0               4d19h

Deploying Thanos on cluster-observer

bitnami-thanos chart: https://github.com/bitnami/charts/tree/main/bitnami/thanos

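Thanos is a highly available metrics system that can be added on top of existing Prometheus deployments to provide a global query view across all Prometheus installations. This chart lets you install several Thanos components, so you can deploy an architecture like the one below: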

                       +--------------+                  +--------------+      +--------------+
                       | Thanos       |----------------> | Thanos Store |      | Thanos       |
                       | Query        |           |      | Gateway      |      | Compactor    |
                       +--------------+           |      +--------------+      +--------------+
                   push                           |             |                     |
+--------------+   alerts   +--------------+      |             | storages            | Downsample &
| Alertmanager | <----------| Thanos       | <----|             | query metrics       | compact blocks
| (*)          |            | Ruler        |      |             |                     |
+--------------+            +--------------+      |             \/                    |
      ^                            |              |      +----------------+           |
      | push alerts                +--------------|----> | MinIO® (*)     | <---------+
      |                                           |      |                |
+------------------------------+                  |      +----------------+
|+------------+  +------------+|                  |             ^
|| Prometheus |->| Thanos     || <----------------+             |
|| (*)        |<-| Sidecar (*)||    query                       | inspect
|+------------+  +------------+|    metrics                     | blocks
+------------------------------+                                |
                                                         +--------------+
                                                         | Thanos       |
                                                         | Bucket Web   |
                                                         +--------------+

Add the bitnami Helm repository (for bitnami-thanos):

helm repo add bitnami https://charts.bitnami.com/bitnami

Create the bitnami-thanos values.yaml file:

$ cat values.yaml
objstoreConfig: |-
  type: s3
  config:
    bucket: thanos
    endpoint: thanos-minio.thanos:9000
    access_key: admin
    secret_key: minio123
    insecure: true
query:
  enabled: true
  replicaCount: 3
  replicaLabel: prometheus_replica
  stores:
    - "192.168.72.33:30901"
    - "192.168.72.40:30901"
    - "192.168.72.41:30901"
queryFrontend:
  enabled: true
  service:
    type: NodePort
bucketweb:
  enabled: true
  service:
    type: NodePort
compactor:
  enabled: true
  persistence:
    enabled: true
storegateway:
  enabled: true
  persistence:
    enabled: true
metrics:
  enabled: true
  serviceMonitor:
    enabled: true
receive:
  enabled: false
ruler:
  enabled: true
  replicaLabel: prometheus_replica
  serviceMonitor:
    enabled: true
  alertmanagers:
    - http://alertmanager-operated.monitoring:9093
  config:
    groups:
      - name: "metamonitoring"
        rules:
          - alert: "PrometheusDown"
            expr: absent(up{prometheus="monitoring/prometheus-kube-prometheus-prometheus"})
  persistence:
    enabled: true
minio:
  enabled: true
  auth:
    rootUser: admin
    rootPassword: "minio123"
  defaultBuckets: "thanos"
  service:
    type: NodePort
    nodePorts:
      api: "32000"
      console: "32001"

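Configuration parameters:

  • objstoreConfig: object storage configuration used by the Thanos components that read from or write to the bucket (Store Gateway, Compactor, Ruler, Bucket Web), so that historical data can be queried
  • query.replicaLabel: must match the replica label generated by kube-prometheus-stack (prometheus_replica)
  • query.stores: Thanos Query connects to the Thanos sidecar Service of each cluster, which has been exposed via NodePort
  • receive.enabled: this deployment uses Sidecar mode, so Receive mode is disabled (the default)
  • ruler.enabled: optional. Thanos Ruler can evaluate global alerting rules; unless you have specific global alerting needs, it is recommended to alert close to the data, i.e. to configure alerts on the Alertmanager deployed in each cluster
  • minio.enabled: enables the MinIO object storage bundled with the chart; in production, a separately deployed MinIO cluster or cloud object storage is recommended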
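The sidecars and the in-cluster Thanos components use the same MinIO bucket through different addresses: the sidecars in cluster-A and cluster-B reach MinIO via its NodePort (192.168.72.33:32000), while components inside cluster-observer use the Service `thanos-minio.thanos:9000`. A sketch that renders such an objstore.yml for a given endpoint, for example when creating the `thanos-objstore` Secret manually instead of through `prometheus.extraSecret`; the function name is hypothetical.

```shell
# Hypothetical helper: print an objstore.yml for the "thanos" MinIO bucket (or the
# bucket given as the 4th argument). Arguments: endpoint, access key, secret key.
render_objstore() {
  local endpoint="$1" access_key="$2" secret_key="$3" bucket="${4:-thanos}"
  cat <<EOF
type: S3
config:
  bucket: "$bucket"
  endpoint: "$endpoint"
  access_key: "$access_key"
  secret_key: "$secret_key"
  insecure: true
EOF
}

# Usage in cluster-A/B, creating the Secret by hand (key name objstore.yml):
#   render_objstore 192.168.72.33:32000 admin minio123 > objstore.yml
#   kubectl -n monitoring create secret generic thanos-objstore --from-file=objstore.yml
```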

Deploy the Thanos components:

helm install thanos bitnami/thanos \
  -n thanos --create-namespace -f values.yaml

Check the created pods:

root@node33:~# kubectl -n thanos get pods
NAME                                                     READY   STATUS    RESTARTS        AGE
thanos-bucketweb-8575fff6d9-9sxg2                        1/1     Running   0               4d19h
thanos-compactor-5cb6c57664-lgw2h                        1/1     Running   0               4d19h
thanos-minio-d8bb8598b-8kcl8                             1/1     Running   0               4d19h
thanos-query-75576bcd78-pjglm                            1/1     Running   0               14m
thanos-query-frontend-758f54d944-h6lgb                   1/1     Running   0               4d19h
thanos-storegateway-0                                    1/1     Running   1 (4d19h ago)   4d19h

Using Thanos in place of Prometheus

Running PromQL queries in the web UI

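Thanos Query provides a Prometheus-like interface. It can be accessed via port forwarding or an Ingress; in this setup the thanos-query-frontend Service is exposed as a NodePort: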

root@ubuntu:~# kubectl -n thanos get svc thanos-query-frontend
NAME                    TYPE       CLUSTER-IP    EXTERNAL-IP   PORT(S)          AGE
thanos-query-frontend   NodePort   10.96.0.117   <none>        9090:31264/TCP   4d20h

Open it in a browser:

http://192.168.72.33:31264

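PromQL queries can be run just as in Prometheus:
[Figure: PromQL query in the Thanos Query UI]
Alerts, targets and rules from the different Prometheus instances are also accessible:
[Figure: alerts, targets and rules from the different Prometheus instances]
The configured StoreAPI endpoints can also be listed in Thanos Query:
[Figures: StoreAPI endpoints listed in Thanos Query]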

Thanos as a Grafana data source

Thanos is most commonly used together with Grafana. Thanos Query exposes the same API as Prometheus, so you only need to add a Prometheus-type data source in Grafana that points to Thanos Query or thanos-query-frontend.
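Because the API is Prometheus-compatible, the query frontend can also be queried directly over HTTP, which is a quick way to confirm that all three clusters are visible before configuring Grafana. A sketch, assuming `curl` is available and using the query-frontend NodePort 31264 shown earlier; the helper names are hypothetical and the encoder only handles single-line ASCII PromQL.

```shell
# Hypothetical helper: percent-encode a single-line ASCII string, keeping the
# RFC 3986 unreserved characters (letters, digits, "-", ".", "_", "~") as-is.
urlencode() {
  printf '%s' "$1" | LC_ALL=C awk '
    BEGIN { for (i = 1; i < 256; i++) ord[sprintf("%c", i)] = i }
    {
      out = ""
      for (i = 1; i <= length($0); i++) {
        c = substr($0, i, 1)
        if (c ~ /[A-Za-z0-9._~-]/) out = out c
        else out = out sprintf("%%%02X", ord[c])
      }
      printf "%s", out
    }'
}

# Build an instant-query URL for the Prometheus-compatible /api/v1/query endpoint.
thanos_query_url() {
  printf '%s/api/v1/query?query=%s\n' "${1%/}" "$(urlencode "$2")"
}

# Usage:
#   curl -fsS "$(thanos_query_url http://192.168.72.33:31264 'count by (cluster) (up)')"
```

curl can do the encoding itself with `curl -G --data-urlencode 'query=...'`; the helper above just makes the resulting URL visible, for example to paste into a browser.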

Check the NodePort of the Grafana Service:

root@ubuntu:~# kubectl -n monitoring get svc prometheus-grafana 
NAME                 TYPE       CLUSTER-IP    EXTERNAL-IP   PORT(S)        AGE
prometheus-grafana   NodePort   10.96.0.167   <none>        80:30495/TCP   161m

Log in to Grafana; the default username and password are admin/prom-operator:

http://192.168.72.33:30495

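The configured data source is http://thanos-query-frontend.thanos:9090:
[Figure: Grafana data source settings]
In the dashboards, you can switch between clusters to view their metrics:
[Figure: Grafana dashboard with a cluster selector]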
