Multi-cluster monitoring and alerting with Prometheus + Thanos

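Thanos components:

  • Sidecar: runs alongside Prometheus and exposes its data to the query gateway (Querier/Query) for real-time queries. It can also upload Prometheus data to cloud object storage for long-term retention.

  • Query gateway (Querier/Query): implements the Prometheus API and aggregates data from the underlying components (such as the Sidecar or the Store Gateway).

  • Store Gateway: exposes the data held in cloud object storage.

  • Compactor: compacts and downsamples the data in cloud object storage and enforces the retention period. Compaction here does not reduce disk usage; it actually uses more space. Its main purpose is to speed up queries over long time ranges. It also does not delete data from object storage right away: expired data is only marked for deletion, and the mark is stored in a deletion-mark.json file.

  • Receiver: receives data from Prometheus's remote-write WAL (write-ahead log), then exposes it or uploads it to cloud object storage.

  • Ruler (Ruler/Rule): evaluates rules against the data and fires alerts.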
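The Compactor is described above but is not deployed in this walkthrough. As a rough illustration of the retention and downsampling settings it controls, the container fragment below is a minimal sketch only. The container name, data directory, and all retention values are assumptions chosen for illustration; the thanos-store Secret it reads is the one created later in this article.

```yaml
# Hypothetical container fragment for a Compactor; not part of this deployment.
containers:
- name: thanos-compact                  # assumed name
  image: quay.io/thanos/thanos:v0.15.0
  args:
  - compact
  - --wait                              # keep running instead of exiting after one pass
  - --data-dir=/var/thanos/compact      # assumed scratch directory
  - --objstore.config=$(OBJSTORE_CONFIG)
  - --retention.resolution-raw=30d      # assumed retention for raw samples
  - --retention.resolution-5m=90d       # assumed retention for 5m downsampled data
  - --retention.resolution-1h=1y        # assumed retention for 1h downsampled data
  - --delete-delay=48h                  # marked blocks are removed only after this delay
  env:
  - name: OBJSTORE_CONFIG
    valueFrom:
      secretKeyRef:
        key: thanos-store-minio.yaml
        name: thanos-store
```

Only one Compactor should run against a given bucket at a time, so a fragment like this would normally live in a single-replica workload.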

The architecture of this deployment looks roughly like this:

[Figure: deployment architecture diagram, d57ea097-6917-eb11-8da9-e4434bdf6706.jpeg]

Installed components
Cluster A    Prometheus-operator, Prometheus, alertmanager, grafana, kube-state-metrics, node-exporter, prometheus-adapter, Query, Ruler, Store Gateway, Minio
Cluster B    Prometheus-operator, Prometheus, kube-state-metrics, node-exporter, prometheus-adapter

If resources allow, AlertManager, Grafana, Ruler, Query, Store Gateway, and Minio can be installed in a dedicated monitoring cluster. Resources are limited here, so they are installed in cluster A.

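Cluster A deployment

Object storage

Thanos currently supports the object storage services of most cloud vendors; see the Thanos object storage documentation for details. Here minio is used in place of S3 object storage.

minio is installed with helm; see the official documentation for the installation steps. After installing, log in to minio and create a bucket named thanos.

Create a storage secret in every cluster. Create the file thanos-store-secret.yaml: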

apiVersion: v1
data: {}
kind: Secret
metadata:
  name: thanos-store
  namespace: monitoring
type: Opaque
stringData:
  thanos-store-minio.yaml: |-
    type: s3
    config:
      bucket: thanos
      endpoint: minio.default.svc.cluster.local:9000
      insecure: true
      access_key: YOURACCESSKEY
      secret_key: YOURSECRETKEY

Prometheus-operator

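Install using kube-prometheus, as described in the earlier article "prometheus-operator installation".

The file that needs to be modified is prometheus-prometheus.yaml.

Add the following under spec:

...
  externalLabels:
    cluster: cluster-a   # queried data will carry the cluster label, used to tell clusters apart
  thanos:
    image: quay.io/thanos/thanos:v0.15.0
    version: v0.15.0
    objectStorageConfig:
      key: thanos-store-minio.yaml
      name: thanos-store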

Remove the following:

  alerting:
    alertmanagers:
    - name: alertmanager-main
      namespace: monitoring
      port: web

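Prometheus no longer needs to talk to AlertManager directly, because alerting is handled by the Ruler component later on.

Create the resources above and the Thanos Sidecar is deployed automatically.

To configure AlertManager email alerts or integrate WeChat Work alerts, see the earlier article "prometheus-operator: adding custom alerts".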

Store Gateway

Edit the file thanos-store-gateway-statefulSet.yaml to create the store-gateway component:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app.kubernetes.io/component: object-store-gateway
    app.kubernetes.io/instance: thanos-store
    app.kubernetes.io/name: thanos-store
    app.kubernetes.io/version: v0.15.0
  name: thanos-store
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: object-store-gateway
      app.kubernetes.io/instance: thanos-store
      app.kubernetes.io/name: thanos-store
  serviceName: thanos-store-gateway   # matches the headless Service created below
  template:
    metadata:
      labels:
        app.kubernetes.io/component: object-store-gateway
        app.kubernetes.io/instance: thanos-store
        app.kubernetes.io/name: thanos-store
        app.kubernetes.io/version: v0.15.0
    spec:
      containers:
      - args:
        - store
        - --log.level=info
        - --data-dir=/var/thanos/store
        - --grpc-address=0.0.0.0:10901
        - --http-address=0.0.0.0:10902
        - --objstore.config=$(OBJSTORE_CONFIG)
        env:
        - name: OBJSTORE_CONFIG
          valueFrom:
            secretKeyRef:
              key: thanos-store-minio.yaml
              name: thanos-store
        image: quay.io/thanos/thanos:v0.15.0
        livenessProbe:
          failureThreshold: 8
          httpGet:
            path: /-/healthy
            port: 10902
            scheme: HTTP
          periodSeconds: 30
        name: thanos-store-gateway
        resources:
          limits:
            cpu: 500m
            memory: 2000Mi
          requests:
            cpu: 100m
            memory: 256Mi
        ports:
        - containerPort: 10901
          name: grpc
        - containerPort: 10902
          name: http
        readinessProbe:
          failureThreshold: 20
          httpGet:
            path: /-/ready
            port: 10902
            scheme: HTTP
          periodSeconds: 5
        terminationMessagePolicy: FallbackToLogsOnError
        volumeMounts:
        - mountPath: /var/thanos/store
          name: data
          readOnly: false
      terminationGracePeriodSeconds: 120
      volumes: []
  volumeClaimTemplates:
  - metadata:
      labels:
        app.kubernetes.io/component: object-store-gateway
        app.kubernetes.io/instance: thanos-store
        app.kubernetes.io/name: thanos-store
      name: data
      annotations:
        volume.beta.kubernetes.io/storage-class: nfs-storage
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi

Edit the file thanos-store-gateway-service.yaml to create the store-gateway Service, which gives the query component an endpoint to query:

apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: object-store-gateway
    app.kubernetes.io/instance: thanos-store
    app.kubernetes.io/name: thanos-store
    app.kubernetes.io/version: v0.15.0
  name: thanos-store-gateway
  namespace: monitoring
spec:
  clusterIP: None
  ports:
  - name: grpc
    port: 10901
    targetPort: 10901
  - name: http
    port: 10902
    targetPort: 10902
  selector:
    app.kubernetes.io/component: object-store-gateway
    app.kubernetes.io/instance: thanos-store
    app.kubernetes.io/name: thanos-store

Edit the file thanos-sidecar-service.yaml to expose the Prometheus data to the query component through the sidecar:

apiVersion: v1
kind: Service
metadata:
  name: thanos-sidecar
  namespace: monitoring
  labels:
    app: thanos-sidecar
spec:
  clusterIP: None
  selector:
    prometheus: k8s
  ports:
  - name: grpc
    port: 10901
    targetPort: grpc

Edit the file thanos-query.yaml to create the Query component:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/component: query-layer
    app.kubernetes.io/instance: thanos-query
    app.kubernetes.io/name: thanos-query
    app.kubernetes.io/version: v0.15.0
  name: thanos-query
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: query-layer
      app.kubernetes.io/instance: thanos-query
      app.kubernetes.io/name: thanos-query
  template:
    metadata:
      labels:
        app.kubernetes.io/component: query-layer
        app.kubernetes.io/instance: thanos-query
        app.kubernetes.io/name: thanos-query
        app.kubernetes.io/version: v0.15.0
    spec:
      containers:
      - args:
        - query
        - --log.level=info
        - --grpc-address=0.0.0.0:10901
        - --http-address=0.0.0.0:9090
        - --query.replica-label=prometheus_replica
        - --store=dnssrv+_grpc._tcp.thanos-sidecar.monitoring.svc.cluster.local # the Service exposed by the sidecar
        - --store=dnssrv+_grpc._tcp.thanos-store-gateway.monitoring.svc.cluster.local # the Service exposed by the store-gateway
        image: quay.io/thanos/thanos:v0.15.0
        livenessProbe:
          failureThreshold: 4
          httpGet:
            path: /-/healthy
            port: 9090
            scheme: HTTP
          periodSeconds: 30
        name: thanos-query
        env:
        - name: TZ
          value: Asia/Shanghai
        ports:
        - containerPort: 10901
          name: grpc
        - containerPort: 9090
          name: http
        readinessProbe:
          failureThreshold: 20
          httpGet:
            path: /-/ready
            port: 9090
            scheme: HTTP
          periodSeconds: 5
        terminationMessagePolicy: FallbackToLogsOnError
      terminationGracePeriodSeconds: 120
---
apiVersion: v1
kind: Service
metadata:
  name: thanos-query
  namespace: monitoring
  labels:
    app: thanos-query
spec:
  selector:
    app.kubernetes.io/instance: thanos-query
  ports:
  - name: http
    port: 9090
    targetPort: http
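Because Query implements the Prometheus API, Grafana can use the thanos-query Service as an ordinary Prometheus data source and see the data of every connected cluster. The fragment below is a minimal sketch in Grafana's data source provisioning format. The data source name is an assumption, and how the file is wired into your Grafana installation (for example through the data source Secret that kube-prometheus generates) depends on your setup.

```yaml
# Sketch of a Grafana provisioning file; adjust to how your Grafana loads data sources.
apiVersion: 1
datasources:
- name: thanos-query          # assumed display name
  type: prometheus            # Query speaks the Prometheus HTTP API
  access: proxy
  url: http://thanos-query.monitoring.svc:9090   # the thanos-query Service defined above
  isDefault: false
```

Dashboards can then filter or group by the cluster label that each Prometheus attaches through externalLabels.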

Edit the file thanos-alertmanagerConfig.yaml to create the alertmanager connection settings:

apiVersion: v1
data: {}
kind: Secret
metadata:
  name: thanos-alert-config
  namespace: monitoring
type: Opaque
stringData:
  thanos-alert-config.yaml: |-
    alertmanagers:
    - static_configs:
      - alertmanager-main.monitoring.svc:9093
      scheme: http
      path_prefix: "/"
      timeout: 10s
      api_version: v1

Edit the file thanos-ruler.yaml to create the ruler component:

apiVersion: monitoring.coreos.com/v1
kind: ThanosRuler
metadata:
  name: thanos-ruler
  namespace: monitoring
  labels:
    app: thanos-ruler
spec:
  image: quay.io/thanos/thanos:v0.15.0
  alertmanagersConfig:
    key: thanos-alert-config.yaml
    name: thanos-alert-config
  replicas: 1
  resources:
    requests:
      memory: 200Mi
    limits:
      memory: 1000Mi
  ruleSelector:
    matchLabels:
      prometheus: k8s
      role: alert-rules
  queryEndpoints:
  - dnssrv+_http._tcp.thanos-query.monitoring.svc.cluster.local
  objectStorageConfig:
    key: thanos-store-minio.yaml
    name: thanos-store
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: nfs-storage
        resources:
          requests:
            storage: 2Gi

Create all of the resources above.
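The ThanosRuler above selects rules labeled prometheus: k8s and role: alert-rules, and evaluates them against thanos-query, so a single rule can cover every cluster. The PrometheusRule below is a minimal sketch of such a rule. The resource name, group name, alert name, threshold duration, and message text are all assumptions for illustration.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-multicluster-rules   # hypothetical name
  namespace: monitoring
  labels:
    prometheus: k8s                  # must match the ThanosRuler ruleSelector
    role: alert-rules
spec:
  groups:
  - name: example.rules              # hypothetical group
    rules:
    - alert: ExampleTargetDown       # hypothetical alert
      expr: up == 0
      for: 5m
      labels:
        severity: warning
      annotations:
        # The cluster label comes from each Prometheus's externalLabels.
        message: 'Target {{ $labels.instance }} in cluster {{ $labels.cluster }} is down.'
```

Including the cluster label in the annotation makes it clear in the notification which cluster the alert came from.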

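Cluster B deployment

Prometheus-operator
  1. Edit prometheus-prometheus.yaml following the steps for cluster A, giving the cluster external label a value that identifies this cluster (for example cluster-b).

  2. Install as described in the earlier article "prometheus-operator installation". Only kube-state-metrics, node-exporter, prometheus, prometheus-adapter, and prometheus-operator need to be installed.

  3. Expose the sidecar endpoint following the steps for cluster A, but use a NodePort Service.

  4. Add the sidecar endpoint to thanos-query.yaml in cluster A.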
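Steps 3 and 4 can be sketched as follows. This is only an illustration under stated assumptions: the Service name, the nodePort value 30901, and the <cluster-b-node-ip> placeholder are not from the original setup and must be replaced with values valid in your environment.

```yaml
# Cluster B: expose the sidecar gRPC port through a NodePort Service.
apiVersion: v1
kind: Service
metadata:
  name: thanos-sidecar-nodeport   # hypothetical name
  namespace: monitoring
  labels:
    app: thanos-sidecar
spec:
  type: NodePort
  selector:
    prometheus: k8s
  ports:
  - name: grpc
    port: 10901
    targetPort: grpc
    nodePort: 30901               # assumed; any free port in the cluster's NodePort range
---
# Cluster A: excerpt of the thanos-query container args in thanos-query.yaml,
# with one extra --store entry pointing at cluster B's node address.
args:
- query
- --store=dnssrv+_grpc._tcp.thanos-sidecar.monitoring.svc.cluster.local
- --store=dnssrv+_grpc._tcp.thanos-store-gateway.monitoring.svc.cluster.local
- --store=<cluster-b-node-ip>:30901   # placeholder address reachable from cluster A
```

The node address must be reachable from cluster A on the chosen port; the remaining Query arguments stay as they are in the original manifest.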

This completes the multi-cluster monitoring setup based on Thanos and Prometheus-operator. To connect additional clusters, follow the same steps as for cluster B.

References

https://github.com/prometheus-operator/prometheus-operator/tree/master/example/thanos

https://github.com/thanos-io/kube-thanos

https://medium.com/@mail2ramunakerikanti/thanos-for-prometheus-f7f111e3cb75
