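Thanos components:
Sidecar: connects to Prometheus and exposes its data to the query gateway (Querier/Query) for real-time queries. It can also upload Prometheus data to cloud object storage for long-term retention.
Query gateway (Querier/Query): implements the Prometheus API and aggregates data from the underlying components (such as Sidecar or Store Gateway).
Store Gateway: exposes the data held in cloud object storage.
Compactor: compacts and downsamples the data in cloud object storage and enforces the retention period. Compaction here does not mean shrinking the data to save disk space; it actually uses more space. The main purpose of this component is to speed up queries over long time ranges. It also does not delete data from object storage right away: expired blocks are first marked for deletion, and the mark is stored in a deletion-mark.json file.
Receiver: receives data from Prometheus's remote-write WAL (write-ahead log), then exposes it and/or uploads it to cloud object storage.
Ruler (Ruler/Rule): evaluates rules against the data and fires alerts.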
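To make the Compactor's retention and deferred-deletion behavior more concrete, here is a minimal sketch of what the arguments of a `thanos compact` container might look like. The Compactor is not deployed in this article; the container name, data directory, and every retention value below are assumptions chosen for illustration, and `OBJSTORE_CONFIG` is assumed to come from the same kind of secret that the Store Gateway uses later.

```yaml
# Hypothetical container fragment for a Compactor; all values are illustrative.
containers:
- name: thanos-compact                    # assumed name
  image: quay.io/thanos/thanos:v0.15.0
  args:
  - compact
  - --wait                                # keep running instead of exiting after one pass
  - --data-dir=/var/thanos/compact        # assumed scratch directory
  - --objstore.config=$(OBJSTORE_CONFIG)
  - --retention.resolution-raw=30d        # assumed retention for raw samples
  - --retention.resolution-5m=90d         # assumed retention for 5m downsampled data
  - --retention.resolution-1h=1y          # assumed retention for 1h downsampled data
  - --delete-delay=48h                    # marked blocks are physically removed only after this delay
  env:
  - name: OBJSTORE_CONFIG
    valueFrom:
      secretKeyRef:
        key: thanos-store-minio.yaml
        name: thanos-store
```

If you try something like this, run only one Compactor per bucket, since concurrent compactors working on the same blocks are not safe.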
The architecture of this deployment looks roughly as follows:
Cluster | Installed components |
---|---|
Cluster A | Prometheus-operator, Prometheus, alertmanager, grafana, kube-state-metrics, node-exporter, prometheus-adapter, Query, Ruler, Store Gateway, Minio |
Cluster B | Prometheus-operator, Prometheus, kube-state-metrics, node-exporter, prometheus-adapter |
If resources allow, AlertManager, Grafana, Ruler, Query, Store Gateway, and Minio can be installed in a dedicated monitoring cluster. Resources are limited here, so they are all installed in cluster A.
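Deploying cluster A
Object storage
Thanos currently supports the object storage services of most cloud vendors; see the Thanos object storage documentation for details. Here minio is used in place of S3 object storage.
minio is installed with helm here; see the official documentation for the installation steps. After installation, log in to minio and create a bucket named thanos.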
Create a storage secret in every cluster. Create the file thanos-store-secret.yaml:

    apiVersion: v1
    data: {}
    kind: Secret
    metadata:
      name: thanos-store
      namespace: monitoring
    type: Opaque
    stringData:
      thanos-store-minio.yaml: |-
        type: s3
        config:
          bucket: thanos
          endpoint: minio.default.svc.cluster.local:9000
          insecure: true
          access_key: YOURACCESSKEY
          secret_key: YOURSECRETKEY
Prometheus-operator
Install it with kube-prometheus, following the earlier article "prometheus-operator installation".
The file that needs to be modified is prometheus-prometheus.yaml.
Add the following:

    ...
    externalLabels:
      cluster: cluster-a  # queried data will carry this cluster label, which distinguishes the clusters
    thanos:
      image: quay.io/thanos/thanos:v0.15.0
      version: v0.15.0
      objectStorageConfig:
        key: thanos-store-minio.yaml
        name: thanos-store
Remove the following:

    alerting:
      alertmanagers:
      - name: alertmanager-main
        namespace: monitoring
        port: web
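This block is removed because Prometheus no longer needs to talk to AlertManager directly; alerting is handled by the Ruler component instead.
Create the resources above, and the operator deploys the Thanos Sidecar automatically as an extra container in the Prometheus pods.
To configure AlertManager email alerts or integrate WeCom (企业微信) alerts, see the earlier article "prometheus-operator: adding custom alerts".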
Store Gateway
Edit the file thanos-store-gateway-statefulSet.yaml to create the store-gateway component:

    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      labels:
        app.kubernetes.io/component: object-store-gateway
        app.kubernetes.io/instance: thanos-store
        app.kubernetes.io/name: thanos-store
        app.kubernetes.io/version: v0.15.0
      name: thanos-store
      namespace: monitoring
    spec:
      replicas: 1
      selector:
        matchLabels:
          app.kubernetes.io/component: object-store-gateway
          app.kubernetes.io/instance: thanos-store
          app.kubernetes.io/name: thanos-store
      # Governing Service name. The Service created in the next step is named
      # thanos-store-gateway; align the two if you need stable per-pod DNS names.
      serviceName: thanos-store
      template:
        metadata:
          labels:
            app.kubernetes.io/component: object-store-gateway
            app.kubernetes.io/instance: thanos-store
            app.kubernetes.io/name: thanos-store
            app.kubernetes.io/version: v0.15.0
        spec:
          containers:
          - args:
            - store
            - --log.level=info
            - --data-dir=/var/thanos/store
            - --grpc-address=0.0.0.0:10901
            - --http-address=0.0.0.0:10902
            - --objstore.config=$(OBJSTORE_CONFIG)
            env:
            - name: OBJSTORE_CONFIG
              valueFrom:
                secretKeyRef:
                  key: thanos-store-minio.yaml
                  name: thanos-store
            image: quay.io/thanos/thanos:v0.15.0
            livenessProbe:
              failureThreshold: 8
              httpGet:
                path: /-/healthy
                port: 10902
                scheme: HTTP
              periodSeconds: 30
            name: thanos-store-gateway
            resources:
              limits:
                cpu: 500m
                memory: 2000Mi
              requests:
                cpu: 100m
                memory: 256Mi
            ports:
            - containerPort: 10901
              name: grpc
            - containerPort: 10902
              name: http
            readinessProbe:
              failureThreshold: 20
              httpGet:
                path: /-/ready
                port: 10902
                scheme: HTTP
              periodSeconds: 5
            terminationMessagePolicy: FallbackToLogsOnError
            volumeMounts:
            - mountPath: /var/thanos/store
              name: data
              readOnly: false
          terminationGracePeriodSeconds: 120
          volumes: []
      volumeClaimTemplates:
      - metadata:
          labels:
            app.kubernetes.io/component: object-store-gateway
            app.kubernetes.io/instance: thanos-store
            app.kubernetes.io/name: thanos-store
          name: data
          annotations:
            volume.beta.kubernetes.io/storage-class: nfs-storage
        spec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi
Edit the file thanos-store-gateway-service.yaml to create the store-gateway Service, which gives the query component an endpoint to query:

    apiVersion: v1
    kind: Service
    metadata:
      labels:
        app.kubernetes.io/component: object-store-gateway
        app.kubernetes.io/instance: thanos-store
        app.kubernetes.io/name: thanos-store
        app.kubernetes.io/version: v0.15.0
      name: thanos-store-gateway
      namespace: monitoring
    spec:
      clusterIP: None
      ports:
      - name: grpc
        port: 10901
        targetPort: 10901
      - name: http
        port: 10902
        targetPort: 10902
      selector:
        app.kubernetes.io/component: object-store-gateway
        app.kubernetes.io/instance: thanos-store
        app.kubernetes.io/name: thanos-store
Edit the file thanos-sidecar-service.yaml to expose the Prometheus data to the query component through the sidecar:

    apiVersion: v1
    kind: Service
    metadata:
      name: thanos-sidecar
      namespace: monitoring
      labels:
        app: thanos-sidecar
    spec:
      clusterIP: None
      selector:
        prometheus: k8s
      ports:
      - name: grpc
        port: 10901
        targetPort: grpc
Edit the file thanos-query.yaml to create the Query component:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app.kubernetes.io/component: query-layer
        app.kubernetes.io/instance: thanos-query
        app.kubernetes.io/name: thanos-query
        app.kubernetes.io/version: v0.15.0
      name: thanos-query
      namespace: monitoring
    spec:
      replicas: 1
      selector:
        matchLabels:
          app.kubernetes.io/component: query-layer
          app.kubernetes.io/instance: thanos-query
          app.kubernetes.io/name: thanos-query
      template:
        metadata:
          labels:
            app.kubernetes.io/component: query-layer
            app.kubernetes.io/instance: thanos-query
            app.kubernetes.io/name: thanos-query
            app.kubernetes.io/version: v0.15.0
        spec:
          containers:
          - args:
            - query
            - --log.level=info
            - --grpc-address=0.0.0.0:10901
            - --http-address=0.0.0.0:9090
            - --query.replica-label=prometheus_replica
            - --store=dnssrv+_grpc._tcp.thanos-sidecar.monitoring.svc.cluster.local  # the Service exposed by the sidecar
            - --store=dnssrv+_grpc._tcp.thanos-store-gateway.monitoring.svc.cluster.local  # the Service exposed by the store gateway
            image: quay.io/thanos/thanos:v0.15.0
            livenessProbe:
              failureThreshold: 4
              httpGet:
                path: /-/healthy
                port: 9090
                scheme: HTTP
              periodSeconds: 30
            name: thanos-query
            env:
            - name: TZ
              value: Asia/Shanghai
            ports:
            - containerPort: 10901
              name: grpc
            - containerPort: 9090
              name: http
            readinessProbe:
              failureThreshold: 20
              httpGet:
                path: /-/ready
                port: 9090
                scheme: HTTP
              periodSeconds: 5
            terminationMessagePolicy: FallbackToLogsOnError
          terminationGracePeriodSeconds: 120
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: thanos-query
      namespace: monitoring
      labels:
        app: thanos-query
    spec:
      selector:
        app.kubernetes.io/instance: thanos-query
      ports:
      - name: http
        port: 9090
        targetPort: http
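Because Query implements the Prometheus API, Grafana can in principle treat it as an ordinary Prometheus data source and see the data of all clusters at once. The following is a minimal sketch in Grafana's data source provisioning format; the data source name is hypothetical, and how the file reaches Grafana (for example through the data source secret shipped with kube-prometheus) depends on your installation and is not shown.

```yaml
# Sketch of a Grafana provisioning file; the name "Thanos" is an assumption.
apiVersion: 1
datasources:
- name: Thanos
  type: prometheus                                # Query speaks the Prometheus HTTP API
  access: proxy
  url: http://thanos-query.monitoring.svc:9090    # the thanos-query Service defined above
  isDefault: false
```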
Edit the file thanos-alertmanagerConfig.yaml to create the alertmanager connection settings:

    apiVersion: v1
    data: {}
    kind: Secret
    metadata:
      name: thanos-alert-config
      namespace: monitoring
    type: Opaque
    stringData:
      thanos-alert-config.yaml: |-
        alertmanagers:
        - static_configs:
          - alertmanager-main.monitoring.svc:9093
          scheme: http
          path_prefix: "/"
          timeout: 10s
          api_version: v1
Edit the file thanos-ruler.yaml to create the ruler component:

    apiVersion: monitoring.coreos.com/v1
    kind: ThanosRuler
    metadata:
      name: thanos-ruler
      namespace: monitoring
      labels:
        app: thanos-ruler
    spec:
      image: quay.io/thanos/thanos:v0.15.0
      alertmanagersConfig:
        key: thanos-alert-config.yaml
        name: thanos-alert-config
      replicas: 1
      resources:
        requests:
          memory: 200Mi
        limits:
          memory: 1000Mi
      ruleSelector:
        matchLabels:
          prometheus: k8s
          role: alert-rules
      queryEndpoints:
      - dnssrv+_http._tcp.thanos-query.monitoring.svc.cluster.local
      objectStorageConfig:
        key: thanos-store-minio.yaml
        name: thanos-store
      storage:
        volumeClaimTemplate:
          spec:
            storageClassName: nfs-storage
            resources:
              requests:
                storage: 2Gi
Create all of the resources above to finish the deployment of cluster A.
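The Ruler only loads PrometheusRule objects whose labels match the ruleSelector of the ThanosRuler resource (prometheus: k8s and role: alert-rules). As an illustration, a rule that it would pick up might look like the sketch below. The object name, group name, alert name, and threshold are all hypothetical; the `cluster` label in the annotation is assumed to come from the externalLabels configured on each Prometheus.

```yaml
# Hypothetical rule; only the two metadata labels are dictated by the ruleSelector.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-cross-cluster-rules    # assumed name
  namespace: monitoring                # same namespace as the ThanosRuler
  labels:
    prometheus: k8s
    role: alert-rules
spec:
  groups:
  - name: example.rules
    rules:
    - alert: TargetDown
      expr: up == 0
      for: 5m
      labels:
        severity: warning
      annotations:
        message: 'Target {{ $labels.instance }} in cluster {{ $labels.cluster }} is down.'
```

Since the Ruler evaluates the expression through Query, a single rule like this covers every cluster whose sidecar is registered with Query.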
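Deploying cluster B
Prometheus-operator
Install following the earlier article "prometheus-operator installation"; only kube-state-metrics, node-exporter, prometheus, prometheus-adapter, and prometheus-operator need to be installed.
Edit prometheus-prometheus.yaml the same way as for cluster A.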
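For cluster B, the relevant part of prometheus-prometheus.yaml might look like the sketch below. It mirrors the cluster A settings; the label value cluster-b is an assumption, and any value works as long as it is unique per cluster.

```yaml
# Fragment of the Prometheus resource for cluster B (sketch).
spec:
  externalLabels:
    cluster: cluster-b                 # assumed value; must differ from cluster A
  thanos:
    image: quay.io/thanos/thanos:v0.15.0
    version: v0.15.0
    objectStorageConfig:
      key: thanos-store-minio.yaml
      name: thanos-store
```

Note that the endpoint inside the thanos-store secret created in cluster B has to be an address of minio that is reachable from cluster B; an in-cluster DNS name of cluster A, such as minio.default.svc.cluster.local, will not resolve there. The alerting block should be removed here as well, since cluster B runs no AlertManager.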
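Expose the sidecar endpoint the same way as in cluster A, but use a NodePort Service so that it is reachable from outside the cluster.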
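A NodePort variant of the sidecar Service could be sketched as follows. It is the cluster A Service with clusterIP: None dropped (a headless Service cannot be of type NodePort) and a fixed node port added; the port number 30901 is an assumption and only needs to fall inside the cluster's NodePort range.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: thanos-sidecar
  namespace: monitoring
  labels:
    app: thanos-sidecar
spec:
  type: NodePort
  selector:
    prometheus: k8s
  ports:
  - name: grpc
    port: 10901
    targetPort: grpc
    nodePort: 30901        # assumed port; pick any free port in the NodePort range
```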
Add the cluster B sidecar endpoint to thanos-query.yaml in cluster A.
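Concretely, this means adding one more --store argument to the Query container. In the sketch below the address is hypothetical: 192.0.2.10 stands in for a node IP of cluster B that cluster A can reach, and 30901 for whatever NodePort the sidecar Service in cluster B was given.

```yaml
# Fragment of the thanos-query container args in cluster A (sketch).
- --store=dnssrv+_grpc._tcp.thanos-sidecar.monitoring.svc.cluster.local        # cluster A sidecar
- --store=dnssrv+_grpc._tcp.thanos-store-gateway.monitoring.svc.cluster.local  # store gateway
- --store=192.0.2.10:30901                                                     # cluster B sidecar: hypothetical node IP and NodePort
```

After Query restarts, the new endpoint should appear on the Stores page of the Query UI, and series from cluster B can be told apart by their cluster label.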
That completes the multi-cluster monitoring setup based on thanos and Prometheus-operator. To bring in additional clusters, repeat the cluster B steps for each of them.
References
https://github.com/prometheus-operator/prometheus-operator/tree/master/example/thanos
https://github.com/thanos-io/kube-thanos
https://medium.com/@mail2ramunakerikanti/thanos-for-prometheus-f7f111e3cb75