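Thanos architecture in detail
Thanos is one of the high-availability solutions for Prometheus. It integrates seamlessly with Prometheus and provides several advanced features, covering four needs: long-term storage, unlimited scalability, a global query view, and non-invasiveness.
The diagram shows several of Thanos's core components, though not all of them. A brief introduction to each:
Thanos Sidecar: connects to Prometheus, exposes its data to Thanos Query for querying, and/or uploads it to object storage for long-term retention.
Thanos Query: implements the Prometheus API and provides a global query view. It aggregates the data served by the StoreAPI endpoints and returns the result to the querying client (such as Grafana).
Thanos Store Gateway: exposes the data held in object storage to Thanos Query.
Thanos Ruler: evaluates recording and alerting rules against the monitoring data. The new series it computes can be exposed to Thanos Query and/or uploaded to object storage for long-term retention.
Thanos Compact: compacts and downsamples the data in object storage, which speeds up queries over long time ranges.
Thanos Receiver: receives data sent by Prometheus remote write (which Prometheus feeds from its WAL), exposes it, and/or uploads it to cloud storage.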
Environment plan
Host | Components |
Prometheus01 | Prometheus,alertmanager,thanos_sidecar,thanos_rule,thanos_query,thanos_compact,thanos_store |
Prometheus02 | Prometheus,alertmanager,thanos_sidecar,thanos_rule,thanos_query,thanos_compact,thanos_store |
Prometheus03 | Prometheus,alertmanager,thanos_sidecar,thanos_rule,thanos_query,thanos_compact,thanos_store |
Prometheus deployment
Prometheus configuration file (prometheus01)
[root@prometheus01 prometheus-2.36.1]# vi prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: sincercloud_iaas
    replica: '0'  # used for deduplication
alerting:
  alertmanagers:
    - follow_redirects: true
      enable_http2: true
      scheme: http
      timeout: 10s
      api_version: v2
      static_configs:
        - targets:
            - 10.250.38.201:9093
            - 10.250.38.202:9093
            - 10.250.38.203:9093
          labels:
            env: dev_sit
Prometheus configuration file (prometheus02)
[root@prometheus02 prometheus-2.36.1]# vi prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: sincercloud_iaas
    replica: '1'  # used for deduplication
alerting:
  alertmanagers:
    - follow_redirects: true
      enable_http2: true
      scheme: http
      timeout: 10s
      api_version: v2
      static_configs:
        - targets:
            - 10.250.38.201:9093
            - 10.250.38.202:9093
            - 10.250.38.203:9093
          labels:
            env: dev_sit
Prometheus configuration file (prometheus03)
[root@prometheus03 prometheus-2.36.1]# vi prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: sincercloud_iaas
    replica: '2'  # used for deduplication
alerting:
  alertmanagers:
    - follow_redirects: true
      enable_http2: true
      scheme: http
      timeout: 10s
      api_version: v2
      static_configs:
        - targets:
            - 10.250.38.201:9093
            - 10.250.38.202:9093
            - 10.250.38.203:9093
          labels:
            env: dev_sit
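The three configuration files differ only in the value of the replica label, which is the label Thanos Query later uses to deduplicate series. One way to keep the files from drifting apart is to derive that value from the host name. The following is a minimal sketch, assuming host names end in a 1-based number as they do in this guide; the helper names replica_for_host and render_external_labels are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a host name to its replica index
# (prometheus01 -> 0, prometheus02 -> 1, prometheus03 -> 2).
replica_for_host() {
  local host="$1"
  local suffix="${host##*[!0-9]}"   # keep only the trailing digits, e.g. "01"
  [ -n "$suffix" ] || return 1      # no numeric suffix: refuse to guess
  echo $((10#$suffix - 1))          # force base 10 so "08" is not read as octal
}

# Print the external_labels stanza for the given host.
render_external_labels() {
  local replica
  replica="$(replica_for_host "$1")" || return 1
  printf "  external_labels:\n    cluster: sincercloud_iaas\n    replica: '%s'\n" "$replica"
}
```

For example, render_external_labels "$(hostname -s)" prints the stanza for the current node, which can then be pasted or templated into prometheus.yml.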
Prometheus service file (all nodes)
[root@prometheus03 prometheus-2.36.1]# cat /usr/lib/systemd/system/prometheus.service
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStartPre=/service/software/prometheus-2.36.1/promtool check config /service/software/prometheus-2.36.1/prometheus.yml
ExecStart=/service/software/prometheus-2.36.1/prometheus \
--config.file /service/software/prometheus-2.36.1/prometheus.yml \
--storage.tsdb.path /thanos-data/prometheus \
--storage.tsdb.min-block-duration=2h --storage.tsdb.max-block-duration=2h \
--storage.tsdb.retention.time=2h \
--web.console.templates=/service/software/prometheus-2.36.1/consoles \
--web.console.libraries=/service/software/prometheus-2.36.1/console_libraries \
--web.listen-address=:9090 \
--web.enable-lifecycle \
--web.enable-admin-api
ExecReload=/usr/bin/curl -XPOST http://127.0.0.1:9090/-/reload
[Install]
WantedBy=multi-user.target
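Setting --storage.tsdb.min-block-duration and --storage.tsdb.max-block-duration to the same value (2h here) turns off Prometheus's local compaction, which the sidecar relies on when it uploads blocks. As far as I know, the Thanos documentation also suggests keeping the Prometheus retention at no less than three times the minimum block duration (6h), so the 2h retention above may be worth revisiting. The sketch below checks the first condition against a unit file; the function names are hypothetical and the parsing only handles the --flag=value form used in this unit.

```shell
#!/usr/bin/env bash
# Print the value of the first "--<flag>=<value>" occurrence in a file.
flag_value() {
  local flag="$1" file="$2"
  grep -o -e "--${flag}=[^[:space:]]*" "$file" | head -n 1 | cut -d= -f2
}

# Succeed only if min and max block duration are both set and equal.
check_block_durations() {
  local file="$1" min max
  min="$(flag_value storage.tsdb.min-block-duration "$file")"
  max="$(flag_value storage.tsdb.max-block-duration "$file")"
  if [ -n "$min" ] && [ "$min" = "$max" ]; then
    echo "ok: local compaction disabled (min=max=${min})"
  else
    echo "warning: min-block-duration (${min}) != max-block-duration (${max})" >&2
    return 1
  fi
}
```

Usage: check_block_durations /usr/lib/systemd/system/prometheus.service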
Thanos_sidecar deployment
Thanos_sidecar service file (all nodes)
[root@prometheus03 prometheus-2.36.1]# cat /usr/lib/systemd/system/thanos_sidecar.service
[Unit]
Description=Thanos-Sidecar
Wants=network-online.target
After=network-online.target prometheus.service
[Service]
User=prometheus
Group=prometheus
Type=simple
Restart=on-failure
RestartSec=5s
ExecStart=/service/software/thanos-0.25.0_sidecar/thanos sidecar \
--tsdb.path /thanos-data/prometheus \
--grpc-address 0.0.0.0:10901 \
--http-address 0.0.0.0:10902 \
--reloader.config-file /service/software/prometheus-2.36.1/prometheus.yml \
--prometheus.url http://localhost:9090 \
--objstore.config-file /service/software/thanos-0.25.0_store/ceph-oss.yaml \
--log.level info \
--log.format json
[Install]
WantedBy=multi-user.target
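Because the sidecar only uploads blocks that Prometheus has finished writing, the first block normally reaches the object store about two hours after startup. A quick way to confirm that uploads are happening is to list the blocks in the bucket with the thanos tools bucket ls subcommand. This is a minimal sketch: the THANOS_BIN and OBJSTORE_CONFIG variables and the function name are hypothetical, with defaults taken from the paths in this guide.

```shell
#!/usr/bin/env bash
# Defaults follow the paths used in this guide; override via the environment.
THANOS_BIN="${THANOS_BIN:-/service/software/thanos-0.25.0_sidecar/thanos}"
OBJSTORE_CONFIG="${OBJSTORE_CONFIG:-/service/software/thanos-0.25.0_store/ceph-oss.yaml}"

# List the block IDs currently present in the object store.
list_uploaded_blocks() {
  "$THANOS_BIN" tools bucket ls --objstore.config-file="$OBJSTORE_CONFIG"
}
```

An empty listing shortly after startup is expected; if it stays empty after several hours, the sidecar log is the first place to look.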
Thanos_query deployment
Thanos_query service file (all nodes)
[root@prometheus03 prometheus-2.36.1]# cat /usr/lib/systemd/system/thanos_query.service
[Unit]
Description=Thanos-Query
Wants=network-online.target
After=network-online.target prometheus.service
[Service]
User=prometheus
Group=prometheus
Type=simple
Restart=on-failure
RestartSec=5s
ExecStart=/service/software/thanos-0.25.0_sidecar/thanos query \
--grpc-address 0.0.0.0:10951 \
--http-address 0.0.0.0:10952 \
--query.replica-label replica \
--query.replica-label rule_replica \
--query.replica-label receive_replica \
--query.auto-downsampling \
--store=10.250.38.201:10901 \
--store=10.250.38.202:10901 \
--store=10.250.38.203:10901 \
--store=10.250.38.201:10911 \
--store=10.250.38.202:10911 \
--store=10.250.38.203:10911 \
--store=10.250.38.201:10921 \
--store=10.250.38.202:10921 \
--store=10.250.38.203:10921 \
--log.level=info \
--log.format=json
[Install]
WantedBy=multi-user.target
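The --query.replica-label flags tell Thanos Query which labels distinguish replicas of the same data. With deduplication enabled, series that differ only in those labels are merged into one. The effect can be observed through the Prometheus-compatible HTTP API on port 10952, which accepts a dedup parameter. A minimal sketch, assuming curl is installed; the function name is hypothetical and the default address is the first node of this guide.

```shell
#!/usr/bin/env bash
# Run an instant query against Thanos Query.
#   $1 = PromQL expression
#   $2 = "true" or "false" for deduplication (default: true)
#   $3 = host:port of Thanos Query (default: first node in this guide)
thanos_instant_query() {
  local expr="$1" dedup="${2:-true}" addr="${3:-10.250.38.201:10952}"
  curl -sS -G "http://${addr}/api/v1/query" \
    --data-urlencode "query=${expr}" \
    --data-urlencode "dedup=${dedup}"
}
```

Running thanos_instant_query up true and thanos_instant_query up false side by side should show the difference: with dedup=false each target is expected to appear once per Prometheus replica, carrying its replica label.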
Thanos_rule deployment
Create directories (all hosts)
# mkdir -p /service/software/thanos-0.25.0_rule/rules /thanos-data/thanos_rule
Thanos_rule service file (all hosts)
[root@prometheus03 prometheus-2.36.1]# cat /usr/lib/systemd/system/thanos_rule.service
[Unit]
Description=Thanos-rule
Wants=network-online.target
After=network-online.target prometheus.service
[Service]
User=prometheus
Group=prometheus
Type=simple
Restart=on-failure
RestartSec=5s
# Set rule_replica to 0, 1 and 2 on the three hosts respectively so that their output can be told apart and duplicate alerts are avoided
ExecStart=/service/software/thanos-0.25.0_rule/thanos rule \
--data-dir "/thanos-data/thanos_rule" \
--http-address "0.0.0.0:10912" \
--grpc-address "0.0.0.0:10911" \
--eval-interval "30s" \
--rule-file "/service/software/thanos-0.25.0_rule/rules/*rules.yaml" \
--alert.query-url "http://0.0.0.0:9090" \
--alertmanagers.url "http://10.250.38.201:9093" \
--alertmanagers.url "http://10.250.38.202:9093" \
--alertmanagers.url "http://10.250.38.203:9093" \
--query "10.250.38.201:10952" \
--query "10.250.38.202:10952" \
--objstore.config-file "/service/software/thanos-0.25.0_store/ceph-oss.yaml" \
--label 'rule_replica="2"' \
--alert.label-drop "rule_replica" \
--log.format json \
--log.level info
[Install]
WantedBy=multi-user.target
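The ruler loads every file matching the --rule-file glob, so a rule file must end in rules.yaml to be picked up. The sketch below writes a minimal rule file in the standard Prometheus rule format. The alert name, expression, duration and severity are placeholders chosen for illustration and are not part of the original setup; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Write a minimal example rule file into the given directory.
# The file name ends in "rules.yaml" so it matches the --rule-file glob.
write_example_rules() {
  local dir="$1"
  cat > "${dir}/example_rules.yaml" <<'EOF'
groups:
  - name: example
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.instance }} is down"
EOF
}
```

After running write_example_rules /service/software/thanos-0.25.0_rule/rules, the file can be validated before restarting the service with /service/software/thanos-0.25.0_rule/thanos tools rules-check --rules '/service/software/thanos-0.25.0_rule/rules/*rules.yaml'.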
Thanos_store deployment
Create directories (all nodes)
# mkdir -p /thanos-data/thanos_store /thanos-data/ceph-oss
Configuration file (all nodes)
[root@prometheus03 prometheus-2.36.1]# cat /service/software/thanos-0.25.0_store/ceph-oss.yaml
type: FILESYSTEM
config:
  directory: "/thanos-data/ceph-oss"
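Although the file is named ceph-oss.yaml, the FILESYSTEM type stores blocks in a local directory. Unless /thanos-data/ceph-oss is a mount shared by all three nodes, each node would only see the blocks it wrote itself. If the backend is in fact a Ceph object gateway, which speaks the S3 protocol, an S3-type configuration may be a better fit. A minimal sketch follows; the bucket name, endpoint and credentials are placeholders, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Write an S3-type objstore config to the given path.
# All values below are placeholders and must be replaced.
write_s3_objstore_config() {
  local out="$1"
  cat > "$out" <<'EOF'
type: S3
config:
  bucket: "thanos"
  endpoint: "ceph-rgw.example.internal:7480"
  access_key: "REPLACE_ME"
  secret_key: "REPLACE_ME"
  insecure: true
EOF
}
```

The sidecar, ruler, store gateway and compactor all read the same file through --objstore.config-file, so switching the backend means changing only this one file on each node.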
Thanos_store service file
[root@prometheus03 prometheus-2.36.1]# cat /usr/lib/systemd/system/thanos_store.service
[Unit]
Description=Thanos-Store
Wants=network-online.target
After=network-online.target prometheus.service
[Service]
User=prometheus
Group=prometheus
Type=simple
LimitNOFILE=262144
Restart=on-failure
RestartSec=5s
ExecStart=/service/software/thanos-0.25.0_store/thanos store \
--grpc-address 0.0.0.0:10921 \
--http-address 0.0.0.0:10922 \
--data-dir=/thanos-data/thanos_store \
--objstore.config-file=/service/software/thanos-0.25.0_store/ceph-oss.yaml \
--log.level=info \
--log.format=json
[Install]
WantedBy=multi-user.target
Thanos_compact deployment
Thanos_compact service file (all nodes)
[root@prometheus03 prometheus-2.36.1]# cat /usr/lib/systemd/system/thanos_compact.service
[Unit]
Description=Thanos-Compact
Wants=network-online.target
After=network-online.target prometheus.service
[Service]
User=root
Group=root
Type=simple
Restart=on-failure
RestartSec=5s
ExecStart=/service/software/thanos-0.25.0_compact/thanos compact \
--wait --http-address 0.0.0.0:10932 \
--debug.accept-malformed-index \
--data-dir=/thanos-data/thanos_compact \
--objstore.config-file=/service/software/thanos-0.25.0_store/ceph-oss.yaml \
--retention.resolution-raw=90d \
--retention.resolution-5m=180d \
--retention.resolution-1h=360d
[Install]
WantedBy=multi-user.target
alertmanager deployment
Omitted.
Start the services
# systemctl start prometheus
# systemctl start thanos_sidecar thanos_query thanos_rule thanos_store thanos_compact
# systemctl start alertmanager
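Once the services are started, each component's HTTP port can be probed to confirm it is up. Prometheus and the Thanos components expose a /-/ready endpoint. The following is a minimal sketch, assuming curl is installed; the port list mirrors the unit files in this guide and the names COMPONENT_PORTS and check_ready are hypothetical.

```shell
#!/usr/bin/env bash
# HTTP ports used in this guide, one "name:port" pair per component.
COMPONENT_PORTS="prometheus:9090 thanos_sidecar:10902 thanos_rule:10912 thanos_store:10922 thanos_compact:10932 thanos_query:10952"

# Probe /-/ready on every component of one host.
# Returns non-zero if at least one component is not ready.
check_ready() {
  local host="$1" pair name port failed=0
  for pair in $COMPONENT_PORTS; do
    name="${pair%%:*}"
    port="${pair##*:}"
    if curl -sf -o /dev/null --max-time 3 "http://${host}:${port}/-/ready"; then
      echo "${name} ready"
    else
      echo "${name} NOT ready"
      failed=1
    fi
  done
  return "$failed"
}
```

For example, running check_ready against 10.250.38.201, 10.250.38.202 and 10.250.38.203 in turn covers all three nodes.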