Prometheus Thanos Deployment Guide

1. Official architecture diagram

2. Prerequisites

Kubernetes 1.22+ (recommended)
Helm 3.0+ (recommended)
MinIO latest (recommended)
kube-prometheus-stack latest (recommended)
thanos latest (recommended)

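3. Thanos components

Thanos installs the following components:

query        # serves queries (via the Prometheus sidecars and storegateway)
compactor    # compaction and downsampling
storegateway # lets query read data from the object store
sidecar      # already installed with kube-prometheus-stack; uploads data and serves query requests
ruler        # (optional)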

4. Clean up the environment

  • Remove any previously installed Prometheus;
  • Clean up related resources such as webhooks, CRDs, and Deployments;

4.1 Uninstall kube-prometheus-stack

helm uninstall kube-prometheus-stack

4.2 Manually delete the prometheus-operator Deployment

kubectl delete deployment -n kube-plugin promstack-kube-prometheus-operator

4.3 Delete the Prometheus-related MutatingWebhookConfiguration

[root@h07b13158.sqa.eu95 /root]
#kubectl get MutatingWebhookConfiguration -n kube-plugin
NAME                                    WEBHOOKS   AGE
cert-manager-webhook                    1          2d
chaosblade-operator                     1          44d
env-injector-webhook-cfg                1          88d
kruise-mutating-webhook-configuration   2          80d
promstack-kube-prometheus-admission     1          58d
rama-mutating-webhook                   1          88d
sidecar-inject-webhook                  1          42d
sidecar-operator-webhook                1          49d
tair-mutating-webhook-configuration     1          2d
thanos-kube-prometheus-sta-admission    1          91s

[root@h07b13158.sqa.eu95 /root]
#kubectl delete MutatingWebhookConfiguration promstack-kube-prometheus-admission -n kube-plugin
warning: deleting cluster-scoped resources, not scoped to the provided namespace
mutatingwebhookconfiguration.admissionregistration.k8s.io "promstack-kube-prometheus-admission" deleted

4.4 Delete the Prometheus-related ValidatingWebhookConfiguration

[root@h07b13158.sqa.eu95 /root]
#kubectl get ValidatingWebhookConfiguration -n kube-plugin
NAME                                      WEBHOOKS   AGE
cert-manager-webhook                      1          2d
galley                                    1          45d
kruise-validating-webhook-configuration   1          80d
promstack-kube-prometheus-admission       1          58d
rama-validating-webhook                   2          88d
tair-validating-webhook-configuration     1          2d

[root@h07b13158.sqa.eu95 /root]
#kubectl delete ValidatingWebhookConfiguration -n kube-plugin promstack-kube-prometheus-admission
warning: deleting cluster-scoped resources, not scoped to the provided namespace
validatingwebhookconfiguration.admissionregistration.k8s.io "promstack-kube-prometheus-admission" deleted
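
The listings in 4.3 and 4.4 mix webhook configurations from many components, so it helps to pick out only the ones that belong to the old release. A minimal sketch, assuming the old release prefix is promstack- as in the output above; filter_webhooks is a hypothetical helper, and the sample input is a shortened copy of that output so the snippet runs without a cluster:

```shell
#!/usr/bin/env bash
# Print the names of webhook configurations whose name starts with a prefix.
# Reads `kubectl get ...webhookconfiguration` table output on stdin.
filter_webhooks() {
  local prefix="$1"
  # NR > 1 skips the header row; index(...) == 1 means "starts with prefix".
  awk -v p="$prefix" 'NR > 1 && index($1, p) == 1 { print $1 }'
}

# Sample input shortened from the listing above (no cluster needed).
sample='NAME                                    WEBHOOKS   AGE
cert-manager-webhook                    1          2d
promstack-kube-prometheus-admission     1          58d
thanos-kube-prometheus-sta-admission    1          91s'

printf '%s\n' "$sample" | filter_webhooks promstack-
```

Against a live cluster, the output of kubectl get mutatingwebhookconfiguration (or validatingwebhookconfiguration) can be piped into the same function and reviewed before anything is deleted.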

4.5 If a thanos installation failed, uninstall it

#helm uninstall thanos
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /root/.kube/config
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /root/.kube/config
release "thanos" uninstalled

4.6 Manually delete the CRDs

kubectl delete crd alertmanagerconfigs.monitoring.coreos.com
kubectl delete crd alertmanagers.monitoring.coreos.com
kubectl delete crd podmonitors.monitoring.coreos.com
kubectl delete crd probes.monitoring.coreos.com
kubectl delete crd prometheuses.monitoring.coreos.com
kubectl delete crd prometheusrules.monitoring.coreos.com
kubectl delete crd servicemonitors.monitoring.coreos.com
kubectl delete crd thanosrulers.monitoring.coreos.com
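
The eight deletions above can also be driven from a single list. A minimal sketch, assuming a dry-run default so that nothing is deleted until the printed commands have been reviewed; DRY_RUN and delete_crds are hypothetical names introduced here:

```shell
#!/usr/bin/env bash
# CRDs installed by kube-prometheus-stack, as listed above.
crds=(alertmanagerconfigs alertmanagers podmonitors probes
      prometheuses prometheusrules servicemonitors thanosrulers)

# Print the delete command for each CRD (DRY_RUN=1, the default here),
# or run it when DRY_RUN=0.
delete_crds() {
  local name
  for name in "${crds[@]}"; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "kubectl delete crd --ignore-not-found ${name}.monitoring.coreos.com"
    else
      kubectl delete crd --ignore-not-found "${name}.monitoring.coreos.com"
    fi
  done
}

delete_crds
```

Running it with DRY_RUN=0 performs the deletions; --ignore-not-found keeps the loop going when a CRD is already gone.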

5. Installation steps

5.1 Install kube-prometheus-stack with Helm

Install directly from the Helm chart:

$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts

$ helm repo update

$ helm install my-kube-prometheus-stack prometheus-community/kube-prometheus-stack --version 39.7.0

After the Helm install finishes, two images may fail to download:

k8s.gcr.io/ingress-nginx/kube-webhook-certgen:v1.2.0
registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.5.0

# Replace them with the following images
docker.io/jettech/kube-webhook-certgen:v1.2.0
kubesphere/kube-state-metrics:v2.5.0
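
One way to apply the replacement images is to override them at install time rather than editing the workloads afterwards. A minimal sketch: the value paths below are assumptions about chart version 39.7.0 and should be confirmed with helm show values before use. The script only prints the command so it can be reviewed first.

```shell
#!/usr/bin/env bash
# Assumed value paths for kube-prometheus-stack 39.7.0; verify with
# `helm show values prometheus-community/kube-prometheus-stack --version 39.7.0`.
image_overrides=(
  --set prometheusOperator.admissionWebhooks.patch.image.repository=docker.io/jettech/kube-webhook-certgen
  --set prometheusOperator.admissionWebhooks.patch.image.tag=v1.2.0
  --set kube-state-metrics.image.repository=kubesphere/kube-state-metrics
  --set kube-state-metrics.image.tag=v2.5.0
)

# Print the full command for review; remove the leading `echo` to run it.
echo helm upgrade --install my-kube-prometheus-stack \
  prometheus-community/kube-prometheus-stack --version 39.7.0 \
  "${image_overrides[@]}"
```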

After installation completes, the result is as shown in the figure:

[image]

5.2 Install MinIO with Helm (for production, prefer a multi-host, multi-disk deployment)

# helm repo add minio https://charts.min.io
# kubectl create ns minio
# helm install minio minio/minio -n minio --set accessKey=admin,secretKey=12341234,rootUser=admin,rootPassword=12341234,mode=distributed,replicas=4,service.type=NodePort,persistence.storageClass=nfs-storage,persistence.size=500Gi,resources.requests.memory=4Gi --debug --wait --timeout 10m

For the full list of parameters, see:

https://github.com/minio/minio/blob/master/helm/minio/values.yaml

5.3 Install thanos with Helm

5.3.1 Download the chart

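# Set variables
pro=thanos
chart_version=10.3.6

mkdir -p /data/$pro
cd /data/$pro

# Download the chart (the bitnami repo must be added first)
helm repo add bitnami https://charts.bitnami.com/bitnami
helm pull bitnami/$pro --version=$chart_version

# Extract values.yaml
tar zxvf $pro-$chart_version.tgz --strip-components 1 $pro/values.yaml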

# Generate the install script
cat > /data/$pro/start.sh << EOF
kubectl get ns monitoring || kubectl create ns monitoring
helm install $pro $pro-$chart_version.tgz \
-f values.yaml \
-n monitoring
EOF

# Generate the Helm 3 upgrade script
# \$(date ...) is escaped so the timestamp is evaluated when upgrade.sh runs, not when it is written
cat > /data/$pro/upgrade.sh << EOF
helm upgrade $pro $pro-$chart_version.tgz \
-f values.yaml \
-n monitoring
cp values.yaml values.yaml.bak_\$(date +%F_%R)
EOF
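
Because the here-documents above use an unquoted delimiter, the shell expands variables and command substitutions while the file is being written, not when the generated script runs. A minimal sketch of that behaviour, using a hypothetical backup.sh in a temporary directory:

```shell
#!/usr/bin/env bash
pro=thanos
workdir=$(mktemp -d)

# Unquoted delimiter: $pro is expanded now, while \$(date ...) is written
# literally and therefore evaluated later, each time backup.sh runs.
cat > "$workdir/backup.sh" << EOF
echo "release: $pro"
echo "backup: values.yaml.bak_\$(date +%F_%H%M)"
EOF

# Show the generated script: the release name is baked in, the date is not.
cat "$workdir/backup.sh"
```

Escaping only the parts that must stay dynamic keeps the chart name and version fixed in the script while the timestamp stays current.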

5.3.2 Configure values.yaml

# Corresponds to prometheus.extraSecret.name in the kube-prometheus-stack values.yaml
existingObjstoreSecret: "bucket-config"
query:
  enabled: true
  replicaLabel: [prometheus_replica]                          # label used for deduplication
  dnsDiscovery:
    enabled: true
    sidecarsService: "kube-prometheus-stack-thanos-discovery"  # Thanos discovery service name from kube-prometheus-stack
    sidecarsNamespace: "monitoring"                            # namespace where kube-prometheus-stack is deployed
  ingress:
    enabled: true
    hostname: thanos.lady.cn    # domain name; choose your own
queryFrontend:                 # endpoint used by Grafana for queries
  enabled: true
compactor:
  enabled: true
  persistence:
    enabled: false          # set to true in production for persistence
storegateway:
  enabled: true
  persistence:
    enabled: false           # set to true in production for persistence
ruler:
  enabled: true
  replicaLabel: prometheus_replica              # label used for deduplication
  alertmanagers:
  - http://kube-prometheus-stack-alertmanager:9093       # Alertmanager service address from kube-prometheus-stack (the URL scheme is required)
  existingConfigmap: "kube-prometheus-stack-alertmanager-overview"   # Alertmanager configuration from kube-prometheus-stack
  persistence:
    enabled: false             # set to true in production for persistence
  ingress:
    enabled: true
    hostname: thanos-ruler.lady.cn     # domain name; choose your own

5.4 Configure Thanos

5.4.1 Create a thanos bucket in MinIO

① Log in to the MinIO web UI

# Adjust -n to the namespace where MinIO is installed
ACCESS_KEY=$(kubectl get secret minio -o jsonpath="{.data.accesskey}" -n kube-system | base64 --decode)
SECRET_KEY=$(kubectl get secret minio -o jsonpath="{.data.secretkey}" -n kube-system | base64 --decode)

Open http://{minio_pod_ip}:9000

② Create the bucket

Create it manually; bucket name: thanos

5.4.2 Create an object-storage secret in every cluster

① Configuration file thanos-storage-minio.yaml

Obtain the parameters:
ACCESS_KEY=$(kubectl get secret minio -o jsonpath="{.data.accesskey}" -n kube-system | base64 --decode)
SECRET_KEY=$(kubectl get secret minio -o jsonpath="{.data.secretkey}" -n kube-system | base64 --decode)
CLUSTER_SERVICE_IP=$(kubectl get service -n kube-system minio  -o jsonpath='{.spec.clusterIP}')
Replace the @-prefixed placeholders below with these values.
#cat thanos-storage-minio.yaml
type: s3
config:
  bucket: thanos
  endpoint: @CLUSTER_SERVICE_IP:9000
  access_key: @ACCESS_KEY
  secret_key: @SECRET_KEY
  insecure: true
  signature_version2: true

② Create the storage secret

kubectl -n kube-plugin create secret generic thanos-objectstorage --from-file=thanos.yaml=thanos-storage-minio.yaml
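
Replacing the @ placeholders by hand is easy to get wrong, so the file can also be rendered from the three values. A minimal sketch: render_objstore is a hypothetical helper, and the address and credentials in the last line are example values only.

```shell
#!/usr/bin/env bash
# Render the object-storage config from endpoint IP, access key and secret key.
render_objstore() {
  local endpoint="$1" access_key="$2" secret_key="$3"
  cat << EOF
type: s3
config:
  bucket: thanos
  endpoint: ${endpoint}:9000
  access_key: ${access_key}
  secret_key: ${secret_key}
  insecure: true
  signature_version2: true
EOF
}

# Example values only; in practice pass the variables gathered in step ①.
render_objstore 10.96.0.10 admin 12341234
```

With the real variables this becomes render_objstore "$CLUSTER_SERVICE_IP" "$ACCESS_KEY" "$SECRET_KEY" > thanos-storage-minio.yaml, after which the secret is created from the file as in step ②.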

6. Verification

[root@host-192-168-11-100 kube-thanos]# kubectl get pod -A
NAMESPACE     NAME                                                        READY   STATUS    RESTARTS   AGE
kube-system   calico-kube-controllers-6d948fc787-gpsw2                    1/1     Running   0          2d
kube-system   calico-node-9h4nq                                           1/1     Running   0          2d
kube-system   coredns-c79c957c8-96jf5                                     1/1     Running   0          2d
kube-system   dashboard-metrics-scraper-8464848978-86gqb                  1/1     Running   0          2d
kube-system   kubernetes-dashboard-9457cbb47-7hq2r                        1/1     Running   0          2d
kube-system   metrics-server-6d6786c9db-clstn                             1/1     Running   0          2d
kube-system   traefik2-74f8d7b659-z5ldb                                   1/1     Running   0          4h37m
monitoring    alertmanager-kube-prometheus-stack-alertmanager-0           2/2     Running   0          30h
monitoring    kube-prometheus-stack-grafana-799446c5b9-q5hhh              3/3     Running   0          30h
monitoring    kube-prometheus-stack-kube-state-metrics-6c5d86887c-bzlfm   1/1     Running   0          30h
monitoring    kube-prometheus-stack-operator-5bbb5f4f64-bpvkv             1/1     Running   0          30h
monitoring    kube-prometheus-stack-prometheus-node-exporter-xt966        1/1     Running   0          30h
monitoring    prometheus-kube-prometheus-stack-prometheus-0               3/3     Running   0          28h
monitoring    thanos-compactor-674c68cfcc-9cwsd                           1/1     Running   0          5m31s
monitoring    thanos-query-65ff7b4f98-j2mc4                               1/1     Running   0          5m31s
monitoring    thanos-query-frontend-59df69d5c-nbhgh                       1/1     Running   0          5m31s
monitoring    thanos-ruler-0                                              1/1     Running   0          5m31s
monitoring    thanos-storegateway-0                                       1/1     Running   0          5m31s

7. Web UI check


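8. Switch the Grafana data source

8.1 Add the new data source

8.2 Check that the dashboards work

8.3 The USE dashboards show no data

For now, delete these two dashboards manually; they depend on the Prometheus rule configuration. Their content overlaps with the Node Exporter and Prometheus dashboards, so they can be removed for the time being.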

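9. Q&A

9.1 Helm installation fails

Check that the uninstall succeeded, especially steps 3 and 4, and look for leftovers from step 5.

9.2 thanos-store fails to start

  1. Check that the secret has been created and the MinIO configuration is correct
  2. Check that the thanos bucket was created successfully
  3. Check that the service is configured correctly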
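
The first item in the list above can be partly checked offline, before the secret is recreated. A minimal sketch that looks for the expected keys and for @ placeholders that were never replaced; check_objstore is a hypothetical helper, and the demo file is written to a temporary location:

```shell
#!/usr/bin/env bash
# Report obvious problems in an object-storage config file.
check_objstore() {
  local file="$1" key status=0
  for key in type: bucket: endpoint: access_key: secret_key:; do
    grep -q "^ *${key}" "$file" || { echo "missing ${key}"; status=1; }
  done
  # Placeholders such as @ACCESS_KEY must have been replaced.
  if grep -q ': *@' "$file"; then
    echo "unreplaced @ placeholder"
    status=1
  fi
  return "$status"
}

# Demo on a temporary file that still contains one placeholder.
tmp=$(mktemp)
printf 'type: s3\nconfig:\n  bucket: thanos\n  endpoint: @CLUSTER_SERVICE_IP:9000\n  access_key: admin\n  secret_key: 12341234\n' > "$tmp"
check_objstore "$tmp" || echo "config needs fixing"
```

This only catches structural mistakes; wrong credentials or an unreachable endpoint still have to be diagnosed from the thanos-store pod logs.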

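9.3 The Stores page in the thanos-query UI shows the wrong status

If the expected entries are missing, check that the query startup arguments are configured correctly.

They must reference the service name of the corresponding component:

kubectl get service -n kube-plugin   # look up the service name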
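
When comparing the query startup arguments with the service list, it can help to build the expected address from the service name and namespace. A minimal sketch, assuming DNS SRV discovery of a service port named grpc and the default cluster.local domain (both are assumptions); srv_address is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Build the DNS SRV address that a store endpoint argument is expected to use.
srv_address() {
  local service="$1" namespace="$2"
  echo "dnssrv+_grpc._tcp.${service}.${namespace}.svc.cluster.local"
}

# Service and namespace taken from the values.yaml in section 5.3.2.
srv_address kube-prometheus-stack-thanos-discovery monitoring
```

If the address in the thanos-query pod arguments differs from this in service name or namespace, the Stores page will not list the sidecar.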

Appendix: other handy commands

kubectl delete secret -n kube-plugin thanos-objectstorage
kubectl -n kube-plugin create secret generic thanos-objectstorage --from-file=thanos.yaml=thanos-storage-minio.yaml
kubectl delete pod -n kube-plugin prometheus-thanos-kube-prometheus-sta-prometheus-0
kubectl delete -f thanos-store.yaml
kubectl apply -f thanos-store.yaml
kubectl get pod -n kube-plugin
kubectl get pod -n kube-plugin thanos-store-1 -o yaml
kubectl delete -f thanos-query.yaml
kubectl apply -f thanos-query.yaml
