Installing and Deploying Prometheus in Kubernetes

4. Installing in Kubernetes

Prometheus can be installed in several ways, for example:

  • Gitee: https://gitee.com/liugpwwwroot/k8s-prometheus-grafana/tree/master/prometheus
  • GitHub: https://github.com/prometheus-operator/kube-prometheus
  • Helm: https://artifacthub.io/packages/helm/grafana/grafana

4.1 Introduction to prometheus-operator

The key to building an Operator is extending Kubernetes with CRDs (custom resource definitions). An Operator encodes operational knowledge as code, and it revolves around two core concepts: resources, which define the desired state of an object, and controllers, which observe, analyze, and act to reconcile resources toward that state.

The Operator is responsible for creating objects such as Prometheus, ServiceMonitor, Alertmanager, and PrometheusRule, and it continuously watches and maintains their state. A ServiceMonitor is an abstraction over an exporter. Both Service and ServiceMonitor are Kubernetes resource objects, and a ServiceMonitor matches a class of Services through a label selector. The CRDs created after deployment are:

[root@master1 manifests]# kubectl get crd 
NAME                                        CREATED AT
alertmanagerconfigs.monitoring.coreos.com   2021-06-30T09:55:49Z
alertmanagers.monitoring.coreos.com         2021-06-30T09:55:49Z
podmonitors.monitoring.coreos.com           2021-06-30T09:55:49Z
probes.monitoring.coreos.com                2021-06-30T09:55:49Z
prometheuses.monitoring.coreos.com          2021-06-30T09:55:49Z
prometheusrules.monitoring.coreos.com       2021-06-30T09:55:50Z
servicemonitors.monitoring.coreos.com       2021-06-30T09:55:50Z
thanosrulers.monitoring.coreos.com          2021-06-30T09:55:50Z

[root@master1 manifests]# kubectl api-resources |grep monitoring.coreos.com
alertmanagerconfigs                            monitoring.coreos.com          true        AlertmanagerConfig
alertmanagers                                  monitoring.coreos.com          true         Alertmanager
podmonitors                                    monitoring.coreos.com          true         PodMonitor
probes                                         monitoring.coreos.com          true         Probe
prometheuses                                   monitoring.coreos.com          true         Prometheus
prometheusrules                                monitoring.coreos.com          true         PrometheusRule
servicemonitors                                monitoring.coreos.com          true         ServiceMonitor
thanosrulers                                   monitoring.coreos.com          true         ThanosRuler
Among these, prometheuses exists for the Prometheus server itself, while servicemonitors are abstractions over exporters (the endpoints that expose metrics); Prometheus pulls data from the metrics endpoints that ServiceMonitors describe. alertmanagers corresponds to the Alertmanager server, and prometheusrules corresponds to the alerting and recording rule files defined for Prometheus.
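
To make the relationship concrete, below is a minimal sketch of a Prometheus CR and how it selects ServiceMonitors. The field names follow the monitoring.coreos.com/v1 API; the label values are purely illustrative and do not come from this deployment.

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: example
  namespace: monitoring
spec:
  replicas: 1
  serviceAccountName: prometheus-k8s     # assumes the kube-prometheus service account exists
  serviceMonitorNamespaceSelector: {}    # empty selector = look in all namespaces
  serviceMonitorSelector:
    matchLabels:
      team: example                      # illustrative; only ServiceMonitors with this label are scraped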

Official documentation: https://prometheus-operator.dev/docs/prologue/introduction/

4.2 Installing prometheus-operator

#1. Install
[root@master1 prometheus-yaml]# wget https://github.com/prometheus-operator/kube-prometheus/archive/refs/heads/main.zip
[root@master1 prometheus-yaml]# unzip main.zip ;cd kube-prometheus-main/manifests/setup
[root@master1 setup]# kubectl create -f .
[root@master1 setup]# cd ../; kubectl apply -f .

#2. Expose the Services by changing their type to NodePort
[root@master1 manifests]# kubectl edit svc/grafana -n monitoring 
[root@master1 manifests]# kubectl edit svc/prometheus-k8s -n monitoring 
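
In the editor, the only essential change is the Service type. A sketch of what the grafana Service might look like afterwards; the port name and nodePort value are illustrative, and the nodePort line can be omitted to let Kubernetes pick one:

spec:
  type: NodePort          # was ClusterIP
  ports:
  - name: http
    port: 3000
    targetPort: http
    nodePort: 30030       # illustrative; any free port in 30000-32767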

Check the Prometheus web UI

Check the Grafana web UI

Note: If the cluster has no DNS add-on installed, it is recommended to add the Prometheus data source in the Grafana web UI by IP address, and to set it as the default.

4.3 Monitoring the scheduler and controller-manager

After installation, the kube-controller-manager and kube-scheduler dashboards in Grafana show no data; the steps below fix that.

1. Monitoring the scheduler

[root@master1 manifests]# kubectl delete servicemonitor/kube-scheduler -n monitoring
[root@master1 manifests]# vim kubernetes-serviceMonitorKubeScheduler.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app.kubernetes.io/name: kube-scheduler
  name: kube-scheduler
  namespace: kube-system
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 30s
    port: http-metrics    # change this to http-metrics
    scheme: http  # change this to http
    tlsConfig:
      insecureSkipVerify: true
  jobLabel: app.kubernetes.io/name
  namespaceSelector:  # match Services in the listed namespace(s); to match across all namespaces, use any: true
    matchNames:
    - kube-system
  selector: # labels of the Service to match; with matchLabels, the Service is selected only when all the labels below match; matchExpressions can be used instead, and its expressions must likewise all be satisfied
    matchLabels:
      app.kubernetes.io/name: kube-scheduler
[root@master1 manifests]# kubectl apply -f kubernetes-serviceMonitorKubeScheduler.yaml

[root@master1 my-yaml]# cat scheduler.yaml  # create a Service for the scheduler
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-scheduler
  labels:
    app.kubernetes.io/name: kube-scheduler
spec:
  selector:
    component: kube-scheduler
  ports:
  - name: http-metrics
    port: 10251
    targetPort: 10251
    protocol: TCP
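
Assuming the file above is saved as scheduler.yaml and the scheduler really serves HTTP metrics on 10251 (as the Service implies), applying it and checking that an endpoint registered looks like this:

[root@master1 my-yaml]# kubectl apply -f scheduler.yaml
[root@master1 my-yaml]# kubectl get endpoints kube-scheduler -n kube-system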

2. Monitoring controller-manager

Here a different approach is used: the ServiceMonitor (kube-controller-manager) stays in the monitoring namespace.
#1. Check the default ServiceMonitor
[root@master1 manifests]# cat kubernetes-serviceMonitorKubeControllerManager.yaml 
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app.kubernetes.io/name: kube-controller-manager
  name: kube-controller-manager
  namespace: monitoring
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 30s
    metricRelabelings:
...
part of the content omitted
...
    port: http-metrics  # on my cluster the controller-manager exposes metrics over HTTP, so change this
    scheme: http  # change this as well
    tlsConfig:
      insecureSkipVerify: true
  jobLabel: app.kubernetes.io/name
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-controller-manager
[root@master1 manifests]# kubectl apply -f kubernetes-serviceMonitorKubeControllerManager.yaml

#2. Create the Service
[root@master1 my-yaml]# vi controller-manager.yaml
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-controller-manager
  labels:
    app.kubernetes.io/name: kube-controller-manager
spec:
  selector:
    component: kube-controller-manager
  ports:
  - name: http-metrics
    port: 10252
    targetPort: 10252
    protocol: TCP
[root@master1 my-yaml]# kubectl apply -f controller-manager.yaml 
[root@master1 my-yaml]# kubectl get svc -l app.kubernetes.io/name=kube-controller-manager -n kube-system
NAME                      TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)     AGE
kube-controller-manager   ClusterIP   10.102.211.176   <none>        10252/TCP   46s
[root@master1 my-yaml]# kubectl describe  svc -l app.kubernetes.io/name=kube-controller-manager -n kube-system
Name:              kube-controller-manager
Namespace:         kube-system
Labels:            app.kubernetes.io/name=kube-controller-manager
Annotations:       <none>
Selector:          component=kube-controller-manager
Type:              ClusterIP
IP:                10.102.211.176
Port:              http-metrics  10252/TCP
TargetPort:        10252/TCP
Endpoints:         192.168.56.101:10252
Session Affinity:  None
Events:            <none>
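
Before waiting for Prometheus to scrape the target, a quick manual check that the endpoint actually serves metrics can be run from the master node; this assumes the controller-manager answers over plain HTTP on 10252, as configured above:

[root@master1 my-yaml]# curl -s http://192.168.56.101:10252/metrics | head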

4.4 Custom monitoring of etcd

Treat etcd as an application external to the cluster.

#1. Create a secret for accessing etcd
[root@master1 my-yaml]# cd /etc/kubernetes/pki/etcd/
[root@master1 etcd]# ls
ca.crt  ca.key  healthcheck-client.crt  healthcheck-client.key  peer.crt  peer.key  server.crt  server.key
[root@master1 etcd]# kubectl -n monitoring create secret generic etcd-certs --from-file=./healthcheck-client.key --from-file=./healthcheck-client.crt  --from-file=./ca.crt  
secret/etcd-certs created

#2. Have Prometheus load the secret
[root@master1 manifests]# vim prometheus-prometheus.yaml 
...
  image: www.mt.com:9500/prometheus/prometheus:v2.28.0
  secrets:
  - etcd-certs
...
[root@master1 manifests]# kubectl apply -f prometheus-prometheus.yaml
[root@master1 etcd]# kubectl exec  prometheus-k8s-0 -n monitoring -- /bin/ls "/etc/prometheus/secrets/etcd-certs" 2> /dev/null  # where the certificates are mounted
ca.crt
healthcheck-client.crt
healthcheck-client.key

#3. Create the ServiceMonitor
[root@master1 my-yaml]# vim prometheus-serviceMonitorEtcd.yaml
[root@master1 my-yaml]# cat prometheus-serviceMonitorEtcd.yaml 
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app.kubernetes.io/name: etcd-k8s
  name: etcd-k8s
  namespace: kube-system
spec:
  jobLabel: etcd-k8s
  endpoints: 
  - port: port
    interval: 3s
    scheme: https
    tlsConfig:
      caFile: /etc/prometheus/secrets/etcd-certs/ca.crt
      certFile: /etc/prometheus/secrets/etcd-certs/healthcheck-client.crt
      keyFile: /etc/prometheus/secrets/etcd-certs/healthcheck-client.key
      insecureSkipVerify: true
  selector:
    matchLabels:
      etcd-k8s: etcd
  namespaceSelector:
    matchNames:
    - kube-system
[root@master1 my-yaml]# kubectl create -f prometheus-serviceMonitorEtcd.yaml --dry-run
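
Note that --dry-run only validates the manifest without creating anything; the ServiceMonitor still has to be created afterwards, for example:

[root@master1 my-yaml]# kubectl apply -f prometheus-serviceMonitorEtcd.yaml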

#4. Create a Service to match the ServiceMonitor
Note: etcd is treated here as an application outside the cluster, so instead of a Pod label selector we manually create an Endpoints object that points at the etcd address.
[root@master1 my-yaml]# vim etcd-service.yaml 
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: etcd-k8s
  labels:
    etcd-k8s: etcd
spec:
  type: ClusterIP
  clusterIP: None
  ports:
  - name: port
    port: 2379
---
apiVersion: v1
kind: Endpoints
metadata:
  name: etcd-k8s
  namespace: kube-system
  labels:
    etcd-k8s: etcd
subsets:
- addresses:
  - ip: 192.168.56.101
    nodeName: etcd-master1
  ports:
  - name: port
    port: 2379
[root@master1 my-yaml]# kubectl apply -f etcd-service.yaml 
service/etcd-k8s created
endpoints/etcd-k8s created
[root@master1 my-yaml]#  kubectl describe svc -n kube-system -l etcd-k8s=etcd  # confirm that the Service and Endpoints are associated
Name:              etcd-k8s
Namespace:         kube-system
Labels:            etcd-k8s=etcd
Annotations:       <none>
Selector:          <none>
Type:              ClusterIP
IP:                None
Port:              port  2379/TCP
TargetPort:        2379/TCP
Endpoints:         192.168.56.101:2379
Session Affinity:  None
Events:            <none>
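
As a manual sanity check that etcd answers on /metrics with these client certificates (the same files from /etc/kubernetes/pki/etcd shown earlier), something like the following can be run on the master node:

[root@master1 etcd]# curl -s --cacert ca.crt --cert healthcheck-client.crt --key healthcheck-client.key https://192.168.56.101:2379/metrics | head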

4.5 Notes on alert delivery

Newer versions isolate alert configuration between namespaces.

AlertmanagerConfig objects are discovered only within their own namespace. If alerts from other namespaces need to be delivered, note the following:
#1. Confirm the alert
1) Confirm there is an active alert and which namespace it belongs to
2) Confirm that Alertmanager has received the alert, which can be checked with kubectl logs
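
One way to list the alerts Alertmanager currently holds is amtool inside the Alertmanager pod; this assumes amtool is present in the image (it ships in the upstream prom/alertmanager image) and that the pod name follows the kube-prometheus default:

[root@master1 manifests]# kubectl -n monitoring exec alertmanager-main-0 -- amtool alert query --alertmanager.url=http://127.0.0.1:9093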

#2. Check the Alertmanager configuration
Determine whether alertmanager.spec.alertmanagerConfigNamespaceSelector or alertmanager.spec.alertmanagerConfigSelector is in use.
The following explains alertmanagerConfigNamespaceSelector:
...
spec:
  alertmanagerConfigNamespaceSelector:
    matchLabels:
       alertmanagerconfig: enabled # only alerts from namespaces carrying this label are delivered; label the corresponding namespaces accordingly
...
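
With the selector above, each namespace whose alerts should be delivered must carry that label. For example, assuming the workload namespace is called app-ns (an illustrative name):

[root@master1 manifests]# kubectl label namespace app-ns alertmanagerconfig=enabled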

#3. Confirm the rules in the PrometheusRule carry the corresponding label
[root@master1 manifests]# kubectl get prometheusrules/alertmanager-main-rules   -n monitoring  -o yaml
spec:
  groups:
  - name: alertmanager.rules
    rules:
    - alert: AlertmanagerFailedReload
      annotations:
        description: Configuration has failed to load for {{ $labels.namespace }}/{{
          $labels.pod}}.
        runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/alertmanagerfailedreload
        summary: Reloading an Alertmanager configuration has failed.
      expr: |
        # Without max_over_time, failed scrapes could create false negatives, see
        # https://www.robustperception.io/alerting-on-gauges-in-prometheus-2-0 for details.
        max_over_time(alertmanager_config_last_reload_successful{job="alertmanager-main",namespace="monitoring"}[5m]) == 0
      for: 10m
      labels:
        severity: critical
        namespace: monitoring  # add the label of your own namespace here, otherwise Alertmanager will not deliver this alert

#4. Create an AlertmanagerConfig in every namespace that needs alert delivery
[root@master1 my-yaml]# cat alertmanagerconfig.yaml  # if the configuration is identical for all target namespaces, apply this file in each of them; if different namespaces need different configurations, create a separate AlertmanagerConfig per namespace and apply each in its own namespace
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: alertmanager-config
  labels:
    alertmanagerconfig: example
spec:
  route:
    groupBy: ['alertname','job','severity']
    groupWait: 30s
    groupInterval: 2m
    repeatInterval: 5m
    receiver: 'webhook'
  receivers:
  - name: 'webhook'
    webhookConfigs:
    - url: 'http://127.0.0.1:8086/dingtalk/webhook1/send'
Note: in the current release the Alertmanager configuration itself is defined as a Secret; to change it, edit manifests/alertmanager-secret.yaml. In clusters that use the AlertmanagerConfig CRD as the configuration source, the configuration can be adjusted by applying the CR.
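
A sketch of rolling out a change to that Secret and confirming Alertmanager reloaded it; the config-reloader container name follows current kube-prometheus defaults and may differ between versions:

[root@master1 manifests]# kubectl apply -f alertmanager-secret.yaml
[root@master1 manifests]# kubectl -n monitoring logs alertmanager-main-0 -c config-reloader --tail=20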

4.6 Miscellaneous

#1. Rule files
Viewed inside the Prometheus pod:
/prometheus $ ls /etc/prometheus/rules/prometheus-k8s-rulefiles-0/  # all defined rule files live here; you can also add your own by creating PrometheusRule resource objects
monitoring-alertmanager-main-rules.yaml          monitoring-node-exporter-rules.yaml
monitoring-kube-prometheus-rules.yaml            monitoring-prometheus-k8s-prometheus-rules.yaml
monitoring-kube-state-metrics-rules.yaml         monitoring-prometheus-operator-rules.yaml
monitoring-kubernetes-monitoring-rules.yaml

[root@master1 my-yaml]# kubectl get prometheus/k8s -n monitoring  -o yaml
...
  ruleSelector: # only PrometheusRule objects matching these labels are loaded; add both labels when creating your own PrometheusRule
    matchLabels:
      prometheus: k8s
      role: alert-rules
...
[root@master1 my-yaml]# kubectl get prometheusrules -l prometheus=k8s,role=alert-rules -A
NAMESPACE    NAME                              AGE
monitoring   alertmanager-main-rules           45h
monitoring   kube-prometheus-rules             45h
monitoring   kube-state-metrics-rules          45h
monitoring   kubernetes-monitoring-rules       45h
monitoring   node-exporter-rules               45h
monitoring   prometheus-k8s-prometheus-rules   45h
monitoring   prometheus-operator-rules         45h
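
Putting the two requirements together (the ruleSelector labels above and the namespace label from 4.5), a minimal sketch of a custom PrometheusRule; the rule name, expression, and threshold are illustrative:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-custom-rules          # illustrative name
  namespace: monitoring
  labels:
    prometheus: k8s              # required so the ruleSelector above picks it up
    role: alert-rules
spec:
  groups:
  - name: custom.rules
    rules:
    - alert: EtcdTargetDown      # illustrative alert
      expr: up{job="etcd"} == 0  # the job value depends on jobLabel in the ServiceMonitor
      for: 5m
      labels:
        severity: critical
        namespace: monitoring    # see 4.5: needed so Alertmanager routes the alert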