Deploying Prometheus 2.28 on Kubernetes

Configuration files: when Prometheus runs in k8s, its configuration needs to be easy to change. Options include a ConfigMap or mounting files from the host; this post loads the configuration from a ConfigMap. In an earlier post, "Prometheus 2.6.0 monitoring — deployment", the ConfigMap was built from a single hand-edited ConfigMap YAML file, so every monitor added later had to go into that one file, which grew far too long and unwieldy. This time we build the ConfigMap differently: all of the rules files are split out into separate files, so adding a monitor or changing its parameters only means editing one small file.
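The split works because Prometheus loads every file matching the glob given under `rule_files` in the main configuration. A minimal sketch (file names are the ones created later in this post) of how a `*.rules.yml` pattern selects only the rule files:

```python
from fnmatch import fnmatch

# Files sitting in the configmap directory: two split-out rule files
# plus the main config. Prometheus's rule_files glob
# (/etc/prometheus/*.rules.yml) picks up only the first two.
files = ["k8s-master.rules.yml", "k8s-node.rules.yml", "prometheus.yml"]

matched = [f for f in files if fnmatch(f, "*.rules.yml")]
print(matched)  # the main config itself is not matched
```

Dropping a new `something.rules.yml` into the directory and rebuilding the ConfigMap is then enough to register a new set of rules.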

How Prometheus is started: since it runs in k8s and is itself the monitoring service, its data should land on disk as directly as possible, not over a network volume, so I still think local disk is best. Because it runs as a container, writing to local disk means the pod must not be rescheduled onto another node, so we pin it to a single node with a nodeSelector, and give it generous CPU and memory limits.

Pod metrics exporter: we use kube-state-metrics, from https://github.com/kubernetes/kube-state-metrics/tags . I used the latest release at the time, kube-state-metrics-1.9.8. Download the tarball; the only files we need are the ones under kube-state-metrics-1.9.8/examples/standard.

Authorization: RBAC

1. Create the directory structure

mkdir -p prometheus/{configmap,prometheus}

2. Create the Prometheus configuration files

Change into the configmap directory created above and run the commands below.

Create the main configuration file prometheus.yml and paste in the following content.

global:
  # collect data every 15s and evaluate alerting rules every 15s
  scrape_interval: 15s
  scrape_timeout: 10s
  evaluation_interval: 15s
# alerting rule files to load
rule_files:
  - "/etc/prometheus/*.rules.yml"
# where to send alerts (Alertmanager)
alerting:
  alertmanagers:
    - static_configs:
      - targets: ["192.168.60.110:9093"]
# targets for Prometheus to scrape
scrape_configs:
  - job_name: 'k8s-master'
    scrape_interval: 10s
    static_configs:
    - targets:
      - '192.168.60.101:9100'
      - '192.168.60.102:9100'
      - '192.168.60.103:9100'

  - job_name: 'k8s-node'
    scrape_interval: 10s
    static_configs:
    - targets:
      - '192.168.60.120:9100'
      - '192.168.60.121:9100'
      - '192.168.60.122:9100'

  - job_name: 'kubernetes-apiservers'
    kubernetes_sd_configs:
    - role: endpoints
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    relabel_configs:
    - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
      action: keep
      regex: default;kubernetes;https
  
  - job_name: 'kubernetes-nodes'
    kubernetes_sd_configs:
    - role: node
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    relabel_configs:
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)
    - target_label: __address__
      replacement: kubernetes.default.svc:443
    - source_labels: [__meta_kubernetes_node_name]
      regex: (.+)
      target_label: __metrics_path__
      replacement: /api/v1/nodes/${1}/proxy/metrics
  
  - job_name: 'kubernetes-cadvisor'
    kubernetes_sd_configs:
    - role: node
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    relabel_configs:
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)
    - target_label: __address__
      replacement: kubernetes.default.svc:443
    - source_labels: [__meta_kubernetes_node_name]
      regex: (.+)
      target_label: __metrics_path__
      replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
  
  - job_name: 'kubernetes-services'
    kubernetes_sd_configs:
    - role: service
    metrics_path: /probe
    params:
      module: [http_2xx]
    relabel_configs:
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
      action: keep
      regex: true
    - source_labels: [__address__]
      target_label: __param_target
    - target_label: __address__
      replacement: blackbox-exporter.example.com:9115
    - source_labels: [__param_target]
      target_label: instance
    - action: labelmap
      regex: __meta_kubernetes_service_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_service_name]
      target_label: kubernetes_name
  
  - job_name: 'kubernetes-ingresses'
    kubernetes_sd_configs:
    - role: ingress
    relabel_configs:
    - source_labels: [__meta_kubernetes_ingress_annotation_prometheus_io_probe]
      action: keep
      regex: true
    - source_labels: [__meta_kubernetes_ingress_scheme,__address__,__meta_kubernetes_ingress_path]
      regex: (.+);(.+);(.+)
      replacement: ${1}://${2}${3}
      target_label: __param_target
    - target_label: __address__
      replacement: blackbox-exporter.example.com:9115
    - source_labels: [__param_target]
      target_label: instance
    - action: labelmap
      regex: __meta_kubernetes_ingress_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_ingress_name]
      target_label: kubernetes_name

  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
    - role: pod
    relabel_configs:
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
    - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
      action: replace
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2
      target_label: __address__
    - action: labelmap
      regex: __meta_kubernetes_pod_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      action: replace
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_pod_name]
      action: replace
      target_label: kubernetes_pod_name

  - job_name: 'kubernetes-service-endpoints'
    kubernetes_sd_configs:
    - role: endpoints
    relabel_configs:
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
      action: replace
      target_label: __scheme__
      regex: (https?)
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
    - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
      action: replace
      target_label: __address__
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2
    - action: labelmap
      regex: __meta_kubernetes_service_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      action: replace
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_service_name]
      action: replace
      target_label: kubernetes_name
    - source_labels: [__meta_kubernetes_pod_container_port_number]
      action: replace
      target_label: container_port
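The address-rewriting relabel rules above are easy to misread. A small Python sketch (Python's re handles these patterns the same way Prometheus's RE2 does; the addresses are made-up examples) of what the pod/endpoint port rewrite and the ingress URL join actually produce:

```python
import re

# Pod / service-endpoint rule: join __address__ with the
# prometheus.io/port annotation, replacing any existing port.
addr, port = "10.244.1.17:8080", "9102"
new_addr = re.sub(r"([^:]+)(?::\d+)?;(\d+)", r"\1:\2", f"{addr};{port}")
print(new_addr)  # 10.244.1.17:9102

# Ingress rule: build the blackbox probe target URL from
# scheme, host, and path.
scheme, host, path = "https", "example.internal", "/healthz"
target = re.sub(r"(.+);(.+);(.+)", r"\1://\2\3", f"{scheme};{host};{path}")
print(target)  # https://example.internal/healthz
```

Prometheus joins the `source_labels` values with `;` before matching, which is why both regexes contain literal semicolons.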

Create the monitoring rules file for the k8s masters, k8s-master.rules.yml, and paste in the following content.

groups:
- name: k8s-master
  rules:
  - alert: K8sMasterDown
    expr: up{job="k8s-master"} != 1
    for: 3m
    labels:
      severity: critical
    annotations:
      summary: "A k8s master node may be down"
      description: "{{ $labels.instance }}: master node is in an abnormal state"

  - alert: K8sMasterHighCPU
    expr: (1 - avg(irate(node_cpu_seconds_total{job="k8s-master",mode="idle"}[1m])) by (instance)) * 100  > 95
    for: 3m
    labels:
      severity: warning
    annotations:
      summary: "Master node CPU usage is above 95%"
      description: "{{ $labels.instance }}: current master CPU usage is {{ $value }}%"

Create the monitoring rules file for the k8s nodes, k8s-node.rules.yml, and paste in the following content.

groups:
- name: k8s-nodes
  rules:
  - alert: K8sNodeDown
    expr: up{job="k8s-node"} != 1
    for: 3m
    labels:
      severity: critical
    annotations:
      summary: "A k8s node may be down"
      description: "{{ $labels.instance }}: node is in an abnormal state"

  - alert: K8sNodeHighCPU
    expr: (1 - avg(irate(node_cpu_seconds_total{job="k8s-node",mode="idle"}[1m])) by (instance)) * 100  > 95
    for: 3m
    labels:
      severity: warning
    annotations:
      summary: "Node CPU usage is above 95%"
      description: "{{ $labels.instance }}: current node CPU usage is {{ $value }}%"
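The CPU alert expression derives utilisation from the idle-mode counter: average the per-core idle rate, subtract from 1, and scale to a percentage. A small numeric sketch of the same arithmetic (the idle fractions are made-up sample values, not real data):

```python
# Per-instance, per-core idle fractions, standing in for what
# irate(node_cpu_seconds_total{mode="idle"}[1m]) would return.
idle_rates = {
    "192.168.60.120:9100": [0.02, 0.04],  # two cores, nearly saturated
    "192.168.60.121:9100": [0.90, 0.80],  # mostly idle
}

# (1 - avg(idle)) * 100 -- the same arithmetic as the alert expression
usage = {
    inst: (1 - sum(cores) / len(cores)) * 100
    for inst, cores in idle_rates.items()
}

for inst, pct in usage.items():
    if pct > 95:
        print(f"{inst}: CPU usage {pct:.0f}% -> the alert would fire")
```

Only the first instance crosses the 95% threshold, and the alert still waits the `for: 3m` hold period before firing.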

3. Create the Kubernetes manifests in the prometheus directory

Create the namespace.yaml file

cat > namespace.yaml <<EOF
apiVersion: v1
kind: Namespace
metadata:
   name: prometheus
   labels:
     name: prometheus
EOF

Create the rbac-setup.yaml file

cat > rbac-setup.yaml <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups:
  - extensions
  - networking.k8s.io
  resources:
  - ingresses
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: prometheus
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: prometheus
EOF

Create the prometheus-deploy.yaml file

cat > prometheus-deploy.yaml <<EOF
apiVersion: v1
kind: Service
metadata:
  name: prometheus-svc
  namespace: prometheus
  labels:
    name: prometheus-deployment
spec:
  type: NodePort
  ports:
  - name: http
    port: 9090
    targetPort: 9090
    protocol: TCP
    nodePort: 30003
  selector:
    app: prometheus

---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    name: prometheus-deployment
  name: prometheus
  namespace: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      nodeSelector:
        nodename: prometheus
      serviceAccountName: prometheus    
      containers:
      - image: prom/prometheus:v2.28.1
        name: prometheus
        command:
        - "/bin/prometheus"
        args:
        - "--config.file=/etc/prometheus/prometheus.yml"
        - "--storage.tsdb.path=/home/prometheus"
        - "--storage.tsdb.retention.time=15d"
        - "--web.enable-lifecycle"
        ports:
        - containerPort: 9090
          protocol: TCP
        volumeMounts:
        - mountPath: "/home/prometheus"
          name: data
        - mountPath: "/etc/prometheus"
          name: prome-configmap
        - mountPath: "/etc/localtime"
          readOnly: false
          name: localtime
        resources:
          requests:
            cpu: 2000m
            memory: 4096Mi
          limits:
            cpu: 6000m
            memory: 6168Mi
      volumes:
      - name: data
        hostPath:
          path: "/data/prometheus/data"
      - name: prome-configmap
        configMap:
          name: prometheus-config
      - name: localtime
        hostPath:
          path: "/etc/localtime"
          type: File
EOF

4. Extract the downloaded kube-state-metrics-1.9.8.tar.gz archive into the top-level prometheus directory, then change the namespace in the files under the standard directory to prometheus.

tar xvf kube-state-metrics-1.9.8.tar.gz
mv kube-state-metrics-1.9.8/examples/standard kube-state-metrics
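Editing the namespace field in each manifest by hand is error-prone. A minimal Python sketch of the substitution (the directory name comes from the `mv` above; the snippet below is an illustrative fragment, not the real manifest):

```python
import re
from pathlib import Path

def retarget_namespace(text: str, new_ns: str = "prometheus") -> str:
    # Rewrite the value of every "namespace:" field in a manifest.
    return re.sub(r"(?m)^(\s*namespace:\s*)\S+$", rf"\g<1>{new_ns}", text)

# Applied to one manifest fragment: kube-system becomes prometheus.
snippet = "metadata:\n  name: kube-state-metrics\n  namespace: kube-system\n"
print(retarget_namespace(snippet))

# To rewrite the real files in place:
# for f in Path("kube-state-metrics").glob("*.yaml"):
#     f.write_text(retarget_namespace(f.read_text()))
```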

5. Create the namespace

kubectl apply -f namespace.yaml

6. Create the RBAC objects

kubectl apply -f rbac-setup.yaml

7. Create the ConfigMap (adjust the path to match your own configmap directory)

kubectl create configmap prometheus-config -n prometheus --from-file=/data/prometheus/configmap

8. Create the Prometheus pod and service

kubectl apply -f prometheus-deploy.yaml

9. Deploy kube-state-metrics

kubectl apply -f kube-state-metrics/

Wait for everything to start up.

10. Open the web UI. My pod network is routable from the office network, so I can browse to it directly.

On the Graph page, enter the keyword pod and check that detailed pod metrics are coming in.
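A few queries to try there; the metric names below are standard kube-state-metrics 1.9.x series, so adjust them if your version differs:

```
# pods and their phase, per namespace
kube_pod_status_phase

# container restarts, summed per pod
sum(kube_pod_container_status_restarts_total) by (namespace, pod)

# pods that are not Running
kube_pod_status_phase{phase!="Running"} == 1
```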

For Grafana, alerting, and so on, see the service-monitoring section of the docs.
