Prometheus/Grafana Monitoring: Collecting and Visualizing Metrics — Kubernetes from Beginner to High Concurrency, Part 9


After our automated pipeline packaged the committed code into an image and deployed it to the k8s cluster, JMeter load testing showed disappointing results: both response correctness and response times had serious problems. This is not a bug in the code itself, because the same code succeeds about half the time. Is the endgame of code theology? No, the endgame of code is operations! To find the cause, we first build a monitoring system, then track the problem down through the Docker container, php-fpm, and nginx monitoring charts.

Installing Prometheus

Prometheus ships with a built-in time-series database and is used here to collect and display system runtime metrics.

The Prometheus Docker image for amd64 is prom/prometheus, while the arm64 image is prom/prometheus-linux-arm64. Data is stored under /prometheus, port 9090 must be exposed for external access, and the configuration file lives at /etc/prometheus/prometheus.yml.
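As a quick sanity check outside the cluster, the image can be run directly with Docker (a sketch; in the cluster we mount our own config and storage instead):

docker run --rm -p 9090:9090 prom/prometheus
# UI now at http://127.0.0.1:9090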

First create a PersistentVolumeClaim named promethues-data:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: promethues-data
  namespace: promethues
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 250Mi
  storageClassName: local-path
  volumeMode: Filesystem
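Assuming the promethues namespace does not exist yet, create it and then apply the manifest (pvc.yaml is just an example file name):

kubectl create namespace promethues
kubectl apply -f pvc.yaml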

Next create an initial Prometheus configuration file as a ConfigMap:

apiVersion: v1
data:
  prometheus.yml: |-
    global:
      scrape_interval: 2s
      evaluation_interval: 2s
    scrape_configs:
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: promethues

Because the monitoring system needs elevated read permissions on the cluster, first create a ServiceAccount for Prometheus and bind it to a ClusterRole:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: promethues
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups:
  - extensions
  resources:
  - ingresses
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: promethues
  namespace: promethues
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: promethues
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: promethues
subjects:
- kind: ServiceAccount
  name: promethues
  namespace: promethues
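You can sanity-check the binding with kubectl's impersonation support before using it:

kubectl auth can-i list nodes \
  --as=system:serviceaccount:promethues:promethues
# prints "yes" if the ClusterRoleBinding works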

Create the Prometheus Deployment. It runs under the ServiceAccount created above, whose credentials are mounted into the container at /var/run/secrets/kubernetes.io/serviceaccount/; it also mounts the promethues-data PVC created earlier at /prometheus so metric data survives pod restarts.

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s.kuboard.cn/layer: monitor
    k8s.kuboard.cn/name: promethues-k8s
  name: promethues-k8s
  namespace: promethues
spec:
  selector:
    matchLabels:
      k8s.kuboard.cn/layer: monitor
      k8s.kuboard.cn/name: promethues-k8s
  template:
    metadata:
      labels:
        k8s.kuboard.cn/layer: monitor
        k8s.kuboard.cn/name: promethues-k8s
    spec:
      automountServiceAccountToken: true
      containers:
        - image: prom/prometheus-linux-arm64
          name: promethues
          ports:
            - containerPort: 9090
              name: api
              protocol: TCP
          volumeMounts:
            - mountPath: /etc/prometheus
              name: volume-jpcw8
            - mountPath: /prometheus
              name: prometheus-data
      serviceAccount: promethues
      serviceAccountName: promethues
      volumes:
        - configMap:
            defaultMode: 420
            name: prometheus-config
          name: volume-jpcw8
        - name: prometheus-data
          persistentVolumeClaim:
            claimName: promethues-data
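To confirm the ServiceAccount token actually lands in the container, list the mount inside a running pod (a quick check):

kubectl -n promethues exec deploy/promethues-k8s -- \
  ls /var/run/secrets/kubernetes.io/serviceaccount/
# expected: ca.crt  namespace  token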

Expose port 9090 to the outside world through a NodePort Service, using external port 30044:

apiVersion: v1
kind: Service
metadata:
  labels:
    k8s.kuboard.cn/layer: monitor
    k8s.kuboard.cn/name: promethues-k8s
  name: promethues-k8s
  namespace: promethues
spec:
  ports:
    - name: 8jmgrm
      nodePort: 30044
      port: 9090
      protocol: TCP
      targetPort: 9090
  selector:
    k8s.kuboard.cn/layer: monitor
    k8s.kuboard.cn/name: promethues-k8s
  type: NodePort

The Prometheus UI is now reachable at http://127.0.0.1:30044/.
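Besides the UI, Prometheus exposes built-in health endpoints that are handy for liveness and readiness probes:

curl http://127.0.0.1:30044/-/healthy   # HTTP 200 when the server is up
curl http://127.0.0.1:30044/-/ready     # HTTP 200 when ready to serve queries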

Scraping container CPU and memory metrics with cAdvisor

Add the following job to prometheus.yml; it scrapes per-container CPU and memory metrics from each node through cAdvisor:

- job_name: 'kubernetes-pods'
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - target_label: __address__
    replacement: kubernetes.default.svc:443
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
  • The promethues ServiceAccount (its token and CA certificate are referenced above) authenticates the scrapes
  • Container CPU and memory metrics are scraped over HTTPS through the k8s /proxy/metrics/cadvisor API
  • __address__ is the target's address in <host>:<port> form; here it is rewritten to the API server, kubernetes.default.svc:443
  • __metrics_path__ is the metrics path on the target; the node name extracted from __meta_kubernetes_node_name is substituted into it
  • The labelmap rule ingests every label starting with __meta_kubernetes_node_label_
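Before restarting, you can validate the edited file (promtool ships inside the official Prometheus image; a sketch assuming the Deployment name used above):

kubectl -n promethues exec deploy/promethues-k8s -- \
  promtool check config /etc/prometheus/prometheus.yml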

Restart the Deployment so Prometheus picks up the new configuration; under Status -> Targets in the Prometheus UI you should then see the new target.
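One way to trigger the restart (kubectl rollout restart re-creates the pods, which re-reads the mounted ConfigMap):

kubectl -n promethues rollout restart deployment promethues-k8s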

 

You can see that the scrape URL for the kubernetes-pods job is https://kubernetes.default.svc/api/v1/nodes/primary/proxy/metrics/cadvisor (primary is the node name here).

Start kubectl proxy; it prints:

Starting to serve on 127.0.0.1:8001

In a new terminal, replace https://kubernetes.default.svc with http://127.0.0.1:8001 and try the same path with curl:

curl http://127.0.0.1:8001/api/v1/nodes/primary/proxy/metrics/cadvisor | grep HELP | grep cpu

Searching the HELP lines, the metric used here for container CPU load is:

container_cpu_load_average_10s{namespace="test-project1",image=~".*mustafa_project.*"}

Judging from the graph, the CPU load hardly changes even when requests fail.

The query for memory usage as a fraction of the memory limit is:

container_memory_usage_bytes{namespace="test-project1",image=~".*mustafa_project.*"}/container_spec_memory_limit_bytes{namespace="test-project1",image=~".*mustafa_project.*"}

This graph shows memory usage below 10% even though requests are already failing, so the root cause is not CPU or memory.

Next, let's monitor php-fpm.

Installing php-fpm-exporter

The php-fpm-exporter project lives at https://github.com/bakins/php-fpm-exporter.git. We use a multi-stage image build: start a Go container, compile the exporter there, and copy the resulting binary into /usr/local/bin of our own php-fpm image.

Add the following to the top of the php-fpm project's Dockerfile:

FROM golang:buster as builder-golang

RUN git clone https://ghproxy.com/https://github.com/bakins/php-fpm-exporter.git /tmp/php-fpm-exporter \
    && cd /tmp/php-fpm-exporter && sed -i 's/amd64/arm64/g' script/build \
    && ./script/build && chmod +x php-fpm-exporter.linux.arm64

FROM php:7.2-fpm as final

COPY --from=builder-golang /tmp/php-fpm-exporter/php-fpm-exporter.linux.arm64 /usr/local/bin/php-fpm-exporter

This patches the project's script/build file, swapping amd64 for arm64, runs the build, and finally copies the compiled binary into our own image.

Enabling php-fpm status monitoring

In php-fpm's www.conf, enable the following:

pm.status_path = /php_status
ping.path = /ping

With this in place, requesting /php_status returns php-fpm's status information.
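To check the status page before the exporter is in place, you can speak FastCGI to port 9000 directly (assuming the cgi-fcgi tool from the libfcgi package is installed; it is not in the base image):

SCRIPT_NAME=/php_status \
SCRIPT_FILENAME=/php_status \
REQUEST_METHOD=GET \
cgi-fcgi -bind -connect 127.0.0.1:9000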

Start php-fpm-exporter to publish the php_status data externally; modify entry.sh:

#!/bin/sh

# start php-fpm as a daemon
php-fpm -D

# start nginx (daemonizes by default)
nginx

# the exporter stays in the foreground as the container's main process
php-fpm-exporter --addr="0.0.0.0:9190" --fastcgi="tcp://127.0.0.1:9000/php_status"

The exporter serves the php_status metrics on port 9190.

A Service is needed to expose port 9190 to Prometheus:

apiVersion: v1
kind: Service
metadata:
  name: test-client1
spec:
  ports:
    - name: http-api
      protocol: TCP
      port: 80
      targetPort: 80
    - name: http-php-fpm
      protocol: TCP
      port: 9190
      targetPort: 9190
  selector:
    app: test-client1

Configuring Prometheus to scrape php-fpm

Earlier, Prometheus auto-discovered nodes and pulled container CPU and memory metrics from the cAdvisor API; this time it auto-discovers pods and scrapes php-fpm from each pod's port 9190, keeping only pods in namespaces matching project1.

- job_name: 'php-fpm'
  scheme: http
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: keep
    regex: .*project1.*
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_pod_ip]
    action: replace
    regex: (.+)
    target_label: __address__
    replacement: ${1}:9190

In practice this scrapes php-fpm metrics from two pods.

Hitting one of these endpoints directly shows what is returned:

➜  ~ curl http://10.42.0.20:9190/metrics
# HELP phpfpm_accepted_connections_total Total number of accepted connections
# TYPE phpfpm_accepted_connections_total counter
phpfpm_accepted_connections_total 145
# HELP phpfpm_active_max_processes Maximum active process count
# TYPE phpfpm_active_max_processes counter
phpfpm_active_max_processes 1
# HELP phpfpm_listen_queue_connections Number of connections that have been initiated but not yet accepted
# TYPE phpfpm_listen_queue_connections gauge
phpfpm_listen_queue_connections 0
# HELP phpfpm_listen_queue_length_connections The length of the socket queue, dictating maximum number of pending connections
# TYPE phpfpm_listen_queue_length_connections gauge
phpfpm_listen_queue_length_connections 511
# HELP phpfpm_listen_queue_max_connections Max number of connections the listen queue has reached since FPM start
# TYPE phpfpm_listen_queue_max_connections counter
phpfpm_listen_queue_max_connections 0
# HELP phpfpm_max_children_reached_total Number of times the process limit has been reached
# TYPE phpfpm_max_children_reached_total counter
phpfpm_max_children_reached_total 0
# HELP phpfpm_processes_total process count
# TYPE phpfpm_processes_total gauge
phpfpm_processes_total{state="active"} 1
phpfpm_processes_total{state="idle"} 1
# HELP phpfpm_scrape_failures_total Number of errors while scraping php_fpm
# TYPE phpfpm_scrape_failures_total counter
phpfpm_scrape_failures_total 0
# HELP phpfpm_slow_requests_total Number of requests that exceed request_slowlog_timeout
# TYPE phpfpm_slow_requests_total counter
phpfpm_slow_requests_total 0
# HELP phpfpm_up able to contact php-fpm
# TYPE phpfpm_up gauge
phpfpm_up 1

To watch the request rate:

irate(phpfpm_accepted_connections_total{app="test-client1"}[1m])

To see the length of the php-fpm listen queue:

phpfpm_listen_queue_connections

To see the number of active php-fpm processes:

phpfpm_processes_total{state="active"}

The api project occasionally has 5 php-fpm processes running, while the client1 project always runs just one; as a result php-fpm sometimes cannot keep up with incoming calls and the microservice times out.

Tuning the php-fpm process count

Fix the number of php-fpm processes per container at a static value, and rely on k8s autoscaling to add capacity when request volume grows. A rule of thumb for the process count is memory limit / 30 MB; for example, a 150 MB limit gives 150 / 30 = 5 processes, so roughly 4 to 5 here.

pm = static
pm.max_children = 5

Verify with the following query; the php-fpm process count now stays at 5:

sum(phpfpm_processes_total{app="test-client1"})
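For the scale-out half of this approach, here is a minimal HorizontalPodAutoscaler sketch (it assumes a Deployment named test-client1 matching the Service above, and that metrics-server is installed):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: test-client1
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: test-client1
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70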

Installing nginx-exporter

Monitoring nginx requires nginx-prometheus-exporter; add this to the Dockerfile:

# install nginx-prometheus-exporter
RUN curl https://ghproxy.com/https://github.com/nginxinc/nginx-prometheus-exporter/releases/download/v0.11.0/nginx-prometheus-exporter_0.11.0_linux_arm64.tar.gz -o /tmp/nginx-prometheus-exporter.tar.gz \
    && cd /tmp && tar zxvf nginx-prometheus-exporter.tar.gz \
    && mv nginx-prometheus-exporter /usr/local/bin/nginx-prometheus-exporter \
    && rm -rf /tmp/*

To enable nginx status reporting, add a stub_status location to the site configuration:

	location /nginx-status {
        stub_status;

        access_log off;
        allow 127.0.0.1;
        deny all;
    }
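After reloading nginx, the status page can be checked from inside the container (only 127.0.0.1 is allowed by the config above); the output looks roughly like:

curl http://127.0.0.1/nginx-status
# Active connections: 1
# server accepts handled requests
#  2 2 23
# Reading: 0 Writing: 1 Waiting: 0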

Modify the startup script; the nginx exporter serves status metrics on port 9113:

#!/bin/sh

# start php-fpm as a daemon
php-fpm -D

# start nginx
nginx

# php-fpm-exporter now runs in the background
nohup php-fpm-exporter --addr="0.0.0.0:9190" --fastcgi="tcp://127.0.0.1:9000/php_status" &

# scrape the stub_status location defined above (/nginx-status) and
# serve Prometheus metrics on the exporter's default port 9113
nginx-prometheus-exporter -nginx.scrape-uri=http://127.0.0.1/nginx-status

Add a port to the Service to expose 9113 to Prometheus:

    - name: http-nginx-exporter
      protocol: TCP
      port: 9113
      targetPort: 9113

Configure Prometheus to scrape the nginx-exporter status metrics:

- job_name: 'nginx-exporter'
  scheme: http
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: keep
    regex: .*project1.*
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_pod_ip]
    action: replace
    regex: (.+)
    target_label: __address__
    replacement: ${1}:9113

The metrics exposed by nginx-exporter look like this:

# HELP nginx_connections_accepted Accepted client connections
# TYPE nginx_connections_accepted counter
nginx_connections_accepted 2
# HELP nginx_connections_active Active client connections
# TYPE nginx_connections_active gauge
nginx_connections_active 1
# HELP nginx_connections_handled Handled client connections
# TYPE nginx_connections_handled counter
nginx_connections_handled 2
# HELP nginx_connections_reading Connections where NGINX is reading the request header
# TYPE nginx_connections_reading gauge
nginx_connections_reading 0
# HELP nginx_connections_waiting Idle client connections
# TYPE nginx_connections_waiting gauge
nginx_connections_waiting 0
# HELP nginx_connections_writing Connections where NGINX is writing the response back to the client
# TYPE nginx_connections_writing gauge
nginx_connections_writing 1
# HELP nginx_http_requests_total Total http requests
# TYPE nginx_http_requests_total counter
nginx_http_requests_total 23
# HELP nginx_up Status of the last metric scrape
# TYPE nginx_up gauge
nginx_up 1
# HELP nginxexporter_build_info Exporter build information
# TYPE nginxexporter_build_info gauge
nginxexporter_build_info{arch="linux/arm64",commit="e4a6810d4f0b776f7fde37fea1d84e4c7284b72a",date="2022-09-07T21:09:51Z",dirty="false",go="go1.19",version="0.11.0"} 1

To query the nginx request rate:

irate(nginx_http_requests_total{app="test-api"}[1m])

To query the number of active connections:

nginx_connections_active{app="test-api"}

Building dashboards in Grafana

Create a PVC for Grafana's data:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    k8s.kuboard.cn/pvcType: Dynamic
  name: grafana
  namespace: promethues
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Mi
  storageClassName: local-path
  volumeMode: Filesystem

Create the Grafana Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s.kuboard.cn/layer: web
    k8s.kuboard.cn/name: grafana-k8s
  name: grafana-k8s
  namespace: promethues
spec:
  selector:
    matchLabels:
      k8s.kuboard.cn/layer: web
      k8s.kuboard.cn/name: grafana-k8s
  template:
    metadata:
      labels:
        k8s.kuboard.cn/layer: web
        k8s.kuboard.cn/name: grafana-k8s
    spec:
      containers:
        - image: grafana/grafana
          imagePullPolicy: IfNotPresent
          name: grafana
          ports:
            - containerPort: 3000
              name: grafana
              protocol: TCP
          volumeMounts:
            - mountPath: /var/lib/grafana
              name: volume-62hxi
      volumes:
        - name: volume-62hxi
          persistentVolumeClaim:
            claimName: grafana

Create the Grafana Service:

apiVersion: v1
kind: Service
metadata:
  labels:
    k8s.kuboard.cn/layer: web
    k8s.kuboard.cn/name: grafana-k8s
  name: grafana-k8s
  namespace: promethues
spec:
  ports:
    - name: ytfnyw
      nodePort: 31968
      port: 3000
      protocol: TCP
      targetPort: 3000
  selector:
    k8s.kuboard.cn/layer: web
    k8s.kuboard.cn/name: grafana-k8s
  type: NodePort

Grafana is now available at http://127.0.0.1:31968/login; the default login is admin with password admin.

Configuring the Prometheus data source in Grafana

Go to Configuration -> Data sources, choose Prometheus, and set the data source URL to http://promethues-k8s:9090.
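Alternatively, the data source can be provisioned from a file instead of the UI (a sketch using Grafana's standard provisioning format; drop it under /etc/grafana/provisioning/datasources/):

apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://promethues-k8s:9090
    isDefault: true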

Create a new dashboard and choose add new panel.

Panels built from the queries above (screenshots omitted):

  • Memory usage
  • Request rate
  • php-fpm listen queue (waiting connections)
  • php-fpm process count

Overall dashboard view (screenshot omitted).

Related links

Step-by-step nginx + php deployment

Merging the php and nginx images && packaging code into the image

Installing common software in the nginx-php image

yaf && yar microservices / hprose microservices image initialization

Everyday dev tools: php_codesniffer coding-standard checks & fixes, phpstan static analysis, phpunit unit tests

.gitlab-ci.yaml automated image builds && a standardized enterprise release workflow (part 1)

kustomize/kubectl automated image deployment && a standardized enterprise release workflow (part 2)

apisix gateway, JMeter load testing

prometheus/grafana monitoring data collection and display (this article)

k8s container network performance tuning

supervisor process management

Installing opcache and apcu

APM monitoring with skywalking

Distributed tracing with zipkin

php-fpm and nginx configuration

Integrating php with the Apollo configuration center

Working with kafka message queues in php via rdkafka
