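Deploying Prometheus
Prerequisite: a working Kubernetes cluster (see the blog post on deploying a k8s cluster).
Related post: Getting to Know the Prometheus Monitoring Platform
Create the namespace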
$ kubectl create namespace monitor
Create the RBAC rules
The RBAC rules consist of three kinds of YAML resources: ServiceAccount, ClusterRole, and ClusterRoleBinding.
vim prometheus-rabc.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitor
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources: ["nodes", "nodes/proxy", "services", "endpoints", "pods"]
  verbs: ["get", "list", "watch"]
# The Ingress resource is named "ingresses"; on current clusters it lives in networking.k8s.io.
- apiGroups: ["extensions", "networking.k8s.io"]
  resources: ["ingresses"]
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  # cluster-admin grants full cluster access, so the ClusterRole defined above is not actually used.
  # For least privilege, bind the "prometheus" ClusterRole here instead.
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: monitor
Verify
$ kubectl apply -f prometheus-rabc.yaml
$ kubectl get sa prometheus -n monitor
NAME SECRETS AGE
prometheus 0 52s
$ kubectl get clusterrole prometheus
NAME CREATED AT
prometheus 2024-08-19T20:06:06Z
$ kubectl get clusterrolebinding prometheus
NAME ROLE AGE
prometheus ClusterRole/cluster-admin 9m15s
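The listing above confirms that the objects exist, but not what the ServiceAccount is actually allowed to do. Below is a minimal sketch of such a check, assuming `kubectl` is on the PATH and that your user may impersonate ServiceAccounts. The helper names `can_i_command` and `can_i` are hypothetical and only wrap the real `kubectl auth can-i` subcommand.

```python
import subprocess


def can_i_command(verb, resource, namespace="monitor", service_account="prometheus"):
    """Build a `kubectl auth can-i` command that impersonates the ServiceAccount."""
    subject = f"system:serviceaccount:{namespace}:{service_account}"
    return ["kubectl", "auth", "can-i", verb, resource, f"--as={subject}"]


def can_i(verb, resource):
    """Run the check; kubectl prints "yes" or "no" on stdout."""
    result = subprocess.run(
        can_i_command(verb, resource), capture_output=True, text=True, check=False
    )
    return result.stdout.strip() == "yes"


# The cluster call is not executed here because it needs a live cluster:
#   can_i("list", "pods")
# Printing the command shows what would be run.
print(" ".join(can_i_command("list", "pods")))
```

With the binding to cluster-admin shown above, every such check should come back as yes, including write verbs. If the binding pointed at the `prometheus` ClusterRole instead, only the read verbs listed in that role would be expected to pass.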
Create the Prometheus configuration file as a ConfigMap
vim prometheus-cm.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitor
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
      external_labels:
        cluster: "kubernetes"
    ############ Scrape jobs ###################
    scrape_configs:
    - job_name: prometheus
      static_configs:
      - targets: ['127.0.0.1:9090']
        labels:
          instance: prometheus
    ############ Path to the alerting rule files ###################
    rule_files:
    - /etc/prometheus/rules/*.rules
Verify
$ kubectl apply -f prometheus-cm.yaml
$ kubectl get cm prometheus-config -n monitor
NAME DATA AGE
prometheus-config 1 4s
Create the Prometheus rules configuration as a ConfigMap
The alerting rules are also stored in a ConfigMap. It contains two parts, general.rules and node.rules.
Create these two additional Prometheus configuration files as follows:
vim prometheus-rules.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-rules
  namespace: monitor
data:
  general.rules: |
    groups:
    - name: general.rules
      rules:
      - alert: InstanceDown
        expr: |
          up{job=~"k8s-nodes|prometheus"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
          description: "{{ $labels.instance }} (hostname: {{ $labels.hostname }}) has been down for more than 1 minute."
  node.rules: |
    groups:
    - name: node.rules
      rules:
      - alert: NodeFilesystemUsage
        expr: |
          100 - (node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100 > 85
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Instance {{ $labels.instance }}: usage of partition {{ $labels.mountpoint }} is too high"
          description: "{{ $labels.instance }} (hostname: {{ $labels.hostname }}): partition {{ $labels.mountpoint }} usage is above 85% (current value: {{ $value }})"
Verify
$ kubectl apply -f prometheus-rules.yaml
$ kubectl get cm -n monitor prometheus-rules
NAME DATA AGE
prometheus-rules 2 11s
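To make the NodeFilesystemUsage rule concrete, the sketch below redoes its arithmetic and models how `for: 1m` interacts with the 15s `evaluation_interval` from the main configuration. It is a simplified model for illustration, not Prometheus's implementation; the function names and the sample numbers are hypothetical.

```python
def filesystem_usage_percent(avail_bytes, size_bytes):
    """Mirror the rule expression: 100 - (avail / size) * 100."""
    return 100 - (avail_bytes / size_bytes) * 100


def alert_states(usages, threshold=85.0, interval_s=15, for_s=60):
    """Return the alert state after each evaluation of the rule.

    An alert is "pending" from the first evaluation where the expression
    holds, and becomes "firing" once it has held for at least `for_s` seconds.
    """
    states = []
    active_since = None
    for i, usage in enumerate(usages):
        now = i * interval_s
        if usage > threshold:
            if active_since is None:
                active_since = now
            states.append("firing" if now - active_since >= for_s else "pending")
        else:
            active_since = None  # the condition cleared, so the timer resets
            states.append("inactive")
    return states


GIB = 1024 ** 3
# Hypothetical filesystem: 200 GiB in size with 25 GiB available.
print(filesystem_usage_percent(25 * GIB, 200 * GIB))  # 87.5
print(alert_states([90, 90, 90, 90, 90]))
# ['pending', 'pending', 'pending', 'pending', 'firing']
```

A single evaluation below the threshold resets the timer, which is why short spikes do not fire the alert.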
Create the Prometheus Service
vim prometheus-svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitor
  labels:
    k8s-app: prometheus
spec:
  type: ClusterIP
  ports:
  - name: http
    port: 9090
    targetPort: 9090
  selector:
    k8s-app: prometheus
Verify
$ kubectl apply -f prometheus-svc.yaml
$ kubectl get svc -n monitor prometheus
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
prometheus ClusterIP 10.1.8.76 <none> 9090/TCP 9m29s
Create the Prometheus Deployment
Prometheus needs to persist its data so that historical data can be recovered after a restart, so here we use the NFS storage deployed in an earlier lesson.
Data is stored through the StorageClass backed by NFS. For creating the StorageClass (sc), see this blog post.
vim prometheus-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data-pvc
  namespace: monitor
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: "nfs-storage"
  resources:
    requests:
      storage: 10Gi
vim prometheus-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: monitor
  labels:
    k8s-app: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: prometheus
  template:
    metadata:
      labels:
        k8s-app: prometheus
    spec:
      serviceAccountName: prometheus
      containers:
      - name: prometheus
        image: docker.m.daocloud.io/prom/prometheus:v2.36.0
        imagePullPolicy: IfNotPresent
        ports:
        - name: http
          containerPort: 9090
        securityContext:
          runAsUser: 65534
          privileged: true
        command:
        - "/bin/prometheus"
        args:
        - "--config.file=/etc/prometheus/prometheus.yml"
        - "--web.enable-lifecycle"
        - "--storage.tsdb.path=/prometheus"
        - "--storage.tsdb.retention.time=10d"
        - "--web.console.libraries=/etc/prometheus/console_libraries"
        - "--web.console.templates=/etc/prometheus/consoles"
        resources:
          limits:
            cpu: 2000m
            memory: 2048Mi
          requests:
            cpu: 1000m
            memory: 512Mi
        readinessProbe:
          httpGet:
            path: /-/ready
            port: 9090
          initialDelaySeconds: 5
          timeoutSeconds: 10
        livenessProbe:
          httpGet:
            path: /-/healthy
            port: 9090
          initialDelaySeconds: 30
          timeoutSeconds: 30
        volumeMounts:
        - name: data
          mountPath: /prometheus
          subPath: prometheus
        - name: config
          mountPath: /etc/prometheus
        - name: prometheus-rules
          mountPath: /etc/prometheus/rules
      - name: configmap-reload
        image: jimmidyson/configmap-reload:v0.5.0
        imagePullPolicy: IfNotPresent
        args:
        - "--volume-dir=/etc/config"
        - "--webhook-url=http://localhost:9090/-/reload"
        resources:
          limits:
            cpu: 100m
            memory: 100Mi
          requests:
            cpu: 10m
            memory: 10Mi
        volumeMounts:
        - name: config
          mountPath: /etc/config
          readOnly: true
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: prometheus-data-pvc
      - name: prometheus-rules
        configMap:
          name: prometheus-rules
      - name: config
        configMap:
          name: prometheus-config
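The containers section of the Deployment manifest defines two containers:
- prometheus: the main container, which runs the Prometheus process.
- configmap-reload: watches the mounted ConfigMap files and, when their content changes, sends a request to the webhook URL. Because Prometheus can reload its configuration through an HTTP endpoint, this container calls Prometheus's /-/reload endpoint whenever the Prometheus ConfigMap changes, so the new configuration takes effect without a restart. As configured here, it watches only the prometheus-config volume mounted at /etc/config.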
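The reload mechanism can be illustrated with a few lines of Python. This is a sketch of the idea only, not the sidecar's actual code: it polls a directory digest, whereas the real configmap-reload tool reacts to filesystem events as far as we know. The function names are hypothetical; the directory and webhook URL default to the values from the manifest.

```python
import hashlib
import os
import time
import urllib.request


def dir_digest(path):
    """Hash the names and contents of the regular files directly under `path`."""
    digest = hashlib.sha256()
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        if os.path.isfile(full):  # follows the symlinks used by ConfigMap mounts
            digest.update(name.encode())
            with open(full, "rb") as fh:
                digest.update(fh.read())
    return digest.hexdigest()


def reload_request(url="http://localhost:9090/-/reload"):
    """Build the POST request that asks Prometheus to reload its configuration."""
    return urllib.request.Request(url, data=b"", method="POST")


def watch(path="/etc/config", poll_seconds=5):
    """Poll `path` forever and call the reload webhook whenever it changes."""
    last = dir_digest(path)
    while True:
        time.sleep(poll_seconds)
        current = dir_digest(path)
        if current != last:
            urllib.request.urlopen(reload_request(), timeout=10)
            last = current
```

Calling `watch()` inside the Pod would block and trigger a reload after each ConfigMap update. The reload endpoint only responds when Prometheus runs with `--web.enable-lifecycle`.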
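Prometheus arguments used in the manifest above:
- --web.enable-lifecycle: enables the HTTP lifecycle endpoints, including the /-/reload endpoint used to reload the Prometheus configuration
- --config.file: path of the Prometheus configuration file, as seen from inside the container
- --storage.tsdb.path: directory where Prometheus stores its data, as seen from inside the container
- --storage.tsdb.retention.time: how long data is kept before old data is deleted; the default is 15d (set to 10d here)
- --web.console.libraries: path to the console library directory
- --web.console.templates: path to the console templates directory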
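The retention setting and the 10Gi volume claim are related, so a rough capacity check is useful. The sketch below applies the rule of thumb from the Prometheus storage documentation (disk space ≈ retention seconds × ingested samples per second × bytes per sample, at roughly 1 to 2 bytes per sample). It assumes every series is scraped at the global 15s interval; the series count is a made-up input, not a measurement.

```python
def estimate_tsdb_bytes(active_series, retention_days=10,
                        scrape_interval_s=15, bytes_per_sample=2):
    """Rough TSDB size: retention_seconds * samples_per_second * bytes_per_sample."""
    retention_seconds = retention_days * 24 * 60 * 60
    # samples_per_second = active_series / scrape_interval_s; the division is
    # done last to keep the arithmetic exact for round inputs.
    return retention_seconds * active_series * bytes_per_sample / scrape_interval_s


PVC_BYTES = 10 * 1024 ** 3  # the 10Gi requested in prometheus-pvc.yaml

# Hypothetical load: 100,000 active series.
needed = estimate_tsdb_bytes(100_000)
print(f"{needed / 1024 ** 3:.2f} GiB needed, fits in PVC: {needed <= PVC_BYTES}")
```

Under these assumptions, 100,000 series would need slightly more than the 10Gi claim, so either the retention or the volume size would have to change at that scale.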
Verify
$ kubectl apply -f prometheus-pvc.yaml
$ kubectl apply -f prometheus-deploy.yaml
$ kubectl get pvc -n monitor
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
prometheus-data-pvc Bound pvc-95786ed1-2d43-46ca-b15c-b3dcf958a6b6 10Gi RWX nfs-storage 38s
$ kubectl get deploy -n monitor
NAME READY UP-TO-DATE AVAILABLE AGE
prometheus 1/1 1 1 83s
$ kubectl get pods -n monitor
NAME READY STATUS RESTARTS AGE
prometheus-58cf9d5989-sttk2 2/2 Running 0 100s
Create the Prometheus Ingress for access through an external domain name
For deploying the Ingress controller, see the Ingress deployment post.
vim prometheus-ing.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  namespace: monitor
  name: prometheus-ingress
spec:
  ingressClassName: nginx
  rules:
  - host: prometheus.kubernets.cn
    http:
      paths:
      - pathType: Prefix
        backend:
          service:
            name: prometheus
            port:
              number: 9090
        path: /
Verify
$ kubectl apply -f prometheus-ing.yaml
$ kubectl get ing -n monitor prometheus-ingress
NAME CLASS HOSTS ADDRESS PORTS AGE
prometheus-ingress nginx prometheus.kubernets.cn 80 28s
$ kubectl get svc -n ingress-nginx
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
ingress-nginx-controller NodePort 10.1.231.117 <none> 80:30186/TCP,443:30153/TCP 32h
$ echo '11.0.1.92 prometheus.kubernets.cn' >> /etc/hosts
$ curl prometheus.kubernets.cn:30186
<a href="/graph">Found</a>
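Once the Ingress answers, the same address can be used for the Prometheus HTTP API. Here is a minimal sketch, assuming the host and NodePort from the curl call above are reachable from where the script runs. The helper names are hypothetical, and the sample payload is hand-written in the documented response shape rather than captured from a real server.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://prometheus.kubernets.cn:30186"  # host and NodePort used above


def query_url(promql, base_url=BASE_URL):
    """Build an instant-query URL for the /api/v1/query endpoint."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def fetch(promql):
    """Run the query against the live server (needs network access)."""
    with urllib.request.urlopen(query_url(promql), timeout=10) as resp:
        return json.load(resp)


def down_instances(payload):
    """Return the instance labels of series whose sample value is 0."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload}")
    return [
        series["metric"].get("instance", "")
        for series in payload["data"]["result"]
        if float(series["value"][1]) == 0
    ]


# Hand-written sample (not real output); "node-a" is a made-up instance.
sample = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "instance": "prometheus", "job": "prometheus"},
             "value": [1724098000.0, "1"]},
            {"metric": {"__name__": "up", "instance": "node-a", "job": "k8s-nodes"},
             "value": [1724098000.0, "0"]},
        ],
    },
}

print(query_url("up"))
print(down_instances(sample))  # ['node-a']
```

Against the running deployment, `down_instances(fetch("up"))` would list the targets that the InstanceDown rule is watching for.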
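Open the web UI in a browser
http://prometheus.kubernets.cn (append :30186 when going through the ingress-nginx NodePort shown above)
The Prometheus monitoring platform UI offers the following pages:
- Graph: plots charts; you can choose different time ranges, metrics, and labels, and add several panels to compare them.
- Alert: shows the alerting rules and their state; an alert fires when a metric crosses the configured threshold.
- Explore: queries and browses metric data, using query expressions or label filters to find the data you need.
- Status: shows Prometheus status information, including the current targets, rules, and alerts.
- Config: shows the configuration Prometheus has loaded; to add, modify, or remove configuration items, edit the prometheus-config ConfigMap.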