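Background
vCenter has built-in monitoring and alerting, but its monitoring only shows each resource in isolation, and its email alerting only supports Windows mail systems. We therefore use Prometheus to monitor vSphere resources and alert on them.
Technical architecture:
Deployment
vmware-exporter
GitHub project: pryorda/vmware_exporter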
apiVersion: v1
kind: Secret
metadata:
  name: vmware-config
type: Opaque
data:
  # Values under "data" must be base64-encoded
  VSPHERE_USER: xxxxxxxxxxxxxxxxxx
  VSPHERE_PASSWORD: xxxxxxxxxxxxxxxxxx
  VSPHERE_HOST: xxxxxxxxxxxxxxxx
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vmware-exporter
  labels:
    app: vmware-exporter
spec:
  selector:
    matchLabels:
      app: vmware-exporter
  replicas: 1
  template:
    metadata:
      labels:
        app: vmware-exporter
    spec:
      containers:
        - name: vmware-exporter
          image: pryorda/vmware_exporter:v0.18.2
          resources:
            requests:
              cpu: 100m
              memory: 100Mi
            limits:
              cpu: 2
              memory: 500Mi
          livenessProbe:
            tcpSocket:
              port: 9272
            initialDelaySeconds: 5
            timeoutSeconds: 5
            successThreshold: 1
            failureThreshold: 3
            periodSeconds: 10
          env:
            - name: VSPHERE_IGNORE_SSL
              value: "TRUE"
            - name: VSPHERE_SPECS_SIZE
              value: "2000"
            - name: VSPHERE_HOST
              valueFrom:
                secretKeyRef:
                  name: vmware-config
                  key: VSPHERE_HOST
            - name: VSPHERE_USER
              valueFrom:
                secretKeyRef:
                  name: vmware-config
                  key: VSPHERE_USER
            - name: VSPHERE_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: vmware-config
                  key: VSPHERE_PASSWORD
          ports:
            - containerPort: 9272
              name: http
          volumeMounts:
            - name: localtime
              mountPath: /etc/localtime
      volumes:
        - name: localtime
          hostPath:
            path: /usr/share/zoneinfo/Asia/Shanghai
      restartPolicy: Always
---
apiVersion: v1
kind: Service
metadata:
  name: vmware-exporter
spec:
  selector:
    app: vmware-exporter
  type: ClusterIP
  ports:
    - name: http
      protocol: TCP
      port: 9272
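Note that values under `data` in a Kubernetes Secret must be base64-encoded. As a minimal alternative sketch, the `vmware-config` Secret could instead be written with `stringData`, which accepts plain text that Kubernetes encodes when the object is stored. The values shown are hypothetical placeholders, not settings from this deployment.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: vmware-config
type: Opaque
stringData:
  # Hypothetical placeholder values; replace with your own vCenter details
  VSPHERE_USER: "monitor@vsphere.local"
  VSPHERE_PASSWORD: "change-me"
  VSPHERE_HOST: "vcenter.example.com"
```

The Deployment does not need to change, because `secretKeyRef` reads the same keys regardless of whether the Secret was written with `data` or `stringData`.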
prometheus
kind: ConfigMap
apiVersion: v1
metadata:
  name: prom-config
data:
  prometheus.yml: |
    # my global config
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
      scrape_timeout: 15s
    # Alertmanager configuration
    alerting:
      alertmanagers:
        - static_configs:
            - targets:
              # - alertmanager:9093
    rule_files:
      # - "first_rules.yml"
      # - "second_rules.yml"
    scrape_configs:
      - job_name: "prometheus"
        static_configs:
          - targets: ["localhost:9090"]
      - job_name: "vmware"
        metrics_path: /metrics
        scheme: http
        scrape_interval: 60s # vmware-exporter responds slowly, so use a longer scrape interval
        scrape_timeout: 30s # scrape timeout
        static_configs:
          # <service>.<namespace>; this presumes the exporter Service runs in the "prometheus" namespace
          - targets: ["vmware-exporter.prometheus:9272"]
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
spec:
  selector:
    matchLabels:
      app: prometheus
  serviceName: "prom-headless"
  replicas: 1
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      initContainers:
        - name: chown
          image: busybox:1.32
          imagePullPolicy: IfNotPresent
          command: ["sh", "-c", "chown -R nobody:nobody /prometheus"]
          securityContext:
            privileged: true
          volumeMounts:
            - name: data
              mountPath: /prometheus
      containers:
        - name: prometheus
          image: prom/prometheus:v2.33.3
          ports:
            - containerPort: 9090
              name: http
          volumeMounts:
            - name: data
              mountPath: /prometheus
            - name: config
              mountPath: /etc/prometheus/prometheus.yml
              subPath: prometheus.yml
      volumes:
        - name: config
          configMap:
            name: prom-config
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 100Gi
---
apiVersion: v1
kind: Service
metadata:
  name: prom-headless
spec:
  selector:
    app: prometheus
  type: ClusterIP
  clusterIP: None
  ports:
    - name: http
      protocol: TCP
      port: 9090
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus
spec:
  selector:
    app: prometheus
  type: NodePort
  ports:
    - name: http
      protocol: TCP
      port: 9090
grafana
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: grafana
spec:
  selector:
    matchLabels:
      app: grafana
  serviceName: "grafana-headless"
  replicas: 1
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
        - name: grafana
          # The Docker Hub image lives under the grafana/ organization
          image: grafana/grafana:8.3.6
          ports:
            - containerPort: 3000
              name: http
          volumeMounts:
            - name: data
              mountPath: /var/lib/grafana
            - name: config
              mountPath: /etc/grafana/grafana.ini
              subPath: grafana.ini
      volumes:
        - name: config
          configMap:
            name: grafana-config
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 10Gi
---
apiVersion: v1
kind: Service
metadata:
  name: grafana
spec:
  selector:
    app: grafana
  type: ClusterIP
  ports:
    - name: http
      protocol: TCP
      port: 3000
---
apiVersion: v1
kind: Service
metadata:
  name: grafana-headless
spec:
  selector:
    app: grafana
  type: ClusterIP
  clusterIP: None
  ports:
    - name: http
      protocol: TCP
      port: 3000
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: grafana-config
data:
  grafana.ini: |
    [smtp]
    enabled = true
    host = xxxxxxxxxx
    user = xxxxxxxxxx
    password = xxxxxxxxxxx
    skip_verify = false
    from_address = xxxxxxxxxxx
    from_name = Grafana
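Grafana also needs a Prometheus data source before the dashboards can render. The source does not show how this was configured, so the following is only a sketch of one option, using Grafana's data source provisioning file format. The ConfigMap name `grafana-datasources` is hypothetical, and the URL assumes Grafana runs in the same namespace as the `prometheus` Service defined earlier.

```yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: grafana-datasources   # hypothetical name, not from the original setup
data:
  prometheus.yaml: |
    apiVersion: 1
    datasources:
      - name: Prometheus
        type: prometheus
        access: proxy
        # Assumes the "prometheus" Service is reachable in the same namespace
        url: http://prometheus:9090
        isDefault: true
```

To take effect, this file would have to be mounted into the Grafana container under `/etc/grafana/provisioning/datasources/`. Adding the data source by hand in the Grafana UI works equally well.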
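Visualization
Dashboard files: https://github.com/pryorda/vmware_exporter/tree/main/dashboards
Cluster monitoring
Node monitoring
Virtual machine monitoring
Aggregate monitoring
Alerting
Alert rules
Rules are defined from specific metrics and expressions. A memory rule example: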
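For comparison, a memory rule can also be sketched in Prometheus rule-file format. This is not the author's rule, only an illustration under stated assumptions: the metric names `vmware_host_memory_usage` and `vmware_host_memory_max` and the `host_name` label are taken to be what vmware_exporter exposes and should be checked against the exporter's `/metrics` output, and the group name, alert name, 90% threshold, and 5m duration are hypothetical.

```yaml
groups:
  - name: vmware-memory                # hypothetical group name
    rules:
      - alert: VMwareHostMemoryHigh    # hypothetical alert name
        # Assumed metric names; verify against the exporter's /metrics output
        expr: vmware_host_memory_usage / vmware_host_memory_max * 100 > 90
        for: 5m                        # hypothetical duration before firing
        labels:
          severity: warning
        annotations:
          # host_name is an assumed label on the exporter's host metrics
          summary: "Host {{ $labels.host_name }} memory usage is above 90%"
```

A file like this would only be evaluated if it were listed under `rule_files` in `prometheus.yml`, and notifications would additionally need an Alertmanager target, both of which are left commented out in the configuration above.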