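VMware vSphere Monitoring and Alerting

Background

vCenter has built-in monitoring and alerting, but its monitoring only displays each resource in isolation, and its email alerting only supports Windows mail systems. We therefore use Prometheus to monitor vSphere resources and alert on them.
Architecture: [image: architecture diagram]

Deployment

vmware-exporter

GitHub project: pryorda/vmware_exporter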

apiVersion: v1
kind: Secret
metadata:
  name: vmware-config
type: Opaque
data:  # values under data must be base64-encoded
  VSPHERE_USER: xxxxxxxxxxxxxxxxxx
  VSPHERE_PASSWORD: xxxxxxxxxxxxxxxxxx
  VSPHERE_HOST: xxxxxxxxxxxxxxxx

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vmware-exporter
  labels:
    app: vmware-exporter
spec:
  selector:
    matchLabels:
      app: vmware-exporter
  replicas: 1
  template:
    metadata:
      labels:
        app: vmware-exporter
    spec:
      containers:
      - name: vmware-exporter
        image: pryorda/vmware_exporter:v0.18.2
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
          limits:
            cpu: 2
            memory: 500Mi
        livenessProbe:
          tcpSocket:
            port: 9272
          initialDelaySeconds: 5
          timeoutSeconds: 5
          successThreshold: 1
          failureThreshold: 3
          periodSeconds: 10
        env:
        - name: VSPHERE_IGNORE_SSL
          value: "TRUE"
        - name: VSPHERE_SPECS_SIZE
          value: "2000"
        - name: VSPHERE_HOST
          valueFrom:
            secretKeyRef:
              name: vmware-config
              key: VSPHERE_HOST
        - name: VSPHERE_USER
          valueFrom:
            secretKeyRef:
              name: vmware-config
              key: VSPHERE_USER
        - name: VSPHERE_PASSWORD
          valueFrom:
            secretKeyRef:
              name: vmware-config
              key: VSPHERE_PASSWORD
        ports:
        - containerPort: 9272
          name: http
        volumeMounts:
        - name: localtime
          mountPath: /etc/localtime
      volumes:
        - name: localtime
          hostPath:
            path: /usr/share/zoneinfo/Asia/Shanghai
      restartPolicy: Always
---
apiVersion: v1
kind: Service
metadata:
  name: vmware-exporter
spec:
  selector:
    app: vmware-exporter
  type: ClusterIP
  ports:
  - name: http
    protocol: TCP
    port: 9272
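
The `data` field of a Secret expects base64-encoded values, which is easy to get wrong when filling in the placeholders by hand. As a minimal sketch, the same Secret can be written with the `stringData` field, which accepts plain text that Kubernetes encodes on creation. The angle-bracket values are hypothetical placeholders, not values from the original setup.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: vmware-config
type: Opaque
stringData:
  # Plain-text placeholders (assumptions); replace with your own vCenter details
  VSPHERE_USER: "<vcenter-username>"
  VSPHERE_PASSWORD: "<vcenter-password>"
  VSPHERE_HOST: "<vcenter-hostname-or-ip>"
```

The key names stay the same, so the `secretKeyRef` entries in the Deployment need no change.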

prometheus

kind: ConfigMap
apiVersion: v1
metadata:
  name: prom-config
data:
  prometheus.yml: |
    # my global config
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
      scrape_timeout: 15s

    # Alertmanager configuration
    alerting:
      alertmanagers:
        - static_configs:
            - targets:
              # - alertmanager:9093

    rule_files:
      # - "first_rules.yml"
      # - "second_rules.yml"

    scrape_configs:
      - job_name: "prometheus"
        static_configs:
          - targets: ["localhost:9090"]
      - job_name: "vmware"
        metrics_path: /metrics
        scheme: http
        scrape_interval: 60s      # vmware-exporter responds slowly, so use a longer scrape interval
        scrape_timeout: 30s       # scrape timeout
        static_configs:
          - targets: ["vmware-exporter.prometheus:9272"]

---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
spec:
  selector:
    matchLabels:
      app: prometheus
  serviceName: "prom-headless"
  replicas: 1
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      initContainers:
        - name: chown
          image: busybox:1.32
          imagePullPolicy: IfNotPresent
          command:
            ["sh", "-c", "chown -R nobody:nobody /prometheus"]
          securityContext:
            privileged: true
          volumeMounts:
            - name: data
              mountPath: /prometheus
      containers:
      - name: prometheus
        image: prom/prometheus:v2.33.3
        ports:
        - containerPort: 9090
          name: http
        volumeMounts:
        - name: data
          mountPath: /prometheus
        - name: config
          mountPath: /etc/prometheus/prometheus.yml
          subPath: prometheus.yml
      volumes:
      - name: config
        configMap:
          name: prom-config
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 100Gi
---
apiVersion: v1
kind: Service
metadata:
  name: prom-headless
spec:
  selector:
    app: prometheus
  type: ClusterIP
  clusterIP: None
  ports:
  - name: http
    protocol: TCP
    port: 9090
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus
spec:
  selector:
    app: prometheus
  type: NodePort
  ports:
  - name: http
    protocol: TCP
    port: 9090
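
The scrape target `vmware-exporter.prometheus:9272` resolves to the `vmware-exporter` Service in a namespace named `prometheus`, yet none of the manifests set a namespace. A minimal sketch, assuming everything is meant to live in that namespace, is to create it first and apply the manifests into it (for example with `kubectl apply -n prometheus -f <file>`):

```yaml
# Assumption: all manifests in this article are deployed into this namespace,
# which is what the target "vmware-exporter.prometheus:9272" implies.
apiVersion: v1
kind: Namespace
metadata:
  name: prometheus
```

If you deploy into a different namespace, adjust the target in `prometheus.yml` to match.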

grafana

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: grafana
spec:
  selector:
    matchLabels:
      app: grafana
  serviceName: "grafana-headless"
  replicas: 1
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
      - name: grafana
        image: grafana/grafana:8.3.6
        ports:
        - containerPort: 3000
          name: http
        volumeMounts:
        - name: data
          mountPath: /var/lib/grafana
        - name: config
          mountPath: /etc/grafana/grafana.ini
          subPath: grafana.ini
      volumes:
      - name: config
        configMap:
          name: grafana-config
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 10Gi
---
apiVersion: v1
kind: Service
metadata:
  name: grafana
spec:
  selector:
    app: grafana
  type: ClusterIP
  ports:
  - name: http
    protocol: TCP
    port: 3000

---
apiVersion: v1
kind: Service
metadata:
  name: grafana-headless
spec:
  selector:
    app: grafana
  type: ClusterIP
  clusterIP: None
  ports:
  - name: http
    protocol: TCP
    port: 3000

---
kind: ConfigMap
apiVersion: v1
metadata:
  name: grafana-config
data:
  grafana.ini: |
    [smtp]
    enabled = true
    host = xxxxxxxxxx
    user = xxxxxxxxxx
    password = xxxxxxxxxxx
    skip_verify = false
    from_address = xxxxxxxxxxx
    from_name = Grafana

Dashboards

Dashboard files: https://github.com/pryorda/vmware_exporter/tree/main/dashboards
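
The dashboards query Prometheus, so Grafana needs a Prometheus data source before they show any data. It can be added in the UI; as an alternative, here is a minimal sketch of a Grafana data source provisioning file. It assumes Grafana runs in the same namespace as the `prometheus` Service defined earlier and that the file is mounted under `/etc/grafana/provisioning/datasources/` (for example from an additional ConfigMap, which the original manifests do not include).

```yaml
# Assumed file path: /etc/grafana/provisioning/datasources/prometheus.yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy                  # Grafana's backend proxies the queries
    url: http://prometheus:9090    # the "prometheus" Service, same namespace assumed
    isDefault: true
```

When importing a dashboard, select this data source if the dashboard prompts for one.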

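Cluster monitoring

[image: cluster monitoring dashboard]

Node monitoring

[image: node monitoring dashboard]

Virtual machine monitoring

[image: virtual machine monitoring dashboard]

Aggregate monitoring

[image: aggregate monitoring dashboard]

Alerting

[image]

Alert rules

Rules are defined from specific metrics and expressions. Example definition for memory: [image] [image]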
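
Since the screenshots of the memory rule did not survive, here is a text sketch of a comparable condition, written as a Prometheus alerting rule rather than a Grafana-managed one. This is an alternative technique, not the setup shown in the article: it would also need a `rule_files` entry and an Alertmanager target, both of which are commented out in the `prometheus.yml` above. The metric names `vmware_host_memory_usage` and `vmware_host_memory_max` and the `host_name` label are assumptions to verify against your exporter's `/metrics` output, and the 90% threshold and 5m duration are hypothetical.

```yaml
groups:
  - name: vmware-host
    rules:
      - alert: VMwareHostMemoryHigh
        # Memory usage as a percentage of the host's total memory (assumed metric names)
        expr: vmware_host_memory_usage / vmware_host_memory_max * 100 > 90
        for: 5m                    # condition must hold for 5 minutes before firing
        labels:
          severity: warning
        annotations:
          summary: "Memory usage on host {{ $labels.host_name }} is above 90%"
```

The same expression (without the `> 90` comparison) can be used as the query of a Grafana alert, with the threshold set in the alert condition instead.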

Notification method

[image]
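
With the `[smtp]` section configured in `grafana.ini`, email is the natural notification method. The channel can be created in the UI; as a sketch, assuming Grafana 8.x with legacy alerting, an email notification channel can also be provisioned from a file under `/etc/grafana/provisioning/notifiers/`. The channel name, `uid`, and address below are hypothetical.

```yaml
# Assumed file path: /etc/grafana/provisioning/notifiers/email.yaml
notifiers:
  - name: ops-email                # hypothetical channel name
    type: email
    uid: ops-email                 # hypothetical unique id
    org_id: 1
    is_default: true               # attach to alerts that name no channel
    settings:
      addresses: "ops@example.com" # hypothetical recipient; separate several with ";"
```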

Alert policy

[image]
