▶ Export Metrics
1、Prerequisites
NVIDIA Tesla drivers R384 or later (download from the NVIDIA Driver Downloads page)
nvidia-docker version > 2.0 (see how to install and its prerequisites)
Optionally, configure Docker to use nvidia as its default runtime
NVIDIA device plugin for Kubernetes (see how to install)
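
Once the prerequisites are in place, one way to check that GPUs are actually schedulable is a throwaway pod that requests a single GPU and runs nvidia-smi. The manifest below is a minimal sketch, not part of the original setup: the pod name gpu-smoke-test and the image tag are assumptions, and the tag should be replaced with one compatible with your driver version.

```yaml
# Hypothetical smoke-test pod; the name and image tag are illustrative assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never            # run once, do not restart after nvidia-smi exits
  containers:
    - name: cuda
      image: nvidia/cuda:12.2.0-base-ubuntu22.04   # assumed tag; pick one matching your driver
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1       # extended resource advertised by the NVIDIA device plugin
```

If the pod completes and its log shows the nvidia-smi device table, the driver, container runtime, and device plugin are likely wired together correctly. If it stays Pending, the node is probably not advertising the nvidia.com/gpu resource.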
2、Create PVC
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-gpu-pvc
  namespace: kube-system
spec:
  accessModes:
    - ReadWriteMany
  volumeMode: Filesystem
  resources:
    requests:
      storage: 10Gi
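
The claim above asks for ReadWriteMany, which only binds to volume types that support shared access from several nodes (NFS or CephFS, for example). If the cluster's default storage class does not support it, the claim stays Pending. A hedged variant that names a storage class explicitly might look like the following; nfs-client is a hypothetical class name, not something taken from the original setup.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-gpu-pvc
  namespace: kube-system
spec:
  storageClassName: nfs-client    # hypothetical; substitute a class in your cluster that supports ReadWriteMany
  accessModes:
    - ReadWriteMany
  volumeMode: Filesystem
  resources:
    requests:
      storage: 10Gi
```

Leaving storageClassName out, as the original manifest does, falls back to the cluster's default class, which is fine as long as that class can provision shared volumes.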
3、Run DaemonSet, Run Pod on GPU Node
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: prometheus-gpu
  namespace: kube-system
spec:
  revisionHistoryLimit: 3
  selector:
    matchLabels:
      k8s-app: prometheus-gpu
  template:
    metadata:
      labels:
        k8s-app: prometheus-gpu
    spec:
      # Schedule only onto the node whose hostname label is "gpu"
      nodeSelector:
        kubernetes.io/hostname: gpu
      volumes:
        # Shared metrics directory backed by the PVC from step 2
        - name: prometheus
          persistentVolumeClaim:
            claimName: prometheus-gpu-pvc
        - name: proc
          hostPath:
            path: /proc
        - name: sys
          hostPath:
            path: /sys
      serviceAccountName: admin-user
      containers:
        - name: dcgm-exporter
          image: "nvidia/dcgm-exporter"
          volumeMounts:
            - name: prometheus
              mountPath: /run/prometheus/
          imagePullPolicy: Always
          # The exporter runs as root
          securityContext:
            runAsNonRoot: false
            runAsUser: 0
          env:
            # Rendered by Ansible; quoted so the result is always a string
            - name: DEPLOY_TIME
              value: "{{ ansible_date_time.iso8601 }}"
        - name: node-exporter
          image: "q