Background
Our production environment runs a K8S cluster built with RKE. Business requirements now call for running some workloads on GPUs, so a GPU node has to be added to the cluster.
Existing environment
RKE: Running RKE version: v1.1.2
Kubernetes: 1.17
- Master nodes: 3
- Worker nodes: 6 (all CPU-only)
New node
OS: CentOS 7.6
GPU cards: 1 (installed driver version: 440.95.01)
IP: 10.5.0.112
Steps
1. Initialize and configure the new node
a. Install Docker
# This step installs Docker 19.03
curl https://releases.rancher.com/install-docker/19.03.sh | sh
b. Install nvidia-docker2
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | sudo tee /etc/yum.repos.d/nvidia-docker.repo
yum install -y nvidia-docker2
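Before going further, it is worth confirming that nvidia-docker2 actually put the container runtime in place and that the host driver is healthy. A quick sketch (output will vary with your driver and GPU model):

```shell
# The nvidia-container-runtime binary should have been installed by nvidia-docker2
which nvidia-container-runtime

# The host driver (440.95.01 in this setup) should list the GPU
nvidia-smi
```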
c. Configure Docker's default runtime
vi /etc/docker/daemon.json
File contents:
{
  "registry-mirrors": [
    "https://dockerhub.azk8s.cn",
    "https://docker.mirrors.ustc.edu.cn",
    "http://hub-mirror.c.163.com"
  ],
  "max-concurrent-downloads": 10,
  "log-driver": "json-file",
  "log-level": "warn",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  },
  "data-root": "/data/docker",
  "group": "docker",
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
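Docker only reads daemon.json at startup, so restart the daemon for the new default runtime to take effect, then verify. A sketch, assuming a standard systemd install; the nvidia/cuda:10.2-base image tag is an assumption (driver 440 supports CUDA up to 10.2):

```shell
# Restart Docker so the edited daemon.json takes effect
systemctl restart docker

# The default runtime should now be reported as nvidia
docker info | grep -i 'default runtime'

# Optional end-to-end check: a CUDA container should see the GPU
# (image tag is an assumption; any CUDA <= 10.2 base image works with driver 440)
docker run --rm nvidia/cuda:10.2-base nvidia-smi
```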
d. Create the account RKE will use, and set up passwordless SSH from the RKE control node to that account on the new node (my RKE control node is 10.4.0.57; running ssh-copy-id docker@10.5.0.112 completes the passwordless setup)
useradd docker -g docker
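End to end, the passwordless setup described above looks like this on the control node (a sketch; it assumes the control node may not yet have a key pair):

```shell
# On the RKE control node (10.4.0.57): generate a key pair if one does not exist yet
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa

# Push the public key to the docker account on the new node
ssh-copy-id docker@10.5.0.112

# Verify passwordless login, and that the account can reach the Docker daemon
# (RKE requires this; the docker group membership set up above grants socket access)
ssh docker@10.5.0.112 docker ps
```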
2. Add the new node to the RKE cluster configuration
rke --debug up --update-only --config rancher_v2.yaml
RKE config file contents:
ssh_key_path: ~/.ssh/id_rsa
nodes:
- address: 10.159.1.247
  internal_address: 10.4.0.37
  user: docker
  role: [controlplane, etcd]
- address: 10.159.1.67
  internal_address: 10.4.0.24
  user: docker
  role: [controlplane, etcd]
- address: 10.159.1.242
  internal_address: 10.4.0.38
  user: docker
  role: [controlplane, etcd]
- address: 10.4.0.63
  internal_address: 10.4.0.63
  user: docker
  role: [worker]
- address: 10.4.0.18
  internal_address: 10.4.0.18
  user: docker
  role: [worker]
- address: 10.4.0.43
  internal_address: 10.4.0.43
  user: docker
  role: [worker]
- address: 10.4.0.80
  internal_address: 10.4.0.80
  user: docker
  role: [worker]
- address: 10.4.0.26
  internal_address: 10.4.0.26
  user: docker
  role: [worker]
- address: 10.4.0.111
  internal_address: 10.4.0.111
  user: docker
  role: [worker]
- address: 10.5.0.112 # new node
  internal_address: 10.5.0.112
  user: docker
  role: [worker]
services:
  etcd:
    snapshot: true
    creation: 6h
    retention: 24
    backup_config:
      enabled: true
      interval_hours: 12
      retention: 6
  kube-api:
    service_node_port_range: 20000-60000
#  kubelet:
#    extra_binds:
#      - "/data:/data:rshared"
# Disable RKE's default nginx-ingress; I prefer traefik-ingress
ingress:
  provider: none
#  options:
#    use-forwarded-headers: 'true'
network:
  mtu: 1450
  plugin: canal
  options:
    flannel_backend_type: "vxlan"
# This address is an SLB used for API Server high availability
authentication:
  sans:
  - "10.159.1.163"
kubernetes_version: "v1.17.6-rancher2-1"
3. Update the RKE cluster and wait for the new node to reach the Ready state
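A quick sketch of how to watch for the node becoming Ready from the control node (the node name to query is whatever kubectl get nodes reports for 10.5.0.112):

```shell
# Watch all nodes until the new one flips to Ready (Ctrl-C to stop)
kubectl get nodes -w

# Kubelet conditions on the new node explain any NotReady state
kubectl describe node 10.5.0.112 | grep -A8 'Conditions:'
```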
4. Install nvidia-gpu-plugin; this service reports each GPU node's GPUs to the K8S cluster so the scheduler can make use of them
kubectl apply -f nvidia-gpu-plugin.yaml
nvidia-gpu-plugin.yaml contents:
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nvidia-device-plugin-daemonset
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: nvidia-device-plugin-ds
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      # This annotation is deprecated. Kept here for backward compatibility
      # See https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
      labels:
        name: nvidia-device-plugin-ds
    spec:
      tolerations:
      # This toleration is deprecated. Kept here for backward compatibility
      # See https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/
      - key: CriticalAddonsOnly
        operator: Exists
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule
      # Mark this pod as a critical add-on; when enabled, the critical add-on
      # scheduler reserves resources for critical add-on pods so that they can
      # be rescheduled after a failure.
      # See https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/
      priorityClassName: "system-node-critical"
      nodeSelector:
        nvidia.com/gpu.present: 'true'
      containers:
      - image: nvcr.io/nvidia/k8s-device-plugin:v0.9.0
        name: nvidia-device-plugin-ctr
        args: ["--fail-on-init-error=false"]
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
        volumeMounts:
        - name: device-plugin
          mountPath: /var/lib/kubelet/device-plugins
      volumes:
      - name: device-plugin
        hostPath:
          path: /var/lib/kubelet/device-plugins
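Note that this DaemonSet schedules only onto nodes carrying the nvidia.com/gpu.present label. If you are not running gpu-feature-discovery (which sets this label automatically), label the new node by hand, roughly like this (substitute the node name shown by kubectl get nodes for 10.5.0.112):

```shell
# Label the GPU node so the device plugin DaemonSet schedules onto it
kubectl label node 10.5.0.112 nvidia.com/gpu.present=true

# Confirm the label is present
kubectl get node 10.5.0.112 --show-labels | grep gpu.present
```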
5. Once the nvidia-gpu-plugin container is up, check its log to confirm the GPU was detected correctly; healthy output looks like this:
[root@rke-controller ]# kubectl get po -n kube-system |grep nvidia
nvidia-device-plugin-daemonset-7hrk5 1/1 Running 0 28m
[root@rke-controller ]# kubectl logs nvidia-device-plugin-daemonset-7hrk5 -n kube-system
2021/05/19 08:15:49 Loading NVML
2021/05/19 08:15:49 Starting FS watcher.
2021/05/19 08:15:49 Starting OS watcher.
2021/05/19 08:15:49 Retreiving plugins.
2021/05/19 08:15:49 Starting GRPC server for 'nvidia.com/gpu'
2021/05/19 08:15:49 Starting to serve 'nvidia.com/gpu' on /var/lib/kubelet/device-plugins/nvidia-gpu.sock
2021/05/19 08:15:49 Registered device plugin for 'nvidia.com/gpu' with Kubelet
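Once the plugin has registered with the kubelet, the node should advertise the GPU as a schedulable resource. A sketch of how to confirm this (again substituting the actual node name for 10.5.0.112):

```shell
# The node's capacity/allocatable should now include nvidia.com/gpu: 1
kubectl describe node 10.5.0.112 | grep -B2 -A2 'nvidia.com/gpu'

# Or query the allocatable count directly
kubectl get node 10.5.0.112 -o jsonpath='{.status.allocatable.nvidia\.com/gpu}'
```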
6. Create a pod to verify that a GPU container runs in K8S
kubectl apply -f gpu_test.yaml
gpu_test.yaml contents:
apiVersion: v1
kind: Pod
metadata:
  name: dcgmproftester
spec:
  restartPolicy: OnFailure
  containers:
  - name: dcgmproftester11
    image: nvidia/samples:dcgmproftester-2.0.10-cuda11.0-ubuntu18.04
    args: ["--no-dcgm-validation", "-t 1004", "-d 120"]
    resources:
      limits:
        nvidia.com/gpu: 1
    securityContext:
      capabilities:
        add: ["SYS_ADMIN"]
7. The Pod runs successfully and prints the expected output
[root@rke-controller]# kubectl get po
NAME READY STATUS RESTARTS AGE
dcgmproftester 1/1 Running 0 73s
details-v1-5974b67c8-rgbgb 2/2 Running 0 132d
[root@rke-controller]# kubectl logs dcgmproftester -f
Skipping CreateDcgmGroups() since DCGM validation is disabled
CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_MULTIPROCESSOR: 1024
CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT: 40
CU_DEVICE_ATTRIBUTE_MAX_SHARED_MEMORY_PER_MULTIPROCESSOR: 65536
CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR: 7
CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR: 5
CU_DEVICE_ATTRIBUTE_GLOBAL_MEMORY_BUS_WIDTH: 256
CU_DEVICE_ATTRIBUTE_MEMORY_CLOCK_RATE: 5001000
Max Memory bandwidth: 320064000000 bytes (320.06 GiB)
CudaInit completed successfully.
Skipping WatchFields() since DCGM validation is disabled
TensorEngineActive: generated ???, dcgm 0.000 (27804.3 gflops)
TensorEngineActive: generated ???, dcgm 0.000 (28499.7 gflops)
TensorEngineActive: generated ???, dcgm 0.000 (28529.5 gflops)
TensorEngineActive: generated ???, dcgm 0.000 (28576.3 gflops)
TensorEngineActive: generated ???, dcgm 0.000 (28385.3 gflops)
TensorEngineActive: generated ???, dcgm 0.000 (28379.4 gflops)
TensorEngineActive: generated ???, dcgm 0.000 (28755.1 gflops)
TensorEngineActive: generated ???, dcgm 0.000 (29019.5 gflops)
TensorEngineActive: generated ???, dcgm 0.000 (28880.3 gflops)
TensorEngineActive: generated ???, dcgm 0.000 (28932.4 gflops)
TensorEngineActive: generated ???, dcgm 0.000 (28704.2 gflops)
TensorEngineActive: generated ???, dcgm 0.000 (28844.0 gflops)