Killing Zombie Processes on a GPU

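1. Problem description

No process is listed as running on the GPUs, yet GPU memory stays occupied. GPU 2 has a zombie process holding 6679 MiB of GPU memory.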

[root@node-01 ~]# nvidia-smi 
Wed Apr 12 16:41:08 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:5E:00.0 Off |                    0 |
| N/A   50C    P0    26W /  70W |   3901MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            Off  | 00000000:5F:00.0 Off |                    0 |
| N/A   38C    P8    14W /  70W |      4MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla T4            Off  | 00000000:86:00.0 Off |                    0 |
| N/A   68C    P0    45W /  70W |   6679MiB / 15360MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla T4            Off  | 00000000:D8:00.0 Off |                    0 |
| N/A   41C    P8    15W /  70W |     60MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

2. Solution

2.1 Install the psmisc package, which provides fuser

[root@node-01 ~]# yum install -y psmisc

2.2 Find the zombie process

[root@node-01 ~]# fuser -v /dev/nvidia*
                     USER        PID ACCESS COMMAND
/dev/nvidia0:        spark     72859 F...m python
                     root      188098 F.... kubelet
                     root      214247 F...m python
                     root      217073 F.... nvidia-device-p
/dev/nvidia1:        root      188098 F.... kubelet
                     root      214247 F...m python
                     root      217073 F.... nvidia-device-p
/dev/nvidia2:        root      188098 F.... kubelet
                     root      214247 F...m python
                     root      217073 F.... nvidia-device-p
/dev/nvidia3:        root      188098 F.... kubelet
                     root      214247 F...m python
                     root      217073 F.... nvidia-device-p
/dev/nvidiactl:      spark     72859 F...m python
                     root      188098 F.... kubelet
                     root      214247 F...m python
                     root      217073 F.... nvidia-device-p
/dev/nvidia-uvm:     spark     72859 F...m python
                     root      214247 F...m python

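Apart from kubelet and nvidia-device-p, the only process holding /dev/nvidia2 is python with PID 214247: this is the zombie process on GPU 2.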
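
Before killing anything, it can help to confirm what the PID actually is. The following is a minimal sketch, assuming a Linux host with /proc mounted; `inspect_pid` is a hypothetical helper name, not part of psmisc or the NVIDIA tools. It prints the process name, state, parent PID, owner, and command line.

```shell
# Hypothetical helper: show basic facts about a PID using /proc (Linux only).
inspect_pid() {
    pid="$1"
    # A missing /proc/<pid> directory means the process no longer exists.
    [ -d "/proc/$pid" ] || { echo "no such process: $pid" >&2; return 1; }
    # Name, State, PPid and Uid come from the kernel's per-process status file.
    grep -E '^(Name|State|PPid|Uid):' "/proc/$pid/status"
    # The command line is NUL-separated; turn the separators into spaces.
    printf 'Cmdline: '
    tr '\0' ' ' < "/proc/$pid/cmdline"
    echo
}

# Example: inspect_pid 214247
```

If the State line reads `Z (zombie)`, the process has already exited and is only waiting to be reaped by its parent (the PPid line). Any other state means the process is still alive and holding the device.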

2.3 Kill the zombie process

[root@node-01 ~]# kill -9 214247
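
The kill above ends the process abruptly, with no chance to clean up. A gentler variant, sketched here under the assumption that the process may respond to SIGTERM, sends SIGTERM first and falls back to SIGKILL only after a grace period. `terminate_pid` and its default of 10 seconds are illustrative choices, not from the original post.

```shell
# Hypothetical helper: try SIGTERM first, then SIGKILL after a grace period.
# Usage: terminate_pid <pid> [grace_seconds]
terminate_pid() {
    pid="$1"
    grace="${2:-10}"
    # Ask the process to exit; fail if it cannot be signalled at all.
    kill -TERM "$pid" 2>/dev/null || { echo "cannot signal $pid" >&2; return 1; }
    i=0
    while [ "$i" -lt "$grace" ]; do
        # kill -0 sends no signal; it only checks that the PID still exists.
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        i=$((i + 1))
    done
    # Still present after the grace period: force it.
    kill -KILL "$pid" 2>/dev/null || true
}

# Example: terminate_pid 214247 10
```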

2.4 Verify GPU memory

[root@node-01 ~]# nvidia-smi 
Wed Apr 12 16:50:17 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:5E:00.0 Off |                    0 |
| N/A   50C    P0    26W /  70W |   3901MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            Off  | 00000000:5F:00.0 Off |                    0 |
| N/A   38C    P8    15W /  70W |      2MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla T4            Off  | 00000000:86:00.0 Off |                    0 |
| N/A   59C    P8    17W /  70W |      2MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla T4            Off  | 00000000:D8:00.0 Off |                    0 |
| N/A   41C    P8    15W /  70W |      2MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
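
The same check can be scripted rather than read off the table. This is a rough sketch, assuming `nvidia-smi` is queried with `--query-gpu=index,memory.used --format=csv,noheader,nounits`, which prints one `index, MiB` line per GPU. `busy_gpus` is a hypothetical helper, and the 100 MiB threshold is an arbitrary illustration.

```shell
# Hypothetical helper: read "index, memory.used" CSV lines on stdin and print
# the index of every GPU whose used memory (MiB) exceeds the threshold.
busy_gpus() {
    threshold="${1:-100}"
    awk -F', *' -v t="$threshold" '$2 + 0 > t { print $1 }'
}

# On a machine with the NVIDIA driver installed:
# nvidia-smi --query-gpu=index,memory.used --format=csv,noheader,nounits | busy_gpus 100
```

Fed the memory figures from the output above, this would report only GPU 0, whose 3901 MiB belongs to the python process owned by spark.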

References:
http://www.taodudu.cc/news/show-4248859.html
