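In k8s, one container keeps holding a GPU and never releases it, even though the processes that were using the GPU have already been killed.
dmesg reports "Unable to allocate memory on node -1". A workaround that treats the symptom but not the cause is to restart the affected container.
Searching for a permanent fix shows that the problem lies in the current system kernel, version 4.4.0-xxxx, and that this is what causes the failure under k8s.
The fix is to upgrade the Ubuntu kernel. Do not manually download and install some arbitrary higher-version deb for this; upgrade through the relevant package-manager commands instead.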
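Before upgrading, it helps to confirm which nodes actually run the kernel in question. The following is a minimal sketch, not a definitive check: the helper names `parse_kernel_release` and `is_affected` are hypothetical, the sample release strings in the test are illustrative, and it assumes every 4.4.0-* build should be treated as affected, since the note only identifies the version as 4.4.0-xxxx.

```python
import platform
import re


def parse_kernel_release(release):
    """Split a release string such as '4.4.0-142-generic' into (4, 4, 0)."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return tuple(int(part) for part in m.groups())


def is_affected(release):
    """Return True for any 4.4.0-* kernel.

    Assumption: the note only says '4.4.0-xxxx', so all 4.4.0 builds
    are flagged here; narrow this if you know the exact fixed build.
    """
    return parse_kernel_release(release) == (4, 4, 0)


if __name__ == "__main__":
    # platform.release() returns the same string as `uname -r`.
    current = platform.release()
    print(current, "affected" if is_affected(current) else "not flagged")
```

Run it on each node (or feed it the output of `uname -r` collected from the nodes) to decide which machines need the kernel upgrade.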
anon:0KB active_anon:11652KB inactive_file:516KB active_file:180KB unevictable:0KB
[233318.319275] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
[233318.319513] [45521] 0 45521 255 1 5 2 0 -998 pause
[233318.319520] [46984] 0 46984 359100 2503 82 6 0 -998 flanneld
[233318.319527] [ 6504] 0 6504 2550 191 10 3 0 -998 iptables
[233318.319569] [ 6631] 0 6631 302 6 4 3 0 -998 iptables
[233318.319583] Memory cgroup out of memory: Kill process 45521 (pause) score 0 or sacrifice child
[233318.321901] Killed process 45521 (pause) total-vm:1020kB, anon-rss:4kB, file-rss:0kB
[233335.103348] NVRM: RmInitAdapter failed! (0x26:0x65:1106)
[233335.103397] NVRM: rm_init_adapter failed for device bearing minor number 5
[233395.716627] SLUB: Unable to allocate memory on node -1 (gfp=0x2088020)
[233395.716635] cache: mnt_cache(14988:e196f5f19fcea94079334d52d6fbb730dc94693de78a9902be307037e5eb5a0c), object size: 384, buffer size: 384, default order: 2, min order: 0
[233395.716639] node 0: slabs: 18, objs: 756, free: 0
[233395.716641] node 1: slabs: 8, objs: 336, free: 0
[233429.318915] NVRM: RmInitAdapter failed! (0x26:0x65:1106)
[233429.318990] NVRM: rm_init_adapter failed for device bearing minor number 5
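To spot the same pattern on other nodes without reading the whole log, the dmesg output can be scanned for the two signatures seen above: SLUB allocation failures (with the cache they hit) and NVRM errors. This is a rough sketch under the assumption that the log lines look like the ones in this note; `summarize` and the regex names are hypothetical helpers, and `SAMPLE` simply reuses three lines from the output above.

```python
import re
from collections import Counter

# Kernel timestamp prefix, e.g. "[233395.716627] "
TS = r"\[\s*(?P<ts>\d+\.\d+)\]\s+"
SLUB_RE = re.compile(TS + r"SLUB: Unable to allocate memory on node (?P<node>-?\d+)")
CACHE_RE = re.compile(TS + r"cache: (?P<cache>[^,(]+)")
NVRM_RE = re.compile(TS + r"NVRM: (?P<msg>.*)")


def summarize(lines):
    """Count SLUB allocation failures per cache and collect NVRM errors."""
    slub_failures = Counter()
    nvrm_errors = []
    pending = False  # True after a SLUB failure line, until its cache line
    for line in lines:
        if SLUB_RE.match(line):
            pending = True
            continue
        m = CACHE_RE.match(line)
        if m and pending:
            slub_failures[m.group("cache")] += 1
            pending = False
            continue
        m = NVRM_RE.match(line)
        if m:
            nvrm_errors.append((float(m.group("ts")), m.group("msg")))
    return slub_failures, nvrm_errors


# Three lines copied from the dmesg output in this note.
SAMPLE = [
    "[233335.103348] NVRM: RmInitAdapter failed! (0x26:0x65:1106)",
    "[233395.716627] SLUB: Unable to allocate memory on node -1 (gfp=0x2088020)",
    "[233395.716635] cache: mnt_cache(14988:e196f5f19fcea94079334d52d6fbb730dc94693de78a9902be307037e5eb5a0c), object size: 384, buffer size: 384, default order: 2, min order: 0",
]

failures, nvrm = summarize(SAMPLE)
print(dict(failures), len(nvrm))
```

For a live node, the same function can be fed the lines of `dmesg` output, for example by saving `dmesg > dmesg.txt` and passing `open("dmesg.txt")` to `summarize`.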