How an Administrator Can Fix GPU Memory Hogging on a Deep Learning Server

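Copyright notice: this is an original article by the author, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.
Article link: https://blog.csdn.net/qq965194745/article/details/87940213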

Problem

Solution

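Use a standard Linux pipeline
nvidia-smi | grep python
to get the PIDs of the Python processes that are using the GPU.

Then run
ps -lA | grep pid
(with pid replaced by the actual process ID) to find the state of that process.

If the state is S (sleeping) and it has stayed that way longer than the tolerated time, kill the process.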
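Both lookups come down to picking fields out of whitespace-separated rows. The sketch below is a minimal illustration, assuming the column layout seen in this post's sample output: the PID is the third field of an nvidia-smi process row, and the state and PID are the second and fourth fields of a ps -lA row. Other driver or ps versions may arrange the columns differently, and the function names are hypothetical.

```python
def gpu_pid(nvidia_smi_row):
    """Return the PID from one process row of the nvidia-smi table.

    Assumes the layout '| GPU PID Type Name Memory |'.
    """
    return int(nvidia_smi_row.split()[2])


def ps_state(ps_row):
    """Return (state, pid) from one row of `ps -lA` output.

    Assumes the columns start with F, S, UID, PID.
    """
    fields = ps_row.split()
    return fields[1], int(fields[3])


def is_sleeping(ps_row, pid):
    """True when the row belongs to exactly `pid` and its state is S."""
    state, row_pid = ps_state(ps_row)
    return row_pid == pid and state == "S"


# Sample rows copied from the output shown in this post
smi_row = "|    0     27523      C   python                                       399MiB |\n"
ps_row = "0 S  1002 27523 28715  0  80   0 - 4709685 ep_pol ?      00:02:31 python\n"
pid = gpu_pid(smi_row)
print(pid, is_sleeping(ps_row, pid))
```

Comparing the PID column exactly matters because grep matches substrings, so a search for one PID can also return rows for longer PIDs that contain it.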

Testing how Python runs system commands on CentOS 7

import os
gpu_status = os.popen("nvidia-smi | grep python").readlines()
gpu_status
['|    0     27523      C   python                                       399MiB |\n',
 '|    0     31299      C   python                                     10371MiB |\n']
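for status in gpu_status:
    pid = status.split()[2]  # third field of the nvidia-smi row is the PID
    processes = os.popen("ps -lA | grep {}".format(pid)).readlines()
    print("ps lA->",processes)
    for p in processes:
        p = p.split()
        print(p[3],pid)
        # grep also matches substrings, so confirm the PID column (index 3) exactly
        if p[3] == pid:
            print("find->",p)
            if p[1] == 'S':  # column 1 is the process state; S means sleeping
                # Killing needs administrator rights; a status code of 0 means success
                print("kill",os.system("kill -9 {}".format(pid)))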
ps lA-> ['0 S  1002 27523 28715  0  80   0 - 4709685 ep_pol ?      00:02:31 python\n']
27523 27523
find-> ['0', 'S', '1002', '27523', '28715', '0', '80', '0', '-', '4709685', 'ep_pol', '?', '00:02:31', 'python']
kill 256
ps lA-> ['0 S  1002 31299 28715  0  80   0 - 7749808 ep_pol ?      00:07:22 python\n']
31299 31299
find-> ['0', 'S', '1002', '31299', '28715', '0', '80', '0', '-', '7749808', 'ep_pol', '?', '00:07:22', 'python']
kill 256
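A note on the kill 256 lines: on Unix, os.system returns the raw wait status rather than the exit code, so 256 corresponds to exit code 1, meaning the kill command failed. That fits a test run without administrator rights, although the exact cause is an assumption here. A small sketch of the decoding, with a hypothetical helper name:

```python
def decode_wait_status(status):
    """Split a Unix wait status, as returned by os.system, into (exit_code, signal)."""
    exit_code = (status >> 8) & 0xFF   # high byte: exit code of the command
    signal_number = status & 0x7F      # low 7 bits: signal that ended it, 0 if none
    return exit_code, signal_number


print(decode_wait_status(256))
print(decode_wait_status(0))
```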

Testing the Python 3 version

In practice, the administrator account only ships with Python 2, so this code still has to be ported to Python 2.

import os
import time
record = [0]*100000 # sleep counter indexed by PID; the default maximum PID is 32767
while True:
    time.sleep(1)
    gpu_status = os.popen("nvidia-smi | grep python").readlines()
    for status in gpu_status:
        pid = status.split()[2]
        processes = os.popen("ps -lA | grep {}".format(pid)).readlines()
        for p in processes:
            p = p.split()
            if p[3] == pid:
                if p[1] == 'S':
                    record[int(pid)] += 1
                    if record[int(pid)] >= 60:
                        # Kill a process that has been asleep for 60 consecutive checks (about 60 seconds)
                        os.system("kill -9 {}".format(pid))
                else:
                    # The process is active again, so reset its counter
                    record[int(pid)] = 0
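The counting rule in this loop can be isolated from the shell calls and checked on its own. The sketch below is an illustrative variant, not the code used in this post: it keeps the counters in a dict instead of a fixed-size list, so it makes no assumption about the largest possible PID. The class name and the small threshold are hypothetical.

```python
class SleepTracker:
    """Counts consecutive sleeping observations per PID (hypothetical helper)."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.counts = {}  # pid -> number of consecutive 'S' observations

    def observe(self, pid, state):
        """Record one observation; return True when the PID should be killed."""
        if state != "S":
            self.counts.pop(pid, None)  # active again: forget the penalty
            return False
        self.counts[pid] = self.counts.get(pid, 0) + 1
        return self.counts[pid] >= self.threshold


tracker = SleepTracker(threshold=3)
decisions = [tracker.observe(27523, s) for s in ["S", "S", "R", "S", "S", "S"]]
print(decisions)
```

A single non-sleeping observation resets the count, which mirrors the else branch of the loop.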

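Code actually deployed on the server

This is an infinite loop written in Python 2. Run it under the Linux screen command, or with nohup and a trailing &, to keep it going in the background.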

# -*- coding:utf-8 -*-
import os
import time
record = [0]*100000 # sleep counter indexed by PID
while True:
    time.sleep(1)
    gpu_status = os.popen("nvidia-smi | grep python").readlines()
    for status in gpu_status:
        pid = status.split()[2]
        processes = os.popen("ps -lA | grep {}".format(pid)).readlines()
        for p in processes:
            p = p.split()
            # The PID column index (3) depends on the layout of the ps -lA output
            if p[3] == pid:
                if p[1] == 'S':
                    record[int(pid)] += 1
                    print pid,"has been asleep for",record[int(pid)],"checks"
                    if record[int(pid)] >= 300:
                        # Kill a process that has been asleep for 300 consecutive checks (about 300 seconds)
                        print "kill->",pid,os.system("kill -9 {}".format(pid))
                else:
                    # The process is doing real work, so clear its penalty
                    print pid,"is running on the GPU"
                    record[int(pid)] = 0
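As an alternative to piping ps through grep once per second, the state of a process can be read from /proc/<pid>/stat on Linux, where the state letter is the first field after the command name in parentheses. The following is a sketch with hypothetical function names, not the script deployed in this post, and the sample stat line is made up for illustration. Because the command name may itself contain spaces or parentheses, the parser splits after the last closing parenthesis.

```python
def parse_stat_state(stat_text):
    """Return the state letter from the contents of /proc/<pid>/stat."""
    after_comm = stat_text[stat_text.rindex(")") + 1:]
    return after_comm.split()[0]


def read_state(pid):
    """Read the current state of `pid`; return None if the process is gone."""
    try:
        with open("/proc/{}/stat".format(pid)) as f:
            return parse_stat_state(f.read())
    except IOError:  # in Python 3 this is OSError, which covers FileNotFoundError
        return None


# Illustrative stat line (assumed values, truncated to the leading fields)
sample = "27523 (python) S 28715 27523 28715 0 -1 4194304 100 0 0 0"
print(parse_stat_state(sample))
```

Returning None for a missing process lets the caller drop its counter for that PID rather than crash when a job exits between two checks.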

Test results

The test shows that a process which stays asleep for more than 300 consecutive seconds gets killed.

Useful references

https://blog.csdn.net/kwsy2008/article/details/50906935
https://www.jb51.net/article/103092.htm
