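When running code on a cloud server, there is always the worry that the job finishes while the instance stays on and keeps billing, especially overnight after you have left the lab and gone home to sleep.
What is needed is a script that continuously monitors whether the code on the cloud server is still running and, if training exits partway through, shuts the server down automatically.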
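Before the implementation, the core idea can be sketched in a few lines: poll GPU utilization, count consecutive idle readings, and power off once the count passes a limit. This is only a minimal sketch under stated assumptions, not the author's script. The names `update_idle_count` and `watch`, the 5% threshold, the 10-reading limit, the 60-second interval, and the `shutdown -h now` command (which needs root on Linux) are all hypothetical choices made for illustration.

```python
import subprocess
import time


def read_gpu_utilization():
    """Return per-GPU utilization percentages, or None if nvidia-smi fails."""
    try:
        output = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=utilization.gpu",
             "--format=csv,noheader,nounits"]
        )
    except (subprocess.CalledProcessError, FileNotFoundError):
        return None
    return [int(x) for x in output.decode("utf-8").split()]


def update_idle_count(count, utilization, threshold=5):
    """Advance the consecutive-idle counter for one reading.

    A failed or empty reading leaves the counter unchanged; any GPU at or
    above the threshold resets it to zero.
    """
    if not utilization:
        return count
    if all(u < threshold for u in utilization):
        return count + 1
    return 0


def default_shutdown():
    # Assumption: a Linux host where this process may run `shutdown`.
    subprocess.run(["shutdown", "-h", "now"], check=False)


def watch(read=read_gpu_utilization, shutdown=default_shutdown,
          threshold=5, max_idle=10, interval=60, sleep=time.sleep):
    """Poll until `max_idle` consecutive idle readings, then shut down."""
    count = 0
    while True:
        count = update_idle_count(count, read(), threshold)
        if count >= max_idle:
            shutdown()
            return
        sleep(interval)
```

Calling `watch()` in a background session (for example under `nohup` or `tmux`) would start the monitor. Requiring several idle readings in a row, rather than one, avoids shutting down during short pauses such as checkpoint saves or data loading between epochs.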
The Python implementation is as follows:
import subprocess
import re
import time
import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
import os
import pycuda.driver as cuda
import pycuda.autoinit

# Counter for consecutive low-utilization readings
low_utilization_count = 0

def get_gpu_utilization():
    try:
        output = subprocess.check_output(['nvidia-smi', '--query-gpu=utilization.gpu', '--format=csv,noheader,nounits'])
        gpu_utilization = [int(x) for x in output.decode('utf-8').strip().split('\n')]
        return gpu_utilization
    except subprocess.CalledProcessError:
        return None

def get_gpu_info():
    num_devices = cuda.Device.count()
    gpu_info_list = []
    for i in range(num_devices):
        device = cuda.Device(i)
        gpu_info = {
            "Device ID": i,
            "Name": device.name(),
            "Total Memory": device.total_memo