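Viewing Job CPU and Memory Usage in Slurm
When you query a job in Slurm with squeue or scontrol show job, the output does not show the job's CPU and memory usage. Use sstat and sacct instead: sstat reports on running jobs and sacct on finished ones.
Configuration
To view CPU and memory usage, first set the following two options in /etc/slurm/slurm.conf: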
SelectType=select/cons_res
SelectTypeParameters=CR_CPU_Memory
In addition, Slurm's accounting storage (AccountingStorage) must be enabled. The example below uses file-based storage; for reference, see …
# ACCOUNTING
AccountingStorageEnforce=1
AccountingStorageLoc=/opt/slurm/acct
AccountingStorageType=accounting_storage/filetxt
JobCompLoc=/opt/slurm/jobcomp
JobCompType=jobcomp/filetxt
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/linux
Then restart the Slurm cluster so that the changes take effect.
Viewing RUNNING jobs with sstat
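Before restarting, it can help to confirm that the settings above are actually present in the file. The following is a minimal sketch, not part of the original article: check_slurm_conf is a hypothetical helper name, and the list of keys it checks is an assumption based only on the options shown in this section.

```shell
# Hypothetical helper: report which of the options used in this article
# are set in a slurm.conf-style file (path given as $1).
check_slurm_conf() {
    conf_file=$1
    for key in SelectType SelectTypeParameters AccountingStorageType JobAcctGatherType; do
        # Find the last uncommented "Key=Value" line for this key, if any.
        value=$(awk -F= -v want="$key" '
            NF > 1 {
                name = $1
                gsub(/[ \t]/, "", name)          # ignore spaces around the key
                if (tolower(name) == tolower(want)) {
                    line = $0
                    sub(/^[^=]*=/, "", line)     # drop "Key="
                    sub(/[ \t]*#.*$/, "", line)  # drop a trailing comment
                    value = line
                }
            }
            END { print value }
        ' "$conf_file")
        if [ -n "$value" ]; then
            echo "$key=$value"
        else
            echo "$key is NOT set"
        fi
    done
}

# Example: check_slurm_conf /etc/slurm/slurm.conf
```

This only inspects the file on disk. After the restart, scontrol show config prints the configuration the running controller is actually using.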
$ sstat -a --format="JobId,Pids,AveCPU,AveRSS,MaxRSS" 132
       JobID                 Pids     AveCPU     AveRSS     MaxRSS
------------ -------------------- ---------- ---------- ----------
132.0                 12078,12079  00:00.000       576K       576K
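Values such as 576K are convenient to read but awkward to compare or sum in a script. As a rough sketch, the function below converts such a value to bytes. mem_to_bytes is a hypothetical helper name, and the code assumes the K/M/G/T suffixes are powers of 1024.

```shell
# Hypothetical helper: convert a Slurm memory value such as "576K" or "1.5M"
# to a whole number of bytes. Assumes K/M/G/T are powers of 1024.
mem_to_bytes() {
    echo "$1" | awk '{
        n = $1 + 0                                  # numeric prefix, e.g. 576 from "576K"
        unit = toupper(substr($1, length($1), 1))   # last character, e.g. "K"
        if (unit == "K") n *= 1024
        else if (unit == "M") n *= 1024 ^ 2
        else if (unit == "G") n *= 1024 ^ 3
        else if (unit == "T") n *= 1024 ^ 4
        printf "%.0f\n", n
    }'
}

mem_to_bytes 576K

# Against a live job (requires Slurm; 132 is the job ID from the example above):
# sstat -a -n -P --format=JobID,MaxRSS 132 | while IFS='|' read -r step rss; do
#     echo "$step $(mem_to_bytes "$rss")"
# done
```

In the commented usage, -n suppresses the header and -P switches sstat to a pipe-delimited format, which is easier to split than the fixed-width columns.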
To list all the fields that sstat can report, run:
$ sstat -e
AveCPU AveCPUFreq AveDiskRead AveDiskWrite
AvePages AveRSS AveVMSize ConsumedEnergy
ConsumedEnergyRaw JobID MaxDiskRead MaxDiskReadNode
MaxDiskReadTask MaxDiskWrite MaxDiskWriteNode MaxDiskWriteTask
MaxPages MaxPagesNode MaxPagesTask MaxRSS
MaxRSSNode MaxRSSTask MaxVMSize MaxVMSizeNode
MaxVMSizeTask MinCPU MinCPUNode MinCPUTask
Nodelist NTasks Pids ReqCPUFreq
ReqCPUFreqMin ReqCPUFreqMax ReqCPUFreqGov
Viewing FINISHED jobs with sacct
$ sacct --format="JobId,Elapsed,CPUTime,CPUTimeRAW,AveCPU,TotalCPU,UserCPU,SystemCPU,AveRSS,MaxRSS" -j 131
       JobID    Elapsed    CPUTime CPUTimeRAW     AveCPU   TotalCPU    UserCPU  SystemCPU     AveRSS     MaxRSS
------------ ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
131            00:02:03   00:00:00          0   00:00:00  00:00.001  00:00.001   00:00:00       576K       576K
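The time columns mix several formats (MM:SS.mmm for TotalCPU here, HH:MM:SS for Elapsed), so comparing them needs a conversion to seconds first. The sketch below does that and then relates TotalCPU (user plus system CPU time) to CPUTimeRAW (elapsed time multiplied by the allocated CPU count, in seconds). Both function names are hypothetical, and the "CPU efficiency" ratio is an illustration rather than a field that sacct itself reports.

```shell
# Hypothetical helper: convert a Slurm time string to seconds.
# Handles MM:SS.mmm, HH:MM:SS and D-HH:MM:SS as they appear in sacct output.
slurm_time_to_seconds() {
    echo "$1" | awk '{
        days = 0; t = $1
        if (index(t, "-") > 0) {                  # optional "D-" day prefix
            days = substr(t, 1, index(t, "-") - 1)
            t = substr(t, index(t, "-") + 1)
        }
        n = split(t, part, ":")
        secs = 0
        for (i = 1; i <= n; i++) secs = secs * 60 + part[i]
        printf "%.3f\n", days * 86400 + secs
    }'
}

# Hypothetical helper: $1 = TotalCPU, $2 = CPUTimeRAW (seconds).
# Prints the share of allocated CPU time that was actually used.
cpu_efficiency() {
    used=$(slurm_time_to_seconds "$1")
    awk -v used="$used" -v alloc="$2" 'BEGIN {
        if (alloc + 0 == 0) print "n/a"           # avoid dividing by zero
        else printf "%.1f%%\n", 100 * used / alloc
    }'
}

slurm_time_to_seconds 00:02:03    # Elapsed of job 131
cpu_efficiency 00:00.001 0        # TotalCPU and CPUTimeRAW of job 131
```

For job 131 the ratio is undefined because CPUTimeRAW is 0, which is why the sketch prints n/a instead of dividing.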
To list all the fields that sacct can report, run:
$ sacct -e
AllocCPUS AllocGRES AllocNodes AllocTRES
Account AssocID AveCPU AveCPUFreq
AveDiskRead AveDiskWrite AvePages AveRSS
AveVMSize BlockID Cluster Comment
ConsumedEnergy ConsumedEnergyRaw CPUTime CPUTimeRAW
DerivedExitCode Elapsed Eligible End
ExitCode GID Group JobID
JobIDRaw JobName Layout MaxDiskRead
MaxDiskReadNode MaxDiskReadTask MaxDiskWrite MaxDiskWriteNode
MaxDiskWriteTask MaxPages MaxPagesNode MaxPagesTask
MaxRSS MaxRSSNode MaxRSSTask MaxVMSize
MaxVMSizeNode MaxVMSizeTask MinCPU MinCPUNode
MinCPUTask NCPUS NNodes NodeList
NTasks Priority Partition QOS
QOSRAW ReqCPUFreq ReqCPUFreqMin ReqCPUFreqMax
ReqCPUFreqGov ReqCPUS ReqGRES ReqMem
ReqNodes ReqTRES Reservation ReservationId
Reserved ResvCPU ResvCPURAW Start
State Submit Suspended SystemCPU
Timelimit TotalCPU UID User
UserCPU WCKey WCKeyID
When reposting, please credit this article with a link.
Article link: http://blog.csdn.net/kongxx/article/details/52556943