I am using Slurm to manage some computations, but sometimes jobs get killed with an out-of-memory error even though they do not actually use that much memory. This odd issue affects Python jobs that use multiprocessing in particular.
Here is a minimal example that reproduces the behavior:
#!/usr/bin/python
from time import sleep
nmem = int(3e7) # this will amount to ~1GB of numbers
nprocs = 200 # will create this many workers later
nsleep = 5 # sleep seconds
array = list(range(nmem)) # allocate some memory
print("done allocating memory")
sleep(nsleep)
print("continuing with multiple processes (" + str(nprocs) + ")")
from multiprocessing import Pool
def f(i):
    sleep(nsleep)
# this will create a pool of workers, each of which "seem" to use 1GB
# even though the individual processes don't actually allocate any memory
p = Pool(nprocs)
p.map(f, list(range(nprocs)))
print("finished successfully")
Even though this will probably run fine locally, the job gets killed by Slurm with an out-of-memory error when it is submitted to the cluster.
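For what it's worth, here is a small sketch I use to look at where the apparent memory goes. It assumes psutil is installed on the node (not needed for the example above): it spawns a few sleeping workers from a ~1GB parent and prints each child's RSS next to its USS, so the copy-on-write pages that get counted against every worker show up as the difference between the two numbers.

#!/usr/bin/python
# Sketch only: psutil is an extra assumption, not part of the original job.
import os
from time import sleep
from multiprocessing import Pool

import psutil

def f(i):
    sleep(30)  # keep the workers alive long enough to inspect them

if __name__ == "__main__":
    array = list(range(int(3e7)))   # ~1GB allocated in the parent, as above
    p = Pool(8)
    p.map_async(f, range(8))
    sleep(2)                        # give the workers time to start
    for child in psutil.Process(os.getpid()).children():
        m = child.memory_full_info()    # USS needs permission to read smaps
        print(child.pid,
              "rss=%.0f MB" % (m.rss / 2**20),
              "uss=%.0f MB" % (m.uss / 2**20))
    p.terminate()

On my machine each worker reports an RSS close to the parent's ~1GB while its USS stays tiny, which matches the comment in the example that the workers only "seem" to use the memory.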