Though the script does use quite a bit of memory, even with the "smaller" example values, the answer to
Does Python clone my entire environment each time the parallel
processes are run, including the variables not required by f()? How
can I prevent this behaviour?
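is that it does, in a sense, clone the environment by forking a new process, but if copy-on-write (COW) semantics are available, no physical memory actually has to be copied until it is written to. For example, on this system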
% uname -a
Linux mypc 4.2.0-27-generic #32-Ubuntu SMP Fri Jan 22 04:49:08 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
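COW seems to be available and in use, but that may not be the case on other systems. On Windows the situation is entirely different, because a new Python interpreter is executed from the .exe instead of forking. Since you mention using htop, you are on some flavour of UNIX or a UNIX-like system, and you get COW semantics.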
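Whether COW applies at all depends on how multiprocessing starts its children. A minimal sketch for checking this with the standard library only; the helper name describe_start_method is made up for illustration:

```python
import multiprocessing as mp


def describe_start_method():
    """Return the current start method and whether children are plain forks."""
    method = mp.get_start_method()  # 'fork', 'spawn' or 'forkserver'
    # Only plain 'fork' gives the child a copy-on-write view of the parent's
    # memory. 'spawn' starts a fresh interpreter and pickles the arguments;
    # 'forkserver' forks from a separate server process, not from the parent.
    return method, method == 'fork'


if __name__ == '__main__':
    method, forked = describe_start_method()
    print('start method:', method)
    print('children inherit parent memory via COW:', forked)
    print('start methods available here:', mp.get_all_start_methods())
```

If this reports anything other than 'fork', the memory behaviour described here should not be expected to carry over.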
For each iteration of the for loop the processes in simulation() each
have a memory usage equal to the total memory used by my code.
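The spawned processes will display almost identical RSS values, but this can be misleading: as long as no writes occur, they mostly occupy the same physical memory, mapped into multiple processes. With Pool.map the story is a bit more complicated, since it "chops the iterable into a number of chunks which it submits to the process pool as separate tasks". This submission happens over IPC, and the submitted data is copied. In your example, the IPC and the 2 ** 20 function calls also dominate the CPU usage. Replacing the mapping with a single vectorized multiplication in simulation took the script's runtime from around 150s to 0.66s on this machine.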
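To illustrate the last point: the question's f() is not reproduced here, so the sketch below assumes a hypothetical per-element function that multiplies its argument by a constant (FACTOR is made up as well). It only shows that mapping over the elements and one vectorized multiplication give the same result; it makes no timing claims.

```python
import numpy as np

FACTOR = 2.5  # hypothetical constant; stands in for whatever f() computes


def f(x):
    """Hypothetical per-element function, the kind handed to Pool.map."""
    return x * FACTOR


def simulate_mapped(values):
    # One Python-level call per element. Pool.map would additionally
    # pickle the chunks and the results to move them between processes.
    return np.array(list(map(f, values)))


def simulate_vectorized(values):
    # A single NumPy multiplication over the whole array.
    return values * FACTOR


if __name__ == '__main__':
    values = np.random.random(2 ** 10)
    assert np.allclose(simulate_mapped(values), simulate_vectorized(values))
    print('mapped and vectorized results agree')
```

When the per-element work is this cheap, the overhead of the function calls and of the IPC can easily exceed the work itself, which is why the vectorized form is worth trying first.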
We can observe COW with a (somewhat) simplified example that allocates a large array and passes it to a spawned process for read-only processing:
import numpy as np
from multiprocessing import Process, Condition, Event
from time import sleep
import psutil


def read_arr(arr, done, stop):
    # Read-only access: sum the array, then signal the parent
    with done:
        S = np.sum(arr)
        print(S)
        done.notify()
    # Stay alive so that memory usage can be inspected
    while not stop.is_set():
        sleep(1)


def main():
    # Create a large array
    print('Available before A (MiB):', psutil.virtual_memory().available / 1024 ** 2)
    input("Press Enter...")
    A = np.random.random(2**28)
    print('Available before Process (MiB):', psutil.virtual_memory().available / 1024 ** 2)
    input("Press Enter...")
    done = Condition()
    stop = Event()
    p = Process(target=read_arr, args=(A, done, stop))
    # Start the child and wait until it has finished reading the array
    with done:
        p.start()
        done.wait()
    print('Available with Process (MiB):', psutil.virtual_memory().available / 1024 ** 2)
    input("Press Enter...")
    stop.set()
    p.join()


if __name__ == '__main__':
    main()
Output on this machine:
% python3 test.py
Available before A (MiB): 7779.25
Press Enter...
Available before Process (MiB): 5726.125
Press Enter...
134221579.355
Available with Process (MiB): 5720.79296875
Press Enter...
Now, if we replace the function read_arr with a function that modifies the array:
def mutate_arr(arr, done, stop):
    with done:
        # Writing triggers copy-on-write for every page that is touched
        arr[::4096] = 1
        S = np.sum(arr)
        print(S)
        done.notify()
    while not stop.is_set():
        sleep(1)
the results are quite different:
Available before A (MiB): 7626.12109375
Press Enter...
Available before Process (MiB): 5571.82421875
Press Enter...
134247509.654
Available with Process (MiB): 3518.453125
Press Enter...
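These numbers line up with the size of the array. A small worked calculation, using only the values printed in the two runs above and relying on np.random.random returning float64 values (8 bytes per element):

```python
MIB = 1024 ** 2

# 2**28 float64 elements at 8 bytes each
array_mib = 2 ** 28 * 8 / MIB

# Drop in available memory when A is allocated (first run)
alloc_drop = 7779.25 - 5726.125

# Drop after starting the child that only reads the array
read_drop = 5726.125 - 5720.79296875

# Drop after starting the child that writes to the array
write_drop = 5571.82421875 - 3518.453125

print('array size (MiB):          ', array_mib)    # 2048.0
print('allocating A (MiB):        ', alloc_drop)   # 2053.125
print('read-only child (MiB):     ', read_drop)    # 5.33203125
print('mutating child (MiB):      ', write_drop)   # 2053.37109375
```

The read-only child costs only a few MiB of additional memory, while the mutating child costs about as much as allocating the array did in the first place.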
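The for loop does require more memory after each iteration, but that is to be expected: it stacks the total_results from the mapping, so it has to allocate space for a new array that holds both the old results and the new ones, and then free the now unused array of old results.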
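A minimal sketch of why the stacking needs extra memory, assuming the results are combined with something like np.hstack (the exact call in the question is not reproduced here; append_results, new_results and the sizes are made up for illustration):

```python
import numpy as np


def append_results(total_results, new_results):
    """Hypothetical helper: return the old and new results as one array."""
    # hstack allocates a fresh array and copies both inputs into it;
    # the old buffer is not extended in place.
    return np.hstack((total_results, new_results))


if __name__ == '__main__':
    total_results = np.zeros(4)
    new_results = np.ones(2)  # stands in for the output of one iteration

    stacked = append_results(total_results, new_results)
    print('shares memory with old array:', np.shares_memory(stacked, total_results))
    print('bytes old + new:', total_results.nbytes + new_results.nbytes)
    print('bytes stacked:  ', stacked.nbytes)

    # Until this rebinding, the old and the new array exist side by side
    total_results = stacked
```

While the stacked array is being built, both the old and the new buffers are alive, so peak usage during that step is roughly the sum of the two.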