Python loop memory keeps growing: why does my loop need more memory on each iteration?

Latest recommended article published on 2024-03-23 08:34:56

weixin_39761195



Although the script does use quite a lot of memory even with the "smaller" example values, the answer to

Does Python clone my entire environment each time the parallel processes are run, including the variables not required by f()? How can I prevent this behaviour?

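is that Python does, in a way, clone the environment by forking a new process, but where copy-on-write (COW) semantics are available, no physical memory is actually copied until it is written to. For example, on this system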

% uname -a
Linux mypc 4.2.0-27-generic #32-Ubuntu SMP Fri Jan 22 04:49:08 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

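COW appears to be available and in use, but that may not be the case on other systems. On Windows the situation is entirely different, because a new Python interpreter is executed from the .exe instead of forking. Since you mention using htop, you are on some flavour of UNIX or a UNIX-like system, and you get COW semantics.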
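Whether a child process is created by forking depends on the multiprocessing start method in use. The following is a minimal sketch for checking it with the standard library only; the helper name `start_method_info` is made up for illustration, and the default it reports varies by platform and Python version.

```python
import multiprocessing as mp


def start_method_info():
    """Return the start method in effect and the methods this platform supports."""
    # Note: get_start_method() fixes the default method if none was chosen yet.
    current = mp.get_start_method()
    available = mp.get_all_start_methods()
    return current, available


if __name__ == '__main__':
    current, available = start_method_info()
    print('Start method in use:', current)
    print('Available start methods:', available)
    # Only 'fork' lets the child share the parent's pages copy-on-write.
    # With 'spawn' and 'forkserver' the child does not inherit the parent's
    # memory, so Process arguments are pickled and copied instead.
    print('Children are forked from this process:', current == 'fork')
```

If this reports something other than 'fork', an array passed to a child process is copied up front rather than shared copy-on-write.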

For each iteration of the for loop the processes in simulation() each have a memory usage equal to the total memory used by my code.

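The spawned processes will display almost identical RSS values, but this can be misleading: as long as no writes occur, they mostly occupy the same physical memory, mapped into multiple processes. With Pool.map the story is a bit more complicated, because it "chops the iterable into a number of chunks which it submits to the process pool as separate tasks". This submission happens over IPC, and the submitted data is copied. In your example the IPC and the 2 ** 20 function calls also dominate CPU usage. Replacing the mapping in simulation with a single vectorized multiplication took the script's runtime on this machine from around 150s to 0.66s.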
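To illustrate the kind of replacement meant here, the sketch below contrasts a per-element mapping with a single vectorized multiplication. The question's actual f() and simulation() are not shown in this post, so `f`, the factor 2.0, and both helper functions are hypothetical stand-ins, and a plain `map` is used in place of Pool.map to keep the example small.

```python
import numpy as np


def f(x):
    # Hypothetical per-element function; the question's real f() is not shown here.
    return x * 2.0


def simulate_mapped(values):
    # One Python-level function call per element, as a map-style approach makes.
    return np.array(list(map(f, values)))


def simulate_vectorized(values):
    # A single NumPy multiplication over the whole array.
    return values * 2.0


if __name__ == '__main__':
    values = np.random.random(2 ** 16)
    assert np.allclose(simulate_mapped(values), simulate_vectorized(values))
    print('Both approaches give the same result')
```

The vectorized form avoids both the per-element call overhead and, in the Pool.map case, the IPC needed to ship chunks to worker processes.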

We can observe COW with a (somewhat) simplified example that allocates a large array and passes it to a spawned process for read-only processing:

import numpy as np
from multiprocessing import Process, Condition, Event
from time import sleep
import psutil

def read_arr(arr, done, stop):
    # Only read the array, then tell the parent we are done
    with done:
        S = np.sum(arr)
        print(S)
        done.notify()
    # Stay alive so the parent can measure memory while this process exists
    while not stop.is_set():
        sleep(1)

def main():
    # Create a large array
    print('Available before A (MiB):', psutil.virtual_memory().available / 1024 ** 2)
    input("Press Enter...")
    A = np.random.random(2**28)
    print('Available before Process (MiB):', psutil.virtual_memory().available / 1024 ** 2)
    input("Press Enter...")
    done = Condition()
    stop = Event()
    # With the fork start method the child shares A's pages copy-on-write
    p = Process(target=read_arr, args=(A, done, stop))
    with done:
        p.start()
        # Wait until the child has finished processing the array
        done.wait()
    print('Available with Process (MiB):', psutil.virtual_memory().available / 1024 ** 2)
    input("Press Enter...")
    stop.set()
    p.join()

if __name__ == '__main__':
    main()

Output on this machine:

% python3 test.py
Available before A (MiB): 7779.25
Press Enter...
Available before Process (MiB): 5726.125
Press Enter...
134221579.355
Available with Process (MiB): 5720.79296875
Press Enter...

Now, if we replace the function read_arr with one that modifies the array:

def mutate_arr(arr, done, stop):
    with done:
        # Write to every 4096th element, so pages across the whole array are modified
        arr[::4096] = 1
        S = np.sum(arr)
        print(S)
        done.notify()
    while not stop.is_set():
        sleep(1)

the results are quite different:

Available before A (MiB): 7626.12109375
Press Enter...
Available before Process (MiB): 5571.82421875
Press Enter...
134247509.654
Available with Process (MiB): 3518.453125
Press Enter...
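The figures in the two runs can be checked against the size of the array. The sketch below only redoes the arithmetic on the numbers printed above and assumes the array holds 8-byte float64 values, which is what np.random.random returns.

```python
# Size of the array created by np.random.random(2**28): 8 bytes per float64 element.
n_elements = 2 ** 28
array_mib = n_elements * 8 / 1024 ** 2

# Drops in available memory, taken from the two runs printed above.
drop_creating_array = 7779.25 - 5726.125
drop_read_only_child = 5726.125 - 5720.79296875
drop_mutating_child = 5571.82421875 - 3518.453125

print('Array size (MiB):', array_mib)
print('Drop when A is created (MiB):', drop_creating_array)
print('Drop with read-only child (MiB):', drop_read_only_child)
print('Drop with mutating child (MiB):', drop_mutating_child)
```

The read-only child costs only a few MiB, while the mutating child costs roughly as much as the whole array did when it was first created.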

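The for loop does require more memory after each iteration, but that is to be expected: it stacks the total_results from the mapping, so it has to allocate space for a new array that holds both the old and the new results, and then free the now unused array of old results.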
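The growth described here comes from rebuilding the result array on every pass. A minimal sketch of the difference between stacking and preallocating follows; `total_results` is the name used in the text, while the shapes, the iteration count, and the `run_iteration` helper are hypothetical stand-ins for the question's code, which is not shown in this post.

```python
import numpy as np


def run_iteration(i, width):
    # Hypothetical stand-in for the results of one loop iteration.
    return np.full(width, float(i))


def collect_by_stacking(n_iterations, width):
    total_results = np.empty((0, width))
    for i in range(n_iterations):
        # vstack allocates a new, larger array and copies the old rows every time.
        total_results = np.vstack((total_results, run_iteration(i, width)))
    return total_results


def collect_preallocated(n_iterations, width):
    # Allocate the final array once and fill one row per iteration.
    total_results = np.empty((n_iterations, width))
    for i in range(n_iterations):
        total_results[i] = run_iteration(i, width)
    return total_results


if __name__ == '__main__':
    stacked = collect_by_stacking(5, 4)
    filled = collect_preallocated(5, 4)
    print('Same contents:', np.array_equal(stacked, filled))
```

Preallocating only works when the number of iterations is known in advance; otherwise, collecting the per-iteration arrays in a list and stacking once at the end is a common alternative.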
