When processing large numpy arrays, Python randomly drops to 0% CPU usage, causing the code to "hang"?

The poster hit a problem while running some code: after loading a large 1D numpy array from a binary file and modifying it with numpy.where(), the run would sometimes stall for a long time, with CPU usage suddenly dropping to 0%, and the issue appeared to occur at random. Investigation showed that the delay was caused not by Python or numpy, but by network I/O becoming the bottleneck when reading data from a shared disk. The fix is to optimize how the large array is read into memory, reducing the dependence on network I/O.


I have been running some code, a part of which loads in a large 1D numpy array from a binary file, and then alters the array using the numpy.where() method.

Here is an example of the operations performed in the code:

import numpy as np

num = 2048
threshold = 0.5

# file: path to the 32 GB binary data file
with open(file, 'rb') as f:
    arr = np.fromfile(f, dtype=np.float32, count=num**3)

arr *= threshold
arr = np.where(arr >= 1.0, 1.0, arr)
vol_avg = np.sum(arr) / (num**3)

# both arr and vol_avg needed later
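As an aside, np.where(arr >= 1.0, 1.0, arr) materializes a full-size boolean mask plus a brand-new output array, which for data this large means tens of gigabytes of temporaries; an in-place clamp gives the same result without them. A minimal sketch using only standard NumPy (the small random array is just a stand-in for the real data):

import numpy as np

# Stand-in for the real 2048**3 array; values in [0, 2).
arr = (np.random.rand(8) * 2).astype(np.float32)

# Equivalent to arr = np.where(arr >= 1.0, 1.0, arr), but it writes into
# the existing buffer instead of allocating a mask and a second array.
np.minimum(arr, 1.0, out=arr)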

I have run this many times (on an otherwise free machine, i.e. with no competing CPU or memory usage) with no issue. But recently I have noticed that the code sometimes hangs for an extended period, making the runtime an order of magnitude longer. On these occasions I have been monitoring %CPU and memory usage (using GNOME System Monitor), and found that Python's CPU usage drops to 0%.

Using basic prints between the above operations to debug, it appears arbitrary which operation causes the pause (i.e. open(), np.fromfile(), and np.where() have each caused a hang on different runs). It is as if I am being throttled at random, because on other runs there are no hangs at all.

I have considered things like garbage collection or this question, but I cannot see any obvious relation to my problem (for example keystrokes have no effect).

Further notes: the binary file is 32GB, the machine (running Linux) has 256GB memory. I am running this code remotely, via an ssh session.

EDIT: This may be incidental, but I have noticed that there are no hangs if I run the code just after the machine has been rebooted. They seem to begin after a couple of runs, or at least after other usage of the system.

Solution

The drops in CPU usage turned out to be unrelated to Python or NumPy: they were the result of reading from a shared disk, and network I/O was the real culprit. For arrays this large, getting the data from disk into memory can be the dominant bottleneck, so the fix is to optimize those reads rather than the computation.
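A sketch of one such optimization (an illustration, not necessarily the poster's exact fix): stage the file onto fast local storage first, so the network sees a single bulk sequential copy and np.fromfile then reads from local disk (and, on repeated runs, from the page cache). Here /tmp is a hypothetical scratch location and file is the same path variable used in the question:

import os
import shutil
import numpy as np

num = 2048
threshold = 0.5

# Hypothetical local scratch location; choose a fast local filesystem.
local_copy = os.path.join('/tmp', os.path.basename(file))

# One bulk sequential transfer over the network, done once per file.
if not os.path.exists(local_copy):
    shutil.copyfile(file, local_copy)

with open(local_copy, 'rb') as f:
    arr = np.fromfile(f, dtype=np.float32, count=num**3)

arr *= threshold
np.minimum(arr, 1.0, out=arr)  # in-place clamp, as in the question
vol_avg = arr.sum(dtype=np.float64) / num**3

If local disk space is tight, another option is to read the file in large sequential chunks (repeated np.fromfile calls with a smaller count) so each network request stays a big contiguous read.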
