When processing large numpy arrays, Python randomly drops to 0% CPU usage, causing the code to "hang"?

The poster hit a problem while running some code: after loading a large 1D numpy array from a binary file and modifying it with numpy.where(), the run would sometimes stall for a long time, with CPU usage suddenly dropping to 0%, and the issue appeared to occur at random. Investigation showed that the delay was caused not by Python or numpy, but by network I/O becoming the bottleneck when reading data from a shared disk. The fix is to optimize how the large array is read into memory, reducing the dependence on network I/O.


I have been running some code, a part of which loads in a large 1D numpy array from a binary file, and then alters the array using the numpy.where() method.

Here is an example of the operations performed in the code:

import numpy as np

num = 2048
threshold = 0.5

# file: path to the 32 GB binary data file
with open(file, 'rb') as f:
    arr = np.fromfile(f, dtype=np.float32, count=num**3)

arr *= threshold
arr = np.where(arr >= 1.0, 1.0, arr)
vol_avg = np.sum(arr) / (num**3)

# both arr and vol_avg needed later
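As an aside, np.where(arr >= 1.0, 1.0, arr) materializes a full-size boolean mask plus a brand-new output array, which for data this large means tens of gigabytes of temporaries; an in-place clamp gives the same result without them. A minimal sketch using only standard NumPy (the small random array is just a stand-in for the real data):

import numpy as np

# Stand-in for the real 2048**3 array; values in [0, 2).
arr = (np.random.rand(8) * 2).astype(np.float32)

# Equivalent to arr = np.where(arr >= 1.0, 1.0, arr), but it writes into
# the existing buffer instead of allocating a mask and a second array.
np.minimum(arr, 1.0, out=arr)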

I have run this many times (on an otherwise free machine, i.e. with no competing CPU or memory usage) with no issue. But recently I have noticed that the code sometimes hangs for an extended period, making the runtime an order of magnitude longer. On these occasions I have been monitoring %CPU and memory usage (using GNOME System Monitor), and found that Python's CPU usage drops to 0%.

Using basic prints between the above operations to debug, it appears arbitrary which operation causes the pause (i.e. open(), np.fromfile(), and np.where() have each caused a hang on different runs). It is as if I am being throttled at random, because on other runs there are no hangs at all.

I have considered things like garbage collection or this question, but I cannot see any obvious relation to my problem (for example keystrokes have no effect).

Further notes: the binary file is 32GB, the machine (running Linux) has 256GB memory. I am running this code remotely, via an ssh session.

EDIT: This may be incidental, but I have noticed that there are no hangs if I run the code just after the machine has been rebooted. They seem to begin after a couple of runs, or at least after other usage of the system.

Solution

The drops in CPU usage turned out to be unrelated to Python or NumPy: they were the result of reading from a shared disk, and network I/O was the real culprit. For arrays this large, getting the data from disk into memory can be the dominant bottleneck, so the fix is to optimize those reads rather than the computation.
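A sketch of one such optimization (an illustration, not necessarily the poster's exact fix): stage the file onto fast local storage first, so the network sees a single bulk sequential copy and np.fromfile then reads from local disk (and, on repeated runs, from the page cache). Here /tmp is a hypothetical scratch location and file is the same path variable used in the question:

import os
import shutil
import numpy as np

num = 2048
threshold = 0.5

# Hypothetical local scratch location; choose a fast local filesystem.
local_copy = os.path.join('/tmp', os.path.basename(file))

# One bulk sequential transfer over the network, done once per file.
if not os.path.exists(local_copy):
    shutil.copyfile(file, local_copy)

with open(local_copy, 'rb') as f:
    arr = np.fromfile(f, dtype=np.float32, count=num**3)

arr *= threshold
np.minimum(arr, 1.0, out=arr)  # in-place clamp, as in the question
vol_avg = arr.sum(dtype=np.float64) / num**3

If local disk space is tight, another option is to read the file in large sequential chunks (repeated np.fromfile calls with a smaller count) so each network request stays a big contiguous read.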
