大内存使用多进程的踩坑记录

最新推荐文章于 2023-10-23 11:48:46 发布

zzllg

最新推荐文章于 2023-10-23 11:48:46 发布

阅读量666

点赞数 2

分类专栏：踩坑记录文章标签： linux 内存溢出 rbf 多进程

本文链接：https://blog.csdn.net/weixin_39982211/article/details/103194675

版权

踩坑记录专栏收录该内容

3 篇文章 0 订阅

订阅专栏

服务器信息：

系统版本：Linux vm172-31-0-3.ksc.com 3.10.0-693.11.6.el7.x86_64

CPU核心数：28

内存大小：56 GB

硬盘大小：约1TB

脚本介绍：

将固定站点污染物信息插值到等经纬度网格（1400*1000）上

问题说明：

在该服务器上定时运行该插值脚本，每次运行需要1.5h左右，时效性不够。其中主要程序即RBF插值计算部分使用双进程并行处理，计划增加进程数目以充分利用计算资源加速运行程序，试验过程中遇到以下问题：

使用超过2个进程，程序无法正常运行，会出现进程消失现象
进程不返回任何错误值，不容易定位错误

解决方案尝试：

首先考虑是否代码结构问题：代码在一个class实例中开启若干进程，为测试是否为该问题，做了以下试验:

a.写了一段类似程序，一个实例内开启多个进程：运行发现并没有上述错误现象发生。

b.重构该代码，将开启进程的操作部分转移到实例外面：运行发现依旧存在上述问题。

判断该问题发生与代码结构无关。
然后考虑是否服务器计算资源问题导致无法同时处理四个该进程，做了一组对照试验：

a.一个脚本同时开启四个进程：发生错误，无法正常运行

b.两个脚本各开启两个进程：发生错误，无法正常运行

c.一个脚本开启另个进程：正常运行

怀疑是计算资源问题
验证是否计算资源问题，做了一组对照试验：

a.不改变格点分辨率，一个重构后脚本开启4个进程：无法正常运行

b.不改变格点分辨率，一个重构前脚本开启4个进程：无法正常运行

c.降低格点分辨率为之前的1/16，一个重构后脚本开启四个进程：正常运行

d.降低格点分辨率为之前的1/16，一个重构前脚本开启四个进程：正常运行

e.降低格点分辨率为之前1/4，一个重构后脚本开启四个进程：正常运行

f.降低格点分辨率为之前1/4，一个重构前脚本开启四个进程：正常运行

试验表明，降低网格分辨率可以使程序正常运行。
验证是否硬件资源不够，在运行过程中监测系统使用情况：

a.格点分辨率为之前1/16时：每个进程占用内存最多时刻大概只有2%，四个最多占用10%程序可以正常运行。
b.格点分辨率为之前1/4时：

每个进程占用内存大概有7%，四个至多占用30%左右。程序可以正常运行。
c.不降低格点分辨率时：

每个进程经常占用至接近30%，四个进程将导致内存不足。程序无法正常运行。
初步认为，是计算需要内存过多导致程序无法正常运行，后续考虑改变数据类型，以降低程序所申请内存空间大小。

监视内存使用情况：

使用memory_profiler监视每一步代码的内存占用。
首先先用pip安装memory_profiler
pip install memory_profiler
memory_profiler是利用python的装饰器工作的，所以在我们需要测试的函数上添加装饰器 @profile 。之后在运行代码时加上 -m memory_profiler 就可以了解函数每一步代码的内存占用了 .
根据之前的判断，已经定位到时 rbf自定义函数部分出现的问题，所以监测此函数：
在这里插入图片描述
发现监测结果与实际使用结果存在不同，实际使用了大概14GB的内存，但是并没有监测到该部分，猜想该内存监测程序不能监测中途申请，后面又释放的内存，考虑可能是在scipy的插值过程中申请。

查阅插值函数代码

代码（删减过）如下：

class Rbf(object):

    def _euclidean_norm(self, x1, x2):
        return np.sqrt(((x1 - x2)**2).sum(axis=0))

    def _h_linear(self, r):
        return r

    # Setup self._function and do smoke test on initial r
    def _init_function(self, r):
        if isinstance(self.function, str):
            self.function = self.function.lower()
            _mapped = {'inverse': 'inverse_multiquadric',
                       'inverse multiquadric': 'inverse_multiquadric',
                       'thin-plate': 'thin_plate'}
            if self.function in _mapped:
                self.function = _mapped[self.function]

            func_name = "_h_" + self.function
            if hasattr(self, func_name):
                self._function = getattr(self, func_name)
            else:
                functionlist = [x[3:] for x in dir(self) if x.startswith('_h_')]
                raise ValueError("function must be a callable or one of " +
                                     ", ".join(functionlist))
            self._function = getattr(self, "_h_"+self.function)
        elif callable(self.function):
            allow_one = False
            if hasattr(self.function, 'func_code') or \
                   hasattr(self.function, '__code__'):
                val = self.function
                allow_one = True
            elif hasattr(self.function, "im_func"):
                val = get_method_function(self.function)
            elif hasattr(self.function, "__call__"):
                val = get_method_function(self.function.__call__)
            else:
                raise ValueError("Cannot determine number of arguments to function")

            argcount = get_function_code(val).co_argcount
            if allow_one and argcount == 1:
                self._function = self.function
            elif argcount == 2:
                if sys.version_info[0] >= 3:
                    self._function = self.function.__get__(self, Rbf)
                else:
                    import new
                    self._function = new.instancemethod(self.function, self,
                                                        Rbf)
            else:
                raise ValueError("Function argument must take 1 or 2 arguments.")

        a0 = self._function(r)
        if a0.shape != r.shape:
            raise ValueError("Callable must take array and return array of the same shape")
        return a0

    def __init__(self, *args, **kwargs):
        self.xi = np.asarray([np.asarray(a, dtype=np.float_).flatten()
                           for a in args[:-1]])
        self.N = self.xi.shape[-1]
        self.di = np.asarray(args[-1]).flatten()

        if not all([x.size == self.di.size for x in self.xi]):
            raise ValueError("All arrays must be equal length.")

        self.norm = kwargs.pop('norm', self._euclidean_norm)
        self.epsilon = kwargs.pop('epsilon', None)
        if self.epsilon is None:
            # default epsilon is the "the average distance between nodes" based
            # on a bounding hypercube
            dim = self.xi.shape[0]
            ximax = np.amax(self.xi, axis=1)
            ximin = np.amin(self.xi, axis=1)
            edges = ximax-ximin
            edges = edges[np.nonzero(edges)]
            self.epsilon = np.power(np.prod(edges)/self.N, 1.0/edges.size)
        self.smooth = kwargs.pop('smooth', 0.0)

        self.function = kwargs.pop('function', 'multiquadric')

        # attach anything left in kwargs to self
        #  for use by any user-callable function or
        #  to save on the object returned.
        for item, value in kwargs.items():
            setattr(self, item, value)

        self.nodes = linalg.solve(self.A, self.di)

    @property
    def A(self):
        # this only exists for backwards compatibility: self.A was available
        # and, at least technically, public.
        r = self._call_norm(self.xi, self.xi)
        return self._init_function(r) - np.eye(self.N)*self.smooth

    def _call_norm(self, x1, x2):
        if len(x1.shape) == 1:
            x1 = x1[np.newaxis, :]
        if len(x2.shape) == 1:
            x2 = x2[np.newaxis, :]
        x1 = x1[..., :, np.newaxis]
        x2 = x2[..., np.newaxis, :]
        return self.norm(x1, x2)

    def __call__(self, *args):
        args = [np.asarray(x) for x in args]
        if not all([x.shape == y.shape for x in args for y in args]):
            raise ValueError("Array lengths must be equal")
        shp = args[0].shape
        xa = np.asarray([a.flatten() for a in args], dtype=np.float_)
        r = self._call_norm(xa, self.xi)
        return np.dot(self._function(r), self.nodes).reshape(shp)

发现在使用该类时，首先实例化此类 rbf，然后传入网格参数，即需要插成的网格（1400*1000），然后使用过程中会调用rbf中的**call函数，计算网格中每个格点与每个站点**的欧几里得距离:

def _euclidean_norm(self, x1, x2):
        return np.sqrt(((x1 - x2)**2).sum(axis=0))

这样的话，约有1400个站点，网格大小为1400x1000 ，存储方式为float64，每个数据需要占用8个字节，其需要申请的内存则为:
$(1400 * 1000 * 1400 * 8) / (1024 * 1024 * 1024) \approx 14.6 G B$
所以同时运行多个rbf插值函数时候，可能会导致内存溢出。

解决方案

根据插值方法的特性，采用分块处理，即先将1400x1000的网格分成4小块，分别对该四小块进行插值，最终合并到一起返回。这样可以保证每个进程最多只申请3GB左右的内存，即可以多开几个进程同时计算。测试结果如下：
a.不分块计算，双进程运行时内存使用情况与耗时：
在这里插入图片描述
两个进程各会占用25%-30%的内存空间，对8个时次进行插值运算，最终耗时728s。

b.分块进行计算，8进程运行时内存使用情况与耗时：
在这里插入图片描述
8个进程各最多占用7%-7.5%的内存空间，对8个时次进行插值运算，最终耗时298s。

结论

网格点多的情况下，rbf插值方法需要消耗巨大内存，内存不足时，进程就会消失。利用该插值方法的特性，将需要插值的区域切分，分开计算后合并到一起，这样可以开启多个进程运算。经过测试，优化后的方法可以在40min左右将数据处理好，较之以前有显著提升。

zzllg

关注

2
点赞
踩
1

收藏

觉得还不错? 一键收藏
0
评论
大内存使用多进程的踩坑记录

服务器信息：系统版本：Linux vm172-31-0-3.ksc.com 3.10.0-693.11.6.el7.x86_64 CPU核心数：28 内存大小：56 GB 硬盘大小：约1TB脚本介绍：将固定站点污染物信息插值到等经纬度网格（1400*1000）上问题说明：在该服务器上定时运行该插值脚本，每次运行需要1.5h左右，时效性不够。其中主要程序即RBF...
复制链接

扫一扫

专栏目录