Python memory allocation failure: OSError: [Errno 12] Cannot allocate memory when using a Python multiprocessing pool

I am trying to apply a function to 5 cross validation sets in parallel using Python's multiprocessing and repeat that for different parameter values, like so:

import pandas as pd
import numpy as np
import multiprocessing as mp
from sklearn.model_selection import StratifiedKFold

# simulated datasets
X = pd.DataFrame(np.random.randint(2, size=(3348, 868), dtype='int8'))
y = pd.Series(np.random.randint(2, size=3348, dtype='int64'))

# dummy function to apply
def _work(args):
    del(args)

for C in np.arange(0.0, 2.0e-3, 1.0e-6):
    splitter = StratifiedKFold(n_splits=5)
    with mp.Pool(processes=5) as pool:
        pool_results = \
            pool.map(
                func=_work,
                iterable=((C, X.iloc[train_index], X.iloc[test_index])
                          for train_index, test_index in splitter.split(X, y))
            )

However, halfway through execution I get the following error:

Traceback (most recent call last):
  File "mre.py", line 19, in <module>
    with mp.Pool(processes=5) as pool:
  File "/usr/lib/python3.5/multiprocessing/context.py", line 118, in Pool
    context=self.get_context())
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 168, in __init__
    self._repopulate_pool()
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 233, in _repopulate_pool
    w.start()
  File "/usr/lib/python3.5/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/usr/lib/python3.5/multiprocessing/context.py", line 267, in _Popen
    return Popen(process_obj)
  File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 67, in _launch
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

I'm running this on Ubuntu 16.04 with 32 GB of memory, and watching htop during execution it never goes above 18.5 GB, so I don't think I'm running out of memory.

It is definitely caused by slicing my dataframes with the indexes from splitter.split(X, y), since no error is thrown when I pass my dataframes to the Pool object directly.

I saw this answer that says it might be due to too many file descriptors being created, but I have no idea how I might go about fixing that, and isn't the context manager supposed to help avoid this sort of problem?

Solution

os.fork() makes a copy of a process, so if you're sitting at about 18 GB of usage, and want to call fork, you need another 18 GB. Twice 18 is 36 GB, which is well over 32 GB. While this analysis is (intentionally) naive—some things don't get copied on fork—it's probably sufficient to explain the problem.

The solution is either to make the pools earlier, when less memory needs to be copied, or to work harder at sharing the largest objects. Or, of course, add more memory (perhaps just virtual memory, i.e., swap space) to the system.
