I am trying to train a MultinomialNB classifier on a large dataset (features and targets, ~75k x 130k). I know for a fact that the classifier will build a separate classifier for every target, so I expected memory usage to blow up.
However, even though the machine has roughly 640 GB of RAM, the process never allocates more than about 20 GB.
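For context, the failing call is essentially the following (a minimal sketch; the random sparse data and the exact shapes are just stand-ins for my real features and targets):

import numpy as np
import scipy.sparse as sp
from sklearn.naive_bayes import MultinomialNB

# Stand-in data: the real feature matrix is sparse, roughly 75k x 130k,
# with a very large number of distinct target labels.
n_samples, n_classes = 75000, 130000
X = sp.random(n_samples, 130000, density=1e-4, format='csr')
y = np.random.randint(0, n_classes, size=n_samples)

mb_classifier = MultinomialNB()
# partial_fit needs the full class list up front; this is the call
# that ends up raising MemoryError inside label_binarize.
mb_classifier.partial_fit(X, y, list(set(y)))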
I have tried setting memory locking and running as root (those limits have to be adjusted for that), but it does not work:
Traceback (most recent call last):
  File "test_classifiers.py", line 202, in <module>
    train_mb()
  File "test_classifiers.py", line 168, in train_mb
    mb_classifier.partial_fit(X, y, list(set(y)))
  File "/usr/local/lib/python3.5/dist-packages/sklearn/naive_bayes.py", line 539, in partial_fit
    Y = label_binarize(y, classes=self.classes_)
  File "/usr/local/lib/python3.5/dist-packages/sklearn/preprocessing/label.py", line 657, in label_binarize
    Y = Y.toarray()
  File "/usr/local/lib/python3.5/dist-packages/scipy/sparse/compressed.py", line 1024, in toarray
    out = self._process_toarray_args(order, out)
  File "/usr/local/lib/python3.5/dist-packages/scipy/sparse/base.py", line 1186, in _process_toarray_args
    return np.zeros(self.shape, dtype=self.dtype, order=order)
MemoryError
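The np.zeros call at the bottom is label_binarize materializing the binarized target matrix as a single dense array. A back-of-the-envelope estimate of that one allocation (assuming ~75k rows, ~130k classes, 8 bytes per entry):

n_samples, n_classes = 75000, 130000
print(n_samples * n_classes * 8 / 1e9)  # ~78 GB for one dense np.zeros call

So the allocation is huge, but it should still fit comfortably in 640 GB, which is what puzzles me.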
Both

resource.setrlimit(resource.RLIMIT_MEMLOCK, (-1, -1))

and

resource.setrlimit(resource.RLIMIT_MEMLOCK, (resource.RLIM_INFINITY, resource.RLIM_INFINITY))

have already been tried, with no effect. Any ideas? Could it be related to the fact that this classifier can only use a single CPU?
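For what it's worth, here is a quick sketch of how the relevant limits can be inspected; I am not even sure RLIMIT_MEMLOCK is the right knob, since it only governs mlock()'d pages, whereas RLIMIT_AS and RLIMIT_DATA are what actually cap an ordinary np.zeros allocation:

import resource

# RLIMIT_MEMLOCK limits mlock()'d memory only; RLIMIT_AS (total address
# space) and RLIMIT_DATA (data segment) constrain plain heap allocations.
for name in ('RLIMIT_MEMLOCK', 'RLIMIT_AS', 'RLIMIT_DATA'):
    soft, hard = resource.getrlimit(getattr(resource, name))
    print(name, soft, hard)

# Lifting the address-space limit instead (raising the hard limit
# requires the appropriate privileges):
resource.setrlimit(resource.RLIMIT_AS, (resource.RLIM_INFINITY, resource.RLIM_INFINITY))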