Background
While training a model from the GBDT family, I got the error ValueError: X.dtype should be np.float32, got float64. The full traceback is below.
ValueError Traceback (most recent call last)
<ipython-input-14-aa936862d7d7> in <module>()
----> 1 abc.apply(X_train)
~/tmp/dataset/Augboost+FM/AugBoost.py in apply(self, X)
461 for j in range(n_classes):
462 estimator = self.estimators_[i, j]
--> 463 leaves[:, i, j] = estimator.apply(np.concatenate([X_original, X], axis=1), check_input=False)
464
465 return leaves
~/anaconda3/lib/python3.7/site-packages/sklearn/tree/tree.py in apply(self, X, check_input)
464 check_is_fitted(self, 'tree_')
465 X = self._validate_X_predict(X, check_input)
--> 466 return self.tree_.apply(X)
467
468 def decision_path(self, X, check_input=True):
sklearn/tree/_tree.pyx in sklearn.tree._tree.Tree.apply()
sklearn/tree/_tree.pyx in sklearn.tree._tree.Tree.apply()
sklearn/tree/_tree.pyx in sklearn.tree._tree.Tree._apply_dense()
ValueError: X.dtype should be np.float32, got float64
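The following is a minimal sketch that reproduces the same error outside AugBoost. It uses a plain sklearn DecisionTreeClassifier and made-up integer data (both are assumptions, not the author's setup). With the default check_input=True, sklearn casts the input to float32 itself. With check_input=False, which is what the AugBoost call in the traceback uses, a float64 array reaches the Cython code and is rejected.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data standing in for X_train: small integer features
rng = np.random.RandomState(0)
X = rng.randint(0, 10, size=(50, 3))
y = (X[:, 0] > 4).astype(int)

clf = DecisionTreeClassifier(random_state=0).fit(X, y)
X64 = X.astype(np.float64)

# Default check_input=True: sklearn validates X and casts it to float32 for us
leaves_checked = clf.apply(X64)

# check_input=False skips that cast, so the tree's Cython code sees float64
try:
    clf.apply(X64, check_input=False)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Casting to float32 ourselves satisfies the dtype check
leaves_fixed = clf.apply(X64.astype(np.float32), check_input=False)
print((leaves_fixed == leaves_checked).all())
```

Passing check_input=False is a speed optimization. It shifts responsibility for the dtype onto the caller.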
Solution
The message means exactly what it says. The tree only accepts np.float32, but it received float64.
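I read through tree.py. When apply() is called with check_input=True, sklearn's own validation in _validate_X_predict casts the input to float32. On that path the data should always end up as 32-bit. Here, however, AugBoost calls estimator.apply(..., check_input=False), which skips that cast. So I suspected that my integer training set had been converted to 64-bit floats somewhere earlier in the code.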
Looking further back, I found this code:
X_original = X
X_normed = self.normalizer.transform(X)
Here X is the X_train I passed in: a pandas DataFrame of integer data.
Checking X_normed turned up the problem. After normalizer.transform(), the data had become a numpy.ndarray with dtype float64. So all we need to do is convert it back to float32.
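The post does not say which normalizer AugBoost uses. The sketch below assumes sklearn's StandardScaler as a stand-in and uses a small hypothetical integer DataFrame. It shows the same effect: integer DataFrame input comes back from transform() as a float64 NumPy array.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for X_train: a DataFrame with integer columns
X_train = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]})

# StandardScaler is an assumption; AugBoost's self.normalizer may differ
normalizer = StandardScaler().fit(X_train)
X_normed = normalizer.transform(X_train)

# Integer input is upcast to float64 and returned as a plain ndarray
print(type(X_normed), X_normed.dtype)
```
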
For more on ndarray data types, see:
https://blog.csdn.net/weixin_43181110/article/details/83996915?spm=1001.2101.3001.6650.1&utm_medium=distribute.pc_relevant.none-task-blog-2%7Edefault%7ECTRLIST%7Edefault-1.no_search_link&depth_1-utm_source=distribute.pc_relevant.none-task-blog-2%7Edefault%7ECTRLIST%7Edefault-1.no_search_link
X_normed = X_normed.astype(np.float32)
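There is one caveat about the np.concatenate call in the traceback. NumPy promotes mixed dtypes, and int64 combined with float32 becomes float64. If the other operand (X_original here) is still an integer array, the concatenated result can be float64 again even after casting X_normed. The sketch below uses hypothetical arrays to show this and casts the final array as a more defensive variant.

```python
import numpy as np

# Hypothetical arrays: raw integer features and normalized float64 features
X_original = np.array([[1, 2], [3, 4]], dtype=np.int64)
X_normed = np.array([[0.1, 0.2], [0.3, 0.4]])  # float64 by default

# The fix from the post
X_normed = X_normed.astype(np.float32)

# int64 + float32 is promoted to float64 during concatenation
combined = np.concatenate([X_original, X_normed], axis=1)
print(combined.dtype)  # float64

# Casting the array that is actually passed to estimator.apply() is safer
combined32 = np.concatenate([X_original, X_normed], axis=1).astype(np.float32)
print(combined32.dtype)  # float32
```
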
With this conversion, the error goes away.