GradientBoostingClassifier bug in sklearn: ValueError: Input contains NaN, infinity or a value too large

Latest recommended article published on 2024-04-08 22:16:18

千行百行


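Reposting is welcome, but please state clearly that the source is CSDN 千行百行. Unattributed copies will be pursued, no matter how far away!!!

Article link: https://blog.csdn.net/shiyuzuxiaqianli/article/details/124350165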


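The documentation claims support for missing values

The sklearn documentation states that missing values are supported. The original text is reproduced below (captured here in case the documentation changes):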

Note Scikit-learn 0.21 introduces two new implementations of gradient boosting trees, namely HistGradientBoostingClassifier and HistGradientBoostingRegressor, inspired by LightGBM (See [LightGBM]).
These histogram-based estimators can be orders of magnitude faster than GradientBoostingClassifier and GradientBoostingRegressor when the number of samples is larger than tens of thousands of samples.
They also have built-in support for missing values, which avoids the need for an imputer.
These estimators are described in more detail below in Histogram-Based Gradient Boosting.
The following guide focuses on GradientBoostingClassifier and GradientBoostingRegressor, which might be preferred for small sample sizes since binning may lead to split points that are too approximate in this setting.

In practice, missing values are not supported

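import numpy as np
from sklearn.datasets import make_hastie_10_2
from sklearn.ensemble import GradientBoostingClassifier

# Build the Hastie 10.2 toy dataset and use the first 2000 rows for training
X, y = make_hastie_10_2(random_state=0)
X_train, X_test = X[:2000], X[2000:]
y_train, y_test = y[:2000], y[2000:]
# Inject a single missing value into the training data
X_train[0, 0] = np.nan
clf = GradientBoostingClassifier()
clf.fit(X_train, y_train)  # raises ValueError in sklearn 1.0.2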

Error:

ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
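The message lists three possible causes: NaN, infinity, or a value too large for float32. A rough way to see which one applies is to count each kind of entry with NumPy before calling fit. The helper name `describe_invalid_entries` and the demo array are hypothetical, written only for this sketch.

```python
import numpy as np

def describe_invalid_entries(X):
    """Count the three kinds of entries named in the error message."""
    X = np.asarray(X, dtype=np.float64)
    float32_max = np.finfo(np.float32).max
    n_nan = int(np.isnan(X).sum())
    n_inf = int(np.isinf(X).sum())
    # Finite values whose magnitude does not fit into float32
    n_too_large = int((np.isfinite(X) & (np.abs(X) > float32_max)).sum())
    return {"nan": n_nan, "inf": n_inf, "too_large_for_float32": n_too_large}

# Small demo array with one entry of each problematic kind
X_demo = np.array([[1.0, np.nan], [np.inf, 1e39], [0.5, -2.0]])
report = describe_invalid_entries(X_demo)
print(report)
```

In the example from this post, only the NaN count would be non-zero, because a single `np.nan` was written into `X_train`.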

I am using sklearn 1.0.2, which satisfies the version requirement stated in the documentation.

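So sklearn apparently still has a bug here, so be careful when using it! To be precise about where it bites: the passage quoted above attributes built-in missing-value support to HistGradientBoostingClassifier and HistGradientBoostingRegressor, while the error comes from GradientBoostingClassifier, which in 1.0.2 does not accept NaN.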
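If GradientBoostingClassifier itself has to be used, one possible workaround is to fill in the missing values before the classifier sees them. The sketch below assumes that median imputation is acceptable for the data. The pipeline layout and the names `imputed_clf` and `imputed_score` are illustrative choices, not something the documentation prescribes.

```python
import numpy as np
from sklearn.datasets import make_hastie_10_2
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline

# Same setup as the failing example: one NaN in the training data
X, y = make_hastie_10_2(random_state=0)
X_train, X_test = X[:2000], X[2000:]
y_train, y_test = y[:2000], y[2000:]
X_train[0, 0] = np.nan

# Assumption: replacing NaN with the column median is acceptable here
imputed_clf = make_pipeline(
    SimpleImputer(strategy="median"),
    GradientBoostingClassifier(random_state=0),
)
imputed_clf.fit(X_train, y_train)

imputed_score = imputed_clf.score(X_test, y_test)
print(f"test accuracy: {imputed_score:.3f}")
```

Putting the imputer inside the pipeline means the medians are learned from the training data only and then reused at prediction time.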
