Handling Missing Values (Null) in XGBoost


VitoDi


Article link: https://blog.csdn.net/VitoDi/article/details/59541300


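Summary (auto-generated by CSDN): XGBoost can handle missing values in the data automatically, with no need to impute them beforehand. When data is fed in as a sparse matrix, features that do not appear are treated as missing. Internally, the algorithm learns which branch to take when a value is missing, which is equivalent to learning the best imputation value. In Python, DMatrix can load data in several formats, or the data can be converted to libsvm format (in scikit-learn, sklearn.datasets.dump_svmlight_file() writes that format and sklearn.datasets.load_svmlight_file() reads it back). The official XGBoost documentation and tutorials cover missing-value handling in more detail.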

XGBoost has a well-designed default mechanism for dealing with missing data. The following is excerpted from Tianqi Chen's comments in a GitHub discussion.

You can directly feed data in as sparse matrix, and only contains non-missing value. i.e. features that are not presented in the sparse feature matrix are treated as ‘missing’.

XGBoost will handle it internally and you do not need to do anything on it.

It will depends on how you present the data. If you put data in as LIBSVM format, and list zero features there, it will not be treated as missing.
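To make the LIBSVM point concrete, here is a small sketch in plain Python (my own illustration, not part of the quoted discussion). The helper `parse_libsvm_line` is hypothetical and is not an XGBoost function; it only shows that a feature written as `index:0` is present with value zero, while a feature that is not listed at all is absent, which is the case treated as missing.

```python
def parse_libsvm_line(line):
    """Parse one LIBSVM-format line into (label, {feature_index: value}).

    Hypothetical helper for illustration only; not part of XGBoost.
    """
    parts = line.split()
    label = float(parts[0])
    features = {}
    for item in parts[1:]:
        index, value = item.split(":")
        features[int(index)] = float(value)
    return label, features


# Feature 1 is listed with an explicit zero, feature 2 is not listed at all.
label, feats = parse_libsvm_line("1 1:0 3:3.5")

explicit_zero = 1 in feats   # listed with value 0.0, so it is a real zero
absent = 2 not in feats      # never listed, so it counts as missing
```

So if zeros in your data really mean "unknown", leave them out of the file instead of writing them as `index:0`.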

Internally, XGBoost will automatically learn what is the best direction to go when a value is missing. Equivalently, this can be viewed as automatically “learn” what is the best imputation value for missing values based on reduction on training loss.
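The idea of "learning the best direction" can be sketched for a single candidate split as follows. This is a simplified illustration under my own assumptions, not XGBoost's actual implementation: it checks only one threshold, uses the usual gradient/hessian score G²/(H + λ), and leaves out the constant factor and the complexity penalty. The function names and the toy data are made up. Rows with a missing value (`None` here) are tried on both sides, and the side with the larger gain becomes the default direction.

```python
def score(g_sum, h_sum, lam=1.0):
    """Structure score of a node from its summed gradients and hessians."""
    return g_sum * g_sum / (h_sum + lam)


def best_default_direction(values, grads, hess, threshold, lam=1.0):
    """Return (direction, gain) for one candidate split.

    `values` may contain None for missing entries. Non-missing rows go
    left if value < threshold, otherwise right; missing rows are tried
    on both sides and the better side is returned.
    """
    g_left = h_left = g_right = h_right = g_miss = h_miss = 0.0
    for x, g, h in zip(values, grads, hess):
        if x is None:
            g_miss += g
            h_miss += h
        elif x < threshold:
            g_left += g
            h_left += h
        else:
            g_right += g
            h_right += h

    parent = score(g_left + g_right + g_miss, h_left + h_right + h_miss, lam)
    # Option 1: missing rows follow the left branch.
    gain_left = (score(g_left + g_miss, h_left + h_miss, lam)
                 + score(g_right, h_right, lam) - parent)
    # Option 2: missing rows follow the right branch.
    gain_right = (score(g_left, h_left, lam)
                  + score(g_right + g_miss, h_right + h_miss, lam) - parent)

    if gain_left >= gain_right:
        return "left", gain_left
    return "right", gain_right


# Toy example: squared error with an initial prediction of 0,
# so gradient = -target and hessian = 1 for every row.
values = [1.0, 2.0, 8.0, 9.0, None, None]
targets = [0.0, 0.0, 10.0, 10.0, 10.0, 10.0]
grads = [-t for t in targets]
hess = [1.0] * len(targets)

direction, gain = best_default_direction(values, grads, hess, threshold=5.0)
```

In this toy data the rows with missing values have the same targets as the rows on the right, so sending them right reduces the loss more and `direction` comes out as `"right"`.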

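When the data contains missing values, there is no need to impute them first; XGBoost's built-in mechanism handles them automatically. One way to do this is to generate the data in LIBSVM format. (Addendum: other data formats work as well. I originally thought LIBSVM format was required, which was a misunderstanding.)
See the linked discussion for details.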
