XGBoost: Common Errors and Tuning

phyllisyuell

Article link: https://blog.csdn.net/phyllisyuell/article/details/114004880

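This article covers common errors encountered when using XGBoost, such as the missing-value error and the 'not found key: train' error, and gives a fix for each. It also discusses two key tuning points: what to do when model training takes too long, and how to combine training with Spark's dynamic resource allocation.

Summary generated by CSDN using AI.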

I. Common Errors

1. Missing values

Error message:

java.lang.RuntimeException: you can only specify missing value as 0.0 (the currently set value NaN) when you have SparseVector or Empty vector as your feature format
at ml.dmlc.xgboost4j.scala.spark.XGBoost$$anonfun$verifyMissingSetting$1.apply(XGBoost.scala:77)
at ml.dmlc.xgboost4j.scala.spark.XGBoost$$anonfun$verifyMissingSetting$1.apply(XGBoost.scala:75)

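Following the stack trace into XGBoost.scala shows:

This is where XGBoost validates the missing-value setting. As the error message states, when the features arrive as SparseVector (or empty vectors), the missing value must be 0.0, but here it is still set to NaN.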
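The rule stated in the error message can be sketched in plain Scala. This is only an illustration of the condition, not the XGBoost4J-Spark source: the `Features`, `Dense`, `Sparse` types and the `MissingCheck.verify` helper are hypothetical stand-ins, and the empty-vector case named in the message is not modeled.

```scala
// Hypothetical stand-in types: these are NOT the real XGBoost4J-Spark classes.
sealed trait Features
final case class Dense(values: Array[Float]) extends Features
final case class Sparse(size: Int, indices: Array[Int], values: Array[Float]) extends Features

object MissingCheck {
  // Mirrors the rule in the error message: sparse features are only
  // accepted when the missing value is 0.0. Dense features pass through
  // with any missing value (including the NaN that the message reports).
  def verify(features: Features, missing: Float): Either[String, Features] =
    features match {
      case _: Sparse if missing != 0.0f =>
        Left(s"you can only specify missing value as 0.0 (the currently set value $missing) " +
          "when you have SparseVector or Empty vector as your feature format")
      case ok =>
        Right(ok)
    }
}
```

Under these assumptions, `MissingCheck.verify(Sparse(3, Array(1), Array(2.0f)), Float.NaN)` returns a `Left` carrying the message, while the same input with `0.0f` returns a `Right`.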

Solution: set the missing value explicitly:

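import ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier

// params: the Map[String, Any] of XGBoost parameters defined elsewhere
val xliff = new XGBoostClassifier(params)
  .setFeaturesCol("indexedFeatures") // feature column
  .setLabelCol("indexedLabel")       // label column
  .setMissing(0.0f)                  // treat 0.0 as missing, as required for SparseVector features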
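Note that with the missing value set to 0, genuine zeros in the features are also treated as missing. If that is not acceptable, one possible alternative (not from the original post) is to convert the feature column to dense vectors before training, so that the missing marker can stay NaN. The sketch below assumes the feature column holds Spark ML vectors; `DenseFeatures` and `densify` are hypothetical names introduced here for illustration.

```scala
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, udf}

object DenseFeatures {
  // UDF that turns any Spark ML vector (sparse or dense) into a dense one.
  private val toDense = udf((v: Vector) => v.toDense: Vector)

  // Returns a copy of df in which featuresCol holds only dense vectors.
  // Dense vectors use more memory than sparse ones on wide, mostly-zero data.
  def densify(df: DataFrame, featuresCol: String): DataFrame =
    df.withColumn(featuresCol, toDense(col(featuresCol)))
}
```

Usage would look like `val trainDense = DenseFeatures.densify(train, "indexedFeatures")`, where `train` is assumed to be the training DataFrame; the result is then passed to `fit` without calling `setMissing`.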

2. "not found key: train" in xgboost4j-spark

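This is related to the setNumWorkers(80) setting passed to XGBoost: the error appears when the value is set too high. The num_workers parameter is the number of workers that run in parallel during training, and each worker must be assigned at least one partition. So the smaller the partition count is set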