I. Common errors
1. Missing values
Error message:
java.lang.RuntimeException: you can only specify missing value as 0.0 (the currently set value NaN) when you have SparseVector or Empty vector as your feature format
at ml.dmlc.xgboost4j.scala.spark.XGBoost$$anonfun$verifyMissingSetting$1.apply(XGBoost.scala:77)
at ml.dmlc.xgboost4j.scala.spark.XGBoost$$anonfun$verifyMissingSetting$1.apply(XGBoost.scala:75)
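Following the stack trace into XGBoost.scala shows:
This is where XGBoost validates the missing-value setting, xxxxxxx
Solution: set the missing value explicitly when building the classifier: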
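import ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier
val xliff = new XGBoostClassifier(params)
  .setFeaturesCol("indexedFeatures") // feature column
  .setLabelCol("indexedLabel")       // label column
  .setMissing(0.0f)                  // treat 0.0 as missing, as the error requires for sparse feature vectors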
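To see why setting missing to 0 makes the exception go away, the rule stated in the error message can be sketched in plain Scala. This is only an illustration inferred from the message text, not the library's actual code: `Point`, `MissingCheck`, and `verify` are hypothetical names, and the sketch assumes that a sparse vector is recognizable by a non-null `indices` array.

```scala
// Hypothetical stand-in for a labeled point. Assumption for this sketch:
// `indices` is non-null only when the features come from a sparse vector.
final case class Point(indices: Array[Int], values: Array[Float])

object MissingCheck {
  // Returns Some(message) when the missing setting conflicts with a sparse
  // point, None otherwise. Float.NaN != 0.0f is true, so the default NaN
  // setting is rejected for sparse input.
  def verify(point: Point, missing: Float): Option[String] =
    if (missing != 0.0f && point.indices != null)
      Some(s"missing must be 0.0 for sparse features (currently set to $missing)")
    else
      None
}
```

Under this rule a sparse point combined with the default NaN is rejected, while the same point with missing set to 0.0f passes, which matches the fix above.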
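2. "not found key:train" in xgboost4j-spark
This is related to the setNumWorkers(80) setting passed to xgb: if the value is too large, this error is raised. The reason is that the num_workers parameter in xgb specifies how many workers run in parallel while the model is training, and each worker must be assigned at least one partition. So the smaller the partition count is set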