1. Suppose we have a trained binary classification model, tvsFitted. Let's look at which metrics we can pull out of it to judge how good the model is.
// fit: train tvs (presumably a TrainValidationSplit built earlier) on the training data
val tvsFitted = tvs.fit(trainData)
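2. After training finishes, use the summary to get evaluation metrics
// best model summary: metrics of the best model, computed on the training data
import org.apache.spark.ml.PipelineModel
import org.apache.spark.ml.classification.LogisticRegressionModel
// Get the best model
val trainedPipeline = tvsFitted.bestModel.asInstanceOf[PipelineModel]
// lrStage is the index of the LogisticRegression stage in the pipeline (defined elsewhere)
val TrainedLR = trainedPipeline.stages(lrStage).asInstanceOf[LogisticRegressionModel]
// Get the binary classification summary of the best model
// (only available on a freshly trained model; the summary is not saved with the model)
val summaryLR = TrainedLR.binarySummary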
// Inspect how the objective (loss) function evolved over the training iterations
println("bestModel objective history is:"+ summaryLR.objectiveHistory.mkString(","))
println("bestModel objective history iterations:"+ summaryLR.objectiveHistory.length)
// Inspect the regression coefficients and the intercept
println("bestModel coefficients is:"+ TrainedLR.coefficients)
println("bestModel intercept is:"+ TrainedLR.intercept)
// Inspect the hyperparameters
println("bestModel regParam:"+TrainedLR.getRegParam)
println("bestModel Threshold:"+TrainedLR.getThreshold)
// Get the ROC (receiver operating characteristic) curve as a DataFrame
val roc = summaryLR.roc
println("receiver-operating characteristic DataFrame is :")
roc.show()
// Get the AUC (area under the ROC curve)
val auc1 = summaryLR.areaUnderROC
println(s"bestModel summary areaUnderROC i.e. AUC: $auc1")
// Get the accuracy, plus the label-weighted FPR, TPR, F-measure, precision, and recall
val accuracy = summaryLR.accuracy
val falsePositiveRate = summaryLR.weightedFalsePositiveRate
val truePositiveRate = summaryLR.weightedTruePositiveRate
val fMeasure = summaryLR.weightedFMeasure
val precision = summaryLR.weightedPrecision
val recall = summaryLR.weightedRecall
println(s"bestModel Accuracy: $accuracy\nFPR: $falsePositiveRate\nTPR: $truePositiveRate\n" +
s"F-measure: $fMeasure\nPrecision: $precision\nRecall: $recall")
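The weighted* values printed above are not the metrics of the positive class alone: each per-class metric is averaged with the class's share of the true labels as its weight. Here is a minimal plain-Scala sketch of that averaging, with no Spark involved. The object name WeightedMetricsDemo and the confusion counts are made up for illustration. One consequence worth knowing: under this weighting the weighted recall (and so the weighted TPR) works out to the same number as the accuracy.

```scala
// Hypothetical illustration of label-frequency-weighted precision and recall.
object WeightedMetricsDemo {
  // Made-up binary confusion counts, with class 1 as the positive class.
  val tp = 40.0  // true label 1, predicted 1
  val fn = 10.0  // true label 1, predicted 0
  val fp = 20.0  // true label 0, predicted 1
  val tn = 130.0 // true label 0, predicted 0
  val total = tp + fn + fp + tn

  // Per-class precision and recall, treating each class in turn as "positive".
  val precision1 = tp / (tp + fp)
  val recall1 = tp / (tp + fn)
  val precision0 = tn / (tn + fn)
  val recall0 = tn / (tn + fp)

  // Each class is weighted by its share of the true labels.
  val weight1 = (tp + fn) / total
  val weight0 = (tn + fp) / total

  val weightedPrecision = weight1 * precision1 + weight0 * precision0
  val weightedRecall = weight1 * recall1 + weight0 * recall0
  val accuracy = (tp + tn) / total

  def main(args: Array[String]): Unit =
    println(s"weightedPrecision=$weightedPrecision weightedRecall=$weightedRecall accuracy=$accuracy")
}
```

If you need the precision or recall of the positive class only, the by-threshold curves on the summary are the place to look rather than the weighted values.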
// Find the threshold that maximizes the F-measure and set it on the best model
// (note: this threshold is chosen on the training data)
import org.apache.spark.sql.functions.{col, max}
val fMeasureDF = summaryLR.fMeasureByThreshold
val maxFMeasure = fMeasureDF.select(max("F-Measure")).head().getDouble(0)
val bestThreshold = fMeasureDF.where(col("F-Measure") === maxFMeasure)
  .select("threshold").head().getDouble(0)
TrainedLR.setThreshold(bestThreshold)
println(s"segmentModel best threshold is: $bestThreshold, maxFMeasure is $maxFMeasure")
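To make the threshold selection above concrete, here is a minimal plain-Scala sketch (no Spark) of what picking the F-measure-maximizing threshold amounts to. The names ThresholdDemo, ThresholdPoint, and f1, as well as the sample precision and recall values, are hypothetical and only for illustration; in the real code the curve comes from fMeasureByThreshold.

```scala
// Hypothetical illustration: choose the threshold with the highest F1 score.
object ThresholdDemo {
  final case class ThresholdPoint(threshold: Double, precision: Double, recall: Double)

  // F1 is the harmonic mean of precision and recall (defined as 0 when both are 0).
  def f1(precision: Double, recall: Double): Double =
    if (precision + recall == 0.0) 0.0
    else 2.0 * precision * recall / (precision + recall)

  // Made-up sample points; a lower threshold trades precision for recall.
  val points: Seq[ThresholdPoint] = Seq(
    ThresholdPoint(0.8, 0.9, 0.3),
    ThresholdPoint(0.5, 0.75, 0.6),
    ThresholdPoint(0.2, 0.5, 0.9)
  )

  // Pick the point whose F1 is largest, mirroring the max + where lookup.
  val best: ThresholdPoint = points.maxBy(p => f1(p.precision, p.recall))
  val bestF1: Double = f1(best.precision, best.recall)

  def main(args: Array[String]): Unit =
    println(s"best threshold: ${best.threshold}, F1: $bestF1")
}
```

With these sample values the middle threshold wins: it gives up some precision compared with 0.8, but gains enough recall to lift the harmonic mean.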
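3. Get evaluation metrics after the model predicts on a test set
// evaluate: score the test set with the best model
val tvsPredict = tvsFitted.transform(testData)
tvsPredict.cache()
tvsPredict.show()
// evaluator is assumed to be the BinaryClassificationEvaluator defined earlier;
// its default metric is areaUnderROC
val auc = evaluator.evaluate(tvsPredict)
println(s"Model evaluate testData areaUnderROC i.e. AUC: $auc")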
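For intuition about what the AUC number means, here is a small plain-Scala sketch that integrates a ROC curve with the trapezoidal rule. The object name AucDemo, the helper areaUnderCurve, and the ROC points are hypothetical and made up for illustration. Spark's own computation is understood to follow the same trapezoidal idea, but treat this as a sketch, not as the library's implementation.

```scala
// Hypothetical illustration: area under a ROC curve via the trapezoidal rule.
object AucDemo {
  // Made-up ROC points as (FPR, TPR), sorted by increasing FPR.
  val roc: Seq[(Double, Double)] =
    Seq((0.0, 0.0), (0.1, 0.6), (0.4, 0.9), (1.0, 1.0))

  // Sum the trapezoid areas between each pair of consecutive points.
  def areaUnderCurve(points: Seq[(Double, Double)]): Double =
    points.sliding(2).collect {
      case Seq((x1, y1), (x2, y2)) => (x2 - x1) * (y1 + y2) / 2.0
    }.sum

  val auc: Double = areaUnderCurve(roc)

  def main(args: Array[String]): Unit =
    println(s"AUC: $auc")
}
```

A diagonal ROC curve from (0, 0) to (1, 1) gives 0.5, which is the score of random guessing, so the closer the test-set AUC gets to 1.0 the better the model separates the two classes.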