Evaluating a binary classification model with Scala

1. Suppose we have a trained binary classification model, tvsFitted. Let's see which metrics we can pull from it to judge how good the model is.

    //fit
    val tvsFitted = tvs.fit(trainData)

2. Once training has finished, use the summary to get the evaluation metrics

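    // best model summary: evaluation metrics of the best model, computed on the training data
    import org.apache.spark.ml.PipelineModel
    import org.apache.spark.ml.classification.LogisticRegressionModel
    import org.apache.spark.sql.functions.{col, max}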

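    // get the best model; lrStage is the index of the logistic regression stage in the pipeline
    val trainedPipeline = tvsFitted.bestModel.asInstanceOf[PipelineModel]
    val TrainedLR = trainedPipeline.stages(lrStage).asInstanceOf[LogisticRegressionModel]

    // get the binary classification summary of the best model
    val summaryLR = TrainedLR.binarySummary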

    // inspect how the objective (loss) changed from iteration to iteration
    println("bestModel objective history is:" + summaryLR.objectiveHistory.mkString(","))
    println("bestModel objective history length:" + summaryLR.objectiveHistory.length)

    // inspect the regression coefficients and the intercept
    println("bestModel coefficients is:"+ TrainedLR.coefficients)
    println("bestModel intercept is:"+ TrainedLR.intercept)

    // inspect the hyperparameters
    println("bestModel regParam:"+TrainedLR.getRegParam)
    println("bestModel Threshold:"+TrainedLR.getThreshold)

    // get the ROC (receiver operating characteristic) curve data
    val roc = summaryLR.roc
    println("receiver-operating characteristic DataFrame is :")
    roc.show()
    
    // get the AUC
    val auc1 = summaryLR.areaUnderROC
    println(s"bestModel summary areaUnderROC i.e AUC: $auc1")

    // get the accuracy, plus the label-weighted FPR, TPR, F-measure, precision and recall
    val accuracy = summaryLR.accuracy
    val falsePositiveRate = summaryLR.weightedFalsePositiveRate
    val truePositiveRate = summaryLR.weightedTruePositiveRate
    val fMeasure = summaryLR.weightedFMeasure
    val precision = summaryLR.weightedPrecision
    val recall = summaryLR.weightedRecall
    println(s"bestModel Accuracy: $accuracy\nFPR: $falsePositiveRate\nTPR: $truePositiveRate\n" +
      s"F-measure: $fMeasure\nPrecision: $precision\nRecall: $recall")
    
    // find the threshold that maximizes the F-measure and set it on the best model
    val fMeasureDF = summaryLR.fMeasureByThreshold
    val maxFMeasure = fMeasureDF.select(max("F-Measure")).head().getDouble(0)
    val bestThreshold = fMeasureDF.where(col("F-Measure") === maxFMeasure)
      .select("threshold").head().getDouble(0)
    TrainedLR.setThreshold(bestThreshold)
    println(s"bestModel best threshold is: $bestThreshold, maxFMeasure is $maxFMeasure")
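
To make the threshold selection above less of a black box, here is a minimal plain-Scala sketch of the same idea: try each candidate threshold, compute the F1 score, and keep the best one. It does not use Spark. The object `ThresholdSketch` and its methods are hypothetical names for illustration, and the rule "predict positive when the score is >= the threshold" is an assumption of this sketch, not a statement about Spark's internals.

```scala
// Hypothetical helper for illustration only; not part of Spark.
object ThresholdSketch {
  // (predicted probability of the positive class, true label 0.0 or 1.0)
  type Scored = (Double, Double)

  /** F1 when every score >= threshold is predicted positive (assumed convention). */
  def f1At(data: Seq[Scored], threshold: Double): Double = {
    val tp = data.count { case (p, y) => p >= threshold && y == 1.0 }
    val fp = data.count { case (p, y) => p >= threshold && y == 0.0 }
    val fn = data.count { case (p, y) => p < threshold && y == 1.0 }
    val precision = if (tp + fp == 0) 0.0 else tp.toDouble / (tp + fp)
    val recall = if (tp + fn == 0) 0.0 else tp.toDouble / (tp + fn)
    if (precision + recall == 0.0) 0.0
    else 2 * precision * recall / (precision + recall)
  }

  /** Use each distinct score as a candidate threshold; return (threshold, F1) with the highest F1. */
  def bestThreshold(data: Seq[Scored]): (Double, Double) =
    data.map(_._1).distinct.map(t => (t, f1At(data, t))).maxBy(_._2)
}
```

Calling `ThresholdSketch.bestThreshold` on a small list of (score, label) pairs returns the threshold and its F1, which plays the same role as `bestThreshold` and `maxFMeasure` in the Spark code.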

3. After the model has scored a test set, get the evaluation metrics

    //evaluate
    val tvsPredict = tvsFitted.transform(testData)
    tvsPredict.cache()
    tvsPredict.show()

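    // evaluator is assumed to be a BinaryClassificationEvaluator, whose default metric is areaUnderROC
    val auc = evaluator.evaluate(tvsPredict)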
    println(s"Model evaluate testData areaUnderROC i.e AUC: $auc")
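
As a reminder of what the AUC printed above measures, here is a minimal plain-Scala sketch under a common interpretation: the AUC is the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one, with ties counted as one half. The object `AucSketch` is a hypothetical name for illustration. The pairwise loop is quadratic and is only meant for small examples; Spark computes the area from the ROC curve instead, so values on real data may differ slightly, for instance when the curve is binned.

```scala
// Hypothetical helper for illustration only; not part of Spark.
object AucSketch {
  /** AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5. */
  def auc(data: Seq[(Double, Double)]): Double = {
    val pos = data.filter(_._2 == 1.0).map(_._1) // scores of positive examples
    val neg = data.filter(_._2 == 0.0).map(_._1) // scores of negative examples
    require(pos.nonEmpty && neg.nonEmpty, "need at least one positive and one negative example")
    val wins = for (p <- pos; n <- neg) yield {
      if (p > n) 1.0 else if (p == n) 0.5 else 0.0
    }
    wins.sum / (pos.size.toLong * neg.size)
  }
}
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which is why it is a convenient threshold-independent complement to the F-measure used in step 2.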
