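In classification, you usually want to know not only whether a sample is predicted as 0 or 1, but also the probability with which it is predicted as 0 or 1.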
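In LR (logistic regression):
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
val model = new LogisticRegressionWithLBFGS().setNumClasses(2).run(trainingData)
model.clearThreshold()
// The default threshold is 0.5. Calling model.clearThreshold() removes it, so predict returns the class probability instead of a 0/1 label.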
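To illustrate why clearing the threshold changes what predict returns, here is a minimal Spark-free sketch of the thresholding logic. `ToyLogisticModel` and its fields are hypothetical stand-ins, not MLlib classes; the sketch assumes the model returns the raw sigmoid score when no threshold is set and a 0/1 label otherwise.

```scala
// Hypothetical stand-in for a binary logistic model; not an MLlib class.
final class ToyLogisticModel(weights: Array[Double], intercept: Double) {
  // Some(t): predict returns a 0/1 label. None: predict returns the score.
  private var threshold: Option[Double] = Some(0.5)

  // Drop the threshold so that predict exposes the probability.
  def clearThreshold(): this.type = { threshold = None; this }

  def predict(features: Array[Double]): Double = {
    require(features.length == weights.length, "dimension mismatch")
    // Linear margin w . x + b, then the logistic function.
    val margin = weights.zip(features).map { case (w, x) => w * x }.sum + intercept
    val score = 1.0 / (1.0 + math.exp(-margin))
    threshold match {
      case Some(t) => if (score > t) 1.0 else 0.0
      case None    => score
    }
  }
}
```

With the default threshold the toy model returns 1.0 or 0.0; after `clearThreshold()` the same call returns a value strictly between 0 and 1.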
In GBDT:
The original predict function can only output 0 or 1. To get a probability, modify the function in the source code as follows:
def predict(features: Vector): Double = {
  (algo, combiningStrategy) match {
    case (Regression, Sum) =>
      predictBySumming(features)
    case (Regression, Average) =>
      predictBySumming(features) / sumWeights
    case (Classification, Sum) => // binary classification
      val prediction = predictBySumming(features)
      // TODO: predicted labels are +1 or -1 for GBT. Need a better way to store this info.
      // Original: if (prediction > 0.0) 1.0 else 0.0
      // Modified: pass the summed margin through the sigmoid function.
      1.0 / (1.0 + math.exp(-prediction))
    case (Classification, Vote) =>
      predictByVoting(features)
    case _ =>
      throw new IllegalArgumentException(
        "TreeEnsembleModel1 given unsupported (algo, combiningStrategy) combination: " +
          s"($algo, $combiningStrategy).")
  }
}
Then simply import the redefined class.
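The core of the GBDT change is just "weighted sum of tree outputs, then sigmoid". The following is a minimal sketch of that computation in plain Scala, outside Spark. `GbtProbability`, its method names, and the `scale` parameter are hypothetical and introduced only for illustration; in practice the per-tree predictions and weights would come from the trained ensemble. The default `scale = 1.0` matches the sigmoid used in this post; if the training loss is defined on 2·y·F(x), a scale of 2.0 may give better-calibrated values, which is an assumption to verify against your Spark version.

```scala
// Hypothetical helper, not part of Spark MLlib.
object GbtProbability {
  // Logistic function mapping a real-valued margin to (0, 1).
  def sigmoid(margin: Double): Double = 1.0 / (1.0 + math.exp(-margin))

  // Weighted sum of per-tree predictions; plays the role of predictBySumming.
  def margin(treePredictions: Seq[Double], treeWeights: Seq[Double]): Double = {
    require(treePredictions.length == treeWeights.length,
      "need exactly one weight per tree")
    treePredictions.zip(treeWeights).map { case (p, w) => p * w }.sum
  }

  // Probability of class 1. scale is an assumption: 1.0 mirrors this post.
  def probability(
      treePredictions: Seq[Double],
      treeWeights: Seq[Double],
      scale: Double = 1.0): Double =
    sigmoid(scale * margin(treePredictions, treeWeights))
}
```

A margin of 0 maps to a probability of 0.5, so thresholding the result at 0.5 reproduces the original `prediction > 0.0` rule.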