How to plot a ROC curve for a classification algorithm: plotting a ROC curve from classification tree probabilities

Latest recommended article published on 2023-11-11 15:03:56

绿豆貉


Article link: https://blog.csdn.net/weixin_30660429/article/details/111972426


I am trying to plot a ROC curve from the class probabilities of a classification tree. However, when I plot it, the curve does not appear. I want to plot the ROC curve and then compute the AUC as the area under it. Does anyone know how to fix this? Thank you if you can. The binary column Risk stands for risk misclassification, which I presume is my label. Should I be applying the ROC curve equation at a different point in my code?

Here is the data frame:

library(ROCR)

data(Risk.table)

pred = prediction(Risk.table$Predicted.prob, Risk.table2$Risk)

perf = performance(pred, measure="tpr", x.measure="fpr")

perf

plot(perf)

   Predicted.prob Actual.prob predicted actual Risk
1       0.5384615   0.4615385        G8     V4    0
2       0.1212121   0.8787879        V4     V4    1
3       0.5384615   0.4615385        G8     G8    1
4       0.9000000   0.1000000        G8     G8    1
5       0.1212121   0.8787879        V4     V4    1
6       0.1212121   0.8787879        V4     V4    1
7       0.9000000   0.1000000        G8     G8    1
8       0.5384615   0.4615385        G8     V4    0
9       0.5384615   0.4615385        G8     V4    0
10      0.1212121   0.8787879        V4     G8    0
11      0.1212121   0.8787879        V4     V4    1
12      0.9000000   0.1000000        G8     V4    0
13      0.9000000   0.1000000        G8     V4    0
14      0.1212121   0.8787879        G8     V4    1
15      0.9000000   0.1000000        G8     G8    1
16      0.5384615   0.4615385        G8     V4    0
17      0.9000000   0.1000000        G8     V4    0
18      0.1212121   0.8787879        V4     V4    1
19      0.5384615   0.4615385        G8     V4    0
20      0.1212121   0.8787879        V4     V4    1
21      0.9000000   0.1000000        G8     G8    1
22      0.5384615   0.4615385        G8     V4    0
23      0.9000000   0.1000000        G8     V4    0
24      0.1212121   0.8787879        V4     V4    1

Here is the ROC curve this code outputs, but the curve is missing:

I tried again and this ROC curve is just wrong

I constructed the above data frame using the code below:

The initial data frame containing all the data is called shuffle.cross.validation2

#Split data 70:30 after shuffling the data frame

index

trainindex.LDA3=sample(index, trunc(length(index)*0.70),replace=FALSE)

LDA.70.trainset3

LDA.30.testset3

Run classification tree using package rpart()

tree.split3

summary(tree.split3)

print(tree.split3)

plot(tree.split3)

text(tree.split3,use.n=T,digits=0)

printcp(tree.split3)

tree.split3

Predict the predicted and actual data

res3=predict(tree.split3,newdata=LDA.30.testset3)

res4=as.data.frame(res3)

Create two columns with NA's (Actual and predicted classification rate)

res4$predicted

res4$actual

for (i in 1:length(res4$G8)){

if(res4$R2[i]>res4$V4[i]) {

res4$predicted[i]

}

else {

res4$predicted[i]

}

print(i)

}

res4

res4$actual

res4

Risk.table$Risk

Risk.table

Create the binary predictor column

for (i in 1:length(Risk.table$Risk)){

if(Risk.table$predicted[i]==res4$actual[i]) {

Risk.table$Risk[i]

}

else {

Risk.table$Risk[i]

}

print(i)

}

Creation of the predicted and actual probabilities for the two families V4 and G8 above

#Confusion Matrix

cm=table(res4$actual, res4$predicted)

names(dimnames(cm))=c("actual", "predicted")

Naive Bayes

index

trainindex.LDA.help1=sample(index, trunc(length(index)*0.70), replace=FALSE)

sig.train=significant.lda.Wilks2[trainindex.LDA.help1,]

sig.test=significant.lda.Wilks2[-trainindex.LDA.help1,]

library(klaR)

nbmodel

prediction

colnames(NB)

NB$actual2 = NA

NB$actual2[NB$Actual=="G8"] = 1

NB$actual2[NB$Actual=="V4"] = 0

NB2

plot(fit.perf, col="red"); #Naive Bayes

plot(perf, col="blue", add=T); #Classification Tree

abline(0,1,col="green")

Original Naive Bayes code using the caret package

library(caret)

library(e1071)

train_control

model

predictions

confusionMatrix(predictions,LDA.scores$Family)

Results

Confusion Matrix and Statistics

          Reference
Prediction V4 G8
        V4 25  2
        G8  5 48

               Accuracy : 0.9125
                 95% CI : (0.828, 0.9641)
    No Information Rate : 0.625
    P-Value [Acc > NIR] : 4.918e-09

                  Kappa : 0.8095
 Mcnemar's Test P-Value : 0.4497

            Sensitivity : 0.8333
            Specificity : 0.9600
         Pos Pred Value : 0.9259
         Neg Pred Value : 0.9057
             Prevalence : 0.3750
         Detection Rate : 0.3125
   Detection Prevalence : 0.3375
      Balanced Accuracy : 0.8967

       'Positive' Class : V4
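For reference, the headline statistics in this output can be re-derived from the 2x2 table alone. The following is a small base-R sketch, not part of the original code; the object name cm_nb and the variable names are hypothetical, and V4 is treated as the positive class, as the output states.

```r
# Confusion matrix copied from the caret output above:
# rows = predicted class, columns = reference (true) class.
# matrix() fills column by column, so the first column is Reference = V4.
cm_nb <- matrix(c(25, 5, 2, 48), nrow = 2,
                dimnames = list(Prediction = c("V4", "G8"),
                                Reference  = c("V4", "G8")))

# Cell counts with V4 as the positive class
tp <- cm_nb["V4", "V4"]  # predicted V4, truly V4
fn <- cm_nb["G8", "V4"]  # predicted G8, truly V4
fp <- cm_nb["V4", "G8"]  # predicted V4, truly G8
tn <- cm_nb["G8", "G8"]  # predicted G8, truly G8

sensitivity <- tp / (tp + fn)          # 25/30, the true positive rate
specificity <- tn / (tn + fp)          # 48/50, the true negative rate
accuracy    <- (tp + tn) / sum(cm_nb)  # 73/80
ppv         <- tp / (tp + fp)          # 25/27
npv         <- tn / (tn + fn)          # 48/53
balanced    <- (sensitivity + specificity) / 2

print(round(c(Accuracy = accuracy, Sensitivity = sensitivity,
              Specificity = specificity, PPV = ppv, NPV = npv,
              Balanced = balanced), 4))
```

Rounded to four decimals, these agree with the values caret reports. A single confusion matrix like this corresponds to just one point on a ROC curve (false positive rate 1 - specificity, true positive rate sensitivity); the full curve needs the predicted probabilities.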

Solution

I have a few things to point out:

1) I think the formula inside your rpart call should be Family ~ .

2) In your initial table I can see a value W3 in your predicted column. Does that mean your dependent variable is not binary? ROC curves work with binary outcomes, so check this.

3) The predicted and actual probabilities in your initial table always sum to 1. Is that reasonable? I think they represent something else, so consider renaming the columns in case they confuse you later.

4) I think you are confused about how ROC works and what inputs it needs. Your Risk column uses 1 for a correct prediction and 0 for a wrong one. A ROC curve, however, needs 1 to represent one class and 0 to represent the other. Put simply, the call is prediction(predictions, labels), where predictions are your predicted probabilities and labels are the true classes (levels) of your dependent variable.
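To make point 4 concrete without any package, here is a minimal base-R sketch that builds the ROC points by hand from the 24 rows in the question. It rests on an assumption: that Predicted.prob is the tree's probability for class G8 (high values go with a G8 prediction in the table). The labels come from the actual column, not from Risk. All variable names are hypothetical.

```r
# Predicted.prob column from the question, rows 1..24
# (ASSUMPTION: this is the predicted probability of class G8)
scores <- c(0.5384615, 0.1212121, 0.5384615, 0.9000000, 0.1212121, 0.1212121,
            0.9000000, 0.5384615, 0.5384615, 0.1212121, 0.1212121, 0.9000000,
            0.9000000, 0.1212121, 0.9000000, 0.5384615, 0.9000000, 0.1212121,
            0.5384615, 0.1212121, 0.9000000, 0.5384615, 0.9000000, 0.1212121)

# "actual" column from the question, rows 1..24
actual <- c("V4", "V4", "G8", "G8", "V4", "V4", "G8", "V4", "V4", "G8", "V4", "V4",
            "V4", "V4", "G8", "V4", "V4", "V4", "V4", "V4", "G8", "V4", "V4", "V4")

# Labels encode the true class (1 = G8, 0 = V4),
# not whether the prediction happened to be correct
labels <- as.integer(actual == "G8")

# One ROC point per threshold, from strictest (nothing called G8) to most lenient
thresholds <- c(Inf, sort(unique(scores), decreasing = TRUE))
tpr <- sapply(thresholds, function(t) sum(scores >= t & labels == 1) / sum(labels == 1))
fpr <- sapply(thresholds, function(t) sum(scores >= t & labels == 0) / sum(labels == 0))

# Area under the piecewise-linear curve by the trapezoidal rule
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
print(auc)

plot(fpr, tpr, type = "b",
     xlab = "False positive rate", ylab = "True positive rate")
abline(0, 1, lty = 2)  # chance line
```

Because the tree outputs only three distinct probabilities, the curve has just four points, (0, 0), (4/18, 4/6), (10/18, 5/6) and (1, 1), and under the stated assumption the area works out to 79/108, about 0.73. Passing the same scores with these labels to prediction() should describe the same curve.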

Check the following code:

dt = read.table(text="

Id Predicted.prob Actual.prob predicted actual Risk