How to plot ROC curves for classification algorithms: plotting an ROC curve from classification tree probabilities

I am trying to plot an ROC curve from classification tree probabilities. However, when I plot it, the curve is absent. I want to plot the ROC curve and then get the AUC (the area under the curve). Does anyone know how to fix this? Thanks in advance. The binary column Risk indicates misclassification (1 = correct prediction, 0 = misclassified), and I presume it is my label. Should I be computing the ROC curve at a different point in my code?
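For reference, these are the quantities an ROC curve plots. They are standard definitions, not something taken from the question.

- $s_i$ is the score of case $i$ (for example a predicted class probability).
- $t$ is a threshold.
- $y_i$ is the true class, coded 1 for the positive class and 0 for the negative class. It records the class, not whether a prediction was correct.
- $S_1$ and $S_0$ are the scores of a randomly chosen positive case and a randomly chosen negative case.

```latex
\mathrm{TPR}(t) = \frac{\#\{i : s_i \ge t,\ y_i = 1\}}{\#\{i : y_i = 1\}}, \qquad
\mathrm{FPR}(t) = \frac{\#\{i : s_i \ge t,\ y_i = 0\}}{\#\{i : y_i = 0\}}

\mathrm{AUC} = \int_0^1 \mathrm{TPR}\; d\,\mathrm{FPR}
             = P(S_1 > S_0) + \tfrac{1}{2}\, P(S_1 = S_0)
```

The second form of the AUC counts ties as one half. It equals the area under the empirical ROC curve when consecutive points are joined by straight lines.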

Here is my code, followed by the data frame:

library(ROCR)
data(Risk.table)
pred = prediction(Risk.table$Predicted.prob, Risk.table2$Risk)
perf = performance(pred, measure="tpr", x.measure="fpr")
perf
plot(perf)

   Predicted.prob Actual.prob predicted actual Risk
 1      0.5384615   0.4615385        G8     V4    0
 2      0.1212121   0.8787879        V4     V4    1
 3      0.5384615   0.4615385        G8     G8    1
 4      0.9000000   0.1000000        G8     G8    1
 5      0.1212121   0.8787879        V4     V4    1
 6      0.1212121   0.8787879        V4     V4    1
 7      0.9000000   0.1000000        G8     G8    1
 8      0.5384615   0.4615385        G8     V4    0
 9      0.5384615   0.4615385        G8     V4    0
10      0.1212121   0.8787879        V4     G8    0
11      0.1212121   0.8787879        V4     V4    1
12      0.9000000   0.1000000        G8     V4    0
13      0.9000000   0.1000000        G8     V4    0
14      0.1212121   0.8787879        G8     V4    1
15      0.9000000   0.1000000        G8     G8    1
16      0.5384615   0.4615385        G8     V4    0
17      0.9000000   0.1000000        G8     V4    0
18      0.1212121   0.8787879        V4     V4    1
19      0.5384615   0.4615385        G8     V4    0
20      0.1212121   0.8787879        V4     V4    1
21      0.9000000   0.1000000        G8     G8    1
22      0.5384615   0.4615385        G8     V4    0
23      0.9000000   0.1000000        G8     V4    0
24      0.1212121   0.8787879        V4     V4    1

Here is the ROC curve this code outputs, but the curve itself is missing (plot not preserved).

I tried again, and this ROC curve is just wrong (plot not preserved).

I constructed the data frame above using the code below. The initial data frame containing all the data is called shuffle.cross.validation2.

#Split data 70:30 after shuffling the data frame
index
trainindex.LDA3=sample(index, trunc(length(index)*0.70),replace=FALSE)
LDA.70.trainset3
LDA.30.testset3

#Run classification tree using package rpart()
tree.split3
tree.split3
summary(tree.split3)
print(tree.split3)
plot(tree.split3)
text(tree.split3,use.n=T,digits=0)
printcp(tree.split3)
tree.split3

#Predict the predicted and actual data
res3=predict(tree.split3,newdata=LDA.30.testset3)
res4=as.data.frame(res3)

#Create two columns with NA's (Actual and predicted classification rate)
res4$predicted <- NA
res4$actual <- NA

for (i in 1:length(res4$G8)){
  if(res4$R2[i]>res4$V4[i]) {
    res4$predicted[i]
  } else {
    res4$predicted[i]
  }
  print(i)
}

res4
res4$actual
res4
Risk.table$Risk
Risk.table

#Create the binary predictor column
for (i in 1:length(Risk.table$Risk)){
  if(Risk.table$predicted[i]==res4$actual[i]) {
    Risk.table$Risk[i]
  } else {
    Risk.table$Risk[i]
  }
  print(i)
}

#Creation of the predicted and actual probabilities for the two families V4 and G8 above

#Confusion Matrix
cm=table(res4$actual, res4$predicted)
names(dimnames(cm))=c("actual", "predicted")

#Naive Bayes
index
trainindex.LDA.help1=sample(index, trunc(length(index)*0.70), replace=FALSE)
sig.train=significant.lda.Wilks2[trainindex.LDA.help1,]
sig.test=significant.lda.Wilks2[-trainindex.LDA.help1,]
library(klaR)
nbmodel
prediction
NB
colnames(NB)
NB$actual2 = NA
NB$actual2[NB$Actual=="G8"] = 1
NB$actual2[NB$Actual=="V4"] = 0
NB2
plot(fit.perf, col="red")          #Naive Bayes
plot(perf, col="blue", add=TRUE)   #Classification Tree
abline(0,1,col="green")

#Original Naive Bayes code using the caret package
library(caret)
library(e1071)
train_control
model
predictions
confusionMatrix(predictions,LDA.scores$Family)

Results:

Confusion Matrix and Statistics

          Reference
Prediction V4 G8
        V4 25  2
        G8  5 48

               Accuracy : 0.9125
                 95% CI : (0.828, 0.9641)
    No Information Rate : 0.625
    P-Value [Acc > NIR] : 4.918e-09

                  Kappa : 0.8095

 Mcnemar's Test P-Value : 0.4497

            Sensitivity : 0.8333
            Specificity : 0.9600
         Pos Pred Value : 0.9259
         Neg Pred Value : 0.9057
             Prevalence : 0.3750
         Detection Rate : 0.3125
   Detection Prevalence : 0.3375
      Balanced Accuracy : 0.8967

       'Positive' Class : V4
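Each statistic in the caret output above follows from the four counts in the 2x2 table. Below is a minimal base-R sketch that recomputes them, taking V4 as the positive class as the caret report does.

A table of hard class predictions like this gives only a single point on an ROC curve, (1 - specificity, sensitivity), rather than a whole curve. A curve needs the predicted probabilities.

```r
# Counts from the caret output: rows = Prediction, columns = Reference.
# matrix() fills column by column.
cm <- matrix(c(25, 5, 2, 48), nrow = 2,
             dimnames = list(Prediction = c("V4", "G8"),
                             Reference  = c("V4", "G8")))

# V4 is the 'positive' class, as in the caret report
tp <- cm["V4", "V4"]  # predicted V4, truly V4
fn <- cm["G8", "V4"]  # predicted G8, truly V4
fp <- cm["V4", "G8"]  # predicted V4, truly G8
tn <- cm["G8", "G8"]  # predicted G8, truly G8

sensitivity <- tp / (tp + fn)
specificity <- tn / (tn + fp)
stats <- c(accuracy          = (tp + tn) / sum(cm),
           sensitivity       = sensitivity,
           specificity       = specificity,
           pos_pred_value    = tp / (tp + fp),
           neg_pred_value    = tn / (tn + fn),
           balanced_accuracy = (sensitivity + specificity) / 2)
print(round(stats, 4))

# The single ROC point that these hard predictions correspond to
roc_point <- c(fpr = 1 - specificity, tpr = sensitivity)
print(round(roc_point, 4))
```

The rounded values match the report: 0.9125, 0.8333, 0.96, 0.9259, 0.9057 and 0.8967. The ROC point is (0.04, 0.8333).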

Solution

There are several things to point out:

1) I think the formula inside your rpart() call should be Family ~ .

2) In your initial table I can see the value W3 in your predicted column. Does that mean your dependent variable is not binary? An ROC curve like this one requires a binary outcome, so check that.

3) Your Predicted.prob and Actual.prob columns always sum to 1. Is that reasonable? They look more like the predicted probabilities of the two classes (G8 and V4) than like predicted versus actual probabilities. Consider renaming them so they don't confuse you in the future.

4) I think you're confused about how ROC works and what inputs it needs. Your Risk column uses 1 for a correct prediction and 0 for a wrong prediction. However, an ROC curve needs 1 to represent one class and 0 to represent the other class. Put simply, the call is prediction(predictions, labels). Here predictions are your predicted probabilities and labels are the true classes (levels) of your dependent variable.

Check the following code:

dt = read.table(text="
Id Predicted.prob Actual.prob predicted actual Risk
1 0.5384615 0.4615385 G8 V4 0
2 0.1212121 0.8787879 V4 V4 1
3 0.5384615 0.4615385 G8 G8 1
4 0.9000000 0.1000000 G8 G8 1
5 0.1212121 0.8787879 V4 V4 1
6 0.1212121 0.8787879 V4 V4 1
7 0.9000000 0.1000000 G8 G8 1
8 0.5384615 0.4615385 G8 V4 0
9 0.5384615 0.4615385 G8 V4 0
10 0.1212121 0.8787879 V4 G8 0
11 0.1212121 0.8787879 V4 V4 1
12 0.9000000 0.1000000 G8 V4 0
13 0.9000000 0.1000000 G8 V4 0
14 0.1212121 0.8787879 W3 V4 1
15 0.9000000 0.1000000 G8 G8 1
16 0.5384615 0.4615385 G8 V4 0
17 0.9000000 0.1000000 G8 V4 0
18 0.1212121 0.8787879 V4 V4 1
19 0.5384615 0.4615385 G8 V4 0
20 0.1212121 0.8787879 V4 V4 1
21 0.9000000 0.1000000 G8 G8 1
22 0.5384615 0.4615385 G8 V4 0
23 0.9000000 0.1000000 G8 V4 0
24 0.1212121 0.8787879 V4 V4 1", header=TRUE)

library(ROCR)
roc_pred <- prediction(dt$Predicted.prob, dt$actual)
perf <- performance(roc_pred, "tpr", "fpr")
plot(perf, col="red")
abline(0,1,col="grey")

The resulting ROC curve (plot not preserved) lies below the grey diagonal.

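With these labels, ROCR infers the class order from R's < relation on the label values. Since "G8" < "V4", V4 is treated as the positive class, while Predicted.prob appears to be the probability of G8, so the curve comes out inverted. Create a new column actual2 with 1 for G8 and 0 for V4 so that G8 becomes the positive class. Passing label.ordering = c("V4", "G8") to prediction() should have the same effect.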

dt$actual2 = NA
dt$actual2[dt$actual=="G8"] = 1
dt$actual2[dt$actual=="V4"] = 0
roc_pred <- prediction(dt$Predicted.prob, dt$actual2)
perf <- performance(roc_pred, "tpr", "fpr")
plot(perf, col="red")
abline(0,1,col="grey")
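The question also asks for the AUC. Below is a minimal base-R sketch that shows what ROCR is computing. It builds the ROC points by hand from Predicted.prob, copied from the table, and integrates them with the trapezoidal rule. The helpers roc_points() and auc_trapezoid() are illustrative names, not part of ROCR.

- With G8 coded as 1, the curve lies above the diagonal and the AUC is 79/108 ≈ 0.7315.
- With V4 as the positive class, as in the first version above, the AUC is 29/108 ≈ 0.2685. That is 1 minus the first value, a curve below the diagonal.

```r
# Predicted.prob and actual2 (1 = G8, 0 = V4) copied from the 24-row table
score <- c(0.5384615, 0.1212121, 0.5384615, 0.9000000, 0.1212121, 0.1212121,
           0.9000000, 0.5384615, 0.5384615, 0.1212121, 0.1212121, 0.9000000,
           0.9000000, 0.1212121, 0.9000000, 0.5384615, 0.9000000, 0.1212121,
           0.5384615, 0.1212121, 0.9000000, 0.5384615, 0.9000000, 0.1212121)
actual2 <- c(0, 0, 1, 1, 0, 0,  1, 0, 0, 1, 0, 0,
             0, 0, 1, 0, 0, 0,  0, 0, 1, 0, 0, 0)

# ROC points: sweep the threshold t from high to low and call a case
# positive when score >= t. Labels must be 0/1 class labels.
roc_points <- function(score, label) {
  stopifnot(length(score) == length(label), all(label %in% c(0, 1)))
  thresholds <- sort(unique(score), decreasing = TRUE)
  n_pos <- sum(label == 1)
  n_neg <- sum(label == 0)
  tpr <- sapply(thresholds, function(t) sum(score >= t & label == 1) / n_pos)
  fpr <- sapply(thresholds, function(t) sum(score >= t & label == 0) / n_neg)
  # Prepend (0, 0): above the highest score nothing is called positive
  data.frame(threshold = c(Inf, thresholds), fpr = c(0, fpr), tpr = c(0, tpr))
}

# Area under the piecewise-linear ROC curve (trapezoidal rule)
auc_trapezoid <- function(roc) {
  sum(diff(roc$fpr) * (head(roc$tpr, -1) + tail(roc$tpr, -1)) / 2)
}

roc_g8 <- roc_points(score, actual2)      # G8 as the positive class
roc_v4 <- roc_points(score, 1 - actual2)  # V4 as the positive class
auc_g8 <- auc_trapezoid(roc_g8)
auc_v4 <- auc_trapezoid(roc_v4)
print(roc_g8)
print(round(c(auc_g8 = auc_g8, auc_v4 = auc_v4), 4))

# To draw it: plot(roc_g8$fpr, roc_g8$tpr, type = "l", col = "red"); abline(0, 1, col = "grey")
```

In ROCR itself, performance(roc_pred, measure = "auc")@y.values[[1]] should report the same AUC for the recoded labels.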

5) As @eipi10 mentioned in the comments, you should try to replace the for loops in your code with vectorized operations.
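A minimal sketch of point 5: both loops in the question can be replaced by single vectorized expressions. The small res4 data frame and the truth vector are hypothetical stand-ins built from the first three rows of the question's table.

The sketch assumes the first loop was meant to compare the G8 and V4 probability columns. The question's code compares res4$R2 with res4$V4.

```r
# Hypothetical stand-in for res4: class probabilities from the tree
res4 <- data.frame(G8 = c(0.5384615, 0.1212121, 0.5384615),
                   V4 = c(0.4615385, 0.8787879, 0.4615385))
truth <- c("V4", "V4", "G8")  # hypothetical true classes for these rows

# Loop 1 replacement: the predicted class is the column with the larger probability
res4$predicted <- ifelse(res4$G8 > res4$V4, "G8", "V4")
res4$actual <- truth

# Loop 2 replacement: 1 for a correct prediction, 0 for a misclassification
res4$Risk <- as.integer(res4$predicted == res4$actual)

# For ROC analysis, use the G8 probability as the score and the true class
# (not Risk) as the label
roc_score <- res4$G8
roc_label <- as.integer(res4$actual == "G8")
print(res4)
```

This reproduces rows 1 to 3 of the table (predicted G8, V4, G8 and Risk 0, 1, 1) without any indexing by i.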
