Install (once) and load the package that contains the dataset:
install.packages("ISLR")
library('ISLR')
View the dataset:
head(Default)
Split the data into a training set (70%) and a test set (30%):
set.seed(123)
data1 = Default
head(data1)
# Sample 70% of the row indices without replacement, so no row is drawn twice
nn = sample(1:nrow(data1), ceiling(nrow(data1) * 0.7), replace = FALSE)
train = data1[nn,]
test = data1[-nn,]
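A quick sanity check on a split can be sketched as follows. This is a minimal illustration on a synthetic data frame, assuming indices are sampled without replacement; the names toy_df, toy_train and toy_test are hypothetical and not part of the walkthrough above.

```r
# Sketch: verify that a 70/30 split is disjoint and covers every row.
# toy_df is a made-up data frame used only for illustration.
set.seed(123)
toy_df <- data.frame(id = 1:1000, x = rnorm(1000))

n_train <- ceiling(nrow(toy_df) * 0.7)
idx <- sample(seq_len(nrow(toy_df)), n_train, replace = FALSE)

toy_train <- toy_df[idx, ]
toy_test <- toy_df[-idx, ]

# Number of ids that appear in both sets (should be zero)
overlap <- length(intersect(toy_train$id, toy_test$id))
c(train = nrow(toy_train), test = nrow(toy_test), overlap = overlap)
```

If the indices were sampled with replacement instead, the training set would contain repeated rows and the test set would end up larger than 30% of the data.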
Fit a logistic regression model on the training set:
default_logit = glm(default ~ ., family = binomial(link = 'logit'), data = train,
                    maxit = 1000)
Inspect the fitted model:
summary(default_logit)
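The coefficients reported by summary() are on the log-odds scale. One common way to read them, sketched here on simulated data rather than on the model above, is to exponentiate them into odds ratios. The data frame toy and the true coefficients (-1 and 2) are assumptions made purely for illustration.

```r
# Sketch: turn logistic regression coefficients into odds ratios.
# The data are simulated; the true slope on the log-odds scale is 2.
set.seed(1)
toy <- data.frame(x = rnorm(500))
toy$y <- rbinom(500, 1, plogis(-1 + 2 * toy$x))

fit <- glm(y ~ x, family = binomial(link = "logit"), data = toy)

# exp(coefficient) is the multiplicative change in the odds of y = 1
# for a one-unit increase in the predictor
odds_ratios <- exp(coef(fit))
odds_ratios
```

An odds ratio above 1 means the predictor raises the odds of the positive class; below 1 means it lowers them.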
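Predict on the test set, labelling an observation as a default when its predicted probability exceeds 0.29: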
p = predict(default_logit, newdata = test, type = 'response') > 0.29
Inspect p:
summary(p)
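The cutoff applied to the predicted probabilities controls how many observations are flagged as positive. A minimal sketch with made-up probabilities (toy_prob and flag_at are hypothetical names) shows the effect of moving it:

```r
# Sketch: how the classification threshold changes the number of positives.
# toy_prob is a hand-written vector of predicted probabilities.
toy_prob <- c(0.05, 0.20, 0.31, 0.48, 0.75)

# TRUE where the probability exceeds the threshold
flag_at <- function(prob, threshold) prob > threshold

n_flagged_low <- sum(flag_at(toy_prob, 0.29))   # 3 observations flagged
n_flagged_high <- sum(flag_at(toy_prob, 0.50))  # 1 observation flagged
c(threshold_0.29 = n_flagged_low, threshold_0.50 = n_flagged_high)
```

A lower threshold flags more observations, which tends to raise recall at the cost of precision.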
Convert the logical predictions to a factor:
p_factor = factor(ifelse(p, 'Yes', 'No'), levels = c('No', 'Yes'))
Inspect the factor:
summary(p_factor)
Build the contingency table (rows are actual classes, columns are predicted classes):
table(test$default, p_factor)
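Convert the contingency table to a matrix (named conf_mat to avoid masking the base function matrix):
conf_mat = as.matrix(table(test$default, p_factor))
View the matrix:
conf_mat
Compute some metrics by hand:
TN = conf_mat[1, 1]
TP = conf_mat[2, 2]
FP = conf_mat[1, 2]
FN = conf_mat[2, 1]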
accuracy = (TN + TP) / (TN + TP + FP + FN)
recall = TP / (TP + FN)
spec = TN / (TN + FP)
prec = TP / (TP + FP)
F_value = (2 * prec * recall) / (prec + recall)
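View the metric values:
c(accuracy = accuracy, recall = recall, specificity = spec, precision = prec, F1 = F_value)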
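The same formulas can be wrapped in a small helper and checked on counts that are easy to verify by hand. This is only a sketch: classification_metrics and toy_cm are hypothetical names, and the counts are invented, with rows as actual classes and columns as predicted classes.

```r
# Sketch: compute the metrics from a 2x2 confusion matrix.
# Assumed layout: rows = actual (No, Yes), columns = predicted (No, Yes).
classification_metrics <- function(cm) {
  tn <- cm[1, 1]
  fp <- cm[1, 2]
  fn <- cm[2, 1]
  tp <- cm[2, 2]
  precision <- tp / (tp + fp)
  recall <- tp / (tp + fn)
  c(accuracy = (tn + tp) / sum(cm),
    recall = recall,
    specificity = tn / (tn + fp),
    precision = precision,
    f1 = 2 * precision * recall / (precision + recall))
}

# Invented counts, filled column by column: TN = 90, FN = 5, FP = 10, TP = 15
toy_cm <- matrix(c(90, 5, 10, 15), nrow = 2,
                 dimnames = list(actual = c("No", "Yes"),
                                 predicted = c("No", "Yes")))
toy_metrics <- classification_metrics(toy_cm)
toy_metrics
```

For these counts the accuracy is 0.875, recall 0.75, specificity 0.9 and precision 0.6.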
Plot the ROC curve and compute the AUC:
install.packages('ROCR')
library('ROCR')
pred = prediction(predict(default_logit, newdata = test, type = 'response'), test$default)
performance(pred, 'auc')@y.values
perf = performance(pred, 'tpr', 'fpr')
plot(perf, main='ROC curve for logistic regression', lwd=2)
abline(a=0, b=1, lty=2)
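As a cross-check on a reported AUC, the value can also be computed in base R from the ranks of the scores (the Mann-Whitney formulation). The sketch below uses a tiny hand-made example; auc_rank, toy_score and toy_label are hypothetical names, and labels are assumed to be coded 0/1.

```r
# Sketch: AUC from the rank-sum (Mann-Whitney) statistic, base R only.
# Assumes label is 0/1, with 1 as the positive class.
auc_rank <- function(score, label) {
  r <- rank(score)            # ties receive average ranks
  n_pos <- sum(label == 1)
  n_neg <- sum(label == 0)
  (sum(r[label == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

toy_score <- c(0.1, 0.4, 0.35, 0.8)
toy_label <- c(0, 0, 1, 1)
toy_auc <- auc_rank(toy_score, toy_label)
toy_auc  # 0.75
```

Of the four positive-negative pairs in this example, three are ranked correctly, which gives 0.75.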