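ISLR; R; machine learning; linear regression
I only know some of the technical terms in English, so my wording may be non-standard in places; please go easy on me.
5. Analysis of the Default data set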
> library(ISLR)
> summary(Default)
 default    student       balance           income     
 No :9667   No :7056   Min.   :   0.0   Min.   :  772  
 Yes: 333   Yes:2944   1st Qu.: 481.7   1st Qu.:21340  
                       Median : 823.6   Median :34553  
                       Mean   : 835.4   Mean   :33517  
                       3rd Qu.:1166.3   3rd Qu.:43808  
                       Max.   :2654.3   Max.   :73554  
> attach(Default)
a)
> set.seed(1)
> glm.fit=glm(default~income+balance,data=Default,family=binomial)
b)
> FiveB=function(){
+ #i.
+ train=sample(dim(Default)[1],dim(Default)[1]/2)
+ #ii.
+ glm.fit = glm(default ~ income + balance, data=Default, family = binomial,subset=train)
+ #iii.
+ glm.pred = rep("No",dim(Default)[1]/2)
+ glm.probs=predict(glm.fit,Default[-train, ],type="response")
+ glm.pred[glm.probs > 0.5]="Yes"
+ #iv.
+ return(mean(glm.pred != Default[-train, ]$default))
+ }
> FiveB()
[1] 0.0236
The validation-set error rate is 2.36%.
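To put that error rate in context, it helps to compare it with a trivial classifier that always predicts "No". The sketch below is not part of the exercise; it only reuses the class counts printed by summary(Default) above, and the variable names are my own.

```r
# Class counts copied from the summary(Default) output above
n_no  <- 9667
n_yes <- 333

# A classifier that always predicts "No" is wrong exactly on the "Yes" cases
baseline_error <- n_yes / (n_no + n_yes)
print(baseline_error)  # 0.0333
```

So the logistic model only needs to beat roughly 3.33% to be better than always guessing "No".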
c)
> FiveB()
[1] 0.028
> FiveB()
[1] 0.0268
> FiveB()
[1] 0.0252
The error rate fluctuates around 2.6%, because each call draws a different random training/validation split.
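To see the spread more clearly than with three manual calls, the split can be repeated many times and summarized. This is only a rough sketch: validation_error is a hypothetical helper name, the number of repetitions (20) is arbitrary, and the last part assumes the ISLR package is installed. The training rows are selected with data[train, ] rather than subset=, since subset= can fail to find a local variable when the formula is passed in from outside the function.

```r
# Hypothetical helper: one validation-set error estimate for a given formula.
# Assumes the response column is a factor with levels "No" and "Yes".
validation_error <- function(formula, data, response = "default",
                             threshold = 0.5) {
  n <- nrow(data)
  train <- sample(n, n %/% 2)                 # random half for training
  fit <- glm(formula, data = data[train, ], family = binomial)
  test <- data[-train, ]                      # the other half for validation
  probs <- predict(fit, newdata = test, type = "response")
  pred <- ifelse(probs > threshold, "Yes", "No")
  mean(pred != test[[response]])              # misclassification rate
}

# Usage on the Default data, only if ISLR is available
if (requireNamespace("ISLR", quietly = TRUE)) {
  set.seed(1)
  errs <- replicate(20, validation_error(default ~ income + balance,
                                         ISLR::Default))
  print(summary(errs))  # centre of the estimates
  print(sd(errs))       # split-to-split variability
}
```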
d)
> train=sample(dim(Default)[1],dim(Default)[1]/2)
> glm.fit = glm(default ~ income + balance + student, data=Default, family = binomial, subset = train)
> glm.pred = rep("No",dim(Default)[1]/2)
> glm.probs = predict(glm.fit, Default[-train,],type="response")
> glm.pred[glm.probs > 0.5] = "Yes"
> mean(glm.pred != Default[-train,]$default)
[1] 0.0246
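The error rate is 2.46%, which falls within the range seen in the earlier splits (2.36% to 2.8%), so adding the student variable does not appear to reduce the test error.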
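One caveat: the model with student was evaluated on a different random split than the earlier runs, so part of the difference is just split noise. A fairer check, sketched below under my own assumptions, fits both models on the same split and repeats this a few times. compare_on_same_split is a hypothetical helper name, 10 repetitions is arbitrary, and the usage part assumes the ISLR package is installed.

```r
# Hypothetical helper: fit both models on one shared split and return
# their validation-set error rates side by side.
compare_on_same_split <- function(data) {
  n <- nrow(data)
  train <- sample(n, n %/% 2)
  test <- data[-train, ]
  err <- function(formula) {
    fit <- glm(formula, data = data[train, ], family = binomial)
    probs <- predict(fit, newdata = test, type = "response")
    mean(ifelse(probs > 0.5, "Yes", "No") != test$default)
  }
  c(without_student = err(default ~ income + balance),
    with_student    = err(default ~ income + balance + student))
}

# Usage on the Default data, only if ISLR is available
if (requireNamespace("ISLR", quietly = TRUE)) {
  set.seed(1)
  res <- replicate(10, compare_on_same_split(ISLR::Default))
  print(rowMeans(res))  # average error of each model over the same 10 splits
}
```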
6. The Default data set
> library(ISLR)
> summary(Default)
 default    student       balance           income     
 No :9667   No :7056   Min.   :   0.0   Min.   :  772  
 Yes: 333   Yes:2944   1st Qu.: 481.7   1st Qu.:21340  
                       Median : 823.6   Median :34553  
                       Mean   : 835.4   Mean   :33517  
                       3rd Qu.:1166.3   3rd Qu.:43808  
                       Max.   :2654.3   Max.   :73554  
> attach(Default)
a)
> set.seed(1)
> glm.fit = glm(default ~ income + balance, data = Default, family = binomial)
> summary(glm.fit)
Call:
glm(formula = default ~ income + balance, family = binomial,
data = Default)
Deviance Residuals: