Problem 13
Fit classification models to the Boston data set to predict whether a suburb's crime rate is above or below the median.
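library(MASS)  # provides the Boston data set and lda()
# c.crim is TRUE when a suburb's crime rate is above the median
Boston$c.crim <- (Boston$crim > median(Boston$crim))
# Randomly split the data set: about 75% training, 25% test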
set.seed(122)
rands <- rnorm(nrow(Boston))
test <- (rands > quantile(rands,0.75))
train <- !test
Boston.train <- Boston[train,]
Boston.test <- Boston[test,]
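As a quick sanity check on the split, the sketch below repeats the same thresholding on a vector of length 506, which is the number of rows in MASS::Boston. The variable `n` is introduced here only for illustration. Because the cutoff is the 75% quantile of the random draws, the split sizes do not depend on the seed.

```r
set.seed(122)
n <- 506                               # number of rows in MASS::Boston
rands <- rnorm(n)
test <- rands > quantile(rands, 0.75)  # top quarter of the draws -> test set
train <- !test
split.sizes <- c(train = sum(train), test = sum(test))
print(split.sizes)
```

This should give 379 training rows and 127 test rows, which agrees with the 378 null degrees of freedom and the 127 test predictions reported further down.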
# Recode the response in the training set as a 0/1 factor
Boston.train$crim<- factor(as.numeric(Boston.train$c.crim))
head(Boston.train)
# Logistic regression
logit.fit <- glm(crim~.-c.crim,data=Boston.train,family=binomial)
summary(logit.fit)
Call:
glm(formula = crim ~ . - c.crim, family = binomial, data = Boston.train)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.3717  -0.1638  -0.0050   0.0027   3.5013

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -31.585466   7.570606  -4.172 3.02e-05 ***
zn           -0.056584   0.039644  -1.427 0.153492
indus        -0.073904   0.050582  -1.461 0.143995
chas          1.479926   0.920073   1.608 0.107729
nox          49.194715   9.059145   5.430 5.62e-08 ***
rm           -0.764723   0.769264  -0.994 0.320176
age           0.042474   0.014863   2.858 0.004269 **
dis           0.589066   0.259613   2.269 0.023268 *
rad           0.611718   0.175804   3.480 0.000502 ***
tax          -0.007147   0.003233  -2.211 0.027059 *
ptratio       0.344830   0.140583   2.453 0.014172 *
black        -0.012981   0.006615  -1.962 0.049732 *
lstat        -0.040590   0.056146  -0.723 0.469723
medv          0.155290   0.077391   2.007 0.044795 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 523.76  on 378  degrees of freedom
Residual deviance: 160.87  on 365  degrees of freedom
AIC: 188.87

Number of Fisher Scoring iterations: 9
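The coefficients above are on the log-odds scale, so exponentiating them gives odds ratios, which are often easier to read. The sketch below copies a few of the significant estimates from the summary by hand. The names `coefs`, `odds.ratios` and `nox.per.0.01` are illustrative only and are not part of the original analysis.

```r
# Estimates copied from the summary output (log-odds scale)
coefs <- c(nox = 49.194715, age = 0.042474, rad = 0.611718, tax = -0.007147)

# exp() turns a log-odds coefficient into an odds ratio per one-unit increase
odds.ratios <- exp(coefs)

# nox takes values well below 1, so a one-unit increase is not realistic;
# a 0.01-unit increase is a more interpretable step
nox.per.0.01 <- exp(coefs["nox"] * 0.01)

print(round(odds.ratios[c("age", "rad", "tax")], 3))
print(round(nox.per.0.01, 3))
```

An odds ratio above 1 (age, rad, nox) means the odds of an above-median crime rate rise with the predictor when the other predictors are held fixed. A value below 1 (tax) means the odds fall.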
glm.probs=predict(logit.fit,Boston.test,type="response")
glm.pred=rep(0,nrow(Boston.test))
glm.pred[glm.probs > 0.50]=1
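# Confusion matrix (rows: predicted class, columns: true class)
table(glm.pred,as.numeric(Boston.test$c.crim))
glm.pred  0  1
       0 49  7
       1  2 69
# Overall prediction accuracy
mean(glm.pred==as.numeric(Boston.test$c.crim))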
[1] 0.9291339
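The accuracy can be reproduced by hand from the confusion matrix counts, which is a useful check and also gives the per-class rates. This is a minimal sketch with the counts from the logistic regression table typed in. `conf.mat`, `accuracy`, `sensitivity` and `specificity` are names chosen here for illustration.

```r
# Counts from the logistic regression confusion matrix
# (rows: predicted class, columns: true class); matrix() fills column by column
conf.mat <- matrix(c(49, 2, 7, 69), nrow = 2,
                   dimnames = list(pred = c("0", "1"), truth = c("0", "1")))

accuracy    <- sum(diag(conf.mat)) / sum(conf.mat)        # (49 + 69) / 127
sensitivity <- conf.mat["1", "1"] / sum(conf.mat[, "1"])  # 69 / (7 + 69)
specificity <- conf.mat["0", "0"] / sum(conf.mat[, "0"])  # 49 / (49 + 2)

print(round(c(accuracy = accuracy, sensitivity = sensitivity,
              specificity = specificity), 4))
```

The first value matches the 0.9291339 reported by `mean()`. The other two show that the model misses high-crime suburbs (7 of 76) somewhat more often than it misclassifies low-crime ones (2 of 51).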
LDA model
lda.fit=lda(crim~nox+rad+medv+age+tax+ptratio, data=Boston.train)
lda.fit
Call:
lda(crim ~ nox + rad + medv + age + tax + ptratio, data = Boston.train)

Prior probabilities of groups:
        0         1
0.5329815 0.4670185

Group means:
        nox       rad     medv      age      tax  ptratio
0 0.4729441  4.183168 24.58069 50.94010 309.7475 17.93614
1 0.6347175 14.559322 20.13729 86.69944 503.7062 18.90226

Coefficients of linear discriminants:
                 LD1
nox      8.247306805
rad      0.087278767
medv     0.030474664
age      0.016015886
tax     -0.001093165
ptratio  0.028299344
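lda.pred=predict(lda.fit,Boston.test)$class
# Confusion matrix (rows: predicted class, columns: true class)
table(lda.pred,as.numeric(Boston.test$c.crim))
lda.pred  0  1
       0 50 18
       1  1 58
# Overall prediction accuracy
mean(lda.pred==as.numeric(Boston.test$c.crim))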
[1] 0.8503937
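To put the two models side by side, the test error rate of each can be computed from its confusion matrix. The sketch below types in the counts from the two tables reported above. The helper `error.rate` and the matrix names are hypothetical and are introduced only for this comparison.

```r
# Test error rate from a confusion matrix (rows: predictions, columns: truth)
error.rate <- function(m) 1 - sum(diag(m)) / sum(m)

# Counts copied from the two confusion matrices; matrix() fills column by column
logit.mat <- matrix(c(49, 2, 7, 69), nrow = 2)
lda.mat   <- matrix(c(50, 1, 18, 58), nrow = 2)

errors <- c(logistic = error.rate(logit.mat), lda = error.rate(lda.mat))
print(round(errors, 4))
```

On this split the logistic regression with all predictors has the lower test error (9 of 127 misclassified, against 19 of 127 for LDA). Most of the LDA errors are high-crime suburbs predicted as low-crime. The LDA fit uses only six predictors, so the two results are not a like-for-like comparison of the methods.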