chapter 4 exercise

Distrlili

Category: data mining. Tags: glm, lda

Article link: https://blog.csdn.net/G090909/article/details/50277727

Problem 13
Fit classification models to the Boston data set to predict whether a given suburb has a crime rate above or below the median.

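library(MASS)  # provides the Boston data set
# Binary response: TRUE when the crime rate is above the median
Boston$c.crim <- (Boston$crim > median(Boston$crim))
# Randomly split the data set (about 25% held out for testing)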
set.seed(122)
rands <- rnorm(nrow(Boston))
test <- (rands > quantile(rands,0.75))
train <- !test
Boston.train <- Boston[train,]
Boston.test <- Boston[test,]
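
Before fitting anything, it is worth checking that the random split looks reasonable. The following is a minimal sketch, not part of the original solution: it repeats the split on a copy of the data (the names `boston`, `split_sizes` and `class_share` are introduced here for illustration) and reports the size of each part and the share of above-median suburbs in it.

```r
library(MASS)  # Boston data set

# Work on a copy so the original data frame is untouched (hypothetical name)
boston <- Boston
boston$c.crim <- boston$crim > median(boston$crim)

# Same split rule as above: the rows whose random draw falls in the top quarter
set.seed(122)
rands <- rnorm(nrow(boston))
test  <- rands > quantile(rands, 0.75)
train <- !test

# Number of rows in each part
split_sizes <- c(train = sum(train), test = sum(test))
print(split_sizes)

# Share of above-median suburbs in each part; both should be near 0.5
class_share <- tapply(boston$c.crim, ifelse(test, "test", "train"), mean)
print(round(class_share, 3))
```

If one part turned out heavily skewed toward one class, a stratified split would be a reasonable alternative.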

# Recode the response in the training set as a factor
Boston.train$crim<- factor(as.numeric(Boston.train$c.crim))
head(Boston.train)

# Logistic regression

logit.fit <- glm(crim~.-c.crim,data=Boston.train,family=binomial)
summary(logit.fit)
Call:
glm(formula = crim ~ . - c.crim, family = binomial, data = Boston.train)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.3717  -0.1638  -0.0050   0.0027   3.5013  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept) -31.585466   7.570606  -4.172 3.02e-05 ***
zn           -0.056584   0.039644  -1.427 0.153492    
indus        -0.073904   0.050582  -1.461 0.143995    
chas          1.479926   0.920073   1.608 0.107729    
nox          49.194715   9.059145   5.430 5.62e-08 ***
rm           -0.764723   0.769264  -0.994 0.320176    
age           0.042474   0.014863   2.858 0.004269 ** 
dis           0.589066   0.259613   2.269 0.023268 *  
rad           0.611718   0.175804   3.480 0.000502 ***
tax          -0.007147   0.003233  -2.211 0.027059 *  
ptratio       0.344830   0.140583   2.453 0.014172 *  
black        -0.012981   0.006615  -1.962 0.049732 *  
lstat        -0.040590   0.056146  -0.723 0.469723    
medv          0.155290   0.077391   2.007 0.044795 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 523.76  on 378  degrees of freedom
Residual deviance: 160.87  on 365  degrees of freedom
AIC: 188.87

Number of Fisher Scoring iterations: 9
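
The coefficients in the summary are on the log-odds scale, so exponentiating them gives odds ratios, which are often easier to read. The sketch below is an illustration under assumptions, not the original code: it refits an equivalent model on a copy of the data (the names `boston`, `high`, `signif_terms` and `odds_ratios` are introduced here) and lists the terms with p-values below 0.05.

```r
library(MASS)  # Boston data set

# Copy of the data with a 0/1 response (hypothetical column name `high`)
boston <- Boston
boston$high <- as.numeric(boston$crim > median(boston$crim))

# Same split rule as in the post
set.seed(122)
rands <- rnorm(nrow(boston))
train <- !(rands > quantile(rands, 0.75))

# All predictors except the raw crime rate, which defines the response
fit <- glm(high ~ . - crim, data = boston[train, ], family = binomial)

# Terms whose Wald test p-value is below 0.05
coefs <- summary(fit)$coefficients
signif_terms <- rownames(coefs)[coefs[, "Pr(>|z|)"] < 0.05]
print(signif_terms)

# Odds ratios: multiplicative change in the odds per one-unit increase
odds_ratios <- exp(coef(fit))
print(signif(odds_ratios[setdiff(signif_terms, "(Intercept)")], 3))
```

Keep the units in mind when reading these: nox varies over a range much smaller than one unit, so its per-unit odds ratio is very large.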

glm.probs=predict(logit.fit,Boston.test,type="response")
glm.pred=rep(0,nrow(Boston.test))
glm.pred[glm.probs > 0.50]=1

# Confusion matrix (rows: predicted, columns: actual)
table(glm.pred,as.numeric(Boston.test$c.crim))
glm.pred  0  1
       0 49  7
       1  2 69

# Overall prediction accuracy
mean(glm.pred==as.numeric(Boston.test$c.crim))
[1] 0.9291339
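
The overall accuracy hides how the errors are distributed between the two classes. As a small worked example, the sketch below rebuilds the logistic regression confusion matrix from the counts reported above and derives accuracy, sensitivity and specificity from it (the names `conf`, `accuracy`, `sensitivity` and `specificity` are introduced here).

```r
# Confusion matrix reported above (rows: predicted, columns: actual)
conf <- matrix(c(49, 2, 7, 69), nrow = 2,
               dimnames = list(predicted = c("0", "1"),
                               actual    = c("0", "1")))

# Correct predictions sit on the diagonal
accuracy <- sum(diag(conf)) / sum(conf)

# Share of truly high-crime suburbs that were predicted as high
sensitivity <- conf["1", "1"] / sum(conf[, "1"])

# Share of truly low-crime suburbs that were predicted as low
specificity <- conf["0", "0"] / sum(conf[, "0"])

print(round(c(accuracy = accuracy,
              sensitivity = sensitivity,
              specificity = specificity), 4))
# accuracy 0.9291, sensitivity 0.9079, specificity 0.9608
```

The model misses 7 of the 76 high-crime suburbs and mislabels 2 of the 51 low-crime ones.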

LDA model

```
library(MASS)  # provides lda()
lda.fit=lda(crim~nox+rad+medv+age+tax+ptratio, data=Boston.train)
lda.fit

Call:
lda(crim ~ nox + rad + medv + age + tax + ptratio, data = Boston.train)

Prior probabilities of groups:
        0         1 
0.5329815 0.4670185 

Group means:
        nox       rad     medv      age      tax  ptratio
0 0.4729441  4.183168 24.58069 50.94010 309.7475 17.93614
1 0.6347175 14.559322 20.13729 86.69944 503.7062 18.90226

Coefficients of linear discriminants:
                 LD1
nox      8.247306805
rad      0.087278767
medv     0.030474664
age      0.016015886
tax     -0.001093165
ptratio  0.028299344

lda.pred=predict(lda.fit,Boston.test)$class
# Confusion matrix (rows: predicted, columns: actual)
table(lda.pred,as.numeric(Boston.test$c.crim))

lda.pred  0  1
       0 50 18
       1  1 58

# Overall prediction accuracy
mean(lda.pred==as.numeric(Boston.test$c.crim))
[1] 0.8503937
```
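
The class labels returned by `predict()` for an LDA fit come from the posterior probabilities: with two classes, a suburb is labelled 1 when its posterior probability for class 1 exceeds 0.5. The sketch below checks this on a copy of the data; it is an illustration under assumptions, and the names `boston`, `high`, `manual`, `agree` and `lda_acc` are introduced here.

```r
library(MASS)  # Boston data set and lda()

# Copy of the data with a factor response (hypothetical column name `high`)
boston <- Boston
boston$high <- factor(as.numeric(boston$crim > median(boston$crim)))

# Same split rule as in the post
set.seed(122)
rands <- rnorm(nrow(boston))
test  <- rands > quantile(rands, 0.75)

# Same six predictors as the LDA model above
fit  <- lda(high ~ nox + rad + medv + age + tax + ptratio,
            data = boston[!test, ])
pred <- predict(fit, boston[test, ])

# Threshold the posterior probability of class "1" by hand
manual <- ifelse(pred$posterior[, "1"] > 0.5, "1", "0")

# The hand-made labels should match the ones predict() returns
agree <- all(manual == as.character(pred$class))
print(agree)

# Test-set accuracy of the LDA predictions
lda_acc <- mean(as.character(pred$class) == as.character(boston$high[test]))
print(lda_acc)
```

Lowering the 0.5 threshold would flag more suburbs as high-crime, which may be worth trying given that most of the LDA errors above are high-crime suburbs predicted as low.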
