Author: 龙箬
Data Science and Big Data Technology
Change the world with data!
CSDN@weixin_43975035
Many people like to throw their worries into a river, where they turn into stones.
Generalized and General Linear Models in R
1. Logistic Regression Model
> d5.1=read.table("clipboard",header=T) # read the data (clipboard input)
> logit.glm <- glm(y ~ x1+x2+x3,family=binomial,data = d5.1) # fit the logistic regression model
> summary(logit.glm) # model summary
Call:
glm(formula = y ~ x1 + x2 + x3, family = binomial, data = d5.1)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.5636 -0.9131 -0.7892 0.9637 1.6000
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.597610 0.894831 0.668 0.5042
x1 -1.496084 0.704861 -2.123 0.0338 *
x2 -0.001595 0.016758 -0.095 0.9242
x3 0.315865 0.701093 0.451 0.6523
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 62.183 on 44 degrees of freedom
Residual deviance: 57.026 on 41 degrees of freedom
AIC: 65.026
Number of Fisher Scoring iterations: 4
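Because the coefficients are on the log-odds scale, exponentiating them gives odds ratios, which are easier to interpret. A minimal sketch using the estimates printed above (in a live session you could equivalently call exp(coef(logit.glm))):

```r
# Coefficient estimates copied from the summary above (log-odds scale)
est <- c(intercept = 0.597610, x1 = -1.496084, x2 = -0.001595, x3 = 0.315865)

# Exponentiate to get odds ratios: e.g. good eyesight (x1 = 1) multiplies
# the odds of an accident by roughly 0.22, holding x2 and x3 fixed
odds_ratios <- exp(est)
round(odds_ratios, 3)
```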
The data come from a survey of 45 drivers:
x1: eyesight, categorical (1 = good, 0 = impaired)
x2: age, numeric
x3: driver education, categorical (1 = attended driver education, 0 = did not)
y: categorical response (accident in the past year: 1 = yes, 0 = no)
> d5.1
y x1 x2 x3
1 1 1 17 1
2 0 1 44 0
3 0 1 48 1
4 0 1 55 0
5 1 1 75 1
6 1 0 35 0
7 1 0 42 1
8 0 0 57 0
9 1 0 28 0
10 1 0 20 0
11 0 0 38 1
12 1 0 45 0
13 1 0 47 1
14 0 0 52 0
15 1 0 55 0
16 0 1 68 1
17 0 1 18 1
18 0 1 68 0
19 1 1 48 1
20 0 1 17 0
21 1 1 70 1
22 0 1 72 1
23 1 1 35 0
24 0 1 19 1
25 0 1 62 1
26 1 0 39 1
27 1 0 40 1
28 0 0 55 0
29 1 0 68 0
30 0 0 25 1
31 0 0 17 0
32 1 0 45 0
33 1 0 44 0
34 0 0 67 0
35 1 0 55 0
36 0 1 61 1
37 0 1 19 1
38 0 1 69 0
39 1 1 23 1
40 0 1 19 0
41 1 1 72 1
42 0 1 74 1
43 1 1 31 0
44 0 1 16 1
45 0 1 61 1
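Note that read.table("clipboard") only works interactively on Windows; for a reproducible script, the data can be entered directly from the listing above. A sketch that rebuilds d5.1 and refits the same full model:

```r
# Rebuild d5.1 from the listing above (45 drivers)
d5.1 <- data.frame(
  y  = c(1,0,0,0,1, 1,1,0,1,1, 0,1,1,0,1, 0,0,0,1,0, 1,0,1,0,0,
         1,1,0,1,0, 0,1,1,0,1, 0,0,0,1,0, 1,0,1,0,0),
  x1 = c(1,1,1,1,1, 0,0,0,0,0, 0,0,0,0,0, 1,1,1,1,1, 1,1,1,1,1,
         0,0,0,0,0, 0,0,0,0,0, 1,1,1,1,1, 1,1,1,1,1),
  x2 = c(17,44,48,55,75, 35,42,57,28,20, 38,45,47,52,55, 68,18,68,48,17,
         70,72,35,19,62, 39,40,55,68,25, 17,45,44,67,55, 61,19,69,23,19,
         72,74,31,16,61),
  x3 = c(1,0,1,0,1, 0,1,0,0,0, 1,0,1,0,0, 1,1,0,1,0, 1,1,0,1,1,
         1,1,0,0,1, 0,0,0,0,0, 1,1,0,1,0, 1,1,0,1,1)
)

# Refit the full model; the summary should match the printout above
logit.glm <- glm(y ~ x1 + x2 + x3, family = binomial, data = d5.1)
round(AIC(logit.glm), 3)  # 65.026, as in the printout above
```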
> logit.step <- step(logit.glm,direction = "both") # stepwise variable selection (both directions)
Start: AIC=65.03
y ~ x1 + x2 + x3
Df Deviance AIC
- x2 1 57.035 63.035
- x3 1 57.232 63.232
<none> 57.026 65.026
- x1 1 61.936 67.936
Step: AIC=63.03
y ~ x1 + x3
Df Deviance AIC
- x3 1 57.241 61.241
<none> 57.035 63.035
+ x2 1 57.026 65.026
- x1 1 61.991 65.991
Step: AIC=61.24
y ~ x1
Df Deviance AIC
<none> 57.241 61.241
+ x3 1 57.035 63.035
+ x2 1 57.232 63.232
- x1 1 62.183 64.183
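At each step, step() drops (or re-adds) the term whose removal most lowers AIC, stopping when <none> is best. For this binomial glm, each AIC in the trace is the residual deviance plus twice the number of estimated coefficients, which can be checked by hand:

```r
# AIC = deviance + 2 * (number of coefficients), matching the trace above
aic_full  <- 57.026 + 2 * 4  # y ~ x1 + x2 + x3 (intercept + 3 slopes)
aic_final <- 57.241 + 2 * 2  # y ~ x1 (intercept + 1 slope)
c(aic_full, aic_final)  # 65.026 61.241
```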
> summary(logit.step) # summary of the model chosen by stepwise selection
Call:
glm(formula = y ~ x1, family = binomial, data = d5.1)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.4490 -0.8782 -0.8782 0.9282 1.5096
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.6190 0.4688 1.320 0.1867
x1 -1.3728 0.6353 -2.161 0.0307 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 62.183 on 44 degrees of freedom
Residual deviance: 57.241 on 43 degrees of freedom
AIC: 61.241
Number of Fisher Scoring iterations: 4
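With a single binary predictor, the fitted logistic model simply reproduces the observed accident proportion in each eyesight group, so the coefficients can be recovered from the raw counts. Counting from the listing above gives 13 accidents among the 20 drivers with x1 = 0 and 8 among the 25 drivers with x1 = 1; a sketch of the check (qlogis is R's log-odds function):

```r
p0 <- 13 / 20  # observed accident rate, impaired eyesight (x1 = 0)
p1 <- 8 / 25   # observed accident rate, good eyesight (x1 = 1)

intercept <- qlogis(p0)           # log-odds at x1 = 0 -> about 0.619
slope <- qlogis(p1) - qlogis(p0)  # change in log-odds -> about -1.373
round(c(intercept, slope), 4)
```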
> prel <- predict(logit.step,data.frame(x1=1)) # linear predictor for a driver with good eyesight
> p1 <- exp(prel)/(1+exp(prel)) # predicted accident probability, good eyesight
> prel2 <- predict(logit.step,data.frame(x1=0)) # linear predictor for a driver with impaired eyesight
> p2 <- exp(prel2)/(1+exp(prel2)) # predicted accident probability, impaired eyesight
> c(p1,p2) # show both probabilities
1 1
0.32 0.65
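The manual exp(prel)/(1+exp(prel)) transform is the inverse logit; R provides it as plogis(), and predict(..., type = "response") returns probabilities directly. A sketch verifying the two printed probabilities from the coefficients of the final model:

```r
# Inverse logit by hand, using coefficients from summary(logit.step) above
b0 <- 0.6190   # intercept
b1 <- -1.3728  # x1 coefficient

p_good     <- plogis(b0 + b1)  # x1 = 1, good eyesight     -> about 0.32
p_impaired <- plogis(b0)       # x1 = 0, impaired eyesight -> about 0.65
round(c(p_good, p_impaired), 2)  # 0.32 0.65
```

In a live session, the equivalent one-liner (assuming logit.step from above) would be predict(logit.step, data.frame(x1 = c(1, 0)), type = "response").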