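When running logistic regression in R with a categorical predictor that has more than two levels, I was a little confused by the model output. A search showed that quite a few people have the same question, so after going through the references, here is my own understanding.
First, an example (data downloaded from http://freakonometrics.free.fr/db.txt):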
> db <- read.table("db.txt",header=TRUE,sep=";")
> head(db)
  Y       X1       X2 X3
1 1 3.297569 16.25411  B
2 1 6.418031 18.45130  D
3 1 5.279068 16.61806  B
4 1 5.539834 19.72158  C
5 1 4.123464 18.38634  C
6 1 7.778443 19.58338  C
> summary(db)
       Y               X1               X2        X3
 Min.   :0.000   Min.   :-1.229   Min.   :10.93   A:197
 1st Qu.:1.000   1st Qu.: 4.545   1st Qu.:17.98   B:206
 Median :1.000   Median : 5.982   Median :20.00   C:196
 Mean   :0.921   Mean   : 5.958   Mean   :19.94   D:197
 3rd Qu.:1.000   3rd Qu.: 7.358   3rd Qu.:21.89   E:204
 Max.   :1.000   Max.   :11.966   Max.   :28.71
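The summary above shows that X3 is a factor with five levels, A to E. As a minimal sketch of how R encodes such a factor when it appears in a model formula, the snippet below builds a design matrix with model.matrix(). The data frame `toy` is a made-up illustration (it is not the db.txt data), and the sketch assumes R's default treatment contrasts for unordered factors.

```r
# Hypothetical toy data: one factor with the same five levels as X3.
toy <- data.frame(X3 = factor(c("A", "B", "C", "D", "E")))

# By default the first level in alphabetical order is the reference level.
ref_level <- levels(toy$X3)[1]
print(ref_level)

# model.matrix() shows the design matrix that a formula such as Y ~ X3 would use:
# an intercept column plus one 0/1 indicator column per non-reference level.
mm <- model.matrix(~ X3, data = toy)
print(colnames(mm))
print(mm)
```

Under these assumptions the reference level A gets no column of its own, so a row with X3 equal to A has zeros in every indicator column. Note also that since R 4.0.0, read.table() no longer converts strings to factors by default, so reproducing the factor counts in the summary on a recent R version would need stringsAsFactors=TRUE in the read.table() call.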
> reg <- glm(Y~X1+X2+X3,family=binomial,data=db)
> summary(reg)
Call:
glm(formula = Y ~ X1 + X2 + X3, family = binomial, data = db)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-2.98017   0.09327   0.19106   0.37000   1.50646

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.45885    1.04646  -4.261 2.04e-05 ***
X1           0.51664    0.11178   4.622 3.80e-06 ***
X2           0.21008    0.07247   2.899 0.003745 **
X3B          1.74496    0.49952   3.493 0.000477 ***
X3C         -0.03470    0.35691  -0.097 0.922543
X3D          0.08004    0.34916   0.229 0.818672
X3E          2.21966    0.56475   3.930 8.48e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 552.6