logit回归模型假设_Logistic回归模型及应用建模(二)

本文介绍了如何使用R中的nnet包建立多元Logistic回归模型,以进入高中学生的计划选择为例,分析了写作成绩和社会经济地位对选择的影响。通过模型,我们可以观察不同变量对不同结果的概率影响,并通过预测概率来理解模型。
摘要由CSDN通过智能技术生成

二、因变量多分类 logistic

回归

​1、概述:多元Logistic回归模型被用来建立有多个输出变量的模型,且这些预测变量通过一个线性组合变成为一个最终的预测变量。Multinomial

Logistic 回归模型中因变量可以取多个值.

所需的应用包:​

​library(foreign)

library(nnet)

library(ggplot2)

library(reshape2)​

2、建模例子:​

Example . Entering high school students make program

choices among general program, vocational program and academic

program. Their choice might be modeled using their writing score

and their social economic

status.​​

例子:进入高中的学生在做计划选择时,面临着一般计划、职业计划及学术计划三种选择。他们的选择往往是通过对他们的写作成绩及社会经济地位进行模型化决定。

1)读取数据:

ml

read.dta("http://www.ats.ucla.edu/stat/data/hsbdemo.dta")

这个数据集包含了有200个学生的记录,结果变量为prog(program

type)三种项目类型,预测变量为ses(social economic

status)社会经济地位,3个分类变量(id,female,schtyp),及其他连续变量。​

2)数据初步探索​:

with(ml,

table(ses, prog))

prog

ses

general

academic  vocation

low

16

19

12

middle

20

44

31

high

9

42

7​

按照prog分组统计写作得分的均值及标准方差

with(ml, do.call(rbind, tapply(write, prog, function(x) c(M =

mean(x), SD = sd(x)))))

​M

SD

general  51.33333 9.397775

academic 56.25714 7.943343

vocation 46.76000 9.318754

3)建立模型:

利用nnet包中的函数multinom,建立多元logistic回归模型:

Before

running our model. We then choose the level of our outcome that we

wish to use as our baseline and specify this in

therelevelfunction.

Then, we run our model usingmultinom.

Themultinompackage

does not include p-value calculation for the regression

coefficients, so we calculate p-values using Wald tests (here

z-tests).

在建立模型前,我们选择输出变量的水平,并利用函数​relevel设置prog为哑变量(虚拟变量),以academic作为参考水平:

ml$prog2

"academic")

​test

~ ses + write, data = ml)

summary(test)​

结果显示:

Call:

multinom(formula = prog2 ~ ses + write, data =

ml)

Coefficients:

(Intercept)  sesmiddle

seshigh

write

general     2.852198

-0.5332810 -1.1628226 -0.0579287

vocation    5.218260

0.2913859 -0.9826649 -0.1136037

Std. Errors:

(Intercept) sesmiddle   seshigh

write

general     1.166441

0.4437323 0.5142196 0.02141097

vocation    1.163552

0.4763739 0.5955665 0.02221996

Residual Deviance: 359.9635

AIC: 375.9635

​z

summary(test)$coefficients/summary(test)$standard.errors

(Intercept)  sesmiddle

seshigh

write

general     2.445214

-1.2018081 -2.261334 -2.705562

vocation    4.484769

0.6116747 -1.649967 -5.112689

​2-tailed z

test

​p

-

pnorm(abs(z),0, 1)) * 2

​(Intercept)

sesmiddle

seshigh

write

general  0.0144766100 0.2294379

0.02373856 6.818902e-03

vocation 0.0000072993 0.5407530 0.09894976

3.176045e-07

​The model summary output

has a block of coefficients and a block of standard errors.

模型的结果输出为一系列的系数及相应的标准差,每一行参数都反应了一个模型的等式。Each of

these blocks has one row of values corresponding to a model

equation. Focusing on the block of coefficients, we can

look at the first row

comparing ​prog =

"general"to our

baselineprog =

"academic"and the second row

comparingprog =

"vocation"to our

baselineprog =

"academic". If we consider our coefficients from the

first row to be b_1 and our coefficients from the second row to be

b_2, we can write our model equations:

\[ln\left(\frac{P(prog=general)}{P(prog=academic)}\right)

= b_{10} + b_{11}(ses=2) + b_{12}(ses=3) +

b_{13}write\] ​

模型公式一

\[ln\left(\frac{P(prog=vocation)}{P(prog=academic)}\right)

= b_{20} + b_{21}(ses=2) + b_{22}(ses=3) + b_{23}write\]

模型公式二

​## extract the

coefficients from the model and exponentiate

exp(coef(test))

(Intercept) sesmiddle

seshigh

write

general     17.32582

0.5866769 0.3126026 0.9437172

vocation   184.61262 1.3382809 0.3743123

0.8926116

解释:

The relative risk ratio for a one-unit

increase in the

variable write is

.9437 for being in general program vs. academic

program.

The relative risk ratio switching

from ses= 1 to 3 is .3126 for being

in general program vs. academic program.

​You can also use predicted

probabilities to help you understand the model.

You can calculate predicted probabilities for

each of our outcome levels using

thefittedfunction.

We can start by generating the predicted probabilities for the

observations in our dataset and viewing the first few

rows

head(pp

​academic

general  vocation

1 0.1482764 0.3382454 0.5134781

2 0.1202017 0.1806283 0.6991700

3 0.4186747 0.2368082 0.3445171

4 0.1726885 0.3508384 0.4764731

5 0.1001231 0.1689374 0.7309395

6 0.3533566 0.2377976 0.4088458

Next, if we want to examine the changes in predicted

probability associated with one of our two variables, we can create

small datasets varying one variable while holding the other

constant. We will first do this

holdingwriteat

its mean and examining the predicted probabilities for each level

ofses.​

dses

data.frame(ses

=

c("low",

"middle",

"high"), write

=

mean(ml$write))​

ses

write

1

low

52.775

2 middle   52.775

3

high

52.775

predict(test, newdata = dses,

"probs")

​academic

general  vocation

1 0.4396845 0.3581917 0.2021238

2 0.4777488 0.2283353 0.2939159

3 0.7009007 0.1784939 0.1206054

​Another way to understand

the model using the predicted probabilities is to look at the

averaged predicted probabilities for different values of the

continuous predictor

variablewritewithin

each level

ofses.

​dwrite

data.frame(ses = rep(c("low", "middle", "high"), each = 41), write

= rep(c(30:70),3))

​##

store the predicted probabilities for each value of ses and

write

pp.write

cbind(dwrite, predict(test, newdata = dwrite, type = "probs", se =

TRUE))

## calculate the mean probabilities within each level of

ses

​by(pp.write[,

3:5],

pp.write$ses,

colMeans)​

​pp.write$ses: high

academic   general

vocation

0.6164315 0.1808037 0.2027648

------------------------------------------------------------------------

pp.write$ses: low

academic   general

vocation

0.3972977 0.3278174 0.2748849

------------------------------------------------------------------------

pp.write$ses: middle

academic   general

vocation

0.4256198 0.2010864 0.3732938

​Sometimes, a couple of

plots can convey a good deal amount of information. Using the

predictions we generated for the pp.write object above, we can plot

the predicted probabilities against the writing score by the level

ofsesfor

different levels of the outcome variable.

​## melt data set to long

for ggplot2

​lpp

melt(pp.write, id.vars

=

c("ses",

"write"), value.name

=

"probability")

​head(lpp)

​ # view first few

rows

​ses     write

variable

probability

1 low    30

academic  0.09843588

2 low    31

academic  0.10716868

3 low    32

academic  0.11650390

4 low    33

academic  0.12645834

5 low    34

academic  0.13704576

6 low    35

academic  0.14827643

​## plot predicted

probabilities across write values for each level of ses

​## facetted by program

type

ggplot(lpp, aes(x = write, y = probability, colour =

ses)) + geom_line() + facet_grid(variable ~ ., scales =

"free")​

4、其他应用包及函数​

1)程序包mlogit提供了多项logit的模型拟合函数:

mlogit(formula, data, subset, weights,na.action, start =

NULL,

alt.subset = NULL, reflevel = NULL,

nests = NULL, un.nest.el = FALSE, unscaled = FALSE,

heterosc = FALSE, rpar = NULL, probit = FALSE,

R = 40, correlation = FALSE, halton = NULL,

random.nb = NULL, panel = FALSE, estimate = TRUE,

seed = 10, ...)​​

mlogit.data(data, choice, shape = c("wide","long"), varying =

NULL,

sep=".",alt.var = NULL, chid.var = NULL, alt.levels =

NULL,id.var = NULL, opposite = NULL, drop.index = FALSE,

ranked = FALSE, ...)

参数说明:

formula:mlogit提供了条件logit,多项logit,混合logit多种模型,对于多项logit的估计模型应写为:因变量~0|自变量,如:mode

~ 0|income

data:使用mlogit.data函数使得数据结构符合mlogit函数要求。

Choice:确定分类变量是什么

Shape:如果每一行是一个观测,我们选择wide,如果每一行是表示一个选择,那么就应该选择long。

alt.var:对于shape为long的数据,需要标明所有的选择名称​

例子:​

install.packages("mlogit")

library(Formula)

library(maxLik)

library(miscTools)

library(mlogit)​

data("Fishing", package = "mlogit")​

Fish

= "mode")​

summary(mlogit(mode ~ 0|income, data =

Fish))​

结果显示:​

Call:

mlogit(formula = mode ~ 0|income, data = Fish, method =

"nr",

print.level = 0)

Frequencies of alternatives:

beach

boat

charter

pier

0.11337

0.35364

0.38240

0.15059

nr method

4 iterations, 0h:0m:0s

g'(-H)^-1g = 8.32E-07

gradient close to zero

Coefficients :

Estimate Std. Error t-value Pr(>t)

boat:(intercept) 7.3892e-01 1.9673e-01 3.7560 0.0001727 ***

charter:(intercept) 1.3413e+00 1.9452e-01 6.8955 5.367e-12

***

pier:(intercept) 8.1415e-01 2.2863e-01 3.5610 0.0003695 ***

boat:income 9.1906e-05 4.0664e-05 2.2602 0.0238116 *

charter:income -3.1640e-05 4.1846e-05 -0.7561 0.4495908

pier:income -1.4340e-04 5.3288e-05 -2.6911 0.0071223 **

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’

1

Log-Likelihood: -1477.2

McFadden R^2: 0.013736

Likelihood ratio test : chisq = 41.145 (p.value =

6.0931e-09)

2)程序包MASS提供polr()函数可以进行ordered

logit或probit回归

polr(formula, data, weights, start, ..., subset, na.action,

contrasts = NULL, Hess = FALSE, model = TRUE,

method = c("logistic", "probit", "cloglog",

"cauchit"))​

参数说明:

Formula:回归形式,与lm()函数的formula参数用法一致

Data:数据集

Method:默认为order logit,选择probit时变为order

probit模型。​​

例子:​

library(MASS)

house.plr

weights = Freq, data = housing)

summary(house.plr, digits = 3)

结果显示:

Re-fitting to get Hessian

Call:

polr(formula = Sat ~ Infl + Type + Cont, data = housing, weights

= Freq)

Coefficients:

Value Std. Error t value

InflMedium 0.566 0.1047 5.41

InflHigh

1.289 0.1272 10.14

TypeApartment -0.572 0.1192 -4.80

TypeAtrium -0.366 0.1552 -2.36

TypeTerrace -1.091 0.1515 -7.20

ContHigh 0.360 0.0955 3.77

Intercepts:

Value Std. Error t value

Low|Medium -0.496 0.125 -3.974

Medium|High 0.691 0.125 5.505

Residual Deviance: 3479.149

AIC: 3495.149

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值