logit回归模型假设_Logistic回归模型及应用建模（二）

最新推荐文章于 2024-06-25 18:12:04 发布

司梦化虚

最新推荐文章于 2024-06-25 18:12:04 发布

阅读量3.4k

点赞数 1

文章标签： logit回归模型假设

本文链接：https://blog.csdn.net/weixin_42474164/article/details/112876857

版权

本文介绍了如何使用R中的nnet包建立多元Logistic回归模型，以进入高中学生的计划选择为例，分析了写作成绩和社会经济地位对选择的影响。通过模型，我们可以观察不同变量对不同结果的概率影响，并通过预测概率来理解模型。

摘要由CSDN通过智能技术生成

二、因变量多分类 logistic

回归

1、概述：多元Logistic回归模型被用来建立有多个输出变量的模型，且这些预测变量通过一个线性组合变成为一个最终的预测变量。Multinomial

Logistic 回归模型中因变量可以取多个值.

所需的应用包：

library(foreign)

library(nnet)

library(ggplot2)

library(reshape2)

2、建模例子：

Example . Entering high school students make program

choices among general program, vocational program and academic

program. Their choice might be modeled using their writing score

and their social economic

status.

例子：进入高中的学生在做计划选择时，面临着一般计划、职业计划及学术计划三种选择。他们的选择往往是通过对他们的写作成绩及社会经济地位进行模型化决定。

1)读取数据：

read.dta("http://www.ats.ucla.edu/stat/data/hsbdemo.dta")

这个数据集包含了有200个学生的记录，结果变量为prog(program

type)三种项目类型，预测变量为ses(social economic

status)社会经济地位，3个分类变量(id,female,schtyp),及其他连续变量。

2)数据初步探索：

with(ml,

table(ses, prog))

prog

ses

general

academic vocation

low

middle

high

按照prog分组统计写作得分的均值及标准方差

with(ml, do.call(rbind, tapply(write, prog, function(x) c(M =

mean(x), SD = sd(x)))))

general 51.33333 9.397775

academic 56.25714 7.943343

vocation 46.76000 9.318754

3)建立模型：

利用nnet包中的函数multinom，建立多元logistic回归模型：

Before

running our model. We then choose the level of our outcome that we

wish to use as our baseline and specify this in

therelevelfunction.

Then, we run our model usingmultinom.

Themultinompackage

does not include p-value calculation for the regression

coefficients, so we calculate p-values using Wald tests (here

z-tests).

在建立模型前，我们选择输出变量的水平，并利用函数relevel设置prog为哑变量(虚拟变量)，以academic作为参考水平：

ml$prog2

"academic")

test

~ ses + write, data = ml)

summary(test)

结果显示：

Call:

multinom(formula = prog2 ~ ses + write, data =

ml)

Coefficients:

(Intercept) sesmiddle

seshigh

write

general 2.852198

-0.5332810 -1.1628226 -0.0579287

vocation 5.218260

0.2913859 -0.9826649 -0.1136037

Std. Errors:

(Intercept) sesmiddle seshigh

write

general 1.166441

0.4437323 0.5142196 0.02141097

vocation 1.163552

0.4763739 0.5955665 0.02221996

Residual Deviance: 359.9635

AIC: 375.9635

summary(test)$coefficients/summary(test)$standard.errors

(Intercept) sesmiddle

seshigh

write

general 2.445214

-1.2018081 -2.261334 -2.705562

vocation 4.484769

0.6116747 -1.649967 -5.112689

2-tailed z

test

pnorm(abs(z),0, 1)) * 2

(Intercept)

sesmiddle

seshigh

write

general 0.0144766100 0.2294379

0.02373856 6.818902e-03

vocation 0.0000072993 0.5407530 0.09894976

3.176045e-07

The model summary output

has a block of coefficients and a block of standard errors.

模型的结果输出为一系列的系数及相应的标准差，每一行参数都反应了一个模型的等式。Each of

these blocks has one row of values corresponding to a model

equation. Focusing on the block of coefficients, we can

look at the first row

comparing prog =

"general"to our

baselineprog =

"academic"and the second row

comparingprog =

"vocation"to our

baselineprog =

"academic". If we consider our coefficients from the

first row to be b_1 and our coefficients from the second row to be

b_2, we can write our model equations:

\[ln\left(\frac{P(prog=general)}{P(prog=academic)}\right)

= b_{10} + b_{11}(ses=2) + b_{12}(ses=3) +

b_{13}write\]

模型公式一

\[ln\left(\frac{P(prog=vocation)}{P(prog=academic)}\right)

= b_{20} + b_{21}(ses=2) + b_{22}(ses=3) + b_{23}write\]

模型公式二

## extract the

coefficients from the model and exponentiate

exp(coef(test))

(Intercept) sesmiddle

seshigh

write

general 17.32582

0.5866769 0.3126026 0.9437172

vocation 184.61262 1.3382809 0.3743123

0.8926116

解释：

The relative risk ratio for a one-unit

increase in the

variable write is

.9437 for being in general program vs. academic

program.

The relative risk ratio switching

from ses= 1 to 3 is .3126 for being

in general program vs. academic program.

You can also use predicted

probabilities to help you understand the model.

You can calculate predicted probabilities for

each of our outcome levels using

thefittedfunction.

We can start by generating the predicted probabilities for the

observations in our dataset and viewing the first few

rows

head(pp

academic

general vocation

1 0.1482764 0.3382454 0.5134781

2 0.1202017 0.1806283 0.6991700

3 0.4186747 0.2368082 0.3445171

4 0.1726885 0.3508384 0.4764731

5 0.1001231 0.1689374 0.7309395

6 0.3533566 0.2377976 0.4088458

Next, if we want to examine the changes in predicted

probability associated with one of our two variables, we can create

small datasets varying one variable while holding the other

constant. We will first do this

holdingwriteat

its mean and examining the predicted probabilities for each level

ofses.

dses

data.frame(ses

c("low",

"middle",

"high"), write

mean(ml$write))

ses

write

low

52.775

2 middle 52.775

high

52.775

predict(test, newdata = dses,

"probs")

academic

general vocation

1 0.4396845 0.3581917 0.2021238

2 0.4777488 0.2283353 0.2939159

3 0.7009007 0.1784939 0.1206054

Another way to understand

the model using the predicted probabilities is to look at the

averaged predicted probabilities for different values of the

continuous predictor

variablewritewithin

each level

ofses.

dwrite

data.frame(ses = rep(c("low", "middle", "high"), each = 41), write

= rep(c(30:70),3))

store the predicted probabilities for each value of ses and

write

pp.write

cbind(dwrite, predict(test, newdata = dwrite, type = "probs", se =

TRUE))

## calculate the mean probabilities within each level of

ses

by(pp.write[,

3:5],

pp.write$ses,

colMeans)

pp.write$ses: high

academic general

vocation

0.6164315 0.1808037 0.2027648

------------------------------------------------------------------------

pp.write$ses: low

academic general

vocation

0.3972977 0.3278174 0.2748849

------------------------------------------------------------------------

pp.write$ses: middle

academic general

vocation

0.4256198 0.2010864 0.3732938

Sometimes, a couple of

plots can convey a good deal amount of information. Using the

predictions we generated for the pp.write object above, we can plot

the predicted probabilities against the writing score by the level

ofsesfor

different levels of the outcome variable.

## melt data set to long

for ggplot2

lpp

melt(pp.write, id.vars

c("ses",

"write"), value.name

"probability")

head(lpp)

# view first few

rows

ses write

variable

probability

1 low 30

academic 0.09843588

2 low 31

academic 0.10716868

3 low 32

academic 0.11650390

4 low 33

academic 0.12645834

5 low 34

academic 0.13704576

6 low 35

academic 0.14827643

## plot predicted

probabilities across write values for each level of ses

## facetted by program

type

ggplot(lpp, aes(x = write, y = probability, colour =

ses)) + geom_line() + facet_grid(variable ~ ., scales =

"free")

4、其他应用包及函数

1)程序包mlogit提供了多项logit的模型拟合函数：

mlogit(formula, data, subset, weights,na.action, start =

NULL,

alt.subset = NULL, reflevel = NULL,

nests = NULL, un.nest.el = FALSE, unscaled = FALSE,

heterosc = FALSE, rpar = NULL, probit = FALSE,

R = 40, correlation = FALSE, halton = NULL,

random.nb = NULL, panel = FALSE, estimate = TRUE,

seed = 10, ...)

mlogit.data(data, choice, shape = c("wide","long"), varying =

NULL,

sep=".",alt.var = NULL, chid.var = NULL, alt.levels =

NULL,id.var = NULL, opposite = NULL, drop.index = FALSE,

ranked = FALSE, ...)

参数说明：

formula：mlogit提供了条件logit，多项logit，混合logit多种模型，对于多项logit的估计模型应写为：因变量~0|自变量,如：mode

~ 0|income

data：使用mlogit.data函数使得数据结构符合mlogit函数要求。

Choice：确定分类变量是什么

Shape：如果每一行是一个观测，我们选择wide，如果每一行是表示一个选择，那么就应该选择long。

alt.var：对于shape为long的数据，需要标明所有的选择名称

例子：

install.packages("mlogit")

library(Formula)

library(maxLik)

library(miscTools)

library(mlogit)

data("Fishing", package = "mlogit")

Fish

= "mode")

summary(mlogit(mode ~ 0|income, data =

Fish))

结果显示：

Call:

mlogit(formula = mode ~ 0|income, data = Fish, method =

"nr",

print.level = 0)

Frequencies of alternatives:

beach

boat

charter

pier

0.11337

0.35364

0.38240

0.15059

nr method

4 iterations, 0h:0m:0s

g'(-H)^-1g = 8.32E-07

gradient close to zero

Coefficients :

Estimate Std. Error t-value Pr(>t)

boat:(intercept) 7.3892e-01 1.9673e-01 3.7560 0.0001727 ***

charter:(intercept) 1.3413e+00 1.9452e-01 6.8955 5.367e-12

***

pier:(intercept) 8.1415e-01 2.2863e-01 3.5610 0.0003695 ***

boat:income 9.1906e-05 4.0664e-05 2.2602 0.0238116 *

charter:income -3.1640e-05 4.1846e-05 -0.7561 0.4495908

pier:income -1.4340e-04 5.3288e-05 -2.6911 0.0071223 **

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’

Log-Likelihood: -1477.2

McFadden R^2: 0.013736

Likelihood ratio test : chisq = 41.145 (p.value =

6.0931e-09)

2)程序包MASS提供polr()函数可以进行ordered

logit或probit回归

polr(formula, data, weights, start, ..., subset, na.action,

contrasts = NULL, Hess = FALSE, model = TRUE,

method = c("logistic", "probit", "cloglog",

"cauchit"))

参数说明：

Formula：回归形式，与lm()函数的formula参数用法一致

Data：数据集

Method：默认为order logit，选择probit时变为order

probit模型。

例子：

library(MASS)

house.plr

weights = Freq, data = housing)

summary(house.plr, digits = 3)

结果显示：

Re-fitting to get Hessian

Call:

polr(formula = Sat ~ Infl + Type + Cont, data = housing, weights

= Freq)

Coefficients:

Value Std. Error t value

InflMedium 0.566 0.1047 5.41

InflHigh

1.289 0.1272 10.14

TypeApartment -0.572 0.1192 -4.80

TypeAtrium -0.366 0.1552 -2.36

TypeTerrace -1.091 0.1515 -7.20

ContHigh 0.360 0.0955 3.77

Intercepts:

Value Std. Error t value

Low|Medium -0.496 0.125 -3.974

Medium|High 0.691 0.125 5.505

Residual Deviance: 3479.149

AIC: 3495.149

司梦化虚

关注

1
点赞
踩
4

收藏

觉得还不错? 一键收藏
0
评论
复制链接

分享到 QQ

分享到新浪微博

扫一扫