matlab mnrfit,Multinomial logistic regression

Fit a hierarchical multinomial regression model.

Load the sample data.

load('smoking.mat');

The data set smoking contains five variables: sex, age, weight, and systolic and diastolic blood pressure. Sex is a binary variable where 1 indicates female patients, and 0 indicates male patients.

Define the response variable.

Y = categorical(smoking.Smoker);

The data in Smoker has four categories:

0: Nonsmoker, 0 cigarettes a day

1: Smoker, 1–5 cigarettes a day

2: Smoker, 6–10 cigarettes a day

3: Smoker, 11 or more cigarettes a day

Define the predictor variables.

X = [smoking.Sex smoking.Age smoking.Weight...

smoking.SystolicBP smoking.DiastolicBP];

Fit a hierarchical multinomial model.

[B,dev,stats] = mnrfit(X,Y,'model','hierarchical');

B

B = 6×3

43.8148 5.9571 44.0712

1.8709 -0.0230 0.0662

0.0188 0.0625 0.1335

0.0046 -0.0072 -0.0130

-0.2170 0.0416 -0.0324

-0.2273 -0.1449 -0.4824

The first column of B includes the intercept and the coefficient estimates for the model of the relative risk of being a nonsmoker versus a smoker. The second column includes the parameter estimates for modeling the log odds of smoking 1–5 cigarettes a day versus more than five cigarettes a day given that a person is a smoker. Finally, the third column includes the parameter estimates for modeling the log odds of a person smoking 6–10 cigarettes a day versus more than 10 cigarettes a day given he/she smokes more than 5 cigarettes a day.

The coefficients differ across categories. You can specify this using the 'interactions','on' name-value pair argument, which is the default for hierarchical models. So, the model in this example is

ln(P(y=0)P(y>0))=43.8148+1.8709XS+0.0188XA+0.0046XW-0.2170XSBP-0.2273XDBP

ln(P(1≤y≤5)P(y>5))=5.9571-0.0230XS+0.0625XA-0.0072XW+0.0416XSBP-0.1449XDBP

ln(P(6≤y≤10)P(y>10))=44.0712+0.0662XS+0.1335XA-0.0130XW-0.0324XSBP-0.4824XDBP

For example, the coefficient estimate of 1.8709 indicates that the likelihood of being a smoker versus a nonsmoker increases by exp(1.8709) = 6.49 times as the gender changes from female to male given everything else held constant.

Assess the statistical significance of the terms.

stats.p

ans = 6×3

0.0000 0.5363 0.2149

0.3549 0.9912 0.9835

0.6850 0.2676 0.2313

0.9032 0.8523 0.8514

0.0009 0.5187 0.8165

0.0004 0.0483 0.0545

Sex, age, or weight don’t appear significant on any level. The p-values of 0.0009 and 0.0004 indicate that both types of blood pressure are significant on the relative risk of a person being a smoker versus a nonsmoker. The p-value of 0.0483 shows that only diastolic blood pressure is significant on the odds of a person smoking 0–5 cigarettes a day versus more than 5 cigarettes a day. Similarly, the p-value of 0.0545 indicates that diastolic blood pressure is significant on the odds of a person smoking 6–10 cigarettes a day versus more than 10 cigarettes a day.

Check if any nonsignificant factors are correlated to each other. Draw a scatterplot of age versus weight grouped by sex.

figure()

gscatter(smoking.Age,smoking.Weight,smoking.Sex)

legend('Male','Female')

xlabel('Age')

ylabel('Weight')

fc337a9f035d0bfd084768dfaca4c9e8.png

The range of weight of an individual seems to differ according to gender. Age does not seem to have any obvious correlation with sex or weight. Age is insignificant and weight seems to be correlated with sex, so you can eliminate both and reconstruct the model.

Eliminate age and weight from the model and fit a hierarchical model with sex, systolic blood pressure, and diastolic blood pressure as the predictor variables.

X = double([smoking.Sex smoking.SystolicBP...

smoking.DiastolicBP]);

[B,dev,stats] = mnrfit(X,Y,'model','hierarchical');

B

B = 4×3

44.8456 5.3230 25.0248

1.6045 0.2330 0.4982

-0.2161 0.0497 0.0179

-0.2222 -0.1358 -0.3092

Here, a coefficient estimate of 1.6045 indicates that the likelihood of being a nonsmoker versus a smoker increases by exp(1.6045) = 4.97 times as sex changes from male to female. A unit increase in the systolic blood pressure indicates an exp(–.2161) = 0.8056 decrease in the likelihood of being a nonsmoker versus a smoker. Similarly, a unit increase in the diastolic blood pressure indicates an exp(–.2222) = 0.8007 decrease in the relative rate of being a nonsmoker versus being a smoker.

Assess the statistical significance of the terms.

stats.p

ans = 4×3

0.0000 0.4715 0.2325

0.0210 0.7488 0.6362

0.0010 0.4107 0.8899

0.0003 0.0483 0.0718

The p-values of 0.0210, 0.0010, and 0.0003 indicate that the terms sex and both types of blood pressure are significant on the relative risk of a person being a nonsmoker versus a smoker, given the other terms in the model. Based on the p-value of 0.0483, diastolic blood pressure appears significant on the relative risk of a person smoking 1–5 cigarettes versus more than 5 cigarettes a day, given that this person is a smoker. Because none of the p-values on the third column are less than 0.05, you can say that none of the variables are statistically significant on the relative risk of a person smoking from 6–10 cigarettes versus more than 10 cigarettes, given that this person smokes more than 5 cigarettes a day.

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值