matlab mnrfit,Multinomial logistic regression

最新推荐文章于 2021-04-09 05:26:34 发布

GarryTonon

最新推荐文章于 2021-04-09 05:26:34 发布

阅读量365

点赞数

文章标签： matlab mnrfit

Fit a hierarchical multinomial regression model.

Load the sample data.

load('smoking.mat');

The data set smoking contains five variables: sex, age, weight, and systolic and diastolic blood pressure. Sex is a binary variable where 1 indicates female patients, and 0 indicates male patients.

Define the response variable.

Y = categorical(smoking.Smoker);

The data in Smoker has four categories:

0: Nonsmoker, 0 cigarettes a day

1: Smoker, 1–5 cigarettes a day

2: Smoker, 6–10 cigarettes a day

3: Smoker, 11 or more cigarettes a day

Define the predictor variables.

X = [smoking.Sex smoking.Age smoking.Weight...

smoking.SystolicBP smoking.DiastolicBP];

Fit a hierarchical multinomial model.

[B,dev,stats] = mnrfit(X,Y,'model','hierarchical');

B = 6×3

43.8148 5.9571 44.0712

1.8709 -0.0230 0.0662

0.0188 0.0625 0.1335

0.0046 -0.0072 -0.0130

-0.2170 0.0416 -0.0324

-0.2273 -0.1449 -0.4824

The first column of B includes the intercept and the coefficient estimates for the model of the relative risk of being a nonsmoker versus a smoker. The second column includes the parameter estimates for modeling the log odds of smoking 1–5 cigarettes a day versus more than five cigarettes a day given that a person is a smoker. Finally, the third column includes the parameter estimates for modeling the log odds of a person smoking 6–10 cigarettes a day versus more than 10 cigarettes a day given he/she smokes more than 5 cigarettes a day.

The coefficients differ across categories. You can specify this using the 'interactions','on' name-value pair argument, which is the default for hierarchical models. So, the model in this example is

ln(P(y=0)P(y>0))=43.8148+1.8709XS+0.0188XA+0.0046XW-0.2170XSBP-0.2273XDBP

ln(P(1≤y≤5)P(y>5))=5.9571-0.0230XS+0.0625XA-0.0072XW+0.0416XSBP-0.1449XDBP

ln(P(6≤y≤10)P(y>10))=44.0712+0.0662XS+0.1335XA-0.0130XW-0.0324XSBP-0.4824XDBP

For example, the coefficient estimate of 1.8709 indicates that the likelihood of being a smoker versus a nonsmoker increases by exp(1.8709) = 6.49 times as the gender changes from female to male given everything else held constant.

Assess the statistical significance of the terms.

stats.p

ans = 6×3

0.0000 0.5363 0.2149

0.3549 0.9912 0.9835

0.6850 0.2676 0.2313

0.9032 0.8523 0.8514

0.0009 0.5187 0.8165

0.0004 0.0483 0.0545

Sex, age, or weight don’t appear significant on any level. The p-values of 0.0009 and 0.0004 indicate that both types of blood pressure are significant on the relative risk of a person being a smoker versus a nonsmoker. The p-value of 0.0483 shows that only diastolic blood pressure is significant on the odds of a person smoking 0–5 cigarettes a day versus more than 5 cigarettes a day. Similarly, the p-value of 0.0545 indicates that diastolic blood pressure is significant on the odds of a person smoking 6–10 cigarettes a day versus more than 10 cigarettes a day.

Check if any nonsignificant factors are correlated to each other. Draw a scatterplot of age versus weight grouped by sex.

figure()

gscatter(smoking.Age,smoking.Weight,smoking.Sex)

legend('Male','Female')

xlabel('Age')

ylabel('Weight')

The range of weight of an individual seems to differ according to gender. Age does not seem to have any obvious correlation with sex or weight. Age is insignificant and weight seems to be correlated with sex, so you can eliminate both and reconstruct the model.

Eliminate age and weight from the model and fit a hierarchical model with sex, systolic blood pressure, and diastolic blood pressure as the predictor variables.

X = double([smoking.Sex smoking.SystolicBP...

smoking.DiastolicBP]);

[B,dev,stats] = mnrfit(X,Y,'model','hierarchical');

B = 4×3

44.8456 5.3230 25.0248

1.6045 0.2330 0.4982

-0.2161 0.0497 0.0179

-0.2222 -0.1358 -0.3092

Here, a coefficient estimate of 1.6045 indicates that the likelihood of being a nonsmoker versus a smoker increases by exp(1.6045) = 4.97 times as sex changes from male to female. A unit increase in the systolic blood pressure indicates an exp(–.2161) = 0.8056 decrease in the likelihood of being a nonsmoker versus a smoker. Similarly, a unit increase in the diastolic blood pressure indicates an exp(–.2222) = 0.8007 decrease in the relative rate of being a nonsmoker versus being a smoker.

Assess the statistical significance of the terms.

stats.p

ans = 4×3

0.0000 0.4715 0.2325

0.0210 0.7488 0.6362

0.0010 0.4107 0.8899

0.0003 0.0483 0.0718

The p-values of 0.0210, 0.0010, and 0.0003 indicate that the terms sex and both types of blood pressure are significant on the relative risk of a person being a nonsmoker versus a smoker, given the other terms in the model. Based on the p-value of 0.0483, diastolic blood pressure appears significant on the relative risk of a person smoking 1–5 cigarettes versus more than 5 cigarettes a day, given that this person is a smoker. Because none of the p-values on the third column are less than 0.05, you can say that none of the variables are statistically significant on the relative risk of a person smoking from 6–10 cigarettes versus more than 10 cigarettes, given that this person smokes more than 5 cigarettes a day.

GarryTonon

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
matlab mnrfit,Multinomial logistic regression

Fit a hierarchical multinomial regression model.Load the sample data.load('smoking.mat');The data set smoking contains five variables: sex, age, weight, and systolic and diastolic blood pressure. Sex ...
复制链接

扫一扫