python混合线性模型_如何在Python statsmodels线性混合效果模型中具有多个组？-CSDN博客

本文链接：https://blog.csdn.net/weixin_29749887/article/details/111961043

博主在尝试使用Python的statsmodels库构建一个包含两个随机截距（主题组和场景组）的线性混合效果模型。然而，遇到了奇异矩阵错误。该问题在R的lme4包中可以正常运行。目前statsmodels不支持交叉随机效应，只能处理嵌套的随机效应。解决方案包括调整为嵌套模型或者使用R的lme4库进行分析。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

I am trying to use the Python statsmodels linear mixed effects model to fit a model that has two random intercepts, e.g. two groups. I cannot figure out how to initialize the model so that I can do this.

Here's the example. I have data that looks like the following (taken from here):

subject gender scenario attitude frequency

F1 F 1 pol 213.3

F1 F 1 inf 204.5

F1 F 2 pol 285.1

F1 F 2 inf 259.7

F1 F 3 pol 203.9

F1 F 3 inf 286.9

F1 F 4 pol 250.8

F1 F 4 inf 276.8

I want to make a linear mixed effects model with two random effects -- one for the subject group and one for the scenario group. I am trying to do this:

import statsmodels.api as sm

model = sm.MixedLM.from_formula("frequency ~ attitude + gender", data, groups=data[['subject', 'scenario']])

result = model.fit()

print result.summary()

I keep getting this error:

LinAlgError: Singular matrix

It works fine in R. When I use lme4 in R with the formula-based rendering it fits just fine:

politeness.model = lmer(frequency ~ attitude + gender +

(1|subject) + (1|scenario), data=politeness)

I don't understand why this is happening. It works when I use any one random effect/group, e.g.

model = sm.MixedLM.from_formula("frequency ~ attitude + gender", data, groups=data['subject'])

Then I get:

Mixed Linear Model Regression Results

===============================================================

Model: MixedLM Dependent Variable: frequency

No. Observations: 83 Method: REML

No. Groups: 6 Scale: 850.9456

Min. group size: 13 Likelihood: -393.3720

Max. group size: 14 Converged: Yes

Mean group size: 13.8

---------------------------------------------------------------

Coef. Std.Err. z P>|z| [0.025 0.975]

---------------------------------------------------------------

Intercept 256.785 15.226 16.864 0.000 226.942 286.629

attitude[T.pol] -19.415 6.407 -3.030 0.002 -31.972 -6.858

gender[T.M] -108.325 21.064 -5.143 0.000 -149.610 -67.041

Intercept RE 603.948 23.995

===============================================================

Alternatively, if I do:

model = sm.MixedLM.from_formula("frequency ~ attitude + gender", data, groups=data['scenario'])

This is the result I get:

Mixed Linear Model Regression Results

================================================================

Model: MixedLM Dependent Variable: frequency

No. Observations: 83 Method: REML

No. Groups: 7 Scale: 1110.3788

Min. group size: 11 Likelihood: -402.5003

Max. group size: 12 Converged: Yes

Mean group size: 11.9

----------------------------------------------------------------

Coef. Std.Err. z P>|z| [0.025 0.975]

----------------------------------------------------------------

Intercept 256.892 8.120 31.637 0.000 240.977 272.807

attitude[T.pol] -19.807 7.319 -2.706 0.007 -34.153 -5.462

gender[T.M] -108.603 7.319 -14.838 0.000 -122.948 -94.257

Intercept RE 182.718 5.502

================================================================

I have no idea what's going on. I feel like I am missing something foundational in the statistics of the problem.

解决方案

You are trying to fit a model with crossed random effects, i.e., you want to allow for consistent variation among subjects across scenarios as well as consistent variation among scenarios across subjects. You can use multiple random-effects terms in statsmodels, but they must be nested. Fitting crossed (as opposed to nested) random effects requires more sophisticated algorithms, and indeed the statsmodels documentation says (as of 25 Aug 2016, emphasis added):

Some limitations of the current implementation are that it does not support structure more complex on the residual errors (they are always homoscedastic), and it does not support crossed random effects. We hope to implement these features for the next release.

As far as I can see, your choices are (1) fall back to a nested model (i.e. fit the model as though either scenario is nested within subject or vice versa - or try both and see if the difference matters); (2) fall back to lme4, either within R or via rpy2.

As always, you're entitled to a full refund of the money you paid to use statsmodels ...