方差分析
# 配置stata17在jupyter 配置
import stata_setup
stata_setup.config(r"D:\Stata17\setup",'se')
1 单因素方差分析
from pystata import stata
# 查看单因素方差分析命令介绍
stata.run("help(oneway)")
stata.run("webuse apple")
(Apple trees)
# 描述统计
%stata sum
# 或者stata.run("sum")
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
treatment | 10 2.3 1.159502 1 4
weight | 10 80.62 25.36212 48.9 117.5
# 频率表
%stata tab weight treatment
# 或者stata.run("tab weight treatment")
Average |
weight in | Fertilizer
grams | 1 2 3 4 | Total
-----------+--------------------------------------------+----------
48.9 | 0 1 0 0 | 1
50.4 | 0 1 0 0 | 1
58.9 | 0 1 0 0 | 1
67.3 | 0 0 0 1 | 1
70.4 | 0 0 1 0 | 1
86.9 | 0 0 1 0 | 1
87.7 | 0 0 0 1 | 1
104.4 | 1 0 0 0 | 1
113.8 | 1 0 0 0 | 1
117.5 | 1 0 0 0 | 1
-----------+--------------------------------------------+----------
Total | 3 3 2 2 | 10
# 散点图
%stata scatter weight treatment
[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-vjtjpRBH-1683949793570)(output_7_0.svg)]
方差分析原假设为
H 0 H_0 H0: treatment水平对 weight没有影响;
H 1 H_1 H1: treatment水平对 weight有影响;
构建F统计量
F
=
M
S
A
M
S
E
=
S
S
A
/
(
r
−
1
)
S
S
E
/
(
n
−
r
)
∼
F
(
r
−
1
,
n
−
r
)
\mathrm{F}=\frac{\mathrm{MSA}}{\mathrm{MSE}}=\frac{\mathrm{SSA} /(\mathrm{r}-1)}{\mathrm{SSE} /(\mathrm{n}-\mathrm{r})} \sim \mathrm{F}(\mathrm{r}-1, \mathrm{n}-\mathrm{r})
F=MSEMSA=SSE/(n−r)SSA/(r−1)∼F(r−1,n−r)
其中MSA表示组间均方差,包括组间差异与随机差异。MSE表示水平均均方差,当二者差距不大时,表明结果的差异性主要来随机性差异,反之为系统性差异(水平对结果存在干扰)
# 单因素方差分析
stata.run("oneway weight treatment")
Analysis of variance
Source SS df MS F Prob > F
------------------------------------------------------------------------
Between groups 5295.54433 3 1765.18144 21.46 0.0013
Within groups 493.591667 6 82.2652778
------------------------------------------------------------------------
Total 5789.136 9 643.237333
Bartlett's equal-variances test: chi2(3) = 1.3900 Prob>chi2 = 0.708
# 加入tabulate参数可以查看水平内weight均值和标准误等信息
stata.run("oneway weight treatment,tabulate")
| Summary of Average weight in grams
Fertilizer | Mean Std. dev. Freq.
------------+------------------------------------
1 | 111.9 6.7535176 3
2 | 52.733333 5.3928966 3
3 | 78.65 11.667262 2
4 | 77.5 14.424978 2
------------+------------------------------------
Total | 80.62 25.362124 10
Analysis of variance
Source SS df MS F Prob > F
------------------------------------------------------------------------
Between groups 5295.54433 3 1765.18144 21.46 0.0013
Within groups 493.591667 6 82.2652778
------------------------------------------------------------------------
Total 5789.136 9 643.237333
Bartlett's equal-variances test: chi2(3) = 1.3900 Prob>chi2 = 0.708
F值为21.46,p值非常小,说明treat对weight又影响。由于方差分析要保证同方差假定,最后结果表明equal-variance伴随概率为0.708不拒绝原假设,即同方差,结果可靠
2 多因素方差分析
# 查看anova用法
%stata help(anova)
多因素方差分析原假设为
H 0 H_0 H0: 水平A或B对 weight没有影响;
H 1 H_1 H1:水平A或B对 weight有影响;
构建F统计量
F
A
=
M
S
A
M
S
E
=
S
S
A
/
(
r
−
1
)
S
S
E
/
(
n
−
r
−
k
+
1
)
∼
F
(
r
−
1
,
n
−
r
−
k
+
1
)
F
B
=
M
S
B
M
S
E
=
S
S
B
/
(
k
−
1
)
S
S
E
/
(
n
−
r
−
k
+
1
)
∼
F
(
k
−
1
,
n
−
r
−
k
+
1
)
\begin{aligned} &\mathrm{F}_{\mathrm{A}}=\frac{\mathrm{MSA}}{\mathrm{MSE}}=\frac{\mathrm{SSA} /(\mathrm{r}-1)}{\mathrm{SSE} /(\mathrm{n}-\mathrm{r}-\mathrm{k}+1)} \sim \mathrm{F}(\mathrm{r}-1, \mathrm{n}-\mathrm{r}-\mathrm{k}+1) \\ &\mathrm{F}_{\mathrm{B}}=\frac{\mathrm{MSB}}{\mathrm{MSE}}=\frac{\mathrm{SSB} /(\mathrm{k}-1)}{\mathrm{SSE} /(\mathrm{n}-\mathrm{r}-\mathrm{k}+1)} \sim \mathrm{F}(\mathrm{k}-1, \mathrm{n}-\mathrm{r}-\mathrm{k}+1) \end{aligned}
FA=MSEMSA=SSE/(n−r−k+1)SSA/(r−1)∼F(r−1,n−r−k+1)FB=MSEMSB=SSE/(n−r−k+1)SSB/(k−1)∼F(k−1,n−r−k+1)
# 单个因素情形
stata.run("""
webuse systolic
anova systolic drug
""")
.
. webuse systolic
(Systolic blood pressure data)
. anova systolic drug
Number of obs = 58 R-squared = 0.3355
Root MSE = 10.7211 Adj R-squared = 0.2985
Source | Partial SS df MS F Prob>F
-----------+----------------------------------------------------
Model | 3133.2385 3 1044.4128 9.09 0.0001
|
drug | 3133.2385 3 1044.4128 9.09 0.0001
|
Residual | 6206.9167 54 114.9429
-----------+----------------------------------------------------
Total | 9340.1552 57 163.86237
.
# 双因素方差分析
%stata anova systolic drug disease
Number of obs = 58 R-squared = 0.3803
Root MSE = 10.5503 Adj R-squared = 0.3207
Source | Partial SS df MS F Prob>F
-----------+----------------------------------------------------
Model | 3552.0722 5 710.41445 6.38 0.0001
|
drug | 3063.4329 3 1021.1443 9.17 0.0001
disease | 418.83374 2 209.41687 1.88 0.1626
|
Residual | 5788.0829 52 111.30929
-----------+----------------------------------------------------
Total | 9340.1552 57 163.86237
# 交互项情形
%stata anova systolic drug##disease
Number of obs = 58 R-squared = 0.4560
Root MSE = 10.5096 Adj R-squared = 0.3259
Source | Partial SS df MS F Prob>F
-------------+----------------------------------------------------
Model | 4259.3385 11 387.21259 3.51 0.0013
|
drug | 2997.4719 3 999.15729 9.05 0.0001
disease | 415.87305 2 207.93652 1.88 0.1637
drug#disease | 707.26626 6 117.87771 1.07 0.3958
|
Residual | 5080.8167 46 110.45254
-------------+----------------------------------------------------
Total | 9340.1552 57 163.86237
# 三因素情形
stata.run("""
webuse manuf
anova yield temp chem temp#chem meth temp#meth chem#meth temp#chem#meth
""")
.
. webuse manuf
(Manufacturing process data)
. anova yield temp chem temp#chem meth temp#meth chem#meth temp#chem#meth
Number of obs = 36 R-squared = 0.5474
Root MSE = 2.62996 Adj R-squared = 0.3399
Source | Partial SS df MS F Prob>F
----------------------+----------------------------------------------------
Model | 200.75 11 18.25 2.64 0.0227
|
temperature | 30.5 2 15.25 2.20 0.1321
chemical | 12.25 1 12.25 1.77 0.1958
temperature#chemical | 24.5 2 12.25 1.77 0.1917
method | 42.25 1 42.25 6.11 0.0209
temperature#method | 87.5 2 43.75 6.33 0.0062
chemical#method | .25 1 .25 0.04 0.8508
temperature#chemical# |
method | 3.5 2 1.75 0.25 0.7785
|
Residual | 166 24 6.9166667
----------------------+----------------------------------------------------
Total | 366.75 35 10.478571
.
更多详情help(anova)