Analysis of Variance (ANOVA) for Multiple Comparisons

timothyzh

Source: http://www.r-bloggers.com/analysis-of-variance-anova-for-multiple-comparisons/

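An ANOVA model can be used to compare the means of several groups. It is a parametric method, meaning it assumes that the observations in each group follow a Gaussian distribution. Consider the following example: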

----------------------------------------------------

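The manager of a supermarket chain wants to see whether the electricity consumption (in kilowatts) of 4 stores is equal. He collects data at the end of each month for 6 months, with the following results: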

Store A: 65, 48, 66, 75, 70, 55
Store B: 64, 44, 70, 70, 68, 59
Store C: 60, 50, 65, 69, 69, 57
Store D: 62, 46, 68, 72, 67, 56

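Before using ANOVA, we must first check homoskedasticity, that is, the homogeneity of variances across the groups. R provides two tests for this: the Bartlett test and the Fligner-Killeen test.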
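Before running a formal test, it can help to look at the sample variances directly. The following is a minimal sketch; the names `stores` and `group_var` are introduced here for illustration and do not appear in the original post.

```r
# Monthly electricity consumption per store (data from the example above).
# `stores` is an illustrative name, not part of the original post.
stores <- list(
  a = c(65, 48, 66, 75, 70, 55),
  b = c(64, 44, 70, 70, 68, 59),
  c = c(60, 50, 65, 69, 69, 57),
  d = c(62, 46, 68, 72, 67, 56)
)

# Sample variance of each group
group_var <- sapply(stores, var)
print(round(group_var, 2))

# Ratio of the largest to the smallest variance; values close to 1
# suggest (informally) that the variances are similar
var_ratio <- max(group_var) / min(group_var)
print(round(var_ratio, 2))
```

This is only an informal check. The tests below give a proper decision rule.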

---------------------------------------------------

Let's start with the Bartlett test. First we create 4 vectors, then combine them into a single vector:

a = c(65, 48, 66, 75, 70, 55)
b = c(64, 44, 70, 70, 68, 59)
c = c(60, 50, 65, 69, 69, 57)
d = c(62, 46, 68, 72, 67, 56)
dati = c(a, b, c, d)

We also create a factor with 4 levels that labels which group each value in dati belongs to:

groups = factor(rep(letters[1:4], each = 6))
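As a quick sanity check on that factor, the sketch below inspects its levels and confirms that each group has 6 observations. The variable name `counts` is an illustrative addition.

```r
# rep(..., each = 6) repeats every letter 6 times in a row:
# a a a a a a b b b b b b ... d d d d d d
groups <- factor(rep(letters[1:4], each = 6))

print(levels(groups))    # "a" "b" "c" "d"

# Number of observations per level (a balanced design has equal counts)
counts <- table(groups)
print(counts)
```

The order of the labels must match the order in which the vectors were concatenated into dati, otherwise observations end up assigned to the wrong store.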

Now we can run the Bartlett test:

bartlett.test(dati, groups)

        Bartlett test of homogeneity of variances

data:  dati and groups 
Bartlett's K-squared = 0.4822, df = 3, p-value = 0.9228

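The function returns the value of the test statistic (K-squared) and the p-value. Since the p-value > 0.05, we cannot reject the hypothesis that the group variances are homogeneous. Alternatively, we can compare Bartlett's K-squared with the tabulated chi-square value, obtained with the function qchisq, whose inputs are the probability 1 - alpha (here 0.95, for alpha = 0.05) and the degrees of freedom: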

qchisq(0.950, 3)
[1] 7.814728

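The tabulated chi-squared value (7.81) is clearly greater than the Bartlett K-squared computed above (0.48), so we accept the null hypothesis H0 that the variances are homogeneous.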
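The comparison with the critical value can also be done programmatically by pulling the statistic and degrees of freedom out of the test result. This is a sketch under the same alpha = 0.05; the names `stores`, `bt`, `k_squared`, `critical`, and `reject_h0` are illustrative.

```r
stores <- list(
  a = c(65, 48, 66, 75, 70, 55),
  b = c(64, 44, 70, 70, 68, 59),
  c = c(60, 50, 65, 69, 69, 57),
  d = c(62, 46, 68, 72, 67, 56)
)
dati   <- unlist(stores, use.names = FALSE)
groups <- factor(rep(names(stores), each = 6))

bt <- bartlett.test(dati, groups)

k_squared <- unname(bt$statistic)   # Bartlett's K-squared
df        <- unname(bt$parameter)   # degrees of freedom (number of groups - 1)

alpha    <- 0.05
critical <- qchisq(1 - alpha, df)   # tabulated chi-squared value

# Reject H0 (homogeneous variances) only if the statistic exceeds the critical value
reject_h0 <- k_squared > critical

# The same decision via the upper-tail probability of the chi-squared distribution
p_value <- pchisq(k_squared, df, lower.tail = FALSE)

print(c(k_squared = k_squared, critical = critical, p_value = p_value))
print(reject_h0)
```

Both routes lead to the same decision: the statistic exceeds the critical value exactly when the p-value falls below alpha.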

-------------------------------------------------------------------

Now let's check homogeneity with the Fligner-Killeen test. The function call and the procedure are similar:

a = c(65, 48, 66, 75, 70, 55)
b = c(64, 44, 70, 70, 68, 59)
c = c(60, 50, 65, 69, 69, 57)
d = c(62, 46, 68, 72, 67, 56)

dati = c(a, b, c, d)

groups = factor(rep(letters[1:4], each = 6))

fligner.test(dati, groups)

        Fligner-Killeen test of homogeneity of variances

data:  dati and groups 
Fligner-Killeen:med chi-squared = 0.1316, df = 3, p-value = 0.9878

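The conclusion is the same as with the Bartlett test: the p-value is well above 0.05, so we do not reject the hypothesis of homogeneous variances.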
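Both tests also accept a formula and a data frame, which avoids keeping the values and the grouping factor in separate objects. The sketch below runs them side by side; the data frame `consumption` and its column names `kw` and `store` are assumptions made for this illustration.

```r
stores <- list(
  a = c(65, 48, 66, 75, 70, 55),
  b = c(64, 44, 70, 70, 68, 59),
  c = c(60, 50, 65, 69, 69, 57),
  d = c(62, 46, 68, 72, 67, 56)
)

# Long format: one row per observation (column names are illustrative)
consumption <- data.frame(
  kw    = unlist(stores, use.names = FALSE),
  store = factor(rep(names(stores), each = 6))
)

# Formula interface: response ~ grouping factor
bt <- bartlett.test(kw ~ store, data = consumption)
fl <- fligner.test(kw ~ store, data = consumption)

p_values <- c(bartlett = bt$p.value, fligner = fl$p.value)
print(round(p_values, 4))

# TRUE if neither test rejects homogeneity at the 5% level
homogeneous <- all(p_values > 0.05)
print(homogeneous)
```

The Fligner-Killeen test is rank-based and therefore less sensitive to departures from normality than the Bartlett test, so agreement between the two is reassuring.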

----------------------------------------------------------------------------------

Having verified the homogeneity of variances across the 4 groups, we can move on to the ANOVA model. First we fit the model:

fit = lm(formula = dati ~ groups)

Then we analyze the ANOVA model:

anova(fit)

Analysis of Variance Table

Response: dati
          Df  Sum Sq Mean Sq F value Pr(>F)
groups     3    8.46    2.82  0.0327 0.9918
Residuals 20 1726.50   86.33

The output of the function is a classic ANOVA table, with the following columns:

Df = degrees of freedom
Sum Sq = sum of squares (deviance): between groups in the groups row, within groups in the Residuals row
Mean Sq = mean square (variance estimate), i.e. Sum Sq / Df, for each row
F value = value of the Fisher test statistic, computed as (Mean Sq of groups) / (Mean Sq of Residuals)
Pr(>F) = p-value
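To make the columns concrete, the table can be reproduced by hand from the definitions above. This is a sketch for illustration only; the variable names (`ss_between`, `ss_within`, and so on) are not from the original post.

```r
stores <- list(
  a = c(65, 48, 66, 75, 70, 55),
  b = c(64, 44, 70, 70, 68, 59),
  c = c(60, 50, 65, 69, 69, 57),
  d = c(62, 46, 68, 72, 67, 56)
)
dati   <- unlist(stores, use.names = FALSE)
groups <- factor(rep(names(stores), each = 6))

grand_mean  <- mean(dati)
group_means <- tapply(dati, groups, mean)
group_sizes <- tapply(dati, groups, length)

# Sum Sq: between groups (groups row) and within groups (Residuals row)
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)   # about 8.46
ss_within  <- sum((dati - ave(dati, groups))^2)                 # 1726.5

# Df for each row
df_between <- nlevels(groups) - 1              # 3
df_within  <- length(dati) - nlevels(groups)   # 20

# Mean Sq = Sum Sq / Df, and F = ratio of the two mean squares
ms_between <- ss_between / df_between
ms_within  <- ss_within / df_within
f_value    <- ms_between / ms_within

# Pr(>F): upper-tail probability of the F distribution
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)

print(round(c(F = f_value, p = p_value), 4))
```

The results should agree with the values printed by anova(fit), up to rounding.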

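Since the p-value is greater than 0.05, we accept (more precisely, fail to reject) the null hypothesis H0 that the means of the 4 groups are statistically equal. We can also compare the computed F value with the tabulated F value: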