Degrees of freedom

Latest recommended article published 2024-04-21 11:36:12

chiechie

Category: Statistics. Tags: Algorithms

Article link: https://blog.csdn.net/chiechie/article/details/48174869

Degrees of freedom in linear models
The one-sample problems that lead to the t and chi-squared distributions are the simplest examples in which degrees of freedom arise. However, similar geometry and vector decompositions underlie much of the theory of linear models, including linear regression and analysis of variance. An explicit example based on the comparison of three means is presented here; the geometry of linear models is discussed in more detail by Christensen (2002).[4]

Suppose independent observations are made for three populations: $X_1,\ldots,X_n$, $Y_1,\ldots,Y_n$ and $Z_1,\ldots,Z_n$. The restriction to three groups and equal sample sizes simplifies notation, but the ideas are easily generalized.

The observations can be decomposed as

$$\begin{aligned} X_i &= \bar{M} + (\bar{X}-\bar{M}) + (X_i-\bar{X})\\ Y_i &= \bar{M} + (\bar{Y}-\bar{M}) + (Y_i-\bar{Y})\\ Z_i &= \bar{M} + (\bar{Z}-\bar{M}) + (Z_i-\bar{Z}) \end{aligned}$$
where $\bar{X}, \bar{Y}, \bar{Z}$ are the means of the individual samples, and $\bar{M}=(\bar{X}+\bar{Y}+\bar{Z})/3$ is the mean of all 3n observations. In vector notation this decomposition can be written as

$$\begin{pmatrix} X_1 \\ \vdots \\ X_n \\ Y_1 \\ \vdots \\ Y_n \\ Z_1 \\ \vdots \\ Z_n \end{pmatrix} = \bar{M}\begin{pmatrix}1 \\ \vdots \\ 1 \\ 1 \\ \vdots \\ 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} + \begin{pmatrix}\bar{X}-\bar{M}\\ \vdots \\ \bar{X}-\bar{M} \\ \bar{Y}-\bar{M}\\ \vdots \\ \bar{Y}-\bar{M} \\ \bar{Z}-\bar{M}\\ \vdots \\ \bar{Z}-\bar{M} \end{pmatrix} + \begin{pmatrix} X_1-\bar{X} \\ \vdots \\ X_n-\bar{X} \\ Y_1-\bar{Y} \\ \vdots \\ Y_n-\bar{Y} \\ Z_1-\bar{Z} \\ \vdots \\ Z_n-\bar{Z} \end{pmatrix}.$$
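
The decomposition above can be checked numerically. Below is a minimal pure-Python sketch, assuming three equal-length samples given as lists; the helper names (`mean`, `dot`, `decompose`) and the sample values are hypothetical, chosen only for illustration. It builds the three right-hand-side vectors and checks that they add back to the observation vector and are mutually orthogonal, which is the geometry the degrees-of-freedom argument relies on.

```python
from math import isclose

# Hypothetical helper names and made-up data, for illustration only.
def mean(values):
    return sum(values) / len(values)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def decompose(x, y, z):
    """Split the stacked observations into mean, treatment and residual vectors."""
    groups = [x, y, z]
    n = len(x)
    assert all(len(g) == n for g in groups), "equal sample sizes assumed"
    group_means = [mean(g) for g in groups]
    # With equal sample sizes this equals the mean of all 3n observations.
    grand_mean = mean(group_means)

    observations = [v for g in groups for v in g]
    mean_vec = [grand_mean] * (3 * n)
    # Each group's deviation X-bar minus M-bar, repeated n times.
    treatment_vec = [gm - grand_mean for gm in group_means for _ in range(n)]
    # Each observation minus its own group mean.
    residual_vec = [v - gm for g, gm in zip(groups, group_means) for v in g]
    return observations, mean_vec, treatment_vec, residual_vec

x = [4.1, 5.0, 6.2, 5.5]
y = [6.8, 7.1, 5.9, 7.4]
z = [3.9, 4.4, 5.1, 4.0]
obs, m, t, r = decompose(x, y, z)

# The three components add back to the observation vector ...
print(all(isclose(o, a + b + c) for o, a, b, c in zip(obs, m, t, r)))  # → True
# ... and are mutually orthogonal (dot products vanish up to rounding).
print(max(abs(dot(m, t)), abs(dot(m, r)), abs(dot(t, r))) < 1e-9)  # → True
```

Orthogonality is what lets each component be treated as living in its own subspace, so the dimensions of those subspaces can be counted separately.
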
The observation vector on the left-hand side has 3n degrees of freedom. On the right-hand side, the first vector has one degree of freedom (or dimension), for the overall mean. The second vector depends on three random variables, $\bar{X}-\bar{M}$, $\bar{Y}-\bar{M}$ and $\bar{Z}-\bar{M}$. However, these are constrained to sum to 0, because $(\bar{X}-\bar{M})+(\bar{Y}-\bar{M})+(\bar{Z}-\bar{M}) = \bar{X}+\bar{Y}+\bar{Z}-3\bar{M} = 0$ by the definition of $\bar{M}$; the vector therefore lies in a 2-dimensional subspace and has 2 degrees of freedom. The remaining 3n − 3 degrees of freedom are in the residual vector: within each population the n residuals sum to 0, leaving n − 1 degrees of freedom per population.
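
One way to verify these counts, an approach assumed here rather than spelled out in the text, is to write each component as an orthogonal projection of the observation vector. The degrees of freedom of a component are then the rank of its projection matrix, and for a projection matrix the rank equals the trace. The sketch below builds the three projection matrices for a small n with exact `Fraction` arithmetic; the helper names (`matmul`, `subtract`, `trace`, `projections`) are hypothetical.

```python
from fractions import Fraction

# Hypothetical helper names; plain nested lists stand in for matrices.
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def subtract(a, b):
    return [[p - q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def projections(n, groups=3):
    size = groups * n
    group_of = [i // n for i in range(size)]
    # Projection onto the constant vector: maps the data to M-bar times the ones vector.
    p_mean = [[Fraction(1, size)] * size for _ in range(size)]
    # Projection onto the group indicators: replaces each entry by its group mean.
    p_groups = [[Fraction(1, n) if group_of[i] == group_of[j] else Fraction(0)
                 for j in range(size)] for i in range(size)]
    identity = [[Fraction(int(i == j)) for j in range(size)] for i in range(size)]
    return {
        "mean": p_mean,
        "treatment": subtract(p_groups, p_mean),  # group means minus grand mean
        "residual": subtract(identity, p_groups),  # observations minus group means
    }

n = 4
for name, p in projections(n).items():
    assert matmul(p, p) == p  # idempotent, so p is a projection
    print(name, trace(p))     # trace = rank = degrees of freedom
```

For n = 4 the loop prints traces 1, 2 and 9, matching 1, 2 and 3n − 3. Exact fractions are used so the idempotence check is an equality test rather than a floating-point tolerance.
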

Sum of squares and degrees of freedom

In statistical testing problems, one is usually interested not in the component vectors themselves but in their squared lengths, or sums of squares. The degrees of freedom associated with a sum of squares are the degrees of freedom of the corresponding component vector.

The three-population example above is a one-way analysis of variance (ANOVA). The model, or treatment, sum of squares is the squared length of the second vector,

$\text{SSTr} = n(\bar{X}-\bar{M})^2 + n(\bar{Y}-\bar{M})^2 + n(\bar{Z}-\bar{M})^2$

with 2 degrees of freedom. The residual, or error, sum of squares is the squared length of the third (residual) vector,
$\text{SSE} = \sum_{i=1}^n (X_i-\bar{X})^2 + \sum_{i=1}^n (Y_i-\bar{Y})^2 + \sum_{i=1}^n (Z_i-\bar{Z})^2$
with 3(n−1) degrees of freedom. Introductory books on ANOVA usually state these formulae without showing the vectors, but it is the underlying geometry that gives rise to the sum-of-squares formulae and shows how to determine the degrees of freedom unambiguously in any given situation.
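
Because the component vectors are orthogonal, their squared lengths add (Pythagoras): the total sum of squares about the grand mean equals SSTr + SSE. A minimal sketch, with made-up data and a hypothetical function name `anova_sums_of_squares`, that computes SSTr and SSE from the formulas above and checks this identity:

```python
from math import isclose

# Hypothetical function name and made-up data, for illustration only.
def anova_sums_of_squares(x, y, z):
    groups = [x, y, z]
    n = len(x)
    group_means = [sum(g) / n for g in groups]
    grand_mean = sum(group_means) / 3  # equal sample sizes assumed
    # Squared length of the treatment vector: n copies of each squared deviation.
    sstr = sum(n * (gm - grand_mean) ** 2 for gm in group_means)
    # Squared length of the residual vector.
    sse = sum((v - gm) ** 2 for g, gm in zip(groups, group_means) for v in g)
    # Total sum of squares about the grand mean.
    sst = sum((v - grand_mean) ** 2 for g in groups for v in g)
    return sstr, sse, sst

x = [4.1, 5.0, 6.2, 5.5]
y = [6.8, 7.1, 5.9, 7.4]
z = [3.9, 4.4, 5.1, 4.0]
sstr, sse, sst = anova_sums_of_squares(x, y, z)
n = len(x)
print(f"SSTr = {sstr:.4f} on 2 df, SSE = {sse:.4f} on {3 * (n - 1)} df")
print(isclose(sst, sstr + sse))  # → True
```

For instance, with groups [1, 2, 3], [4, 5, 6] and [7, 8, 9] the function returns SSTr = 54, SSE = 6 and a total of 60.
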

Under the null hypothesis of no difference between population means (and assuming the standard ANOVA assumptions hold: independent, normally distributed errors with a common variance), the sums of squares have scaled chi-squared distributions with the corresponding degrees of freedom. The F-test statistic is the ratio of the two sums of squares after each is divided by its degrees of freedom, $F = \frac{\text{SSTr}/2}{\text{SSE}/(3n-3)}$. If there is no difference between population means, this ratio follows an F distribution with 2 and 3n − 3 degrees of freedom.
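
A sketch of the F test for this three-group case, using made-up data and hypothetical helper names. With three groups the numerator has 2 degrees of freedom, and for that special case the F upper-tail probability has the closed form $P(F_{2,d} > f) = (1 + 2f/d)^{-d/2}$, so no statistics library is needed; for other numerator degrees of freedom one would instead use a library routine such as SciPy's `scipy.stats.f.sf`. The simulation at the end draws data with equal population means and checks that the test rejects at the 5% level in roughly 5% of repetitions.

```python
import random

# Hypothetical helper names and made-up data, for illustration only.
def f_statistic(x, y, z):
    """One-way ANOVA F for three equal-size groups: (SSTr/2) / (SSE/(3n-3))."""
    groups = [x, y, z]
    n = len(x)
    group_means = [sum(g) / n for g in groups]
    grand_mean = sum(group_means) / 3
    sstr = sum(n * (gm - grand_mean) ** 2 for gm in group_means)
    sse = sum((v - gm) ** 2 for g, gm in zip(groups, group_means) for v in g)
    df_treatment, df_error = 2, 3 * n - 3
    return (sstr / df_treatment) / (sse / df_error), df_error

def f_pvalue_two_numerator_df(f, df_error):
    """Upper-tail probability of F(2, df_error); closed form valid only for 2 numerator df."""
    return (1 + 2 * f / df_error) ** (-df_error / 2)

x = [4.1, 5.0, 6.2, 5.5]
y = [6.8, 7.1, 5.9, 7.4]
z = [3.9, 4.4, 5.1, 4.0]
f, df_error = f_statistic(x, y, z)
print(f"F = {f:.3f} on 2 and {df_error} df, p = {f_pvalue_two_numerator_df(f, df_error):.4g}")

# Under the null hypothesis (equal means, normal errors with a common variance),
# p < 0.05 should occur in roughly 5% of repeated experiments.
rng = random.Random(0)
n, reps = 5, 20000
rejections = 0
for _ in range(reps):
    sample = [[rng.gauss(10.0, 2.0) for _ in range(n)] for _ in range(3)]
    f_null, d = f_statistic(*sample)
    rejections += f_pvalue_two_numerator_df(f_null, d) < 0.05
print(f"empirical rejection rate: {rejections / reps:.3f}")
```

The closed form is used only because the numerator degrees of freedom equal 2 here; it is not a general replacement for an F-distribution routine.
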

In some more complicated settings, such as unbalanced split-plot designs, the sums of squares no longer have scaled chi-squared distributions. Comparing a sum of squares with its degrees of freedom is then no longer meaningful, and software may report fractional ‘degrees of freedom’ in these cases. Such numbers have no genuine degrees-of-freedom interpretation; they simply define an approximate chi-squared distribution for the corresponding sum of squares. The details of such approximations are beyond the scope of this page.
