Overall Mean and Overall Variance in Stratified Random Sampling
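Introduction
In everyday work, study, and production we often know the mean and variance of the sample in each stratum and need the mean and variance of all the data combined. This article gives a general solution to this kind of problem, together with a proof.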
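Example 1
An interest group at a high school wants to find the mean height and variance for each grade, as well as for the student body as a whole. Using stratified random sampling, they drew:
- 100 first-year students, mean height 163 cm, variance 3.3 cm²;
- 90 second-year students, mean height 165 cm, variance 2.0 cm²;
- 110 third-year students, mean height 168 cm, variance 4.2 cm².
What are the mean height and the variance for all students in the school?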
Example 2
Five high schools in a city held a joint examination. The total-score statistics for each school are as follows:
| School No. | Share of sampled students | Mean total score | Variance of total score |
|---|---|---|---|
| 1 | 15% | 443 | 28 |
| 2 | 23% | 450 | 33 |
| 3 | 22% | 470 | 23 |
| 4 | 16% | 453 | 40 |
| 5 | 24% | 466 | 43 |
What are the mean and the variance of the total scores of the candidates from these five schools?
Problem statement
In stratified random sampling, we are given:
- the number of strata, $L$;
- the proportion of the sample in each stratum: $\omega_1,\omega_2,\dots,\omega_L$;
- the mean of each stratum: $\overline{x_1},\overline{x_2},\dots,\overline{x_L}$;
- the variance of each stratum: $s^2_1,s^2_2,\dots,s^2_L$.
Find: the overall mean $\overline{x}$ and the overall variance $s^2$.
Result
$$
\begin{align}
\overline{x} &= \sum\limits_{i=1}^{L}\omega_i \overline{x_i} \nonumber \\
s^2 &= \sum\limits_{i = 1}^{L} \omega_i \left[ s^2_i + (\overline{x_i} - \overline{x})^2 \right] \nonumber
\end{align}
$$
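The two formulas translate almost directly into code. The following is a minimal sketch rather than a reference implementation: the function name `stratified_mean_var` and its argument names are chosen here for illustration, and the stratum variances are assumed to use divisor $n_i$. The data are those of the five-school example above.

```python
from math import isclose


def stratified_mean_var(weights, means, variances):
    """Combine per-stratum means and variances into an overall mean and variance.

    weights   -- stratum proportions omega_i (expected to sum to 1)
    means     -- stratum means
    variances -- stratum variances (assumed to use divisor n_i)
    """
    if not isclose(sum(weights), 1.0, abs_tol=1e-9):
        raise ValueError("weights must sum to 1")
    # Overall mean: weighted average of the stratum means
    overall_mean = sum(w * m for w, m in zip(weights, means))
    # Overall variance: weighted sum of (stratum variance + squared mean offset)
    overall_var = sum(
        w * (v + (m - overall_mean) ** 2)
        for w, m, v in zip(weights, means, variances)
    )
    return overall_mean, overall_var


# Data from the five-school example
weights = [0.15, 0.23, 0.22, 0.16, 0.24]
means = [443, 450, 470, 453, 466]
variances = [28, 33, 23, 40, 43]

mean, var = stratified_mean_var(weights, means, variances)
print(f"mean = {mean:.2f}, variance = {var:.3f}")
# prints: mean = 457.67, variance = 132.971
```

The weight check is a defensive choice of this sketch: percentages that do not add up to 100% would silently bias both results.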
Derivation
Let:
- the sample size of each stratum be $n_1,n_2,\dots,n_L$;
- the $n_i$ observations in stratum $i$ be $x_{i1},x_{i2},\dots,x_{in_i}$.
Then:
- the total sample size is $N = \sum\limits_{i=1}^{L}n_i$;
- the proportion of the sample in stratum $i$ is $\omega_i = \frac{n_i}{N}$.
By the definition of the mean, and since the observations in stratum $i$ sum to $n_i\overline{x_i}$:
$$
\overline{x} = \frac{1}{N} \sum\limits_{i=1}^{L}n_i \overline{x_i} = \sum\limits_{i=1}^{L}\omega_i \overline{x_i}
$$
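As a quick check of the weighted-mean formula, here is the computation for the height example with sample sizes 100, 90, and 110, so that $N = 300$. Rounding to two decimals is a presentation choice made here.

$$
\begin{align}
\overline{x} &= \frac{1}{300}\left(100 \times 163 + 90 \times 165 + 110 \times 168\right) \nonumber \\
&= \frac{49630}{300} \approx 165.43 \nonumber
\end{align}
$$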
By the definition of the variance:
$$
\begin{align}
s^2 &= \frac{1}{N} \sum\limits_{i=1}^{L} \sum\limits_{j=1}^{n_i} (x_{ij} - \overline{x})^2 \nonumber \\
&= \frac{1}{N} \sum\limits_{i=1}^{L} \sum\limits_{j=1}^{n_i} (x_{ij} - \overline{x_i} + \overline{x_i} - \overline{x})^2 \nonumber \\
&= \frac{1}{N} \sum\limits_{i=1}^{L} \left[ \sum\limits_{j=1}^{n_i} (x_{ij} - \overline{x_i})^2 + \sum\limits_{j=1}^{n_i} 2(x_{ij} - \overline{x_i})(\overline{x_i} - \overline{x}) + \sum\limits_{j=1}^{n_i} (\overline{x_i} - \overline{x})^2 \right] \nonumber
\end{align}
$$
Since
$$
\overline{x_i} = \frac{\sum\limits_{j=1}^{n_i} x_{ij}}{n_i}
$$
and the factor $2(\overline{x_i} - \overline{x})$ does not depend on $j$, the cross term vanishes:
$$
\begin{align}
&\sum\limits_{j=1}^{n_i} 2(x_{ij} - \overline{x_i})(\overline{x_i} - \overline{x}) \nonumber \\
= &2(\overline{x_i} - \overline{x})\sum\limits_{j=1}^{n_i} (x_{ij} - \overline{x_i}) \nonumber \\
= &2(\overline{x_i} - \overline{x}) \left( \sum\limits_{j=1}^{n_i} x_{ij} - n_i \overline{x_i} \right) \nonumber \\
= &0 \nonumber
\end{align}
$$
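The vanishing cross term is easy to confirm numerically. The sketch below uses made-up values for a single stratum and an arbitrary overall mean, neither of which comes from the examples above. It shows that the deviations from the stratum mean sum to zero, so the cross term is zero whatever the overall mean is.

```python
from statistics import fmean

# Hypothetical observations for one stratum and a hypothetical overall mean
stratum = [161.0, 164.5, 162.0, 166.5]
overall_mean = 165.0

stratum_mean = fmean(stratum)

# Sum of deviations from the stratum's own mean
deviation_sum = sum(x - stratum_mean for x in stratum)

# Cross term from the expansion of the squared deviation
cross_term = sum(
    2 * (x - stratum_mean) * (stratum_mean - overall_mean) for x in stratum
)

print(deviation_sum, cross_term)
```

With other data the two sums may differ from zero by floating-point rounding error, so a tolerance is the safer comparison.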
Therefore, using $\sum\limits_{j=1}^{n_i} (x_{ij} - \overline{x_i})^2 = n_i s_i^2$ (each stratum variance $s_i^2$ is computed with divisor $n_i$):
$$
\begin{align}
s^2 &= \frac{1}{N} \sum\limits_{i=1}^{L} \left[ \sum\limits_{j=1}^{n_i} (x_{ij} - \overline{x_i})^2 + \sum\limits_{j=1}^{n_i} (\overline{x_i} - \overline{x})^2 \right] \nonumber \\
&= \frac{1}{N} \sum\limits_{i=1}^{L} \left[ n_i s_i^2 + n_i (\overline{x_i} - \overline{x})^2 \right] \nonumber \\
&= \sum\limits_{i=1}^{L} \left[ \frac{n_i}{N} \cdot s_i^2 + \frac{n_i}{N} \cdot (\overline{x_i} - \overline{x})^2 \right] \nonumber \\
&= \sum\limits_{i = 1}^{L} \omega_i \left[ s^2_i + (\overline{x_i} - \overline{x})^2 \right] \nonumber
\end{align}
$$
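One way to gain confidence in the result is to compare it with a direct computation on pooled raw data. The sketch below uses small made-up strata, which are not taken from the examples. It assumes the divisor-$N$ variance used in the derivation, which is what `statistics.pvariance` computes.

```python
from statistics import fmean, pvariance

# Hypothetical raw observations for three strata of different sizes
strata = [
    [160.0, 163.0, 166.0],
    [164.0, 165.0, 166.0, 167.0],
    [166.0, 168.0, 170.0, 172.0, 169.0],
]

# Direct computation on the pooled data (divisor N)
pooled = [x for stratum in strata for x in stratum]
direct_mean = fmean(pooled)
direct_var = pvariance(pooled)

# Computation from per-stratum summaries only
N = len(pooled)
weights = [len(s) / N for s in strata]
means = [fmean(s) for s in strata]
variances = [pvariance(s) for s in strata]

formula_mean = sum(w * m for w, m in zip(weights, means))
formula_var = sum(
    w * (v + (m - formula_mean) ** 2)
    for w, m, v in zip(weights, means, variances)
)

print(direct_mean, formula_mean)
print(direct_var, formula_var)
```

The two routes should agree up to floating-point rounding. If the stratum variances were computed with divisor $n_i - 1$ instead, the identity would no longer hold exactly.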
Worked example
Taking Example 2:
$$
\begin{align}
\overline{x} &= 15\% \times 443 + 23\% \times 450 + 22\% \times 470 + 16\% \times 453 + 24\% \times 466 \nonumber \\
&= 457.67 \nonumber \\
s^2 &= 15\% \times [28 + (443 - 457.67)^2] \nonumber \\
&\quad + 23\% \times [33 + (450 - 457.67)^2] \nonumber \\
&\quad + 22\% \times [23 + (470 - 457.67)^2] \nonumber \\
&\quad + 16\% \times [40 + (453 - 457.67)^2] \nonumber \\
&\quad + 24\% \times [43 + (466 - 457.67)^2] \nonumber \\
&= 132.971 \nonumber
\end{align}
$$
Therefore, the overall mean of the scores at these five schools is 457.67 and the variance is 132.971.
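The height example can be handled the same way, starting from sample sizes rather than percentages. The sketch below is one possible arrangement, not the only one. It splits the overall variance into a within-stratum part $\sum \omega_i s_i^2$ and a between-stratum part $\sum \omega_i (\overline{x_i} - \overline{x})^2$, which is just the result formula with its sum separated. The variable names are illustrative.

```python
# Data from the height example: (sample size, mean height in cm, variance)
grades = [(100, 163, 3.3), (90, 165, 2.0), (110, 168, 4.2)]

N = sum(n for n, _, _ in grades)

# Overall mean: size-weighted average of the grade means
overall_mean = sum(n * m for n, m, _ in grades) / N

# Within-stratum part: size-weighted average of the grade variances
within = sum(n * v for n, _, v in grades) / N

# Between-stratum part: size-weighted squared offsets of the grade means
between = sum(n * (m - overall_mean) ** 2 for n, m, _ in grades) / N

overall_var = within + between

print(f"mean = {overall_mean:.2f} cm")
print(f"variance = {overall_var:.2f} (within {within:.2f} + between {between:.2f})")
```

Under these inputs the mean comes out at about 165.43 cm and the variance at about 7.69. More than half of that variance comes from the differences between the grade means.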