Intuition from some examples
Given a random variable $X$ that follows some distribution, we can estimate its mean or variance by sampling it. Suppose we draw $N$ samples; the sample mean is then

$$\overline{X}=\frac{1}{N}\sum_{n=1}^N X_n.$$
As $N$ increases, we would expect $\overline{X}$ to get closer and closer to the true mean $\mathbb{E}[X]$ and eventually equal it. But simulation results indicate otherwise.
Example 1: Bernoulli distribution $X\sim \text{Bernoulli}(p=0.6)$. As $N$ grows, the curve of the sample mean is shown in the figure below.
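As we can see, the sample mean does appear to converge to $p=0.6$ in the end. But if we zoom in, we find that the curve is actually still fluctuating; that is, it does not settle on a single point.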
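The zoomed-in behaviour described above can be reproduced with a small simulation. This is only a sketch: the function name `running_sample_mean`, the seed, and the choice of 100,000 samples are assumptions made for illustration, not the settings used for the original figure.

```python
import random

def running_sample_mean(p, n_samples, seed=0):
    """Return the sample means after 1, 2, ..., n_samples Bernoulli(p) draws."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) for reproducibility
    total = 0
    means = []
    for n in range(1, n_samples + 1):
        total += 1 if rng.random() < p else 0  # one Bernoulli(p) draw
        means.append(total / n)                # sample mean of the first n draws
    return means

means = running_sample_mean(p=0.6, n_samples=100_000)
tail = means[-1000:]  # "zoom in" on the last 1000 values of the curve

print("final sample mean:", means[-1])
print("spread over the last 1000 steps:", max(tail) - min(tail))
```

The final value should sit close to 0.6, while the spread over the tail stays strictly positive: every new draw moves the running mean a little.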
Example 2: Gaussian distribution $X\sim \mathcal{N}(0,1)$. The curve of the sample mean:
As we can see, the sample mean is still fluctuating after $10^6$ samples. The fluctuation is very small, but it is not the convergence to a single point that one might imagine.
The simulations above suggest that, for any finite $N$, the sample mean does not land exactly on the population mean (i.e., the true mean). It should be close to the population mean, but may not exactly equal it.
Main Results
Another way to put it: even when $N$ is large, the $\overline{X}$ obtained from each run of $N$ samples is not a fixed value but follows a distribution.
Theorem 1 (mean of sample mean). If $\mathbb{E}[X]=\mu$, then $\mathbb{E}[\overline{X}]=\mu$.
Theorem 1 is easy to understand: the mean of the sample mean of $X$ is simply the mean of $X$. It is also easy to verify:
$$\mathbb{E}[\overline{X}]= \mathbb{E}\bigg[\frac{1}{N}\sum_{n=1}^N X_n\bigg] =\frac{1}{N}\sum_{n=1}^N \mathbb{E}[X_n]=\mathbb{E}[X]=\mu,$$

because the samples are i.i.d.
Theorem 2 (variance of sample mean). If $\text{var}[X]=\sigma^2$, then $\text{var}[\overline{X}]=\frac{\sigma^2}{N}$.
$$\text{var}[\overline{X}]= \text{var}\bigg[\frac{1}{N}\sum_{n=1}^N X_n\bigg] =\frac{1}{N^2}\sum_{n=1}^N \text{var}[X_n]=\frac{\sigma^2}{N}.$$

The second equality uses the independence of the samples together with $\text{var}[aY]=a^2\,\text{var}[Y]$.
Theorem 2 also shows that taking more samples helps: the larger $N$ is, the smaller the variance of the sample mean, and hence the closer it tends to be to the population mean.
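Both theorems can be checked empirically by repeating the whole experiment many times for a fixed $N$ and looking at how the resulting sample means are distributed. The sketch below assumes $X\sim\mathcal{N}(0,1)$, so $\mu=0$ and $\sigma^2=1$. The helper name `sample_means`, the seed, and the use of 1000 repetitions are illustrative assumptions.

```python
import random
import statistics

def sample_means(n, trials, seed=0):
    """Draw `trials` independent batches of n standard-normal samples
    and return the sample mean of each batch."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) for reproducibility
    return [
        statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(n))
        for _ in range(trials)
    ]

results = {}
for n in (10, 100, 1000):
    means = sample_means(n, trials=1000)
    # Empirical mean and variance of the sample mean for this N.
    results[n] = (statistics.fmean(means), statistics.variance(means))
    print(f"N={n:5d}  mean of sample means={results[n][0]: .4f}  "
          f"variance={results[n][1]:.5f}  sigma^2/N={1.0 / n:.5f}")
```

Under these assumptions, the mean of the sample means should stay near 0 for every $N$ (Theorem 1), while their variance should roughly track $\sigma^2/N = 1/N$ (Theorem 2).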
Conclusion
Overall, the sample mean is not a robust statistic, meaning that it is sensitive to outliers. We can only give a lower bound and an upper bound for the population mean, and state how confident we are (in %) that the population mean lies between the lower and upper bounds of the confidence interval.
The confidence interval is $\big[\overline{X}-E,\ \overline{X}+E\big]$, where $E$ is called the margin of error and is given by

$$E=z_{\alpha/2}\frac{\sigma}{\sqrt{N}}.$$
- $z$: critical value, which can be computed from the standard normal distribution given $\alpha/2$.
- $\alpha$: significance level.
- $CL=1-\alpha$: confidence level.
As shown in the figure,
- Given $CL=95\%$;
- Calculate $\alpha = 0.05$ and $\alpha/2 = 0.025$;
- Look up the standard normal distribution table and find $z_{\alpha/2}=z_{0.025}=1.96$;
- Compute $E=z_{\alpha/2}\frac{\sigma}{\sqrt{N}}$, and the confidence interval.
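The steps above can be sketched in a few lines of Python, using `statistics.NormalDist().inv_cdf` from the standard library in place of a table lookup. The function name `confidence_interval` and the example inputs (a sample mean of 0.6, $\sigma=\sqrt{0.6\cdot 0.4}$ as for a Bernoulli variable with $p=0.6$, and $N=1000$) are assumptions chosen purely for illustration. The sketch also assumes that $\sigma$ is known.

```python
import math
from statistics import NormalDist

def confidence_interval(sample_mean, sigma, n, confidence_level=0.95):
    """Return (lower, upper) bounds of the confidence interval for the
    population mean, assuming the population standard deviation sigma is known."""
    alpha = 1.0 - confidence_level               # significance level
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # critical value z_{alpha/2}
    margin = z * sigma / math.sqrt(n)            # margin of error E
    return sample_mean - margin, sample_mean + margin

# Critical value for CL = 95%, i.e. alpha/2 = 0.025.
z = NormalDist().inv_cdf(0.975)
print("z_{0.025} =", round(z, 2))

# Illustrative inputs only (not taken from the figures in this post).
low, high = confidence_interval(sample_mean=0.6, sigma=math.sqrt(0.6 * 0.4), n=1000)
print("95% confidence interval:", (low, high))
```

The critical value is computed as the $1-\alpha/2$ quantile of the standard normal distribution, which is the same quantity a normal table gives for $z_{\alpha/2}$.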