Sample mean of a random variable

Intuition from some examples

Given a random variable $X$ that follows some distribution, we can estimate its mean or variance by sampling it. Suppose we draw $N$ samples $X_1, \dots, X_N$. The sample mean is
$$\overline{X}=\frac{1}{N}\sum_{n=1}^N X_n$$
As $N$ increases, one might expect $\overline{X}$ to get ever closer to the true mean $\mathbb{E}[X]$ and eventually equal it. Simulation results show that, at any finite $N$, it does not.
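Example 1: Bernoulli distribution, $X\sim \text{Bernoulli}(p=0.6)$. The figure below shows the curve of the sample mean as $N$ increases.
[Figure: sample mean of Bernoulli(0.6) samples as a function of N]
The sample mean does appear to converge to $p=0.6$. If we zoom in, however, we find that the curve is still jittering, so at these values of $N$ it has not settled on a single point.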
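
The experiment in Example 1 can be reproduced roughly as follows. This is a minimal sketch, not the code behind the original figure: the function name `running_sample_mean`, the seed, and the choice of $10^5$ samples are assumptions made for illustration, and only the Python standard library is used.

```python
import random

def running_sample_mean(p, n_samples, seed=0):
    """Return the running sample mean after each of n_samples Bernoulli(p) draws."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0
    means = []
    for n in range(1, n_samples + 1):
        total += 1 if rng.random() < p else 0  # one Bernoulli(p) draw
        means.append(total / n)                # sample mean of the first n draws
    return means

means = running_sample_mean(p=0.6, n_samples=100_000)
tail = means[-10_000:]  # the "zoomed in" part of the curve
print("final running mean:", means[-1])
print("spread over the last 10,000 steps:", max(tail) - min(tail))
```

The spread over the last steps is small but not zero, which is the jitter visible when zooming in on the curve.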

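Example 2: Gaussian distribution, $X\sim N(0,1)$. The curve of the sample mean:
[Figure: sample mean of N(0,1) samples as a function of N]

Even beyond $N=10^6$ the sample mean is still jittering. The jitter is very small, but the curve has not converged to a single point as one might imagine.

The simulations above show that, for any finite $N$, the sample mean does not land exactly on the population mean (i.e., the true mean). It should be close to the population mean, but may not exactly equal it.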

Main Results

Put another way: even when $N$ is large, the $\overline{X}$ obtained from each run of $N$ samples is not a fixed value but a random variable with its own distribution.
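
To make this concrete, the sketch below repeats the same experiment five times with the same $N$ and prints the resulting sample means. The helper `sample_mean`, the seed, and $N=10^4$ are illustrative assumptions, not values from the text.

```python
import random
import statistics

def sample_mean(rng, n):
    """Sample mean of n i.i.d. draws from N(0, 1)."""
    return statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(n))

rng = random.Random(1)  # fixed seed, arbitrary choice
N = 10_000
repeats = [sample_mean(rng, N) for _ in range(5)]
for i, m in enumerate(repeats, start=1):
    print(f"experiment {i}: sample mean = {m:+.5f}")
```

Each run yields a slightly different value near 0, so the sample mean behaves like a draw from a distribution rather than a constant.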

Theorem 1 (mean of sample mean). If $\mathbb{E}[X]=\mu$, then $\mathbb{E}[\overline{X}]=\mu$.

Theorem 1 is easy to understand: the mean of the sample mean of $X$ is the mean of $X$. It is also easy to verify:

$$\mathbb{E}[\overline{X}]= \mathbb{E}\bigg[\frac{1}{N}\sum_{n=1}^N X_n\bigg] =\frac{1}{N}\sum_{n=1}^N \mathbb{E}[X_n]=\mathbb{E}[X]=\mu,$$

because the samples are i.i.d., so every $X_n$ has the same mean $\mathbb{E}[X_n]=\mathbb{E}[X]$.

Theorem 2 (variance of sample mean). If $\text{var}[X]=\sigma^2$, then $\text{var}[\overline{X}]=\frac{\sigma^2}{N}$.

$$\text{var}[\overline{X}]= \text{var}\bigg[\frac{1}{N}\sum_{n=1}^N X_n\bigg] =\frac{1}{N^2}\sum_{n=1}^N \text{var}[X_n]=\frac{\sigma^2}{N},$$

where the second equality uses $\text{var}[aX]=a^2\,\text{var}[X]$ and the independence of the samples, which lets the variance of the sum split into the sum of the variances.

Theorem 2 also shows that drawing more samples pays off: the larger $N$ is, the smaller the variance of the sample mean, and the more tightly $\overline{X}$ concentrates around the population mean.
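
Both theorems can be checked empirically. The sketch below is one possible check under assumed parameters: the population values `mu = 1.0` and `sigma = 2.0`, the number of trials, and the helper name `draw_sample_means` are all hypothetical choices, not taken from the text.

```python
import random
import statistics

def draw_sample_means(rng, n, trials, mu, sigma):
    """Repeat the experiment `trials` times; each run returns the mean of n draws."""
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(trials)
    ]

rng = random.Random(2)   # fixed seed, arbitrary choice
mu, sigma = 1.0, 2.0     # hypothetical population parameters
trials = 1000

summary = {}
for n in (10, 100, 1000):
    means = draw_sample_means(rng, n, trials, mu, sigma)
    # Empirical mean and variance of the sample mean across the trials
    summary[n] = (statistics.fmean(means), statistics.variance(means))
    print(f"N={n:5d}  mean of sample means={summary[n][0]:.4f}  "
          f"variance={summary[n][1]:.5f}  sigma^2/N={sigma**2 / n:.5f}")
```

The empirical mean should stay near $\mu$ for every $N$ (Theorem 1), while the empirical variance should shrink roughly tenfold each time $N$ grows tenfold (Theorem 2).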

Conclusion

Overall, the sample mean is not a robust statistic, meaning that it is sensitive to outliers. From a finite sample we can only give a lower bound and an upper bound for the population mean, and say how confident we are (in %) that the population mean lies between them. This range is the confidence interval.

[Figure: confidence level and critical values on the standard normal distribution]

The confidence interval is $\big[ \overline{X}-E, \overline{X}+E\big]$, where $E$ is called the margin of error and is given by
$$E=z_{\alpha/2}\frac{\sigma}{\sqrt{N}}$$

$z_{\alpha/2}$: critical value, which can be computed from the standard normal distribution once $\alpha/2$ is given.
$\alpha$: significance level.
$CL=1-\alpha$: confidence level.

As shown in the figure,

  1. Take a confidence level $CL=95\%$;
  2. Calculate $\alpha = 0.05$ and $\alpha/2 = 0.025$;
  3. Look up the standard normal distribution table and find $z_{\alpha/2}=z_{0.025}=1.96$;
  4. Compute $E=z_{\alpha/2}\frac{\sigma}{\sqrt{N}}$, and from it the confidence interval.
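
The four steps above can be sketched in code. This is an illustration under stated assumptions rather than a definitive implementation: it assumes $\sigma$ is known and that $\overline{X}$ is (approximately) normal, the function name `z_confidence_interval` and the simulated data are hypothetical, and the critical value comes from `statistics.NormalDist().inv_cdf` instead of a printed table.

```python
import math
import random
from statistics import NormalDist, fmean

def z_confidence_interval(samples, sigma, confidence=0.95):
    """Confidence interval for the mean, assuming the population sigma is known."""
    alpha = 1.0 - confidence                      # step 2: significance level
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # step 3: critical value z_{alpha/2}
    margin = z * sigma / math.sqrt(len(samples))  # step 4: margin of error E
    center = fmean(samples)                       # sample mean
    return center - margin, center + margin

rng = random.Random(3)          # fixed seed, arbitrary choice
mu, sigma, N = 0.0, 1.0, 400    # hypothetical population and sample size
samples = [rng.gauss(mu, sigma) for _ in range(N)]
low, high = z_confidence_interval(samples, sigma)
print(f"95% confidence interval: [{low:.4f}, {high:.4f}]")

# Repeat the experiment to see how often the interval contains the true mean.
trials = 2000
hits = 0
for _ in range(trials):
    lo, hi = z_confidence_interval([rng.gauss(mu, sigma) for _ in range(N)], sigma)
    hits += lo <= mu <= hi
coverage = hits / trials
print(f"fraction of intervals containing mu: {coverage:.3f}")
```

Over many repetitions the fraction of intervals that contain $\mu$ should be close to the confidence level, which is what the 95% refers to.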