Central Limit Theorem Overview


Mathematical definitions/proofs/derivations are all omitted, because this note is intended to serve as a simple overview of the CLT, its power, and a misconception that I used to have about the CLT.

Random Variable

Definition

  • A random variable, $Y$, is a map (function) from the sample space $S$ to $\mathbb{R}$.
    • The sample space is the set of all possible outcomes (events) resulting from a random experiment.
  • For example, if we care about the test scores of all the first-year students at a university, then we have a sample space consisting of all the integers in, say, $[0, 100]$, and we can naturally define the random variable $Y: S \to \mathbb{R}$ by mapping each possible score in the sample space to the same numerical value in $\mathbb{R}$.

Probability Density/Mass Function & Population

  • For discrete random variables, we can use its probability mass function to describe its probability distribution.
  • For continuous random variables, we can use its probability density function (PDF) to describe its probability distribution.
  • The probability distribution of a random variable is usually unknown in reality, because knowledge of how the values in the population of interest are actually distributed simply cannot be obtained, or cannot be suitably modeled by a closed-form mathematical formula. Consequently, quantities like the population mean $\mu$ (the mean of the R.V.) and the population variance $\sigma^2$ (the variance of the R.V.) remain unknown to us.
  • However, we do have a theorem that allows us to estimate these two parameters of the population from limited information that we can obtain from random samples drawn from the population.
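As a minimal sketch of this estimation idea, the snippet below simulates a hypothetical score population (the Gaussian shape and all the numbers are assumptions purely for illustration) and computes the usual unbiased point estimates from a single random sample:

```python
import random
import statistics

random.seed(0)

# Hypothetical population: exam-like scores; in practice its distribution
# would be unknown to the analyst.
population = [random.gauss(70, 12) for _ in range(100_000)]

# Draw ONE random sample of size n and compute the standard point estimates.
n = 200
sample = random.sample(population, n)

y_bar = statistics.mean(sample)          # unbiased estimator of the population mean mu
s_squared = statistics.variance(sample)  # unbiased estimator of sigma^2 (divides by n - 1)

print(y_bar, s_squared)
```

Note that `statistics.variance` uses the $n-1$ denominator, which is what makes $S^2$ an unbiased estimator of $\sigma^2$.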

Central Limit Theorem

  • Now suppose we take a random sample of size $\underline{n}$ from a population distribution with mean $\mu$ and variance $\sigma^2$. Effectively, this means $n$ random variables, $Y_1, \dots, Y_n$, that are independent and share the same common probability distribution. (The definition of independence of multiple R.V.'s is omitted.)
    • We call $Y_1, \dots, Y_n$ independent and identically distributed (iid), where the distribution of each $Y_i$ follows the population distribution with mean $\mu$ and variance $\sigma^2$, for all $i = 1, \dots, n$.

  • The power of the Central Limit Theorem is that you can always estimate μ \mu μ with the limited information from your sample(s) no matter how weird that unknown population probability distribution might be.

  • The Central Limit Theorem states that the sum of these $n$ independent and identically distributed random variables, denoted $X = \sum_{i=1}^n Y_i$, where each $Y_i$ has mean $\mu$ and variance $\sigma^2$, approximately follows a normal distribution when $n$ is large enough.

  • Notationally, it says the following:

    • $X = \sum_{i=1}^n Y_i$ approximately follows $N(n\mu,\ n\sigma^2)$ as $n \to \infty$.
    • Or equivalently, $Z_n = \frac{X - n\mu}{\sqrt{n\sigma^2}}$ converges in distribution to $N(0, 1)$ as $n \to \infty$.
  • Remarks on the statements of the Theorem:

    • The random variable $Z_n$ is a standardization of the random variable $X$, i.e., it is obtained by subtracting the mean of $X$ from $X$ and then dividing the result by the standard deviation of $X$. Indeed, $N(0,1)$, the standard normal, is also called the $Z$-distribution.
    • The sample mean (another random variable), $\bar X = X/n$, i.e., the sampling distribution of the sample mean, follows a normal distribution $N(\mu, \frac{\sigma^2}{n})$ by the CLT. This is obtained from $X \sim N(n\mu, n\sigma^2)$, since $E(X/n) = \frac{E(X)}{n}$ and $Var(X/n) = \frac{Var(X)}{n^2}$.
    • The distribution of $X$ is the sampling distribution of the sample statistic "sample sum" (or the sample mean, if you use $\bar X := X/n$) when the sample size is $n$. Thus, the CLT also means: when randomly drawing samples of size $n$ from any population with mean $\mu$ and standard deviation $\sigma$, the sampling distribution of the sample sum follows $N(n\mu, n\sigma^2)$ when $n$ is large enough. Recall that $X$ is a random variable, so its exact probability distribution is never known to us in reality: to obtain the exact probability density function of $X$, we would have to take infinitely many random samples of size $n$, which is impossible.
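The statements above can be checked numerically. The sketch below draws repeated samples from a deliberately skewed population (an exponential distribution, chosen here as an assumption so that $\mu = \sigma^2 = 1$) and confirms that the sample means cluster around $N(\mu, \frac{\sigma^2}{n})$:

```python
import random
import statistics

random.seed(1)

# Skewed population: exponential with rate 1, so mu = 1 and sigma^2 = 1.
n = 100        # sample size
reps = 5_000   # number of samples drawn -- only to visualize the theory

# For each repetition, draw a sample of size n and record its mean.
means = [statistics.mean(random.expovariate(1.0) for _ in range(n))
         for _ in range(reps)]

# CLT prediction: sample mean is approximately N(mu, sigma^2 / n) = N(1, 0.01).
print(statistics.mean(means))      # ~ 1.0
print(statistics.variance(means))  # ~ 0.01
```

Despite the strongly right-skewed population, the collection of sample means is centered at $\mu$ with variance close to $\sigma^2/n$, as the theorem predicts.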

Caveat 1 - How many samples do we take?

  • The CLT has nothing to do with how many random samples of size $n$ you actually take from the population! The distribution depends only on the sample size and the parameters of the population.
  • The reason is that, by the CLT's statement, the distribution of $X$ is already determined (theoretically, as the population may be unknown) once the sample size $n$ is decided, so how many samples you actually take does not matter.
  • Indeed, to estimate the sampling distribution of the sample mean, only one random sample is needed:
    • Even with only one random sample, the sample statistics $\bar y$ (sample mean) and $S$ (sample standard deviation) we can obtain are enough to give unbiased estimates of the population parameters $\mu$ and $\sigma^2$ ($\mu$ is estimated by the estimator $\bar y$, and $\sigma^2$ is estimated by $S^2$).
    • Thus, using the data from just one random sample, the CLT allows us to determine the approximate distribution of $X$ (or $X/n$).
    • More precisely, as $X$ refers to the sum of the $Y$ values in a random sample, the CLT is telling us the sampling distribution of the sample mean has a mean of approximately $\bar y$ and a variance of approximately $\frac{S^2}{n}$ (i.e., we can substitute $\mu$ in the original formula with $\bar y$ and $\sigma$ with $S$, as these are unbiased estimates).
    • Moreover, this estimate of the variance of the sampling distribution of the sample mean can in turn be used to calculate a confidence interval, as the idea of a confidence interval also relies on the sampling distribution of the sample mean.
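A minimal sketch of this one-sample workflow, where a made-up uniform population stands in for the unknown one (all constants here are assumptions for illustration):

```python
import math
import random
import statistics

random.seed(2)

# ONE random sample from an "unknown" population (simulated as uniform on
# [0, 10], so the true mean is 5 -- the analyst never sees this).
n = 400
sample = [random.uniform(0, 10) for _ in range(n)]

y_bar = statistics.mean(sample)   # estimates mu
s = statistics.stdev(sample)      # estimates sigma
se = s / math.sqrt(n)             # estimated SD of the sampling distribution of the mean

# Approximate 95% confidence interval for mu via the CLT (z = 1.96).
ci = (y_bar - 1.96 * se, y_bar + 1.96 * se)
print(ci)
```

Everything here comes from the single sample: $\bar y$, $S$, and hence the estimated standard error $S/\sqrt{n}$ that the interval is built on.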

Caveat 2 - Limitations of the CLT

  • The CLT only applies to the sampling distributions of sample statistics that can be directly derived from the sample sum, for example, the sample mean and the sample proportion. It fails when we want to consider the sampling distribution of statistics like the sample variance. Therefore, $t$ tests and/or $z$ tests relying on the CLT will not work if we want to test hypotheses about those parameters, and methods like the bootstrap test may help. See the Bootstrap part in this post.
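As an illustration of the bootstrap alternative, here is a percentile-bootstrap sketch for the sample variance (the population, seed, and all constants are assumptions for illustration):

```python
import random
import statistics

random.seed(3)

# One observed sample; we want an interval for the population variance,
# a statistic the CLT-based z/t machinery does not directly cover.
sample = [random.gauss(0, 2) for _ in range(150)]  # true sigma^2 = 4
n = len(sample)

# Nonparametric bootstrap: resample with replacement, recompute the statistic.
reps = 2_000
boot_vars = []
for _ in range(reps):
    resample = [random.choice(sample) for _ in range(n)]
    boot_vars.append(statistics.variance(resample))

# Percentile bootstrap 95% interval for sigma^2.
boot_vars.sort()
lo, hi = boot_vars[int(0.025 * reps)], boot_vars[int(0.975 * reps)]
print(lo, hi)
```

The bootstrap approximates the sampling distribution of the statistic by resampling the data itself, so it needs no normality result for that statistic.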

Computation Example of CLT

  • The lengths of bolts produced in a factory are assumed to be normally distributed with mean 3.06 inches and standard deviation 0.63 inches.
    Suppose a researcher chooses 70 samples of size 50 from this population. Calculate the mean and variance of the sampling distribution.
  • Solution:
    • Notice that we now actually know the population distribution, so $Y$'s probability density function is already known: $N(\mu = 3.06,\ \sigma^2 = 0.63^2)$.
    • Now, with sample size $n = 50$, by the CLT the sample mean, $X/n$, is approximately normally distributed, and we are asked for the mean and variance of the sampling distribution of the mean of $X := \sum_{i=1}^{50} Y_i$. By the CLT, these are given by $\frac{E(X)}{n} = \frac{3.06 \times 50}{50} = 3.06$ and $\frac{Var(X)}{n^2} = \frac{50 \times 0.63^2}{50^2} = \frac{0.63^2}{50} \approx 0.089^2$.
      • You may derive these two formulas using the linearity of expectation, $E(a\,u(X) + b) = a\,E(u(X)) + b$, where $a, b$ are constants, $X$ is any R.V., and $u: image(X) \to \mathbb{R}$ is a given function of $X$ (together with $Var(aX + b) = a^2\,Var(X)$ for the variance).
    • Thus, the distribution of the sample mean is approximately $N(3.06,\ 0.089^2)$.
  • Remark: as per Caveat 1 above, the number $70$ has nothing to do with the sampling distribution of the sample mean!
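The arithmetic in the solution can be reproduced directly, using only the numbers given in the problem:

```python
import math

mu, sigma, n = 3.06, 0.63, 50

mean_of_sample_mean = mu                  # E(X/n) = n*mu / n = mu
se_of_sample_mean = sigma / math.sqrt(n)  # sqrt(n*sigma^2 / n^2) = sigma / sqrt(n)

print(mean_of_sample_mean, round(se_of_sample_mean, 3))  # 3.06 0.089
```

Note that `70`, the number of samples, appears nowhere in the computation.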