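Bilibili Whiteboard Derivation Series Notes: Gaussian Distribution, Maximum Likelihood Estimation, Biased or Unbiased
First, the link to the original lecture video: link to this episode
Gaussian Distribution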
- One-dimensional Gaussian distribution:
$$
\begin{aligned} X &\sim N(\mu,\sigma^2) \\ f(x) &= \frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \end{aligned} \tag{1}
$$
where $\mu$ is the mean (expectation) and $\sigma^2$ is the variance.
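As a quick numerical check of equation (1), the density can be evaluated directly. This is a minimal sketch using only the Python standard library; the function name `gaussian_pdf` is made up for illustration and does not come from the lecture.

```python
import math

def gaussian_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of N(mu, sigma^2) at x, following equation (1)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    coeff = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coeff * math.exp(exponent)

# The density peaks at x = mu; for the standard normal the height is 1/sqrt(2*pi).
peak = gaussian_pdf(0.0, mu=0.0, sigma=1.0)

# A coarse Riemann sum over [-10, 10] should come out close to 1.
step = 0.01
total = sum(gaussian_pdf(-10.0 + step * k, 0.0, 1.0) for k in range(2001)) * step
```

Note that the function takes the standard deviation $\sigma$, not the variance $\sigma^2$, as its third argument.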
- Multivariate Gaussian distribution:
$$
\begin{aligned} X &\sim N(\bm{\mu},\bm{\Sigma}) \\ f(\bm{x}) &= \frac{1}{(2\pi)^{\frac{d}{2}}|\bm{\Sigma}|^{\frac{1}{2}}}\exp\left(-\frac{1}{2}(\bm{x}-\bm{\mu})^T\bm{\Sigma}^{-1}(\bm{x}-\bm{\mu})\right) \end{aligned} \tag{2}
$$
where $\bm{x}$ is a $d$-dimensional vector, $\bm{\mu}$ is the mean (expectation), and $\bm{\Sigma}$ is the covariance matrix.
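To make equation (2) concrete, here is a rough sketch restricted to the special case $d = 2$, where the determinant and inverse of $\bm{\Sigma}$ have simple closed forms. The function name `gaussian_pdf_2d` and the tuple-based matrix layout are assumptions chosen for illustration; a general implementation would typically rely on a linear algebra library instead.

```python
import math

def gaussian_pdf_2d(x, mu, cov):
    """Density of a 2-D Gaussian N(mu, cov) at x: equation (2) with d = 2."""
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21
    # A valid covariance matrix here must be symmetric positive definite.
    if s12 != s21 or s11 <= 0 or det <= 0:
        raise ValueError("cov must be symmetric positive definite")
    # Closed-form inverse of a 2x2 matrix.
    inv = ((s22 / det, -s12 / det), (-s21 / det, s11 / det))
    dx = (x[0] - mu[0], x[1] - mu[1])
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu).
    quad = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
            + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    # (2*pi)^(d/2) * |Sigma|^(1/2) with d = 2.
    norm = 2.0 * math.pi * math.sqrt(det)
    return math.exp(-0.5 * quad) / norm

# With an identity covariance, the density at the mean is 1/(2*pi).
at_mean = gaussian_pdf_2d((0.0, 0.0), (0.0, 0.0), ((1.0, 0.0), (0.0, 1.0)))
```

With a diagonal covariance matrix, the result factors into the product of two one-dimensional densities from equation (1), which gives a convenient sanity check.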
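Bias of the Parameter Estimates for the Gaussian Distribution
Take the one-dimensional Gaussian distribution as an example.
Maximum likelihood estimation (MLE) gives: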
$$
\mu_{MLE} = \frac{1}{N}\sum_{i=1}^{N} x_i \tag{3}
$$
$$
\sigma_{MLE}^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu_{MLE})^2 \tag{4}
$$
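For reference, the two estimators follow from setting the derivatives of the log-likelihood to zero. The sketch below is the standard derivation, assuming the samples $x_1,\dots,x_N$ are i.i.d. draws from $N(\mu,\sigma^2)$; the symbol $\ell$ for the log-likelihood is introduced here and is not used elsewhere in these notes.

$$
\begin{aligned}
\ell(\mu,\sigma) &= \sum_{i=1}^{N}\log f(x_i) = -\frac{N}{2}\log(2\pi) - N\log\sigma - \frac{1}{2\sigma^2}\sum_{i=1}^{N}(x_i-\mu)^2 \\
\frac{\partial \ell}{\partial \mu} &= \frac{1}{\sigma^2}\sum_{i=1}^{N}(x_i-\mu) = 0 \;\Rightarrow\; \mu_{MLE} = \frac{1}{N}\sum_{i=1}^{N}x_i \\
\frac{\partial \ell}{\partial \sigma} &= -\frac{N}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{N}(x_i-\mu)^2 = 0 \;\Rightarrow\; \sigma_{MLE}^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i-\mu_{MLE})^2
\end{aligned}
$$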
- Bias of $\mu_{MLE}$:
$$
E[\mu_{MLE}] = E\left[\frac{1}{N}\sum_{i=1}^{N} x_i\right] = \frac{1}{N}\sum_{i=1}^{N} E[x_i] = \frac{1}{N}\sum_{i=1}^{N} \mu = \mu
$$
Hence the MLE estimate of $\mu$ is unbiased.
- Bias of $\sigma_{MLE}^2$:
$$
\begin{aligned}
E[\sigma_{MLE}^2] &= E\left[\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu_{MLE})^2\right] \\
&= E\left[\frac{1}{N}\sum_{i=1}^{N} x_i^2 - 2\mu_{MLE}\cdot\frac{1}{N}\sum_{i=1}^{N} x_i + \mu_{MLE}^2\right] \\
&= E\left[\frac{1}{N}\sum_{i=1}^{N} x_i^2 - \mu_{MLE}^2\right] \\
&= \frac{1}{N}\sum_{i=1}^{N} E[x_i^2] - E[\mu_{MLE}^2] \\
&= \frac{1}{N}\sum_{i=1}^{N} \left(E[x_i^2] - \mu^2\right) - \left(E[\mu_{MLE}^2] - \mu^2\right) \\
&= \frac{1}{N}\sum_{i=1}^{N} \left(E[x_i^2] - E[x_i]^2\right) - \left(E[\mu_{MLE}^2] - E[\mu_{MLE}]^2\right) \\
&= \frac{1}{N}\sum_{i=1}^{N} Var(x_i) - Var(\mu_{MLE}) \\
&= Var(x_i) - Var\left(\frac{1}{N}\sum_{i=1}^{N} x_i\right) \\
&= Var(x_i) - \frac{1}{N^2}\sum_{i=1}^{N} Var(x_i) \\
&= Var(x_i) - \frac{1}{N}Var(x_i) \\
&= \frac{N-1}{N}\sigma^2
\end{aligned}
$$
The third line uses $\frac{1}{N}\sum_{i=1}^{N} x_i = \mu_{MLE}$, the sixth line uses $E[x_i] = E[\mu_{MLE}] = \mu$, and splitting the variance of the sum into a sum of variances relies on the samples being independent.
Hence $\sigma_{MLE}^2$ is biased.
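The two expectations derived above can be checked empirically with a small Monte Carlo simulation. The sketch below uses only the Python standard library; the parameter values ($\mu = 2$, $\sigma = 3$, $N = 5$), the trial count, and the function names are arbitrary choices for illustration, not taken from the lecture.

```python
import random

def mle_estimates(sample):
    """Return (mu_MLE, sigma^2_MLE) for one sample, per equations (3) and (4)."""
    n = len(sample)
    mu_hat = sum(sample) / n
    var_hat = sum((x - mu_hat) ** 2 for x in sample) / n  # divides by N, not N - 1
    return mu_hat, var_hat

def simulate(mu, sigma, n, trials, seed=0):
    """Average the two MLE estimates over many independent samples of size n."""
    rng = random.Random(seed)
    mu_sum, var_sum = 0.0, 0.0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mu_hat, var_hat = mle_estimates(sample)
        mu_sum += mu_hat
        var_sum += var_hat
    return mu_sum / trials, var_sum / trials

MU, SIGMA, N = 2.0, 3.0, 5
avg_mu, avg_var = simulate(MU, SIGMA, N, trials=50_000)

# Theory predicts E[mu_MLE] = mu and E[sigma^2_MLE] = (N - 1) / N * sigma^2.
predicted_var = (N - 1) / N * SIGMA ** 2
print(f"average mu_MLE      = {avg_mu:.3f}  (theory: {MU})")
print(f"average sigma^2_MLE = {avg_var:.3f}  (theory: {predicted_var}, true: {SIGMA ** 2})")
```

A small $N$ is chosen deliberately: the factor $\frac{N-1}{N}$ is then far from 1, so the averaged variance estimate sits visibly below the true $\sigma^2$ while the averaged mean estimate stays close to $\mu$.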
Summary
$E[\mu_{MLE}] = \mu$ and $E[\sigma_{MLE}^2] = \frac{N-1}{N}\sigma^2$; that is, the estimate of the mean is unbiased while the estimate of the variance is biased. Therefore, to obtain an unbiased estimate of the variance