Machine Learning: Whiteboard Derivations, P2_1

Gaussian Distribution

$$
\begin{aligned}
& X: \text{data} \rightarrow X = (x_1 \; x_2 \; \dots \; x_N)^T_{N \times p} \\
& x_i \in \mathbb{R}^p \\
& x_i \overset{iid}{\sim} N(\mu, \Sigma) \\
& \theta = (\mu, \Sigma)
\end{aligned}
$$

One-dimensional Gaussian distribution:

$$
p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)
$$
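As a quick numerical sanity check, the density above can be evaluated directly; a minimal Python sketch (the helper name `gauss_pdf` is ours):

```python
import math

def gauss_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    coef = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    return coef * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# The density peaks at x = mu, where the exponential factor is 1,
# so the peak height is exactly 1 / (sqrt(2*pi) * sigma).
peak = gauss_pdf(1.5, 1.5, 2.0)
print(peak)
```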

p-dimensional Gaussian distribution:

$$
p(x) = \frac{1}{(2\pi)^{\frac{p}{2}} |\Sigma|^{\frac{1}{2}}} \exp\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right)
$$
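Note the $|\Sigma|^{\frac{1}{2}}$ factor in the normalizing constant. A NumPy sketch (the helper name `mvn_pdf` is ours); with $p = 1$ and $\Sigma = \sigma^2$ it reduces to the one-dimensional formula:

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Density of N(mu, Sigma) in p dimensions."""
    x, mu, Sigma = map(np.asarray, (x, mu, Sigma))
    p = mu.shape[0]
    diff = x - mu
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu) via a linear solve
    quad = diff @ np.linalg.solve(Sigma, diff)
    norm_const = (2.0 * np.pi) ** (p / 2.0) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm_const

# p = 1, Sigma = [[sigma^2]] with sigma = 2: at x = mu the density
# should equal 1 / (sqrt(2*pi) * sigma), matching the 1-D formula.
val = mvn_pdf([0.0], [0.0], [[4.0]])
print(val)
```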

Deriving the MLE Parameter Formulas

$$
\theta_{MLE} = \arg\max_\theta \log p(x|\theta)
$$

For $p = 1$, $\theta = (\mu, \sigma^2)$:

$$
\begin{aligned}
\log p(x|\theta) &= \log \prod_{i=1}^N p(x_i|\theta) = \sum_{i=1}^N \log p(x_i|\theta) \\
&= \sum_{i=1}^N \log \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(x_i - \mu)^2}{2\sigma^2} \right) \\
&= \sum_{i=1}^N \left[ \log \frac{1}{\sqrt{2\pi}} + \log \frac{1}{\sigma} - \frac{(x_i - \mu)^2}{2\sigma^2} \right]
\end{aligned}
$$
Solving for $\mu$:

$$
\begin{aligned}
\mu_{MLE} &= \arg\max_{\mu} \log p(x|\theta) = \arg\max_{\mu} \sum_{i=1}^N -\frac{(x_i - \mu)^2}{2\sigma^2} \\
&= \arg\min_{\mu} \sum_{i=1}^N (x_i - \mu)^2
\end{aligned}
$$

Setting the derivative to zero:

$$
\begin{aligned}
\frac{\partial}{\partial \mu} \sum_{i=1}^N (x_i - \mu)^2 &= \sum_{i=1}^N 2(x_i - \mu)(-1) = 0 \\
\sum_{i=1}^N (x_i - \mu) &= 0 \\
\sum_{i=1}^N x_i - N\mu &= 0
\end{aligned}
$$

$$
\mu_{MLE} = \frac{1}{N}\sum_{i=1}^N x_i
$$

Solving for $\sigma^2$:

$$
\begin{aligned}
\sigma^2_{MLE} &= \arg\max_{\sigma} \log p(x|\theta) \\
&= \arg\max_{\sigma} \sum_{i=1}^N \left( \log \frac{1}{\sigma} - \frac{(x_i - \mu)^2}{2\sigma^2} \right)
\end{aligned}
$$

Setting the derivative to zero and multiplying through by $\sigma^3$:

$$
\begin{aligned}
\frac{\partial}{\partial \sigma} \sum_{i=1}^N \left( \log \frac{1}{\sigma} - \frac{(x_i - \mu)^2}{2\sigma^2} \right) &= \sum_{i=1}^N \left( -\frac{1}{\sigma} + (x_i - \mu)^2 \sigma^{-3} \right) = 0 \\
\sum_{i=1}^N \left( -\sigma^2 + (x_i - \mu)^2 \right) &= 0 \\
\sum_{i=1}^N (x_i - \mu)^2 &= N\sigma^2
\end{aligned}
$$

$$
\sigma^2_{MLE} = \frac{1}{N}\sum_{i=1}^N (x_i - \mu)^2
$$
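The two closed-form estimators are simply the sample mean and the biased sample variance. A quick check in Python (assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=10_000)  # N draws from N(2, 9)

mu_mle = x.mean()                         # (1/N) * sum x_i
sigma2_mle = ((x - mu_mle) ** 2).mean()   # (1/N) * sum (x_i - mu_mle)^2

# np.var uses ddof=0 by default, i.e. exactly the MLE (biased) variance
print(mu_mle, sigma2_mle)
```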

Bias of the Estimators

$\mu_{MLE}$ is an unbiased estimator:

$$
E[\mu_{MLE}] = \frac{1}{N}\sum_{i=1}^N E[x_i] = \frac{1}{N}\sum_{i=1}^N \mu = \mu
$$

$$
\begin{aligned}
Var[\mu_{MLE}] &= Var\left[ \frac{1}{N}\sum_{i=1}^N x_i \right] = \frac{1}{N^2}\sum_{i=1}^N Var[x_i] \\
&= \frac{1}{N^2}\sum_{i=1}^N \sigma^2 = \frac{1}{N^2} N\sigma^2 = \frac{\sigma^2}{N}
\end{aligned}
$$
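$Var[\mu_{MLE}] = \sigma^2/N$ can be checked by simulation: draw many independent samples of size $N$ and look at the spread of the resulting sample means (a sketch assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(1)
sigma, N, trials = 2.0, 50, 20_000

# Each row is one sample of size N; each row mean is one draw of mu_MLE
mu_hats = rng.normal(0.0, sigma, size=(trials, N)).mean(axis=1)

print(mu_hats.var())  # should be close to sigma**2 / N = 0.08
```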

$\sigma^2_{MLE}$ is a biased estimator:

$$
E[\sigma^2_{MLE}] = \frac{N-1}{N}\sigma^2
$$

The unbiased estimator is therefore

$$
\hat{\sigma}^2 = \frac{1}{N-1}\sum_{i=1}^N (x_i - \mu_{MLE})^2
$$
Derivation for $\sigma^2_{MLE}$:

$$
\begin{aligned}
\sigma^2_{MLE} &= \frac{1}{N}\sum_{i=1}^N (x_i - \mu_{MLE})^2 \\
&= \frac{1}{N}\sum_{i=1}^N (x_i^2 - 2x_i\mu_{MLE} + \mu_{MLE}^2) \\
&= \frac{1}{N}\sum_{i=1}^N x_i^2 - \frac{1}{N}\sum_{i=1}^N 2x_i\mu_{MLE} + \frac{1}{N}\sum_{i=1}^N \mu_{MLE}^2 \\
&= \frac{1}{N}\sum_{i=1}^N x_i^2 - \mu_{MLE}^2
\end{aligned}
$$

(In the last step the middle term equals $2\mu_{MLE}^2$ because $\frac{1}{N}\sum_{i=1}^N x_i = \mu_{MLE}$, so it partially cancels the third term.)
Taking expectations and adding and subtracting $\mu^2$:

$$
\begin{aligned}
E[\sigma^2_{MLE}] &= E\left[ \frac{1}{N}\sum_{i=1}^N x_i^2 - \mu_{MLE}^2 \right] \\
&= E\left[ \left( \frac{1}{N}\sum_{i=1}^N x_i^2 - \mu^2 \right) - \left( \mu_{MLE}^2 - \mu^2 \right) \right] \\
&= E\left[ \frac{1}{N}\sum_{i=1}^N x_i^2 - \mu^2 \right] - E\left[ \mu_{MLE}^2 - \mu^2 \right]
\end{aligned}
$$

For the first term:

$$
\begin{aligned}
E\left[ \frac{1}{N}\sum_{i=1}^N x_i^2 - \mu^2 \right] &= \frac{1}{N}\sum_{i=1}^N E[x_i^2 - \mu^2] = \frac{1}{N}\sum_{i=1}^N \left( E[x_i^2] - \mu^2 \right) \\
&= \frac{1}{N}\sum_{i=1}^N Var[x_i] = \frac{1}{N}\sum_{i=1}^N \sigma^2 = \sigma^2
\end{aligned}
$$

For the second term, using $E[\mu_{MLE}] = \mu$:

$$
\begin{aligned}
E[\mu_{MLE}^2 - \mu^2] &= E[\mu_{MLE}^2] - \mu^2 = E[\mu_{MLE}^2] - {E[\mu_{MLE}]}^2 \\
&= Var[\mu_{MLE}] = \frac{\sigma^2}{N}
\end{aligned}
$$

Combining:

$$
E[\sigma^2_{MLE}] = \sigma^2 - \frac{\sigma^2}{N} = \frac{N-1}{N}\sigma^2
$$
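The $\frac{N-1}{N}$ shrinkage is easy to see empirically: averaging $\sigma^2_{MLE}$ over many repeated samples lands near $\frac{N-1}{N}\sigma^2$, while dividing by $N-1$ instead is unbiased (a sketch assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(42)
sigma2, N, trials = 4.0, 10, 50_000

samples = rng.normal(0.0, np.sqrt(sigma2), size=(trials, N))
mu_hat = samples.mean(axis=1, keepdims=True)  # mu_MLE for each sample

s2_mle = ((samples - mu_hat) ** 2).mean(axis=1)               # divide by N
s2_unbiased = ((samples - mu_hat) ** 2).sum(axis=1) / (N - 1)  # divide by N-1

print(s2_mle.mean())       # close to (N-1)/N * sigma2 = 3.6
print(s2_unbiased.mean())  # close to sigma2 = 4.0
```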

Bilibili link:
https://www.bilibili.com/video/av32905863?from=search&seid=8309397892501615322
