This article is based on shuhuai008's whiteboard-derivation video on Bilibili: 数学基础_150min (Mathematical Foundations, 150 min).
Index of all notes in this series: 机器学习-白板推导系列笔记 (Machine Learning: Whiteboard Derivation Series Notes)
1. Overview
Suppose we have the following data:
$$X=(x_{1},x_{2},\cdots ,x_{N})^{T}=\begin{pmatrix} x_{1}^{T}\\ x_{2}^{T}\\ \vdots \\ x_{N}^{T} \end{pmatrix}_{N \times p}$$
where $x_{i}\in \mathbb{R}^{p}$ and $x_{i}\overset{iid}{\sim }N(\mu ,\Sigma )$.
The parameter is $\theta =(\mu ,\Sigma )$.
2. Estimating the Mean and Variance of a Gaussian by Maximum Likelihood
(1) Maximum likelihood
$$\theta_{MLE}=\underset{\theta }{\arg\max}\,P(X|\theta )$$
(2) The Gaussian distribution
One-dimensional Gaussian:
$$p(x)=\frac{1}{\sqrt{2\pi }\sigma }\exp\left(-\frac{(x-\mu )^{2}}{2\sigma ^{2}}\right)$$
Multivariate ($D$-dimensional) Gaussian:
$$p(x)=\frac{1}{(2\pi )^{D/2}|\Sigma |^{1/2}}\exp\left(-\frac{1}{2}(x-\mu)^{T}\Sigma ^{-1}(x-\mu)\right)$$
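As a quick numerical sanity check of the two density formulas above, here is a minimal NumPy sketch (the function names are illustrative, not from any library). With $\Sigma = I$, the multivariate density factorizes into a product of one-dimensional densities, which the sketch verifies:

```python
import numpy as np

def gauss_pdf_1d(x, mu, sigma):
    """One-dimensional Gaussian density p(x)."""
    return 1.0 / (np.sqrt(2 * np.pi) * sigma) * np.exp(-(x - mu) ** 2 / (2 * sigma ** 2))

def gauss_pdf_nd(x, mu, Sigma):
    """Multivariate Gaussian density for x, mu in R^D."""
    D = len(mu)
    diff = x - mu
    norm = 1.0 / ((2 * np.pi) ** (D / 2) * np.sqrt(np.linalg.det(Sigma)))
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff)

# With D = 2 and Sigma = I, the joint density factorizes into 1-D densities.
x = np.array([0.3, -1.2])
p_joint = gauss_pdf_nd(x, np.zeros(2), np.eye(2))
p_prod = gauss_pdf_1d(x[0], 0.0, 1.0) * gauss_pdf_1d(x[1], 0.0, 1.0)
```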
(3) Estimation under a one-dimensional Gaussian
1. The likelihood function of $\theta$
$$\begin{aligned}\log P(X|\theta )&=\log\prod_{i=1}^{N}p(x_{i}|\theta )=\sum_{i=1}^{N}\log\frac{1}{\sqrt{2\pi }\sigma }\exp\left(-\frac{(x_{i}-\mu )^{2}}{2\sigma ^{2}}\right)\\&=\sum_{i=1}^{N}\left[\log\frac{1}{\sqrt{2\pi }}+\log\frac{1}{\sigma }-\frac{(x_{i}-\mu )^{2}}{2\sigma ^{2}}\right]\end{aligned}$$
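The expansion of the log-likelihood above can be confirmed numerically; the following sketch (synthetic data, arbitrary seed) evaluates both the direct sum of log densities and the expanded three-term form:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.5, 2.0
x = rng.normal(mu, sigma, size=100)  # synthetic sample from N(mu, sigma^2)

# log P(X | theta) computed directly as the sum of log densities
ll_direct = np.sum(np.log(1.0 / (np.sqrt(2 * np.pi) * sigma)
                          * np.exp(-(x - mu) ** 2 / (2 * sigma ** 2))))

# the expanded form from the last line of the derivation above
ll_expanded = np.sum(np.log(1.0 / np.sqrt(2 * np.pi))
                     + np.log(1.0 / sigma)
                     - (x - mu) ** 2 / (2 * sigma ** 2))
```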
2. Solving for $\mu_{MLE}$ by maximum likelihood
$$\begin{aligned}\mu _{MLE}&=\underset{\mu }{\arg\max}\,\log P(X|\theta)=\underset{\mu }{\arg\max}\sum_{i=1}^{N}-\frac{(x_{i}-\mu )^{2}}{2\sigma ^{2}}\\&=\underset{\mu }{\arg\min}\sum_{i=1}^{N}(x_{i}-\mu )^{2}\end{aligned}$$
Taking the derivative with respect to $\mu$ and setting it to zero:
$$\begin{aligned}\frac{\partial \sum_{i=1}^{N}(x_{i}-\mu )^{2}}{\partial \mu}&=\sum_{i=1}^{N}2(x_{i}-\mu )(-1)=0\\ &\Leftrightarrow \sum_{i=1}^{N}(x_{i}-\mu )=0\Leftrightarrow \sum_{i=1}^{N}x_{i}-\underset{N\mu }{\underbrace{\sum_{i=1}^{N}\mu }}=0\end{aligned}$$
Solving gives $\mu _{MLE}=\frac{1}{N}\sum_{i=1}^{N}x_{i}$.
3. Proving that $\mu_{MLE}$ is an unbiased estimator
$$E[\mu _{MLE}]=\frac{1}{N}\sum_{i=1}^{N}E[x_{i}] =\frac{1}{N}\sum_{i=1}^{N}\mu =\frac{1}{N}N\mu =\mu$$
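Unbiasedness can also be seen empirically: averaging $\mu_{MLE}$ over many independent datasets recovers $\mu$. A simulation sketch (sample size, trial count, and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(42)
mu, sigma, N, trials = 3.0, 1.0, 5, 100_000

# draw `trials` independent datasets of size N and compute mu_MLE for each
samples = rng.normal(mu, sigma, size=(trials, N))
mu_mle = samples.mean(axis=1)   # (1/N) * sum(x_i), one estimate per dataset

avg = mu_mle.mean()             # empirical E[mu_MLE]; should be close to mu
```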
4. Solving for $\sigma_{MLE}$ by maximum likelihood
Dropping the term $\log\frac{1}{\sqrt{2\pi}}$, which does not depend on $\sigma$:
$$\begin{aligned}\sigma _{MLE}^{2}&=\underset{\sigma }{\arg\max}\,P(X|\theta )=\underset{\sigma }{\arg\max}\underset{L}{\underbrace{\sum_{i=1}^{N}\left(-\log\sigma -\frac{(x_{i}-\mu )^{2}}{2\sigma ^{2}}\right)}}\\ \frac{\partial L}{\partial \sigma}&=\sum_{i=1}^{N}\left[-\frac{1}{\sigma }+(x_{i}-\mu )^{2}\sigma ^{-3}\right]=0\\ &\Leftrightarrow \sum_{i=1}^{N}\left[-\sigma ^{2}+(x_{i}-\mu )^{2}\right]=0\Leftrightarrow -\sum_{i=1}^{N}\sigma ^{2}+\sum_{i=1}^{N}(x_{i}-\mu )^{2}=0\\ \sigma _{MLE}^{2}&=\frac{1}{N}\sum_{i=1}^{N}(x_{i}-\mu )^{2}\end{aligned}$$
When $\mu$ is replaced by $\mu_{MLE}$, this becomes $\sigma _{MLE}^{2}=\frac{1}{N}\sum_{i=1}^{N}(x_{i}-\mu _{MLE})^{2}$.
5. Proving that $\sigma_{MLE}^{2}$ is a biased estimator
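In code, $\sigma^2_{MLE}$ with $\mu = \mu_{MLE}$ is just the mean squared deviation from the sample mean, which is exactly what `numpy.var` computes by default (`ddof=0`). A short sketch on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0.0, 2.0, size=50)          # synthetic sample

mu_mle = x.mean()
sigma2_mle = np.mean((x - mu_mle) ** 2)    # (1/N) * sum (x_i - mu_MLE)^2

# np.var(x) uses ddof=0 by default, i.e. the same MLE formula
```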
To determine whether $\sigma _{MLE}^{2}$ is biased, we need to check whether $E[\sigma _{MLE}^{2}]\overset{?}{=}\sigma ^{2}$. The proof proceeds as follows. First compute the variance of $\mu_{MLE}$:
$$Var[\mu _{MLE}]=Var\left[\frac{1}{N}\sum_{i=1}^{N}x_{i}\right]=\frac{1}{N^{2}}\sum_{i=1}^{N}Var[x_{i}]=\frac{1}{N^{2}}\sum_{i=1}^{N}\sigma ^{2}=\frac{\sigma ^{2}}{N}$$
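The derivation continues toward the standard result $E[\sigma^2_{MLE}] = \frac{N-1}{N}\sigma^2$, i.e. the MLE systematically underestimates $\sigma^2$. This can also be checked by simulation (a sketch; seed, sample size, and trial count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
sigma2, N, trials = 1.0, 5, 200_000

# draw many datasets of size N and compute the biased MLE variance for each
samples = rng.normal(0.0, np.sqrt(sigma2), size=(trials, N))
sigma2_mle = np.var(samples, axis=1, ddof=0)   # (1/N) * sum (x_i - x_bar)^2

empirical_mean = sigma2_mle.mean()             # estimate of E[sigma^2_MLE]
expected = (N - 1) / N * sigma2                # = 0.8 for N = 5
```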