Gaussian Distribution
$$
\begin{aligned}
& X: \text{data} \rightarrow X = (x_1 \; x_2 \; \dots \; x_N)^T_{N \times p} \\
& x_i \in \mathbb{R}^p \\
& x_i \overset{iid}{\sim} N(\mu, \Sigma) \\
& \theta = (\mu, \Sigma)
\end{aligned}
$$
One-dimensional Gaussian:
$$
p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right)
$$
p-dimensional Gaussian:
$$
p(x) = \frac{1}{(2\pi)^{\frac{p}{2}} |\Sigma|^{\frac{1}{2}}} \exp\left( -\frac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu) \right)
$$
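As a quick sanity check on the normalization constant, the density can be evaluated directly with NumPy (a minimal sketch; the helper name `gaussian_pdf` is ours). At $x = \mu$ with $\Sigma = I_2$, the exponent vanishes and the density reduces to $1/(2\pi)$:

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Density of N(mu, Sigma) evaluated at a p-dimensional point x."""
    p = len(mu)
    diff = x - mu
    norm_const = 1.0 / ((2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(Sigma)))
    return norm_const * np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff)

# At x = mu with Sigma = I_2 the exponent is 0, so the density is 1/(2*pi)
val = gaussian_pdf(np.zeros(2), np.zeros(2), np.eye(2))
```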
Deriving the Parameter Estimates
$$
\theta_{MLE} = \arg\max_\theta \log p(x|\theta)
$$
Let $p = 1$, so $\theta = (\mu, \sigma^2)$:
$$
\begin{aligned}
\log p(x|\theta) &= \log \prod_{i=1}^N p(x_i|\theta) = \sum_{i=1}^N \log p(x_i|\theta) \\
&= \sum_{i=1}^N \log \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(x_i-\mu)^2}{2\sigma^2} \right) \\
&= \sum_{i=1}^N \left[ \log \frac{1}{\sqrt{2\pi}} + \log \frac{1}{\sigma} - \frac{(x_i-\mu)^2}{2\sigma^2} \right]
\end{aligned}
$$
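The first equality above (log of a product equals the sum of logs) can be verified numerically. A NumPy sketch with arbitrary constants; a small $N$ is used so the product does not underflow:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 2.0, 1.5
x = rng.normal(loc=mu, scale=sigma, size=20)

# Per-sample log density of N(mu, sigma^2), following the expansion above
log_p = np.log(1 / (np.sqrt(2 * np.pi) * sigma)) - (x - mu) ** 2 / (2 * sigma ** 2)

loglik_sum = log_p.sum()                      # sum of logs
loglik_prod = np.log(np.prod(np.exp(log_p)))  # log of the product
```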
Solving for $\mu$:
$$
\begin{aligned}
\mu_{MLE} &= \arg\max_{\mu} \log p(x|\theta) \\
&= \arg\max_{\mu} \sum_{i=1}^N -\frac{(x_i-\mu)^2}{2\sigma^2} \\
&= \arg\min_{\mu} \sum_{i=1}^N (x_i-\mu)^2
\end{aligned}
$$
$$
\begin{aligned}
\frac{\partial}{\partial \mu} \sum_{i=1}^N (x_i-\mu)^2 &= \sum_{i=1}^N 2(x_i-\mu)(-1) = 0 \\
\sum_{i=1}^N (x_i-\mu) &= 0 \\
\sum_{i=1}^N x_i - \sum_{i=1}^N \mu &= 0
\end{aligned}
$$
$$
\mu_{MLE} = \frac{1}{N} \sum_{i=1}^N x_i
$$
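A quick numerical check that the sample mean minimizes the sum of squared deviations (a NumPy sketch; the names and constants are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=2.0, size=100)

mu_mle = x.mean()  # closed-form MLE: the sample mean

def sse(m):
    """Sum of squared deviations from m, the quantity being minimized."""
    return ((x - m) ** 2).sum()
```

Perturbing `mu_mle` in either direction can only increase `sse`, consistent with it being the minimizer.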
Solving for $\sigma^2$:
$$
\begin{aligned}
\sigma^2_{MLE} &= \arg\max_{\sigma} \log p(x|\theta) \\
&= \arg\max_{\sigma} \sum_{i=1}^N \left( \log \frac{1}{\sigma} - \frac{(x_i-\mu)^2}{2\sigma^2} \right)
\end{aligned}
$$
$$
\begin{aligned}
\frac{\partial}{\partial \sigma} \sum_{i=1}^N \left( \log \frac{1}{\sigma} - \frac{(x_i-\mu)^2}{2\sigma^2} \right) &= \sum_{i=1}^N \left( -\frac{1}{\sigma} + (x_i-\mu)^2 \sigma^{-3} \right) = 0 \\
\sum_{i=1}^N \left( -\sigma^2 + (x_i-\mu)^2 \right) &= 0 \\
\sum_{i=1}^N \sigma^2 &= \sum_{i=1}^N (x_i-\mu)^2
\end{aligned}
$$
$$
\sigma^2_{MLE} = \frac{1}{N} \sum_{i=1}^N (x_i - \mu_{MLE})^2
$$
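This $1/N$ formula is exactly what `numpy.var` computes by default (`ddof=0`); a small sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=0.0, scale=3.0, size=1000)

mu_mle = x.mean()
sigma2_mle = ((x - mu_mle) ** 2).mean()  # (1/N) * sum of squared deviations
```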
Bias of the Estimators
$\mu_{MLE}$ is an unbiased estimator:
$$
E[\mu_{MLE}] = \frac{1}{N} \sum_{i=1}^N E[x_i] = \frac{1}{N} \sum_{i=1}^N \mu = \mu
$$
$$
\begin{aligned}
Var[\mu_{MLE}] &= Var\left[ \frac{1}{N} \sum_{i=1}^N x_i \right] = \frac{1}{N^2} \sum_{i=1}^N Var[x_i] \\
&= \frac{1}{N^2} \sum_{i=1}^N \sigma^2 = \frac{1}{N^2} N \sigma^2 = \frac{1}{N} \sigma^2
\end{aligned}
$$
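The $Var[\mu_{MLE}] = \sigma^2/N$ result can be checked by simulation (a sketch; the constants are arbitrary): draw many datasets of size $N$, compute each sample mean, and compare the empirical variance of those means with $\sigma^2/N$.

```python
import numpy as np

rng = np.random.default_rng(3)
sigma, N, trials = 2.0, 50, 20000

# Each row is one dataset of size N; take the sample mean of each row
means = rng.normal(loc=0.0, scale=sigma, size=(trials, N)).mean(axis=1)

empirical_var = means.var()
theoretical_var = sigma ** 2 / N  # the (1/N) * sigma^2 result above
```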
$\sigma^2_{MLE}$ is a biased estimator. The unbiased estimator should be:
$$
E[\sigma^2_{MLE}] = \frac{N-1}{N} \sigma^2
\qquad
\hat{\sigma}^2 = \frac{1}{N-1} \sum_{i=1}^N (x_i - \mu_{MLE})^2
$$
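The bias factor $(N-1)/N$ shows up clearly in simulation; `ddof=1` in `numpy.var` applies the $1/(N-1)$ correction (a sketch with arbitrary constants):

```python
import numpy as np

rng = np.random.default_rng(4)
sigma2, N, trials = 4.0, 10, 50000

samples = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(trials, N))

mle_vars = samples.var(axis=1, ddof=0)       # (1/N) formula: biased
unbiased_vars = samples.var(axis=1, ddof=1)  # (1/(N-1)) formula: unbiased

mean_mle = mle_vars.mean()            # should approach (N-1)/N * sigma2 = 3.6
mean_unbiased = unbiased_vars.mean()  # should approach sigma2 = 4.0
```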
Derivation for $\sigma^2_{MLE}$:
$$
\begin{aligned}
\sigma^2_{MLE} &= \frac{1}{N} \sum_{i=1}^N (x_i - \mu_{MLE})^2 \\
&= \frac{1}{N} \sum_{i=1}^N (x_i^2 - 2 x_i \mu_{MLE} + \mu_{MLE}^2) \\
&= \frac{1}{N} \sum_{i=1}^N x_i^2 - \frac{1}{N} \sum_{i=1}^N 2 x_i \mu_{MLE} + \frac{1}{N} \sum_{i=1}^N \mu_{MLE}^2 \\
&= \frac{1}{N} \sum_{i=1}^N x_i^2 - 2\mu_{MLE}^2 + \mu_{MLE}^2
= \frac{1}{N} \sum_{i=1}^N x_i^2 - \mu_{MLE}^2
\end{aligned}
$$
$$
\begin{aligned}
E[\sigma^2_{MLE}] &= E\left[ \frac{1}{N} \sum_{i=1}^N x_i^2 - \mu_{MLE}^2 \right] \\
&= E\left[ \left( \frac{1}{N} \sum_{i=1}^N x_i^2 - \mu^2 \right) - \left( \mu_{MLE}^2 - \mu^2 \right) \right] \\
&= E\left[ \frac{1}{N} \sum_{i=1}^N x_i^2 - \mu^2 \right] - E\left[ \mu_{MLE}^2 - \mu^2 \right]
\end{aligned}
$$
$$
\begin{aligned}
E\left[ \frac{1}{N} \sum_{i=1}^N x_i^2 - \mu^2 \right] &= E\left[ \frac{1}{N} \sum_{i=1}^N (x_i^2 - \mu^2) \right] = \frac{1}{N} \sum_{i=1}^N E[x_i^2 - \mu^2] \\
&= \frac{1}{N} \sum_{i=1}^N \left( E[x_i^2] - \mu^2 \right) = \frac{1}{N} \sum_{i=1}^N Var[x_i] \\
&= \frac{1}{N} \sum_{i=1}^N \sigma^2 = \sigma^2
\end{aligned}
$$
$$
\begin{aligned}
E[\mu_{MLE}^2 - \mu^2] &= E[\mu_{MLE}^2] - E[\mu^2] = E[\mu_{MLE}^2] - \mu^2 \\
&= E[\mu_{MLE}^2] - E[\mu_{MLE}]^2 \\
&= Var[\mu_{MLE}] = \frac{1}{N} \sigma^2
\end{aligned}
$$
$$
E[\sigma^2_{MLE}] = \sigma^2 - \frac{1}{N} \sigma^2 = \frac{N-1}{N} \sigma^2
$$
Bilibili link:
https://www.bilibili.com/video/av32905863?from=search&seid=8309397892501615322