Data
Let the dataset be $\{(x_1,y_1),(x_2,y_2),(x_3,y_3),\dots,(x_N,y_N)\}$, where $x_i \in \mathbb{R}^p$ and $y_i \in \mathbb{R}$. In other words, $X=(x_1,x_2,x_3,\dots,x_N)^T$, where each element $x_i$ is a $p$-dimensional column vector. Written out in full:

$$
X=\left[ \begin{matrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ x_{N1} & x_{N2} & \cdots & x_{Np} \end{matrix} \right]\tag{1}
$$

$$
Y=\left[ \begin{matrix} y_{1} \\ y_{2} \\ \vdots \\ y_{N} \end{matrix} \right]\tag{2}
$$
Each $x_i\in \mathbb{R}^p$, and the samples $x_i$ are independent and identically distributed (i.i.d.). To keep the calculation simple, this article takes $p=1$. Let $\theta=(\mu,\sigma^2)$. The pdf (probability density function) of the one-dimensional Gaussian distribution is:

$$
P(x|\theta) = \frac{1}{\sigma \sqrt{2\pi}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
$$
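As a quick sanity check on the density, here is a minimal sketch that evaluates it directly. The function name `gaussian_pdf` and its argument names are illustrative choices, not part of the original text; only the Python standard library is assumed.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of the one-dimensional Gaussian N(mu, sigma^2) at x.

    Hypothetical helper for illustration; sigma is the standard
    deviation and must be positive.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    # Normalizing constant 1 / (sigma * sqrt(2 * pi))
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    # The constant multiplies the exponential (it is not added to it)
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# The density peaks at x = mu and is symmetric around it.
peak = gaussian_pdf(0.0, 0.0, 1.0)
left = gaussian_pdf(-1.0, 0.0, 1.0)
right = gaussian_pdf(1.0, 0.0, 1.0)
```

Note that the normalizing constant and the exponential are multiplied together; this is what lets the logarithm split them into separate terms later.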
Parameter estimation: the mean
Because the samples are i.i.d., the joint density factorizes into a product, and the logarithm turns that product into a sum:

$$
\begin{aligned}
\theta_{MLE} &= \arg\max_{\theta} \ln P(X|\theta) \\
&= \arg\max_{\theta} \ln \prod_{i=1}^N P(x_i|\theta) \\
&= \arg\max_{\theta} \sum_{i=1}^N \ln P(x_i|\theta) \\
&= \arg\max_{\theta} \sum_{i=1}^N \ln\left(\frac{1}{\sigma \sqrt{2\pi}}\exp\left(-\frac{(x_i-\mu)^2}{2\sigma^2}\right)\right) \\
&= \arg\max_{\theta} \sum_{i=1}^N \left(\ln\frac{1}{\sqrt{2\pi}}-\ln\sigma-\frac{(x_i-\mu)^2}{2\sigma^2}\right)
\end{aligned}
$$
At this point our objective function $L(\theta)$, the log-likelihood being maximized, is fully simplified.
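The simplified objective can be written down directly in code. The sketch below is only an illustration: `log_likelihood` is a hypothetical name, and the three terms mirror $\ln\frac{1}{\sqrt{2\pi}}$, $-\ln\sigma$ and $-\frac{(x_i-\mu)^2}{2\sigma^2}$ summed over the samples.

```python
import math

def log_likelihood(xs, mu, sigma):
    """Gaussian log-likelihood L(theta) of the samples xs.

    Hypothetical helper; sigma is the standard deviation (> 0).
    """
    n = len(xs)
    # Sum over i of ln(1 / sqrt(2 * pi)): the same constant n times
    const_term = n * math.log(1.0 / math.sqrt(2.0 * math.pi))
    # Sum over i of -ln(sigma)
    scale_term = -n * math.log(sigma)
    # Sum over i of -(x_i - mu)^2 / (2 * sigma^2)
    fit_term = -sum((x - mu) ** 2 for x in xs) / (2.0 * sigma ** 2)
    return const_term + scale_term + fit_term

samples = [0.5, 1.5, 2.0]
value = log_likelihood(samples, mu=1.0, sigma=1.5)
```

Only the last term depends on $\mu$, while both the second and the last depend on $\sigma$, which is why the two parameters are handled separately.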
Next we estimate the parameters $\mu$ and $\sigma$ in turn.
$$
\mu_{MLE}=\arg\max_{\mu}\sum_{i=1}^N\left(\ln\frac{1}{\sqrt{2\pi}}-\ln\sigma-\frac{(x_i-\mu)^2}{2\sigma^2}\right)
$$

The other two terms do not depend on the parameter $\mu$, so they vanish when we take the partial derivative and can be dropped:

$$
\begin{aligned}
\mu_{MLE} &= \arg\max_{\mu}\sum_{i=1}^N-\frac{(x_i-\mu)^2}{2\sigma^2} \\
&= \arg\min_{\mu}\sum_{i=1}^N(x_i-\mu)^2
\end{aligned}
$$

Setting the derivative with respect to $\mu$ to zero:

$$
\frac{\partial}{\partial \mu}\sum_{i=1}^N(x_i^2-2x_i\mu+\mu^2)=\sum_{i=1}^N(-2x_i+2\mu)=0
$$

$$
\sum_{i=1}^Nx_i=N\mu
$$

$$
\mu_{MLE}=\frac{1}{N}\sum_{i=1}^Nx_i
$$
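To make the result concrete, the following sketch computes the estimate on a small made-up sample and compares the sum of squared deviations at the estimate with nearby values. The names `mu_mle` and `sum_squared_error` and the sample values are assumptions chosen for illustration.

```python
def mu_mle(xs):
    """Maximum likelihood estimate of the mean: the sample average."""
    return sum(xs) / len(xs)

def sum_squared_error(xs, mu):
    """The quantity minimized over mu in the derivation above."""
    return sum((x - mu) ** 2 for x in xs)

xs = [1.0, 2.0, 3.0, 6.0]   # made-up sample
mu_hat = mu_mle(xs)         # (1 + 2 + 3 + 6) / 4

# Moving mu away from the sample average in either direction
# increases the sum of squared deviations.
at_estimate = sum_squared_error(xs, mu_hat)
slightly_lower = sum_squared_error(xs, mu_hat - 0.1)
slightly_higher = sum_squared_error(xs, mu_hat + 0.1)
```

The estimate does not depend on $\sigma$, which matches the derivation: the factor $\frac{1}{2\sigma^2}$ only rescales the objective and does not move its maximizer.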
Since

$$
E[\mu_{MLE}]=\frac{1}{N}\sum_{i=1}^NE[x_i]=\frac{1}{N}\sum_{i=1}^N\mu=\mu
$$

this result is an unbiased estimate.
Parameter estimation: the variance
We substitute the $L(\theta)$ obtained above:

$$
\begin{aligned}
\theta_{MLE} &= \arg\max_{\theta} \ln P(X|\theta) \\
&= \arg\max_{\theta} \ln \prod_{i=1}^N P(x_i|\theta) \\
&= \arg\max_{\theta} \sum_{i=1}^N \ln P(x_i|\theta) \\
&= \arg\max_{\theta} \sum_{i=1}^N \ln\left(\frac{1}{\sigma \sqrt{2\pi}}\exp\left(-\frac{(x_i-\mu)^2}{2\sigma^2}\right)\right) \\
&= \arg\max_{\theta} \sum_{i=1}^N \left(\ln\frac{1}{\sqrt{2\pi}}-\ln\sigma-\frac{(x_i-\mu)^2}{2\sigma^2}\right)
\end{aligned}
$$
$$
\sigma^2_{MLE}=\arg\max_{\sigma}\sum_{i=1}^N\left(\ln\frac{1}{\sqrt{2\pi}}-\ln\sigma-\frac{(x_i-\mu)^2}{2\sigma^2}\right)
$$

Setting the partial derivative with respect to $\sigma$ to zero:

$$
\frac{\partial}{\partial \sigma}\sum_{i=1}^N\left(\ln\frac{1}{\sqrt{2\pi}}-\ln\sigma-\frac{(x_i-\mu)^2}{2\sigma^2}\right)=0
$$

$$
\sum_{i=1}^N\left(-\frac{1}{\sigma}-(-2)\sigma^{-3}\frac{(x_i-\mu)^2}{2}\right)=\sum_{i=1}^N\left(-\frac{1}{\sigma}+\frac{(x_i-\mu)^2}{\sigma^3}\right)=0
$$

Multiplying both sides by $\sigma^3$:

$$
\sum_{i=1}^N\left(-\sigma^2+(x_i-\mu)^2\right)=0
$$

that is, $N\sigma^2=\sum_{i=1}^N(x_i-\mu)^2$, so

$$
\sigma^2_{MLE}=\frac{1}{N}\sum_{i=1}^N(x_i-\mu)^2
$$
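A short sketch of this estimator on a made-up sample follows. It assumes the true mean is unknown and plugs in the sample average for $\mu$; the name `sigma2_mle` is hypothetical. The standard library's `statistics.pvariance` also divides by $N$, so it serves as a cross-check.

```python
import statistics

def sigma2_mle(xs):
    """Maximum likelihood estimate of the variance.

    Assumption for this sketch: mu is unknown, so the sample
    average is used in its place. The sum is divided by N.
    """
    n = len(xs)
    mu_hat = sum(xs) / n
    return sum((x - mu_hat) ** 2 for x in xs) / n

xs = [1.0, 2.0, 3.0, 6.0]   # made-up sample, sample average 3.0
estimate = sigma2_mle(xs)   # (4 + 1 + 0 + 9) / 4

# Cross-checks: pvariance divides by N, variance divides by N - 1.
population_variance = statistics.pvariance(xs)
sample_variance = statistics.variance(xs)
```

The gap between the two library functions is exactly the factor $\frac{N-1}{N}$, which is where the question of bias comes from.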
Because, once the unknown $\mu$ is replaced by the sample mean $\mu_{MLE}$,

$$
E[\sigma_{MLE}^2]=\frac{N-1}{N}\sigma^2
$$

this result is a biased estimate.
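The bias can also be observed numerically. The sketch below is a small Monte Carlo experiment under assumed settings ($N=5$, $\mu=1$, $\sigma=2$, 20000 repetitions, fixed seed), none of which come from the original text. It averages the variance estimate over many simulated samples; theory predicts a value near $\frac{N-1}{N}\sigma^2 = 3.2$ rather than $\sigma^2 = 4$.

```python
import random

def sigma2_mle(xs):
    """Variance MLE with the sample average plugged in for mu."""
    n = len(xs)
    mu_hat = sum(xs) / n
    return sum((x - mu_hat) ** 2 for x in xs) / n

def average_sigma2_mle(n, mu, sigma, trials, seed=0):
    """Average the variance MLE over many simulated Gaussian samples.

    Hypothetical experiment helper; the seed only makes runs repeatable.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        total += sigma2_mle(xs)
    return total / trials

n, mu, sigma = 5, 1.0, 2.0
average = average_sigma2_mle(n, mu, sigma, trials=20000)
predicted = (n - 1) / n * sigma ** 2   # (N - 1) / N * sigma^2
true_variance = sigma ** 2
```

Multiplying the estimate by $\frac{N}{N-1}$ removes the bias, and the correction matters less as $N$ grows because $\frac{N-1}{N}$ approaches 1.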