Asymptotic Distribution of the Maximum Likelihood Estimator
Write the likelihood function as
$$L(\theta)=\prod_{i=1}^{n}f(X_i;\theta)$$
and let $l(\theta)=\log L(\theta)$ denote the log-likelihood function. Let $\theta$ be the true value and $\hat{\theta}$ the maximum likelihood estimate. Expanding $l'(\hat{\theta})$ to first order around $\theta$, and using the fact that $\hat{\theta}$ maximizes $l$ (so $l'(\hat{\theta})=0$), we get
$$\frac{\partial l(\hat{\theta})}{\partial \theta} = \frac{\partial l(\theta)}{\partial \theta}+\frac{\partial^2 l(\theta)}{\partial \theta^2}(\hat{\theta}-\theta)=0$$
Hence
$$\sqrt{n}(\hat{\theta}-\theta)=-\sqrt{n}\,\frac{l'(\theta)}{l''(\theta)}=\frac{(1/\sqrt{n})\,l'(\theta)}{-(1/n)\,l''(\theta)}$$
(i) Since
$$\frac{1}{\sqrt{n}}\,l'(\theta)=\frac{1}{\sqrt{n}}\sum_i \frac{\partial \log f(X_i;\theta)}{\partial \theta}=\sqrt{n}\cdot\frac{1}{n}\sum_i\frac{\partial \log f(X_i;\theta)}{\partial \theta}$$
and since $\mathbb{E}[\partial \log f(X_i;\theta)/\partial\theta]=0$ and $\mathbb{V}[\partial \log f(X_i;\theta)/\partial\theta]=I(\theta)$ (see my earlier post on the Fisher information matrix), the central limit theorem gives
$$\frac{1}{n}\sum_i\frac{\partial \log f(X_i;\theta)}{\partial \theta}\rightsquigarrow N(0,\,I(\theta)/n)$$
Therefore
$$\sqrt{n}\cdot\frac{1}{n}\sum_i\frac{\partial \log f(X_i;\theta)}{\partial \theta}\rightsquigarrow N(0,\,I(\theta))$$
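As a quick numerical sanity check (my own illustration, not part of the original derivation), take a hypothetical Poisson($\theta$) model, whose score is $x/\theta-1$ and whose Fisher information is $I(\theta)=1/\theta$; simulating $(1/\sqrt{n})\,l'(\theta)$ over many replicates should give mean $\approx 0$ and variance $\approx I(\theta)$:

```python
import numpy as np

# Toy check of the CLT for the score, using an assumed Poisson(theta) model
# (for Poisson, I(theta) = 1/theta).
rng = np.random.default_rng(0)
theta, n, reps = 2.0, 1000, 4000

x = rng.poisson(theta, size=(reps, n))
score = x / theta - 1.0               # per-observation score  d log f / d theta
s = score.sum(axis=1) / np.sqrt(n)    # (1/sqrt(n)) * l'(theta), one draw per replicate

print(np.mean(s))   # close to 0
print(np.var(s))    # close to I(theta) = 1/theta = 0.5
```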
(ii) Since
$$-(1/n)\,l''(\theta)=-\frac{1}{n}\sum_i \frac{\partial^2 \log f(X_i;\theta)}{\partial \theta^2}$$
and since $\mathbb{E}\left[-\frac{\partial^2 \log f(X_i;\theta)}{\partial \theta^2}\right]=I(\theta)$, the law of large numbers gives
$$-(1/n)\,l''(\theta)\xrightarrow{P} I(\theta)$$
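The convergence in (ii) can likewise be checked numerically; for the same hypothetical Poisson($\theta$) model, $-\partial^2\log f/\partial\theta^2 = x/\theta^2$, whose sample average should settle near $I(\theta)=1/\theta$:

```python
import numpy as np

# Toy check of the law of large numbers for -(1/n) l''(theta),
# again with an assumed Poisson(theta) model.
rng = np.random.default_rng(1)
theta, n = 2.0, 100_000

x = rng.poisson(theta, size=n)
neg_hess = np.mean(x / theta**2)   # -(1/n) sum_i d^2 log f(x_i; theta) / d theta^2
print(neg_hess)                    # close to I(theta) = 1/theta = 0.5
```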
Combining (i) and (ii) by Slutsky's theorem,
$$\frac{(1/\sqrt{n})\,l'(\theta)}{-(1/n)\,l''(\theta)}\rightsquigarrow N\left(0,\frac{I(\theta)}{I(\theta)^2}\right)=N\left(0,\,I(\theta)^{-1}\right)$$
Therefore, the maximum likelihood estimator is asymptotically normal.
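Putting the pieces together for the same toy Poisson model (an assumed example of mine, where the MLE is the sample mean and $1/I(\theta)=\theta$), the variance of $\sqrt{n}(\hat{\theta}-\theta)$ over many replicates should be close to $I(\theta)^{-1}$:

```python
import numpy as np

# Toy check that sqrt(n)(theta_hat - theta) is approximately N(0, 1/I(theta)).
# For an assumed Poisson(theta) model, the MLE is the sample mean and 1/I(theta) = theta.
rng = np.random.default_rng(2)
theta, n, reps = 2.0, 1000, 4000

x = rng.poisson(theta, size=(reps, n))
theta_hat = x.mean(axis=1)              # MLE for each replicate
z = np.sqrt(n) * (theta_hat - theta)
print(np.var(z))                        # close to 1/I(theta) = theta = 2.0
```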
Next, we show that the weighted maximum likelihood estimator is also asymptotically normal.
Theorem 1. (Hidetoshi, 2000)
Under suitable regularity conditions (e.g., the model is sufficiently smooth), let $\theta$ be the weighted maximum likelihood estimator and $\theta^*$ the true value. Then $\sqrt{n}(\theta-\theta^*)$ is asymptotically normal with distribution $N(0,\,H^{-1}GH^{-1})$, where
$H$ and $G$ are nonsingular $m\times m$ matrices defined by
$$G=E\left[\frac{\partial l_w(x,y|\theta)}{\partial \theta}\Big|_{\theta^{*}}\,\frac{\partial l_w(x,y|\theta)}{\partial \theta^T}\Big|_{\theta^{*}}\right]$$
$$H=E\left[\frac{\partial^2 l_w(x,y|\theta)}{\partial \theta\,\partial \theta^T}\Big|_{\theta^{*}}\right]$$
with
$$l_w(x,y|\theta)=-w(x)\log p(y|x,\theta)$$
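To make the definition concrete, here is a minimal sketch with an assumed model $y\sim N(\theta^*,1)$ and an assumed weight function $w(x)=1+x$ (both toy choices of mine, not from the paper): minimizing $\sum_i l_w(x_i,y_i|\theta)$ then has the closed form $\hat{\theta}=\sum_i w_i y_i/\sum_i w_i$:

```python
import numpy as np

# Minimal sketch of a weighted maximum likelihood fit, assuming
# y ~ N(theta*, 1) and weight function w(x) = 1 + x (toy choices).
rng = np.random.default_rng(3)
theta_star, n = 1.0, 100_000

x = rng.uniform(0, 1, n)
w = 1.0 + x                            # example weight function w(x)
y = theta_star + rng.normal(size=n)

# Minimizing sum_i l_w = sum_i w_i * (y_i - theta)^2 / 2 + const
# gives the closed-form weighted estimate:
theta_hat = np.sum(w * y) / np.sum(w)
print(theta_hat)                       # close to theta* = 1.0
```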
Proof.
The argument parallels the unweighted maximum likelihood case.
The weighted maximum likelihood estimator $\theta$ satisfies the first-order condition
$$\sum_i \frac{\partial l_w(x_i,y_i|\theta)}{\partial\theta}=0$$
A first-order Taylor expansion around $\theta^*$ gives
$$\sum_i \frac{\partial l_w(x_i,y_i|\theta)}{\partial\theta}\Big|_{\theta=\theta^*}+\sum_i \frac{\partial^2 l_w(x_i,y_i|\theta)}{\partial\theta\,\partial\theta'}\Big|_{\theta=\theta^*}\,(\theta-\theta^*)=0$$
Rearranging,
$$n^{1/2}(\theta-\theta^*)=-\frac{n^{-1/2}\sum_i \frac{\partial l_w(x_i,y_i|\theta)}{\partial\theta}\big|_{\theta=\theta^*}}{n^{-1}\sum_i \frac{\partial^2 l_w(x_i,y_i|\theta)}{\partial\theta\,\partial\theta'}\big|_{\theta=\theta^*}}$$
Equivalently,
$$n^{-1}\sum_i \frac{\partial^2 l_w(x_i,y_i|\theta)}{\partial\theta\,\partial\theta'}\Big|_{\theta=\theta^*}\;n^{1/2}(\theta-\theta^*)=-\,n^{-1/2}\sum_i \frac{\partial l_w(x_i,y_i|\theta)}{\partial\theta}\Big|_{\theta=\theta^*}$$
By the central limit theorem, the right-hand side $\rightsquigarrow N(0,G)$; by the law of large numbers, $n^{-1}\sum_i \frac{\partial^2 l_w(x_i,y_i|\theta)}{\partial\theta\,\partial\theta'}\big|_{\theta=\theta^*}$ converges in probability to $H$, so the left-hand side behaves like $H\sqrt{n}(\theta-\theta^*)$. The conclusion follows directly:
$$\sqrt{n}(\theta-\theta^*)\rightsquigarrow N(0,\,H^{-1}GH^{-1})$$
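As a numerical sketch of the theorem (a toy scalar instance of my own, not the paper's setup): with $y\sim N(\theta^*,1)$, $x\sim U(0,1)$ independent of $y$, and $w(x)=1+x$, one gets $H=E[w]=3/2$ and $G=E[w^2(y-\theta^*)^2]=E[w^2]=7/3$, so the predicted asymptotic variance is $H^{-1}GH^{-1}=E[w^2]/E[w]^2$:

```python
import numpy as np

# Toy scalar check of sqrt(n)(theta_hat - theta*) ~ N(0, H^{-1} G H^{-1}),
# assuming y ~ N(theta*, 1), x ~ U(0,1) independent of y, w(x) = 1 + x.
# Then H = E[w] = 3/2, G = E[w^2] = 7/3, predicted variance = (7/3)/(9/4).
rng = np.random.default_rng(4)
theta_star, n, reps = 1.0, 1000, 4000

x = rng.uniform(0, 1, size=(reps, n))
w = 1.0 + x
y = theta_star + rng.normal(size=(reps, n))
theta_hat = (w * y).sum(axis=1) / w.sum(axis=1)   # weighted MLE, closed form

empirical = n * np.var(theta_hat - theta_star)
predicted = (7 / 3) / (9 / 4)                     # H^{-1} G H^{-1} = E[w^2] / E[w]^2
print(empirical, predicted)                       # both approximately 1.04
```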