Preface
Maximum likelihood estimators enjoy many desirable properties, including consistency, equivariance, and asymptotic normality. This post focuses on asymptotic normality: the limiting distribution of the estimator is normal, and the variance of that limiting normal distribution is closely tied to the Fisher information.
Fisher Information
(Definition) Score function:
$$
s(X;\theta)=\frac{\partial \log f(X;\theta)}{\partial \theta}.
$$
(Definition) Fisher information:
$$
\begin{aligned}
I_n(\theta) &= \mathbb{V}\left(\sum_{i=1}^{n} s(X_i;\theta)\right) \\
&= \sum_{i=1}^{n} \mathbb{V}\big(s(X_i;\theta)\big)
\end{aligned}
$$
where the second equality uses the independence of $X_1,\cdots,X_n$.
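To make the definition concrete, here is a minimal numerical sketch (not from the original post; the Exponential($\theta$) model is just an assumed example) that estimates $I(\theta)$ as the Monte Carlo variance of the score. For $f(x;\theta)=\theta e^{-\theta x}$ the score is $s(x;\theta)=1/\theta-x$ and the exact Fisher information is $1/\theta^2$.

```python
import numpy as np

# Sketch under an assumed Exponential(theta) model:
#   f(x; theta) = theta * exp(-theta * x)
#   s(x; theta) = 1/theta - x,  I(theta) = 1/theta^2
rng = np.random.default_rng(0)
theta = 2.0
x = rng.exponential(scale=1.0 / theta, size=200_000)

score = 1.0 / theta - x        # score evaluated at the true theta
fisher_mc = score.var()        # Monte Carlo estimate of V(s(X; theta))

print(f"Monte Carlo I(theta): {fisher_mc:.4f}")   # should be close to 0.25
print(f"Exact I(theta):       {1.0 / theta**2:.4f}")
```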
(Theorem)
$$
\mathbb{E}_\theta[s(X;\theta)]=0
$$
Proof:
$$
\begin{aligned}
\mathbb{E}_\theta[s(X;\theta)] &= \int_x \frac{\partial \log f(x;\theta)}{\partial \theta} f(x;\theta)\,dx \\
&= \int_x \frac{1}{f(x;\theta)} \frac{\partial f(x;\theta)}{\partial \theta} f(x;\theta)\,dx \\
&= \int_x \frac{\partial f(x;\theta)}{\partial \theta}\,dx \\
&= \frac{\partial}{\partial \theta} \int_x f(x;\theta)\,dx = \frac{\partial}{\partial \theta} 1 \\
&= 0
\end{aligned}
$$
The step that pulls $\partial/\partial\theta$ outside the integral is justified under the usual regularity conditions.
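As a quick check (an example added here, not in the original text), take the Bernoulli($p$) model discussed later in this post: the score of a single observation is $X/p-(1-X)/(1-p)$, and indeed
$$
\mathbb{E}_p\!\left[\frac{X}{p}-\frac{1-X}{1-p}\right]=\frac{p}{p}-\frac{1-p}{1-p}=0.
$$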
(Theorem) If $f(X;\theta)$ is twice differentiable in $\theta$, the Fisher information can be written in the following form:
$$
I_n(\theta)=nI(\theta)=-n\int_x \frac{\partial^2 \log f(x;\theta)}{\partial\theta^2} f(x;\theta)\,dx
$$
Proof:
$$
\begin{aligned}
\mathbb{V}_\theta[s(X;\theta)] &= \mathbb{E}_{\theta}[s(X;\theta)^2]-\mathbb{E}_\theta[s(X;\theta)]^2 \\
&= \mathbb{E}_{\theta}[s(X;\theta)^2] \\
&= \int_x \left(\frac{\partial \log f(x;\theta)}{\partial \theta}\right)^2 f(x;\theta)\,dx
\end{aligned}
$$
On the other hand,
$$
\begin{aligned}
\int_x \frac{\partial^2 \log f(x;\theta)}{\partial\theta^2} f(x;\theta)\,dx
&= \int_x \frac{\partial}{\partial \theta}\!\left(\frac{1}{f(x;\theta)}\frac{\partial f(x;\theta)}{\partial \theta}\right) f(x;\theta)\,dx \\
&= \int_x \left[-\frac{1}{f(x;\theta)^2}\left(\frac{\partial f(x;\theta)}{\partial \theta}\right)^2 + \frac{1}{f(x;\theta)}\frac{\partial^2 f(x;\theta)}{\partial \theta^2}\right] f(x;\theta)\,dx \\
&= -\int_x \left(\frac{\partial \log f(x;\theta)}{\partial \theta}\right)^2 f(x;\theta)\,dx + \frac{\partial^2}{\partial \theta^2}\int_x f(x;\theta)\,dx \\
&= -\int_x \left(\frac{\partial \log f(x;\theta)}{\partial \theta}\right)^2 f(x;\theta)\,dx
\end{aligned}
$$
since $\int_x f(x;\theta)\,dx=1$ does not depend on $\theta$. Combining the two displays,
$$
\mathbb{V}_\theta[s(X;\theta)] = -\int_x \frac{\partial^2 \log f(x;\theta)}{\partial\theta^2} f(x;\theta)\,dx,
$$
and summing over the $n$ independent observations gives $I_n(\theta)=nI(\theta)$ as stated.
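As a concrete illustration (added here, not part of the original derivation), let $X\sim N(\theta,\sigma^2)$ with $\sigma^2$ known, so $\log f(x;\theta)=-\frac{(x-\theta)^2}{2\sigma^2}+\text{const}$. Both expressions for the Fisher information give the same answer:
$$
s(X;\theta)=\frac{X-\theta}{\sigma^2},\qquad
\mathbb{V}_\theta[s(X;\theta)]=\frac{\sigma^2}{\sigma^4}=\frac{1}{\sigma^2},\qquad
-\mathbb{E}_\theta\!\left[\frac{\partial^2\log f(X;\theta)}{\partial\theta^2}\right]=\frac{1}{\sigma^2}.
$$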
Asymptotic Normality
The maximum likelihood estimator is asymptotically normal:
$$
\frac{\hat{\theta}_n-\theta}{se} \xrightarrow{d} N(0,1)
$$
where
$$
se \approx \sqrt{\frac{1}{I_n(\theta)}} \approx \sqrt{\frac{1}{I_n(\hat{\theta})}}.
$$
The proof is omitted here; it can be found in many standard references.
From this result we can construct confidence intervals for the estimate.
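Spelled out (the standard Wald interval, added here for completeness), an approximate $1-\alpha$ confidence interval is
$$
\hat{\theta}_n \pm z_{\alpha/2}\,\sqrt{\frac{1}{I_n(\hat{\theta}_n)}},
$$
where $z_{\alpha/2}$ is the upper $\alpha/2$ quantile of the standard normal distribution (e.g. $z_{0.025}\approx 1.96$ for a 95% interval).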
Maximum Likelihood Estimation for the Bernoulli Distribution and Its Variance
Let $X_1, \cdots, X_n \sim \mathrm{Bernoulli}(p)$. The likelihood function is
$$
L(p)=\prod_{i=1}^{n} p^{X_i}(1-p)^{1-X_i}
$$
$$
\log L(p)=\sum_{i=1}^{n}\left[X_i\log p+(1-X_i)\log(1-p)\right]
$$
Maximizing the log-likelihood gives:
$$
\begin{aligned}
&\frac{d}{dp}\log L(p)=0 \\
&\sum_{i=1}^{n}\left[\frac{X_i}{p}-\frac{1-X_i}{1-p}\right]=0 \\
&\hat{p}=\frac{1}{n}\sum_{i=1}^{n}X_i
\end{aligned}
$$
The score function of a single observation is:
$$
s(X;p)=\frac{\partial \log f(X;p)}{\partial p}=\frac{X}{p}-\frac{1-X}{1-p}
$$
$$
I(p)=-\mathbb{E}_p\!\left[\frac{d}{dp}\!\left(\frac{X}{p}-\frac{1-X}{1-p}\right)\right]
=\mathbb{E}_p\!\left[\frac{X}{p^2}+\frac{1-X}{(1-p)^2}\right]
=\frac{1}{p}+\frac{1}{1-p}
=\frac{1}{p(1-p)}
$$
Therefore $I_n(p)=nI(p)$, and the variance of the estimator is
$$
\mathbb{V}(\hat{p}) \approx \frac{1}{I_n(p)} = \frac{p(1-p)}{n} \approx \frac{\hat{p}(1-\hat{p})}{n}
$$
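The following minimal simulation sketch (added as an illustration; the values `p_true = 0.3` and `n = 500` are just assumed examples) checks these formulas: it computes $\hat{p}$, the estimated standard error $\sqrt{\hat{p}(1-\hat{p})/n}$, and a 95% Wald confidence interval, and then compares the empirical standard deviation of $\hat{p}$ over many replications with $\sqrt{p(1-p)/n}$.

```python
import numpy as np

rng = np.random.default_rng(42)
p_true, n, reps = 0.3, 500, 5_000   # assumed example values

# One sample: MLE, estimated se, and a 95% Wald interval
x = rng.binomial(1, p_true, size=n)
p_hat = x.mean()                               # p_hat = (1/n) * sum(X_i)
se_hat = np.sqrt(p_hat * (1 - p_hat) / n)      # sqrt(1 / I_n(p_hat))
ci = (p_hat - 1.96 * se_hat, p_hat + 1.96 * se_hat)
print(f"p_hat = {p_hat:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")

# Many replications: empirical sd of p_hat vs. sqrt(p(1-p)/n)
p_hats = rng.binomial(1, p_true, size=(reps, n)).mean(axis=1)
print(f"empirical sd of p_hat: {p_hats.std():.4f}")
print(f"sqrt(p(1-p)/n):        {np.sqrt(p_true * (1 - p_true) / n):.4f}")
```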