Reference:
S. M. Kay, *Fundamentals of Statistical Signal Processing: Estimation Theory*. Prentice Hall PTR, 1993 (Chapter 3, Sections 3.3–3.5).
Slides of ET4386, TUD
Estimator Accuracy Considerations
When the PDF is viewed as a function of the unknown parameter (with $\mathbf x$ fixed), it is termed the likelihood function. Two examples of likelihood functions are shown in Figure 3.1.
Intuitively,

$$
\begin{array}{c}
\text{curvature: the negative of the second derivative of the logarithm of the likelihood function}\\
\Downarrow\\
\text{the sharpness of the likelihood function}\\
\Downarrow\\
\text{how accurately we can estimate the unknown parameter}
\end{array}
$$
Define a measure of curvature:

$$
-E\left[\frac{\partial^2\ln p(x[0];A)}{\partial A^2}\right]
$$
which measures the average curvature of the log-likelihood function. The expectation is taken with respect to $p(x[0];A)$, resulting in a function of $A$ only. The larger this quantity, the smaller the variance of the estimator.
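A minimal numerical sketch of this quantity, assuming the single-observation DC-level model $x[0]=A+w[0]$ with $w[0]\sim\mathcal N(0,\sigma^2)$ (an assumed example, matching the setting behind Figure 3.1): the average curvature comes out as $1/\sigma^2$, so less noise means a sharper likelihood.

```python
# Sketch: average curvature -E[d^2 ln p(x[0]; A) / dA^2] by finite differences,
# assuming x[0] = A + w[0], w[0] ~ N(0, sigma^2). Analytically it equals 1/sigma^2.
import numpy as np

rng = np.random.default_rng(0)

def log_lik(x0, A, sigma2):
    # ln p(x[0]; A) for the assumed Gaussian model
    return -0.5 * np.log(2 * np.pi * sigma2) - (x0 - A) ** 2 / (2 * sigma2)

A_true, dA = 3.0, 1e-4
for sigma2 in (1.0, 1e-2):  # less noise -> larger curvature -> sharper likelihood
    x0 = A_true + rng.normal(0.0, np.sqrt(sigma2), size=100_000)
    # central second difference in A, averaged over realizations of x[0]
    d2 = (log_lik(x0, A_true + dA, sigma2)
          - 2 * log_lik(x0, A_true, sigma2)
          + log_lik(x0, A_true - dA, sigma2)) / dA**2
    print(f"sigma^2={sigma2:5.2f}:  -E[d2 ln p/dA2] ~ {-d2.mean():9.2f},"
          f"  1/sigma^2 = {1/sigma2:9.2f}")
```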
To make this intuition mathematically precise, we first introduce some definitions.
Score function and regularity condition
The score function is the gradient of the log-likelihood function
$$
s(\mathbf x;\theta)=\frac{\partial \ln p(\mathbf x;\theta)}{\partial \theta}
$$
which indicates the steepness of the log-likelihood function.
If $\frac{\partial}{\partial \theta}\ln p(\mathbf x;\theta)$ exists and is finite, and

$$
\int \frac{\partial p(\mathbf x;\theta)}{\partial \theta}\,d\mathbf x=\frac{\partial}{\partial \theta}\int p(\mathbf x;\theta)\,d\mathbf x,
$$
then the PDF $p(\mathbf x;\theta)$ satisfies the following regularity condition:

$$
\begin{aligned}
E[s(\mathbf x;\theta)]&=E\left[\frac{\partial \ln p(\mathbf x;\theta)}{\partial \theta}\right]=\int \frac{\partial \ln p(\mathbf x;\theta)}{\partial \theta}\, p(\mathbf x;\theta)\,d\mathbf x\\
&=\int \frac{\partial p(\mathbf x;\theta)}{\partial \theta}\,d\mathbf x=\frac{\partial}{\partial \theta}\int p(\mathbf x;\theta)\,d\mathbf x=0, \quad\text{for all }\theta
\end{aligned}
$$
This holds unless the support of the PDF (the domain over which it is nonzero) depends on the unknown parameter, as for instance when $x[n]\sim \mathcal{U}[0,\theta]$.
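A quick Monte Carlo sanity check of the regularity condition, assuming two illustrative models: for $x\sim\mathcal N(\theta,1)$ the score averages to zero, while for $x\sim\mathcal U[0,\theta]$ the score on the support is the constant $-1/\theta$, so its expectation cannot vanish.

```python
# Sketch: E[score] for two assumed models.
import numpy as np

rng = np.random.default_rng(1)
theta, N = 2.0, 1_000_000

# Gaussian x ~ N(theta, 1): score = x - theta, so E[score] ~ 0.
x = rng.normal(theta, 1.0, N)
print("Gaussian E[score] ~", np.mean(x - theta))

# Uniform x ~ U[0, theta]: on the support ln p = -ln(theta), so the score is
# the constant -1/theta and E[score] = -1/theta != 0 (regularity fails).
print("Uniform  E[score] =", -1.0 / theta)
```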
Fisher information
The variance of the score function is the Fisher information:

$$
I(\theta)=-E\left[\frac{\partial^{2} \ln p(\mathbf{x} ; \theta)}{\partial \theta^2}\right]=E\left[\left(\frac{\partial \ln p(\mathbf{x} ; \theta)}{\partial \theta}\right)^{2}\right]
$$
Proof: From the regularity condition, we obtain

$$
\frac{\partial}{\partial \theta} E\left[\frac{\partial \ln p(\mathbf{x} ; \theta)}{\partial \theta}\right]=0 \;\Rightarrow\; \frac{\partial}{\partial \theta} \int \frac{\partial \ln p(\mathbf{x} ; \theta)}{\partial \theta}\, p(\mathbf{x} ; \theta)\, d\mathbf{x}=0
$$
or, bringing the derivative inside the integral and applying the product rule,

$$
\int\left[\frac{\partial^{2} \ln p(\mathbf{x} ; \theta)}{\partial \theta^{2}}\, p(\mathbf{x} ; \theta)+\frac{\partial \ln p(\mathbf{x} ; \theta)}{\partial \theta} \frac{\partial p(\mathbf{x} ; \theta)}{\partial \theta}\right] d\mathbf{x}=0
$$
Using $\frac{\partial p(\mathbf{x} ; \theta)}{\partial \theta}=\frac{\partial \ln p(\mathbf{x} ; \theta)}{\partial \theta}\,p(\mathbf{x} ; \theta)$ in the second term and rearranging,

$$
\begin{aligned}
-\int \frac{\partial^{2} \ln p(\mathbf{x} ; \theta)}{\partial \theta^{2}}\, p(\mathbf{x} ; \theta)\, d\mathbf{x} &=\int\left(\frac{\partial \ln p(\mathbf{x} ; \theta)}{\partial \theta}\right)^{2} p(\mathbf{x} ; \theta)\, d\mathbf{x} \\
-E\left[\frac{\partial^{2} \ln p(\mathbf{x} ; \theta)}{\partial \theta^{2}}\right] &=E\left[\left(\frac{\partial \ln p(\mathbf{x} ; \theta)}{\partial \theta}\right)^{2}\right]
\end{aligned}
$$
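The identity just proved can be checked numerically. A minimal sketch, assuming $x\sim\mathcal N(\theta,\sigma^2)$: the Monte Carlo mean of the squared score should match the (here constant) negative curvature $1/\sigma^2$.

```python
# Sketch: check -E[d2 ln p/d theta^2] = E[(d ln p/d theta)^2] by Monte Carlo,
# assuming x ~ N(theta, sigma^2). Here score = (x - theta)/sigma^2 and the
# curvature d2 ln p/d theta^2 is the constant -1/sigma^2.
import numpy as np

rng = np.random.default_rng(2)
theta, sigma2, N = 1.5, 0.5, 1_000_000
x = rng.normal(theta, np.sqrt(sigma2), N)

score = (x - theta) / sigma2
print("E[score^2]            ~", np.mean(score**2))  # ~ 1/sigma2 = 2.0
print("-E[d2 ln p/d theta^2] =", 1.0 / sigma2)       # exactly 1/sigma2
```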
The Fisher information is

- non-negative, and
- additive for independent observations, i.e., when

$$
\ln p(\mathbf x;\theta)=\sum_{n=0}^{N-1} \ln p(x[n];\theta),
$$

then

$$
-E\left[\frac{\partial^{2} \ln p(\mathbf{x} ; \theta)}{\partial \theta^{2}}\right]=\sum_{n=0}^{N-1}-E\left[\frac{\partial^{2} \ln p(x[n] ; \theta)}{\partial \theta^{2}}\right]
$$

and, for identically distributed observations,

$$
I(\theta)=N\,i(\theta), \quad\text{where } i(\theta)=-E\left[\frac{\partial^{2} \ln p(x[n] ; \theta)}{\partial \theta^{2}}\right]
$$
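A sketch of the additivity property, assuming an i.i.d. record $x[n]\sim\mathcal N(\theta,\sigma^2)$, for which $i(\theta)=1/\sigma^2$: the Monte Carlo estimate of $E[s^2]$ for the whole record should come out near $N/\sigma^2$.

```python
# Sketch: additivity I(theta) = N * i(theta), assuming an i.i.d. record
# x[n] ~ N(theta, sigma^2), n = 0..N-1, for which i(theta) = 1/sigma^2.
import numpy as np

rng = np.random.default_rng(3)
theta, sigma2, N, trials = 1.0, 2.0, 10, 500_000
x = rng.normal(theta, np.sqrt(sigma2), size=(trials, N))

score = np.sum(x - theta, axis=1) / sigma2  # score of the whole record
print("Monte Carlo E[score^2] ~", np.mean(score**2))  # ~ N/sigma^2
print("N * i(theta)           =", N / sigma2)         # = 5.0
```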
Cramer-Rao Lower Bound Theorem
It is assumed that the PDF $p(\mathbf{x} ; \theta)$ satisfies the "regularity" condition

$$
E\left[\frac{\partial \ln p(\mathbf{x} ; \theta)}{\partial \theta}\right]=0 \quad \text{for all } \theta \tag{CR.0}
$$

where the expectation is taken with respect to $p(\mathbf{x} ; \theta)$. Then, the variance of any unbiased estimator $\hat{\theta}$ must satisfy

$$
\operatorname{var}(\hat{\theta}) \geq \frac{1}{-E\left[\frac{\partial^{2} \ln p(\mathbf{x} ; \theta)}{\partial \theta^{2}}\right]}=\frac{1}{E\left[\left(\frac{\partial \ln p(\mathbf{x} ; \theta)}{\partial \theta}\right)^{2}\right]}=\frac{1}{I(\theta)} \tag{CR.6}
$$

where the derivative is evaluated at the true value of $\theta$ and the expectation is taken with respect to $p(\mathbf{x} ; \theta)$. Furthermore, an unbiased estimator may be found that attains the bound for all $\theta$ if and only if

$$
\frac{\partial \ln p(\mathbf{x} ; \theta)}{\partial \theta}=I(\theta)(g(\mathbf{x})-\theta) \tag{CR.7}
$$

for some functions $g$ and $I$. That estimator, which is the MVU estimator, is $\hat{\theta}=g(\mathbf{x})$, and the minimum variance is $1/I(\theta)$.
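As a concrete check of the theorem, the sketch below works through the classic DC-level-in-WGN example (assumed here for illustration: $x[n]=A+w[n]$ with i.i.d. $w[n]\sim\mathcal N(0,\sigma^2)$). The score factors as $\frac{\partial \ln p}{\partial A}=\frac{N}{\sigma^2}(\bar x-A)$, exactly the form (CR.7) with $I(A)=N/\sigma^2$ and $g(\mathbf x)=\bar x$, so the sample mean is the efficient MVU estimator with variance $\sigma^2/N$.

```python
# Sketch: DC level in WGN, x[n] = A + w[n], w[n] ~ N(0, sigma^2) (assumed model).
# The sample mean g(x) = mean(x) satisfies (CR.7) with I(A) = N/sigma^2, so it
# is unbiased and its variance attains the CRLB sigma^2/N.
import numpy as np

rng = np.random.default_rng(4)
A, sigma2, N, trials = 1.0, 1.0, 50, 200_000

x = rng.normal(A, np.sqrt(sigma2), size=(trials, N))
A_hat = x.mean(axis=1)  # the efficient estimator g(x)

print("mean(A_hat) ~", A_hat.mean())   # ~ A      (unbiased)
print("var(A_hat)  ~", A_hat.var())    # ~ 0.02
print("CRLB 1/I(A) =", sigma2 / N)     # = 0.02
```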
Proof: Consider a scalar parameter $\alpha=g(\theta)$, where the PDF is parameterized by $\theta$. Assume the estimator is unbiased, i.e.,

$$
E(\hat \alpha)=\alpha=g(\theta)
$$
or

$$
\int \hat \alpha\, p(\mathbf x;\theta)\, d\mathbf x=g(\theta) \tag{CR.1}
$$
From the section on the score function and regularity condition, we already know that the regularity condition is satisfied if the order of differentiation and integration may be interchanged. This is generally true except when the support of the PDF depends on the unknown parameter.
Now, differentiating both sides of (CR.1) with respect to $\theta$ and interchanging the partial differentiation and integration produces

$$
\int \hat \alpha \frac{\partial p(\mathbf x;\theta)}{\partial \theta}\, d\mathbf x=\frac{\partial g(\theta)}{\partial \theta}
$$
or

$$
\int \hat{\alpha} \frac{\partial \ln p(\mathbf{x} ; \theta)}{\partial \theta} p(\mathbf{x} ; \theta)\, d\mathbf{x}=\frac{\partial g(\theta)}{\partial \theta} \tag{CR.2}
$$
We can modify this using the regularity condition to produce

$$
\int(\hat{\alpha}-\alpha) \frac{\partial \ln p(\mathbf{x} ; \theta)}{\partial \theta} p(\mathbf{x} ; \theta)\, d\mathbf{x}=\frac{\partial g(\theta)}{\partial \theta} \tag{CR.3}
$$
since

$$
\int \alpha \frac{\partial \ln p(\mathbf{x} ; \theta)}{\partial \theta} p(\mathbf{x} ; \theta)\, d\mathbf{x}=\alpha E\left[\frac{\partial \ln p(\mathbf{x} ; \theta)}{\partial \theta}\right]=0
$$
We now apply the Cauchy-Schwarz inequality

$$
\left[\int w(\mathbf{x}) g(\mathbf{x}) h(\mathbf{x})\, d\mathbf{x}\right]^{2} \leq \int w(\mathbf{x}) g^{2}(\mathbf{x})\, d\mathbf{x} \int w(\mathbf{x}) h^{2}(\mathbf{x})\, d\mathbf{x} \tag{CR.4}
$$
which holds with equality if and only if $g(\mathbf{x})=c\,h(\mathbf{x})$ for some constant $c$ not dependent on $\mathbf{x}$. The functions $g$ and $h$ are arbitrary scalar functions, while $w(\mathbf{x}) \geq 0$ for all $\mathbf{x}$. Now let

$$
\begin{aligned}
w(\mathbf{x}) &=p(\mathbf{x} ; \theta) \\
g(\mathbf{x}) &=\hat{\alpha}-\alpha \\
h(\mathbf{x}) &=\frac{\partial \ln p(\mathbf{x} ; \theta)}{\partial \theta}
\end{aligned}
$$
and apply the Cauchy-Schwarz inequality to (CR.3) to produce

$$
\left(\frac{\partial g(\theta)}{\partial \theta}\right)^{2} \leq \int(\hat{\alpha}-\alpha)^{2} p(\mathbf{x} ; \theta)\, d\mathbf{x} \int\left(\frac{\partial \ln p(\mathbf{x} ; \theta)}{\partial \theta}\right)^{2} p(\mathbf{x} ; \theta)\, d\mathbf{x}
$$
or

$$
\operatorname{var}(\hat{\alpha}) \geq \frac{\left(\frac{\partial g(\theta)}{\partial \theta}\right)^{2}}{E\left[\left(\frac{\partial \ln p(\mathbf{x} ; \theta)}{\partial \theta}\right)^{2}\right]}=\frac{\left(\frac{\partial g(\theta)}{\partial \theta}\right)^{2}}{-E\left[\frac{\partial^{2} \ln p(\mathbf{x} ; \theta)}{\partial \theta^{2}}\right]} \tag{CR.5}
$$
If $\alpha=g(\theta)=\theta$, we have

$$
\operatorname{var}(\hat{\alpha}) \geq \frac{1}{E\left[\left(\frac{\partial \ln p(\mathbf{x} ; \theta)}{\partial \theta}\right)^{2}\right]}=\frac{1}{-E\left[\frac{\partial^{2} \ln p(\mathbf{x} ; \theta)}{\partial \theta^{2}}\right]}=\frac{1}{I(\theta)} \tag{CR.6}
$$
Note that the condition for equality is

$$
\frac{\partial \ln p(\mathbf{x} ; \theta)}{\partial \theta}=\frac{1}{c}(\hat{\alpha}-\alpha)
$$
where $c$ can depend on $\theta$ but not on $\mathbf{x}$. If $\alpha=g(\theta)=\theta$, we have

$$
\frac{\partial \ln p(\mathbf{x} ; \theta)}{\partial \theta}=\frac{1}{c(\theta)}(\hat{\theta}-\theta)
$$
To determine $c(\theta)$, differentiate once more with respect to $\theta$ and take the negative expectation; the second term vanishes because $E[\hat\theta-\theta]=0$:

$$
\begin{aligned}
\frac{\partial^{2} \ln p(\mathbf{x} ; \theta)}{\partial \theta^{2}} &=-\frac{1}{c(\theta)}+\frac{\partial\left(\frac{1}{c(\theta)}\right)}{\partial \theta}(\hat{\theta}-\theta) \\
-E\left[\frac{\partial^{2} \ln p(\mathbf{x} ; \theta)}{\partial \theta^{2}}\right] &=\frac{1}{c(\theta)}
\end{aligned}
$$
or finally

$$
c(\theta)=\frac{1}{-E\left[\frac{\partial^{2} \ln p(\mathbf{x} ; \theta)}{\partial \theta^{2}}\right]}=\frac{1}{I(\theta)}
$$
i.e.,

$$
\frac{\partial \ln p(\mathbf{x} ; \theta)}{\partial \theta}=I(\theta)(g(\mathbf{x})-\theta) \tag{CR.7}
$$
An estimator that attains the CRLB with equality is said to be efficient, and an efficient estimator is the MVU estimator. The converse is not necessarily true: the MVU estimator need not be efficient.
CRLB for the Gaussian Model
General Gaussian model
Let us assume a Gaussian distribution for the noise, $\mathbf w\sim \mathcal{N}(\mathbf 0,\mathbf C_w)$. Then the Gaussian model is defined as

$$
\mathbf x=\mathbf h(\theta)+\mathbf w, \quad \mathbf x\sim \mathcal{N}(\mathbf h(\theta),\mathbf C_w)
$$
or

$$
p(\mathbf{x} ; \theta)=\frac{1}{(2 \pi)^{\frac{N}{2}} \operatorname{det}\left(\mathbf{C}_{w}\right)^{\frac{1}{2}}} \exp \left[-\frac{1}{2}(\mathbf{x}-\mathbf{h}(\theta))^{T} \mathbf{C}_{w}^{-1}(\mathbf{x}-\mathbf{h}(\theta))\right]
$$
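As a quick numerical sanity check of this density (illustrative only; the $\mathbf h(\theta)$ and $\mathbf C_w$ below are made up), the sketch evaluates the log of the formula above and compares it against scipy's reference implementation.

```python
# Sketch: evaluate the log of the Gaussian-model density for assumed h(theta)
# and C_w, and compare with scipy's reference implementation.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(5)
N, theta = 4, 0.7
h = np.cos(2 * np.pi * 0.1 * np.arange(N) * theta)  # assumed example h(theta)
Cw = 0.5 * np.eye(N) + 0.1 * np.ones((N, N))        # a valid (SPD) covariance
x = rng.multivariate_normal(h, Cw)

r = x - h
logp = (-0.5 * N * np.log(2 * np.pi)
        - 0.5 * np.log(np.linalg.det(Cw))
        - 0.5 * r @ np.linalg.solve(Cw, r))

print(logp)                                           # formula above
print(multivariate_normal.logpdf(x, mean=h, cov=Cw))  # should agree
```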
The score function is

$$
\frac{\partial \ln p(\mathbf{x} ; \theta)}{\partial \theta}=\frac{\partial \mathbf{h}^{T}(\theta)}{\partial \theta} \mathbf{C}_{w}^{-1}(\mathbf{x}-\mathbf{h}(\theta))
$$
and

$$
\frac{\partial^{2} \ln p(\mathbf{x} ; \theta)}{\partial \theta^{2}}=\frac{\partial^{2} \mathbf{h}^{T}(\theta)}{\partial \theta^{2}} \mathbf{C}_{w}^{-1}(\mathbf{x}-\mathbf{h}(\theta))-\frac{\partial \mathbf{h}^{T}(\theta)}{\partial \theta} \mathbf{C}_{w}^{-1} \frac{\partial \mathbf{h}(\theta)}{\partial \theta}
$$
Fisher information: taking the negative expectation, the first term vanishes since $E[\mathbf{x}-\mathbf{h}(\theta)]=\mathbf{0}$, leaving

$$
I(\theta)=-E\left[\frac{\partial^{2} \ln p(\mathbf{x} ; \theta)}{\partial \theta^{2}}\right]=\frac{\partial \mathbf{h}^{T}(\theta)}{\partial \theta} \mathbf{C}_{w}^{-1} \frac{\partial \mathbf{h}(\theta)}{\partial \theta}
$$
CRLB:

$$
\operatorname{var}(\hat{\theta}) \geq \frac{1}{\frac{\partial \mathbf{h}^{T}(\theta)}{\partial \theta} \mathbf{C}_{w}^{-1} \frac{\partial \mathbf{h}(\theta)}{\partial \theta}}
$$
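A numeric sketch of this bound, assuming (purely for illustration) the nonlinear model $h_n(\theta)=e^{-\theta n}$, i.e., estimating a decay rate in white noise $\mathbf C_w=\sigma^2\mathbf I$; the code evaluates $I(\theta)$ and the CRLB from the formula above.

```python
# Sketch: CRLB for an assumed nonlinear model h_n(theta) = exp(-theta * n)
# (a decay rate) in white noise C_w = sigma^2 * I.
import numpy as np

theta, sigma2, N = 0.3, 0.1, 20
n = np.arange(N)

dh = -n * np.exp(-theta * n)   # dh(theta)/dtheta, element-wise
I_theta = dh @ dh / sigma2     # dh^T C_w^{-1} dh with C_w = sigma^2 * I

print("I(theta) =", I_theta)
print("CRLB     =", 1.0 / I_theta)
```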
Linear Gaussian model
Consider the linear Gaussian model with $\mathbf h(\theta)=\mathbf h\theta$:

$$
\mathbf x=\mathbf h\theta+\mathbf w, \quad \mathbf w\sim \mathcal{N}(\mathbf 0,\mathbf C_w)
$$
From the CRLB for the general Gaussian model, with $\partial \mathbf h(\theta)/\partial \theta=\mathbf h$, we immediately obtain

$$
\frac{\partial \ln p(\mathbf{x} ; \theta)}{\partial \theta}=\mathbf{h}^{T} \mathbf{C}_{w}^{-1}(\mathbf{x}-\mathbf{h}\theta), \quad \operatorname{var}(\hat{\theta}) \geq \frac{1}{\mathbf{h}^{T} \mathbf{C}_{w}^{-1} \mathbf{h}}
$$
Furthermore,

$$
\begin{aligned}
\frac{\partial \ln p(\mathbf{x} ; \theta)}{\partial \theta} &=\mathbf{h}^{T} \mathbf{C}_{w}^{-1}(\mathbf{x}-\mathbf{h} \theta) \\
&=\mathbf{h}^{T} \mathbf{C}_{w}^{-1} \mathbf{h}\left[\left(\mathbf{h}^{T} \mathbf{C}_{w}^{-1} \mathbf{h}\right)^{-1} \mathbf{h}^{T} \mathbf{C}_{w}^{-1} \mathbf{x}-\theta\right]
\end{aligned}
$$

which has exactly the form of (CR.7).
Thus, the MVU estimator exists and attains the CRLB:

$$
\hat{\theta}=\left(\mathbf{h}^{T} \mathbf{C}_{w}^{-1} \mathbf{h}\right)^{-1} \mathbf{h}^{T} \mathbf{C}_{w}^{-1} \mathbf{x}
$$
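A minimal Monte Carlo sketch of this result, assuming for illustration $\mathbf h=\mathbf 1$ (so $\theta$ is a DC level) and a colored-noise covariance $[\mathbf C_w]_{ij}=0.9^{|i-j|}$: the sample variance of $\hat\theta$ should match $1/(\mathbf h^T\mathbf C_w^{-1}\mathbf h)$.

```python
# Sketch: the linear-Gaussian MVU estimator attains the CRLB. Assumed setup:
# h = ones(N) (theta is a DC level), colored noise with C_w[i,j] = 0.9**|i-j|.
import numpy as np

rng = np.random.default_rng(6)
theta, N, trials = 2.0, 8, 200_000

h = np.ones(N)
Cw = 0.9 ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))

w = rng.multivariate_normal(np.zeros(N), Cw, size=trials)
x = theta * h + w                            # x = h*theta + w, one row per trial

Cw_inv_h = np.linalg.solve(Cw, h)
theta_hat = (x @ Cw_inv_h) / (h @ Cw_inv_h)  # (h^T Cw^-1 h)^-1 h^T Cw^-1 x

print("mean(theta_hat) ~", theta_hat.mean())  # ~ theta  (unbiased)
print("var(theta_hat)  ~", theta_hat.var())
print("CRLB            =", 1.0 / (h @ Cw_inv_h))
```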