Cramer-Rao Lower Bound

Reference:

Kay, S. M., Fundamentals of Statistical Signal Processing, Volume I: Estimation Theory. Prentice Hall PTR, 1993 (Chapter 3, Sections 3.1–3.5).
Lecture slides of ET4386, TU Delft.

Estimator Accuracy Considerations

When the PDF is viewed as a function of the unknown parameter (with $\mathbf x$ fixed), it is termed the likelihood function. Two examples of likelihood functions are shown in Figure 3.1 of Kay.

[Figure 3.1: two example likelihood functions.]

Intuitively,
$$
\begin{array}{c}
\text{curvature: the negative of the second derivative of the log-likelihood function}\\
\Downarrow\\
\text{the sharpness of the likelihood function}\\
\Downarrow\\
\text{how accurately we can estimate the unknown parameter}
\end{array}
$$
Define a measure of curvature:
$$
-E\left[\frac{\partial^2\ln p(x[0];A)}{\partial A^2}\right]
$$
which measures the average curvature of the log-likelihood function. Here $x[0]=A+w[0]$ with $w[0]\sim\mathcal N(0,\sigma^2)$ is a single observation of a DC level $A$ in white Gaussian noise, as in Kay's Example 3.1. The expectation is taken with respect to $p(x[0];A)$, resulting in a function of $A$ only. The larger this quantity, the sharper the likelihood function and the smaller the variance of the estimator.
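
As a numerical illustration, the sketch below evaluates $-\partial^2\ln p(x[0];A)/\partial A^2$ by a finite difference for the single-sample Gaussian model above; analytically the curvature equals $1/\sigma^2$, so a smaller noise variance means a sharper likelihood. The observed value and the two variances are illustrative assumptions, not taken from the text.

```python
import numpy as np

# A minimal sketch: curvature of the Gaussian log-likelihood for
# x[0] = A + w[0], w[0] ~ N(0, sigma^2). Analytically,
# -d^2 ln p / dA^2 = 1 / sigma^2 (independent of A and x[0]).

def log_lik(A, x0, sigma2):
    """Log-likelihood ln p(x[0]; A) for one Gaussian observation."""
    return -0.5 * np.log(2 * np.pi * sigma2) - (x0 - A) ** 2 / (2 * sigma2)

x0 = 1.2          # an arbitrary observed value (assumption)
eps = 1e-4
for sigma2 in (1.0, 1.0 / 9.0):   # broad vs. sharp likelihood
    # Second derivative via a central finite difference at A = x0
    d2 = (log_lik(x0 + eps, x0, sigma2) - 2 * log_lik(x0, x0, sigma2)
          + log_lik(x0 - eps, x0, sigma2)) / eps**2
    print(f"sigma^2 = {sigma2:.3f}: curvature = {-d2:.2f}, 1/sigma^2 = {1/sigma2:.2f}")
```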

To prove the intuition above mathematically, we first introduce some definitions.

Score function and regularity condition

The score function is the gradient of the log-likelihood function
$$
s(\mathbf x;\theta)=\frac{\partial \ln p(\mathbf x;\theta)}{\partial \theta}
$$
which indicates the steepness of the log-likelihood function.

If $\frac{\partial}{\partial \theta}\ln p(\mathbf x;\theta)$ exists and is finite, and
$$
\int \frac{\partial p(\mathbf x;\theta)}{\partial \theta}\,d \mathbf x=\frac{\partial}{\partial \theta}\int p(\mathbf x;\theta)\,d\mathbf x
$$
then the PDF $p(\mathbf x;\theta)$ satisfies the following regularity condition:
$$
\begin{aligned}
E[s(\mathbf x;\theta)]&=E\left[\frac{\partial \ln p(\mathbf x;\theta)}{\partial \theta}\right]=\int \frac{\partial \ln p(\mathbf x;\theta)}{\partial \theta}\, p(\mathbf x;\theta)\,d \mathbf x\\
&=\int \frac{\partial p(\mathbf x;\theta)}{\partial \theta}\,d \mathbf x=\frac{\partial}{\partial \theta}\int p(\mathbf x;\theta)\,d \mathbf x=0, \quad\text{for all }\theta
\end{aligned}
$$
unless the support of the PDF (the set on which it is nonzero) depends on the unknown parameter, for instance $x[n]\sim \mathcal{U}[0,\theta]$.
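
A quick Monte Carlo check of the regularity condition, assuming (for illustration) a Gaussian model $x[n]\sim\mathcal N(\theta,\sigma^2)$: the sample mean of the score should be close to zero. For $x[n]\sim\mathcal U[0,\theta]$, by contrast, $\ln p(\mathbf x;\theta)=-N\ln\theta$ on the support, so the score is the constant $-N/\theta$ and its expectation cannot vanish.

```python
import numpy as np

# A minimal sketch: Monte Carlo check that E[score] = 0 for a model
# satisfying the regularity condition. Model (an illustrative
# assumption): x[n] ~ N(theta, sigma^2), i.i.d., n = 0..N-1.

rng = np.random.default_rng(1)
theta, sigma2, N, trials = 3.0, 2.0, 10, 200_000

x = rng.normal(theta, np.sqrt(sigma2), size=(trials, N))
# Score of the Gaussian mean: d/dtheta ln p(x; theta) = sum(x - theta)/sigma^2
score = (x - theta).sum(axis=1) / sigma2
print(f"Monte Carlo E[score] = {score.mean():.4f} (should be near 0)")

# Counterexample: x[n] ~ U[0, theta]. On the support, ln p = -N*ln(theta),
# so the score is the constant -N/theta and its mean is nonzero.
print(f"Uniform counterexample: E[score] = {-N / theta:.4f}")
```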

Fisher information

The variance of the score function is the Fisher information
$$
I(\theta)=-E\left[\frac{\partial^{2} \ln p(\mathbf{x};\theta)}{\partial \theta^{2}}\right]=E\left[\left(\frac{\partial \ln p(\mathbf{x};\theta)}{\partial \theta}\right)^{2}\right]
$$
Proof: From the regularity conditions, we obtain
$$
\frac{\partial}{\partial \theta} E\left[\frac{\partial \ln p(\mathbf{x};\theta)}{\partial \theta}\right]=0 \;\Rightarrow\; \frac{\partial}{\partial \theta} \int \frac{\partial \ln p(\mathbf{x};\theta)}{\partial \theta}\, p(\mathbf{x};\theta)\, d \mathbf{x}=0
$$
or,
$$
\int\left[\frac{\partial^{2} \ln p(\mathbf{x};\theta)}{\partial \theta^{2}}\, p(\mathbf{x};\theta)+\frac{\partial \ln p(\mathbf{x};\theta)}{\partial \theta}\, \frac{\partial p(\mathbf{x};\theta)}{\partial \theta}\right] d \mathbf{x}=0
$$
Using $\frac{\partial p(\mathbf x;\theta)}{\partial \theta}=\frac{\partial \ln p(\mathbf x;\theta)}{\partial \theta}\,p(\mathbf x;\theta)$ in the second term and rearranging,
$$
\begin{aligned}
-\int \frac{\partial^{2} \ln p(\mathbf{x};\theta)}{\partial \theta^{2}}\, p(\mathbf{x};\theta)\, d \mathbf{x} &=\int\left(\frac{\partial \ln p(\mathbf{x};\theta)}{\partial \theta}\right)^{2} p(\mathbf{x};\theta)\, d \mathbf{x} \\
-E\left[\frac{\partial^{2} \ln p(\mathbf{x};\theta)}{\partial \theta^{2}}\right] &=E\left[\left(\frac{\partial \ln p(\mathbf{x};\theta)}{\partial \theta}\right)^{2}\right]
\end{aligned}
$$
The Fisher information is

  • Non-negative, and

  • Additive for independent observations, i.e., when
    $$
    \ln p(\mathbf x;\theta)=\sum_{n=0}^{N-1} \ln p(x[n];\theta),
    $$
    then
    $$
    -E\left[\frac{\partial^{2} \ln p(\mathbf{x};\theta)}{\partial \theta^{2}}\right]=\sum_{n=0}^{N-1}-E\left[\frac{\partial^{2} \ln p(x[n];\theta)}{\partial \theta^{2}}\right]
    $$
    and for identically distributed observations
    $$
    I(\theta)=N\,i(\theta),\quad\text{where } i(\theta)=-E\left[\frac{\partial^{2} \ln p(x[n];\theta)}{\partial \theta^{2}}\right]
    $$
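
The two expressions for $I(\theta)$ and the additivity property can be checked by simulation. A minimal sketch, again assuming i.i.d. Gaussian samples $x[n]\sim\mathcal N(\theta,\sigma^2)$, for which $i(\theta)=1/\sigma^2$ per sample and hence $I(\theta)=N/\sigma^2$:

```python
import numpy as np

# A minimal sketch: verify I(theta) = E[score^2] = -E[Hessian] = N*i(theta)
# for i.i.d. x[n] ~ N(theta, sigma^2) (illustrative assumption), where
# i(theta) = 1/sigma^2 per sample.

rng = np.random.default_rng(2)
theta, sigma2, N, trials = 1.0, 0.5, 8, 500_000

x = rng.normal(theta, np.sqrt(sigma2), size=(trials, N))
score = (x - theta).sum(axis=1) / sigma2   # d ln p / d theta, per trial
hessian = -N / sigma2                      # d^2 ln p / d theta^2 (constant here)

print(f"E[score^2]   = {np.mean(score**2):.3f}")
print(f"-E[Hessian]  = {-hessian:.3f}")
print(f"N * i(theta) = {N / sigma2:.3f}")
```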

Cramer-Rao Lower Bound Theorem

It is assumed that the PDF $p(\mathbf{x};\theta)$ satisfies the "regularity" condition
$$
E\left[\frac{\partial \ln p(\mathbf{x};\theta)}{\partial \theta}\right]=0 \quad \text{for all } \theta \tag{CR.0}
$$
where the expectation is taken with respect to $p(\mathbf{x};\theta)$. Then, the variance of any unbiased estimator $\hat{\theta}$ must satisfy
$$
\operatorname{var}(\hat{\theta}) \geq \frac{1}{-E\left[\frac{\partial^{2} \ln p(\mathbf{x};\theta)}{\partial \theta^{2}}\right]}=\frac{1}{E\left[\left(\frac{\partial \ln p(\mathbf{x};\theta)}{\partial \theta}\right)^{2}\right]}=\frac{1}{I(\theta)}\tag{CR.6}
$$
where the derivative is evaluated at the true value of $\theta$ and the expectation is taken with respect to $p(\mathbf{x};\theta)$. Furthermore, an unbiased estimator may be found that attains the bound for all $\theta$ if and only if
$$
\frac{\partial \ln p(\mathbf{x};\theta)}{\partial \theta}=I(\theta)(g(\mathbf{x})-\theta)\tag{CR.7}
$$
for some functions $g$ and $I$. That estimator, which is the MVU estimator, is $\hat{\theta}=g(\mathbf{x})$, and the minimum variance is $1/I(\theta)$.
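
For a concrete illustration of condition (CR.7), consider the DC level in white Gaussian noise, $x[n]=A+w[n]$ with $w[n]\sim\mathcal N(0,\sigma^2)$, $n=0,\dots,N-1$ (Kay, Example 3.3). The score factors exactly into the required form:
$$
\frac{\partial \ln p(\mathbf x;A)}{\partial A}=\frac{1}{\sigma^{2}}\sum_{n=0}^{N-1}\bigl(x[n]-A\bigr)=\frac{N}{\sigma^{2}}\bigl(\bar{x}-A\bigr)=I(A)\bigl(g(\mathbf x)-A\bigr)
$$
so the sample mean $\hat A=g(\mathbf x)=\bar x$ is the MVU estimator, $I(A)=N/\sigma^{2}$, and $\operatorname{var}(\hat A)=\sigma^{2}/N$ attains the bound.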

Proof: Consider a scalar parameter $\alpha=g(\theta)$, where the PDF is parameterized by $\theta$. Assume the estimator $\hat\alpha$ is unbiased, i.e.,
$$
E(\hat \alpha)=\alpha=g(\theta)
$$
or
$$
\int \hat \alpha\, p(\mathbf x;\theta)\,d \mathbf x=g(\theta)\tag{CR.1}
$$
From the section on the score function and regularity condition, we already know that the regularity condition will be satisfied if the order of differentiation and integration may be interchanged. This is generally true except when the support of the PDF depends on the unknown parameter.

Now differentiating both sides of (CR.1) with respect to $\theta$ and interchanging the partial differentiation and integration produces
$$
\int \hat \alpha\, \frac{\partial p(\mathbf x;\theta)}{\partial \theta}\,d\mathbf x=\frac{\partial g(\theta)}{\partial \theta}
$$
or
$$
\int \hat{\alpha}\, \frac{\partial \ln p(\mathbf{x};\theta)}{\partial \theta}\, p(\mathbf{x};\theta)\, d \mathbf{x}=\frac{\partial g(\theta)}{\partial \theta}\tag{CR.2}
$$
We can modify this using the regularity condition to produce
$$
\int(\hat{\alpha}-\alpha)\, \frac{\partial \ln p(\mathbf{x};\theta)}{\partial \theta}\, p(\mathbf{x};\theta)\, d \mathbf{x}=\frac{\partial g(\theta)}{\partial \theta}\tag{CR.3}
$$
since
$$
\int \alpha\, \frac{\partial \ln p(\mathbf{x};\theta)}{\partial \theta}\, p(\mathbf{x};\theta)\, d \mathbf{x}=\alpha\, E\left[\frac{\partial \ln p(\mathbf{x};\theta)}{\partial \theta}\right]=0
$$
We now apply the Cauchy-Schwarz inequality
$$
\left[\int w(\mathbf{x})\, g(\mathbf{x})\, h(\mathbf{x})\, d \mathbf{x}\right]^{2} \leq \int w(\mathbf{x})\, g^{2}(\mathbf{x})\, d \mathbf{x} \int w(\mathbf{x})\, h^{2}(\mathbf{x})\, d \mathbf{x}\tag{CR.4}
$$
which holds with equality if and only if $g(\mathbf{x})=c\,h(\mathbf{x})$ for some constant $c$ not dependent on $\mathbf x$. The functions $g$ and $h$ are arbitrary scalar functions, while $w(\mathbf x) \geq 0$ for all $\mathbf x$. Now let
$$
\begin{aligned}
w(\mathbf{x}) &=p(\mathbf{x};\theta) \\
g(\mathbf{x}) &=\hat{\alpha}-\alpha \\
h(\mathbf{x}) &=\frac{\partial \ln p(\mathbf{x};\theta)}{\partial \theta}
\end{aligned}
$$
and apply the Cauchy-Schwarz inequality to (CR.3) to produce
$$
\left(\frac{\partial g(\theta)}{\partial \theta}\right)^{2} \leq \int(\hat{\alpha}-\alpha)^{2}\, p(\mathbf{x};\theta)\, d \mathbf{x} \int\left(\frac{\partial \ln p(\mathbf{x};\theta)}{\partial \theta}\right)^{2} p(\mathbf{x};\theta)\, d \mathbf{x}
$$
or, since $\int(\hat{\alpha}-\alpha)^{2}\, p(\mathbf{x};\theta)\, d\mathbf{x}=\operatorname{var}(\hat{\alpha})$ for an unbiased estimator,
$$
\operatorname{var}(\hat{\alpha}) \geq \frac{\left(\frac{\partial g(\theta)}{\partial \theta}\right)^{2}}{E\left[\left(\frac{\partial \ln p(\mathbf{x};\theta)}{\partial \theta}\right)^{2}\right]}=\frac{\left(\frac{\partial g(\theta)}{\partial \theta}\right)^{2}}{-E\left[\frac{\partial^{2} \ln p(\mathbf{x};\theta)}{\partial \theta^{2}}\right]}\tag{CR.5}
$$
If $\alpha=g(\theta)=\theta$, we have
$$
\operatorname{var}(\hat{\alpha}) \geq \frac{1}{E\left[\left(\frac{\partial \ln p(\mathbf{x};\theta)}{\partial \theta}\right)^{2}\right]}=\frac{1}{-E\left[\frac{\partial^{2} \ln p(\mathbf{x};\theta)}{\partial \theta^{2}}\right]}=\frac{1}{I(\theta)} \tag{CR.6}
$$
Note that the condition for equality is
$$
\frac{\partial \ln p(\mathbf{x};\theta)}{\partial \theta}=\frac{1}{c}(\hat{\alpha}-\alpha)
$$
where $c$ can depend on $\theta$ but not on $\mathbf{x}$. If $\alpha=g(\theta)=\theta$, we have
$$
\frac{\partial \ln p(\mathbf{x};\theta)}{\partial \theta}=\frac{1}{c(\theta)}(\hat{\theta}-\theta)
$$
To determine $c(\theta)$, differentiate once more with respect to $\theta$ and take the negative expectation, using $E[\hat\theta-\theta]=0$:
$$
\begin{aligned}
\frac{\partial^{2} \ln p(\mathbf{x};\theta)}{\partial \theta^{2}} &=-\frac{1}{c(\theta)}+\frac{\partial\left(\frac{1}{c(\theta)}\right)}{\partial \theta}(\hat{\theta}-\theta) \\
-E\left[\frac{\partial^{2} \ln p(\mathbf{x};\theta)}{\partial \theta^{2}}\right] &=\frac{1}{c(\theta)}
\end{aligned}
$$
or finally
$$
c(\theta)=\frac{1}{-E\left[\frac{\partial^{2} \ln p(\mathbf{x};\theta)}{\partial \theta^{2}}\right]}=\frac{1}{I(\theta)}
$$
i.e.,
$$
\frac{\partial \ln p(\mathbf{x};\theta)}{\partial \theta}=I(\theta)(g(\mathbf{x})-\theta)\tag{CR.7}
$$



An estimator that attains the CRLB with equality is called efficient, and an efficient estimator is necessarily the MVU estimator. The converse is not necessarily true: an MVU estimator need not be efficient.
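
As a numerical sanity check that the sample mean is efficient for the DC level in WGN (the example following the theorem above), a Monte Carlo comparison of its variance against the bound $\sigma^2/N$; the parameter values are illustrative assumptions:

```python
import numpy as np

# A minimal sketch: the sample mean attains the CRLB sigma^2/N for
# x[n] = A + w[n], w[n] ~ N(0, sigma^2). Parameter values are assumptions.

rng = np.random.default_rng(3)
A, sigma2, N, trials = 5.0, 4.0, 25, 200_000

x = A + rng.normal(0.0, np.sqrt(sigma2), size=(trials, N))
A_hat = x.mean(axis=1)                  # sample-mean estimate per trial

print(f"var(A_hat) = {A_hat.var():.5f}")
print(f"CRLB       = {sigma2 / N:.5f}")
```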

CRLB for the Gaussian Model

General Gaussian model

Let us assume a Gaussian distribution for the noise, $\mathbf w\sim \mathcal{N}(\mathbf 0,\mathbf C_w)$. Then the Gaussian model is defined as
$$
\mathbf x=\mathbf h(\theta)+\mathbf w, \quad \mathbf x\sim \mathcal{N}(\mathbf h(\theta),\mathbf C_w)
$$
or
$$
p(\mathbf{x};\theta)=\frac{1}{(2 \pi)^{\frac{N}{2}} \operatorname{det}\left(\mathbf{C}_{w}\right)^{\frac{1}{2}}} \exp \left[-\frac{1}{2}(\mathbf{x}-\mathbf{h}(\theta))^{T} \mathbf{C}_{w}^{-1}(\mathbf{x}-\mathbf{h}(\theta))\right]
$$
The score function:
$$
\frac{\partial \ln p(\mathbf{x};\theta)}{\partial \theta}=\frac{\partial \mathbf{h}^{T}(\theta)}{\partial \theta}\, \mathbf{C}_{w}^{-1}(\mathbf{x}-\mathbf{h}(\theta))
$$
and
$$
\frac{\partial^{2} \ln p(\mathbf{x};\theta)}{\partial \theta^{2}}=\frac{\partial^{2} \mathbf{h}^{T}(\theta)}{\partial \theta^{2}}\, \mathbf{C}_{w}^{-1}(\mathbf{x}-\mathbf{h}(\theta))-\frac{\partial \mathbf{h}^{T}(\theta)}{\partial \theta}\, \mathbf{C}_{w}^{-1}\, \frac{\partial \mathbf{h}(\theta)}{\partial \theta}
$$
Fisher information (the first term vanishes in expectation because $E[\mathbf x-\mathbf h(\theta)]=\mathbf 0$):
$$
I(\theta)=-E\left[\frac{\partial^{2} \ln p(\mathbf{x};\theta)}{\partial \theta^{2}}\right]=\frac{\partial \mathbf{h}^{T}(\theta)}{\partial \theta}\, \mathbf{C}_{w}^{-1}\, \frac{\partial \mathbf{h}(\theta)}{\partial \theta}
$$
CRLB:
$$
\operatorname{var}(\hat{\theta}) \geq \frac{1}{\dfrac{\partial \mathbf{h}^{T}(\theta)}{\partial \theta}\, \mathbf{C}_{w}^{-1}\, \dfrac{\partial \mathbf{h}(\theta)}{\partial \theta}}
$$
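
Since $I(\theta)$ requires only $\partial\mathbf h(\theta)/\partial\theta$ and $\mathbf C_w^{-1}$, the CRLB can be evaluated numerically even when $\mathbf h$ is nonlinear in $\theta$. A minimal sketch, using a hypothetical phase-estimation model $h(\theta)[n]=\cos(2\pi f_0 n+\theta)$ chosen purely for illustration:

```python
import numpy as np

# A minimal sketch: numerical CRLB for a general Gaussian model
# x = h(theta) + w, w ~ N(0, C_w). The signal model below (phase of a
# sinusoid) is a hypothetical illustration, not taken from the text.

N = 50                      # number of samples (assumption)
f0 = 0.08                   # known normalized frequency (assumption)
theta_true = 0.7            # true phase in radians (assumption)
C_w = 0.5 * np.eye(N)       # noise covariance (white here for simplicity)

def h(theta):
    """Signal model h(theta): a unit-amplitude sinusoid with unknown phase."""
    n = np.arange(N)
    return np.cos(2 * np.pi * f0 * n + theta)

# dh/dtheta via a central finite difference
eps = 1e-6
dh = (h(theta_true + eps) - h(theta_true - eps)) / (2 * eps)

# Fisher information I(theta) = (dh/dtheta)^T C_w^{-1} (dh/dtheta)
I_theta = dh @ np.linalg.solve(C_w, dh)
print(f"Fisher information:      {I_theta:.3f}")
print(f"CRLB on var(theta_hat):  {1.0 / I_theta:.6f}")
```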

Linear Gaussian model

Consider the linear Gaussian model with $\mathbf h(\theta)=\mathbf h\theta$:
$$
\mathbf x=\mathbf h\theta+\mathbf w, \quad \mathbf w\sim \mathcal{N}(\mathbf 0,\mathbf C_w)
$$
From the CRLB for the general Gaussian model, with $\partial\mathbf h(\theta)/\partial\theta=\mathbf h$, we immediately obtain
$$
\frac{\partial \ln p(\mathbf{x};\theta)}{\partial \theta}=\mathbf{h}^{T} \mathbf{C}_{w}^{-1}(\mathbf{x}-\mathbf{h}\theta), \quad \operatorname{var}(\hat{\theta}) \geq \frac{1}{\mathbf{h}^{T} \mathbf{C}_{w}^{-1} \mathbf{h}}
$$
Furthermore,
$$
\begin{aligned}
\frac{\partial \ln p(\mathbf{x};\theta)}{\partial \theta} &=\mathbf{h}^{T} \mathbf{C}_{w}^{-1}(\mathbf{x}-\mathbf{h} \theta) \\
&=\mathbf{h}^{T} \mathbf{C}_{w}^{-1} \mathbf{h}\left[\left(\mathbf{h}^{T} \mathbf{C}_{w}^{-1} \mathbf{h}\right)^{-1} \mathbf{h}^{T} \mathbf{C}_{w}^{-1} \mathbf{x}-\theta\right]
\end{aligned}
$$

This is exactly the equality condition (CR.7) with $I(\theta)=\mathbf h^{T}\mathbf C_w^{-1}\mathbf h$. Thus the MVU estimator exists and attains the CRLB:
$$
\hat{\theta}=\left(\mathbf{h}^{T} \mathbf{C}_{w}^{-1} \mathbf{h}\right)^{-1} \mathbf{h}^{T} \mathbf{C}_{w}^{-1} \mathbf{x}
$$
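
A minimal Monte Carlo sketch verifying this estimator: its sample variance should match the CRLB $1/(\mathbf h^T\mathbf C_w^{-1}\mathbf h)$. The model vector $\mathbf h$ and the noise covariance below are illustrative assumptions.

```python
import numpy as np

# A minimal sketch: verify the linear-Gaussian MVU estimator
# theta_hat = (h^T C_w^{-1} h)^{-1} h^T C_w^{-1} x against the CRLB.

rng = np.random.default_rng(0)
N, trials = 20, 100_000
theta_true = 2.5
h = np.linspace(1.0, 2.0, N)            # known model vector (assumption)
# Correlated noise: exponential (AR(1)-type) covariance, an assumption
C_w = 0.8 ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))

Cinv_h = np.linalg.solve(C_w, h)
I_theta = h @ Cinv_h                    # Fisher information h^T C_w^{-1} h
crlb = 1.0 / I_theta

# Draw noise with covariance C_w via its Cholesky factor
L = np.linalg.cholesky(C_w)
w = rng.standard_normal((trials, N)) @ L.T
x = theta_true * h + w                  # x = h*theta + w, one row per trial

theta_hat = x @ Cinv_h / I_theta        # MVU estimate for every trial
print(f"mean(theta_hat) = {theta_hat.mean():.4f}  (true {theta_true})")
print(f"var(theta_hat)  = {theta_hat.var():.6f}")
print(f"CRLB            = {crlb:.6f}")
```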
