Reference:
S. M. Kay, *Fundamentals of Statistical Signal Processing: Estimation Theory*. Prentice Hall PTR, 1993 (Chapter 3, Sections 3.3–3.5).
Slides of ET4386, TUD
Estimator Accuracy Considerations
When the PDF is viewed as a function of the unknown parameter (with $\mathbf x$ fixed), it is termed the likelihood function. Two examples of likelihood functions are shown in Figure 3.1.
Intuitively,

$$
\begin{array}{c}
\text{curvature: the negative of the second derivative of the logarithm of the likelihood function}\\
\Downarrow\\
\text{the sharpness of the likelihood function}\\
\Downarrow\\
\text{how accurately we can estimate the unknown parameter}
\end{array}
$$
Define a measure of curvature:

$$
-E\left[\frac{\partial^2\ln p(x[0];A)}{\partial A^2}\right]
$$
which measures the average curvature of the log-likelihood function. The expectation is taken with respect to $p(x[0];A)$, resulting in a function of $A$ only. The larger this quantity, the smaller the variance of the estimator.
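A minimal numerical sketch of this quantity, assuming the single-observation DC-level model $x[0]=A+w[0]$ with $w[0]\sim\mathcal N(0,\sigma^2)$ (an assumed example, matching the setting behind Figure 3.1): the average curvature comes out as $1/\sigma^2$, so less noise means a sharper likelihood.

```python
# Sketch: average curvature -E[d^2 ln p(x[0]; A) / dA^2] by finite differences,
# assuming x[0] = A + w[0], w[0] ~ N(0, sigma^2). Analytically it equals 1/sigma^2.
import numpy as np

rng = np.random.default_rng(0)

def log_lik(x0, A, sigma2):
    # ln p(x[0]; A) for the assumed Gaussian model
    return -0.5 * np.log(2 * np.pi * sigma2) - (x0 - A) ** 2 / (2 * sigma2)

A_true, dA = 3.0, 1e-4
for sigma2 in (1.0, 1e-2):  # less noise -> larger curvature -> sharper likelihood
    x0 = A_true + rng.normal(0.0, np.sqrt(sigma2), size=100_000)
    # central second difference in A, averaged over realizations of x[0]
    d2 = (log_lik(x0, A_true + dA, sigma2)
          - 2 * log_lik(x0, A_true, sigma2)
          + log_lik(x0, A_true - dA, sigma2)) / dA**2
    print(f"sigma^2={sigma2:5.2f}:  -E[d2 ln p/dA2] ~ {-d2.mean():9.2f},"
          f"  1/sigma^2 = {1/sigma2:9.2f}")
```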
To make this intuition mathematically precise, we first introduce some definitions.
Score function and regularity condition
The score function is the gradient of the log-likelihood function
$$
s(\mathbf x;\theta)=\frac{\partial \ln p(\mathbf x;\theta)}{\partial \theta}
$$
which indicates the steepness of the log-likelihood function.
If $\frac{\partial}{\partial \theta}\ln p(\mathbf x;\theta)$ exists and is finite, and

$$
\int \frac{\partial p(\mathbf x;\theta)}{\partial \theta}\,d\mathbf x=\frac{\partial}{\partial \theta}\int p(\mathbf x;\theta)\,d\mathbf x,
$$
then the PDF $p(\mathbf x;\theta)$ satisfies the following regularity condition:

$$
\begin{aligned}
E[s(\mathbf x;\theta)]&=E\left[\frac{\partial \ln p(\mathbf x;\theta)}{\partial \theta}\right]=\int \frac{\partial \ln p(\mathbf x;\theta)}{\partial \theta}\, p(\mathbf x;\theta)\,d\mathbf x\\
&=\int \frac{\partial p(\mathbf x;\theta)}{\partial \theta}\,d\mathbf x=\frac{\partial}{\partial \theta}\int p(\mathbf x;\theta)\,d\mathbf x=0, \quad\text{for all }\theta
\end{aligned}
$$
This holds unless the support of the PDF (the domain over which it is nonzero) depends on the unknown parameter, as for instance when $x[n]\sim \mathcal{U}[0,\theta]$.
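A quick Monte Carlo sanity check of the regularity condition, assuming two illustrative models: for $x\sim\mathcal N(\theta,1)$ the score averages to zero, while for $x\sim\mathcal U[0,\theta]$ the score on the support is the constant $-1/\theta$, so its expectation cannot vanish.

```python
# Sketch: E[score] for two assumed models.
import numpy as np

rng = np.random.default_rng(1)
theta, N = 2.0, 1_000_000

# Gaussian x ~ N(theta, 1): score = x - theta, so E[score] ~ 0.
x = rng.normal(theta, 1.0, N)
print("Gaussian E[score] ~", np.mean(x - theta))

# Uniform x ~ U[0, theta]: on the support ln p = -ln(theta), so the score is
# the constant -1/theta and E[score] = -1/theta != 0 (regularity fails).
print("Uniform  E[score] =", -1.0 / theta)
```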
Fisher information
The variance of the score function is the Fisher information:

$$
I(\theta)=-E\left[\frac{\partial^{2} \ln p(\mathbf{x} ; \theta)}{\partial \theta^2}\right]=E\left[\left(\frac{\partial \ln p(\mathbf{x} ; \theta)}{\partial \theta}\right)^{2}\right]
$$
Proof: From the regularity condition, we obtain

$$
\frac{\partial}{\partial \theta} E\left[\frac{\partial \ln p(\mathbf{x} ; \theta)}{\partial \theta}\right]=0 \;\Rightarrow\; \frac{\partial}{\partial \theta} \int \frac{\partial \ln p(\mathbf{x} ; \theta)}{\partial \theta}\, p(\mathbf{x} ; \theta)\, d\mathbf{x}=0
$$
or, bringing the derivative inside the integral and applying the product rule,

$$
\int\left[\frac{\partial^{2} \ln p(\mathbf{x} ; \theta)}{\partial \theta^{2}}\, p(\mathbf{x} ; \theta)+\frac{\partial \ln p(\mathbf{x} ; \theta)}{\partial \theta} \frac{\partial p(\mathbf{x} ; \theta)}{\partial \theta}\right] d\mathbf{x}=0
$$
Using $\frac{\partial p(\mathbf{x} ; \theta)}{\partial \theta}=\frac{\partial \ln p(\mathbf{x} ; \theta)}{\partial \theta}\,p(\mathbf{x} ; \theta)$ in the second term and rearranging,

$$
\begin{aligned}
-\int \frac{\partial^{2} \ln p(\mathbf{x} ; \theta)}{\partial \theta^{2}}\, p(\mathbf{x} ; \theta)\, d\mathbf{x} &=\int\left(\frac{\partial \ln p(\mathbf{x} ; \theta)}{\partial \theta}\right)^{2} p(\mathbf{x} ; \theta)\, d\mathbf{x} \\
-E\left[\frac{\partial^{2} \ln p(\mathbf{x} ; \theta)}{\partial \theta^{2}}\right] &=E\left[\left(\frac{\partial \ln p(\mathbf{x} ; \theta)}{\partial \theta}\right)^{2}\right]
\end{aligned}
$$
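The identity just proved can be checked numerically. A minimal sketch, assuming $x\sim\mathcal N(\theta,\sigma^2)$: the Monte Carlo mean of the squared score should match the (here constant) negative curvature $1/\sigma^2$.

```python
# Sketch: check -E[d2 ln p/d theta^2] = E[(d ln p/d theta)^2] by Monte Carlo,
# assuming x ~ N(theta, sigma^2). Here score = (x - theta)/sigma^2 and the
# curvature d2 ln p/d theta^2 is the constant -1/sigma^2.
import numpy as np

rng = np.random.default_rng(2)
theta, sigma2, N = 1.5, 0.5, 1_000_000
x = rng.normal(theta, np.sqrt(sigma2), N)

score = (x - theta) / sigma2
print("E[score^2]            ~", np.mean(score**2))  # ~ 1/sigma2 = 2.0
print("-E[d2 ln p/d theta^2] =", 1.0 / sigma2)       # exactly 1/sigma2
```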
The Fisher information is

- non-negative, and
- additive for independent observations, i.e., when

$$
\ln p(\mathbf x;\theta)=\sum_{n=0}^{N-1} \ln p(x[n];\theta),
$$

then

$$
-E\left[\frac{\partial^{2} \ln p(\mathbf{x} ; \theta)}{\partial \theta^{2}}\right]=\sum_{n=0}^{N-1}-E\left[\frac{\partial^{2} \ln p(x[n] ; \theta)}{\partial \theta^{2}}\right]
$$

and, for identically distributed observations,

$$
I(\theta)=N\,i(\theta), \quad\text{where } i(\theta)=-E\left[\frac{\partial^{2} \ln p(x[n] ; \theta)}{\partial \theta^{2}}\right]
$$
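A sketch of the additivity property, assuming an i.i.d. record $x[n]\sim\mathcal N(\theta,\sigma^2)$, for which $i(\theta)=1/\sigma^2$: the Monte Carlo estimate of $E[s^2]$ for the whole record should come out near $N/\sigma^2$.

```python
# Sketch: additivity I(theta) = N * i(theta), assuming an i.i.d. record
# x[n] ~ N(theta, sigma^2), n = 0..N-1, for which i(theta) = 1/sigma^2.
import numpy as np

rng = np.random.default_rng(3)
theta, sigma2, N, trials = 1.0, 2.0, 10, 500_000
x = rng.normal(theta, np.sqrt(sigma2), size=(trials, N))

score = np.sum(x - theta, axis=1) / sigma2  # score of the whole record
print("Monte Carlo E[score^2] ~", np.mean(score**2))  # ~ N/sigma^2
print("N * i(theta)           =", N / sigma2)         # = 5.0
```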
Cramer-Rao Lower Bound Theorem
It is assumed that the PDF $p(\mathbf{x} ; \theta)$ satisfies the "regularity" condition

$$
E\left[\frac{\partial \ln p(\mathbf{x} ; \theta)}{\partial \theta}\right]=0 \quad \text{for all } \theta \tag{CR.0}
$$

where the expectation is taken with respect to $p(\mathbf{x} ; \theta)$. Then, the variance of any unbiased estimator $\hat{\theta}$ must satisfy

$$
\operatorname{var}(\hat{\theta}) \geq \frac{1}{-E\left[\frac{\partial^{2} \ln p(\mathbf{x} ; \theta)}{\partial \theta^{2}}\right]}=\frac{1}{E\left[\left(\frac{\partial \ln p(\mathbf{x} ; \theta)}{\partial \theta}\right)^{2}\right]}=\frac{1}{I(\theta)} \tag{CR.6}
$$

where the derivative is evaluated at the true value of $\theta$ and the expectation is taken with respect to $p(\mathbf{x} ; \theta)$. Furthermore, an unbiased estimator may be found that attains the bound for all $\theta$ if and only if

$$
\frac{\partial \ln p(\mathbf{x} ; \theta)}{\partial \theta}=I(\theta)(g(\mathbf{x})-\theta) \tag{CR.7}
$$

for some functions $g$ and $I$. That estimator, which is the MVU estimator, is $\hat{\theta}=g(\mathbf{x})$, and the minimum variance is $1/I(\theta)$.
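As a concrete check of the theorem, the sketch below works through the classic DC-level-in-WGN example (assumed here for illustration: $x[n]=A+w[n]$ with i.i.d. $w[n]\sim\mathcal N(0,\sigma^2)$). The score factors as $\frac{\partial \ln p}{\partial A}=\frac{N}{\sigma^2}(\bar x-A)$, exactly the form (CR.7) with $I(A)=N/\sigma^2$ and $g(\mathbf x)=\bar x$, so the sample mean is the efficient MVU estimator with variance $\sigma^2/N$.

```python
# Sketch: DC level in WGN, x[n] = A + w[n], w[n] ~ N(0, sigma^2) (assumed model).
# The sample mean g(x) = mean(x) satisfies (CR.7) with I(A) = N/sigma^2, so it
# is unbiased and its variance attains the CRLB sigma^2/N.
import numpy as np

rng = np.random.default_rng(4)
A, sigma2, N, trials = 1.0, 1.0, 50, 200_000

x = rng.normal(A, np.sqrt(sigma2), size=(trials, N))
A_hat = x.mean(axis=1)  # the efficient estimator g(x)

print("mean(A_hat) ~", A_hat.mean())   # ~ A      (unbiased)
print("var(A_hat)  ~", A_hat.var())    # ~ 0.02
print("CRLB 1/I(A) =", sigma2 / N)     # = 0.02
```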
Proof: Consider a scalar parameter $\alpha=g(\theta)$, where the PDF is parameterized by $\theta$. Assume the estimator is unbiased, i.e.,

$$
E(\hat \alpha)=\alpha=g(\theta)
$$
or

$$
\int \hat \alpha\, p(\mathbf x;\theta)\, d\mathbf x=g(\theta) \tag{CR.1}
$$
From the section on the score function and regularity condition, we already know that the regularity condition is satisfied if the order of differentiation and integration may be interchanged. This is generally true except when the support of the PDF depends on the unknown parameter.
Now, differentiating both sides of (CR.1) with respect to $\theta$ and interchanging the partial differentiation and integration produces

$$
\int \hat \alpha \frac{\partial p(\mathbf x;\theta)}{\partial \theta}\, d\mathbf x=\frac{\partial g(\theta)}{\partial \theta}
$$
or

$$
\int \hat{\alpha} \frac{\partial \ln p(\mathbf{x} ; \theta)}{\partial \theta} p(\mathbf{x} ; \theta)\, d\mathbf{x}=\frac{\partial g(\theta)}{\partial \theta} \tag{CR.2}
$$
We can modify this using the regularity condition to produce

$$
\int(\hat{\alpha}-\alpha) \frac{\partial \ln p(\mathbf{x} ; \theta)}{\partial \theta} p(\mathbf{x} ; \theta)\, d\mathbf{x}=\frac{\partial g(\theta)}{\partial \theta} \tag{CR.3}
$$
since

$$
\int \alpha \frac{\partial \ln p(\mathbf{x} ; \theta)}{\partial \theta} p(\mathbf{x} ; \theta)\, d\mathbf{x}=\alpha E\left[\frac{\partial \ln p(\mathbf{x} ; \theta)}{\partial \theta}\right]=0
$$
We now apply the Cauchy-Schwarz inequality

$$
\left[\int w(\mathbf{x}) g(\mathbf{x}) h(\mathbf{x})\, d\mathbf{x}\right]^{2} \leq \int w(\mathbf{x}) g^{2}(\mathbf{x})\, d\mathbf{x} \int w(\mathbf{x}) h^{2}(\mathbf{x})\, d\mathbf{x} \tag{CR.4}
$$
which holds with equality if and only if $g(\mathbf{x})=c\,h(\mathbf{x})$ for some constant $c$ not dependent on $\mathbf{x}$. The functions $g$ and $h$ are arbitrary scalar functions, while $w(\mathbf{x}) \geq 0$ for all $\mathbf{x}$. Now let

$$
\begin{aligned}
w(\mathbf{x}) &=p(\mathbf{x} ; \theta) \\
g(\mathbf{x}) &=\hat{\alpha}-\alpha \\
h(\mathbf{x}) &=\frac{\partial \ln p(\mathbf{x} ; \theta)}{\partial \theta}
\end{aligned}
$$
and apply the Cauchy-Schwarz inequality to (CR.3) to produce

$$
\left(\frac{\partial g(\theta)}{\partial \theta}\right)^{2} \leq \int(\hat{\alpha}-\alpha)^{2} p(\mathbf{x} ; \theta)\, d\mathbf{x} \int\left(\frac{\partial \ln p(\mathbf{x} ; \theta)}{\partial \theta}\right)^{2} p(\mathbf{x} ; \theta)\, d\mathbf{x}
$$
or

$$
\operatorname{var}(\hat{\alpha}) \geq \frac{\left(\frac{\partial g(\theta)}{\partial \theta}\right)^{2}}{E\left[\left(\frac{\partial \ln p(\mathbf{x} ; \theta)}{\partial \theta}\right)^{2}\right]}=\frac{\left(\frac{\partial g(\theta)}{\partial \theta}\right)^{2}}{-E\left[\frac{\partial^{2} \ln p(\mathbf{x} ; \theta)}{\partial \theta^{2}}\right]} \tag{CR.5}
$$
If $\alpha=g(\theta)=\theta$, we have

$$
\operatorname{var}(\hat{\alpha}) \geq \frac{1}{E\left[\left(\frac{\partial \ln p(\mathbf{x} ; \theta)}{\partial \theta}\right)^{2}\right]}=\frac{1}{-E\left[\frac{\partial^{2} \ln p(\mathbf{x} ; \theta)}{\partial \theta^{2}}\right]}=\frac{1}{I(\theta)} \tag{CR.6}
$$
Note that the condition for equality is

$$
\frac{\partial \ln p(\mathbf{x} ; \theta)}{\partial \theta}=\frac{1}{c}(\hat{\alpha}-\alpha)
$$
where $c$ can depend on $\theta$ but not on $\mathbf{x}$. If $\alpha=g(\theta)=\theta$, we have

$$
\frac{\partial \ln p(\mathbf{x} ; \theta)}{\partial \theta}=\frac{1}{c(\theta)}(\hat{\theta}-\theta)
$$
To determine $c(\theta)$, differentiate once more with respect to $\theta$ and take the negative expectation; the second term vanishes because $E[\hat\theta-\theta]=0$:

$$
\begin{aligned}
\frac{\partial^{2} \ln p(\mathbf{x} ; \theta)}{\partial \theta^{2}} &=-\frac{1}{c(\theta)}+\frac{\partial\left(\frac{1}{c(\theta)}\right)}{\partial \theta}(\hat{\theta}-\theta) \\
-E\left[\frac{\partial^{2} \ln p(\mathbf{x} ; \theta)}{\partial \theta^{2}}\right] &=\frac{1}{c(\theta)}
\end{aligned}
$$
or finally

$$
c(\theta)=\frac{1}{-E\left[\frac{\partial^{2} \ln p(\mathbf{x} ; \theta)}{\partial \theta^{2}}\right]}=\frac{1}{I(\theta)}
$$
i.e.,

$$
\frac{\partial \ln p(\mathbf{x} ; \theta)}{\partial \theta}=I(\theta)(g(\mathbf{x})-\theta) \tag{CR.7}
$$
An estimator that attains the CRLB with equality is said to be efficient, and an efficient estimator is the MVU estimator. The converse is not necessarily true: the MVU estimator need not be efficient.
CRLB for the Gaussian Model
General Gaussian model
Let us assume a Gaussian distribution for the noise, $\mathbf w\sim \mathcal{N}(\mathbf 0,\mathbf C_w)$. Then the Gaussian model is defined as

$$
\mathbf x=\mathbf h(\theta)+\mathbf w, \quad \mathbf x\sim \mathcal{N}(\mathbf h(\theta),\mathbf C_w)
$$
or

$$
p(\mathbf{x} ; \theta)=\frac{1}{(2 \pi)^{\frac{N}{2}} \operatorname{det}\left(\mathbf{C}_{w}\right)^{\frac{1}{2}}} \exp \left[-\frac{1}{2}(\mathbf{x}-\mathbf{h}(\theta))^{T} \mathbf{C}_{w}^{-1}(\mathbf{x}-\mathbf{h}(\theta))\right]
$$
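As a quick numerical sanity check of this density (illustrative only; the $\mathbf h(\theta)$ and $\mathbf C_w$ below are made up), the sketch evaluates the log of the formula above and compares it against scipy's reference implementation.

```python
# Sketch: evaluate the log of the Gaussian-model density for assumed h(theta)
# and C_w, and compare with scipy's reference implementation.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(5)
N, theta = 4, 0.7
h = np.cos(2 * np.pi * 0.1 * np.arange(N) * theta)  # assumed example h(theta)
Cw = 0.5 * np.eye(N) + 0.1 * np.ones((N, N))        # a valid (SPD) covariance
x = rng.multivariate_normal(h, Cw)

r = x - h
logp = (-0.5 * N * np.log(2 * np.pi)
        - 0.5 * np.log(np.linalg.det(Cw))
        - 0.5 * r @ np.linalg.solve(Cw, r))

print(logp)                                           # formula above
print(multivariate_normal.logpdf(x, mean=h, cov=Cw))  # should agree
```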
The score function is

$$
\frac{\partial \ln p(\mathbf{x} ; \theta)}{\partial \theta}=\frac{\partial \mathbf{h}^{T}(\theta)}{\partial \theta} \mathbf{C}_{w}^{-1}(\mathbf{x}-\mathbf{h}(\theta))
$$
and

$$
\frac{\partial^{2} \ln p(\mathbf{x} ; \theta)}{\partial \theta^{2}}=\frac{\partial^{2} \mathbf{h}^{T}(\theta)}{\partial \theta^{2}} \mathbf{C}_{w}^{-1}(\mathbf{x}-\mathbf{h}(\theta))-\frac{\partial \mathbf{h}^{T}(\theta)}{\partial \theta} \mathbf{C}_{w}^{-1} \frac{\partial \mathbf{h}(\theta)}{\partial \theta}
$$
Fisher information: taking the negative expectation, the first term vanishes since $E[\mathbf{x}-\mathbf{h}(\theta)]=\mathbf{0}$, leaving

$$
I(\theta)=-E\left[\frac{\partial^{2} \ln p(\mathbf{x} ; \theta)}{\partial \theta^{2}}\right]=\frac{\partial \mathbf{h}^{T}(\theta)}{\partial \theta} \mathbf{C}_{w}^{-1} \frac{\partial \mathbf{h}(\theta)}{\partial \theta}
$$
CRLB:

$$
\operatorname{var}(\hat{\theta}) \geq \frac{1}{\frac{\partial \mathbf{h}^{T}(\theta)}{\partial \theta} \mathbf{C}_{w}^{-1} \frac{\partial \mathbf{h}(\theta)}{\partial \theta}}
$$
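A numeric sketch of this bound, assuming (purely for illustration) the nonlinear model $h_n(\theta)=e^{-\theta n}$, i.e., estimating a decay rate in white noise $\mathbf C_w=\sigma^2\mathbf I$; the code evaluates $I(\theta)$ and the CRLB from the formula above.

```python
# Sketch: CRLB for an assumed nonlinear model h_n(theta) = exp(-theta * n)
# (a decay rate) in white noise C_w = sigma^2 * I.
import numpy as np

theta, sigma2, N = 0.3, 0.1, 20
n = np.arange(N)

dh = -n * np.exp(-theta * n)   # dh(theta)/dtheta, element-wise
I_theta = dh @ dh / sigma2     # dh^T C_w^{-1} dh with C_w = sigma^2 * I

print("I(theta) =", I_theta)
print("CRLB     =", 1.0 / I_theta)
```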
Linear Gaussian model
Consider the linear Gaussian model with $\mathbf h(\theta)=\mathbf h\theta$:

$$
\mathbf x=\mathbf h\theta+\mathbf w, \quad \mathbf w\sim \mathcal{N}(\mathbf 0,\mathbf C_w)
$$
From the CRLB for the general Gaussian model, with $\partial \mathbf h(\theta)/\partial \theta=\mathbf h$, we immediately obtain

$$
\frac{\partial \ln p(\mathbf{x} ; \theta)}{\partial \theta}=\mathbf{h}^{T} \mathbf{C}_{w}^{-1}(\mathbf{x}-\mathbf{h}\theta), \quad \operatorname{var}(\hat{\theta}) \geq \frac{1}{\mathbf{h}^{T} \mathbf{C}_{w}^{-1} \mathbf{h}}
$$
Furthermore,

$$
\begin{aligned}
\frac{\partial \ln p(\mathbf{x} ; \theta)}{\partial \theta} &=\mathbf{h}^{T} \mathbf{C}_{w}^{-1}(\mathbf{x}-\mathbf{h} \theta) \\
&=\mathbf{h}^{T} \mathbf{C}_{w}^{-1} \mathbf{h}\left[\left(\mathbf{h}^{T} \mathbf{C}_{w}^{-1} \mathbf{h}\right)^{-1} \mathbf{h}^{T} \mathbf{C}_{w}^{-1} \mathbf{x}-\theta\right]
\end{aligned}
$$

which has exactly the form of (CR.7).
Thus, the MVU estimator exists and attains the CRLB:

$$
\hat{\theta}=\left(\mathbf{h}^{T} \mathbf{C}_{w}^{-1} \mathbf{h}\right)^{-1} \mathbf{h}^{T} \mathbf{C}_{w}^{-1} \mathbf{x}
$$
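A minimal Monte Carlo sketch of this result, assuming for illustration $\mathbf h=\mathbf 1$ (so $\theta$ is a DC level) and a colored-noise covariance $[\mathbf C_w]_{ij}=0.9^{|i-j|}$: the sample variance of $\hat\theta$ should match $1/(\mathbf h^T\mathbf C_w^{-1}\mathbf h)$.

```python
# Sketch: the linear-Gaussian MVU estimator attains the CRLB. Assumed setup:
# h = ones(N) (theta is a DC level), colored noise with C_w[i,j] = 0.9**|i-j|.
import numpy as np

rng = np.random.default_rng(6)
theta, N, trials = 2.0, 8, 200_000

h = np.ones(N)
Cw = 0.9 ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))

w = rng.multivariate_normal(np.zeros(N), Cw, size=trials)
x = theta * h + w                            # x = h*theta + w, one row per trial

Cw_inv_h = np.linalg.solve(Cw, h)
theta_hat = (x @ Cw_inv_h) / (h @ Cw_inv_h)  # (h^T Cw^-1 h)^-1 h^T Cw^-1 x

print("mean(theta_hat) ~", theta_hat.mean())  # ~ theta  (unbiased)
print("var(theta_hat)  ~", theta_hat.var())
print("CRLB            =", 1.0 / (h @ Cw_inv_h))
```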