Minimum Variance Unbiased Estimation (MVU)

Reference:
Kay, S. M., Fundamentals of Statistical Signal Processing, Volume I: Estimation Theory, Prentice Hall PTR, 1993 (Chapter 2).
Slides of ET4386, TUD.

An Example


Consider a process, e.g., a constant in noise,
$$x[n]=A+w[n], \quad n=0, \ldots, N-1,$$
where we assume that

  • $A$ is deterministic and unknown,
  • $w[n]$ is a zero-mean random process with variance $\sigma^{2}$,
  • $x[n]$ is the measured data.

Potential estimators for $A$:

  • $\hat{A}_{1}=x[0]$
  • $\hat{A}_{2}=\frac{1}{N} \sum_{n=0}^{N-1} x[n]$
  • $\hat{A}_{3}=\frac{a}{N} \sum_{n=0}^{N-1} x[n]$, for some constant $a$
  • $\cdots$

Which estimator is good (or optimal)?
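As a quick sanity check, here is a minimal Monte Carlo sketch (Python with NumPy, using arbitrarily chosen values $A=1$, $\sigma^2=1$, $N=50$) comparing the first two candidates: both turn out to be unbiased, but the sample mean $\hat A_2$ has a much smaller variance.

```python
import numpy as np

rng = np.random.default_rng(0)
A, sigma2, N, trials = 1.0, 1.0, 50, 100_000   # assumed values for illustration

# Generate `trials` independent realizations of x[n] = A + w[n], n = 0..N-1
x = A + rng.normal(0.0, np.sqrt(sigma2), size=(trials, N))

A1 = x[:, 0]          # first-sample estimator  A_hat_1 = x[0]
A2 = x.mean(axis=1)   # sample-mean estimator   A_hat_2 = (1/N) * sum_n x[n]

for name, est in [("A_hat_1", A1), ("A_hat_2", A2)]:
    print(f"{name}: mean = {est.mean():.4f}, variance = {est.var():.4f}")
# Both empirical means are close to A (unbiased); the variances are close to
# sigma2 = 1 for A_hat_1 and sigma2/N = 0.02 for A_hat_2.
```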

Mean Square Error Criterion

In searching for optimal estimators, we need to adopt some optimality criterion. A natural one is the mean square error (MSE), defined as
$$\mathrm{mse}(\hat \theta)=E\left[(\hat \theta-\theta)^2\right]$$
To get more insight, we can rewrite MSE as
$$\begin{aligned} \mathrm{mse}(\hat \theta)&=E\left[\left(\hat \theta-E(\hat \theta)+E(\hat \theta)-\theta\right)^2\right]\\ &=E\left[(\hat \theta-E(\hat \theta))^2\right]+\left[E(\hat \theta)-\theta\right]^2\\ &=\operatorname{var}(\hat \theta)+b^2(\theta), \end{aligned}$$
where $b(\theta)=E(\hat\theta)-\theta$ is the bias (the cross term vanishes because $E[\hat\theta-E(\hat\theta)]=0$). This shows that the MSE is composed of errors due to the variance of the estimator as well as the bias. Unfortunately, adoption of this natural criterion leads to unrealizable estimators, ones that cannot be written solely as a function of the data.

For instance, consider the estimator
$$\check A=a\,\frac{1}{N}\sum_{n=0}^{N-1}x[n]$$
for our example with some constant $a$. We will attempt to find the $a$ which results in the minimum MSE. Since $E(\check A)=a A$ and $\operatorname{var}(\check A)=a^{2} \sigma^{2} / N$, we have
$$\operatorname{mse}(\check{A})=\frac{a^{2} \sigma^{2}}{N}+(a-1)^{2} A^{2}.$$
Differentiating the MSE with respect to $a$ yields
$$\frac{d \operatorname{mse}(\check{A})}{d a}=\frac{2 a \sigma^{2}}{N}+2(a-1) A^{2},$$
which upon setting to zero and solving yields the optimum value
$$a_{\mathrm{opt}}=\frac{A^{2}}{A^{2}+\sigma^{2} / N}.$$
It is seen that the optimal value of $a$ depends upon the unknown parameter $A$. The estimator is therefore not realizable.
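To see this concretely, the sketch below (with assumed values $A=1$, $\sigma^2=1$, $N=10$) sweeps $a$ over a grid, checks the empirical MSE against the closed-form expression above, and shows that the empirical minimizer agrees with $a_{\mathrm{opt}}=A^2/(A^2+\sigma^2/N)$, which depends on the unknown $A$.

```python
import numpy as np

rng = np.random.default_rng(1)
A, sigma2, N, trials = 1.0, 1.0, 10, 200_000   # assumed values for illustration

x = A + rng.normal(0.0, np.sqrt(sigma2), size=(trials, N))
sample_mean = x.mean(axis=1)

a_grid = np.linspace(0.5, 1.2, 71)
mse_mc = np.array([np.mean((a * sample_mean - A) ** 2) for a in a_grid])
mse_theory = a_grid**2 * sigma2 / N + (a_grid - 1) ** 2 * A**2

a_opt = A**2 / (A**2 + sigma2 / N)             # ~0.909 for these values
print("empirical minimizer:", a_grid[np.argmin(mse_mc)])
print("theoretical a_opt  :", a_opt)
print("max |MC - theory|  :", np.max(np.abs(mse_mc - mse_theory)))
# The minimizing value of a depends on the unknown A, so this 'optimal'
# estimator cannot actually be computed from the data alone.
```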

From a practical viewpoint the minimum MSE estimator needs to be abandoned. An alternative approach is to constrain the bias to be zero and find the estimator which minimizes the variance. Such an estimator is termed the minimum variance unbiased (MVU) estimator.

Minimum Variance Unbiased Estimator

Constrain the bias to zero, i.e., require $E(\hat{\theta})=\theta$. Then
$$\mathrm{mse}(\hat{\theta})=E\left[(\hat{\theta}-E(\hat{\theta}))^{2}\right]+(E(\hat{\theta})-\theta)^{2}=\operatorname{var}(\hat{\theta}).$$
If $\hat{\theta}$ is unbiased and
$$\operatorname{var}(\hat{\theta}) \leq \operatorname{var}(\tilde{\theta})$$
for every other unbiased estimator $\tilde{\theta}$ and for all $\theta$, then $\hat{\theta}$ is the minimum variance unbiased (MVU) estimator.

For the example, consider a more general linear estimator
$$\hat A=\sum_{n=0}^{N-1}a_n x[n].$$
Since $E(\hat A)=A\sum_{n=0}^{N-1}a_n$, unbiasedness requires
$$\sum_{n=0}^{N-1}a_n=1.$$
Assuming the noise samples are uncorrelated, the variance of $\hat A$ is
$$\operatorname{var}(\hat A)=\sum_{n=0}^{N-1}a_n^2 \operatorname{var}(x[n])=\sigma^2\sum_{n=0}^{N-1}a_n^2.$$
Use a Lagrange multiplier with unbiasedness as the constraint. Let
$$L(\mathbf a,\lambda)=\sigma^2 \mathbf a^T \mathbf a-\lambda(\mathbf 1^T\mathbf a-1).$$
Differentiating $L$ with respect to $\mathbf a$ and setting the result to zero gives
$$2\sigma^2\mathbf a-\lambda \mathbf 1=\mathbf 0,$$
so $\mathbf a=\frac{\lambda}{2\sigma^2}\mathbf 1$. Combining this with the constraint $\sum_{n=0}^{N-1}a_n=1$ gives $\lambda=2\sigma^2/N$ and hence
$$\mathbf a=\frac{1}{N}\mathbf 1,$$
i.e.,
$$\hat A=\frac{1}{N}\sum_{n=0}^{N-1}x[n],$$
which is the sample-mean estimator $\hat{A}_2$ proposed earlier.
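A minimal numerical check of this result (with assumed values $\sigma^2=1$, $N=8$): every weight vector summing to one gives an unbiased estimator with variance $\sigma^2\sum_n a_n^2$, and this quantity is smallest for the equal weights $a_n=1/N$.

```python
import numpy as np

sigma2, N = 1.0, 8                       # assumed values for illustration
rng = np.random.default_rng(2)

def variance_of_weights(a):
    """Variance of A_hat = sum_n a[n] x[n] when the x[n] are uncorrelated with variance sigma2."""
    return sigma2 * np.sum(a ** 2)

equal = np.full(N, 1.0 / N)              # the sample-mean weights
candidates = {"equal (1/N)": equal}
for k in range(3):                       # a few random unbiased competitors
    w = rng.random(N)
    candidates[f"random #{k}"] = w / w.sum()   # normalize so the weights sum to 1

for name, a in candidates.items():
    print(f"{name:12s} sum = {a.sum():.3f}, variance = {variance_of_weights(a):.4f}")
# The equal-weight (sample-mean) variance sigma2/N = 0.125 is the smallest,
# consistent with the Lagrange-multiplier solution a = (1/N) * 1.
```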


Existence of the Minimum Variance Unbiased Estimator

The question arises as to whether an MVU estimator exists, i.e., an unbiased estimator with minimum variance for all $\theta$.

[Figure: variance of candidate unbiased estimators as a function of $\theta$]

In general, the MVU estimator does not always exist.

Another example: given a single observation $x[0]$ from the uniform distribution $\mathcal{U}[0,1/\theta]$, it is desired to estimate $\theta$, where it is assumed that $\theta>0$. Any estimator has the form $\hat\theta=g(x[0])$, and since the PDF of $x[0]$ equals $\theta$ on $[0,1/\theta]$, unbiasedness requires
$$\int_0^{1/\theta}\theta\, g(u)\,du=\theta\iff \int_0^{1/\theta} g(u)\,du=1.$$
Assume that we can find a function $g(u)$ such that this condition is satisfied for all $\theta>0$. Then for any $\theta_1>\theta_2>0$, we have
$$\int_0^{1/\theta_1} g(u)\,du=1,\quad\int_0^{1/\theta_2} g(u)\,du=1 \;\Longrightarrow\; \int_{1/\theta_1}^{1/\theta_2} g(u)\,du=0.$$
Since this must hold for every such pair, $g(u)$ must vanish (almost everywhere) on $(0,\infty)$, which contradicts $\int_0^{1/\theta} g(u)\,du=1$. Hence no unbiased estimator of $\theta$ exists, let alone an MVU estimator.


Finding the Minimum Variance Unbiased Estimator

Even if an MVU estimator exists, we may not be able to find it. In the next few chapters we shall discuss several possible approaches. They are:

  1. Determine the Cramer-Rao lower bound (CRLB) and check to see if some estimator satisfies it (Chapters 3 and 4).
  2. Apply the Rao-Blackwell-Lehmann-Scheffe (RBLS) theorem (Chapter 5).
  3. Further restrict the class of estimators to be not only unbiased but also linear. Then, find the minimum variance estimator within this restricted class (Chapter 6).

Appendix: Some Useful Supplements

That an estimator is unbiased does not necessarily mean that it is a good estimator. It only guarantees that on the average the estimator attains the true value. Biased estimators, on the other hand, are characterized by a systematic error, which presumably should not be present; a persistent bias will always result in a poor estimator.

[Figure: averaging unbiased estimates converges to the true value, while averaging biased estimates does not]

It sometimes occurs that multiple estimates of the same parameter are available, i.e., $\{\hat{\theta}_1,\hat{\theta}_2,\cdots,\hat{\theta}_n\}$. A reasonable procedure is to combine these estimates into a better one by averaging them to form
$$\hat\theta=\frac{1}{n}\sum_{i=1}^n \hat{\theta}_i.$$
Assuming the estimates are unbiased, with the same variance, and uncorrelated with each other,
$$E(\hat \theta)=\theta,\quad \operatorname{var}(\hat \theta)=\frac{1}{n^2}\sum_{i=1}^n \operatorname{var}(\hat {\theta}_i)=\frac{\operatorname{var}(\hat {\theta}_1)}{n},$$
so that as more estimates are averaged, the variance will decrease. Ultimately, as $n \to \infty$, $\hat \theta \to \theta$. However, if the estimates are biased, then no matter how many of them are averaged, $\hat \theta$ will not converge to the true value, as is shown in the figure above.
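The sketch below illustrates the point with hypothetical component estimates (assumed true value $\theta=2$, per-estimate standard deviation $1$, and a bias of $0.5$ in the biased case): averaging drives the variance down roughly like $1/n$, but the bias never averages out.

```python
import numpy as np

rng = np.random.default_rng(3)
theta, std, bias, trials = 2.0, 1.0, 0.5, 5_000   # assumed values for illustration

for n in (1, 10, 100, 1000):
    # Average n unbiased estimates and n biased estimates (bias = 0.5 each)
    unbiased_avg = (theta + rng.normal(0, std, size=(trials, n))).mean(axis=1)
    biased_avg = (theta + bias + rng.normal(0, std, size=(trials, n))).mean(axis=1)
    print(f"n = {n:4d}: unbiased err = {np.mean(unbiased_avg - theta):+.3f}, "
          f"var = {unbiased_avg.var():.5f} (theory {std**2 / n:.5f}); "
          f"biased err = {np.mean(biased_avg - theta):+.3f}")
# The variance of the averaged unbiased estimate shrinks like 1/n, while the
# averaged biased estimate keeps a systematic error of about +0.5 no matter
# how large n becomes.
```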


The PDF of $\hat A=\frac{1}{N} \sum_{n=0}^{N-1} x[n]$ given in the example is $\mathcal{N}(A,\sigma^2/N)$:

Note that if $w[n]\sim \mathcal{N}(0,\sigma^2)$, then $x[n]\sim \mathcal{N}(A,\sigma^2)$. Since the $x[n]$ are independent of each other, $\hat A$ is Gaussian as a linear combination of independent Gaussian variables. It is easy to verify that $E(\hat A)=A$ and $\operatorname{var}(\hat A)=\sigma^2/N$. Thus
$$\hat A\sim \mathcal{N}(A,\sigma^2/N).$$
The estimator can be proved to be consistent, i.e., $\hat A \to A$ as $N\to \infty$, by showing that
$$\lim_{N\to \infty}\Pr\{|\hat A-A|>\epsilon\}=0$$
for any $\epsilon>0$:

Since
$$\frac{\hat A-A}{\sqrt{\sigma^2/N}}\sim \mathcal{N}(0,1),$$
we have
$$\lim_{N\to \infty}\Pr\{|\hat A-A|>\epsilon\}=\lim_{N\to \infty}\Pr\left\{\left|\frac{\hat A -A}{\sqrt{\sigma^2/N}} \right|>\frac{\epsilon}{\sqrt{\sigma^2/N}}\right\}=0,$$
because the threshold $\epsilon/\sqrt{\sigma^2/N}\to\infty$ as $N\to\infty$ and the standard normal tail probability vanishes.
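A short numerical illustration of this consistency (assumed $A=1$, $\sigma^2=1$, $\epsilon=0.1$): the empirical probability that $|\hat A-A|$ exceeds $\epsilon$ shrinks toward zero as $N$ grows.

```python
import numpy as np

rng = np.random.default_rng(4)
A, sigma2, eps, trials = 1.0, 1.0, 0.1, 10_000   # assumed values for illustration

for N in (10, 100, 1000):
    x = A + rng.normal(0.0, np.sqrt(sigma2), size=(trials, N))
    A_hat = x.mean(axis=1)
    p = np.mean(np.abs(A_hat - A) > eps)          # empirical Pr{|A_hat - A| > eps}
    print(f"N = {N:4d}: Pr(|A_hat - A| > {eps}) ≈ {p:.4f}")
# The probability tends to 0 as N grows, matching 2*Phi(-eps / sqrt(sigma2/N)).
```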


A probabilistic perspective on minimum variance:

Two unbiased estimators are proposed whose variances satisfy $\operatorname{var}(\hat \theta)<\operatorname{var}(\check \theta)$. If both estimators are Gaussian, prove that
$$\Pr \{|\hat \theta -\theta|>\epsilon\}<\Pr \{|\check \theta -\theta|>\epsilon\}$$
for any $\epsilon>0$. This says that the estimator with the smaller variance is to be preferred, since its PDF is more concentrated about the true value.

Since both estimators are unbiased and Gaussian,
$$\frac{\hat \theta-\theta}{\sqrt{\operatorname{var}(\hat \theta)}}\sim \mathcal{N}(0,1),\quad \frac{\check \theta-\theta}{\sqrt{\operatorname{var}(\check \theta)}}\sim \mathcal{N}(0,1).$$
Let $\Phi$ denote the cumulative distribution function of $\mathcal{N}(0,1)$,
$$\Phi (x)=\int_{-\infty}^x \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}t^2}\,dt.$$
Then
$$\Pr\{|\hat \theta -\theta|>\epsilon\}=\Pr\left\{\left|\frac{\hat \theta -\theta}{\sqrt{\operatorname{var}(\hat \theta)}} \right|>\frac{\epsilon}{\sqrt{\operatorname{var}(\hat \theta)}}\right\}=2\,\Phi\!\left(\frac{-\epsilon}{\sqrt{\operatorname{var}(\hat \theta)}} \right),$$
and likewise for $\check\theta$. Since $\operatorname{var}(\hat \theta)<\operatorname{var}(\check \theta)$ and $\Phi$ is increasing,
$$\Phi\!\left(\frac{-\epsilon}{\sqrt{\operatorname{var}(\hat \theta)}} \right)<\Phi\!\left(\frac{-\epsilon}{\sqrt{\operatorname{var}(\check \theta)}} \right),$$
i.e., $\Pr \{|\hat \theta -\theta|>\epsilon\}<\Pr \{|\check \theta -\theta|>\epsilon\}$.
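This can be checked numerically with the standard normal CDF; the sketch below uses `math.erf` and assumed values $\operatorname{var}(\hat\theta)=0.1$, $\operatorname{var}(\check\theta)=0.5$, $\epsilon=0.3$.

```python
from math import erf, sqrt

def Phi(x: float) -> float:
    """Standard normal CDF expressed via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

eps = 0.3                          # assumed threshold for illustration
var_hat, var_check = 0.1, 0.5      # assumed variances, var(theta_hat) < var(theta_check)

p_hat = 2 * Phi(-eps / sqrt(var_hat))      # Pr{|theta_hat - theta| > eps}
p_check = 2 * Phi(-eps / sqrt(var_check))  # Pr{|theta_check - theta| > eps}

print(f"Pr(|theta_hat   - theta| > eps) = {p_hat:.4f}")    # ~0.34
print(f"Pr(|theta_check - theta| > eps) = {p_check:.4f}")  # ~0.67
# The lower-variance estimator places more probability within +/- eps of the
# true value, so p_hat < p_check for every eps > 0.
```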


What will happen if an unbiased estimator undergoes a nonlinear transformation? For instance, if we choose to estimate the unknown parameter $\theta=A^2$ by
$$\hat \theta =\left( \frac{1}{N}\sum_{n=0}^{N-1}x[n]\right)^2,$$
can we say that the estimator is unbiased? What happens as $N\to \infty$?

We know that
$$\hat \theta={\hat A}^2,\qquad \hat A \sim \mathcal{N}(A,\sigma^2/N).$$
Therefore,
$$E(\hat \theta)=E(\hat {A}^2)=\operatorname{var}(\hat A)+E^2(\hat A)=\frac{\sigma^2}{N}+A^2=\frac{\sigma^2}{N}+\theta\ne \theta,$$
so the estimator is biased, but asymptotically unbiased.
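A quick Monte Carlo check of this bias (assumed $A=1$, $\sigma^2=1$): the empirical mean of $\hat\theta$ exceeds $\theta=A^2$ by roughly $\sigma^2/N$, and the excess shrinks as $N$ grows.

```python
import numpy as np

rng = np.random.default_rng(5)
A, sigma2, trials = 1.0, 1.0, 50_000      # assumed values for illustration
theta = A ** 2

for N in (5, 50, 200):
    x = A + rng.normal(0.0, np.sqrt(sigma2), size=(trials, N))
    theta_hat = x.mean(axis=1) ** 2        # squared sample mean
    print(f"N = {N:3d}: E(theta_hat) ≈ {theta_hat.mean():.4f}, "
          f"theta + sigma2/N = {theta + sigma2 / N:.4f}")
# The empirical mean tracks theta + sigma2/N: biased for finite N, but the
# bias vanishes as N -> infinity (asymptotically unbiased).
```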


In our example, if the value of $\sigma^2$ is also unknown, an unbiased estimator is
$$\hat {\boldsymbol{\theta}}=\begin{bmatrix}\hat A\\ \hat{\sigma}^2\end{bmatrix}=\begin{bmatrix}\frac{1}{N} \sum_{n=0}^{N-1} x[n]\\ \frac{1}{N-1} \sum_{n=0}^{N-1} (x[n]-\hat A)^2\end{bmatrix}.$$
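In NumPy this pair corresponds to the sample mean together with the sample variance computed with the $N-1$ divisor (`ddof=1`); a minimal check with assumed values $A=1$, $\sigma^2=2$, $N=20$:

```python
import numpy as np

rng = np.random.default_rng(6)
A, sigma2, N, trials = 1.0, 2.0, 20, 100_000   # assumed values for illustration

x = A + rng.normal(0.0, np.sqrt(sigma2), size=(trials, N))
A_hat = x.mean(axis=1)                  # (1/N) * sum_n x[n]
sigma2_hat = x.var(axis=1, ddof=1)      # (1/(N-1)) * sum_n (x[n] - A_hat)^2

print(f"E(A_hat)      ≈ {A_hat.mean():.4f}      (true A       = {A})")
print(f"E(sigma2_hat) ≈ {sigma2_hat.mean():.4f}      (true sigma^2 = {sigma2})")
# With the 1/(N-1) normalization (ddof=1) the variance estimate is unbiased;
# the 1/N version would underestimate sigma^2 by a factor of (N-1)/N.
```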
