Minimum Variance Unbiased Estimation (MVU)

References:
Kay, S. M., Fundamentals of Statistical Signal Processing, Volume I: Estimation Theory. Prentice Hall PTR, 1993 (Chapter 2).
Lecture slides of ET4386, TUD.

An Example


Consider a process, e.g., a constant in noise:
$$x[n] = A + w[n], \quad n = 0, \ldots, N-1$$
where we assume that

  • $A$ is deterministic and unknown,
  • $w[n]$ is a zero-mean random process with variance $\sigma^2$,
  • $x[n]$ is the measured data.

Potential estimators for $A$:

  • $\hat{A}_1 = x[0]$
  • $\hat{A}_2 = \frac{1}{N}\sum_{n=0}^{N-1} x[n]$
  • $\hat{A}_3 = \frac{a}{N}\sum_{n=0}^{N-1} x[n]$, for some constant $a$
  • $\cdots$

Which estimator is good (or optimal)?
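
As a quick numerical illustration (a minimal sketch, not from the reference; the Gaussian noise and the particular values of $A$, $\sigma$, $N$ are assumptions made only for the simulation), the first two candidates can be compared by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(0)
A, sigma, N, trials = 5.0, 2.0, 100, 10_000   # assumed values for illustration

# x[n] = A + w[n]; Gaussian w[n] is assumed here purely for simulation
x = A + sigma * rng.standard_normal((trials, N))

A1 = x[:, 0]                 # \hat{A}_1 = x[0]
A2 = x.mean(axis=1)          # \hat{A}_2 = sample mean

for name, est in [("A1 (single sample)", A1), ("A2 (sample mean)", A2)]:
    print(f"{name}: mean = {est.mean():.3f}, variance = {est.var():.4f}")
# Both are unbiased (mean close to A), but the sample mean has variance about sigma^2/N.
```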

Mean Square Error Criterion

In searching for optimal estimators we need to adopt some optimality criterion. A natural one is the mean square error (MSE), defined as
$$\mathrm{mse}(\hat\theta) = E\left[(\hat\theta - \theta)^2\right]$$
To get more insight, we can rewrite the MSE as
$$\begin{aligned} \mathrm{mse}(\hat\theta) &= E\left[\left(\hat\theta - E(\hat\theta) + E(\hat\theta) - \theta\right)^2\right] \\ &= E\left[\left(\hat\theta - E(\hat\theta)\right)^2\right] + \left[E(\hat\theta) - \theta\right]^2 \\ &= \operatorname{var}(\hat\theta) + b^2(\theta) \end{aligned}$$
(the cross term vanishes because $E(\hat\theta) - \theta$ is a constant and $E[\hat\theta - E(\hat\theta)] = 0$), which shows that the MSE is composed of errors due to the variance of the estimator as well as the bias. Unfortunately, adopting this natural criterion generally leads to unrealizable estimators, i.e., estimators that cannot be written solely as a function of the data.
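
The decomposition is easy to verify numerically (a minimal sketch; the Gaussian noise and the choice $a = 0.9$, $A$, $\sigma$, $N$ are assumptions for illustration), here for the scaled sample mean $\hat{A}_3 = \frac{a}{N}\sum_n x[n]$:

```python
import numpy as np

rng = np.random.default_rng(1)
A, sigma, N, a, trials = 5.0, 2.0, 100, 0.9, 100_000  # assumed values

x = A + sigma * rng.standard_normal((trials, N))       # Gaussian noise assumed
est = a * x.mean(axis=1)                               # scaled sample mean

mse = np.mean((est - A) ** 2)
var = est.var()
bias2 = (est.mean() - A) ** 2
print(f"mse = {mse:.4f},  var + bias^2 = {var + bias2:.4f}")
# Theory: var = a^2 sigma^2 / N = 0.0324, bias^2 = (a-1)^2 A^2 = 0.25, mse ~ 0.2824
```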

For instance, consider the estimator
$$\check{A} = a\,\frac{1}{N}\sum_{n=0}^{N-1} x[n]$$
for our example, with some constant $a$. We will attempt to find the $a$ which results in the minimum MSE. Since $E(\check{A}) = aA$ and $\operatorname{var}(\check{A}) = a^2\sigma^2/N$, we have
$$\operatorname{mse}(\check{A}) = \frac{a^2\sigma^2}{N} + (a-1)^2 A^2$$
Differentiating the MSE with respect to $a$ yields
$$\frac{d\,\operatorname{mse}(\check{A})}{da} = \frac{2a\sigma^2}{N} + 2(a-1)A^2,$$
which, upon setting to zero and solving, yields the optimum value
$$a_{\mathrm{opt}} = \frac{A^2}{A^2 + \sigma^2/N}$$
The optimal value of $a$ is seen to depend on the unknown parameter $A$; the estimator is therefore not realizable.
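
The dependence of $a_{\mathrm{opt}}$ on the unknown $A$ is easy to see numerically (a sketch; the values of $\sigma^2$, $N$ and the trial values of $A$ below are assumptions for illustration):

```python
sigma2, N = 4.0, 100           # assumed noise variance and record length

def a_opt(A):
    """Optimal scaling a_opt = A^2 / (A^2 + sigma^2/N); it depends on the unknown A."""
    return A**2 / (A**2 + sigma2 / N)

for A in (0.1, 1.0, 10.0):
    print(f"A = {A:5.1f}  ->  a_opt = {a_opt(A):.4f}")
# The minimizing a changes with A, so this "minimum MSE" estimator cannot be
# implemented without already knowing the parameter it is supposed to estimate.
```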

From a practical viewpoint the minimum MSE estimator needs to be abandoned. An alternative approach is to constrain the bias to be zero and find the estimator which minimizes the variance. Such an estimator is termed the minimum variance unbiased (MVU) estimator.

Minimum Variance Unbiased Estimator

Constraining the bias to be zero, i.e., requiring $E(\hat\theta) = \theta$, gives
$$\mathrm{mse}(\hat\theta) = E\left[\left(\hat\theta - E(\hat\theta)\right)^2\right] + \left(E(\hat\theta) - \theta\right)^2 = \operatorname{var}(\hat\theta)$$
If, in addition, the unbiased estimator $\hat\theta$ satisfies
$$\operatorname{var}(\hat\theta) \leq \operatorname{var}(\tilde\theta)$$
for any other unbiased estimator $\tilde\theta$ and for all $\theta$, then $\hat\theta$ is the minimum variance unbiased (MVU) estimator.

For the example, consider the more general linear estimator
$$\hat{A} = \sum_{n=0}^{N-1} a_n x[n]$$
Since $E(\hat{A}) = A\sum_{n=0}^{N-1} a_n$, unbiasedness requires
$$\sum_{n=0}^{N-1} a_n = 1$$
Assuming the noise samples are uncorrelated, the variance of $\hat{A}$ is
$$\operatorname{var}(\hat{A}) = \sum_{n=0}^{N-1} a_n^2 \operatorname{var}(x[n]) = \sigma^2 \sum_{n=0}^{N-1} a_n^2$$
We minimize this variance using a Lagrange multiplier, with unbiasedness as the constraint. Let
$$L(\mathbf{a}, \lambda) = \sigma^2 \mathbf{a}^T \mathbf{a} - \lambda\left(\mathbf{1}^T \mathbf{a} - 1\right)$$
Differentiating $L$ with respect to $\mathbf{a}$ and setting the result to zero gives
$$2\sigma^2 \mathbf{a} - \lambda \mathbf{1} = \mathbf{0}$$
Combining this with the constraint $\sum_{n=0}^{N-1} a_n = 1$, we obtain
$$\mathbf{a} = \frac{1}{N}\mathbf{1},$$
i.e.,
$$\hat{A} = \frac{1}{N}\sum_{n=0}^{N-1} x[n]$$
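
A numerical sanity check of this constrained minimization (a sketch, not from the reference; it simply compares the variance $\sigma^2\sum_n a_n^2$ of random weight vectors satisfying $\sum_n a_n = 1$ against the uniform weights $a_n = 1/N$):

```python
import numpy as np

rng = np.random.default_rng(2)
sigma2, N = 4.0, 20            # assumed values

def weight_variance(a):
    """Variance sigma^2 * sum(a_n^2) of the linear estimator with weights a."""
    return sigma2 * np.sum(a**2)

uniform = np.full(N, 1.0 / N)
best_random = np.inf
for _ in range(10_000):
    a = rng.standard_normal(N)
    a /= a.sum()               # enforce the unbiasedness constraint sum(a) = 1
    best_random = min(best_random, weight_variance(a))

print(f"uniform weights : {weight_variance(uniform):.4f}   (= sigma^2/N)")
print(f"best random set : {best_random:.4f}")
# No admissible weight vector beats a_n = 1/N, consistent with the Lagrangian solution.
```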


Existence of the Minimum Variance Unbiased Estimator

The question arises as to whether a MVU estimator exists, i.e., an unbiased estimator with minimum variance for all θ \theta θ.


In general, an MVU estimator does not always exist.

Another example: given a single observation $x[0]$ from the distribution $\mathcal{U}[0, 1/\theta]$, it is desired to estimate $\theta$, where it is assumed that $\theta > 0$. For an unbiased estimator $\hat\theta = g(x[0])$ we must have
$$\int_0^{1/\theta} \theta\, g(u)\, du = \theta \iff \int_0^{1/\theta} g(u)\, du = 1$$
Assume that we can find a function $g(u)$ such that this condition is satisfied for all $\theta > 0$. Then for any $\theta_1 > \theta_2 > 0$ we would have
$$\int_0^{1/\theta_1} g(u)\, du = 1, \quad \int_0^{1/\theta_2} g(u)\, du = 1 \;\Longrightarrow\; \int_{1/\theta_1}^{1/\theta_2} g(u)\, du = 0$$
Since this must hold for every such pair, $g(u)$ must vanish almost everywhere on $(0, \infty)$, which contradicts the unbiasedness condition. Hence no unbiased estimator of $\theta$ exists in this problem, let alone an MVU estimator.


Finding the Minimum Variance Unbiased Estimator

Even if a MVU estimator exists, we may not be able to find it. In the next few chapters we shall discuss several possible approaches. They are:

  1. Determine the Cramér-Rao lower bound (CRLB) and check to see if some estimator satisfies it (Chapters 3 and 4).
  2. Apply the Rao-Blackwell-Lehmann-Scheffé (RBLS) theorem (Chapter 5).
  3. Further restrict the class of estimators to be not only unbiased but also linear. Then, find the minimum variance estimator within this restricted class (Chapter 6).

Appendix: Some Useful Supplements

That an estimator is unbiased does not necessarily mean it is a good estimator; it only guarantees that on average it attains the true value. Biased estimators, on the other hand, are characterized by a systematic error, which presumably should not be present, and a persistent bias will always result in a poor estimator.


It sometimes occurs that multiple estimates of the same parameter are available, say $\{\hat\theta_1, \hat\theta_2, \cdots, \hat\theta_n\}$. A reasonable procedure is to combine them into a (hopefully) better estimate by averaging:
$$\hat\theta = \frac{1}{n}\sum_{i=1}^{n} \hat\theta_i$$
Assuming the estimators are unbiased, have the same variance, and are uncorrelated with each other,
$$E(\hat\theta) = \theta, \quad \operatorname{var}(\hat\theta) = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{var}(\hat\theta_i) = \frac{\operatorname{var}(\hat\theta_1)}{n}$$
so that as more estimates are averaged, the variance decreases; ultimately, as $n \to \infty$, $\hat\theta \to \theta$. However, if the individual estimators are biased, then no matter how many of them are averaged, $\hat\theta$ will not converge to the true value.
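
A short simulation of this averaging argument (a sketch; taking the individual unbiased estimates to be independent Gaussian "measurements" of $\theta$, and the numerical values below, are assumptions made only for the demonstration):

```python
import numpy as np

rng = np.random.default_rng(3)
theta, var_single, trials = 2.0, 1.0, 50_000    # assumed values

for n in (1, 10, 100):
    # n independent unbiased estimates per trial, each with variance var_single
    estimates = theta + np.sqrt(var_single) * rng.standard_normal((trials, n))
    combined = estimates.mean(axis=1)
    print(f"n = {n:3d}:  mean = {combined.mean():.3f},  var = {combined.var():.4f}"
          f"  (theory {var_single / n:.4f})")
```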


The PDF of $\hat{A} = \frac{1}{N}\sum_{n=0}^{N-1} x[n]$ in the example is $\mathcal{N}(A, \sigma^2/N)$:

If we additionally assume Gaussian noise, $w[n] \sim \mathcal{N}(0, \sigma^2)$, then $x[n] \sim \mathcal{N}(A, \sigma^2)$. Since the $x[n]$ are independent of each other, $\hat{A}$ is a linear combination of independent Gaussians and is therefore Gaussian itself. It is easy to verify that $E(\hat{A}) = A$ and $\operatorname{var}(\hat{A}) = \sigma^2/N$. Thus
$$\hat{A} \sim \mathcal{N}(A, \sigma^2/N)$$
The estimator can be shown to be consistent, i.e., $\hat{A} \to A$ in probability as $N \to \infty$, by showing that
$$\lim_{N\to\infty} \Pr\{|\hat{A} - A| > \epsilon\} = 0$$
for any $\epsilon > 0$. Since
$$\frac{\hat{A} - A}{\sqrt{\sigma^2/N}} \sim \mathcal{N}(0, 1),$$
we have
$$\lim_{N\to\infty} \Pr\{|\hat{A} - A| > \epsilon\} = \lim_{N\to\infty} \Pr\left\{\left|\frac{\hat{A} - A}{\sqrt{\sigma^2/N}}\right| > \frac{\epsilon}{\sqrt{\sigma^2/N}}\right\} = 0,$$
because the threshold $\epsilon/\sqrt{\sigma^2/N} = \epsilon\sqrt{N}/\sigma$ grows without bound as $N \to \infty$.
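
The same limit can be evaluated numerically (a sketch using the standard normal CDF written via `math.erf`; the values of $\sigma^2$ and $\epsilon$ are assumptions):

```python
import math

sigma2, eps = 4.0, 0.1          # assumed noise variance and tolerance

def std_normal_cdf(x):
    """Phi(x) for the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

for N in (10, 100, 1000, 10_000):
    # Pr{|A_hat - A| > eps} = 2 * Phi(-eps / sqrt(sigma^2 / N))
    p = 2.0 * std_normal_cdf(-eps / math.sqrt(sigma2 / N))
    print(f"N = {N:6d}:  Pr{{|A_hat - A| > eps}} = {p:.3e}")
# The probability decays to zero as N grows, i.e. the sample mean is consistent.
```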


A probabilistic perspective of minimum variance:

Two unbiased estimators are proposed whose variances satisfy $\operatorname{var}(\hat\theta) < \operatorname{var}(\check\theta)$. If both estimators are Gaussian, prove that
$$\Pr\{|\hat\theta - \theta| > \epsilon\} < \Pr\{|\check\theta - \theta| > \epsilon\}$$
for any $\epsilon > 0$. This says that the estimator with the smaller variance is to be preferred, since its PDF is more concentrated about the true value.

Since both estimators are unbiased and Gaussian,
$$\frac{\hat\theta - \theta}{\sqrt{\operatorname{var}(\hat\theta)}} \sim \mathcal{N}(0, 1), \quad \frac{\check\theta - \theta}{\sqrt{\operatorname{var}(\check\theta)}} \sim \mathcal{N}(0, 1)$$
Let $\Phi(x)$ denote the cumulative distribution function of $\mathcal{N}(0, 1)$:
$$\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}t^2}\, dt$$
Then
$$\Pr\{|\hat\theta - \theta| > \epsilon\} = \Pr\left\{\left|\frac{\hat\theta - \theta}{\sqrt{\operatorname{var}(\hat\theta)}}\right| > \frac{\epsilon}{\sqrt{\operatorname{var}(\hat\theta)}}\right\} = 2\,\Phi\!\left(\frac{-\epsilon}{\sqrt{\operatorname{var}(\hat\theta)}}\right)$$
Since $\operatorname{var}(\hat\theta) < \operatorname{var}(\check\theta)$ and $\Phi$ is increasing,
$$\Phi\!\left(\frac{-\epsilon}{\sqrt{\operatorname{var}(\hat\theta)}}\right) < \Phi\!\left(\frac{-\epsilon}{\sqrt{\operatorname{var}(\check\theta)}}\right),$$
i.e., $\Pr\{|\hat\theta - \theta| > \epsilon\} < \Pr\{|\check\theta - \theta| > \epsilon\}$.
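
The two tail probabilities can also be evaluated directly (a sketch; the variances and $\epsilon$ below are assumed for illustration, and $\Phi$ is again written via `math.erf`):

```python
import math

def tail_prob(variance, eps):
    """Pr{|theta_hat - theta| > eps} = 2 * Phi(-eps / sqrt(variance)) for a Gaussian unbiased estimator."""
    phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return 2.0 * phi(-eps / math.sqrt(variance))

eps = 0.5                                  # assumed tolerance
var_hat, var_check = 0.1, 0.4              # var(theta_hat) < var(theta_check), assumed values
print(f"low-variance estimator : {tail_prob(var_hat, eps):.4f}")
print(f"high-variance estimator: {tail_prob(var_check, eps):.4f}")
# The lower-variance estimator concentrates more probability within +/- eps of theta.
```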


What happens if an unbiased estimator undergoes a nonlinear transformation? For instance, if we choose to estimate the unknown parameter $\theta = A^2$ by
$$\hat\theta = \left(\frac{1}{N}\sum_{n=0}^{N-1} x[n]\right)^2,$$
can we say that the estimator is unbiased? What happens as $N \to \infty$?

We know that
$$\hat\theta = \hat{A}^2, \quad \hat{A} \sim \mathcal{N}(A, \sigma^2/N)$$
Therefore,
$$E(\hat\theta) = E(\hat{A}^2) = \operatorname{var}(\hat{A}) + E^2(\hat{A}) = \sigma^2/N + A^2 = \sigma^2/N + \theta \neq \theta,$$
so the estimator is biased, but asymptotically unbiased.
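
A Monte Carlo check of this bias (a minimal sketch; Gaussian noise and the numerical values are again assumptions made only for the demonstration):

```python
import numpy as np

rng = np.random.default_rng(4)
A, sigma, trials = 3.0, 2.0, 100_000         # assumed values

for N in (10, 100):
    x = A + sigma * rng.standard_normal((trials, N))
    theta_hat = x.mean(axis=1) ** 2           # (sample mean)^2 estimates theta = A^2
    print(f"N = {N:4d}:  E(theta_hat) ~ {theta_hat.mean():.4f}"
          f"   theory A^2 + sigma^2/N = {A**2 + sigma**2 / N:.4f}")
# The bias sigma^2/N shrinks as N grows: biased, but asymptotically unbiased.
```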


In our example, if the value of $\sigma^2$ is also unknown, an unbiased estimator of $\boldsymbol\theta = [A,\ \sigma^2]^T$ is
$$\hat{\boldsymbol\theta} = \begin{bmatrix}\hat{A}\\[2pt] \hat\sigma^2\end{bmatrix} = \begin{bmatrix}\frac{1}{N}\sum_{n=0}^{N-1} x[n]\\[4pt] \frac{1}{N-1}\sum_{n=0}^{N-1}\left(x[n] - \hat{A}\right)^2\end{bmatrix}$$
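
A minimal sketch of this joint estimator (Gaussian noise and the numerical values are assumptions; the run also checks unbiasedness by Monte Carlo):

```python
import numpy as np

rng = np.random.default_rng(5)
A, sigma2, N, trials = 5.0, 4.0, 50, 100_000     # assumed values

x = A + np.sqrt(sigma2) * rng.standard_normal((trials, N))
A_hat = x.mean(axis=1)                                             # sample mean
sigma2_hat = np.sum((x - A_hat[:, None]) ** 2, axis=1) / (N - 1)   # note the N-1 divisor

print(f"E(A_hat)      ~ {A_hat.mean():.4f}   (true A       = {A})")
print(f"E(sigma2_hat) ~ {sigma2_hat.mean():.4f}   (true sigma^2 = {sigma2})")
# Dividing by N instead of N-1 would give E(sigma2_hat) = (N-1)/N * sigma^2, i.e. a biased estimate.
```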
