These are reading notes on *Introduction to Probability*.
Bayesian Linear LMS Estimation
- In this section, we derive an estimator that minimizes the mean squared error within a restricted class of estimators: those that are linear functions of the observations. While this estimator may result in higher mean squared error, it has a significant practical advantage: it requires simple calculations. It is thus a useful alternative to the conditional expectation/LMS estimator in cases where the latter is hard to compute.
- A linear estimator of a random variable $\Theta$, based on observations $X_1, \ldots , X_n$, has the form
$$\hat\Theta=a_1X_1+\ldots+a_nX_n+b$$
Given a particular choice of the scalars $a_1, \ldots , a_n, b$, the corresponding mean squared error is
$$E[(\Theta-a_1X_1-\ldots-a_nX_n-b)^2]$$
The linear LMS estimator chooses $a_1, \ldots , a_n, b$ to minimize the above expression.
Linear LMS Estimation Based on a Single Observation
- We are interested in finding $a$ and $b$ that minimize the mean squared error $E[(\Theta-aX-b)^2]$ associated with the linear estimator $aX+b$ of $\Theta$.
- Suppose that $a$ has already been chosen. How should we choose $b$? This is the same as choosing a constant $b$ to estimate the random variable $\Theta -aX$:
$$\begin{aligned}E[(\Theta -aX-b)^2]&=var(\Theta -aX-b)+(E[\Theta -aX-b])^2 \\&=var(\Theta -aX)+(E[\Theta -aX]-b)^2\end{aligned}$$
The best choice is
$$b=E[\Theta -aX]=E[\Theta]-aE[X]$$
- With this choice of $b$, it remains to minimize, with respect to $a$, the expression
$$\begin{aligned}E[(\Theta -aX-E[\Theta]+aE[X])^2]&=var(\Theta -aX)\\ &=\sigma_\Theta^2+a^2\sigma_X^2-2a\cdot cov(\Theta,X) \end{aligned}$$
To minimize this quadratic function of $a$, we set its derivative to zero and solve for $a$. This yields
$$a=\frac{cov(\Theta,X)}{\sigma_X^2}=\frac{\rho\sigma_\Theta\sigma_X}{\sigma_X^2}=\rho\frac{\sigma_\Theta}{\sigma_X}$$
- With this choice of $a$, the linear LMS estimator $\hat\Theta$ of $\Theta$ based on $X$ is
$$\hat\Theta=a(X-E[X])+E[\Theta]=\rho\frac{\sigma_\Theta}{\sigma_X}(X-E[X])+E[\Theta]$$
The mean squared estimation error of the resulting linear estimator $\hat\Theta$ is given by
$$E[(\Theta-\hat\Theta)^2]=\sigma_\Theta^2+a^2\sigma_X^2-2a\cdot cov(\Theta,X)=(1-\rho^2)\sigma_\Theta^2$$
- The formula for the linear LMS estimator only involves the means, variances, and covariance of $\Theta$ and $X$.
- Furthermore, it has an intuitive interpretation. Suppose, for concreteness, that the correlation coefficient $\rho$ is positive. The estimator starts with the baseline estimate $E[\Theta]$ for $\Theta$, which it then adjusts by taking into account the value of $X - E[X]$. For example, when $X$ is larger than its mean, the positive correlation between $X$ and $\Theta$ suggests that $\Theta$ is also expected to be larger than its mean, so the resulting estimate is set to a value larger than $E[\Theta]$. The value of $\rho$ also affects the quality of the estimate: the mean squared error $(1-\rho^2)\sigma_\Theta^2$ is small when $|\rho|$ is close to 1.
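- As a quick illustration (my own sketch, not from the book), the coefficients $a$ and $b$ can be computed from sample moments. The snippet below assumes NumPy and uses a made-up joint distribution for $\Theta$ and $X$; the empirical mean squared error should come out close to $(1-\rho^2)\sigma_\Theta^2$.
```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up joint distribution, purely for illustration:
# Theta ~ N(1, 1), X = Theta + measurement noise.
theta = rng.normal(1.0, 1.0, size=100_000)
x = theta + rng.normal(0.0, 0.5, size=theta.shape)

# Linear LMS coefficients from (sample) means, variances, and covariance.
cov_tx = np.cov(theta, x, bias=True)[0, 1]
a = cov_tx / np.var(x)
b = np.mean(theta) - a * np.mean(x)
theta_hat = a * x + b

rho = np.corrcoef(theta, x)[0, 1]
print("a ~", a, " b ~", b)
print("empirical MSE          ~", np.mean((theta - theta_hat) ** 2))
print("(1 - rho^2) var(Theta) ~", (1 - rho ** 2) * np.var(theta))
```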
Properties of LMS estimation.
- Let $\Theta$ and $X$ be two random variables with positive variances. Let $\hat\Theta_L$ be the linear LMS estimator of $\Theta$ based on $X$, and let $\tilde \Theta_L = \hat\Theta_L-\Theta$ be the associated error. Similarly, let $\hat\Theta$ be the LMS estimator $E[\Theta |X]$ of $\Theta$ based on $X$, and let $\tilde \Theta = \hat\Theta-\Theta$ be the associated error. It can be shown that
- $E[\tilde \Theta_L]=0$
- $cov(\tilde\Theta_L,X)=0$, i.e. the estimation error $\tilde\Theta_L$ is uncorrelated with the observation $X$.
- $var(\Theta) = var(\hat\Theta_L) + var(\tilde\Theta_L)$
- The LMS estimation error $\tilde\Theta$ is uncorrelated with any function $h(X)$ of the observation $X$.
The proof can be found in Problem 23.
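- These properties are easy to check numerically. The following is my own Monte Carlo sketch (assuming NumPy, with an arbitrary synthetic pair $(\Theta, X)$), verifying that the linear LMS error has zero mean, is uncorrelated with $X$, and satisfies the variance decomposition.
```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic non-normal pair: Theta uniform on [0, 1], X = Theta^2 + noise.
theta = rng.uniform(0.0, 1.0, size=200_000)
x = theta ** 2 + rng.normal(0.0, 0.1, size=theta.shape)

# Linear LMS estimator built from sample moments.
a = np.cov(theta, x, bias=True)[0, 1] / np.var(x)
b = np.mean(theta) - a * np.mean(x)
err = (a * x + b) - theta                     # tilde Theta_L = hat Theta_L - Theta

print("E[error]              ~", np.mean(err))                     # close to 0
print("cov(error, X)         ~", np.cov(err, x, bias=True)[0, 1])  # close to 0
print("var(Theta)            ~", np.var(theta))
print("var(hat) + var(error) ~", np.var(a * x + b) + np.var(err))  # matches var(Theta)
```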
Example 8.16. Linear LMS Estimation of the Bias of a Coin.
We revisit the coin tossing problem, and derive the linear LMS estimator. Here, the probability of heads of the coin is modeled as a random variable $\Theta$ whose prior distribution is uniform over the interval $[0, 1]$. The coin is tossed $n$ times, independently, resulting in a random number of heads, denoted by $X$. Thus, if $\Theta$ is equal to $\theta$, the random variable $X$ has a binomial distribution with parameters $n$ and $\theta$.
SOLUTION
- We have $E[\Theta] = 1/2$, and
$$E[X]=E[E[X|\Theta]]=E[n\Theta]=\frac{n}{2}$$
- The variance of $\Theta$ is $1/12$, so that $\sigma_\Theta = 1/\sqrt{12}$. Also, $E[\Theta^2]= 1/3$. If $\Theta$ takes the value $\theta$, the (conditional) variance of $X$ is $n\theta(1 - \theta)$. Using the law of total variance, we obtain
$$\begin{aligned}var(X)&=E[var(X|\Theta)]+var(E[X|\Theta]) \\&=E[n\Theta(1-\Theta)]+var(n\Theta) \\&=nE[\Theta]-nE[\Theta^2]+n^2var(\Theta) \\&=\frac{n(n+2)}{12}\end{aligned}$$
- In order to find the covariance of $X$ and $\Theta$, we use the formula
$$\begin{aligned}cov(\Theta,X)&=E[\Theta X]-E[\Theta]E[X] \\&=E[E[\Theta X|\Theta ]]-\frac{n}{4} \\&=E[\Theta E[X|\Theta ]]-\frac{n}{4} \\&=E[n\Theta^2]-\frac{n}{4} \\&=\frac{n}{12}\end{aligned}$$
- Putting everything together, we conclude that the linear LMS estimator takes the form
$$\hat\Theta=\frac{1}{2}+\frac{n/12}{n(n+2)/12}\Big(X-\frac{n}{2}\Big)=\frac{X+1}{n+2}$$
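- As a quick sanity check of the algebra above, here is a small Monte Carlo sketch of my own (assuming NumPy): it draws $\Theta$ uniformly, tosses the coin $n$ times, fits the linear LMS coefficients from sample moments, and compares them with the closed form $\hat\Theta=(X+1)/(n+2)$.
```python
import numpy as np

rng = np.random.default_rng(2)
n = 10                                         # number of tosses (illustrative choice)

theta = rng.uniform(0.0, 1.0, size=500_000)    # prior: uniform on [0, 1]
x = rng.binomial(n, theta)                     # number of heads for each draw of Theta

a = np.cov(theta, x, bias=True)[0, 1] / np.var(x)
b = np.mean(theta) - a * np.mean(x)
print("fitted coefficients:     a ~", a, " b ~", b)
print("closed form (X+1)/(n+2): a =", 1 / (n + 2), " b =", 1 / (n + 2))
```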
Problem 16.
The joint PDF of random variables $X$ and $\Theta$ is of the form
$$f_{X,\Theta}(x,\theta)=\begin{cases}c, & (x,\theta)\in S,\\ 0, & \text{otherwise,}\end{cases}$$
where $c$ is a constant and $S$ is the set
$$S=\{(x,\theta)\mid 0\leq x\leq2,\ 0\leq\theta\leq2,\ x-1\leq\theta\leq x\}$$
We want to estimate $\Theta$ based on $X$.
- (a) Find the LMS estimator $g(X)$ of $\Theta$.
- (b) Calculate $E[(\Theta - g(X))^2\mid X =x]$, $E[g(X)]$, and $var(g(X))$.
- (c) Calculate the mean squared error $E[(\Theta - g(X))^2]$. Is it the same as $E[var(\Theta|X)]$?
- (d) Calculate $var(\Theta)$ using the law of total variance.
- (e) Derive the linear LMS estimator of $\Theta$ based on $X$, and calculate its mean squared error.
SOLUTION
- (a) If $x\in[0,1]$, the conditional PDF of $\Theta$ given $X=x$ is uniform over $[0,x]$; if $x\in[1,2]$, it is uniform over $[x-1,x]$. The LMS estimator is therefore
$$g(x)=E[\Theta\mid X=x]=\begin{cases}x/2, & 0\leq x\leq1,\\ x-\dfrac{1}{2}, & 1< x\leq2.\end{cases}$$
- (b)
- We first derive the conditional variance $E[(\Theta - g(X))^2\mid X =x]$.
- If $x\in [0, 1]$, the conditional PDF of $\Theta$ is uniform over the interval $[0, x]$, and
$$E[(\Theta - g(X))^2\mid X =x]=\frac{x^2}{12}$$
- Similarly, if $x \in [1, 2]$, the conditional PDF of $\Theta$ is uniform over the interval $[x-1,x]$, and
$$E[(\Theta - g(X))^2\mid X =x]=\frac{1}{12}$$
- We now evaluate the expectation and variance of $g(X)$. Note that $(\Theta,X)$ is uniform over a region with area $3/2$, so that the constant $c$ must be equal to $2/3$. We have
$$\begin{aligned}E[g(X)]&=E[E[\Theta|X]]=E[\Theta] \\&=\int\!\!\int\theta f_{X,\Theta}(x,\theta)\,d\theta\, dx \\&=\int_0^1\int_0^x\theta\,\frac{2}{3}\,d\theta\, dx+\int_1^2\int_{x-1}^x\theta\,\frac{2}{3}\,d\theta\, dx \\&=\frac{7}{9}\end{aligned}$$
- Furthermore,
$$\begin{aligned}var(g(X))&=var(E[\Theta|X]) \\&=E[(E[\Theta|X])^2]-(E[E[\Theta|X]])^2 \\&=\int_0^2(E[\Theta|X=x])^2f_X(x)\,dx-(E[\Theta])^2 \\&=\int_0^1\Big(\frac{x}{2}\Big)^2\cdot\frac{2}{3}x\,dx+\int_1^2\Big(x-\frac{1}{2}\Big)^2\cdot\frac{2}{3}\,dx-\Big(\frac{7}{9}\Big)^2 \\&=\frac{103}{648} \end{aligned}$$
- (c)
$$\begin{aligned}E[var(\Theta|X)]&=E[E[(\Theta-E[\Theta|X])^2|X]]=E[(\Theta - g(X))^2] \\&=\int_0^1\frac{x^2}{12}\cdot\frac{2}{3}x\,dx+\int_1^2\frac{1}{12}\cdot\frac{2}{3}\,dx=\frac{5}{72} \end{aligned}$$
So the mean squared error $E[(\Theta - g(X))^2]$ is indeed the same as $E[var(\Theta|X)]$.
- (d)
$$var(\Theta)=E[var(\Theta|X)]+var(E[\Theta|X])=\frac{5}{72}+\frac{103}{648}=\frac{37}{162}$$
- (e) The linear LMS estimator is
$$\hat\Theta=E[\Theta]+\frac{cov(X,\Theta)}{\sigma_X^2}(X-E[X])$$
We have
$$E[X]=\int_0^1\int_0^x\frac{2}{3}x\,d\theta\, dx+\int_1^2\int_{x-1}^x\frac{2}{3}x\,d\theta\, dx=\frac{2}{9}+1=\frac{11}{9}$$
$$E[X^2]=\int_0^1\int_0^x\frac{2}{3}x^2\,d\theta\, dx+\int_1^2\int_{x-1}^x\frac{2}{3}x^2\,d\theta\, dx=\frac{1}{6}+\frac{14}{9}=\frac{31}{18}$$
$$var(X)=E[X^2]-(E[X])^2=\frac{31}{18}-\Big(\frac{11}{9}\Big)^2=\frac{37}{162}$$
$$E[\Theta]=\int_0^1\int_0^x\frac{2}{3}\theta\, d\theta\, dx+\int_1^2\int_{x-1}^x\frac{2}{3}\theta\, d\theta\, dx=\frac{1}{9}+\frac{2}{3}=\frac{7}{9}$$
$$E[X\Theta]=\int_0^1\int_0^x\frac{2}{3}x\theta\, d\theta\, dx+\int_1^2\int_{x-1}^x\frac{2}{3}x\theta\, d\theta\, dx=\frac{1}{12}+\frac{19}{18}=\frac{41}{36}$$
$$cov(X,\Theta)=E[X\Theta]-E[X]E[\Theta]=\frac{41}{36}-\frac{11}{9}\cdot\frac{7}{9}=\frac{61}{324}$$
Thus, the linear LMS estimator is
$$\hat\Theta=\frac{7}{9}+\frac{61/324}{37/162}\Big(X-\frac{11}{9}\Big)=\frac{7}{9}+\frac{61}{74}\Big(X-\frac{11}{9}\Big)\approx0.8243X-0.2297$$
Its mean squared error is
$$E[(\Theta-\hat\Theta)^2]=(1-\rho^2)\sigma_\Theta^2=\sigma_\Theta^2-\frac{\big(cov(X,\Theta)\big)^2}{\sigma_X^2}=\frac{37}{162}-\frac{(61/324)^2}{37/162}=\frac{65}{888}\approx0.0732$$
which, as expected, is slightly larger than the mean squared error $E[var(\Theta|X)]=5/72\approx0.0694$ of the LMS estimator.
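- As a numerical check of the moments and of the final estimator, here is a rejection-sampling sketch of my own (assuming NumPy); the target values in the comments are the fractions derived above.
```python
import numpy as np

rng = np.random.default_rng(3)

# Rejection sampling of (X, Theta) uniform over
# S = {0 <= x <= 2, 0 <= theta <= 2, x - 1 <= theta <= x}.
x = rng.uniform(0.0, 2.0, size=2_000_000)
theta = rng.uniform(0.0, 2.0, size=x.shape)
keep = (theta <= x) & (theta >= x - 1)
x, theta = x[keep], theta[keep]

print("E[X]          ~", np.mean(x), "   (11/9   ~ 1.2222)")
print("var(X)        ~", np.var(x), "   (37/162 ~ 0.2284)")
print("E[X Theta]    ~", np.mean(x * theta), "   (41/36  ~ 1.1389)")
print("cov(X, Theta) ~", np.cov(x, theta, bias=True)[0, 1], "   (61/324 ~ 0.1883)")

a = np.cov(x, theta, bias=True)[0, 1] / np.var(x)
b = np.mean(theta) - a * np.mean(x)
print("linear LMS    ~", a, "* X +", b)
print("MSE           ~", np.mean((theta - (a * x + b)) ** 2), "   (65/888 ~ 0.0732)")
```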
The Case of Multiple Observations and Multiple Parameters
- If there are multiple parameters $\Theta_i$ to be estimated, we may consider the criterion
$$E[(\Theta_1-\hat\Theta_1)^2]+\ldots+E[(\Theta_m-\hat\Theta_m)^2]$$
and minimize it over all estimators $\hat\Theta_1,\ldots,\hat\Theta_m$ that are linear functions of the observations. This is equivalent to finding, for each $i$, a linear estimator $\hat\Theta_i$ that minimizes $E[(\Theta_i-\hat\Theta_i)^2]$, so that we are essentially dealing with $m$ decoupled linear estimation problems, one for each unknown parameter.
- In the case where there are multiple observations with a certain independence property, the formula for the linear LMS estimator simplifies as we will now describe.
- Let $\Theta$ be a random variable with mean $\mu$ and variance $\sigma_0^2$, and let $X_1, \ldots , X_n$ be observations of the form
$$X_i =\Theta + W_i$$
where the $W_i$ are random variables with mean 0 and variance $\sigma_i^2$, which represent observation errors. Under the assumption that the random variables $\Theta, W_1, \ldots , W_n$ are uncorrelated, the linear LMS estimator of $\Theta$, based on the observations $X_1,\ldots , X_n$, turns out to be
$$\hat\Theta=\frac{\mu/\sigma_0^2+\sum_{i=1}^nX_i/\sigma_i^2}{\sum_{i=0}^n1/\sigma_i^2}$$
The derivation involves forming the function
$$h(a_1,\ldots,a_n,b)=\frac{1}{2}E[(\Theta-a_1X_1-\ldots-a_nX_n-b)^2]$$
and minimizing it by setting to zero its partial derivatives with respect to $a_1, \ldots , a_n , b$. We will show that the minimizing values of $a_1, \ldots , a_n, b$ are
$$b^*=\frac{\mu/\sigma_0^2}{\sum_{i=0}^n1/\sigma_i^2},\qquad a_j^*=\frac{1/\sigma_j^2}{\sum_{i=0}^n1/\sigma_i^2},\qquad j=1,\ldots,n$$
from which the formula for the linear LMS estimator given earlier follows.
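- Before the derivation below, here is a minimal sketch of this weighted-average formula (my own, assuming NumPy; the numbers passed in are illustrative placeholders, not from the text). Note that observations with larger noise variance receive smaller weight.
```python
import numpy as np

def linear_lms(x, mu, var0, var_noise):
    """Linear LMS estimate of Theta from X_i = Theta + W_i, where Theta, W_1, ..., W_n
    are uncorrelated, Theta has mean mu and variance var0, and W_i has variance
    var_noise[i]."""
    x = np.asarray(x, dtype=float)
    w = 1.0 / np.asarray(var_noise, dtype=float)      # weights 1 / sigma_i^2
    return (mu / var0 + np.sum(w * x)) / (1.0 / var0 + np.sum(w))

# Illustrative numbers only: the third, noisier observation counts for less.
print(linear_lms(x=[4.9, 5.3, 5.0], mu=4.0, var0=1.0, var_noise=[0.5, 0.5, 2.0]))
```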
PROOF
- To this end, it is sufficient to show that the partial derivatives of $h$, with respect to $a_1, \ldots , a_n , b$, are all equal to 0 when evaluated at $a_1^*, \ldots , a_n^*, b^*$. (Because the quadratic function $h$ is nonnegative, it can be shown that any point at which its derivatives are zero must be a minimum.)
- By differentiating $h$, we obtain
$$\frac{\partial h}{\partial b}\bigg|_{a_i^*,b^*}=E\bigg[\bigg(\sum_{i=1}^na_i^*-1\bigg)\Theta+\sum_{i=1}^na_i^*W_i+b^*\bigg]$$
$$\frac{\partial h}{\partial a_j}\bigg|_{a_i^*,b^*}=E\bigg[X_j\bigg(\bigg(\sum_{i=1}^na_i^*-1\bigg)\Theta+\sum_{i=1}^na_i^*W_i+b^*\bigg)\bigg]$$
- From the expressions for $b^*$ and $a_j^*$, we see that
$$\sum_{i=1}^na_i^*-1=-\frac{b^*}{\mu}$$
It follows that
$$\frac{\partial h}{\partial b}\bigg|_{a_i^*,b^*}=E\bigg[\bigg(-\frac{b^*}{\mu}\bigg)\Theta+\sum_{i=1}^na_i^*W_i+b^*\bigg]=0$$
- Using, in addition, the equations
$$E[X_j(\mu-\Theta)]=E[(\Theta-\mu + W_j+\mu)(\mu-\Theta)]=-\sigma_0^2$$
$$E[X_iW_i]=E[(\Theta + W_i)W_i]=\sigma_i^2,\qquad \text{for all } i$$
$$E[X_jW_i]=E[(\Theta + W_j)W_i]=0,\qquad \text{for all } i \text{ and } j \text{ with } i\neq j$$
we obtain
$$\begin{aligned}\frac{\partial h}{\partial a_j}\bigg|_{a_i^*,b^*}&=E\bigg[X_j\bigg(\bigg(-\frac{b^*}{\mu}\bigg)\Theta+\sum_{i=1}^na_i^*W_i+b^*\bigg)\bigg] \\&=E\bigg[X_j\bigg((\mu-\Theta)\frac{b^*}{\mu}+\sum_{i=1}^na_i^*W_i\bigg)\bigg] \\&=\frac{b^*}{\mu}E\big[X_j(\mu-\Theta)\big]+\sum_{i=1}^na_i^*E\big[X_jW_i\big] \\&=-\sigma_0^2\frac{b^*}{\mu}+a_j^*\sigma_j^2 \\&=0\end{aligned}$$
Problem 24. Properties of linear LMS estimation based on multiple observations.
Let $\Theta, X_1, \ldots , X_n$ be random variables with given variances and covariances. Let $\hat\Theta_L$ be the linear LMS estimator of $\Theta$ based on $X_1 , \ldots , X_n$, and let $\tilde\Theta_L=\hat\Theta_L-\Theta$ be the associated error. Show that $E[\tilde\Theta_L] = 0$ and that $\tilde\Theta_L$ is uncorrelated with $X_i$ for every $i$.
SOLUTION
- We start by showing that $E[\tilde\Theta_LX_i] = 0$, for all $i$. Consider a new linear estimator of the form $\hat\Theta_L+aX_i$, where $a$ is a scalar parameter. Since $\hat\Theta_L$ is a linear LMS estimator, its mean squared error $E[(\hat\Theta_L-\Theta)^2]$ is no larger than the mean squared error $h(a)=E[(\hat\Theta_L+aX_i-\Theta)^2]$ of the new estimator. Therefore, the function $h(a)$ attains its minimum value when $a= 0$. Note that
$$h(a)=E[(\hat\Theta_L+aX_i-\Theta)^2]=E[(\tilde\Theta_L+aX_i)^2]=E[\tilde\Theta_L^2]+a^2E[X_i^2]+2aE[\tilde\Theta_LX_i]$$
The condition $(dh/da)(0) = 0$ yields $E[\tilde\Theta_LX_i]= 0$.
- Let us now repeat the above argument, but with the constant 1 replacing the random variable $X_i$. Following the same steps, we obtain $E[\tilde\Theta_L] = 0$. Finally, note that
$$cov(\tilde\Theta_L, X_i)= E[\tilde\Theta_LX_i] - E[\tilde\Theta_L] E[X_i] = 0 - 0\cdot E[X_i ] = 0$$
so that $\tilde\Theta_L$ and $X_i$ are uncorrelated.
Linear Estimation and Normal Models
- The linear LMS estimator is generally inferior to the LMS estimator $E[\Theta|X_1,\ldots,X_n]$. However, if the LMS estimator happens to be linear in the observations $X_1, \ldots , X_n$, then it is also the linear LMS estimator, i.e., the two estimators coincide.
- An important example where this occurs is the estimation of a normal random variable $\Theta$ on the basis of observations $X_i = \Theta + W_i$, where the $W_i$ are independent zero-mean normal noise terms, independent of $\Theta$.
- This is a manifestation of a property that can be shown to hold more generally: if $\Theta, X_1, \ldots , X_n$ are all linear functions of a collection of independent normal random variables, then the LMS and the linear LMS estimators coincide. They also coincide with the MAP estimator, since the normal posterior is symmetric around its mean and unimodal.
- The above discussion leads to an interesting interpretation of linear LMS estimation: the estimator is the same as the one that would have been obtained if we were to pretend that the random variables involved were normal, with the given means, variances, and covariances. Thus, there are two alternative perspectives on linear LMS estimation: either as a computational shortcut (avoid the evaluation of a possibly complicated formula for $E[\Theta|X]$), or as a model simplification (replace less tractable distributions by normal ones).
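- A small Monte Carlo illustration of this coincidence (my own sketch, with made-up normal parameters, assuming NumPy): the empirical conditional mean $E[\Theta\mid X=x]$, estimated by averaging over a thin slice around $x$, should agree with the linear LMS formula.
```python
import numpy as np

rng = np.random.default_rng(4)

# Normal model with made-up parameters: Theta ~ N(1, 1), X = Theta + W, W ~ N(0, 0.5^2).
theta = rng.normal(1.0, 1.0, size=1_000_000)
x = theta + rng.normal(0.0, 0.5, size=theta.shape)

# Linear LMS estimator from moments.
a = np.cov(theta, x, bias=True)[0, 1] / np.var(x)
b = np.mean(theta) - a * np.mean(x)

# Empirical E[Theta | X ~ x0]: average Theta over a thin slice around x0.
for x0 in (0.0, 1.0, 2.0):
    near = np.abs(x - x0) < 0.02
    print(f"x = {x0}:  E[Theta|X=x] ~ {theta[near].mean():.3f},  linear LMS ~ {a * x0 + b:.3f}")
```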
Problem 20. Estimation with spherically invariant PDFs.
Let $\Theta$ and $X$ be continuous random variables with joint PDF of the form
$$f_{\Theta,X}(\theta,x)=h(q(\theta,x))$$
where $h$ is a nonnegative scalar function, and $q(\theta, x)$ is a quadratic function of the form
$$q(\theta, x)=a(\theta-\overline\theta)^2+b(x-\overline x)^2-2c(\theta-\overline\theta)(x-\overline x)$$
Here $a, b, c, \overline\theta,\overline x$ are some scalars with $a \neq 0$. Derive the LMS and linear LMS estimates, for any $x$ such that $E[\Theta |X = x]$ is well-defined and finite. Assuming that $q(\theta, x)\geq0$ for all $x,\theta$, and that $h$ is monotonically decreasing, derive the MAP estimate and show that it coincides with the LMS and linear LMS estimates.
SOLUTION
- The posterior is given by
$$f_{\Theta|X}(\theta|x)=\frac{h(q(\theta,x))}{f_X(x)}$$
To motivate the derivation of the LMS and linear LMS estimates, consider first the MAP estimate, assuming that $q(\theta, x)\geq0$ for all $x, \theta$, and that $h$ is monotonically decreasing. The MAP estimate maximizes $h(q(\theta,x))$ and, since $h$ is a decreasing function, it minimizes $q(\theta,x)$ over $\theta$. By setting to 0 the derivative of $q(\theta,x)$ with respect to $\theta$, we obtain
$$\hat\theta=\overline\theta+\frac{c}{a}(x-\overline x)$$
θ
^
\hat\theta
θ^ is equal to the LMS and linear LMS estimates [without the assumption that
q
(
θ
,
x
)
≥
0
q(\theta, x)\geq0
q(θ,x)≥0 for all
x
,
θ
x, \theta
x,θ, and that
h
h
h is monotonically decreasing]. We write
θ − θ ‾ = θ − θ ^ + c a ( x − x ‾ ) \theta-\overline\theta=\theta-\hat\theta+\frac{c}{a}(x-\overline x) θ−θ=θ−θ^+ac(x−x)and substitute in the formula for q ( θ , x ) q(\theta, x) q(θ,x) to obtain after some algebra
q ( θ , x ) = a ( θ − θ ^ ) 2 + ( b − c 2 a ) ( x − x ‾ ) 2 q(\theta, x)=a(\theta-\hat\theta)^2+(b-\frac{c^2}{a})(x-\overline x)^2 q(θ,x)=a(θ−θ^)2+(b−ac2)(x−x)2Thus, for any given x x x, the posterior is a function of θ \theta θ that is symmetric around θ ^ \hat\theta θ^. This implies that θ ^ \hat\theta θ^ is equal to the conditional mean E [ Θ ∣ X = x ] E[\Theta |X = x] E[Θ∣X=x], whenever E [ Θ ∣ X = x ] E[\Theta |X = x] E[Θ∣X=x] is well-defined and finite. Furthermore, we have
E [ Θ ∣ X ] = θ ‾ + c a ( X − x ‾ ) E[\Theta |X]=\overline\theta+\frac{c}{a}(X-\overline x) E[Θ∣X]=θ+ac(X−x)Since E [ Θ ∣ X ] E[\Theta|X] E[Θ∣X] is linear in X X X, it is also the linear LMS estimator.
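- The completion-of-the-square step above can be verified symbolically; the following is my own short sketch using SymPy.
```python
import sympy as sp

theta, x, a, b, c, theta_bar, x_bar = sp.symbols('theta x a b c theta_bar x_bar')

q = a*(theta - theta_bar)**2 + b*(x - x_bar)**2 - 2*c*(theta - theta_bar)*(x - x_bar)
theta_hat = theta_bar + (c/a)*(x - x_bar)                    # candidate estimate
completed = a*(theta - theta_hat)**2 + (b - c**2/a)*(x - x_bar)**2

print(sp.simplify(q - completed))                            # prints 0
```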
The Choice of Variables in Linear Estimation
- Consider an unknown random variable $\Theta$, observations $X_1, \ldots ,X_n$, and transformed observations $Y_i= h(X_i)$, $i = 1, \ldots , n$, where the function $h$ is one-to-one. The transformed observations $Y_i$ convey the same information as the original observations $X_i$, and therefore the LMS estimator based on $Y_1, \ldots , Y_n$ is the same as the one based on $X_1, \ldots , X_n$:
$$E[\Theta|h(X_1),\ldots,h(X_n)]=E[\Theta|X_1,\ldots,X_n]$$
- On the other hand, linear LMS estimation is based on the premise that the class of linear functions of the observations $X_1, \ldots , X_n$ contains reasonably good estimators of $\Theta$; this may not always be the case. For example, suppose that $\Theta$ is the unknown variance of some distribution and $X_1, \ldots , X_n$ represent independent random variables drawn from that distribution. Then, it would be unreasonable to expect that a good estimator of $\Theta$ can be obtained with a linear function of $X_1, \ldots , X_n$. This suggests that it may be helpful to transform the observations so that good estimators of $\Theta$ can be found within the class of linear functions of the transformed observations.
Problem 17.
Let $\Theta$ be a positive random variable, with known mean $\mu$ and variance $\sigma^2$, to be estimated on the basis of a measurement $X$ of the form $X =\sqrt\Theta\, W$. We assume that $W$ is independent of $\Theta$ with zero mean, unit variance, and known fourth moment $E[W^4]$. Thus, the conditional mean and variance of $X$ given $\Theta$ are 0 and $\Theta$, respectively, so we are essentially trying to estimate the variance of $X$ given an observed value. Find the linear LMS estimator of $\Theta$ based on $X$, and the linear LMS estimator of $\Theta$ based on $X^2$.
SOLUTION
- We have
$$cov(\Theta,X) = E[\Theta^{3/2}W] -E[\Theta]E[X] = E[\Theta^{3/2}]E[W] - E[\Theta]E[X] = 0$$
so the linear LMS estimator of $\Theta$ based on $X$ is simply $\hat\Theta = \mu$, and does not make use of the available observation.
- Let us now consider the transformed observation $Y = X^2 =\Theta W^2$, and linear estimators of the form $\hat\Theta = aY + b$. Since $E[W^2]=var(W)+(E[W])^2=1$, we have
$$E[Y]=E[\Theta W^2]=E[\Theta]E[ W^2]=\mu$$
$$E[\Theta Y]=E[\Theta^2W^2]=E[\Theta^2]E[W^2]=\mu^2+\sigma^2$$
$$cov(\Theta,Y)=E[\Theta Y]-E[\Theta]E[Y]=\sigma^2$$
$$var(Y)=E[\Theta^2W^4]-(E[\Theta W^2])^2=(\mu^2+\sigma^2)E[W^4]-\mu^2$$
Thus, the linear LMS estimator of $\Theta$ based on $Y$ is of the form
$$\hat\Theta=\mu+\frac{\sigma^2}{(\mu^2+\sigma^2)E[W^4]-\mu^2}(Y-\mu)$$
and makes effective use of the observation: the estimate of $\Theta$ (the conditional variance of $X$) is increased whenever a large value of $X^2$ is observed.
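- A quick numerical illustration of the difference between the two estimators (my own sketch, assuming NumPy; the choices $\Theta\sim{\rm Exp}(1)$, so $\mu=\sigma^2=1$, and $W\sim N(0,1)$, so $E[W^4]=3$, are illustrative only):
```python
import numpy as np

rng = np.random.default_rng(5)

# Illustrative choices: Theta ~ Exp(1) (mu = 1, sigma^2 = 1), W ~ N(0, 1) (E[W^4] = 3).
mu, sigma2, ew4 = 1.0, 1.0, 3.0
theta = rng.exponential(1.0, size=1_000_000)
w = rng.normal(0.0, 1.0, size=theta.shape)
y = (np.sqrt(theta) * w) ** 2                 # Y = X^2

est_x = np.full_like(theta, mu)                                    # linear LMS based on X
est_y = mu + sigma2 / ((mu**2 + sigma2) * ew4 - mu**2) * (y - mu)  # linear LMS based on Y

print("MSE using X alone:", np.mean((theta - est_x) ** 2))   # ~ var(Theta) = 1
print("MSE using Y = X^2:", np.mean((theta - est_y) ** 2))   # ~ 1 - 1/5 = 0.8
```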